In 2019 Lyrebird AI was bought by Descript (a product that makes podcast editing ridiculously simple) and became the AI research division in the company.

Before the acquisition, Lyrebird’s technology replicated human voices for use through a TTS (Text-to-Speech) interface and an API (Application Programming Interface). Now their tech has been repurposed as Descript’s new Overdub feature.

Lyrebird’s tech fits Descript’s product perfectly, but what about Lyrebird’s pre-existing customers who used their Text-to-Speech product for creative and software projects?

Let’s go through the options available with their pros and cons.

Before we dive in, full disclosure: I am not a developer. While this article is written from a creative's perspective, I will lightly touch on all aspects of each product to help any developers out there.


One could be forgiven for thinking there are not many options when it comes to Text-to-Speech AI voice products; however, there are quite a few.

The big players

Companies like Google, Amazon and Microsoft have vast resources, so it should come as no surprise that all three have their own TTS products available for use.

Amazon Polly

Currently, Amazon Polly comes with a standard AWS account. AWS has a vast number of services, but I found Polly easily enough using the search feature.

Amazon Polly's simple, user-friendly interface.

Pros

  1. Easy to use, simple interface.
  2. High-quality sounding voices.
  3. Lexicons (allows you to customise the pronunciation of words)
  4. Multiple languages
  5. API
  6. SSML (Speech Synthesis Markup Language)

Cons

  1. Can’t save individual projects
  2. You must be a developer to leverage speech controls (volume, speed, pitch etc) via the API.
  3. Cannot create your own custom voice*

*Unless you want to pay the big bucks for a brand voice and plan to implement it using an API.
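
For any developers reading, here is a rough sketch of what those SSML speech controls look like in practice. The helper name build_ssml is hypothetical (it is not part of any Amazon library); it simply wraps text in the standard SSML prosody element, which carries the rate, pitch and volume attributes. Which attribute values a given voice honours varies by service and voice, so treat this as an illustration rather than a reference.

```python
from xml.sax.saxutils import escape


def build_ssml(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap plain text in an SSML <prosody> element (hypothetical helper)."""
    return (
        "<speak>"
        f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
        f"{escape(text)}"  # escape &, < and > so the markup stays valid XML
        "</prosody>"
        "</speak>"
    )


# Slow the delivery down and raise the volume for a short line of script.
ssml = build_ssml("Hello & welcome to the show.", rate="slow", volume="loud")
print(ssml)
```

With the AWS SDK for Python (boto3), a string like this would typically be passed to the Polly client's synthesize_speech call with TextType set to "ssml", along with an OutputFormat and a VoiceId. That step needs AWS credentials, so it is left out of the sketch.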


Google Cloud Text-to-Speech

Google Cloud Text-to-Speech is free to try and it's easy to sign up. However, I had some initial trouble finding where to access the TTS interface once logged in. I even used the search to no avail.

I then tried 'Text-to-speech' as my search query and found what I was looking for.

Come on Google Cloud, where's your correction suggestion, like the one in Google's own search engine?

Google Cloud’s "Text-to-Speech" interface

Interestingly, on the Google Cloud Text-to-Speech landing page there is a user-friendly UI that allows you to type text, choose a speaker and listen to it.

I wonder why they decided not to put this in the product...

Anyway, let's go through the pros and cons.

Pros

  1. High-quality sounding voices
  2. Can save multiple projects
  3. Multiple languages
  4. API
  5. SSML

Cons

  1. No speech controls (volume, speed, pitch etc) without accessing the API using Speech Synthesis Markup Language (SSML).
  2. No user-friendly UI which makes it difficult to get started if you're not a developer.
  3. Cannot create your own custom voice*

*I'll admit that I'm not 100% certain if it's possible to create your own custom voice. After searching I could not find any easy method to do so.
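
For developers, the speech controls mentioned above live in the body of the API request. The sketch below only builds that body and does not send it. The function name build_google_tts_request is hypothetical, and the field names reflect my understanding of the v1 text:synthesize REST method, so check them against Google's documentation before relying on them.

```python
import json


def build_google_tts_request(ssml, language_code="en-US",
                             speaking_rate=1.0, pitch=0.0, volume_gain_db=0.0):
    """Build a JSON-serialisable request body (hypothetical helper)."""
    return {
        "input": {"ssml": ssml},                  # SSML instead of plain text
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": "MP3",
            "speakingRate": speaking_rate,        # 1.0 is normal speed
            "pitch": pitch,                       # semitones up or down
            "volumeGainDb": volume_gain_db,       # 0.0 is normal volume
        },
    }


payload = build_google_tts_request(
    '<speak>Hello <break time="500ms"/> world.</speak>',
    speaking_rate=0.9,
)
body = json.dumps(payload, indent=2)
print(body)
```

A body like this would be sent as an authenticated POST to the text:synthesize endpoint, and the audio should come back base64-encoded in the response. Authentication needs a Google Cloud project, so that step is omitted here.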


Microsoft Azure

Not one to be left out, Microsoft Azure has also jumped on board the AI voice train with their own TTS feature which comes with a free account.

You'll need to search 'Text-to-speech' in order to find what you're looking for. I found it under 'Cognitive Services'.

So the next thing I see is this Cognitive Services page. Naturally, I click 'Create cognitive service'.

This takes me to a Marketplace. So I assume their TTS feature must be in here somewhere and I start looking.

TTS where you at?

I couldn't find any reference to Text-to-Speech, so I searched in the 'Search the Marketplace' field, but no results appeared. Confused, I then searched 'Text-to-Speech' again in the main search field at the top of the screen and noticed links to documentation, one of which referred to the API.

After reading through this page in the Azure documentation, it would appear that in order to use the TTS feature you must be a developer and know how to plug into the Azure API 😔

Okay, so what about creating my own custom AI voice? It wasn't clear how or where to do this, so I searched the documentation. I did find where to initiate the process, but each time I tried I got an error message that might as well have been in another language as far as I'm concerned.

I then tried the 'Create a subscription' button, filled out the form and was able to proceed to create my custom voice, or so I thought.

The next screen I was faced with was just as confusing.

"Your deployment is underway".

Cool... but what does that mean?

Okay, I know I'm a naive designer but all I want is an interface to play with. Is that such a big ask? Let me in and let me use your product straight away please 🙂

Having gone as far as I could as a non-developer, let's go through the pros and cons.

Pros

  1. High-quality sounding voices*
  2. It would appear you can save multiple projects
  3. Multiple languages
  4. API
  5. SSML

*Judging by the samples on the Azure Text-to-Speech landing pages, these voices are of a high quality.

Cons

  1. Requires you to provide and verify your phone number
  2. Requires a credit card (even for the free trial)
  3. Free account is only available to 'New' accounts. If you don't fall under this category then you'll have to upgrade in order to use Azure's TTS.
  4. No speech controls (volume, speed, pitch etc) without accessing the API using Speech Synthesis Markup Language (SSML).
  5. Appears that you can create your own custom voice but for non-technical people this may be difficult.
  6. No user-friendly UI which makes it difficult to get started if you're not a developer.
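
For developers wondering what "plugging into the Azure API" involves, here is a rough sketch that assembles a REST request without sending it. The helper name build_azure_request is hypothetical, and the endpoint pattern, header names, output format and voice name are my best understanding of the Speech service's REST interface, so verify them against the Azure documentation. The region and key values are placeholders.

```python
from xml.sax.saxutils import escape


def build_azure_request(text, region, key,
                        voice="en-US-JennyNeural", rate="medium"):
    """Return (url, headers, ssml) for a TTS request (hypothetical helper)."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,          # key from your Azure resource
        "Content-Type": "application/ssml+xml",    # the body is SSML, not JSON
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
        "User-Agent": "tts-sketch",                # arbitrary client name
    }
    ssml = (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<voice name="{voice}">'
        f'<prosody rate="{rate}">{escape(text)}</prosody>'
        "</voice>"
        "</speak>"
    )
    return url, headers, ssml


url, headers, ssml = build_azure_request(
    "Your deployment is underway.", region="westus", key="YOUR-KEY", rate="slow"
)
print(url)
print(ssml)
```

The SSML would be sent as the body of a POST to that URL, and the response body should be the audio itself. This needs a real subscription key, so the network call is left out.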

The startups

Okay, now let's take a look at the startups. Because there are many startups in the AI voice space, I'm going to throw them all into one pot and talk generally about their pros and cons.

This does not mean that every pro and con applies to every startup, so you will need to try each product out for yourself.

Pros

Most have:

  1. An easy sign-up process with free trials.
  2. A simple user interface that allows you to get started right away. 
  3. Speech controls that you can access directly in the user interface without their API.
  4. The ability to customise your own voice. This means you can record your voice or your actor's voice for use in your projects.
  5. An API
  6. SSML

Cons

  1. Some have lower quality voices
  2. Not all of them have an API or SSML
  3. Some are still in closed beta

Replica's TTS interface

Much like most of the other startups, Replica has a user-friendly TTS interface that allows you to quickly experiment with voices, export them and use them in your projects.

Another great feature Replica has is the ability to craft a scene's dialog and listen to it as a whole with our script prototyping UI. Curiously, this is a feature most other products do not have.

Replica's Text-to-Speech user interface

Summary

To summarise, there are pros and cons, no matter which product you choose. Choosing the product that's right for you depends on two factors.

1. What kind of project do you need AI voice for?

  • Is your project one that requires an API to create a dynamic voice experience? 
  • Do you need an ultra-realistic voice that no one can tell isn't real?
  • Do you need narration or a character voice for fictional content?
  • Do you need to replicate your voice or your actor's voice?

2. What is your skillset?

As mentioned earlier, I'm not a developer. However, I've tried to keep this article from being biased towards creatives and, at the very least, to analyse the pros and cons regardless of your skillset. With that said, the perspective I must take is that of a non-technical person.

Because I'm not a developer, the projects I create are of a more creative nature, so I'm looking for an interface, not an API. I want instant gratification: the ability to experiment with a script, write out a scene, export audio and tweak it in my DAW (Digital Audio Workstation).

Here are some examples of the type of projects I've been creating with Replica.

I created these projects with Replica using standard voices that come with the product, and I sourced the foley from freesounds.org. My partner and I recorded some additional background voices and sounds ourselves.

What's that? You are a developer?

I was hoping you'd say this because Replica has an API as well. We're currently working on SSML which should be ready by the end of the second quarter of 2020. In the meantime, why not sign up for a free account and take her for a spin to hear what's possible?

Get started for free

Whether you're a developer or creative, if you're looking for AI Voice for your project, you'll find what you need at replicastudios.com