Amazon launches a service for creating personalized speech synthesis systems

Amazon has launched a service for building speech synthesis systems that speak in the voice of a specific person, based on samples of that person's speech. Amazon is offering the service to brands that are associated with a particular person or character. For example, it created a speech synthesis algorithm for KFC that speaks in the voice of Colonel Sanders.

The development of audio synthesis algorithms such as WaveNet has drawn the attention of researchers and companies to this field, and as a result many voice assistants and speech synthesis systems that developers can use in their applications have appeared in recent years. But a TTS system from a given company can almost always speak in only one or at most a few voices, and these usually do not belong to famous people. There are exceptions, such as John Legend's voice in Google Assistant, but on the whole, until recently, the major developers of voice assistants and speech synthesis systems did not let customers create an algorithm that speaks in the voice of a particular person.

Amazon, which already offers application developers the Polly service for speech synthesis in different languages and voices, has added the ability to create a custom voice within that service. The resulting voice can be used in skills for the Alexa voice assistant, as well as through a separate API that accepts text and returns an audio file, which can then be used in any way.
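
As a rough illustration of the text-in, audio-out API described above, the sketch below calls Polly's `synthesize_speech` operation through the boto3 SDK and writes the returned audio to a file. The helper name `synthesize_to_file`, the output path, and the standard voice `Joanna` are assumptions made for illustration. How a custom brand voice is selected is not described here, so the voice identifier is simply left as a parameter.

```python
def synthesize_to_file(polly_client, text, voice_id, path):
    """Send text to Polly and save the returned audio stream as an MP3 file.

    `polly_client` is expected to behave like boto3.client("polly").
    Returns the number of bytes written.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,  # assumption: a standard voice such as "Joanna"
    )
    # The audio comes back as a streaming body; read it fully and write it out.
    audio_bytes = response["AudioStream"].read()
    with open(path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return len(audio_bytes)


# Example usage (needs the boto3 package and configured AWS credentials):
#   import boto3
#   synthesize_to_file(boto3.client("polly"), "Hello.", "Joanna", "hello.mp3")
```

Passing the client in as an argument keeps the helper independent of any particular AWS configuration and makes it easy to try out with a stand-in object.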

The service is aimed primarily at companies that want to use the recognizable voice of a brand representative in their own services. As an example, Amazon showed the result of its work with KFC, whose Canadian branch received a voice model of the company's symbol, Colonel Sanders.

The company did not disclose the cost of the service or the details of how it works, but it is probably based on an algorithm described in a 2019 paper by Amazon researchers. The algorithm takes data from a particular person and uses it to adapt a general neural network model trained on other data. As a result, training the model requires far fewer speech samples than other approaches, while the synthesis quality remains high.

For now, Google Duplex remains one of the most realistic and large-scale speech synthesis systems in use. The feature works in the US and New Zealand and lets users book a table at a restaurant or perform another task by asking Google Assistant. The algorithm then finds the necessary information on its own, including by placing a phone call, and informs the user of the result. The system turned out to be so realistic that after launch Google had to teach it to state at the beginning of a call that the caller is an algorithm, not a human.
