Audio calls often suffer short interruptions that make the other party's speech sound choppy and unnatural. Google has developed a neural-network algorithm that analyzes the most recent snippets of speech and fills such pauses with realistic synthetic voice. The company has been testing this feature on Pixel 4 smartphones for several months and is now making it available on other models, Google AI reports in its blog.
During an audio call over the Internet, the signal may traverse many networks in different countries. As a result, even with high-quality compression algorithms and audio-fragment matching on the service side, the final connection quality can be low, because one or more intermediate nodes may lose packets.
This problem has been known for a long time, so almost all calling applications apply some packet loss concealment (PLC) algorithm. Typically, a PLC algorithm either repeats the last received fragment or synthesizes new sound with the basic characteristics of that fragment; for short pauses on the order of 10-20 milliseconds this can give acceptable quality. But if more packets are lost and the pause stretches to several tens of milliseconds, the algorithm's work becomes clearly audible.
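The simplest of these strategies, repeating the last received fragment, can be sketched in a few lines. This is a minimal illustration of the idea, not any real codec's implementation; the frame size and function names are assumptions.

```python
FRAME = 160  # assumed frame size: 10 ms of 16 kHz audio


def conceal(frames):
    """Naive PLC by repetition.

    frames: a list where each element is either a frame (list of samples)
    or None, marking a lost packet. Lost frames are replaced by a copy of
    the last successfully received frame.
    """
    out = []
    last = [0] * FRAME  # silence until the first frame arrives
    for f in frames:
        if f is None:       # packet lost: replay the previous frame
            out.extend(last)
        else:               # packet received: pass through and remember it
            out.extend(f)
            last = f
    return out
```

One lost frame simply echoes its predecessor, which is why a single 10 ms gap is barely noticeable while a long run of repeats sounds robotic.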
Developers led by Olga Sharonova of Google and Tom Walters of DeepMind have created an algorithm that can produce a realistic replacement for lost packets even when the pause lasts several tens of milliseconds. The algorithm is based on WaveRNN, a neural network for sound synthesis created by developers at these companies in 2018.
The new WaveNetEQ algorithm consists of two main parts: an autoregressive network and a conditioning network. The conditioning network is responsible for preserving the prosody of the voice and analyzes the spectrogram of the last few hundred milliseconds before a pause. The autoregressive network is responsible for synthesizing the sound itself; it receives a short final fragment of a few tens of milliseconds together with the data from the conditioning network.
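The two-part data flow described above can be sketched with stand-in functions in place of the real neural networks. Everything here, the function names, window sizes, and the toy update rule, is an illustrative assumption, not the actual WaveNetEQ model.

```python
import math


def conditioning_features(history):
    """Stand-in for the conditioning network: summarize a few hundred
    milliseconds of past audio into features (here just mean and RMS
    energy, instead of a learned spectrogram embedding)."""
    n = len(history)
    mean = sum(history) / n
    energy = math.sqrt(sum(x * x for x in history) / n)
    return (mean, energy)


def autoregressive_fill(seed, features, n_samples):
    """Stand-in for the autoregressive network: generate samples one by
    one, each conditioned on the previous sample and the conditioning
    features. The decay-toward-the-mean rule is a toy, not WaveRNN."""
    mean, _energy = features
    out = []
    prev = seed[-1]
    for _ in range(n_samples):
        prev = 0.9 * prev + 0.1 * mean
        out.append(prev)
    return out


def wavenet_eq_sketch(history, gap_samples):
    """Fill a gap: condition on ~300 ms of context, seed the generator
    with the last few tens of milliseconds (assumed 16 kHz sample rate)."""
    feats = conditioning_features(history[-4800:])  # ~300 ms of context
    return autoregressive_fill(history[-480:], feats, gap_samples)
```

The split mirrors the description in the text: the conditioning path looks at a long window to capture how the voice was behaving, while the autoregressive path continues the waveform sample by sample from its most recent values.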