Yandex.Translate has learned Chuvash

The Yandex.Translate service has added another minority language: Chuvash. To do this, the developers had to collect a small parallel corpus of more than 250 thousand example phrases in Chuvash and Russian, train a neural network on it, and then add to the translation system a pan-Turkic model that takes into account the similarities among related Turkic languages, along with synthetic training examples. Details of the algorithm can be found in an article on Habr.

Classical machine translation based on statistical models requires a large amount of data: a parallel corpus that contains source texts together with their translations into the target language. This approach does not always work, however: for many languages, including so-called minority languages (the languages of small nations), there is not enough high-quality data for statistical translation.

Developers use various methods to get around this limitation. For example, in 2018 Facebook taught machine translation to do without parallel corpora altogether: such translation relies on vector representations of words learned from unrelated (non-parallel) texts. Another option is to draw on information from related but higher-resource languages. Yandex uses this approach successfully, for example when translating from English into Uzbek via one additional step: translation into Turkish, which, like Uzbek, belongs to the Turkic language family (you can read more about this in our article “Run all”).
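The pivot idea can be illustrated as plain function composition: a source-to-pivot translator feeds a pivot-to-target translator. The sketch below is a toy, not Yandex's system: `make_toy_translator` and `pivot` are hypothetical names, and the tiny word-by-word lexicons stand in for trained neural models.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Word-by-word lookup; a toy stand-in for a trained neural model."""
    def translate(sentence: str) -> str:
        # Unknown words are passed through unchanged.
        return " ".join(lexicon.get(w, w) for w in sentence.lower().split())
    return translate


def pivot(first: Translator, second: Translator) -> Translator:
    """Compose source->pivot and pivot->target into source->target."""
    return lambda sentence: second(first(sentence))


# Illustrative toy lexicons (hypothetical, only a handful of words).
en_tr = make_toy_translator({"water": "su", "book": "kitap", "bread": "ekmek"})
tr_uz = make_toy_translator({"su": "suv", "kitap": "kitob", "ekmek": "non"})

en_uz = pivot(en_tr, tr_uz)
print(en_uz("Water bread book"))  # suv non kitob
```

The appeal of the design is that English-Turkish and Turkish-Uzbek models can each be trained on far more data than a direct English-Uzbek corpus would offer; the cost is that errors from the first step carry into the second.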

The service decided to use the same approach for Chuvash. First, however, the developers collected a relatively small parallel corpus of 250 thousand Russian phrases with translations into Chuvash. They trained a neural translation model on this data and then connected it to a pan-Turkic model that learns to translate from English into several Turkic languages, including Tatar, Kyrgyz, Bashkir and Azerbaijani.
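The article does not say how the Chuvash model and the pan-Turkic model are joined. One widely used way to let a single shared model learn several translation directions is to merge the corpora and prefix each source sentence with a token naming the target language. The sketch below shows only that data-preparation step, as an assumption rather than Yandex's confirmed method; `tag_corpus` and the `<2xx>` token format are hypothetical, and the sentences are placeholders.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_corpus(corpora: Dict[str, List[Pair]]) -> List[Pair]:
    """Merge per-language parallel corpora into one multilingual training set.

    Each source sentence gets a prefix naming the target language
    (ISO 639-1 code), so one shared model can be told which language
    to produce.
    """
    merged: List[Pair] = []
    for lang, pairs in sorted(corpora.items()):
        for src, tgt in pairs:
            merged.append((f"<2{lang}> {src}", tgt))
    return merged


# Placeholder data: cv = Chuvash, tt = Tatar, ky = Kyrgyz, ba = Bashkir, az = Azerbaijani.
corpora = {
    "tt": [("src_a", "tgt_tt_a")],
    "cv": [("src_b", "tgt_cv_b")],
}
for pair in tag_corpus(corpora):
    print(pair)
```

Because the Turkic languages share much vocabulary and grammar, a shared model trained this way can transfer some of what it learns from the larger corpora to the small Chuvash one.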

In addition, the developers used synthetic examples of translation from Russian into Chuvash. To produce them, the system learned to back-translate correctly from Chuvash into Russian; provided the Chuvash side of these examples is of good quality, they help the model learn correct agreement patterns and word order in a sentence.
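The general back-translation technique works like this: a reverse (Chuvash-to-Russian) model translates genuine Chuvash text into Russian, and each machine-made Russian sentence is paired with its real Chuvash original to extend the Russian-to-Chuvash training data. The minimal sketch below assumes this standard setup; `back_translate`, `build_training_set` and the placeholder reverse model are hypothetical, not Yandex's code.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    monolingual_target: Iterable[str],
    reverse_model: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side text.

    The target sentence is genuine text; only the source side is
    machine-generated by the reverse (target->source) model.
    """
    return [(reverse_model(t), t) for t in monolingual_target]


def build_training_set(real: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    # Real parallel pairs plus synthetic ones. Real systems often tag or
    # down-weight synthetic pairs; that refinement is omitted here.
    return real + synthetic


# Placeholder strings stand in for actual Russian and Chuvash sentences.
def reverse_model(s: str) -> str:
    return f"ru({s})"


real_pairs = [("ru_1", "cv_1")]
synthetic = back_translate(["cv_2", "cv_3"], reverse_model)
training_set = build_training_set(real_pairs, synthetic)
print(training_set)
```

Keeping the human-written sentence on the target side is the key choice: the forward model learns to produce fluent Chuvash even when its Russian input is imperfect.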

Translation into and out of Chuvash is available for all 97 languages supported by the service.

According to 2010 data, there are just over a million native speakers of Chuvash in Russia, and the language itself is classified as vulnerable. You can read about other languages of Russia's small peoples in our series “Languages of”.
