What methods are used by statistical translators

Machine translation for translators

Tip 3: understand machine translation

As mentioned earlier, you can position yourself as a post-editing expert by knowing as much as possible about machine translation. You should know how the various systems work and how companies can prepare texts for machine translation in advance.

Currently, neural machine translation is increasingly used, but it still makes sense to be familiar with the basic principles of rule-based and statistical systems.

Rule-based machine translation

The rule-based approach is the classic MT method. Developing a rule-based system is very costly and time-consuming, since every linguistic rule and dictionary entry has to be recorded manually. This approach is therefore used less and less. Nevertheless, rule-based machine translation provides good terminological suggestions, because the system can be set up specifically with company terminology. In addition, the translations are always complete and the results are predictable. The big disadvantage is that the translations read very mechanically and the word order within sentences is often rendered poorly.
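To make the principle concrete, here is a minimal sketch of a rule-based translator in Python. It is only an illustration, not how any particular product works: the language pair (English to Spanish), the four lexicon entries and the single reordering rule are all assumptions chosen for the example.

```python
# Toy rule-based translator (illustrative only).
# The lexicon and the reordering rule are hypothetical and written by hand,
# which is exactly what makes real rule-based systems so expensive to build.
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "works": ("funciona", "VERB"),
}

def translate(sentence):
    # Dictionary lookup: unknown words are passed through unchanged, tagged "UNK".
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]

    # Reordering rule: English ADJ + NOUN becomes Spanish NOUN + ADJ.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1

    return " ".join(word for word, _ in tagged)

print(translate("the red car works"))
```

The same input always produces the same output and no word is dropped, which matches the predictability and completeness described above. Everything the rules do not cover, however, comes out word for word, which is where the mechanical feel comes from.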

Statistical machine translation

Statistical machine translation creates translations by calculating probabilities: for each segment, the system chooses the translation that is statistically most likely. The information required for this is extracted from bilingual corpora. Since sentence structures and terminology differ from corpus to corpus, the output often lacks consistency, which can impair the reading flow. The system can also produce incomplete translations, add information that is not in the source text, and make capitalization and spelling errors.
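The probability calculation can be illustrated with a small Python sketch. It is heavily simplified and makes several assumptions: the German-English phrase pairs are invented toy data, the probabilities are plain relative frequencies, and each phrase is translated in isolation. Real statistical systems combine several models and are considerably more complex.

```python
from collections import Counter, defaultdict

# Hypothetical phrase pairs, as if extracted from a bilingual corpus.
PHRASE_PAIRS = [
    ("das Haus", "the house"),
    ("das Haus", "the house"),
    ("das Haus", "the building"),
    ("ist klein", "is small"),
    ("ist klein", "is little"),
    ("ist klein", "is small"),
]

def estimate_probabilities(pairs):
    """Estimate P(target | source) as a relative frequency."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {target: n / total for target, n in targets.items()}
    return table

def translate(phrases, table):
    """Pick the most probable target phrase for each source phrase."""
    return " ".join(max(table[p], key=table[p].get) for p in phrases)

table = estimate_probabilities(PHRASE_PAIRS)
print(table["das Haus"])
print(translate(["das Haus", "ist klein"], table))
```

If the corpus had contained "the building" more often than "the house", the system would have chosen that translation instead. This is why the terminology in the output depends so strongly on the training material and can vary within one text.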

Neural machine translation

Neural machine translation is based on an artificial neural network (ANN) that mimics the neural connections in the brain. Like statistical systems, it learns from parallel corpora. The difference is that the ANN implicitly captures the grammatical relationships within sentences. Texts are translated not at phrase level but at sentence level, which greatly improves readability. The biggest challenge of neural machine translation so far is the still limited vocabulary that the models can process (currently between 50,000 and 80,000 words). In post-editing, you should therefore pay more attention to the lexicon than to the grammar. The disadvantages of the system are the same as those of statistical machine translation. In addition, translations generated with NMT read extremely fluently, which can be problematic because errors are overlooked more easily. Nevertheless, neural machine translation currently delivers the best results, so in most cases the extra effort of checking the text more closely at the lexical level is worthwhile.
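The effect of a limited vocabulary can be shown with a short Python sketch. It does not contain a neural network; it only illustrates what happens to words that fall outside a fixed-size vocabulary. The toy corpus, the vocabulary size of three and the "<unk>" placeholder are assumptions made for this example, and real systems deal with rare words in more sophisticated ways.

```python
from collections import Counter

def build_vocabulary(corpus, max_size):
    """Keep only the max_size most frequent words of the training corpus."""
    counts = Counter(word for sentence in corpus for word in sentence.split())
    return {word for word, _ in counts.most_common(max_size)}

def encode(sentence, vocabulary):
    """Replace every word outside the vocabulary with a placeholder."""
    return [word if word in vocabulary else "<unk>" for word in sentence.split()]

# Hypothetical training sentences.
corpus = [
    "the pump is running",
    "the pump is stopped",
    "the valve is open",
]

vocabulary = build_vocabulary(corpus, max_size=3)
print(encode("the pump is leaking", vocabulary))
```

The model never sees the word "leaking" as such, so whatever it produces for that position is a guess. Rare technical terms and proper names are the typical victims, which is why the lexical level deserves the most attention in post-editing.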