Scientists have found that the speech recognition systems of the largest companies handle the speech of black Americans worse than that of white Americans, and the speech of African American men worse than that of African American women. The reason is African American English: the algorithms are poorly tuned to the pronunciation and intonation of this dialect, the authors argue. The article is published in the journal Proceedings of the National Academy of Sciences.
Speech recognition systems should be accessible and universal. Ideally, the algorithms would be trained on speech samples from all populations equally, but this is not always possible, and the disparities that result from skewed training data become visible and sometimes dangerous. Researchers have already noted racial disparities in face recognition systems, in risk assessment in criminal justice, and in online advertising. Speech recognition has also been suspected of racial bias, because every language has many dialects and accents that affect recognition quality.
Allison Koenecke of Stanford University and her colleagues assessed racial disparities in the speech recognition systems of five companies: Amazon, Apple, Google, IBM and Microsoft. To this end, they applied the algorithms to speech samples from extensive collections of interviews with black and white Americans. The text produced by automatic recognition was compared against expert transcriptions, and the error rate was estimated.
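The standard way to compare an automatic transcript against an expert one is the word error rate: the edit distance between the two word sequences (substitutions, insertions and deletions), divided by the length of the reference. A minimal sketch of that computation (the function name `wer` and the example sentences are illustrative, not from the study):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    expert transcript (reference) and the ASR output (hypothesis),
    normalized by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("is") and one substitution ("the" -> "a")
# out of six reference words: WER = 2/6
print(wer("he is going to the store", "he going to a store"))
```

A WER of 0.5 thus means that half of the reference words were dropped, replaced, or drowned in spurious insertions.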
The researchers then assessed how the error rate was related to the speakers' dialect: in the speech samples they counted the frequency of phonetic and grammatical features of African American varieties of English.
To understand where the racial disparity comes from, the authors tested two hypotheses. The first is that the speech recognition algorithms are tuned to the vocabulary and grammar of standard English and handle distinctive African American constructions worse. The authors estimated the number of words that occurred in the speech samples but were absent from the algorithms' dictionaries, and then compared how "surprising" the language of speakers of different races was to a language model. A speech recognition system predicts each next word in a sentence, and the more surprising a phrase is to it, the less likely the algorithm is to transcribe it correctly.
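That "surprise" is usually measured as perplexity: how improbable a word sequence is under the model's next-word predictions. The companies' production language models are not public, so the sketch below uses a toy bigram model with add-one smoothing purely to illustrate the idea; the corpus, function names and sentences are all assumptions for the example:

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of sentences,
    with <s>/</s> boundary markers."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])          # contexts
        bigrams.update(zip(toks, toks[1:])) # transitions
    return unigrams, bigrams, len(vocab)

def perplexity(sentence, unigrams, bigrams, vsize):
    """Perplexity of a sentence under the bigram model,
    with add-one (Laplace) smoothing for unseen transitions."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vsize)
        logp += math.log(p)
    return math.exp(-logp / (len(toks) - 1))

uni, bi, v = train_bigram(["the cat sat", "the dog sat"])
# A sentence built from seen transitions scores lower (less surprising)
# than the same words in an unseen order.
print(perplexity("the cat sat", uni, bi, v))
print(perplexity("sat cat the", uni, bi, v))
```

Under the hypothesis, phrases with African American constructions unseen in training data behave like the second sentence: every transition is unfamiliar, perplexity is high, and the recognizer is more likely to get them wrong.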
The second hypothesis is that the problem lies in the acoustic sensitivity of the programs: the algorithm knows the words and lexical constructions but cannot hear them correctly. To test this, the researchers compared the recognition accuracy of identical sentences uttered by people of different races.
The error rate in recognizing the speech of African Americans was on average twice that for white Americans, and it affects black men more (41 percent of words misrecognized) than black women (30 percent). In the distribution of average error rates the difference is even more pronounced: for 25 percent of African American speech samples the error rate exceeded 50 percent, while less than two percent of white Americans' samples were recognized with such an error rate.