A neural network has been trained to recognize speech from the activity of the human cerebral cortex. The algorithm was trained and tested on a limited set of 50 sentences, and recognition accuracy averaged 97 percent. Training on additional data reduced the number of errors, which suggests that the algorithm is not simply memorizing sentences but finding patterns. The article was published in the journal Nature Neuroscience.
Modern neural interfaces are good at decoding brain activity associated with movement, so a person can control, for example, an artificial limb. But directly translating the brain's electrical signals into speech had until now been impossible, so the usual scheme was one in which the person controls a cursor and types text on a computer screen. Even if cursor control were perfectly accurate, talking this way would still be very slow: the maximum speed would be roughly that of typing with a single finger.
There have been attempts to decode speech from brain activity directly, but until now researchers had managed to recognize only individual phonemes or sounds, and speech drawn from a limited set of 100 words was recognized with less than 40 percent accuracy.
Scientists from the University of California, San Francisco, led by Joseph Makin used machine learning techniques to decode the speech activity of the brain. Electrocorticograms were recorded from four volunteers; this is an analogue of the electroencephalogram in which the electrodes are placed directly on the cortex (the patients had the electrodes implanted for medical reasons). The electrode grid covered the area of the cortex around the lateral sulcus, a region that includes the language centers.
Participants read sentences aloud from two databases: the first contained 30 sentences and 125 unique words, the second 50 sentences and 250 unique words. Each set of sentences was read at least three times (twice for training and once for testing).
From the electrocorticogram, the researchers extracted the high-frequency component (70–150 Hz) and used this sequence as input to an encoder-decoder neural network. Algorithms of this kind translate the incoming signal into an internal code and then decode it: the neural network first translates the information into its "own language" and then into the required format.
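A rough illustration of the first preprocessing step: isolating high-frequency activity from a raw channel. This is a crude stand-in, not the study's actual pipeline (the paper band-passed ECoG at 70–150 Hz and took the amplitude of the result); here a first-difference high-pass plus a sliding RMS window plays the same role on synthetic data.

```python
# Crude sketch (not the paper's pipeline): pick out the envelope of
# fast activity in a signal. All signals and parameters are invented.
import math

def high_freq_envelope(signal, window=16):
    # First difference attenuates slow components (a toy high-pass).
    diff = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    # Sliding RMS approximates the amplitude envelope of what remains.
    env = []
    for i in range(len(diff) - window + 1):
        chunk = diff[i:i + window]
        env.append(math.sqrt(sum(x * x for x in chunk) / window))
    return env

# Synthetic channel: a slow 2 Hz wave plus a burst of fast 100 Hz
# activity in the middle, sampled at 1000 Hz.
fs = 1000
sig = []
for n in range(fs):
    t = n / fs
    fast = math.sin(2 * math.pi * 100 * t) if 0.4 < t < 0.6 else 0.0
    sig.append(math.sin(2 * math.pi * 2 * t) + fast)

env = high_freq_envelope(sig)
mid = sum(env[400:580]) / 180   # inside the fast burst
edge = sum(env[:180]) / 180     # before the burst
print(mid > 5 * edge)           # the envelope clearly flags the burst
```

The point of the step is visible even in this toy: the envelope is large only where fast activity is present, which is the kind of feature sequence fed to the network.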
In the first stage, the algorithm picks out short sequences of the brain's electrical activity that are characteristic of particular speech components and recur across different samples. The neural network then filters out the rest of the information and passes only these remaining sequences to the next element.
As the encoder, the authors used a recurrent neural network, which can process chains of events regardless of their length by using its internal memory. At this stage the encoder also predicted and encoded the acoustic characteristics of the words, which served as an additional hint while training the algorithm. A second recurrent network then translated the internal sequence into text; the unit of transcription was the word (not the sound, as in previous models). In addition, the decoder received its own earlier word predictions as part of its input when producing the next word of the sentence.
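The encoder-decoder structure described above can be sketched in miniature. This is only a structural illustration with invented sizes, a made-up vocabulary, and untrained random weights, nothing like the study's real networks: the encoder folds a variable-length feature sequence into one fixed-size hidden state, and the decoder emits one word per step, feeding its own previous prediction back in.

```python
# Toy encoder-decoder (hypothetical sizes, random untrained weights).
import math, random

random.seed(0)
VOCAB = ["<start>", "the", "cat", "sat", "<end>"]
H, F = 8, 4  # hidden size, input feature size (illustrative)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)]
            for _ in range(rows)]

W_in, W_h = rand_matrix(H, F), rand_matrix(H, H)
W_emb = rand_matrix(H, len(VOCAB))   # embeds the previous word
W_out = rand_matrix(len(VOCAB), H)   # hidden state -> word scores

def step(h, x, W_x):
    # One recurrent update: h' = tanh(W_x x + W_h h)
    return [math.tanh(sum(W_x[i][j] * x[j] for j in range(len(x)))
                      + sum(W_h[i][j] * h[j] for j in range(H)))
            for i in range(H)]

def encode(features):
    h = [0.0] * H
    for frame in features:        # consume the whole neural sequence
        h = step(h, frame, W_in)
    return h                      # fixed-size internal code

def decode(h, max_len=6):
    words, prev = [], 0           # start from the <start> token
    for _ in range(max_len):
        onehot = [1.0 if i == prev else 0.0 for i in range(len(VOCAB))]
        h = step(h, onehot, W_emb)   # decoder sees its own last word
        scores = [sum(W_out[i][j] * h[j] for j in range(H))
                  for i in range(len(VOCAB))]
        prev = scores.index(max(scores))
        if VOCAB[prev] == "<end>":
            break
        words.append(VOCAB[prev])
    return words

# A fake 10-frame "recording" with 4 features per frame.
frames = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(10)]
print(decode(encode(frames)))  # untrained, so the word choice is arbitrary
```

In the real system both parts are trained end to end, so the decoder's word choices come to reflect the recorded brain activity rather than random weights.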
The researchers suspected that the neural network might simply memorize the set of sentences and pick the most suitable one. To test this hypothesis, algorithms that had been trained on a limited dataset and recognized speech with many errors were additionally trained on a different set of sentences, or on the brain activity of another person. In addition, to compare their paradigm with previous ones, the authors tried to decode the same speech samples using a hidden Markov model, which can use only sounds, not words, as its unit.
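For intuition on the baseline, a hidden Markov model decodes the most likely sequence of hidden states (here, sounds) from observations using the standard Viterbi algorithm. The states, probabilities, and observations below are invented for the example and are not from the paper.

```python
# Toy HMM decoding at the level of sounds, via the Viterbi algorithm.
# All states and probabilities are made up for illustration.
states = ["k", "ae", "t"]                 # hypothetical phonemes
start = {"k": 0.6, "ae": 0.2, "t": 0.2}
trans = {"k":  {"k": 0.2, "ae": 0.7, "t": 0.1},
         "ae": {"k": 0.1, "ae": 0.3, "t": 0.6},
         "t":  {"k": 0.3, "ae": 0.3, "t": 0.4}}
emit = {"k":  {"lo": 0.7, "hi": 0.3},
        "ae": {"lo": 0.2, "hi": 0.8},
        "t":  {"lo": 0.6, "hi": 0.4}}

def viterbi(obs):
    # best[s] = (probability, path) of the likeliest sequence ending in s
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {s: max(((p * trans[prev][s] * emit[s][o], path + [s])
                        for prev, (p, path) in best.items()),
                       key=lambda t: t[0])
                for s in states}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["lo", "hi", "lo"]))  # → ['k', 'ae', 't']
```

Because each hidden state is a single sound, the model cannot exploit word-level regularities, which is the limitation the word-based decoder avoids.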
To find out which anatomical regions contribute most to the algorithm's performance, the scientists programmatically degraded the data from individual electrodes and measured how much the number of errors in the recognized text increased.
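The ablation logic can be sketched as a loop: silence one channel at a time, re-score the decoder, and rank channels by how much accuracy drops. The decoder and data here are stand-ins I invented; only the loop structure mirrors the analysis.

```python
# Sketch of electrode ablation (invented toy decoder and data).
import random
random.seed(1)

N_CHANNELS = 4

def toy_decoder_accuracy(data):
    # Stand-in for "decode text and score against the true sentence":
    # accuracy depends on how much signal each channel keeps, with
    # channel 2 weighted most, mimicking an informative electrode.
    weights = [0.1, 0.15, 0.6, 0.15]  # hypothetical importance
    return sum(w * (sum(abs(x) for x in ch) / len(ch))
               for w, ch in zip(weights, data))

data = [[random.gauss(0, 1) for _ in range(100)]
        for _ in range(N_CHANNELS)]
baseline = toy_decoder_accuracy(data)

drops = []
for i in range(N_CHANNELS):
    ablated = [ch if j != i else [0.0] * len(ch)   # silence channel i
               for j, ch in enumerate(data)]
    drops.append(baseline - toy_decoder_accuracy(ablated))

print(drops.index(max(drops)))  # channel whose loss hurts most; prints 2
```

In the study, the channels whose removal raised the error rate most were then mapped back onto the cortical anatomy.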
The accuracy of the speech transcripts reached about 97 percent. When the acoustic features of speech were not used during decoding, the number of errors increased severalfold, but the error rate remained acceptable (below 25 percent). This means the algorithm could be trained even if a person could not say the sentences aloud. Training on a different set of sentences, or on data from another person, significantly reduced the number of errors (p < 0.001), which suggests that the neural network does not merely memorize a limited set of sentences but also extracts general patterns of speech and language. The error rate of the model that decoded sounds rather than words was above 33 percent, meaning that the approach of this work is more effective.
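Error percentages like those above are typically computed as word error rate: the edit distance between the decoded word sequence and the reference sentence, divided by the reference length. The sentences below are invented examples, not from the paper's datasets.

```python
# Word error rate via dynamic-programming edit distance over words.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

# One substitution ("the" -> "a") out of six reference words.
print(word_error_rate("the cat sat on the mat",
                      "the cat sat on a mat"))  # → 1/6 ≈ 0.167
```

On this scale, the paper's best result of roughly 3 percent errors corresponds to a 97 percent accurate transcript.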
The greatest contribution to the algorithm's performance came from the ventral sensorimotor cortex and the region of the superior temporal sulcus, areas associated with the production and perception of speech.
In the future the model could be trained on a larger number of speech samples, so that it can recognize speech not restricted to a set of 50 sentences, including in patients who are unable to speak or move.
The scientists first presented such a model last year: that work used a larger set of words, but the recognition error rate was between 7 and 11 percent. Interestingly, that algorithm also worked when participants pronounced the words silently, although recognition accuracy in that case dropped to 30–60 percent.