Apple engineers teach a Raspberry Pi to recognize household events by sound

American engineers have developed an algorithm for smart speakers that lets them learn to recognize events in the owner's house by sound while hardly bothering the owner. The algorithm analyzes sounds, remembers the location of their sources, and groups them into clusters; once a cluster has accumulated enough similar sounds, the assistant asks the user what the event is. The work, by engineers from Apple and Carnegie Mellon University, will be presented at the CHI 2020 conference, and an article about it is published on the university's website.

Many large IT companies, such as Apple, Google, and Amazon, are building smart home systems around their voice assistants. In the ideal scenario, a person should be able to control any device in the house with voice commands and get feedback from it.

Many device classes already include models that support direct integration with a smart home over a wireless link, but such models are in the minority. For some device types there are compromise solutions, such as a smart outlet that switches power on command, or an infrared remote that controls a TV or any other device supporting that channel. And some things in the house cannot be connected to a smart home system at all, because they contain no electrical components.

Even so, no smart home system today can establish a complete link between the voice assistant and every object or event in the house. Engineers led by Gierad Laput at Apple and Carnegie Mellon University have developed a relatively simple method that lets a smart speaker teach itself to recognize events in the house almost without the user's help.

The proposed method assumes that the speaker usually stays in the same place in the same room, and that the surrounding furnishings, which also rarely move and tend to change only every few years, produce similar sounds throughout that time. The voice assistant therefore has two fairly reliable and nearly invariant parameters for recognition: the acoustic signature of an event and its location, which can be computed using an array of several microphones.
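The article does not spell out how the localization works, but a standard way to estimate direction from a microphone array is to measure the time difference of arrival between microphone pairs. The sketch below illustrates this with the well-known GCC-PHAT method; all distances, rates, and names here are illustrative assumptions, not details from the paper.

```python
# Minimal sketch, assuming a GCC-PHAT time-difference-of-arrival estimator;
# all distances, sample rates, and names are illustrative.
import numpy as np

MIC_DIST = 0.10   # distance between two microphones of the array, meters
SPEED = 343.0     # speed of sound, m/s
FS = 16000        # sample rate, Hz

def gcc_phat(sig, ref, fs, max_tau):
    """Estimate the delay (seconds) of sig relative to ref via GCC-PHAT."""
    n = sig.size + ref.size
    spec = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cc = np.fft.irfft(spec / (np.abs(spec) + 1e-12), n=n)
    max_shift = min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(fs)

# Synthetic check: the same noise burst arrives 2 samples later at mic 1.
rng = np.random.default_rng(0)
mic0 = rng.standard_normal(FS)
mic1 = np.roll(mic0, 2)
tau = gcc_phat(mic1, mic0, FS, max_tau=MIC_DIST / SPEED)
# Angle of arrival relative to the axis of the microphone pair.
angle = np.degrees(np.arcsin(np.clip(tau * SPEED / MIC_DIST, -1.0, 1.0)))
print(f"delay: {tau * 1e6:.0f} us, angle: {angle:.1f} degrees")
```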

The engineers implemented a prototype smart speaker from a Raspberry Pi microcomputer, a wired array of four microphones, and a powerful computer connected over Wi-Fi for data processing. The device listens constantly, but only audio fragments that pass a minimum-volume threshold and background-noise filtering receive full processing. From each selected fragment a mel spectrogram is built and fed to a convolutional neural network trained on the YouTube-8M dataset, supplemented with data from a professional sound-effects dataset. The activations of the network's penultimate layer are collected to obtain a lower-dimensional representation of the sound.
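As a rough illustration of this front end, the sketch below builds a log-mel spectrogram and reads the penultimate-layer activations of a small convolutional network as the embedding. It assumes PyTorch and torchaudio; the toy untrained network is a stand-in for the paper's actual YouTube-8M-trained model, and all layer sizes are made up.

```python
# Minimal sketch, assuming PyTorch/torchaudio: log-mel spectrogram -> CNN,
# with the penultimate layer's activations used as the sound embedding.
# The untrained toy network below stands in for the paper's model.
import torch
import torch.nn as nn
import torchaudio

FS = 16000

to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=FS, n_fft=1024, hop_length=256, n_mels=64
)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 128), nn.ReLU(),  # penultimate layer -> 128-dim embedding
    nn.Linear(128, 10),             # classification head (size arbitrary)
)

def embed(waveform: torch.Tensor) -> torch.Tensor:
    """waveform: (1, num_samples) mono audio at 16 kHz."""
    logmel = torch.log(to_mel(waveform) + 1e-6).unsqueeze(0)  # (1,1,64,T)
    backbone = nn.Sequential(*list(cnn.children())[:-1])      # drop the head
    with torch.no_grad():
        return backbone(logmel).squeeze(0)                    # (128,)

print(embed(torch.randn(1, FS)).shape)  # one second of dummy audio -> [128]
```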

These representations land in a common data space and are processed with hierarchical agglomerative clustering: the algorithm gradually forms clusters of similar audio fragments. At the clustering stage, in addition to the sound data, the algorithm also takes into account the direction to the sound source, and when classifying a new audio fragment the direction is checked first.
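A minimal sketch of what this stage might look like, assuming scikit-learn's agglomerative clustering and a simple way of appending the direction estimate to each embedding; the direction weight and distance threshold below are illustrative values, not numbers from the paper.

```python
# Minimal sketch of the clustering stage, assuming scikit-learn; the
# direction weight and distance threshold are illustrative assumptions.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_events(embeddings, angles_deg, direction_weight=0.5,
                   threshold=5.0):
    """embeddings: (n, d) sound embeddings; angles_deg: (n,) directions."""
    # Encode the angle as a point on the unit circle so that 359 and 1
    # degrees land close together, then scale its influence on distances.
    rad = np.radians(angles_deg)
    direction = direction_weight * np.stack([np.cos(rad), np.sin(rad)],
                                            axis=1)
    features = np.hstack([embeddings, direction])
    # With n_clusters=None the tree is cut at a distance threshold, so the
    # number of event types is discovered from the data itself.
    model = AgglomerativeClustering(n_clusters=None,
                                    distance_threshold=threshold,
                                    linkage="average")
    return model.fit_predict(features)

rng = np.random.default_rng(0)
labels = cluster_events(rng.standard_normal((20, 128)),
                        rng.uniform(0, 360, 20))
print(labels)
```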
