American developers have trained a neural network to replace the background in video more accurately than comparable methods. The distinguishing feature of the algorithm is that, in addition to the video, it requires one frame in which the person or other object has left the field of view, revealing the background. The paper will be presented at the CVPR 2020 conference.
As a rule, videos with background replacement are shot against a green screen (the screen may be a different color, but the name has historically stuck to the technology). Because the screen surface is bright and uniform, it is easy to separate from the foreground. However, this way of shooting and editing is not suitable for every video.
If you need to shoot outdoors or indoors without a green screen but still need to replace the background, for example during a video call, you can use background separation algorithms. They identify the area of the frame containing the person by shape, color, contrast or other features, and place the chosen background around it. For home use the quality of such algorithms is sufficient, but they still produce noticeable visual artifacts, especially with certain kinds of objects. For example, they cope much worse with separating hair from the background.
Researchers from the University of Washington led by Ira Kemelmacher-Shlizerman have proposed a new neural network architecture and training method that allow near-perfect background replacement. The algorithm takes a video of a person or another object shot with a nominally fixed camera (it can also handle handheld footage), as well as a photo taken from the same angle before the person enters the frame or after they leave it.
In addition to the current video frame, the algorithm receives as input a photo of the background without the person, a mask in which the person is separated from the background by a segmentation algorithm, and adjacent frames. The mask with the person marked is needed to point the neural network at the main subject in the frame, which must be preserved rather than cut away. The neighboring frames help improve the accuracy of the cut-out because people keep moving slightly even when they are standing still.
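One way to picture the inputs listed above is as a single stack of image channels. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' code: the function name `build_network_input`, the channel order, and the use of grayscale neighboring frames are all hypothetical choices made for the example.

```python
import numpy as np

def build_network_input(frame, background, soft_mask, neighbors):
    """Stack the per-frame inputs along the channel axis (illustrative only).

    frame, background: (H, W, 3) float arrays with values in [0, 1]
    soft_mask:         (H, W) float array from some segmentation model
    neighbors:         list of (H, W) grayscale adjacent frames (assumed format)
    """
    h, w, _ = frame.shape
    if background.shape != frame.shape:
        raise ValueError("background must have the same shape as the frame")
    if soft_mask.shape != (h, w):
        raise ValueError("soft_mask must have shape (H, W)")

    # Single-channel inputs get a trailing axis so they can be concatenated.
    channels = [frame, background, soft_mask[..., None]]
    for neighbor in neighbors:
        if neighbor.shape != (h, w):
            raise ValueError("each neighboring frame must have shape (H, W)")
        channels.append(neighbor[..., None])

    return np.concatenate(channels, axis=-1).astype(np.float32)
```

With a 4x6 frame, a background photo, a mask, and two neighboring frames, the result has shape `(4, 6, 9)`: three channels for the frame, three for the background, one for the mask, and one per neighboring frame.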
As a result, the neural network takes the different data channels into account and separates the person from the background effectively. The network outputs two images: a color image of the cut-out person and a mask for the alpha channel. Compositing these two with the background frame yields a frame almost identical to the original.
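The compositing step mentioned above follows the standard alpha blending formula, in which each output pixel is `alpha * foreground + (1 - alpha) * background`. Here is a minimal NumPy sketch of it; the function name `composite` and the array layout are assumptions for illustration, and this is not the authors' implementation.

```python
import numpy as np

def composite(foreground, alpha, background):
    """Blend a foreground over a background using an alpha mask.

    foreground, background: (H, W, 3) float arrays with values in [0, 1]
    alpha:                  (H, W) float array, 1 = subject, 0 = background
    """
    a = alpha[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over RGB
    return a * foreground + (1.0 - a) * background
```

Passing the captured background photo as `background` should approximately reconstruct the original frame, while passing any other image of the same size replaces the background.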
For training, the developers chose a two-stage scheme. First, they trained the neural network on the Adobe Matting dataset, which consists of several hundred pairs of a cut-out object and its corresponding alpha mask. The researchers composited these pairs with background images from the MS-COCO dataset.
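As a rough illustration of how such synthetic training samples can be assembled, the sketch below pastes one cut-out object onto a randomly chosen background. It is a simplified assumption of the process rather than the authors' pipeline: the function name `make_training_sample`, the dictionary keys, and the in-memory lists of arrays are all hypothetical, and any augmentation the real training may use is left out.

```python
import numpy as np

def make_training_sample(foreground, alpha, backgrounds, rng):
    """Composite one (foreground, alpha) pair onto a random background.

    foreground:  (H, W, 3) float array with values in [0, 1]
    alpha:       (H, W) float array, the ground-truth mask for the object
    backgrounds: list of (H, W, 3) float arrays (stand-ins for MS-COCO images)
    rng:         a numpy.random.Generator, so the pairing is reproducible
    """
    background = backgrounds[rng.integers(len(backgrounds))]
    a = alpha[..., None]  # broadcast the mask over the RGB channels
    image = a * foreground + (1.0 - a) * background
    # The network sees `image` and `background`; `foreground` and `alpha`
    # serve as the known targets for supervised training.
    return {
        "image": image,
        "background": background,
        "foreground": foreground,
        "alpha": alpha,
    }
```

Because the foreground and alpha mask are known exactly for every synthetic image, each sample comes with ground truth that real footage lacks.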