Researchers from OpenAI have introduced GPT-3, an algorithm that can perform a variety of text-writing tasks from just a few examples. The new version uses the same architecture as its predecessor, GPT-2, but the developers increased the number of model parameters to 175 billion and trained the model on 570 gigabytes of text. As a result, GPT-3 can answer questions about a text it has read, write poetry, solve anagrams and simple math problems, and even translate, and it needs only a few examples (10 to 100) of how to do each task. The researchers posted a detailed description of the algorithm on arXiv.org.
An important limitation of current NLP (natural language processing) algorithms is their dependence on context: many algorithms can perform only the tasks they were trained for. For example, an algorithm that writes poetry must be trained on a large corpus of poems, preferably in the style its output should have. If training succeeds, the algorithm will produce something resembling verse, but it will not be able to answer a question or compile a list of words for a crossword puzzle.
How much data an NLP algorithm needs to learn a specific task depends on its pretraining: if the system already knows the rules of grammar and can generate meaningful sentences from the outset, it needs little task-specific training data. The challenge, therefore, is to make a pretrained NLP algorithm universal enough that it can learn new tasks from a minimal amount of training data.
To solve this problem, a team of OpenAI researchers led by Tom Brown introduced GPT-3. This NLP algorithm builds on its previous version, GPT-2, presented in February of last year. GPT-2, one of the most widely used and advanced NLP models, was trained on 40 gigabytes of text, and its objective is to predict the next word in a text. Like its predecessor GPT, GPT-2 is based on the Transformer architecture.
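GPT-2's training objective, predicting the next word, can be illustrated without a Transformer. The sketch below swaps in a deliberately tiny count-based bigram model, a hypothetical stand-in rather than OpenAI's code: it learns which word most often follows each word in a toy corpus, then generates text one predicted word at a time. The corpus and all function names are illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical toy stand-in for a language model: NOT a Transformer and NOT
# OpenAI's code. It only illustrates the "predict the next word" objective.

def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return dict(follows)

def predict_next(model: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent word seen after `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def generate(model: dict[str, Counter], start: str, length: int) -> list[str]:
    """Extend `start` greedily, one predicted word at a time."""
    out = [start.lower()]
    for _ in range(length):
        nxt = predict_next(model, out[-1])
        if nxt is None:  # no known continuation: stop early
            break
        out.append(nxt)
    return out

model = train_bigram("the cat sat on the mat the cat ate the fish")
print(predict_next(model, "the"))  # cat  (seen twice after "the")
print(generate(model, "on", 2))    # ['on', 'the', 'cat']
```

A real model like GPT-2 replaces the frequency table with a neural network that conditions on the whole preceding text, but the training signal is the same: guess the next word.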
To train the algorithm, the researchers collected a 570-gigabyte text dataset that included data from the Common Crawl project, Wikipedia, two book datasets and the second version of the WebText dataset, which contains text from web pages (the first version of WebText was used to train GPT-2). The researchers trained eight different GPT-3 models that differ in the number of parameters learned during training (all of them used the same architecture, so the number of parameters depended on the number of layers). The smallest model had 125 million parameters, and the final GPT-3 had 175 billion.
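How the parameter count grows with model size can be estimated with a common back-of-the-envelope formula for decoder-only Transformers: about 12 × n_layers × d_model² weights in the attention and feed-forward blocks, plus token and position embeddings. The sketch assumes a feed-forward width of 4 × d_model and ignores biases and layer-norm weights. The layer counts and widths (12 layers of width 768 for the smallest model, 96 layers of width 12,288 for the largest), the 50,257-token vocabulary and the 2,048-token context are taken from the GPT-3 paper, not from this article.

```python
# Rough parameter estimate for a decoder-only Transformer, assuming a
# feed-forward width of 4 * d_model and ignoring biases and layer norms.
VOCAB_SIZE = 50_257   # GPT-2/GPT-3 BPE vocabulary (from the GPT-3 paper)
CONTEXT_LEN = 2_048   # GPT-3 context window (from the GPT-3 paper)

def approx_params(n_layers: int, d_model: int) -> int:
    """Approximate number of trainable weights."""
    # Per layer: 4*d^2 for attention (Q, K, V, output) + 8*d^2 for the MLP.
    blocks = 12 * n_layers * d_model ** 2
    token_embeddings = VOCAB_SIZE * d_model
    position_embeddings = CONTEXT_LEN * d_model
    return blocks + token_embeddings + position_embeddings

# Smallest and largest GPT-3 configurations as listed in the paper.
print(f"{approx_params(12, 768):,}")     # 125,104,896
print(f"{approx_params(96, 12_288):,}")  # 174,588,899,328
```

The estimates, roughly 125.1 million and 174.6 billion, land close to the 125 million and 175 billion quoted above; almost all of the largest model's weights sit in its 96 Transformer layers.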
GPT-3's job was to answer a question or complete a task, for example, “write a poem”, “unscramble an anagram” or “read the text and answer the question.” To complete each task (there were 42 in total), the pretrained GPT-3 was given, in addition to the task description, either a single example or several examples (typically 10 to 100, as many as the model needed, although for some tasks five examples were enough).
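The difference between the one-example and several-example settings comes down to how the input text is assembled. In the sketch below, a hypothetical `build_prompt` helper places the task description, k solved examples and the new query into one prompt; the model would then be asked to continue the text after the final `A:`. The Q/A layout and the anagram examples are assumptions for illustration, since the paper's prompt formats vary by task. The examples are only part of the input: the model's weights are not updated.

```python
from __future__ import annotations

def build_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Join a task description, k solved examples and a new query into one
    text prompt (hypothetical format, for illustration only)."""
    lines = [task, ""]
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the model is expected to continue from here
    return "\n".join(lines)

def shot_setting(examples: list[tuple[str, str]]) -> str:
    """Name the setting by the number of examples k in the prompt."""
    k = len(examples)
    return "zero-shot" if k == 0 else "one-shot" if k == 1 else "few-shot"

examples = [("tca", "cat"), ("odg", "dog")]
prompt = build_prompt("Unscramble the letters into an English word.", examples, "rdib")
print(shot_setting(examples))  # few-shot
print(prompt)
```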
Although accuracy increased with the number of model parameters under every training method, learning from several examples was the most effective: averaged over all 42 tasks, the 175-billion-parameter model reached almost 60 percent accuracy. For example, given 64 examples from the TriviaQA dataset, which is designed to train models to read text and answer questions about it, the 175-billion-parameter GPT-3 answered correctly in 71.2 percent of cases. This is slightly better than the state-of-the-art (SOTA) model trained solely to answer TriviaQA questions.
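An accuracy figure such as 71.2 percent is the share of questions answered correctly. The sketch below shows a simplified version of the normalized exact-match scoring commonly used for question-answering benchmarks: an answer counts as correct if, after lowercasing and stripping punctuation and articles, it matches any accepted answer. It is an illustrative assumption, not the official TriviaQA evaluation script, and the sample answers are made up.

```python
from __future__ import annotations

import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match_accuracy(predictions: list[str], gold: list[list[str]]) -> float:
    """Share of predictions matching any accepted answer after normalization."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold answers must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        normalize(pred) in {normalize(ans) for ans in answers}
        for pred, answers in zip(predictions, gold)
    )
    return hits / len(predictions)

# Made-up sample: two of three answers match an accepted form.
preds = ["The Beatles", "paris", "1066"]
gold = [["Beatles"], ["Paris, France", "Paris"], ["1067"]]
acc = exact_match_accuracy(preds, gold)
print(f"{acc:.1%}")  # 66.7%
```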
Given a few examples, the pretrained GPT-3 can write texts on a given topic, compose poems in a particular style, solve anagrams and simple arithmetic problems, and answer questions about a text. The model can also translate between several languages: the scientists did not restrict the language of the texts when collecting the data, so seven percent of the dataset consists of texts in languages other than English, which the model also draws on to translate from a few examples.
As with GPT-2, the researchers expressed concern in the paper that the model could be used for harmful purposes, so they have not released it yet. The developers' GitHub page contains part of the dataset and examples of the tasks used in the work.
In recent years, OpenAI has succeeded not only with NLP algorithms: last year, the company's developers presented algorithms that can compose new music and solve a Rubik's cube.