New metrics will help find meaning in “hahaha” and “yaaay”

American linguists have developed metrics for the quantitative analysis of stretched words used in communication on social networks, such as “hahaha”, “goooaaaallll” or “yaaay”. The scientists describe the usage statistics of such words quantitatively with two independent parameters: the elasticity and the balance of a word. This approach can be used to analyze language across different applications and the effects of those services' limitations, the scientists write in PLoS ONE.

To give a statement emotional coloring in speech, one can raise or lower one's voice or add the appropriate intonation. In written literary language, punctuation and verbal descriptions of emotions come to the rescue, but communication on the Internet follows slightly different rules. Besides punctuation (which does not always work the same way as in other forms of communication), people use emoji or stickers to express emotions, but purely verbal communication has its own techniques too: for example, one can write in capital letters only, or stretch a word by repeating one or more of its letters.

On social networks, stretched forms of words are no longer a rarity: both vowels and consonants can be repeated, giving the message different emotional overtones, since the repetition of letters can convey joy, anger, irony or compassion. Linguists have therefore been trying to find the relationship between the length of stretched words and their emotional overtones, and to understand how quickly the connection is lost with the original imitation of drawn-out syllables in spoken language. A limitation of all these studies is that the standard tools of mathematical linguistics (in particular, natural language processing methods) are very difficult to transfer to the language of Internet communication, and there are virtually no dedicated universal metrics for analyzing it.

American linguists from the University of Vermont, led by Peter Sheridan Dodds, have proposed such metrics for the analysis of stretched words. To do this, the scientists analyzed a random sample of English-language tweets from 2008 to 2016. In total, they examined about 100 billion English-language tweets in which words were used in stretched form.

For each stretched word, the linguists identified the kernel, that is, the initial form in which no characters are repeated. According to the authors, this kernel can be stretched in several ways: every character can be repeated (i.e. “goal” becomes “ggggoooaaaaaalllll”), only some of the letters of the word can be repeated (e.g. the vowels: “goal” → “goooooaaaaal”), elements consisting of several characters can be repeated (“ha” → “hahahhahaa”), or several of these principles can be combined in a mixed type.
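
The idea of a kernel and its stretched forms can be illustrated with a minimal Python sketch. This is not the procedure the authors used; the function names `collapse_runs`, `stretch_regex` and `is_stretched_form`, and the `repeat_whole` flag, are hypothetical and introduced here only for illustration. The sketch assumes that a stretched form keeps the kernel's characters in order and repeats each one at least once, optionally repeating the whole multi-character element.

```python
import re
from itertools import groupby


def collapse_runs(word: str) -> str:
    """Collapse every run of identical consecutive characters to one character.

    Caveat: this naive step also shortens genuine double letters
    ("good" becomes "god"), so it only approximates a kernel.
    """
    return "".join(char for char, _run in groupby(word))


def stretch_regex(kernel: str, repeat_whole: bool = False) -> re.Pattern:
    """Build a regex for stretched forms of `kernel` (illustrative assumption).

    Each kernel character may be repeated one or more times. With
    repeat_whole=True the whole element may itself repeat, which covers
    cases such as "ha" -> "hahahhahaa".
    """
    body = "".join(re.escape(char) + "+" for char in kernel)
    if repeat_whole:
        body = f"(?:{body})+"
    return re.compile(body)


def is_stretched_form(word: str, kernel: str, repeat_whole: bool = False) -> bool:
    """Return True if `word` matches the stretch pattern built from `kernel`."""
    return stretch_regex(kernel, repeat_whole).fullmatch(word) is not None


if __name__ == "__main__":
    print(collapse_runs("ggggoooaaaaaalllll"))
    print(is_stretched_form("goooooaaaaal", "goal"))
    print(is_stretched_form("hahahhahaa", "ha", repeat_whole=True))
```

Collapsing runs recovers “goal” from both of the article's examples, while the regular expression checks the opposite direction: whether a given word could have been produced by stretching a known kernel. Telling real double letters apart from stretching would need extra information, such as a dictionary, which this sketch deliberately leaves out.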

