In 2011, this author lived in the Middle East and learned that his Egyptian friends often texted, emailed, and Facebook chatted using Latin characters. They wrote in Arabic, but they didn’t use Arabic letters. Instead they transliterated Arabic words into the same alphabet that English uses. Many shops already wrote their signs in English. Did this portend a decline in the use of the Arabic script?
New research suggests that the digital future of Arabic is secure. But thousands of other languages may never make the leap into the digital age. A full 96% of the world’s 6,000+ languages appear to be dead when it comes to use on cell phones, laptops, and tablets, meaning that the Internet could be to languages what a certain comet was to the dinosaurs.
It is common to use the term “evolution” to describe the changes to everything from football teams to presidencies. But when academics describe the evolution of languages, they literally mean that languages parent distinct offshoots, compete for usage, and die out like biological organisms.
For this reason, UNESCO maintains an atlas of endangered and extinct languages that uses a classification system similar to that behind the Endangered Species List. Researchers generally measure a language’s vulnerability to extinction with metrics such as the number of native speakers. In a new research paper, “Digital Language Death,” mathematical linguist András Kornai asks which languages are endangered online and on electronic devices.
Making the Jump Online
Three main warning signs alert concerned researchers about the danger to a language:
First, there is loss of function, seen whenever other languages take over entire functional areas such as commerce. Next, there is loss of prestige, especially clearly reflected in the attitudes of the younger generation. Finally, there is loss of competence, manifested by the emergence of ‘semi-speakers’ who still understand the older generation, but adopt a drastically simplified (reanalyzed) version of the grammar.
Kornai notes that the same foreshadowing applies to language use on digital devices. But whereas researchers are used to watching for languages in decline, in the digital case, the question is whether languages can undergo the opposite process and establish themselves as viable options for digital use. Is it possible to fully communicate online in that language? Is it seen as a digital language? Can one become a digital native within that language?
People around the world speak over 6,000 languages, so the challenge facing Kornai was how to measure whether each language was making the leap into the digital age. The measures he used to gauge a language’s online presence and vitality include crawling public online text to estimate the amount of online material in each language, counting the number of Wikipedia entries in each language, and checking the level of software support for each language - from Apple support to spell checkers to its presence in the Unicode standard and other databases that allow electronic devices to actually recognize a language.
By selecting several languages as representative of five different classes - thriving, vital, borderline, heritage (the language’s online presence is purely the work of linguists archiving the language), and still (not present online and on digital devices) - and using some nifty machine learning that interested readers can learn about in the paper, Kornai estimated how many languages had made the leap to the digital age.
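For readers who want a feel for how a handful of representative languages can be used to sort the rest into the five classes, here is a minimal sketch in Python. It is not Kornai's actual model: it swaps in a simple nearest-centroid rule, the three features are a simplification of the measures described above, and every number in `SEEDS` is invented purely for illustration.

```python
import math

# Hypothetical seed examples per class, as
# (wikipedia_articles, crawled_words, has_spellchecker).
# All values are made up for illustration; none come from the paper.
SEEDS = {
    "thriving":   [(4_000_000, 5_000_000_000, 1), (1_500_000, 2_000_000_000, 1)],
    "vital":      [(200_000, 100_000_000, 1), (80_000, 40_000_000, 1)],
    "borderline": [(5_000, 1_000_000, 0), (2_000, 500_000, 1)],
    "heritage":   [(50, 20_000, 0), (10, 5_000, 0)],
    "still":      [(0, 0, 0), (0, 100, 0)],
}


def features(wiki_articles, crawled_words, has_spellchecker):
    """Put the counts on a log scale so huge languages do not swamp small ones."""
    return (
        math.log10(1 + wiki_articles),
        math.log10(1 + crawled_words),
        float(has_spellchecker),
    )


def centroid(points):
    """Average a list of equal-length feature tuples, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


# One average feature vector per class, built from the seed examples.
CENTROIDS = {
    label: centroid([features(*example) for example in examples])
    for label, examples in SEEDS.items()
}


def classify(wiki_articles, crawled_words, has_spellchecker):
    """Return the class whose centroid is closest to this language's features."""
    x = features(wiki_articles, crawled_words, has_spellchecker)
    return min(CENTROIDS, key=lambda label: math.dist(x, CENTROIDS[label]))
```

Calling `classify(0, 0, 0)` on a language with no Wikipedia pages, no crawled text, and no spell checker puts it in the "still" class. Counts alone cannot tell whether a small online presence is the work of linguists or of everyday speakers, so the "heritage" class in particular is only loosely approximated here.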
A language’s Wikipedia presence was one of the most important indicators of its ability to leap into the digital age. The graph shows the ratio of the number of speakers of a language to the size of its pages on Wikipedia (on a logarithmic scale). Source: PLOS ONE.
Linguists note the extinction of languages with some pessimism: around 2,500 out of 7,000 languages spoken today are endangered. But the picture Kornai emerges with is much more alarming. Only about 170 languages, or 2%, are vital or thriving online. Another 140 (1.7%) are borderline cases. The remaining 96% (over 6,000) are still or “digitally dead.” And in Kornai’s opinion, given the prerequisites of a strong publishing infrastructure that includes technical tools to make it possible to use a language on digital devices and broad use by the younger generation, those 6,000+ languages have no hope of making the leap into the digital future.
The domination of online life by a small subset of the world’s languages can be seen as inevitable, as unifying, or as a barrier that keeps out nonspeakers. But it does entail a loss. Kornai quotes UNESCO:
Each language reflects a unique worldview and culture complex, mirroring the manner in which a speech community has resolved its problems in dealing with the world, and has formulated its thinking, its system of philosophy and understanding of the world around it. In this, each language is the means of expression of the intangible cultural heritage of people, and it remains a reflection of this culture for some time even after the culture which underlies it decays and crumbles, often under the impact of an intrusive, powerful, usually metropolitan, different culture. However, with the death and disappearance of such a language, an irreplaceable unit in our knowledge and understanding of human thought and world-view is lost forever.