Arabic Wikipedia: Why it lags behind

Introduction

Wikipedia is among the most visited websites on the internet: it ranks fifth in traffic and receives over 500 million unique visitors monthly, a huge achievement considering the website is completely ad-free and run by a non-profit charitable organisation.[1] Another attractive aspect of Wikipedia is that it has always been conceived as a multilingual project; the encyclopaedia currently has over 200 language versions, which vary in size and quality.

The Arabic version of Wikipedia is currently celebrating its tenth birthday (the first edit was made on the 9th of July 2003), and although it has come a long way in that time, it still lags behind other versions, with fewer than 250,000 articles. This may seem like a large number, but given the number of Arabic speakers worldwide, the Arabic Wikipedia is still relatively small. The table below compares the number of articles in selected Wikipedia language versions with the number of speakers of each language.

Language version | No. of articles | No. of speakers | Articles per 1,000 speakers
Persian | 310,924 | 110,000,000 | 2.8
Arabic | 235,385 | 255,000,000 | 0.9
Hebrew | 148,018 | 5,055,000 | 29.3
Finnish | 326,930 | 5,009,390 | 65.3

Source
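The last column of the table is simply the number of articles divided by the number of speakers, multiplied by 1,000. As a quick check, the short Python sketch below recomputes it from the figures in the table; the dictionary and function names are illustrative only.

```python
# Articles per 1,000 speakers, recomputed from the article and speaker counts in the table.
languages = {
    "Persian": (310_924, 110_000_000),
    "Arabic": (235_385, 255_000_000),
    "Hebrew": (148_018, 5_055_000),
    "Finnish": (326_930, 5_009_390),
}

def articles_per_thousand(articles: int, speakers: int) -> float:
    """Return the number of articles per 1,000 speakers."""
    return articles / speakers * 1000

for name, (articles, speakers) in languages.items():
    # Round to one decimal place, as in the table.
    print(f"{name}: {articles_per_thousand(articles, speakers):.1f}")
# Persian: 2.8
# Arabic: 0.9
# Hebrew: 29.3
# Finnish: 65.3
```

By this measure, the Arabic Wikipedia has roughly one article for every thousand speakers, against more than sixty for Finnish.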

Of course, this is a rudimentary analysis: it considers only the number of articles, not their quality or length. Having said that, the Arabic Wikipedia also lags behind in other areas, such as the total number of edits and the number of active users.

Fig 1: This map shows the large variation in the number of edits (to all language versions of Wikipedia) across the MENA region, with height representing the number of edits. Why does the Arab region lag behind Israel and Iran? (Source)

So why does the Arabic Wikipedia still lag behind? Below is a list of reasons which, to varying degrees, may provide an answer.

Technical Reasons

Part of the explanation is straightforward. The Arab world still suffers from high illiteracy rates and low internet penetration, and both literacy and internet access are, of course, needed to edit Wikipedia. Internet penetration rates can explain a large part of the variation in the number of Wikipedia edits from any given country, and broadband in particular seems to have a very strong positive correlation with Wikipedia editing. Again, broadband availability is not high in most Arab countries.

Another basic reason that has been cited is the difficulty of communicating in Arabic online, partly because not all devices come with Arabic keyboards. This has led to the development of the Arabic chat alphabet, a way of writing Arabic in the Latin script together with Arabic numerals (the numerals stand in for letters that have no equivalent in the basic Latin alphabet). The lack of Arabic keyboards has meant that many young Arabs have grown more familiar with the Latin keyboard layout and find it hard to type on an Arabic keyboard.
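To illustrate how the chat alphabet uses numerals, here is a small Python sketch covering a handful of common substitutions. The mapping is deliberately partial and illustrative (conventions vary between regions and writers), and the function name is hypothetical. It does not transliterate whole words; it only shows which digits stand for which Arabic letters.

```python
# A few widely used Arabic chat alphabet conventions: digits replace Arabic
# letters that have no close equivalent in the basic Latin alphabet.
# Partial and illustrative only; usage varies by region and writer.
DIGIT_TO_LETTER = {
    "2": "ء",  # hamza (glottal stop)
    "3": "ع",  # ʿayn
    "5": "خ",  # khāʾ
    "7": "ح",  # ḥāʾ
}

def numeral_letters(word: str) -> list[tuple[str, str]]:
    """Pair each digit in a romanised word with the Arabic letter it represents."""
    return [(ch, DIGIT_TO_LETTER[ch]) for ch in word if ch in DIGIT_TO_LETTER]

# "mar7aba" is مرحبا ("hello"); "3arabi" is عربي ("Arabic").
for word in ["mar7aba", "3arabi"]:
    print(word, numeral_letters(word))
```

Here the 7 in "mar7aba" is identified as ح and the 3 in "3arabi" as ع, the two letters a Latin keyboard cannot otherwise express.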

Diglossia and Language issues

The Arab world is unusual in that at least two varieties of Arabic coexist in any given region: classical Arabic (often called fuṣḥá), which is the written form, and colloquial Arabic, which is the spoken form and differs from region to region. This phenomenon, known in linguistics as diglossia, is one barrier to writing. As one Wikipedia editor put it: “Users are not used to talking in Arabic as it’s written down, so they feel that they cannot contribute.”[2]

This point is closely linked to the education system in the Arab world. Although classical Arabic is the primary language of instruction in primary schools in most of the Arab world, higher education is increasingly delivered in foreign languages, specifically French and English. For example, in Tunisia, high school is taught almost entirely in French, and at university level throughout the Arab world, English is the language of instruction for many subjects; the sciences in particular are taught almost exclusively in English. The only exception is perhaps Syria, which still pursues a strong Arabisation policy.

The language of instruction in higher education is particularly important, as Wikipedia editors tend to be relatively young, with a median age of 18.[3] This creates a problem for the Arabic Wikipedia: increasingly, young Arabs are not studying in their mother tongue, which in turn makes it harder for them to contribute in Arabic.

This leads us to another reason that helps explain the lack of Arabic contributors: the prevalence of the two main colonial-era languages in the Arab world, namely French and English.

Fig 2: This chart shows which language version of Wikipedia is visited most in each Arab country, with blue representing Arabic, red representing English and green representing French. *Data appear to be unreliable (Source).

The figure shows that the Arabic Wikipedia is, overall, a minority choice for readers in Arab countries, with some exceptions such as Yemen and Iraq. French is also very prevalent in former French colonies.

National Identity

Figure 1 shows how the number of edits varies significantly between Middle Eastern countries, with Israel standing out, followed by Iran. So why do Arab countries lag behind their neighbours? Part of the explanation is national identity, as Wikipedia is in some sense a barometer of the strength of national identity.

For example, the Catalan Wikipedia has a very active community which has produced more than 400,000 articles, while the Spanish Wikipedia reached the one million article milestone earlier this year. The key difference is that 420 million people in the world speak Spanish, whereas there are fewer than 10 million Catalan speakers. This strong presence reflects a strong Catalan identity that seeks to promote its language and, specifically, to compete with Spanish.[4] The Arab world, by contrast, has already witnessed the rise and decline of nationalism, and does not feel that its culture is marginalised or under threat. Perhaps if the Arabic Wikipedia had existed in the days of Nasser, it would be in a different situation.

Another factor that has helped the Catalan Wikipedia community is that it is geographically compact, which makes organising “Wiki meet-ups” and other events easier, and this ultimately helps the development of the encyclopaedia. In the Arab region, such meet-ups are much harder to arrange. Again, this may help explain why Israel and Iran are better represented: their communities can meet in person frequently and develop the encyclopaedia in a coherent manner.

Country-specific problems

Another feature that Figure 1 shows is that most edits come from the Arab Mashriq (specifically Egypt and Saudi Arabia) rather than the Maghreb. This may be explained by the stronger colonial influence in the Maghreb, which has perhaps diminished the status of Arabic to a certain degree.

Having said that, countries in the Arab Mashriq have also underperformed, for a number of reasons. In Syria, the government frequently blocked access to the Arabic Wikipedia; Iraq has, of course, endured decades of war, and internet access there is poor. Another Arab country, Sudan, is also very poorly represented, as internet access there is notoriously difficult.[5]

Solutions

The poor state of the Arabic Wikipedia has led to some initiatives that have attempted to increase the number of articles on the website. The first significant initiative was started by Google in 2009, with the stated aim of increasing the quantity of articles in Arabic by mobilising a group of volunteers to translate articles from the English Wikipedia with the aid of Google Translate. Although hundreds of articles were translated, the Arabic Wikipedia community decided to halt the project after a short period, partly because the quality of the articles was poor, but also because the community viewed it as a cynical way for Google to gain a large corpus of human-translated text to feed into its translation service, which is ultimately how the service is improved.[6]

More recently, the Qatar Foundation started a similar project, also based on translating English articles into Arabic, but this was to be done by professional translation companies, which were to be paid for their services. The latter point was contentious for the Arabic Wikipedia community, as being paid to write articles goes against the ethos of Wikipedia, which has always been written by volunteers. Quality was also an issue: although the translations were better than those of the Google project, the content of the articles was not relevant to Arab readers and was poorly presented, as the translators were not familiar with wiki markup.

The key to improving the Arabic Wikipedia is simply to attract more dedicated editors to the website, as this is ultimately how other versions have flourished. Some recent initiatives have tried to tackle this issue, and the early results seem encouraging. The Wikimedia Foundation, the body responsible for running Wikipedia, created an Arabic Language Initiative and has run a number of programmes to increase the number of editors, focusing on university students. Such initiatives are much more likely to succeed, as only increased human effort can improve the encyclopaedia; machine translation cannot replace it.

Conclusion

The Arabic Wikipedia is currently growing faster than many other major language versions, yet it is simply playing catch-up. Its small size is part of a wider problem with Arabic content online, which makes up only 3% of online content.[7] As more and more languages are represented online, it is essential that all languages make a smooth transition to the internet. Wikipedia is not a traditional encyclopaedia; it is more of an “agglomerator” that collects relevant resources about a given subject in one place, and its development is essential for improving the quality and quantity of Arabic content on the internet.

[1] Wikimedia projects reach more than 500 million people per month http://blog.wikimedia.org/2013/04/19/wikimedia-projects-500-million/

[2] Spreading the wiki footprint http://news.bbc.co.uk/1/hi/technology/7519830.stm

[3] Wikipedia Survey – Overview of Results http://www.wikipediastudy.org/docs/Wikipedia_Overview_15March2010-FINAL.pdf

[4] Catalan Wikipedia Reaches 400,000 Article Milestone http://globalvoicesonline.org/2013/04/19/catalan-wikipedia-how-to-place-a-stateless-nation-on-the-global-network/

[5] East Africa was the last part of the world to get fibre-optic broadband internet.

[6] From San Francisco to Cairo and back again: Collaborating across cultures http://ethnographymatters.net/2012/06/07/from-san-francisco-to-cairo-and-back-again-collaborating-across-cultures/

[7] Google announces ‘Arabic Web Days’, a month-long initiative to boost Arabic Web content http://thenextweb.com/google/2012/11/20/google-launches-arabic-web-days-a-month-long-initiative-to-boost-arabic-web-content/

3 responses to “Arabic Wikipedia: Why it lags behind”

  1. Thank you for this very helpful article! I have long been wondering about the reasons for the current level of education and literature in the Arab countries. The three-(or more)-level use of local Arabic, high Arabic and English/French in different situations explains a lot. Historically, the Western education system was in a similar situation when the language of learning was Latin. In addition, the Arabic writing system is highly complicated; as I understand it, even pupils in higher classes put a lot of effort into perfecting it.

    As someone not burdened by deep knowledge of Arabic language or culture, I would think the most efficient way of changing this would be to use the local dialect for basic education, respecting it as a full language, and using a latinized script for it. During a short visit to Morocco some weeks ago, I was surprised that most of the wall graffiti was written in Latin letters, not Arabic ones. Call it a “grassroots letter revolution”.

    I think it would be worthwhile to start a crowdfunding campaign to fund writers for the small Moroccan Arabic Wikipedia Incubator project (or maybe the Tunisian version), avoiding the stigma of corporate or state financing. The resulting articles could also be published on paper or as an ebook / SD card, to be cheaply distributed to students without internet access. What do you think about such a project?

  2. I’m sorry to say I would not endorse such a project. There is a minority voice within the Arabic Wikipedia community that wants more content written in local dialects, but such an approach would create a wealth of problems.

    I am primarily against the move to write in local dialects because 99% of publications (books, websites, etc.) in the Arab World use standard classical Arabic. We do not yet have references in most of these dialects, and standardising spelling and grammar would be near impossible.

    I am aware not everyone is comfortable writing in classical Arabic, but splitting the community’s (limited) efforts into smaller projects will not solve the problem.

  3. Insightful article, though slightly off the mark. I think the following may be interesting to note:

    1. Standard Arabic versus accents (dialects, as you prefer to call them): this highlights a deeper issue of education in the Arabic-speaking world, rather than the standard/accents argument. I fully agree with your point and response in these respects.
    2. Technical reasons and literacy: your submission that illiteracy in the Arab world is high is not supported by UNESCO statistics. As to literacy, 2015 UNESCO figures show Yemen, Morocco, Egypt and Sudan as the only countries with a literacy rate below 80%; the worldwide median is 86%. As to internet proliferation, 2016 internet usage statistics show that 4.8% of all internet users are Arabic speakers, while Arabic speakers represent 5.2% of the world’s population.
    3. You entirely missed the point that the so-called “Arabic Wikipedia community” has wholly and completely monopolised the editing of pages into a handful of admins, and I do mean a very small handful. Edits are suspended by default; editors are few, which leads to both overloading and dogma and personality clashes, and that’s saying nothing of going against the spirit of Wikipedia. More significantly, subject matter experts are not able to construct pages and have them peer-reviewed by the wider population of readers. This latter point, together with the dogmatic nature of the editors and bureaucrats mentioned earlier, has led to some outright stupid administrative practices, as well as some really silly and poorly constructed pages, even when translated. Another shock is that requests for editor privileges are almost always denied outright, reflecting the monopolistic nature of the admin community. Further compounding this problem is the Wiki Foundation’s inability and unwillingness to intervene and marshal these points, instead choosing to stick its head in the sand and pretend that this massive blocker will miraculously unblock itself.

    As far as solutions are concerned, I genuinely think that the key to unravelling the problem with Arabic contributions lies in addressing the terrible state of the Wikipedia administration, which in turn starts, I think, with closely scrutinising current admins, weeding out offenders and opening the door to more nominations and a flow of healthy new blood. Another area is to openly encourage subject matter experts to contribute articles, and to establish policies that encourage them to become regular contributors through education in Wikimarkup and Wiki etiquette, rather than throwing the Wiki rule book at them, sanctioning them and rapidly removing their articles.

    Just my 2-pennies worth 🙂

All writers' views in articles are their own and do not necessarily represent the opinion of the Asfar team.

Published by Asfar in London, UK - ISSN 2055-7957 (Online)