Thoughts on Wikipedia and its Language Challenges

This was written soon after Wikimania 2010 for the New Indian Express.

(With inputs from Arun Ram and Wikimania 2010 attendee, Srinivas Gunta)

A common point of discussion about the global Internet is the inequitable distribution of content across languages, with a skew towards English and the languages of the Global North. Wikipedia is not immune to these inequalities, and this was a major point of discussion at the 2010 edition of Wikimania, which recently concluded in Gdansk, Poland. Wikimania is an annual gathering, organized by the Wikimedia Foundation, of Wikipedians (as those who contribute to Wikipedia are called), who meet to discuss the state of the various Wikipedia projects and to chart a course for the year ahead.

What stood out was the scale at which the Wikimedia Foundation is thinking. Its strategy plan aims to increase reach to 680 million unique visitors globally by 2015 (from the current 388 million). The aim is to achieve 12% annual growth in the Global South and 4% annual growth in the Global North; in other words, most of the growth will come from Wikipedias in the many languages of the Global South.

Jimmy Wales’ keynote address at Wikimania this year focused on countries of the Global South. He did video interviews with active Wikipedians from the Bangla and Tamil Wikipedias, underscoring the importance of the Foundation focusing on the smaller languages and varied geographies represented within Wikipedia projects. Among other things, the Foundation’s strategy plan aims to foster the growth of smaller Wikipedias: by 2015, the aim is to have 100 Wikipedia language versions with more than 120,000 “significant articles” each. To this end, the Foundation also aims to bootstrap community programs in key geographies: India, Brazil, and the Middle East/North Africa.

Two presentations highlighted the challenges and the possibilities ahead. Achal Prabhala, a Wikimedia Advisory Board Member, spoke about the need for local representative bodies of the Wikimedia projects, or Chapters, in countries that are linguistically underrepresented. Achal’s larger point is that there is a distinct relationship between local growth and the existence of local chapters, and that geographies in the South present enormous prospects for growth. They also present prospects for an increase in scope, which could mean, in turn, new ways for Wikimedia to grow the world over. On a cautionary note, Harel, from Wikipedia Israel, spoke of experiences that ran contrary to expectations: local Wikimedia Chapters may find themselves in adversarial relationships with local Wikipedian communities, and there is often a trust deficit between the two sides. Harel spoke of the need for local chapters to treat editing communities as peers and equals. Chapters are meant to do outreach, he cautioned, while editing is the preserve of the community, something the community must be left to do without chapter interference.

Given this inequitable distribution of linguistic content within Wikipedia projects, external organizations have seen a gap to fill, and there were presentations on translation toolkits and machine translation of content to populate otherwise sparse language Wikipedias. This route has met with some resistance. An example is a translation toolkit that Google introduced to blend computer-aided, or machine, translation with human translation. Users of this tool have been translating popular English-language articles into various local languages with varying degrees of success. Ironically, however, the size of the existing active user base in each of these Wikipedias may itself determine how successful these efforts will be. Translators using the tool needed a lot of hand-holding and oversight; after initial hiccups, the Tamil Wikipedia has been able to engage with Google such that the translators’ contributions now meet quality parameters too, thanks to the availability of more active users.

It is interesting to see how multiple approaches are being deployed to solve one common problem: content whose linguistic diversity does not match the proportions of Internet users online. It is to be expected that there are some tensions between organic, community-led translation efforts and efforts focused on automated translation. Wikimania provided both sides a venue at which to engage with each other, resolve their differences and work collaboratively.

Here’s what needs to be kept in mind while steaming ahead in India: English, with 225 million speakers in India, is also an Indian language. Several Indian editors already contribute to the English Wikipedia. So the emphasis needs to be on boosting contributions in all Indian languages, including English, rather than on an ‘Indic languages’ vs ‘English’ paradigm. Innovative ways to boost edits and bring in new editors include holding Wikipedia academies across the country; finding low-cost ways to create public access to Wikipedias in places like public libraries; and removing technological obstacles related to scripts, keyboards, etc.

With the Foundation’s new thrust on the creation of local Chapters, and with the India Chapter in the final stages, one can expect a greater focus on these issues, both within India and in other under-represented areas of the world.

About gkjohn

Recovering lawyer, erstwhile entrepreneur, pretend polymath, hopeful zookeeper and future dilettante and farmer of organic strawberries. Work at @aksharadotorg and @klpdotorg. Previously at @prathambooks. Was a @tedfellow.