Data | Among languages mostly confined to a State, Tamil leads with 1.5 lakh Wikipedia articles

Among languages mostly confined to a State, Tamil dominates with over 1.5 lakh articles

Updated - June 23, 2023 05:38 pm IST

Published - June 23, 2023 04:43 pm IST

Wikipedia logo: A globe featuring glyphs from various writing systems


Among non-English Wikipedias in languages spoken in India, Urdu, Hindi and Tamil have the highest number of articles. A non-English language Wikipedia is not a translation of English articles. It is self-sustaining: active users and moderators create and moderate content in their own languages. Among languages which are mostly confined to a State, Tamil leads by a wide margin, with 1.6 times as many articles as the second-placed Marathi, which is followed by Malayalam and Telugu.

Understandably, when all the global languages are considered, English leads the list with 66,71,236 articles (Chart 1).

Chart 1 | The chart lists the 320 languages in which Wikipedia articles are available. The bigger the bubble, the higher the number of articles.


Interestingly, Cebuano, a regional language spoken widely in the Philippines, has the second-highest number of articles on Wikipedia (61,23,197). The Cebuano entries are written in the Latin script. However, news reports show that many of the Cebuano entries were created by a bot.

German (around 28.1 lakh), Swedish (25.6 lakh), French (25.3 lakh) and Dutch (21.2 lakh) are the other prominent languages in which a considerable number of Wikipedia articles are maintained. There are relatively few articles in Chinese and Cantonese (13.6 lakh articles and 1.3 lakh, respectively) despite the fact that many more people speak these languages.
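The figures above mix Indian digit grouping (66,71,236) with counts expressed in lakh. For readers who want to compare them directly, here is a minimal Python sketch of the conversion. The function names are illustrative and not part of any Wikimedia tool; the only inputs are numbers quoted in this article.

```python
# One lakh is 1,00,000 (100,000); one crore is 1,00,00,000 (10,000,000).
LAKH = 100_000
CRORE = 10_000_000


def parse_indian_number(text: str) -> int:
    """Turn a string with Indian digit grouping, e.g. '66,71,236', into an int."""
    return int(text.replace(",", ""))


def in_lakh(count: int) -> float:
    """Express a raw count in lakh, rounded to one decimal place."""
    return round(count / LAKH, 1)


# Article counts quoted in the text for English and Cebuano.
english = parse_indian_number("66,71,236")
cebuano = parse_indian_number("61,23,197")

print(english, in_lakh(english))
print(cebuano, in_lakh(cebuano))
```

On this scale, English has about 66.7 lakh articles and Cebuano about 61.2 lakh, which puts both well ahead of German at around 28.1 lakh.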

Chart 2 | The chart lists the 23 languages spoken in India in which Wikipedia articles are available. 

Urdu, Hindi, and Tamil lead with 1.5 lakh-2 lakh articles each, followed by Bangla, spoken widely in West Bengal and Bangladesh, with 1.4 lakh articles. Among other languages confined to a State, Marathi, Malayalam, Telugu, and Punjabi dominate, with 0.5 lakh-1 lakh articles each. There were around 12,000 articles in Sanskrit and around 15,000 in Sindhi.


There are no Wikipedia articles in two of the 22 languages in the Eighth Schedule of the Constitution: Bodo and Dogri. On the other hand, Bhojpuri, Bishnupriya, and Tulu (with just 1,884 articles and featuring last) are the non-scheduled languages in which Wikipedia articles are available. Of them, interestingly, there are over 25,000 articles in Bishnupriya, which had 79,646 recorded speakers as per the 2011 Census. The number of articles in Bishnupriya is just 5,000 fewer than the number of entries in Gujarati and Kannada.
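To put the Bishnupriya numbers in proportion, the article count can be set against the number of speakers. The sketch below is only a rough illustration: it treats 25,000 as a lower bound (the text says "over 25,000"), uses the 2011 Census speaker count, and the function name is hypothetical.

```python
# Figures quoted in the article. 25,000 is a lower bound ("over 25,000"),
# so the results below are minimum estimates, not exact values.
BISHNUPRIYA_ARTICLES_MIN = 25_000
BISHNUPRIYA_SPEAKERS_2011 = 79_646


def articles_per_thousand_speakers(articles: int, speakers: int) -> float:
    """Number of Wikipedia articles for every 1,000 recorded speakers."""
    return articles * 1000 / speakers


rate = articles_per_thousand_speakers(
    BISHNUPRIYA_ARTICLES_MIN, BISHNUPRIYA_SPEAKERS_2011
)
speakers_per_article = BISHNUPRIYA_SPEAKERS_2011 / BISHNUPRIYA_ARTICLES_MIN

print(f"At least {rate:.0f} articles per 1,000 speakers")
print(f"At most {speakers_per_article:.1f} speakers per article")
```

By this estimate, Bishnupriya has at least about 314 articles for every 1,000 recorded speakers, or roughly one article for every three speakers.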

Chart 3 | The chart shows the number of Wikipedia administrators available in each language, who can delete and undelete pages, block users, edit protected pages, and grant rights to others. They have been given extra editing privileges by the Wikipedia community. 

English language administrators dominate (898), while German and French are a distant second and third (Chart 3). Among the Indian languages, Tamil leads with 35 administrators, followed by Malayalam (15) and Bangla (14). Hindi has six administrators and Sanskrit, three.

Chart 4 | The chart shows the number of Wikipedia users. A user is one who has created an account on the site. 

Those who browse Wikipedia without registering are not considered users. English dominates with over 4.5 crore users, while all the other languages have fewer than 1 crore users each (Chart 4). Among the Indian languages, Hindi dominates with 7.6 lakh users, and among languages mostly confined to a State, Tamil leads with 2.2 lakh.

Chart 5 | The chart shows the number of active Wikipedia users.

An active user is a registered user who has performed an action in the past month, such as editing an article or taking part in page discussions. The languages that dominate among active users are similar to those that dominate among registered users.

Source: Wikimedia Statistics and Census of India
