The persistence of the digital divide in the era of widespread computing probably poses the biggest challenge to the realisation of the promise that the Internet would offer deliverance to society at large.

In simple terms, the problems can be condensed into two sets. The first is the problem of access to computing: not access to hardware, but the ability to use computers in the language you already know. The second set of problems arises from the lack of content, or ‘knowledge’ as it is called in more fashionable terms.

Access to computers, understood not merely as having a device but as having one that people can actually use, depends on the ability to tell the machine what you want it to do in your own language. The initial innovators in Indian language computing designed their own fonts, but quickly realised that a more universal “language” was needed to make computing more accessible.

The adoption of Unicode as a universal standard for localisation was a “fundamental enablement” because it enables text in any language to be recognised irrespective of platforms and operating systems, says Meghashyam Karanam, project manager for localisation at Microsoft India. Unicode has quickly grown to become a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems. Its latest version, Unicode 6.1, consists of a repertoire of more than 110,000 characters covering 100 scripts.
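For readers who want to see what this consistency means in practice, here is a minimal sketch in Python. It assumes nothing beyond the standard library; the helper name code_points and the sample word are illustrative choices, not anything supplied by Microsoft or Mr. Karanam. It shows that a Hindi word is stored as a fixed sequence of Unicode code points, whatever font or operating system eventually displays it.

```python
import unicodedata

# The word "Hindi" in Devanagari, written as explicit code points so the
# example does not depend on how this file itself is encoded.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"


def code_points(text):
    """Return (code point label, official Unicode name) for each character.

    Hypothetical helper for illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


points = code_points(word)
for label, name in points:
    print(label, name)

# Encoding to UTF-8 and decoding again gives back the same text,
# independent of the platform on which either step runs.
encoded = word.encode("utf-8")
round_trip = encoded.decode("utf-8")
```

Every character in the word belongs to the Devanagari block, and the same bytes can be exchanged between any two systems that follow the standard.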

Realising that its own interests were tied to the wider adoption of Indian language computing, Microsoft undertook a drive to localise its own software into Indian languages soon after Bill Gates visited India in 1998.

By 2000, it had started supporting Hindi; it now supports 12 Indian languages. Many of its software packages, including its popular Office suite, now support the use of at least some Indian languages.

Pradeep Parappil, senior lead product manager, Windows, Microsoft India, says the company now offers bilingual and trilingual dictionaries, available as free downloads.

Content generation

While access is the most basic of the problems, content generation in Indian languages is still an issue. The problems are twofold, explains Mr. Karanam. The first arises from the huge backlog of undigitised content in Indian languages. The second is the question of translation of content — from Indian languages as well as from English and other languages.

Machine translation, which uses software to translate text or speech from one natural language to another, requires a sufficiently large body of texts in the native languages to become viable, says Mr. Karanam. The lack of a large enough body of digitised text in the Indian languages is thus a problem that impairs the wider dissemination of “knowledge” in those languages, he avers.

Microsoft has developed WikiBhasha, a multilingual content creation tool designed specifically for Wikipedia, so that its users can access English versions of articles and translate them into the Indian languages.

Mr. Karanam believes that “wider collaboration” among industry, academia and the original generators of content in the Indian languages is necessary to bridge the content deficit. “Language technologies,” he says, are being developed by academics at the Indian Institute of Technology–Mumbai, Patiala University, Ravenshaw University (Orissa) and several other institutions. Referring to the enormous volume of content being generated in the Indian languages in movies, television and blogs, apart from printed newspapers and books, he says, “It is only logical that the digital world will realise the need for greater momentum.”