Despite having Unicode membership, Tamil Nadu government shuns open standard 

Though the T.N. government has spent $200,000 in membership fees for the global text standard, it still does not use Unicode for the Tamil documents it publishes, making them harder to access and search

October 18, 2023 11:22 pm | Updated 11:53 pm IST - NEW DELHI

Tamil Nadu is a ‘Supporting Member’ at the Unicode Consortium. This privilege allows the State a half vote on proposals at the Unicode Technical Committee, where it has not voted in years. Picture shows the logo of Unicode Consortium. Photo: X/@unicode

Though Tamil Nadu is one of only three government members of the Unicode Consortium, which sets digital standards for scripts and emojis, the State government has largely stayed away from actually using the standard, The Hindu has found.

The State government has spent $200,000 over a 16-year period in membership dues to the Consortium, but has not attended any meetings for the last decade, a response to a Right to Information (RTI) query and meeting minutes from the Unicode Consortium show.

Press releases, government orders and gazette notifications usually use custom Tamil fonts that are incompatible with the Unicode standard and can harm the accessibility of government documents and their discovery online.

String of gibberish

In June 2021, the State government issued an order stating that many departments continued to use ‘proprietary’ fonts that are incompatible with Unicode. Since PDF files can display any font on any machine that can open the format, these problems may not be immediately visible. However, when the text is copied from these files or read aloud by a screen reader, as used by visually impaired people, random characters replace the Tamil text. This makes it hard for search engines to index the files and, subsequently, for people to find them. A Tamil sentence in a PDF produced by a government body, when pasted into a Unicode-based application, reads as a string of gibberish.
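The mechanism behind the gibberish can be illustrated with a minimal Python sketch. It assumes a made-up legacy font in which ordinary Latin characters are stored in the file and the font merely draws Tamil shapes for them. The table `LEGACY_TO_UNICODE` and the function `legacy_to_unicode` are hypothetical names used for illustration; they do not reproduce any real font's layout or the State's conversion tools.

```python
# Hypothetical legacy-font table (an assumption, not a real font's layout):
# key   = the Latin character actually stored in the document
# value = the Unicode Tamil character the custom font draws in its place
LEGACY_TO_UNICODE = {
    "j": "\u0ba4",  # TAMIL LETTER TA
    "k": "\u0bae",  # TAMIL LETTER MA
    "p": "\u0bbf",  # TAMIL VOWEL SIGN I
    "o": "\u0bb4",  # TAMIL LETTER LLLA
    ";": "\u0bcd",  # TAMIL SIGN VIRAMA (pulli)
}


def legacy_to_unicode(stored: str) -> str:
    """Replace each stored character with its Unicode Tamil equivalent.

    Characters missing from the table are passed through unchanged.
    """
    return "".join(LEGACY_TO_UNICODE.get(ch, ch) for ch in stored)


# What copy-paste, a screen reader or a search engine sees in the file.
stored = "jkpo;"

# What a reader sees on screen once the custom font is applied.
converted = legacy_to_unicode(stored)

print(stored)
print(converted)
```

In this sketch the stored text contains no Tamil characters at all, which is why a search for the Tamil word would never match the file. Converters for real fonts may need more than a one-to-one table, for example where a font stores a vowel sign before the consonant it follows in Unicode order.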

Tamil Nadu is a ‘Supporting Member’ at the Unicode Consortium. This privilege allows the State a half vote on proposals at the Unicode Technical Committee, where it has not voted in years. The governments of Bangladesh and Oman are the only others currently paying for this membership. Their fees have ranged from $12,000 to $14,000 a year.

A senior official disputed the characterisation that the State government was insufficiently addressing the issue, stating that meetings were taking place to transition to Unicode and the State was “actively taking part”, including distributing free tools that convert older proprietary fonts into Unicode.

Tamil script standards

Minutes of meetings reviewed by The Hindu show that individual contributors from around the world have helped fine-tune Unicode’s Tamil script standards. The Tamil Nadu government’s last contribution in the Consortium’s published records, by way of letters exchanged with the Consortium in absentia, was more than six years ago, in 2016.

The board of the Tamil Virtual Academy (TVA) — which represents the State government at the Consortium and is supposed to carry out Unicode-related activities — should be headed by the State’s Information Technology Secretary. However, records reviewed by The Hindu show that since 2021, it has been chaired by Finance Secretary T. Udhayachandran, who also heads TVA’s general body.

In the 1990s and the 2000s, the State government actively deliberated on issues with Unicode’s implementation of the Tamil script due to concerns including how the system treated individual diacritics as separate characters, thus increasing the amount of computer memory required for each character.
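The memory concern can be made concrete with a short Python sketch. It takes a single Tamil syllable, கொ, and counts how Unicode stores it. The choice of syllable and the variable names are illustrative assumptions, not drawn from the State's deliberations.

```python
import unicodedata

# One written syllable, stored by Unicode as a consonant plus a vowel sign.
syllable = "\u0b95\u0bca"  # letter KA followed by vowel sign O

# List the separate code points that make up the syllable.
for ch in syllable:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

code_points = len(syllable)                      # number of code points
utf8_bytes = len(syllable.encode("utf-8"))       # storage size in UTF-8
utf16_bytes = len(syllable.encode("utf-16-le"))  # storage size in UTF-16

print(code_points, utf8_bytes, utf16_bytes)
```

Here one visible syllable occupies two code points, which take six bytes in UTF-8 and four in UTF-16. An encoding that gave each character a single byte would need less space for the same text, which is the kind of overhead that mattered more when storage and memory were scarce.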

However, the increase in computing power and storage capacity in mobile phones and computers in the past couple of decades, coupled with tweaks to the Unicode standard over the past few years, has diminished the significance of computer memory as a problem.

While Unicode likely faced such efficiency issues across other languages, it is largely accepted as a standard on most operating systems, including Windows, Android, iOS, and Linux-based software. However, the Tamil Nadu government has not entirely moved to the standard yet, causing issues that are not usually present in other non-English documents produced by organisations around the world.

Proprietary fonts

On top of not implementing Unicode, the Tamil Nadu government has spent money on proprietary Tamil fonts that are incompatible with Unicode, according to two people with knowledge of the issue. It has also required government bodies to pay licensing fees for each computer where these proprietary fonts were installed.

It is unclear if such fees are still being paid for the fonts, but if officials and government employees were to switch to Unicode, they would have to wean themselves off these proprietary solutions after years of getting used to their custom keyboard layouts, one of the two people said.

(With inputs from Sanjay Vijayakumar)
