Challenges of making Telugu language Internet friendly

‘Encoding all symbols under the Unicode system is far from over’

October 01, 2011 03:00 pm | Updated 03:00 pm IST - MILPITAS (California):

Linguistic and computer experts grappled with the various challenges of making the Telugu language Internet-friendly at the three-day International Telugu Internet Conference in progress here.

The task of encoding all the symbols of Telugu under the Unicode system is far from over, they felt.

Unicode Consortium president, vice-president, and chairperson of the Unicode Technical Committee Lisa Moore said the consortium was encoding several Indic scripts, but some Telugu symbols still needed to be encoded.

Prof. Peri Bhaskara Rao of the Research Institute for the Languages and Cultures of Asia and Africa, an institute of the Tokyo University of Foreign Studies, and chairman of the first Telugu Internet Conference, outlined several problems in making Telugu a more Internet-friendly language. He said the need to drop a few symbols of the Telugu language that were “spoofable” (usable to imitate other characters) was under examination.
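The idea of a “spoofable” symbol can be illustrated with a minimal Python sketch. It assumes, purely for demonstration, that the Telugu digit zero (U+0C66) and the Telugu anusvara sign (U+0C02) can be mistaken for the ASCII digit 0 and the Latin letter o in many fonts; the table and the helper names are hypothetical and are not a list discussed at the conference.

```python
import unicodedata

# Hypothetical look-alike table: an assumption for illustration only,
# not an official list of confusable characters.
LOOKALIKES = {
    "\u0c66": "0",  # TELUGU DIGIT ZERO, drawn as a small circle
    "\u0c02": "o",  # TELUGU SIGN ANUSVARA, also drawn as a small circle
}

def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

def skeleton(text):
    """Replace each character with its look-alike, if the table has one."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)

def looks_like(a, b):
    """True when two different strings collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)

if __name__ == "__main__":
    spoof = "1\u0c660"  # digit one, Telugu digit zero, ASCII zero
    print(describe(spoof))
    print(looks_like(spoof, "100"))
```

Two strings that differ in their code points but share a skeleton are candidates for spoofing, which is why such symbols attract scrutiny in identifiers such as domain names.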

He said there were several problems in developing language editors, spell checkers, and text-to-speech (TTS) systems for Telugu.

Prof. G. Uma Maheswara Rao of the Centre for Applied Linguistics and Translation Studies, University of Hyderabad, said developing a spell checker for Telugu was challenging because it was an “agglutinating language with a very complex morphology coupled with prolific sandhi” (also known in linguistic terms as morphophonemics).

“Designing a spell checker for Indian languages such as Telugu poses many new challenges not found in English. In Telugu, inflectional elements (which include different kinds of auxiliary verbs, postpositions, particles, and case markers) are always bound to the stem, resulting in highly synthetic word forms.

“The number of possible verb forms for a verb stem in Telugu, therefore, is very high, running into millions, aggravating the task of the morph analyzer (of the spell checker),” he said.
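How bound suffixes multiply the number of word forms can be seen in a small Python sketch. The suffix slots and their labels below are hypothetical placeholders, not real Telugu morphemes or counts; the only point is that the total is the product of the choices available in each slot.

```python
from itertools import product
from math import prod

# Hypothetical suffix slots with placeholder labels (assumptions for
# illustration; real Telugu morphology has more slots and sandhi changes).
SLOTS = [
    ["", "-caus"],                                                 # optional causative
    ["-past", "-nonpast", "-neg"],                                 # tense or polarity
    ["-1sg", "-2sg", "-3sg.m", "-3sg.f", "-1pl", "-2pl", "-3pl"],  # agreement
    ["", "-emph", "-q"],                                           # optional particle
]

def count_forms(slots):
    """Number of forms is the product of the sizes of all slots."""
    return prod(len(slot) for slot in slots)

def generate_forms(stem, slots):
    """Enumerate every stem plus suffix combination."""
    return [stem + "".join(combo) for combo in product(*slots)]

if __name__ == "__main__":
    print(count_forms(SLOTS))               # 2 * 3 * 7 * 3 = 126
    print(generate_forms("STEM", SLOTS)[:5])
```

Even this toy grammar yields 126 forms for one stem, and every additional slot multiplies the total again, so a spell checker cannot rely on a plain word list and needs a morphological analyzer.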

A team of experts at the University of Hyderabad was trying to address all these problems, he said.

Vasudeva Verma from the Search and Information Extraction Lab, IIIT-Hyderabad, outlined the efforts being made to develop a Cross Language Information Access (CLIA) system for Telugu. He said that CLIA could be considered an extension of Cross Language Information Retrieval (CLIR) systems. It would help make the huge amount of information available in other languages, mostly English, accessible to people who know only Telugu.

Rich phonetics

He said a team at IIIT was working on developing CLIR systems in the domains of health and tourism with funds from the Government of India. Cisco systems architect Kolichala Suresh said it was a good time to think about reforms in the Telugu script.

He suggested the inclusion of some new symbols to preserve the rich phonetics of the language. He said living languages constantly evolved, particularly when the technology used to write or print them changed. It is well known that the Telugu script is rounded because palmyra leaf was used as writing material.

A few symbols were dropped or changed when typing technology emerged because it did not allow horizontal stacking, he said.
