‘Encoding all symbols under the Unicode system is far from over’

Linguistic and computer experts grappled with the various challenges of making the Telugu language Internet-friendly at the three-day International Telugu Internet Conference in progress here.

The task of encoding all the symbols of Telugu under the Unicode system is far from over, they felt.

Unicode Consortium president, vice-president, and chairperson of Unicode Technical Committee Lisa Moore said the consortium was encoding several Indic scripts. There were still some Telugu symbols that needed to be encoded.

Prof. Peri Bhaskara Rao of the Research Institute for the Languages and Cultures of Asia and Africa, an institute of the Tokyo University of Foreign Studies, and chairman of the first Telugu Internet Conference, outlined several problems in making Telugu a more Internet-friendly language. He said the need to drop a few Telugu symbols that were “spoofable” (liable to be used to imitate other characters) was under examination.

He said there were several problems in developing language editors, spell checkers, and text-to-speech (TTS) systems for Telugu.

Prof. G. Uma Maheswara Rao of the Centre for Applied Linguistics and Translation Studies, University of Hyderabad, said developing a spell checker for Telugu was challenging because it was an “agglutinating language with a very complex morphology coupled with prolific sandhi (also known in linguistic terms as morphophonemics).”

“Designing a spell checker for Indian languages such as Telugu poses many new challenges not found in English. In Telugu, inflectional elements (which include different kinds of auxiliary verbs, postpositions, particles, and case markers) are always bound to the stem, resulting in highly synthetic word forms.

“The number of possible verb forms for a verb stem in Telugu is, therefore, very high, running into millions, aggravating the task of the morph analyzer (of the spell checker),” he said.

A team of experts at the University of Hyderabad was trying to address all these problems, he said.

Vasudeva Verma from the Search and Information Extraction Lab, IIIT-Hyderabad, outlined the efforts being made to develop a Cross Language Information Access (CLIA) system for Telugu. He said that CLIA could be considered an extension of Cross Language Information Retrieval (CLIR) systems. It would make the huge amount of information available in other languages, mostly English, accessible to people who know only Telugu.

Rich phonetics

He said a team at IIIT was working on developing CLIR systems in the domains of health and tourism with funds from the Government of India.

CISCO systems architect Kolichala Suresh said it was a good time to think about reforms in the Telugu script.

He suggested the inclusion of some new symbols to preserve the rich phonetics of the language. He said living languages constantly evolved, particularly when the technology used to write or print them changed. It is well known that the Telugu script is rounded because palmyra leaf was used as the writing material.

A few symbols were dropped or changed when typing technology came into use because it did not allow horizontal stacking, he said.