Chennai team taps AI to read Indus Script

The Indus script has long challenged epigraphists because the text and symbols on artefacts are difficult to read and classify. Now, a Chennai-based team of scientists has built a program that eases the process.

Ronojoy Adhikari of The Institute of Mathematical Sciences and Satish Palaniappan, who is at Sri Sivasubramaniya Nadar College of Engineering, have developed a “deep-learning” algorithm that can read the Indus script from images of artefacts such as a seal or pottery that contain Indus writing.

Scanning the image, the algorithm smartly “recognises” the region of the image that contains the script, breaks it up into individual graphemes (the term in linguistics for the smallest unit of the script) and finally identifies these using data from a standard corpus. In linguistics the term corpus is used to describe a large collection of texts which, among other things, are used to carry out statistical analyses of languages.

The algorithms come under a class of artificial intelligence called “deep neural networks.” “These have been a major part of the game-changing technology behind self-driving cars and Go-playing bots that surpass human performance,” says Satish Palaniappan. The particular kind of deep neural network used here, known as a convolutional neural network (CNN), mimics the working of the mammalian visual cortex, which breaks the visual field into overlapping regions. The features found in each region are hierarchically combined by the network to build a composite understanding of the whole picture.
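The idea of scanning overlapping regions can be illustrated with a toy example. The sketch below is not the team's network; it assumes a tiny hand-made image and a hand-picked 2x2 filter (both hypothetical), and shows only the basic sliding-window step that a CNN layer repeats many times with learned filters.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` one pixel at a time (no padding).

    As in most CNN libraries, the kernel is applied without flipping,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Neighbouring windows share pixels: these are the
            # "overlapping regions" of the image.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds where brightness rises left to right.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # the middle column marks the dark-to-bright edge
```

In a real CNN, the output of one such layer becomes the input of the next, which is how simple features such as edges are combined into larger shapes.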

The process consists of three phases. In the first, the input images are broken into sub-images that contain only graphemes, by trimming out the areas that do not have graphemes. In the second, the grapheme-containing areas are further trimmed into single-grapheme pieces. Lastly, each of these single graphemes is classified to match one of the 417 symbols identified so far in the Indus script.
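The three phases can be sketched on a toy black-and-white image. This is only an illustration under strong assumptions: the real system uses deep neural networks at each step, whereas the sketch below swaps in simple rules (blank rows and columns for trimming, pixel-by-pixel template matching for classification). The function names and the two signs "bar" and "box" are hypothetical.

```python
def crop_to_text(image):
    """Phase 1: keep only the rows and columns that contain ink (1s)."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def split_graphemes(region):
    """Phase 2: cut the region at blank columns into single-sign pieces."""
    pieces, current = [], []
    for c in range(len(region[0])):
        column = [row[c] for row in region]
        if any(column):
            current.append(column)
        elif current:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    # Pieces were collected column by column; transpose back to rows.
    return [[list(r) for r in zip(*piece)] for piece in pieces]


def classify(piece, templates):
    """Phase 3: pick the template that differs in the fewest pixels."""
    def distance(a, b):
        if len(a) != len(b) or len(a[0]) != len(b[0]):
            return float("inf")  # different size: treat as no match
        return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(templates, key=lambda label: distance(piece, templates[label]))


# Hypothetical sign list standing in for the 417-sign corpus.
templates = {
    "bar": [[1], [1], [1]],
    "box": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}

# Hypothetical "seal" image: a bar followed by a box, with blank margins.
seal = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

region = crop_to_text(seal)
pieces = split_graphemes(region)
labels = [classify(piece, templates) for piece in pieces]
print(labels)
```

Rules this simple would fail on real artefacts, where signs touch, break or wear away; that is the gap the learned networks are meant to close.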

Indus script

The Indus valley script is much older than the Prakrit and Tamil-Brahmi scripts. However, unlike the latter two, it has not yet been deciphered because a bilingual text has not yet been found.

A bilingual text has in many other cases helped archaeologists understand ancient scripts; the Rosetta stone is one example. This stone, which was found in the eighteenth century, carries a decree issued in 196 BCE, inscribed in three parts: the first two in the ancient Egyptian hieroglyphic and Demotic scripts, and the bottom one in Ancient Greek. Since the decree was the same in all three, the Rosetta stone provided the key to deciphering hieroglyphs. For lack of such a “Rosetta stone,” the Indus script remains undeciphered today.

It is a major effort even to build a standard corpus of the script, and to read the writing on existing artefacts and map it to this corpus. The most widely accepted corpus of Indus texts was brought together by Iravatham Mahadevan, the noted Indian epigraphist, from the 3,700 texts and 417 unique signs collected so far.

When asked about the relevance of this work, Dr Mahadevan says, “It [the algorithm] represents a significant advance in the computerised study of the Indus Script. I wish I had this software 40 years ago when I compiled the Indus concordance.”
