All reporters harbour dream projects. But many never get to execute them, for reasons such as lack of access, absence of critical information, and tight-lipped sources.

The Hindu Data Team also has many such dream projects. For instance, for a long time, we wanted to access the gridded weather data of the India Meteorological Department (IMD), but we did not have the coding skills to do this. Two months ago, we struck gold when we found a Python library which required minimal coding and helped us access the IMD data by downloading it as a comma-separated values (CSV) file. Having access to this data gave us at least three stories. In one story, we found that Kullu and Chandigarh recorded their wettest-ever days in July this year. This helped readers understand through data that the rainfall in north India this year was truly unprecedented.
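For readers curious about what such a check can look like once the data is in a CSV file, here is a minimal sketch in Python. It is not the team's actual script: the column names `date` and `rainfall_mm` are assumptions, and the sample rows are made-up placeholders, not IMD readings.

```python
import csv
import io

# Made-up sample rows standing in for a downloaded CSV file.
# These are NOT real IMD values; the column names are assumptions.
SAMPLE_CSV = """date,rainfall_mm
2021-07-18,42.0
2022-08-03,77.5
2023-07-09,131.2
2023-07-10,96.4
"""


def wettest_day(csv_file):
    """Return (date, rainfall_mm) for the row with the highest rainfall.

    Rows with an empty rainfall value are skipped. If two days tie,
    the earlier row in the file is kept.
    """
    best_date, best_rain = None, float("-inf")
    for row in csv.DictReader(csv_file):
        value = row["rainfall_mm"].strip()
        if not value:
            continue  # missing reading for this day
        rain = float(value)
        if rain > best_rain:
            best_date, best_rain = row["date"], rain
    return best_date, best_rain


record = wettest_day(io.StringIO(SAMPLE_CSV))
print(record)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file, for example `open("rainfall.csv", newline="")`, where the file name is again hypothetical.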

Around the same time, an intern who had joined the team said she knew Machine Learning (ML) and was keen to use it to tell stories. The data team wanted to test the possibility of getting another long-forgotten story done through ML. We wanted to measure Muslim representation over the years in the Lok Sabha. As there is no official record of the religious identities of MPs, we considered using the names of the MPs to identify their religion. But the task of manually classifying over 500 MPs across all the Lok Sabhas sounded laborious, and we kept postponing the exercise.

When we finally got to it, we first looked at Muslim names on websites that listed the names of Indian babies. The idea was to use that list as a base and identify possible Muslim MPs. However, this idea backfired as Asaduddin Owaisi’s name did not figure on any of these lists. We abandoned the exercise.

The intern claimed that she could train the machine to predict the religious identities of MPs. To identify Muslim MPs, character-based ML models developed by Rachana and Sugat Chaturvedi in their 2023 paper, ‘It’s all in the name: A character-based approach to infer religion’, were implemented. The ML algorithm accurately predicted Mr. Owaisi’s religious identity. However, it claimed that former Jammu and Kashmir MP Mehbooba Mufti was not a Muslim. The model missed the name of Abdul Rahman Antulay, the former Chief Minister of Maharashtra who served as an MP in the 14th Lok Sabha, as it appeared as A.R. Antulay in the Lok Sabha records. We faced the same issue with S.S. Samadali, another former MP. False positives crept in too. An MP whose name was Khan was not a Muslim, but the ML model identified him as one. And so, despite employing ML, we were back to square one.
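The trouble with initials can be illustrated with a short Python sketch. It is not the model from the paper, and it makes no prediction of any kind; it only shows, under the assumption that a character-based model reads a name as overlapping three-letter chunks, how much of a name disappears when it is abbreviated. The function names `char_ngrams` and `has_initials` are hypothetical helpers written for this illustration.

```python
import re


def char_ngrams(name, n=3):
    """Return the set of character n-grams in a lowercased name.

    Punctuation is dropped and runs of spaces are collapsed first.
    """
    cleaned = re.sub(r"[^a-z ]", "", name.lower())
    cleaned = " ".join(cleaned.split())
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def has_initials(name):
    """True if the name contains a single-letter token such as 'A.' or 'S.S.'."""
    tokens = re.split(r"[.\s]+", name.strip())
    return any(len(tok) == 1 and tok.isalpha() for tok in tokens)


full = char_ngrams("Abdul Rahman Antulay")
short = char_ngrams("A.R. Antulay")
lost = full - short  # chunks the model never gets to see in the abbreviated form
print(sorted(lost))

# Flag abbreviated names so that a person, not the model, checks them.
for name in ["A.R. Antulay", "S.S. Samadali", "Asaduddin Owaisi"]:
    label = "needs manual check" if has_initials(name) else "full name available"
    print(name, "->", label)
```

The chunks drawn from the given names exist only in the full form, which is consistent with the model missing the abbreviated entries. Flagging such names up front is one way to route them to manual verification.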

However, given that we had already put a lot of effort into this story, we were not ready to give up. Three interns manually checked most of the potential false positives and false negatives and made corrections. We also called up our reporters in each State to confirm whether the number of Muslim MPs corresponded with what they knew anecdotally or could confirm using sources.

The story, ‘No Muslim MPs even in States with high Muslim presence,’ published on July 6, was mostly a product of old-fashioned journalistic practices: gathering data manually, checking with sources, writing a story based on known facts, and keeping readers informed of what they may not know. ML gave us the much-needed push.

By the end of this exhaustive exercise, what became clear to us was that tools cannot replace journalists. It is true that journalists who use these tools have an advantage, but the fear that Artificial Intelligence and ML could replace journalists is exaggerated, at least as of now.

