The data story

How the data team and the Internet team source and track COVID-19 numbers

A report in this newspaper, “India may have undercounted cases” (June 14), based on a serological survey by the Indian Council of Medical Research (ICMR), has reignited the debate on the accuracy of COVID-19 data. While it is impossible for any news organisation to authenticate numbers put out by official sources, let me assure readers that The Hindu data team follows certain guidelines to document the daily numbers. I say this as some readers have cited different sources to question the figures published by the newspaper. For instance, Zaheeb Ajmal sent us a number of screenshots and PDFs to contest the numbers published by The Hindu. His source is Worldometer. Another reader, Somak Basu from Trichy, was unhappy with the newspaper’s interpretations of doubling time and felt that they showed India in a poor light.

The newspaper sources its data from State governments, the Central government, the World Health Organization (WHO) and respected academic organisations. It tries to avoid data from websites where the data sources are suspect. The Hindu is transparent about its sourcing of data. This does not mean that data furnished by agencies cannot be questioned or subjected to closer scrutiny.

Tracking data

The Hindu data team uses two trackers for page 1 news and for other COVID-19-related stories. At times, there may be some variations in the total number of cases and deaths as some States (Tripura and Delhi, for example) tend to release their numbers late. The newspaper’s robust reporter and desk network works round the clock to update a master sheet on daily COVID-19 cases across States by sourcing numbers from the health departments of the respective States. Normally data from most States are available by 10.30 p.m. These are the numbers the newspaper uses to update the daily count on page 1. The full data on the master sheet are also used to generate maps and updates for its live page.

Data on testing, samples and related measures are also gathered every day by The Hindu data team, which maintains a separate tracker with a more detailed break-up of State-wise COVID-19 numbers. The aim of this tracker is to chart the infection and fatality curves of each State and to compare the testing and positivity rates of these States on a regular basis.
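For readers who want to see how such measures are usually derived, here is a minimal sketch in Python. It uses common textbook definitions and is not necessarily the data team's own method: the positivity rate is taken as confirmed cases per 100 samples tested, the fatality rate as deaths per 100 confirmed cases, and the doubling time assumes steady exponential growth between two cumulative counts. The function names are hypothetical and chosen only for illustration.

```python
import math


def positivity_rate(confirmed, samples_tested):
    """Confirmed cases per 100 samples tested (assumed definition)."""
    if samples_tested <= 0:
        raise ValueError("samples_tested must be positive")
    return 100.0 * confirmed / samples_tested


def case_fatality_rate(deaths, confirmed):
    """Deaths per 100 confirmed cases (assumed definition)."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return 100.0 * deaths / confirmed


def doubling_time(earlier_total, later_total, days_between):
    """Days for the cumulative count to double, assuming the growth
    seen between the two counts is exponential and continues unchanged."""
    if earlier_total <= 0 or later_total <= earlier_total:
        raise ValueError("counts must be positive and increasing")
    return days_between * math.log(2) / math.log(later_total / earlier_total)
```

Under these definitions, a State whose cumulative cases rise from 100 to 200 over seven days has a doubling time of about seven days. Different choices of window or definition will give different figures, which is one reason the same data can be read in more than one way.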

Why does the data team prefer to use State trackers instead of Central government data? The reason is simple. The State governments release data earlier than the Union Health Ministry. This helps the newspaper stick to deadline. Initially, The Hindu reported both State and Central numbers on page 1. But when the data team realised that the State tracker was stable and, more importantly, available before the print deadline, the newspaper abandoned reporting the Union Health Ministry’s numbers. For page 1 updates, the data team also uses the ICMR’s daily release of the number of total samples tested. This is published along with the State-wise updates on cases and fatalities.

Reliable data

For comparison across countries, the data team uses the ‘COVID-19 Global Cases’ tracker provided by the Center for Systems Science and Engineering at Johns Hopkins University (JHU). The Hindu data team prefers the JHU tracker for many reasons. One, it is one of the earliest trackers that listed worldwide information on cases and deaths. Two, it provides an API and hosts the data in an open GitHub repository from where base data can be easily sourced. Three, it relies on data from official sources in the respective countries and attributes these data properly. Four, it provides up-to-date data faster than, say, the WHO’s tracker.
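To give a sense of how base data of this kind can be turned into a country ranking, here is a rough Python sketch. It assumes a file laid out like the global time series in the JHU repository, with the columns Province/State, Country/Region, Lat and Long followed by one column of cumulative counts per date. The country names and numbers in the sample are invented for illustration, and the function names are hypothetical; this is not the data team's own code.

```python
import csv
import io
from collections import defaultdict

# Invented sample in the assumed layout; not real figures.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,6/5/20,6/6/20
,Alpha,0.0,0.0,100,120
North,Beta,0.0,0.0,40,50
South,Beta,0.0,0.0,60,65
"""


def latest_totals(csv_text):
    """Sum the most recent cumulative count for each country.

    Countries reported province by province are added up."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    country_col = header.index("Country/Region")
    totals = defaultdict(int)
    for row in reader:
        if not row:
            continue
        # The last column holds the latest date in this layout.
        totals[row[country_col]] += int(row[-1])
    return dict(totals)


def rank_countries(totals):
    """Return (country, total) pairs, highest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_countries(latest_totals(SAMPLE_CSV)) on the invented sample places Alpha ahead of Beta. With the real file, the same steps would show where any one country stands on a given day.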

Many readers cited the disputed figures from the Worldometer website to contest The Hindu’s headline “India’s COVID tally fifth highest in world” (June 7). On the day of reporting, both the JHU tracker and the WHO showed that Spain had close to 2.4 lakh cases. When India’s tally crossed that mark in JHU’s table, The Hindu team used that for the headline. The JHU’s number matched the official tally of Spain’s Health Ministry, whereas Worldometer’s tally did not.

Besides stories related to testing, infection rates, doubling time and fatalities, the data team has also looked at the impact of the lockdown on the economy. Each of these stories cited the sources from where the base data was taken, analysed, and reinterpreted.

Jul 4, 2020

