Beyond the statistical soundbites: why data matter

We seem fascinated with statistical soundbites, vacillating between the bizarre and the expedient. In October 2021, major newspapers featured stories that claimed that over 100 million Indians owned cryptocurrencies. This put India at the top of the global rankings for ownership of cryptocurrencies such as Bitcoin and Dogecoin. Strangely, these stories exhibited no scepticism.

When Coin Crunch, a cryptocurrency news magazine, dug deeper into the source of this data, it discovered how hollow these claims were. It traced the source to market research data, for which 2,000 to 12,000 people in each country were asked to complete online surveys. Generalising from Internet survey respondents to the Indian population takes quite a stretch of the imagination.

However, these outlandish statistics are not the only ones that get uncritical attention. On the release of the factsheets from the fifth round of the National Family Health Survey (NFHS-5), conducted in 2019-21, some headlines focused on increasing Severe Acute Malnutrition (SAM) in India. Between 2015-16 and 2019-21, the share of children who are too thin for their height, identified as suffering from severe wasting and classified as SAM, increased from 7.5% to 7.7%, although stunting (low height for age) decreased from 38.4% to 35.5%. This slight increase in SAM would be a cause for concern since these children are most at risk for nutritional failure.

Differences in methodology

However, both in the press and in the presentation of the data by researchers, differences in methodology between NFHS-4 and NFHS-5 received little attention. A paper published in the journal PLOS ONE by Robert Johnston and others compared NFHS-3 and NFHS-4 as well as several other nutrition surveys and found that, due to diarrhoea and other diseases, children are thinner in interviews conducted during the monsoon season. Studies in other countries have made similar observations. My calculations suggest that while only 12% of the NFHS-4 surveys were conducted between July and October, 40% of the NFHS-5 surveys were conducted over these months due to pandemic-related fieldwork restructuring. When we compare SAM separately for children surveyed during the monsoon months and outside them, we find that for each period, the prevalence of SAM was slightly lower in NFHS-5 than in NFHS-4 (7.3% in NFHS-4 vs 7% in NFHS-5 outside the monsoon period; and 8.9% vs 8.6% during monsoon interviews). In other words, the overall figure rose only because a larger share of interviews fell in the season when children are thinner. This change is only minor in magnitude, but the difference between a slight increase in malnutrition and a slight decrease changes the tone of the discourse.
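The arithmetic behind this reversal can be sketched in a few lines of Python. This is a rough illustration, not the calculation behind the published factsheets: it uses only the rounded figures quoted above, assumes the non-monsoon share is simply one minus the monsoon share, and ignores survey weights. The function name `pooled_prevalence` is made up for this example.

```python
def pooled_prevalence(strata):
    """Overall prevalence as a share-weighted average of stratum prevalences.

    `strata` is a list of (share_of_interviews, prevalence_percent) pairs.
    """
    total_share = sum(share for share, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("interview shares must sum to 1")
    return sum(share * prev for share, prev in strata)


# (share of interviews, SAM prevalence in %), in the order: non-monsoon, monsoon.
# Prevalences and monsoon shares are the rounded figures quoted in the text;
# the non-monsoon shares are assumed to be 1 minus the monsoon share.
nfhs4 = [(0.88, 7.3), (0.12, 8.9)]
nfhs5 = [(0.60, 7.0), (0.40, 8.6)]

overall4 = pooled_prevalence(nfhs4)
overall5 = pooled_prevalence(nfhs5)

# NFHS-5 prevalences re-weighted to the NFHS-4 seasonal mix, so that the two
# rounds are compared as if fieldwork had been spread over the year the same way.
standardised5 = pooled_prevalence([(0.88, 7.0), (0.12, 8.6)])

print(round(overall4, 2), round(overall5, 2), round(standardised5, 2))
```

With these inputs the pooled figures come to roughly 7.5% for NFHS-4 and 7.6% for NFHS-5, close to the published 7.5% and 7.7% (the small gap is expected given rounded inputs and no survey weights). Holding the seasonal mix at the NFHS-4 pattern brings the NFHS-5 figure down to about 7.2%, below NFHS-4, which is the slight decrease described above.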

These challenges are not unique to India. For over a decade, the popular narrative about maternal mortality in the U.S. suggested that while globally maternal mortality was declining, in the U.S. between 2000 and 2014, it increased by 26%. It was not until Marion MacDorman and her colleagues at the National Center for Health Statistics (NCHS) carefully analysed how maternal mortality statistics were collected that a different conclusion emerged. The NCHS studies found a decrease rather than an increase in the maternal mortality rate over time. The apparent increase was entirely due to how the maternal mortality data were collected.

These examples highlight the challenges researchers, journalists, the informed public and policymakers face. We live in a world where data collectors and researchers are expected to provide data in a rapid cycle with little time to interpret the results and explore anomalies. Media rely on the data presented to them to file stories with statistical soundbites, often even uncritically accepting information provided by market research firms commissioned by industry bodies with a vested interest. In some cases, a political predisposition allows some data to be accepted uncritically while others are scrutinised extensively.

What is a way out of this conundrum? How can we build sensible public discourse and rational, evidence-informed policy design? It will require a substantial redesign of our data and evidence infrastructure with a troika of improved data collection, interpretation and reporting infrastructure.

Independent oversight

While collecting official statistics will always be a purview of state institutions, they must be governed by independent governing bodies that can ensure scientific integrity and broad oversight. The term of the National Statistical Commission (NSC) expired a few months ago. Reappointing the NSC is urgently needed. It is also essential that publicly funded but independent data collection find space in our statistical infrastructure. A rapidly changing society and a growing technical infrastructure for data collection call for consistent experimentation. In most countries, publicly funded experiments in data collection are carried out by universities or research institutions.

Data collectors must develop the capacity to interpret their data carefully and responsibly. Users often do not know the sampling strategy or minute operational details of data collection. Hence, data collectors must help interpret the data they collect and provide good documentation to users. Today, the National Statistical Office has no data analytical wing. National Family Health Survey reports are simple tabulations without any information about standard errors or attempts at interpretation. These institutions must be strengthened and fully funded to provide data quality analysis and explore their results’ implications.
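To see why standard errors matter for a comparison such as 7.5% versus 7.7%, consider the following minimal sketch. The sample sizes and the design effect in it are purely hypothetical placeholders, not NFHS values, and the formula is the textbook approximation for a proportion inflated by a design effect, not the survey's own variance estimation method. The function names are made up for this example.

```python
import math


def proportion_se(p, n, deff=1.0):
    """Approximate standard error of a proportion p from n observations.

    `deff` is a design effect that inflates the variance to account for
    clustered sampling; 1.0 means simple random sampling.
    """
    return math.sqrt(deff * p * (1.0 - p) / n)


def difference_ci(p1, n1, p2, n2, deff=1.0, z=1.96):
    """Approximate 95% confidence interval for p2 - p1 (independent samples)."""
    se = math.sqrt(proportion_se(p1, n1, deff) ** 2
                   + proportion_se(p2, n2, deff) ** 2)
    diff = p2 - p1
    return diff - z * se, diff + z * se


# HYPOTHETICAL inputs: 10,000 children per round and a design effect of 2.
# These are illustrative assumptions only, not figures from NFHS.
low, high = difference_ci(0.075, 10_000, 0.077, 10_000, deff=2.0)
print(f"95% CI for the change: {low:+.4f} to {high:+.4f}")
```

Under these assumed inputs the interval spans zero, so the data could not distinguish a small rise from a small fall. Whether that holds for the actual surveys depends on their real sample sizes and design effects, which is exactly the information users need from the data collectors.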

Researchers and journalists must develop the self-discipline to use and report only reliable data and be cautious of for-profit institutions with a vested interest in providing statistics and reports. The cryptocurrency example above is sobering but not the only one. The Lancet, one of the most reputable journals, was forced to withdraw papers on hydroxychloroquine because the for-profit company that supplied the data was unwilling to share it for verification. Academic publishing and deadline-driven journalistic pressures must be balanced with the responsibility of not misleading the public discourse with inadequately documented information that is either unavailable for verification or is so expensive that it is effectively out of reach of most researchers.

Most importantly, we must develop professional ethics that demand sincere efforts at collecting, interpreting and reporting evidence, along with the institutional infrastructure and public funding that make this arduous task feasible. As any self-aware data collector or researcher will acknowledge, errors will occur even with the best efforts. Data collectors are not perfect and statistical techniques continue to evolve. However, unless we put thoughtful processes in place for the evidence required to support sound policy design, we have no hope of minimising misdirection.

(Sonalde Desai is Professor and Director of NCAER National Data Innovation Centre and Distinguished University Professor at the University of Maryland. Views are personal.)
