Whenever I think about ‘Data’, I think of Brent Spiner. The android he played in Star Trek was self-aware, sapient and sentient, and strove for his own humanity. Today, ‘Data’ is already ‘Big’ and ever-expanding, and has the potential to influence every aspect of human life. However, “There is terror in numbers,” as Darrell Huff wrote in How to Lie with Statistics. The task of statisticians is to churn the data to obtain summary measures, diagrams and figures, rankings and indices, and to draw conclusions. Is this the much-desired ‘human chip’ to make ‘Data’ human?
Proper understanding of statistics
In reality, statisticians are often like the blind men of the parable, standing in front of an elephant. Inadequate or partial analysis of data may lead to an incorrect portrayal of the elephant. As H.G. Wells is often quoted as saying: “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” Yes, understanding the meaning of statistical and probabilistic conclusions is very important. This was exemplified by Stephen Jay Gould, who explained why the statistic that peritoneal mesothelioma, the form of cancer with which he was diagnosed, has a “median survival time of eight months” is misleading once one considers the distribution of the data and the factors relevant to his individual prognosis. A median says only that half of the patients live longer than eight months, and the distribution has a long right tail. Gould maintained a positive outlook in trying to beat the odds. Some of that fighting spirit, he proposed, was the result of his proper understanding of statistics. For once, he argued, statistics manifested itself as a source of optimism, rather than the sterile methodology that most people associate with the term.
Misleading statistics may be produced because of the limitations of the statisticians concerned, by deliberate intent, or both. “Misinforming people by the use of statistical material might be called statistical manipulation,” Huff wrote. Huff pointed out seven common tactics to knead statistical data into ‘dough’, which include polling a non-representative group, using small sample sizes, and averaging values across non-uniform populations. Huff illustrated how statistical graphs could be used to distort reality. If the bottom of a line or bar chart is truncated, differences look larger than they are. The proportion between the ordinate and the abscissa is also sometimes changed for this purpose. With the help of several real examples, Huff also discussed the ‘post-hoc fallacy’, which incorrectly infers a causal link from a correlation between two findings. In his 2001 book, Damned Lies and Statistics, Joel Best also used fascinating examples from leading newspapers and television programmes to unravel the use, misuse, and abuse of statistical information.
The goal of statistics is to search for ‘truth’ amid the randomness of nature. “Uncertain knowledge + Knowledge of the amount of uncertainty in it = Usable knowledge,” wrote C.R. Rao in his book Statistics and Truth: Putting Chance to Work. Prof. Rao discussed how statistics can be used to judge whether a newly discovered poem was written by Shakespeare, and how blood samples from different persons can be mixed and tested together for certain rare diseases to reduce the number of tests needed.
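To see why mixing blood samples can save tests, consider a rough sketch of one common two-stage pooling scheme, often attributed to Robert Dorfman: each pool is tested once, and only if the pool is positive is every member retested individually. The sketch assumes perfectly accurate tests and independent infections, and the 1% prevalence is a hypothetical figure chosen for illustration, not one taken from Prof. Rao's book. The function names are likewise illustrative.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected number of tests per person under two-stage pooling.

    Assumes perfect tests and independent infections (simplifications).
    """
    if pool_size < 2:
        return 1.0  # no pooling: one test per person
    # Probability that at least one person in the pool is infected.
    pool_positive = 1 - (1 - prevalence) ** pool_size
    # One pooled test shared by the group, plus one individual retest
    # per person whenever the pool turns out positive.
    return 1 / pool_size + pool_positive


def best_pool_size(prevalence, max_size=50):
    """Pool size (2..max_size) that minimises expected tests per person."""
    return min(range(2, max_size + 1),
               key=lambda k: expected_tests_per_person(prevalence, k))


HYPOTHETICAL_PREVALENCE = 0.01  # illustrative assumption: 1 in 100 infected

tests_with_pools_of_ten = expected_tests_per_person(HYPOTHETICAL_PREVALENCE, 10)
optimal_size = best_pool_size(HYPOTHETICAL_PREVALENCE)

print(f"Pools of 10: about {tests_with_pools_of_ten:.3f} tests per person")
print(f"Best pool size under these assumptions: {optimal_size}")
```

Under these assumptions, a rare disease needs roughly one test for every five persons instead of one test each. The saving shrinks as the disease becomes more common, because more pools turn out positive and have to be retested.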
Need for innovation
Churning truth out of the ocean of data sometimes demands finer statistical expertise. It also needs innovation. During the communal riots in Delhi after Independence, many people from a minority community took refuge in the Red Fort, and some in Humayun's Tomb. The government had no exact count of the refugees, and the contractors responsible for feeding them charged high amounts. A team from the Indian Statistical Institute was asked to estimate the number. They had to estimate the number of persons inside a given area without any opportunity to observe the concentrations of persons inside it, and without using any known sampling or census methods. Based on an idea suggested by J.M. Sengupta, they took the quantities of rice, pulses, and salt used per day to feed all the refugees, as quoted by the contractors, and divided each by the corresponding per capita requirement known from consumption surveys. This gave three widely different estimates of the number of refugees. The estimate from salt was the smallest and the estimate from rice was the largest. As rice was the most expensive item, its quantity was probably exaggerated. They therefore proposed the figure obtained from salt as the estimate of the number of refugees. The method was verified to provide a good approximation at Humayun's Tomb.
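The arithmetic behind the refugee estimate can be sketched in a few lines. All quantities below are hypothetical, chosen only to show the calculation. They are not the figures quoted by the contractors or those used by the Indian Statistical Institute team, and the function names are illustrative.

```python
# HYPOTHETICAL figures for illustration only (grams per day).
QUOTED_GRAMS_PER_DAY = {"rice": 5_000_000, "pulses": 900_000, "salt": 100_000}
# HYPOTHETICAL per capita daily requirements (grams per person per day).
PER_CAPITA_GRAMS_PER_DAY = {"rice": 400, "pulses": 80, "salt": 10}


def implied_headcounts(quoted, per_capita):
    """Divide each quoted quantity by its per capita requirement."""
    return {item: round(quoted[item] / per_capita[item]) for item in quoted}


def smallest_estimate(headcounts):
    """Return the commodity giving the smallest headcount, and that headcount."""
    item = min(headcounts, key=headcounts.get)
    return item, headcounts[item]


headcounts = implied_headcounts(QUOTED_GRAMS_PER_DAY, PER_CAPITA_GRAMS_PER_DAY)
chosen_item, chosen_estimate = smallest_estimate(headcounts)

for item, count in headcounts.items():
    print(f"{item}: {count} persons implied")
print(f"Estimate adopted (from {chosen_item}): {chosen_estimate}")
```

Taking the smallest of the three figures follows the reasoning in the account above: a contractor gains most by overstating the costliest item, so the estimate from the commodity with the least to gain is the one least likely to be inflated.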
The lesson is clear. To extract ‘truths’ using statistics, one needs expertise and innovation from the statisticians concerned. Sound statistical thinking and a proper understanding of statistics among the general public are, of course, no less important. A pinch of salt is needed, indeed.
Atanu Biswas is Professor of Statistics, Indian Statistical Institute, Kolkata