It’s good that we’re asking questions about opinion polls. The problem is we’re asking the wrong questions.

Every time I write about a study that involves a sample survey, I can count on at least one comment that is some variation of the following: “So you think if someone talks to 2,000/20,000/200,000 people, you can draw conclusions about the whole city/state/country? As they say, lies, damned lies and statistics…”

One reason I get this comment is that I make a point of mentioning sample sizes. But people pay so little attention to the actual math of surveying that full disclosure ends up attracting more scepticism than no disclosure at all.
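The math behind that objection is worth spelling out. The sketch below assumes a simple random sample and the textbook 95% margin of error for a proportion, with the finite population correction applied when a population size is given. Real Indian polls use multistage designs whose margins are typically wider, and the city and country sizes here are hypothetical round numbers. The point it illustrates: the margin depends almost entirely on how many people you talk to, not on how many people live in the place you are polling.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for a proportion p
    estimated from a simple random sample of size n.

    If a population size is given, apply the finite population correction.
    p = 0.5 gives the widest (most conservative) margin."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return 100 * z * se

# Hypothetical populations: a city of 10 million, a country of 900 million.
for n in (2_000, 20_000, 200_000):
    city = margin_of_error(n, population=10_000_000)
    country = margin_of_error(n, population=900_000_000)
    print(f"n={n:>7,}: city ±{city:.2f} pts, country ±{country:.2f} pts")
```

At 2,000 interviews the margin comes out at roughly ±2.2 points whether the population is a city or a country; sample size, not population size, drives precision, provided the sample is drawn properly.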

Which brings me to the larger problems with opinion polls in India, and they are not the ones the Congress, in its reactionary way, has raised. (For the purpose of this post, “opinion polls” refers only to polls used for election forecasting.) Why are they so hard to get right?

The first issue is sampling. Contrary to popular perception, what matters most is how well the sample is dispersed across the population, not how big it is. As the CSDS’s Sanjay Kumar explains it: “We tell people – to know that rice is cooked, you don’t have to try every grain of rice.” If you interviewed 1,000 people in a 5,000-strong village but they were all upper caste, your results would almost certainly point you in the wrong direction. Getting a truly representative sample in India takes a lot of effort, because the most accessible people tend to be richer, male and more empowered. CSDS told me it essentially conducts a census of a particular village and then draws a representative sample from it. Then you need to decide how many villages and towns in a particular geographical area you want to cover.
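A small simulation makes the rice analogy concrete. This is a sketch with entirely made-up numbers: a hypothetical village of 5,000 voters, 20% upper caste, where the two groups back a hypothetical “Party A” at very different rates. It compares the 1,000-person all-upper-caste sample from the example with a sample of just 200 split in proportion to each group’s size. Proportional stratified sampling is used here as one simple way to get a dispersed sample; it is not a description of CSDS’s actual procedure.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical village of 5,000 voters. Group sizes and support levels
# for "Party A" are invented purely for illustration.
upper_caste = [random.random() < 0.70 for _ in range(1_000)]  # True = backs Party A
others = [random.random() < 0.30 for _ in range(4_000)]
village = upper_caste + others

def share_for_a(sample):
    """Percentage of a sample that backs Party A."""
    return 100 * sum(sample) / len(sample)

true_share = share_for_a(village)

# Big but badly dispersed: 1,000 interviews, every one of them upper caste.
big_biased = random.sample(upper_caste, 1_000)

# Small but well dispersed: 200 interviews split in proportion to each
# group's size (20% / 80%). This needs a census-style count of the village
# so that the group sizes are known in advance.
small_stratified = random.sample(upper_caste, 40) + random.sample(others, 160)

print(f"true share in village:     {true_share:.1f}%")
print(f"1,000 upper-caste voters:  {share_for_a(big_biased):.1f}%")
print(f"200 proportional voters:   {share_for_a(small_stratified):.1f}%")
```

With these invented support levels the true share is close to 38%. The large, lopsided sample lands near 70%, while the small proportional one usually lands within a few points of the truth.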

The second is survey methodology. Since opinion polls are almost exclusively commissioned by the media, cost is a severe constraint. Some pollsters make a trade-off by sampling fewer constituencies but conducting all their interviews in person; others keep the number of constituencies high but switch to phone interviews. In the U.S., phone polling is so frequent that a basic sampling frame is already in place. It’s not much of a stretch to say that the U.S. doesn’t really need to vote any more, given how extraordinarily accurate its polling has become.

The quality of the questionnaire and how well the enumerators are trained are other critically important factors. (For instance, India’s former chief statistician has said that insufficiently trained enumerators largely explain the fall in female work participation numbers.)

The third is converting vote-share into seats. I have some sympathy for Indian pollsters here, because the more you think about it, the more freakishly difficult it appears. Essentially, pollsters take the proportion of people in an area who say they will vote for a particular party and convert it into the number of seats that party is likely to win, relying for the most part on the conversion factor from the last election. They also need to factor in new parties, breakaway factions and alliances.
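To see why this conversion is so touchy, here is a sketch of the simplest textbook approach, uniform swing. It is not any Indian pollster’s proprietary model. Each party’s statewide change in vote-share (poll minus last election) is added to its share in every seat, and each seat goes to whoever comes out ahead. The five constituencies, the party names and all the percentages are hypothetical.

```python
def project_seats(last_seat_shares, last_total, poll_total):
    """Uniform-swing seat projection.

    last_seat_shares: one dict per constituency, party -> vote share (%)
                      at the last election.
    last_total, poll_total: party -> statewide vote share (%) at the last
                      election and in the new poll.
    """
    swing = {p: poll_total[p] - last_total[p] for p in poll_total}
    tally = {p: 0 for p in poll_total}
    for shares in last_seat_shares:
        projected = {p: shares.get(p, 0.0) + swing[p] for p in poll_total}
        winner = max(projected, key=projected.get)
        tally[winner] += 1
    return tally

# Hypothetical state with five constituencies and three contestants.
seats = [
    {"A": 45, "B": 40, "Others": 15},
    {"A": 38, "B": 42, "Others": 20},
    {"A": 41, "B": 43, "Others": 16},
    {"A": 50, "B": 35, "Others": 15},
    {"A": 36, "B": 44, "Others": 20},
]
last_total = {"A": 42, "B": 41, "Others": 17}
poll_total = {"A": 45, "B": 39, "Others": 16}

print(project_seats(seats, last_total, last_total))  # replays the last result
print(project_seats(seats, last_total, poll_total))  # projection from the poll
```

Party A lost the last election two seats to three, but a 3-point gain in the poll projects a four-to-one win. A small vote-share shift can flip several close seats at once, and this toy model ignores everything that makes the real job hard: regional and community swings, new parties, breakaways and alliances.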

All pollsters say they use a proprietary mathematical model for this conversion, which includes geography- and community-based ‘swing’ factors. All pollsters also tell me, off the record, that they use some “intangibles” to moderate the final result. This means that even if you or I had access to their mathematical models, we would not get exactly the same result, because of that final element of guesswork (what heartland politics calls “hawa”). This hawa is also how some pollsters occasionally pull off the stupefying feat of getting the vote-share wrong and the seat count right.

Rather than banning or regulating polls, I think what is necessary is full disclosure. CSDS is particularly good at this, and other polling agencies too have for the most part been transparent when I’ve asked them for information. The next step is the harder one: factoring those statistical disclosures into public conversation, even if that makes for less exciting news.

One important number that gets far too little airtime is the margin of error. Most polls carry a margin of error of ±3-5 percentage points on vote-share. Since the gap between parties’ vote-shares is often just 1-2 percentage points, calling a state for one party or the other in such a situation can be foolhardy.
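The margin on a lead is even wider than the margin on either party’s share, because an error that inflates one party’s share usually deflates the other’s. A minimal sketch, assuming both shares come from the same simple random sample of size n: the variance of the difference between two shares from one sample is (p_a + p_b − (p_a − p_b)²)/n. The sample size and vote-shares below are hypothetical.

```python
import math

def lead_margin(p_a, p_b, n, z=1.96):
    """95% margin of error, in percentage points, on the lead p_a - p_b
    when both shares come from the same simple random sample of size n."""
    diff = p_a - p_b
    variance = (p_a + p_b - diff ** 2) / n
    return 100 * z * math.sqrt(variance)

def call_race(p_a, p_b, n):
    """Return the lead, its margin of error, and whether the lead is
    larger than that margin."""
    lead = 100 * (p_a - p_b)
    moe = lead_margin(p_a, p_b, n)
    verdict = "too close to call" if abs(lead) <= moe else "lead exceeds margin"
    return lead, moe, verdict

# Hypothetical poll of 2,000 voters: Party A at 40%, Party B at 38.5%.
lead, moe, verdict = call_race(0.40, 0.385, 2_000)
print(f"lead {lead:.1f} pts, margin ±{moe:.1f} pts: {verdict}")
```

Here a 1.5-point lead sits against a margin of about ±3.9 points, far larger than the roughly ±2.2 points on either party’s share alone at this sample size, so the honest headline is “too close to call”.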

I’ll leave you with a graph shared with me by CSDS’s Sanjay Kumar, which shows just how difficult a pollster’s job is in India. It’s an unenviable task, even without the ruling party trying to ban you.

The Hindu