A blog that explores happenings in the realm of data and provides insights into the world we live in. Ultimately, people matter, not the numbers.
August 18, 2014 Rukmini S.

How we put together the statistics that went into our investigation

I’d like to start by thanking readers for the enormous response that we got to my three-part investigative report on the stories behind Delhi’s rape statistics. If you missed them, Part 1 summarising what my study of district court judgements showed is here, Part 2 on the stories behind the statistics is here, and Part 3 on the long journey from FIR to judgement is here.

I’ve had some good discussions about the findings on Twitter, and I would be happy to respond to any questions you leave in the comments section. In general, people seem to have understood the numbers quite clearly, and I usually try to stay away from giving my opinion. However, I have had some questions on methodology, which I’ll answer in this post.

With our excellent internet team, we sometimes use cool tools for our data journalism here at The Hindu, whether scraping tools to extract data from the Election Commission on counting day or visualisation tools to make pretty, interactive infographics. This was not one of those times, however.

Delhi is better than most Indian cities for legal data journalism because it puts all district court judgements online, and promptly, and they can be text-searched. Ideally, I should have been able to scrape all judgements for ‘376’, the IPC section related to rape. However, I encountered a ton of issues that would have rendered a scraping tool useless (as far as I know; if you think there was a way I could have done it, do leave me a comment).

For one, while rape cases are sessions-triable, and so should show up as “sessions cases” in the nomenclature, some judges had inexplicably classified them as “criminal cases”. Then, while a simple text search for ‘376’ should have been enough to get me all the cases, the text-search function stopped working around March 2014. With elections coming up, I had limited time to work on this and essentially had to open every single sessions court judgement and search for ‘376’ in each one. Luckily, the search function started working again after two months.
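The manual check described above, opening each judgement and looking for ‘376’, can be sketched in a few lines of Python. This is a hypothetical illustration, not the tool used for this investigation: it assumes the judgements have already been saved as plain text, and the `judgements` mapping, the case IDs and the function names are all made up for the example. It only flags a standalone ‘376’, so it cannot tell a charge under Section 376 from a passing reference to it, which is one reason every judgement still had to be read.

```python
import re

# Matches "376" only when it is not part of a longer number, so "376",
# "376(2)" and "376A" match, but "1376" or "3760" do not.
SECTION_376 = re.compile(r"(?<!\d)376(?!\d)")


def mentions_section_376(judgement_text: str) -> bool:
    """Return True if the judgement text contains a standalone '376'."""
    return SECTION_376.search(judgement_text) is not None


def filter_judgements(judgements: dict[str, str]) -> list[str]:
    """Return, in sorted order, the IDs of judgements that mention '376'.

    `judgements` is a hypothetical mapping of case ID -> full judgement text.
    """
    return sorted(
        case_id
        for case_id, text in judgements.items()
        if mentions_section_376(text)
    )


if __name__ == "__main__":
    # Hypothetical sample texts, for illustration only.
    sample = {
        "SC-101": "The accused is charged under Section 376(2) IPC.",
        "SC-102": "Charges under Section 302 IPC were framed.",
        "SC-103": "See page 1376 of the record.",
    }
    print(filter_judgements(sample))
```

The digit-only lookarounds are a deliberate choice: they keep sub-sections such as 376(2) and 376A while skipping page numbers like 1376, but anything a keyword match flags would still need a human read.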

More importantly, this wasn’t an exclusively number-crunching story. Since the data had both quantitative and qualitative elements, I needed to read every judgement, and occasionally asked my court contacts for extra documents. I repeatedly ran my observations past judges and court administrative staff to make sure I wasn’t badly misreading anything.

I created a basic spreadsheet that I updated with every case. It had nine columns: date of alleged offence, date of judgement, judge name, court name and courtroom number, a brief description of the prosecution and defence arguments, conviction or acquittal, reasons for the decision, and two columns of notes, covering observations made by the judge, the use of medical evidence and interesting details about the complainant and the accused. In addition, whenever the information was available, I noted ages, lengths of sentence, whether the complainant appeared to be poor, and whether the couple was inter-caste and/or inter-religious.
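As an illustration only, the spreadsheet layout above could be expressed as a Python record type and written out as CSV. The original was a plain spreadsheet, not code; the field names here are hypothetical labels for the nine columns plus the optional extra details, and the split of the two notes columns is left as free text because the post does not say exactly how the notes were divided.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class CaseRecord:
    # The nine core columns described in the post (names are hypothetical).
    offence_date: Optional[date]   # date of alleged offence, if known
    judgement_date: date
    judge_name: str
    court_and_room: str            # court name and courtroom number
    arguments_summary: str         # prosecution and defence arguments
    outcome: str                   # "conviction" or "acquittal"
    reasons: str                   # reasons for the decision
    notes_a: str                   # free-text notes: judge's observations,
    notes_b: str                   # medical evidence, complainant/accused details
    # Extra details, filled in only when the judgement mentions them.
    ages: Optional[str] = None
    sentence_years: Optional[float] = None
    complainant_appeared_poor: Optional[bool] = None
    inter_caste_or_religious: Optional[bool] = None


def to_csv(records: list[CaseRecord]) -> str:
    """Serialise records to CSV text, one row per case; None becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CaseRecord)])
    writer.writeheader()
    for record in records:
        row = {k: ("" if v is None else v) for k, v in asdict(record).items()}
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    # A hypothetical record, for illustration only.
    example = CaseRecord(
        offence_date=None,
        judgement_date=date(2013, 5, 1),
        judge_name="Judge X",
        court_and_room="Court Y, Room 1",
        arguments_summary="Prosecution alleged ...; defence argued ...",
        outcome="acquittal",
        reasons="Complainant turned hostile",
        notes_a="",
        notes_b="",
    )
    print(to_csv([example]))
```

Keeping the optional details as `None` rather than guessing a value mirrors the approach of noting them only when the judgement actually records them.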

As I mentioned in the articles, I also interviewed judges, cops, prosecutors, lawyers, women’s rights activists and some of the young men and women involved in these cases. They gave me insights that the numbers couldn’t, but the numbers gave me, for the first time, some idea of the contours of the issue. I hope they did the same for our readers.

August 2, 2014 Ajai Sreevatsan

Keywords: Delhi, rapes, GDELT project

July 29, 2014 RUKMINI S
A new NYT app helps you track the popularity of news subjects in its pages.

Keywords: NYT, Chronicle

July 18, 2014 T. Ramachandran
Yes, says an aircraft tracking site.

Keywords: Air India, MH17

July 14, 2014 T. Ramachandran
Movement to Iraq increased during 2013-14; there has been an overall increase among those going abroad with 'emigration check required' (ECR) clearance between 2011-12 and 2013-14. There was a declining trend in the case of some countries, though.
October 21, 2013

The Archaeological Survey of India (ASI) is in the limelight for the excavation it has controversially launched at the ruins of the fort of Raja Rao Ram Baksh Singh, where a sadhu had prophesied the existence of a treasure of gold. A report of the Comptroller and Auditor General of India tabled recently in Parliament had spoken of poor conservation and management taking their toll on many of the country's archaeological treasures, including World Heritage sites. Is the ASI getting its priorities right?

October 12, 2013

Some cyclones originating over the Bay of Bengal have attained the intensity of super cyclones. They have claimed lives and caused large-scale destruction of property, the severest among them being the Orissa super cyclone of October 29, 1999. A look at data from the last 50 years offers insights into the vagaries of our cyclones.


October 8, 2013

India's efforts to capitalise on the information and communication technology revolution are far from spectacular when it comes to numbers, going by the latest country rankings and associated data released by the International Telecommunication Union. The ranking is based on the ICT Development Index (IDI).


September 29, 2013

If the amount donated is below Rs 20,000, parties need not reveal the source of the funds. Three quarters of the funds parties have garnered in recent years fall into this category.

September 23, 2013

India does not rank high even among developing countries in its progress towards certain targets, which are in any case well beyond reach globally.


