Datadelve

Rape statistics: a number-crunching story

I’d like to start by thanking readers for the enormous response that we got to my three-part investigative report on the stories behind Delhi’s rape statistics. If you missed them, Part 1 summarising what my study of district court judgements showed is here, Part 2 on the stories behind the statistics is here, and Part 3 on the long journey from FIR to judgement is here.



I’ve had some good discussions on Twitter with people about the findings, and would also be happy to respond to any questions that you leave in the comment section. In general, I found that people understood the numbers quite clearly and I usually try to stay away from giving my opinion. But I’ve had some questions on methodology that I’ll be happy to answer in this post.



With our excellent internet team, we sometimes use cool tools for our data journalism here at The Hindu, whether scraping tools to extract data from the Election Commission on counting day, or visualisation tools to make pretty, interactive infographics. This was not one of those times, however.



Delhi is better than most Indian cities for legal data journalism because it puts all district court judgements online, and promptly, and these can be text-searched. Ideally, I should have been able to scrape all judgements for ‘376’, the IPC section related to rape. However, I encountered a ton of issues that would have rendered a scraping tool useless (as far as I know; if you think there was a way I could have done it, do leave me a comment).



For one, while rape cases are sessions-triable, and so should show up as “sessions case” in the nomenclature, for some judges the cases were inexplicably classified as “criminal cases”. Then, while a simple text-search for ‘376’ should have been enough to get me all cases, the text-search function stopped working around March 2014, for no apparent reason. With elections coming up, I had limited time to work on this and had to essentially open every single sessions court judgement and search for ‘376’ in each one. Luckily, the search function revived after two months.
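For readers who asked what an automated version of that manual check might look like, here is a minimal Python sketch. It is not what I used. It assumes the judgement texts have already been saved locally and loaded as strings, and the names `mentions_section` and `filter_judgements` are made up for illustration. It only flags judgements where ‘376’ appears as a standalone number, so a case number such as 13762 is not picked up.

```python
import re


def mentions_section(text: str, section: str = "376") -> bool:
    """Return True if `section` appears as a standalone number in `text`.

    The lookarounds reject longer numbers that merely contain the digits
    (for example '13762'), while still matching forms such as '376(2)(g)'
    or '376D'.
    """
    pattern = rf"(?<!\d){re.escape(section)}(?!\d)"
    return re.search(pattern, text) is not None


def filter_judgements(judgements: dict[str, str], section: str = "376") -> list[str]:
    """Given {file name: judgement text}, return the names that cite the section."""
    return sorted(
        name for name, text in judgements.items() if mentions_section(text, section)
    )
```

A match only means the number appears somewhere in the text, whether the case was labelled a sessions case or a criminal case. Each flagged judgement would still have to be read to confirm that it is a rape case.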



More importantly, this wasn’t an exclusively number-crunching story. Since the data had quantitative and qualitative elements to it, I did need to read every judgement, and occasionally ask my court contacts for extra documents. I repeatedly ran my observations past judges and court administrative staff to make sure I wasn’t badly misreading something.



I created a basic spreadsheet that I updated with every case. It had nine columns: date of alleged offence, date of judgement, judge name, court name and court room number, brief description of prosecution and defence arguments, conviction or acquittal, reasons for decision, and two columns of notes, including observations made by the judge, the use of medical evidence and interesting details about the complainant and accused. In addition, I made a note of ages, lengths of sentence, whether the complainant appeared to be poor, and whether the couple was inter-caste and/or inter-religious, whenever this data was available.
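If you would rather keep such a log in code than in a spreadsheet, the nine columns could be sketched roughly as below. This is only an illustration: the field names and the `to_csv` helper are my own invention for this example, dates are kept as plain strings, and the two notes columns are left as free text.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CaseRow:
    """One judgement, with one field per spreadsheet column (names are hypothetical)."""
    offence_date: str      # date of alleged offence
    judgement_date: str    # date of judgement
    judge: str             # judge name
    court_and_room: str    # court name and court room number
    arguments: str         # brief description of prosecution and defence arguments
    outcome: str           # conviction or acquittal
    reasons: str           # reasons for decision
    notes_1: str = ""      # free-text notes, e.g. observations made by the judge
    notes_2: str = ""      # free-text notes, e.g. use of medical evidence


def to_csv(rows: list[CaseRow]) -> str:
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CaseRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

The extra details I noted when available (ages, length of sentence and so on) could be added as optional fields in the same way.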



As I mentioned in the articles, I also interviewed judges, cops, prosecutors, lawyers, women’s rights activists and some of the young men and women involved in these cases. They gave me insights that the numbers couldn’t, but the numbers gave me, for the first time, some idea of the contours of the issue. I hope they did the same for our readers.


Printable version | Nov 28, 2020 11:13:25 AM | https://www.thehindu.com/opinion/blogs/blog-datadelve/article6327818.ece
