
# Joining the scattered dots

A scatterplot of the number of COVID-19-related deaths in countries with over 2 lakh infections, plotted against the case fatality rate (CFR, deaths divided by cases), as of July 5, 2020.

A letter writer, who claimed to have been a reader for four decades, once wrote to The Hindu Data Team, appreciating our efforts in making “interesting charts and maps.” But he added that he was miffed at our use of what he called “the circle graphs.” These had become repetitive, he complained.

The circles were small and the text was tiny, he said, adding that he was “not interested to zoom in and see every single State of the country”.

Sometime later, I had a conversation with my 88-year-old grandfather, who has been reading the newspaper “end-to-end, since 1953”. I asked him whether he understood the charts we use in our data stories. His prompt response: “I skip your graphs. Especially those ‘dotted ones’. I only look for your byline and move on.”

Both these inveterate readers were referring to the “scatterplot”, arguably the most controversial visualisation technique that we data journalists deploy quite regularly in our “Data Points”, at times invoking the ire of other readers and even our colleagues. A colleague terms it the “crime scene graph which features blood-red dots splattered across the page”.

The reason we deploy the ‘scatterplot’ despite the adverse feedback is simple. When we need to depict multiple data variables and the relationship between them, the choices are limited. A scatterplot charts those variables on a Cartesian coordinate system and allows us to study how they relate to one another.

Let us assume 10,000 people visit a beach every day. We need a graph that breaks down the visitors by age and gender and shows their preferred visiting hours. Imagine each visitor as a small dot, so there are 10,000 dots in total. Blue dots depict men; red dots depict women.

Now imagine a horizontal line which depicts the time of day, starting at 4 a.m. at the leftmost point and ending at 11 p.m. at the rightmost point. Imagine a vertical line which depicts the age of the visitor, starting at age 10 at the bottommost point and ending at age 100 at the topmost point.

## Many key insights

Now, start spreading the dots along both the lines. The dots near the top of the vertical line are elders. Most of them are also on the left side of the horizontal line, which means most elders visit in the morning. The dots near the bottom of the vertical line are teens. Most of them are also on the right side of the horizontal line, which means many teens visit in the evening. Not many red dots are on the right side. Are women leaving sooner due to security concerns? Look at the few dots in the middle of the chart. Who are these middle-aged men visiting the beach in the afternoon?
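For readers who would like to try the beach exercise themselves, here is a minimal Python sketch of it. Everything in it is an assumption made for illustration: the visitors are simulated, the function names `simulate_visitors` and `plot_visitors` are hypothetical, and the tendency of elders towards mornings and teens towards evenings is built into the made-up data rather than observed anywhere. Drawing the chart also assumes the matplotlib library is installed.

```python
import random


def simulate_visitors(n=10_000, seed=0):
    """Return n invented visitors as (hour, age, gender) tuples.

    hour is between 4 (4 a.m.) and 23 (11 p.m.); age is between 10 and 100.
    The visiting patterns below are assumptions, not real observations.
    """
    rng = random.Random(seed)
    visitors = []
    for _ in range(n):
        gender = rng.choice(["man", "woman"])
        age = rng.randint(10, 100)
        if age >= 60:
            # Assumed pattern: elders cluster around 7 a.m.
            hour = rng.triangular(4, 23, 7)
        elif age < 20:
            # Assumed pattern: teens cluster around 6 p.m.
            hour = rng.triangular(4, 23, 18)
        else:
            # Everyone else is spread evenly through the day.
            hour = rng.uniform(4, 23)
        visitors.append((hour, age, gender))
    return visitors


def plot_visitors(visitors):
    """Draw one dot per visitor: time of day across, age up, colour by gender."""
    import matplotlib.pyplot as plt  # imported here so the simulation runs without it

    colours = {"man": "blue", "woman": "red"}
    fig, ax = plt.subplots()
    for gender, colour in colours.items():
        hours = [h for h, _, g in visitors if g == gender]
        ages = [a for _, a, g in visitors if g == gender]
        ax.scatter(hours, ages, s=2, c=colour, alpha=0.5, label=gender)
    ax.set_xlim(4, 23)
    ax.set_ylim(10, 100)
    ax.set_xlabel("Time of day (hour; 4 = 4 a.m., 23 = 11 p.m.)")
    ax.set_ylabel("Age of visitor")
    ax.legend()
    return fig
```

Calling `plot_visitors(simulate_visitors()).savefig("beach_scatter.png")` should produce a chart with the three variables, time, age and gender, in a single frame. Because the data is invented, the clusters it shows only reflect the assumptions written into the simulation.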

One chart helps us generate the above insights. This depiction is not possible using simple bar charts or line graphs as they tend to convey only limited information even if they are easy to understand.

The scatterplot requires the reader to be a bit more involved in trying to grasp the information and examine it. Once the reader does so, the graph enables her to derive more insights than what the banal and simple graphs manage to do.

The statistician Edward Tufte, who has written several tracts on visualising data and information, estimates that 70% of all graphs in scientific journals are “scatterplots”. This is understandable, as they are most effective not just in conveying information but also in deriving insights.

I request our readers not to dismiss these charts before they have had a chance to parse them. Once they behold them, they will see the beauty of data.
