Metadata, when analysed and interpreted, can reveal more about users than plain data
When people use digital systems, they leave behind a ‘trail’ or ‘footprint’ of their activity. Just like footprints in the sand, this trail can be followed back to them. But, it turns out, the ‘digital footprint’ paints a far more detailed and specific picture.
When users interact with computers or other connected devices, the protocols designed to facilitate these interactions gather a substantial amount of information, almost automatically. This comprehensive record of user-system interactions is generated in the form of computer logs. This data about data is called ‘metadata’, and it is at the centre of the controversy following revelations about the U.S. National Security Agency’s surveillance of phone and Internet data.
When users interact with a digital system such as a computer or smartphone, this activity and the changes made in the system are logged. For instance, when a user sends an email, the mail server logs details of the sender and recipient, the time sent, the location (in the form of an IP address), the mail client and an acknowledgement once the mail has been read.
This metadata is collected by software as a built-in feature, dictated by protocols, and is useful for debugging or for identifying changes happening in the system.
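The kind of log entry described above for an email can be sketched in a few lines of Python. This is only an illustration: the class name MailLogEntry, the function log_message, the field names and the example addresses are assumptions made here, not the format of any real mail server. The point it shows is that the log records who, when, where and with what, while the message body itself is never stored.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class MailLogEntry:
    """Hypothetical log record: metadata only, no message content."""
    sender: str
    recipient: str
    sent_at: datetime
    client_ip: str      # stands in for the sender's location
    mail_client: str
    read_receipt: bool  # set once the recipient has opened the mail


def log_message(sender, recipient, body, client_ip, mail_client):
    """Build the log entry for one message; the body is deliberately ignored."""
    return MailLogEntry(
        sender=sender,
        recipient=recipient,
        sent_at=datetime.now(timezone.utc),
        client_ip=client_ip,
        mail_client=mail_client,
        read_receipt=False,
    )


entry = log_message(
    "alice@example.com", "bob@example.com",
    "Lunch at noon?", "192.0.2.10", "ExampleMail/1.0",
)
print(asdict(entry))
```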
Over a period of time, with some additional analytic intelligence, the mail server could even suggest the probable recipients of an email. Gmail’s auto-suggestion of recipients is one manifestation of this kind of interpretation of collected metadata.
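A rough sketch of how such a suggestion could be derived from nothing but sender and recipient logs is given below. It is not Gmail's actual method, which is not described here; the function suggest_recipients, its parameters and the sample log are hypothetical, and the ranking is a plain frequency count.

```python
from collections import Counter


def suggest_recipients(log, sender, prefix="", limit=3):
    """Rank a sender's past recipients by how often they were mailed.

    `log` is a list of (sender, recipient) pairs taken from mail metadata.
    """
    counts = Counter(
        rcpt for s, rcpt in log
        if s == sender and rcpt.startswith(prefix)
    )
    return [rcpt for rcpt, _ in counts.most_common(limit)]


sent_log = [
    ("alice@example.com", "bob@example.com"),
    ("alice@example.com", "carol@example.com"),
    ("alice@example.com", "bob@example.com"),
    ("alice@example.com", "bella@example.com"),
    ("dave@example.com", "bob@example.com"),
]

# Alice starts typing "b" in the recipient field.
print(suggest_recipients(sent_log, "alice@example.com", prefix="b"))
# prints ['bob@example.com', 'bella@example.com']
```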
In fact, metadata can reveal more about users than the plain data itself. Eben Moglen of the Software Freedom Law Center explained this in a talk, using the example of Facebook. “Facebook workers know who’s about to have a love affair before the people [get into a relationship] because they can see X obsessively checking the Facebook page of Y,” he says. So, it is not that X and Y are having amorous conversations that Facebook intercepts; rather, the activity logs, the metadata generated by X’s interaction with Y’s profile, allow this inference to be drawn.
The ‘close friends’ prompt in Facebook is another feature that depends heavily on metadata interpretation.
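The inference Moglen describes needs no access to messages, only a log of who viewed whose profile. The sketch below is an assumption-laden illustration rather than anything Facebook is known to run: the function frequent_viewers, its thresholds and the sample log are all made up for this example. It flags a viewer when a single profile accounts for most of that viewer's visits.

```python
from collections import Counter, defaultdict


def frequent_viewers(view_log, min_views=5, min_share=0.5):
    """Flag viewers whose visits are concentrated on one profile.

    `view_log` is a list of (viewer, profile) pairs. The thresholds are
    arbitrary illustrative values.
    """
    per_viewer = defaultdict(Counter)
    for viewer, profile in view_log:
        per_viewer[viewer][profile] += 1

    flagged = []
    for viewer, counts in per_viewer.items():
        total = sum(counts.values())
        profile, views = counts.most_common(1)[0]
        if views >= min_views and views / total >= min_share:
            flagged.append((viewer, profile, views))
    return flagged


# X checks Y's page eight times and Z's once; W browses casually.
view_log = [("x", "y")] * 8 + [("x", "z"), ("w", "y"), ("w", "z")]
print(frequent_viewers(view_log))
# prints [('x', 'y', 8)]
```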
NSA and data
Internet giants Google, Facebook, Microsoft, Yahoo and others named in the NSA snooping project Prism have one thing in common: they all possess enormous amounts of data, which they analyse and interpret.
Dealing with the metadata of millions of users, which adds up to enormous volumes of data (so-called big data), is the latest technological challenge.
In an attempt to build more personalised services, these companies mine metadata. Through this, a detailed characterisation of user activity, social relations and economic interests can be formulated, and this characterisation is only getting better with every passing day.
An agency such as the NSA only needs to get access to the big data in the possession of these companies, as has happened with Prism.
In his interview with The Guardian, Edward Snowden, the man who leaked details about the Prism programme, explains how the NSA could use this metadata interpretation system to corner any user. “You simply have to eventually fall under suspicion from somebody even by a wrong call. And then they can use this system to go back in time and scrutinise every decision you’ve ever made, every friend you’ve ever discussed something with, and attack you on that basis … and paint anyone in the context of a wrongdoer.”
Metadata is secondary information. But with comprehensive logging and traffic analysis, this data log can be turned into the life log of every user.
A few days after the NSA surveillance programme was revealed, activists who have been popularising the use of cryptography put together a list of tools, primarily Free and Open Source Software, that are alternatives to the products of the companies that are part of the NSA programme.
This list is available on the website http://prism-break.org
These tools, to a great extent, serve as an antidote to the surveillance threat and reduce the risk of leaving behind a metadata trail.
Until now, metadata has been plain-text information. Following the recent concerns, another crucial enhancement being proposed is to encrypt metadata. Marsh Ray, a renowned security expert working for Microsoft, has proposed SSL-based encryption for metadata at the IETF (Internet Engineering Task Force), the community that develops and promotes Internet standards. “The goal is to ‘encrypt all the things’, even pre-authentication, to resist passive observers,” he tweeted on Saturday.