The proliferation of technology over the past decade has been mind-blowing. What was unimaginable and sounded like science fiction in the last century has become commonplace today. Whether it is the neighbourhood street vendor making deals over his mobile phone or the traffic policeman using his BlackBerry to book a case, the world today has become more instrumented, more connected and smarter, albeit a little haphazardly. Accessibility and affordability have revolutionised how technology is made available at our fingertips.
Another phenomenon of the last decade is the strong emergence of the social web. Websites such as Facebook and Twitter have moved from being online social networking platforms to more serious media and business platforms. So not only are the sources of data increasing, but there is also an explosion in the data being generated and captured. Eighty per cent of the data in the world today has been generated over the last two years. And this data is not always something you can store in a regular database: there is free text, audio, pictures, video, RFID and instrument signals, and some of it streams in continuously. All of the above describes the defining aspects of Big Data, or 'Internet scale': Volume, Variety and Velocity.
Plethora of technologies
So what happens behind the scenes when you key in and search for a term? How is it that your favourite networking site is able to find long-lost childhood friends and contacts from previous companies, and suggest that you add them? Here again, a plethora of technologies is at work.
At the very core, you will usually find the LAMP stack (Linux, Apache, MySQL and PHP). The most notable and frequently referenced technology is Apache Hadoop, which uses the MapReduce framework: large tasks are broken down into smaller chunks, and the intermediate results are collated to get the final result. Then there is a host of surrounding ecosystem technologies: Hive and Pig to handle query flow; NoSQL data stores like Cassandra, MongoDB and HBase to store the new kinds of unconventional (graph, unstructured) data that have emerged; coordination and workflow components such as ZooKeeper and Oozie; and finally, analytical components: real-time analytics using Storm, statistical tools like R, machine learning using Mahout, and so on. Enhancing and strengthening these open source offerings are various Big Data vendors in the market that offer enterprise-ready services, solutions, and advanced, efficient analytical platforms.
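To make the MapReduce idea concrete, here is a minimal sketch in plain Python of the classic word-count example. This is only an illustration of the split-then-collate pattern described above, not Hadoop itself (a real cluster distributes the map and reduce phases across many machines).

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: break each document into words and emit (word, 1) pairs."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: collate the intermediate pairs into final counts."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data is big", "data drives insight"]
result = reduce_phase(map_phase(docs))
# result["big"] == 2 and result["data"] == 2
```

In Hadoop, the pairs emitted by the mappers are shuffled across the cluster so that all values for the same key reach the same reducer; the single dictionary here stands in for that shuffle step.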
With the emergence of new kinds of data and new sources of data, there has been a paradigm shift in analysis of data. Instead of the traditional approach of end-business analysis requirements driving data collection, now data drives new insights and therefore new strategies for an organisation.
Consider the case of a company launching a new product, say a new brand of phone, and running campaigns and advertisements for it. The company would now like to know the response to the phone. What are people saying about it? What feature did they like? What feature irked them enough to post about it online? How many people responded to that post? What is the circle of influence of the post? All of these questions, broadly classified as sentiment analysis, can be answered by crawling the web (the various forums, posts, blogs and tweets), loading the data into an analytics platform, cleaning out the unnecessary and irrelevant data, performing text analytics on it, and finally classifying each opinion as positive or negative sentiment. Based on this feedback, the company can plan out its future campaigns.
According to Gartner’s reports, Big Data investments continued to rise in 2013, with 64 per cent of organisations investing or planning to invest. Big Data is here to stay, and it has opened doors to a whole new world of business opportunities, yielding meaningful insights never thought possible before.
(The author is advisory software engineer, IBM)