Data is all around us: on the Web, within the enterprise, in closed and open academic communities, and across a plethora of fields ranging from news and commerce to medical research and government records.
While the explosion of data has given us access to all sorts of information, just a click away courtesy of search engines, ‘big data’ (popular tech parlance for large, hard-to-manage data sets) offers another opportunity, one in which large companies and research organisations are investing heavily. That opportunity lies in analytics, a rapidly evolving field in which sophisticated algorithms and technologies are used to make sense of these large troves of data.
This data could be anywhere: on social media, where the buzz can be tracked, or in the enterprise, where data once regarded as mere records can be interpreted to offer perspectives that aid decision-making.
What analytics does is convert huge amounts of raw data into insight. Even enterprise data, once regarded merely as logs or records of processes, is now seen as a valuable asset. Among the earliest to recognise the potential of data were, of course, the search engines, which can offer their services free only because of the huge revenue generated by interpreting the patterns that emerge from our usage.
Text mining is, in some ways, the converse of search: when we search, we use a keyword to find a piece of information in a large data set, whereas text analytics combs through the entire data set to generate keywords.
These keywords, generated by sophisticated algorithms, in turn help reveal patterns and trends; in other words, they turn the data into useful information.
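The article does not say which algorithms produce these keywords. One common, simple technique is TF-IDF scoring, which ranks a document's words by how often they occur in that document and how rare they are across the whole collection. The sketch below is a minimal illustration under stated assumptions: the three-document corpus and the tiny stopword list are invented, and production systems use far richer tokenisation and weighting.

```python
import math
import re
from collections import Counter

# Toy corpus and stopword list, invented purely for illustration.
DOCS = [
    "Customers complain that the claim process is slow and the claim form is confusing.",
    "The new phone has a great camera and a long battery life.",
    "Battery life is poor and the phone overheats while charging.",
]
STOPWORDS = {"the", "a", "and", "is", "that", "has", "while"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split on non-letters and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency in this document times inverse document frequency.
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


for doc, keywords in zip(DOCS, top_keywords(DOCS)):
    print(keywords, "<-", doc[:40])
```

Words shared across documents, such as "phone" and "battery", are pushed down the ranking, while a word repeated within one document, such as "claim", rises to the top. That is the sense in which the keywords are generated from the data set rather than supplied by a searcher.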
Text analytics is a subset of data mining in which advanced algorithms drawing on linguistic, statistical and machine learning techniques are used to process both structured data (where information is largely indexed) and unstructured data (where there is no database as such). The explosion of unstructured content over the past decade has meant that the algorithms used to draw insights from this data have had to become more advanced, says Shantanu Gudihal, co-founder of Meshlabs, a Bangalore-based start-up that develops text analytics software products to solve information management and customer experience problems and to generate business intelligence.
“Processing text in various natural languages requires sophisticated linguistic, statistical, and semantic computational approaches that are not available in standard business intelligence or analytical tools. That is why business analysts need text mining and analytics software to help them discover these insights.”
Until now, processing and understanding text written in natural language has been left to humans, and human capacity is limited. Mr. Gudihal points out that the applications of text mining are many and varied. For example, publishers can bring new content-derived products to market; insurance companies can mine adjuster notes and customer statements to detect fraud and mitigate risk; contact centres can integrate intelligent agents such as question-and-answer systems into their service platforms; and knowledge-centric professional services firms can transform their employees' productivity by giving them better information discovery tools.
While Natural Language Processing (NLP) is the starting point of analytics, letting the software get a handle on the linguistic aspects of the text through steps such as part-of-speech tagging and annotation, you need to bring in supporting techniques such as lexicons, taxonomies, ontologies and other forms of knowledge representation to correctly understand the context and underlying meaning, he says.
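To make the role of lexicons and taxonomies concrete, here is a minimal sketch of that knowledge-representation step in isolation. It assumes the linguistic processing (tokenisation, part-of-speech tagging) has already happened, and it stands in for it with a crude regular-expression tokeniser. The `LEXICON` and `TAXONOMY` tables and the insurance-style note are hypothetical: the lexicon maps different surface forms (synonyms, spelling variants) to one canonical concept, and the taxonomy places each concept in a broader category.

```python
import re

# Hypothetical lexicon: surface forms, including synonyms, -> canonical concept.
LEXICON = {
    "car": "vehicle:car",
    "automobile": "vehicle:car",
    "sedan": "vehicle:car",
    "windscreen": "part:windshield",
    "windshield": "part:windshield",
    "whiplash": "injury:whiplash",
}

# Hypothetical taxonomy: canonical concept -> broader parent category.
TAXONOMY = {
    "vehicle:car": "asset",
    "part:windshield": "damage",
    "injury:whiplash": "bodily_injury",
}


def annotate(text: str) -> list[tuple[str, str, str]]:
    """Return (surface form, concept, category) for every lexicon hit in text."""
    annotations = []
    # Crude tokeniser standing in for a real NLP pipeline.
    for token in re.findall(r"[a-z]+", text.lower()):
        concept = LEXICON.get(token)
        if concept is not None:
            annotations.append((token, concept, TAXONOMY[concept]))
    return annotations


note = "Claimant says the automobile's windscreen cracked and reports whiplash."
for hit in annotate(note):
    print(hit)
```

Because "automobile" and "car" resolve to the same concept, two notes that use different wording can still be grouped together, which plain keyword matching would miss. Real ontologies also encode relationships between concepts, not just a single parent.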
Finally, you need to store and index the extracted information in data structures such as relational databases or triple stores. Only then does the extracted information lend itself to intelligent analytics.
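The storage step can be illustrated with a small sketch, assuming extracted facts arrive as subject-predicate-object triples. Here they sit in a single indexed SQLite table, which is a relational stand-in for a triple store rather than a dedicated one. The triples, the predicate names and the "repeat claimant" query are all hypothetical. They loosely echo the insurance fraud example mentioned earlier but are not a fraud-detection method.

```python
import sqlite3

# Hypothetical facts, as an extraction step might emit them:
# (subject, predicate, object) triples.
TRIPLES = [
    ("claim_101", "mentions_injury", "whiplash"),
    ("claim_101", "filed_by", "customer_7"),
    ("claim_102", "mentions_injury", "whiplash"),
    ("claim_102", "filed_by", "customer_7"),
    ("claim_103", "filed_by", "customer_9"),
]


def build_store(triples):
    """Load triples into an in-memory SQLite table with an index for lookups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    # Index on (predicate, object) so "which subjects have property X?" is fast.
    conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def repeat_injury_claimants(conn, injury):
    """Customers who filed more than one claim mentioning the given injury."""
    rows = conn.execute(
        """
        SELECT f.object AS customer, COUNT(*) AS n_claims
        FROM triples AS i
        JOIN triples AS f
          ON f.subject = i.subject AND f.predicate = 'filed_by'
        WHERE i.predicate = 'mentions_injury' AND i.object = ?
        GROUP BY f.object
        HAVING COUNT(*) > 1
        """,
        (injury,),
    )
    return rows.fetchall()


conn = build_store(TRIPLES)
print(repeat_injury_claimants(conn, "whiplash"))
```

The query joins facts that were extracted separately (which injury a claim mentions, who filed it) to surface a pattern that no single document states. That kind of cross-document question becomes possible only once extracted information has been stored and indexed.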
Still complex, costly
While the field is attracting a lot of attention, particularly since ‘big data’ has made sample sets larger and therefore more interesting, analytics remains a costly and complex process. Mr. Gudihal says his company is working on solutions that make the interface less ‘geeky’, easier to handle and more cost-effective, so that it is accessible to small and mid-size enterprises.
“What we are doing is creating a simple platform that uses simple listening tools to offer data to enterprises in simple data sets that help you connect the dots and make sense of it,” he explains. Global studies have found that software and service revenues from text analytics now total $835 million. And given that 80 per cent of the data in organisations is unstructured, much of the potential in the business-to-business context remains untapped.
In India, Mr. Gudihal says, adoption of text analytics among knowledge-centric and analytically driven companies clearly exceeds that among other companies.
Professional services firms such as IT/KPO/ITeS companies and market research firms are currently at the forefront.
However, he is optimistic that as more organisations begin to see value in data, publishers, media houses and financial services firms will jump on the bandwagon.