Big Data is the big buzzword these days, but it would have remained an empty promise without tools to analyse large pools of data quickly and efficiently. The challenge of Big Data lies in collecting, handling, storing, processing and post-processing data sets so large and diverse (social media data, for example) that conventional methods and tools cannot cope with them quickly or efficiently.
One tool that has made it possible to slice and dice large quantities of data meaningfully is R. It has quickly become the tool of first resort in analytics, a field that is all about making sense of Big Data.
R is known for being ‘right at home’ with advanced analytics. Analytics professionals find R attractive because it does not ask for a wad of cash; in fact, it asks for nothing at all. R is an open-source project, backed by a large official package repository, CRAN (the Comprehensive R Archive Network), and by many unofficial communities, such as Stack Overflow, that support its development and its user base.
The evolution of R
R rose to popularity only recently. It evolved from S, a popular statistical language developed at Bell Labs (then part of AT&T, later Lucent Technologies) that first appeared in 1976. R itself was born in 1993 at the University of Auckland, where Ross Ihaka and Robert Gentleman developed it with one main aim: to bring the statistical abilities of S to the open-source community.
Making R open source paved the way for a symbiotic relationship between its users and its developers. Today R has by far the largest number of add-on packages, and it offers statistical and graphical techniques that are still lacking in many other statistical tools. Because R is free, the average user who thinks ‘I just want to do this tiny thing but don’t have the tool readily available’ has good reason to develop their own package. Its support for time series analysis and clustering is not its only major selling point; the ability to customise such techniques to individual needs certainly is one too.
R was readily accepted into data science because of its power, customisability and flexibility. It is primarily a tool that aids the science of decision-making, which is why it can be used almost anywhere. It is as appealing to the manager of a local retail store looking to boost sales as it is to a scientist at Johns Hopkins University mapping patient survival rates across hospitals in the U.S.
R is driven mainly through a command-line interface, which requires the user to type commands rather than point and click with a mouse. This may seem old-fashioned, but make no mistake: compared with popular spreadsheet programs, R is a spreadsheet on steroids.
R is now part of the essential analytics toolkit. It is used in clinical trial design, in models that analyse and predict climate change, in statistical genetics, in psychometric models and in any area where large pools of complex data need to be harnessed in predictive models.
At first glance, yes, R may appear intimidating because of its DOS-like interface, but learning to use it for your own specific purposes is not as hard as it looks. Many guides for newcomers to R can easily be found on the Web, Quick-R being one of them. Those who miss a refined user interface can turn to RStudio or RKWard. Indeed, few industries can afford to ignore the power of R.
KARTHIK S. KUMAR