Internet companies are betting big on India, and we’ve known that for some time now. With research reports estimating that the next five years will see a huge leap in Internet penetration, companies are also aware that the needs of the next generation of users are going to be “remarkably different”.

Shouvick Mukherjee, vice-president and chief executive officer of Yahoo India’s Research and Development (R&D) wing, believes that Internet companies must do two things today: provide content in local languages, and build a Web that caters for the needs of those who may not be literate by making the human–machine interface as multimedia-centric as possible.

In a free-wheeling chat with The Hindu, days before Yahoo launches its ‘Winter School’ for technology researchers at the Indian Institute of Science, Mr. Mukherjee spoke about why multimedia is a huge focus area in the R&D labs, what the Internet major is doing to reach out to new users, and how algorithms are becoming more sophisticated, more on the fly and predictive to give the best consumer experience possible.


What’s the cutting-edge research work that’s happening at Yahoo’s R&D lab in India?

There are many focus areas. Cloud computing is one, and it is an integral part of the technologies we build. Then there are machine-learning technologies, a field that Internet companies work on, which includes data analysis and building algorithms to improve both our user experience and advertising. Another definite focus area is mobile, as is entertainment.

Yahoo's ‘Winter School’ focusses on multimedia. What’s the kind of research that’s happening in this field?

Multimedia is obviously a critical field, and research on multimedia is done entirely in India. All the products for more than 40 markets are made entirely here.

It’s important as we are seeing that the Internet is moving from a very text-based medium to one that is rich in multimedia, be it image or be it video. The idea of the Winter School is to increase collaborative efforts and research in this big field.

You said there’s a lot of IP (intellectual property) coming from the India labs. What’s the kind of IP that has come out in the multimedia field?

There are plenty. Take, for instance, image search. We are working to see what kind of image data people are consuming, what the image patterns are, how much time people are spending, and so how your search or customer experience can be improved. In image search, one of the main things is the quality of image; so the work focusses on, for example, how to differentiate between what’s an adult image and what isn’t. The more exciting part has to do with fields like picture recognition and face recognition, and all that is to see how we can identify entities from images. I mean, if the machine can figure out if this is, say, Shah Rukh [Khan] or Aamir Khan, then we can manage the data better for you.

Needless to say, the challenges are far greater when it comes to video.

Internet companies, including your competitors like Google Inc., are introducing new and exciting products in India — for instance, turn-by-turn navigation on maps and so on. What's Yahoo focussing on?

There’s a focus on India. There are three dimensions to this focus: mobiles, more content and — given that the Internet is moving from tier-II to tier-III cities — going local with languages. India is big, and we believe that in the next five years, 30 to 40 per cent of Yahoo’s user base will come from India, which is roughly 500-600 million new users.

And what will you have to do differently for this next hundred million, and more?

A focus on mobile is one. The second is language, as I’ve already said, and the third is entertainment. The fact is that the major use of the device is for entertainment. If you look at our cricket site, when a cricket match goes on, it’s unbelievable… the amount of traffic coming in. The same goes for movies, for songs and so on.

So the Yahoo India pages [reached] the ‘1 billion page views’ milestone last week, out of which 40 per cent is entertainment, celebrity and cricket. It’s certainly entertainment-celebrity news-led.

Your competitors have been working with Indic scripts, and transliteration tools for some time now...

We’re trying to look at building content partnerships to get Indian languages out on the Web. The next thing is to get translations, for which we are collaborating [with] colleges and universities ... to, say, get the machine to learn to translate on a large scale. Of course, we’re not there yet; these are challenges.

Also, if you look at where the Web is going… you needn’t be really literate to see where machine–human interaction is going… human gesture, touch and voice are becoming input devices. Maybe the output will be very multimedia-centric. This, incidentally, is very aligned to what India needs.