Can offensive, objectionable, sexual, racist and violence-related suggestions thrown up during a search on the web be controlled?
Joint research by scientists at Microsoft and the International Institute of Information Technology (IIIT) Hyderabad promises to do exactly that: control inappropriate suggestions thrown up by search engines, making web browsing safer for everyone, particularly children and women.
The team of Harish Yenala, a research student at IIIT Hyderabad; Manoj Chinnakotla, senior applied scientist, artificial intelligence and research, Microsoft India; and Jay Goyal, principal development manager, Microsoft, has proposed a technique for automatically identifying such inappropriate query suggestions. It is based on a field of computer science research known as Deep Learning (DL), which aims to build machines that can process data and learn in much the same way as the human brain does.
Dr. Manoj said DL essentially involves building artificial neural networks that are trained to mimic the behaviour of the human brain. These networks can learn to represent and reason over various inputs given to them, such as words, images and sounds.
Best paper award
The research was presented at the recent Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)-2017 held in Seoul, Korea where it received the best paper award from among 450 papers presented.
The DL architecture that the team proposed, called ‘Convolutional Bi-Directional LSTM’ (C-BiLSTM), combines the strengths of both Convolutional Neural Networks (CNN) and Bi-Directional LSTMs (BLSTM).
Given a query, C-BiLSTM uses a convolutional layer to extract a feature representation for each query word, which is then fed as input to the BLSTM layer. The BLSTM layer captures the various sequential patterns in the query and outputs a richer representation encoding them. This richer query representation then passes through a fully connected network that predicts whether the suggestion is appropriate.
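The data flow described above, per-word convolutional features, a bidirectional recurrent pass, then a fully connected classifier, can be sketched in plain NumPy. This is a minimal illustration only: all layer sizes, weight initialisations and function names here are assumptions for demonstration, not the team's actual trained model, and the random weights produce an uninformative score.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Convolution over the word sequence: x is (seq_len, emb_dim),
    w is (kernel, emb_dim, n_filters). Same-length output via zero padding."""
    k, _, nf = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], nf))
    for t in range(x.shape[0]):
        out[t] = np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)  # ReLU

def lstm_step(x_t, h, c, W, U, b):
    """One standard LSTM cell update; gates stacked as [i, f, o, g]."""
    z = W @ x_t + U @ h + b
    H = h.shape[0]
    i, f, o = (1.0 / (1.0 + np.exp(-z[s * H:(s + 1) * H])) for s in range(3))
    g = np.tanh(z[3 * H:])
    c = f * c + i * g
    return o * np.tanh(c), c

def bilstm(xs, params_fwd, params_bwd, hidden):
    """Run the sequence forwards and backwards; concatenate final states."""
    def run(seq, params):
        h, c = np.zeros(hidden), np.zeros(hidden)
        for x_t in seq:
            h, c = lstm_step(x_t, h, c, *params)
        return h
    return np.concatenate([run(xs, params_fwd), run(xs[::-1], params_bwd)])

# Illustrative (assumed) sizes and randomly initialised, untrained weights.
emb, kernel, n_filters, hidden = 8, 3, 6, 5
conv_w = rng.normal(size=(kernel, emb, n_filters)) * 0.1
conv_b = np.zeros(n_filters)
def lstm_params():
    return (rng.normal(size=(4 * hidden, n_filters)) * 0.1,
            rng.normal(size=(4 * hidden, hidden)) * 0.1,
            np.zeros(4 * hidden))
p_fwd, p_bwd = lstm_params(), lstm_params()
W_out, b_out = rng.normal(size=(2 * hidden,)) * 0.1, 0.0

def classify(query_embeddings):
    feats = conv1d(query_embeddings, conv_w, conv_b)  # per-word conv features
    h = bilstm(feats, p_fwd, p_bwd, hidden)           # sequence encoding
    return 1.0 / (1.0 + np.exp(-(W_out @ h + b_out)))  # P(inappropriate)

query = rng.normal(size=(4, emb))  # stand-in embeddings for a 4-word query
score = classify(query)
```

In the actual system the word embeddings and all weights would be learned end to end from labelled query data; the sketch only shows how the three stages feed into one another.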
For instance, Dr. Manoj says, a child who has typed just the first two letters of ‘kite’ could be shown suggestions such as ‘killing’. Similarly, socially offensive suggestions can actually lead to confrontation. “Our research outcome is sensitive and gives only the appropriate suggestions,” he said.
The advantages of C-BiLSTM include the fact that it doesn’t rely on hand-crafted features, is trained end to end as a single model, and effectively captures both local and global semantics.
“The team also evaluated C-BiLSTM on real-world search queries from a commercial search engine, and the results revealed that it significantly outperformed both pattern-based and other hand-crafted feature-based baselines. The C-BiLSTM also performed better than individual CNN, LSTM and BLSTM models trained for the same task,” Dr. Manoj explained.
The technology is being used in Microsoft’s search engine Bing and will soon be launched as Application Programming Interfaces (APIs) in Microsoft Cognitive Services.
The team is confident that the new architecture would also be highly effective on other online platforms, such as chatbots and virtual assistants, helping them become more contextually aware, culturally sensitive and dignified in their responses.