Understanding user engagement in online interactions matters in many sectors, with online shopping, advertising, e-learning and healthcare being just a few examples. Now, IIT Hyderabad has built DAiSEE (Dataset for Affective States in E-Environments), the first multilabel video-classification dataset for recognising boredom, confusion, frustration and engagement. The dataset comprises 9,068 video snippets captured from 112 individuals. Each affective state is further labelled at one of four levels – very low, low, high and very high. These labels were assigned by observing the viewers’ reactions.
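The labelling scheme described above can be pictured as a simple record per clip. This is a hypothetical encoding for illustration only — the clip identifier, field names and file format of the actual DAiSEE release may differ:

```python
# Hypothetical representation of one DAiSEE-style annotation record.
# Each 10-second clip gets an intensity level (0 = very low .. 3 = very high)
# for each of the four affective states; several states can be non-zero at once.
clip_labels = {
    "clip_id": "user042_video7_snippet3",   # made-up identifier
    "boredom": 0,       # very low
    "engagement": 3,    # very high
    "confusion": 2,     # high
    "frustration": 1,   # low
}

def active_states(labels, threshold=2):
    """Return the states rated 'high' or 'very high' for a clip."""
    states = ("boredom", "engagement", "confusion", "frustration")
    return [s for s in states if labels[s] >= threshold]
```

A clip like the one above would come back as both engaged and confused — `active_states(clip_labels)` returns `["engagement", "confusion"]` — matching the researchers' point that one snippet can carry several labels at once.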
There can be multiple labels assigned to a snippet: “For example, when understanding some complex terminology from videos, a person could display high engagement and still be confused or frustrated at the same time,” explains Vineeth N. Balasubramanian of the Department of Computer Science and Engineering at IIT Hyderabad, who led the research. “The combination of data and annotations related to user engagement sets the platform for DAiSEE as a specialized dataset,” he adds in an email to The Hindu. The dataset is publicly available at http://www.iith.ac.in/~daisee-dataset/
Recognising, interpreting, processing and simulating human affective states, or emotions, is an important area of research known as affective computing.
The emotions usually studied include anger, disgust and fear. “For a large part, researchers have focused on these basic expressions; we chose to go beyond,” says Dr Balasubramanian.
For instance, in a classroom, the student could be engaged with the lesson, or bored, frustrated or even confused. “Subsequent affective states can be viewed as a result of these four,” says Dr Balasubramanian. For instance, if a person is bored or confused, they could be distracted easily. “The affective states we have considered in DAiSEE are a bit more subtle than the six basic expressions,” he adds.
In the study, people were invited to voluntarily participate in an experiment in which they would watch certain videos and then respond to a questionnaire. The participants’ consent to share the videos was obtained. They were shown one educational video and one recreational video, so that both focused and relaxed settings could be captured. This gave the researchers 9,068 videos of 10-second length, from which they extracted about 2.7 million video frames. “This is larger than most contemporary video datasets,” says Dr Balasubramanian.
The researchers used a crowd-voting method to annotate the dataset, and the most reliable answers were picked using a statistical aggregation method (Dawid-Skene aggregation). The algorithm estimates the quality of each annotator’s responses, which are then weighted accordingly to compute the final label.
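The aggregation step can be sketched as a small expectation-maximisation loop in the Dawid-Skene style. This is an illustrative simplification, not the DAiSEE authors' implementation: annotations are assumed to arrive as (item, annotator, label) tuples, with the four intensity levels encoded as integers 0 to 3.

```python
# Minimal Dawid-Skene-style EM aggregation (illustrative sketch only).
# Each annotation is a tuple (item, annotator, label), labels in 0..n_labels-1.

def dawid_skene(annotations, n_labels, n_iter=20):
    items = sorted({i for i, _, _ in annotations})
    annotators = sorted({a for _, a, _ in annotations})
    by_item = {i: [] for i in items}
    for i, a, l in annotations:
        by_item[i].append((a, l))

    # Initialise each item's label posterior from raw vote proportions.
    post = {}
    for i in items:
        counts = [0.0] * n_labels
        for _, l in by_item[i]:
            counts[l] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator,
        # weighted by the current posteriors (with light smoothing).
        prior = [sum(post[i][c] for i in items) for c in range(n_labels)]
        total = sum(prior)
        prior = [p / total for p in prior]
        conf = {a: [[1e-6] * n_labels for _ in range(n_labels)]
                for a in annotators}
        for i in items:
            for a, l in by_item[i]:
                for c in range(n_labels):
                    conf[a][c][l] += post[i][c]
        for a in annotators:
            for c in range(n_labels):
                s = sum(conf[a][c])
                conf[a][c] = [v / s for v in conf[a][c]]

        # E-step: recompute posteriors, so reliable annotators weigh more.
        for i in items:
            probs = list(prior)
            for a, l in by_item[i]:
                for c in range(n_labels):
                    probs[c] *= conf[a][c][l]
            s = sum(probs)
            post[i] = [p / s for p in probs]

    # Return the most probable label for each item.
    return {i: max(range(n_labels), key=lambda c: post[i][c]) for i in items}
```

With three annotators where one disagrees with the other two on every clip, the loop learns to discount the outlier's votes rather than treating all three equally — which is the point of weighting by response quality.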
The annotated data can be used by deep-learning frameworks employed in AI to train models accurately. In many applications, it is important to gauge user engagement so that the algorithm can respond to and interact with the user. “We hope for DAiSEE to be a large stride in the direction of promoting a healthy and improved experience of personalized interaction with such digital systems,” he says.