Imagine a robot or an AI assistant giving you a nudge before you add more salt to your food, planning your next family outing or guiding a doctor through a complex surgery. The next generation of AI is well and truly here with ‘Ego4D’, a project initiated by Facebook AI in collaboration with Facebook Reality Labs Research (FRL Research) and other institutes from the UK, Italy, Japan, Saudi Arabia, Singapore, and the United States.
International Institute of Information Technology (IIIT)-Hyderabad is the only Indian institute in the global consortium of 13 universities and labs participating in this ambitious programme. In November, a mammoth and unique dataset will be unveiled, comprising over 2,200 hours of first-person video in the wild from over 700 participants engaged in routine, everyday activities.
The dataset comprises video footage from a first-person perspective. “These videos show the world from the centre of the action, rather than the sidelines,” said Kristen Grauman, lead research scientist at Facebook AI. The footage has been collected via head-mounted devices combined with other egocentric sensors to track the wearer’s gaze and capture the interactions.
By recognising the location, scene of activity and social relationships, these devices could be trained not only to automatically understand what the wearer is looking at, attending to, or even manipulating, but also the context of the social situation itself. IIIT-Hyderabad collected data from over 130 participants spread across 25 locations in the country for this project.
“Initially, we wanted to have a team travelling to collect the data but due to the pandemic, we used multiple local teams, training people over video and sending them cameras,” explained C.V. Jawahar of the Center for Visual Information Technology. Participants included cooks, carpenters, painters, electricians and farmers. “This is not a scripted activity but video footage taken even as each individual went about their daily tasks in a normal setting,” said Prof. Jawahar.
While computer vision has always had the potential for assistive technologies to improve the quality of life, this dataset could help push the envelope even further. For instance, a wearable device with first-person vision can help someone with visual impairment, or it can help reinforce memory for those showing early signs of dementia or memory disorders. In education, it can take the learning experience to a whole new level.
“The first-person view is especially important in training, where an instructor may not have the same perspective as you. If you miss a step, it can remind you and prod you in the right direction. If you’re doing well, it can encourage you. Even while conducting surgeries, it can provide additional cues to the surgeon wearing the device,” remarked Prof. Jawahar.
Ego4D has five benchmarks:

- Episodic memory: if you have misplaced your keys, you could ask your AI assistant to retrace your day in order to locate them.
- Forecasting: AI can understand what you are doing, anticipate your next move and guide you, such as stopping you before you add more salt while cooking.
- Hand and object manipulation: AI learns how hands interact with objects and can, for instance, instruct you in using chopsticks.
- Audio-visual diarization: if you stepped out of a meeting, you could ask the AI what the team lead announced after you left.
- Social interaction: AI can help you better hear the person speaking to you in a noisy restaurant.
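All five benchmarks operate over time-stamped annotations of egocentric video. As a purely illustrative sketch (the names and schema here are hypothetical, not the actual Ego4D data format or API), an episodic-memory query such as “where did I last see my keys?” boils down to searching time-stamped object observations for the most recent match:

```python
from dataclasses import dataclass

# Hypothetical annotation record: an object observed in the wearer's
# egocentric video at a given time and place. Not the real Ego4D schema.
@dataclass
class Observation:
    timestamp: float   # seconds since the start of the wearer's day
    obj: str           # object label, e.g. "keys"
    location: str      # scene label, e.g. "kitchen counter"

def last_seen(observations, obj):
    """Return the most recent location where the object was observed, or None."""
    matches = [o for o in observations if o.obj == obj]
    if not matches:
        return None
    latest = max(matches, key=lambda o: o.timestamp)
    return latest.location

# A toy day's worth of annotations
day = [
    Observation(9 * 3600, "keys", "hallway table"),
    Observation(13 * 3600, "mug", "desk"),
    Observation(18 * 3600, "keys", "kitchen counter"),
]

print(last_seen(day, "keys"))  # kitchen counter
```

The real benchmark is far harder, since the observations themselves must be extracted from raw video, but the sketch shows why continuous first-person capture is the enabling ingredient.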
Facebook AI also envisions its applications in a futuristic scenario – a ‘metaverse’, where physical reality, AR, and VR converge. “It could know your favourite coffee mug or guide your itinerary for your next family trip. We are working on assistant-inspired research prototypes to do that,” said Ms. Grauman. “The Ego4D dataset will take computer vision higher by another couple of notches,” added IIIT-Hyderabad director P J Narayanan.