Next generation AI is here

IIIT-Hyderabad among the global consortium of 13 varsities and labs participating in the ambitious programme

October 23, 2021 11:15 pm | Updated 11:15 pm IST - HYDERABAD

Standard computer vision models work well on third-person view (left), but fail on first-person perspective (right).

Imagine a robot or an AI assistant giving you a nudge before you add more salt to your food, helping you plan your next family outing, or guiding a doctor through a complex surgery. The next generation of AI is well and truly here with ‘Ego4D’, a project initiated by Facebook AI in collaboration with Facebook Reality Labs Research (FRL Research) and other institutes from the UK, Italy, Japan, Saudi Arabia, Singapore, and the United States.

International Institute of Information Technology (IIIT)-Hyderabad is the only Indian institute in the global consortium of 13 universities and labs participating in this ambitious programme. In November, a mammoth and unique dataset comprising over 2,200 hours of first-person videos in the wild, of over 700 participants engaged in routine, everyday activities will be unveiled.

The dataset comprises video footage shot from a first-person perspective. “These videos show the world from the centre of the action, rather than the sidelines,” said Kristen Grauman, lead research scientist at Facebook AI. The footage has been collected via head-mounted devices combined with other egocentric sensors to track the wearer’s gaze and capture their interactions.

By recognising the location, scene of activity and social relationships, these devices could be trained not only to automatically understand what the wearer is looking at, attending to, or even manipulating, but also to grasp the context of the social situation itself. IIIT-Hyderabad collected data from over 130 participants spread across 25 locations in the country for this project.

“Initially, we wanted to have a team travel and collect the data, but due to the pandemic, we used multiple local teams, training people over video and sending them cameras,” explained C.V. Jawahar of the Center for Visual Information Technology. Participants recorded the activities of cooks, carpenters, painters, electricians and farmers. “This is not a scripted activity but video footage taken even as each individual went about their daily tasks in a normal setting,” said Prof. Jawahar.

While computer vision has always had the potential to power assistive technologies that improve quality of life, this dataset could push the envelope even further. For instance, a wearable device with first-person vision can help someone with a visual impairment, or it can help reinforce memory for those showing early signs of dementia or other memory disorders. In education, it can take the learning experience to a whole new level.

“The first-person view is especially important in training, where an instructor may not have the same perspective as you. If you miss a step, it can remind you and prod you in the right direction. If you’re doing well, it can encourage you. Even during surgeries, it can provide additional cues to the surgeon wearing the device,” remarked Prof. Jawahar.

Ego4D has five benchmarks:
  • Episodic memory – If you have misplaced your keys, you could ask your AI assistant to retrace your day in order to locate them.
  • Forecasting – AI understands what you are doing, anticipates your next move and guides you, such as stopping you before you add more salt while cooking.
  • Hand and object manipulation – AI learns how hands interact with objects and can, for instance, instruct you in using chopsticks.
  • Audio-visual diarization – If you stepped out of a meeting, you could ask the AI what the team lead announced after you left.
  • Social interaction – AI can help you better hear the person speaking to you in a noisy restaurant.
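The episodic-memory benchmark can be pictured as a query over a timestamped log of observed moments from the wearer’s day. A minimal toy sketch of that idea in Python (the `Moment` record and `last_seen` helper are purely illustrative assumptions, not part of the actual Ego4D dataset or API):

```python
# Toy sketch of an "episodic memory" query: given a log of what a
# first-person camera saw during the day, find the last moment a
# queried object was observed. Names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Moment:
    timestamp_s: float   # seconds from the start of the day's recording
    description: str     # what the wearer was seen doing or holding

def last_seen(moments, query):
    """Return the most recent moment whose description mentions `query`."""
    hits = [m for m in moments if query in m.description]
    return max(hits, key=lambda m: m.timestamp_s) if hits else None

day = [
    Moment(3600.0, "picked up keys from the kitchen counter"),
    Moment(3620.5, "placed keys on the hallway shelf"),
    Moment(7200.0, "cooking lunch, adding salt"),
]

found = last_seen(day, "keys")
print(found.description)  # → "placed keys on the hallway shelf"
```

The real benchmark operates over raw first-person video rather than text descriptions, but the retrieval question an assistant must answer ("where did I last see my keys?") has the same shape.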

Facebook AI also envisions applications in a futuristic scenario – a ‘metaverse’, where physical reality, AR, and VR converge. “It could know your favourite coffee mug or guide your itinerary for your next family trip. We are working on assistant-inspired research prototypes to do that,” said Ms. Grauman. “The Ego4D dataset will take computer vision higher by another couple of notches,” added IIIT-Hyderabad director P J Narayanan.

