IIIT-H develops unique lip-sync method

It can be used in gaming, to detect fake videos

Updated - November 06, 2020 05:12 pm IST

Published - November 06, 2020 12:15 am IST - HYDERABAD

A team of researchers from the International Institute of Information Technology Hyderabad (IIIT-H) has developed a method to lip-sync the movements of speakers in out-of-sync videos. The artificial intelligence- and machine learning-powered solution, researchers say, has an array of applications, such as in games that involve three-dimensional virtual avatars which ‘talk’ to each other, and also in tackling problems posed by deep fake videos.

Researchers Rudrabha Mukhopadhyay and Prajwal K.R. of IIIT-H’s Centre for Visual Information Technology, along with professors C.V. Jawahar and Vinay Namboodiri, embarked on the Wav2Lip project around February this year.

Mr. Prajwal said that the team first worked on a Face-to-Face Translation project, which led to the subsequent development of Wav2Lip, an improvement over the LipGAN module.

“In augmented reality games, you have 3D avatars that interact with each other. If a player wants his or her character in the game to interact with another player’s character, and if he or she speaks into the mic, then the lip movements can be synced onto the face of that character. This is a possible application,” Mr. Prajwal said.

He gave an example of a video of a speech that was live-translated. The lip movements were usually not in sync with the translation. Wav2Lip can change all that. “The goal is to train a lip-sync model that will take an audio and generate new lip movements,” he said, adding that a large dataset of around 1,000 identities, or persons, is taken along with several thousand video clips of them speaking. After this, the model, he said, learns to generate lip movements. “This can also be used by those who are engaged in deep fake video detection,” he says.

Another important application is in the field of education. Lecture videos can be translated into several languages and lip-synced for students to watch and learn. Speakers’ lip movements can also be synced with translations at press conferences and in films, or regenerated when video signals are lost during video calls.

Mr. Prajwal added that as of now, the code and project are open-sourced for researchers to use. A demo has also been released.
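For readers who want to try the open-sourced project, the workflow below is a hedged sketch: the repository location, script name, flags, and file names are assumptions based on the authors' public release and typical open-source conventions, and may differ from the current version.

```shell
# Fetch the open-sourced Wav2Lip code (repository path assumed) and
# install its Python dependencies.
git clone https://github.com/Rudrabha/Wav2Lip.git
cd Wav2Lip
pip install -r requirements.txt

# Lip-sync a face video to a new audio track using a pretrained model.
# The checkpoint, video, and audio file names here are placeholders;
# a pretrained checkpoint must be downloaded separately.
python inference.py \
    --checkpoint_path checkpoints/wav2lip.pth \
    --face my_speaker_video.mp4 \
    --audio translated_speech.wav
```

The script, as released, writes a new video whose mouth region is regenerated frame by frame to match the supplied audio, which is how a translated soundtrack can be matched to a speaker's face.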
