A team of researchers from the International Institute of Information Technology Hyderabad (IIIT-H) has developed a method to sync the lip movements of speakers in out-of-sync videos. The artificial intelligence and machine learning-powered solution, the researchers say, has an array of applications, such as in games that involve three-dimensional virtual avatars which ‘talk’ to each other, and in tackling problems posed by deepfake videos.
The IIIT-H’s Centre for Visual Information Technology researchers Rudrabha Mukhopadhyay and Prajwal K.R., along with professors C.V. Jawahar and Vinay Namboodiri, embarked on the Wav2Lip project around February this year.
Mr. Prajwal said that the team first began working on a Face-to-Face Translation project, which led to the subsequent development of Wav2Lip, an improvement over the LipGAN module.
“In augmented reality games, you have 3D avatars that interact with each other. If a player wants his or her character in the game to interact with another player’s character, and if he or she speaks into the mic, then the lip movements can be synced on to the face of that character. This is a possible application,” Mr. Prajwal said.
He gave an example of a video of a speech that was live-translated: the lip movements were usually not in sync with the translation. Wav2Lip can change all that. “The goal is to train a lip-sync model that will take an audio and generate new lip movements,” he said, adding that a large dataset of around 1,000 identities, or persons, is taken, along with several thousand video clips of them speaking. After this, the model, he said, learns to generate lip movements. “This can also be used by those who are engaged in deep fake video detection,” he said.
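To make the idea concrete, a lip-sync model of this kind must pair each video frame with the slice of audio it accompanies before it can learn to generate matching lip movements. The sketch below is purely illustrative, not the researchers' code; the sample rate and frame rate are assumed values chosen only to show the alignment arithmetic.

```python
import numpy as np

# Assumed, illustrative parameters (not from the Wav2Lip project itself):
SAMPLE_RATE = 16000   # audio samples per second
FPS = 25              # video frames per second
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # 640 audio samples per video frame

def audio_chunks_for_frames(audio, n_frames):
    """Split a mono waveform into per-frame chunks aligned with video frames."""
    chunks = []
    for i in range(n_frames):
        start = i * SAMPLES_PER_FRAME
        chunks.append(audio[start:start + SAMPLES_PER_FRAME])
    return chunks

# Two seconds of (silent) audio paired with two seconds of video at 25 fps:
audio = np.zeros(2 * SAMPLE_RATE)
chunks = audio_chunks_for_frames(audio, 2 * FPS)
print(len(chunks), len(chunks[0]))  # 50 640
```

In a real training pipeline the raw audio chunks would typically be converted to spectrogram features, and the model would learn to map each feature window to the lip region of the corresponding frame.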
Another important application is in the field of education. Lecture videos can be translated into several languages and lip-synced for students to watch and learn. Speakers’ lips can also be synced with translated audio in press conferences and in films, and even in cases where video signals are lost during video calls.
Mr. Prajwal added that as of now, the code and project are open-sourced for researchers to use. A demo has also been released.