Meta has announced Voicebox, a ‘state of the art’ generative AI model that converts text to speech, edits audio, and works across languages.
In an Instagram Channels post shared by Meta CEO Mark Zuckerberg, a video showed how Voicebox could read out text in a variety of vocal styles, remove noisy distractions from audio tracks, learn and replicate speakers’ voices, and even produce output in different languages.
In a blog post on Friday, Meta said the model could perform tasks that it was not specifically trained to do.
The multilingual model can produce speech in English, French, German, Spanish, Polish, and Portuguese. Other listed features included diverse text-to-speech, style transfer, content correction, in-context text-to-speech, and noise removal.
“This type of technology could be used in the future to help creators easily edit audio tracks, allow visually impaired people to hear written messages from friends in their voices, and enable people to speak any foreign language in their own voice,” said Meta in its blog post.
It suggested that the model could bring more natural voices to virtual assistants and non-player characters in the metaverse.
Zuckerberg said that Voicebox was still a “research project” but that Meta would be building more on it.
The video clip closed with a voice that sounded like the Meta chief, saying “more soon” in Polish.
Meta has been developing AI models to process multiple forms of media, and has made several of these open source for research purposes.