Meta shows text-to-speech AI that can convert text to audio

By Ameen Akbar On Jun 16, 2023

Meta has shown off a text-to-speech program that allows users to convert written text to audio. Voicebox works in six languages, including French and German. The tool will not be made public for the time being to prevent misuse.

Meta says that Voicebox is a generative AI model that can create audio from text. Given an audio sample as short as two seconds, the tool can also match its output to that sample, for example so that the generated speech sounds like a particular person’s voice. Voicebox can generate speech in six languages: English, French, German, Spanish, Polish and Portuguese.

Voicebox can also edit existing recordings of speech. For example, the tool can correct mispronounced words or filter out background noise such as a barking dog.

To make the generated speech sound natural, Voicebox uses a flow matching model. Flow matching is a training method for generative models that Meta designed itself, and it is based on continuous normalizing flows. In a research paper, Meta says the model was trained on more than 50,000 hours of audio across the six supported languages. The model is said to have a word error rate of only 1.9 percent.
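
To give a sense of what such an error rate means, here is a minimal sketch of how a word error rate is commonly calculated: the number of word-level substitutions, insertions and deletions needed to turn the recognized text into the intended text, divided by the number of intended words. This is not Meta's evaluation code. The function name and the example sentences are hypothetical, and the sketch assumes the generated speech has already been transcribed by a speech recognizer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the text the model was asked to speak,
# and what a speech recognizer heard in the generated audio.
asked = "the quick brown fox jumps over the lazy dog"
heard = "the quick brown fox jumped over the lazy dog"
print(f"{word_error_rate(asked, heard):.1%}")  # one wrong word out of nine: 11.1%
```

By this measure, a rate of 1.9 percent corresponds to roughly two wrong words in every hundred.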

Meta will not make either the tool or the underlying model public for the time being. The company says such a tool has “potential to be abused and hurt people.” That is why it is only publishing its approach and results in a scientific paper, not the tool itself. Meta does not say whether it will release the tool in the future. The company has put a number of demos online in which examples of the AI's output can be heard.