OpenAI unveils AI voice cloning tech that only needs a 15-second sample to work

 

OpenAI’s Voice Engine was first developed in late 2022.

OpenAI has made its artificial intelligence (AI) even more eerily human with a text-to-voice tool that needs only a 15-second clip of someone’s voice to generate natural-sounding speech that resembles the original speaker.

But even OpenAI is wary of the technology’s potential for misuse and says it will not release Voice Engine publicly; for now, it is available only to early testers.

“We recognise that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the San Francisco-based company said in a statement.

Voice cloning AI technology is not new and has already been used under concerning circumstances.

Ahead of the primary vote in the United States in January, AI-generated robocalls mimicking President Joe Biden were sent to thousands of voters telling them to stay at home and abstain from voting.

The US Federal Communications Commission (FCC), as a result, banned AI-generated robocalls last month.

But it is not just elections that can be affected by voice cloning technology or deepfakes. Extortion scams that use AI to impersonate people are also a growing concern.

The technology can also be used for good, however. OpenAI has shown how it is helping patients with sudden or degenerative speech conditions by restoring their voice from video or audio recordings made before they lost the ability to speak.

OpenAI said another use case is giving people who cannot speak, or who have difficulty speaking, a voice that does not sound robotic.

“These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI said in its blog post.

Voice Engine is so far available only to several of OpenAI’s partners, which the company said have agreed to its usage policies prohibiting the impersonation of another individual or organisation without consent.

Companies with access to Voice Engine include the education technology company Age of Learning, the visual storytelling platform HeyGen, and the health system Lifespan.

OpenAI said another safety measure is watermarking, which traces the origin of any audio generated by Voice Engine. It also requires partners to get the “explicit and informed consent” of the original speaker.

“We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” OpenAI said.

Source: euronews.com

Peter Tolan is a Junior Content Editor for the HIPTHER network, where he has quickly established himself as a versatile voice in the global iGaming and technology sectors. Operating across the network's specialized platforms, Peter leverages a deep understanding of the European and American gaming landscapes to deliver high-impact, B2B intelligence. He is a key contributor to the "Evolution" side of the industry, specializing in the analysis of online gaming trends, the fast-paced world of esports, and the integration of deep-tech innovations. With a sharp eye for emerging technologies, Peter ensures that the HIPTHER community remains at the forefront of the global digital revolution.