OpenAI Launches Next-Gen Speech and Voice Models

1 year ago

OpenAI has unveiled its latest suite of audio models, aiming to revolutionize the development of intelligent voice agents by enhancing speech-to-text and text-to-speech capabilities.


The newly introduced gpt-4o-transcribe and gpt-4o-mini-transcribe models mark a significant improvement over OpenAI's earlier Whisper models. They achieve a lower Word Error Rate (WER), the share of words transcribed incorrectly relative to a reference transcript, which indicates higher transcription accuracy.
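To make the WER metric concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the reference. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions for illustration; the article does not describe how OpenAI computes or normalizes its reported figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn the lights off", "turn the light off"))  # 0.25
```

A lower value is better, and because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.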

This improvement is particularly beneficial in scenarios involving diverse accents, background noise, and varying speech speeds, making the models well suited to applications such as customer service centers and transcription services.

According to OpenAI, this improved accuracy stems from targeted innovations in reinforcement learning and extensive training with diverse, high-quality audio datasets. This approach enables the models to better capture speech nuances, reduce misrecognitions, and increase transcription reliability.

The new gpt-4o-mini-tts model also introduces a groundbreaking feature: the ability to instruct the model on not just what to say, but how to say it. Developers can now specify the tone and style of speech, such as directing the model to “talk like a sympathetic customer service agent.”
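As a rough sketch of what this could look like in code, the example below assembles a speech request that carries both the text to speak and a style instruction, then passes it to a client object. It assumes the official `openai` Python SDK's `client.audio.speech.create` call and its `instructions` parameter; the helper names `build_speech_request` and `synthesize` and the default voice `"coral"` are illustrative choices, not something the article specifies.

```python
from typing import Any


def build_speech_request(text: str, style: str, voice: str = "coral") -> dict[str, Any]:
    """Assemble request parameters (hypothetical helper, not part of any SDK)."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,          # assumed default voice name
        "input": text,           # what to say
        "instructions": style,   # how to say it
    }


def synthesize(client: Any, text: str, style: str) -> Any:
    """Send the request through an OpenAI-style client passed in by the caller."""
    return client.audio.speech.create(**build_speech_request(text, style))


# Assumed usage with the official SDK (requires an API key in the environment):
#   from openai import OpenAI
#   response = synthesize(
#       OpenAI(),
#       "I'm sorry about the delay. Let me look into that for you.",
#       "Talk like a sympathetic customer service agent.",
#   )
```

Keeping the request builder separate from the network call makes the style instruction easy to inspect or unit test without contacting the API. How the returned audio is saved or streamed depends on the SDK version, so check the current API reference for that step.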

These new audio models are now accessible to developers worldwide through OpenAI’s API platform. For those already building conversational experiences with text-based models, integrating these speech-to-text and text-to-speech models offers a straightforward path to developing voice agents.
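The chained approach described here can be sketched as three steps: transcribe the user's audio, generate a text reply, and synthesize that reply as speech. The function below is only an outline under stated assumptions: `voice_agent_turn` and its three callable parameters are hypothetical names, and in a real application each callable would wrap the corresponding API call (a speech-to-text model, a text model, and a text-to-speech model).

```python
from typing import Callable


def voice_agent_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],  # speech-to-text step
    respond: Callable[[str], str],       # existing text-based agent logic
    speak: Callable[[str], bytes],       # text-to-speech step
) -> bytes:
    """Run one turn of a chained voice agent and return the reply audio."""
    user_text = transcribe(audio_in)
    reply_text = respond(user_text)
    return speak(reply_text)


# Stand-in callables so the flow can be exercised without any API access.
reply_audio = voice_agent_turn(
    b"<audio bytes>",
    transcribe=lambda audio: "what are your opening hours",
    respond=lambda text: f"You asked: {text}",
    speak=lambda text: text.encode("utf-8"),
)
print(reply_audio)  # b'You asked: what are your opening hours'
```

Because the text step sits in the middle unchanged, an existing text-based agent can be reused as the `respond` callable, at the cost of some added latency from running three stages in sequence.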

OpenAI has also released an integration with the Agents SDK to simplify this development process. For low-latency speech-to-speech experiences, the speech-to-speech models in the Realtime API are recommended.

Check out OpenAI’s livestream replay to learn more.
