Creative Generation | Consumer Facing | Verified

ChatGPT can hold spoken conversations using a synthetic voice: users speak to it and it speaks back in real time, in one of several AI-generated voices. This feature, called Advanced Voice Mode, launched in September 2024. Separately, developers can use OpenAI's text-to-speech API to have any text read aloud in a lifelike voice in their own apps.

Details

Advanced Voice Mode uses OpenAI's GPT-4o model, which processes audio directly rather than converting speech to text first, allowing for lower-latency and more natural-sounding conversation. It supports more than 50 languages, custom spoken instructions, and video or screen sharing on mobile. The text-to-speech API offers 13 built-in voices and allows developers to instruct the model to speak in a particular style — for example, as a sympathetic customer service agent. A Realtime API enables developers to build their own live voice-agent products using the same technology.
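To illustrate the style instruction described above, here is a minimal Python sketch of a text-to-speech request. It is not an official client. The endpoint URL, the model name `gpt-4o-mini-tts`, the voice name `coral`, and the `instructions` field are assumptions drawn from OpenAI's public API reference at the time of writing and may change. The helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; check the current API reference before use.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"
DEFAULT_MODEL = "gpt-4o-mini-tts"


def build_speech_request(text, voice="coral", instructions=None, model=DEFAULT_MODEL):
    """Build the JSON payload for a text-to-speech request (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"model": model, "input": text, "voice": voice}
    if instructions:
        # Free-text style guidance, e.g. "sympathetic customer service agent".
        payload["instructions"] = instructions
    return payload


def synthesize(payload, out_path, api_key=None):
    """POST the payload and write the returned audio bytes to out_path."""
    api_key = api_key or os.environ["OPENAI_API_KEY"]
    request = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as audio_file:
        audio_file.write(response.read())
    return out_path


if __name__ == "__main__":
    payload = build_speech_request(
        "I'm sorry about the delay. Let me look into your order right away.",
        instructions="Speak as a sympathetic customer service agent.",
    )
    print(json.dumps(payload, indent=2))
    # Uncomment once OPENAI_API_KEY is set; this performs a real, billable request.
    # synthesize(payload, "reply.mp3")
```

Building the payload separately from sending it lets the request be inspected or tested without a network call or an API key.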