Details
Advanced Voice Mode uses OpenAI's GPT-4o model, which processes audio directly instead of first converting speech to text. This allows lower-latency, more natural-sounding conversation. It supports more than 50 languages, custom spoken instructions, and video or screen sharing on mobile.

OpenAI's text-to-speech API offers 13 built-in voices and accepts instructions on speaking style, for example to sound like a sympathetic customer service agent. OpenAI's Realtime API lets developers build their own live voice-agent products on the same technology.
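As a rough illustration of the style instructions described above, here is a minimal sketch that builds, but does not send, a request body for OpenAI's speech endpoint. It assumes the `gpt-4o-mini-tts` model and its free-text `instructions` field; the source does not name a model, and the helper `build_speech_request` is hypothetical. Actually sending the body would additionally require an HTTP POST with an API key.

```python
import json

# OpenAI's speech (text-to-speech) endpoint. The request below is only
# constructed, not sent, so no API key or network access is needed.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, voice: str, style: str) -> dict:
    """Hypothetical helper: return the JSON body for a styled TTS request.

    Assumes the "gpt-4o-mini-tts" model, which accepts an `instructions`
    field describing how the speech should sound (not what it says).
    """
    return {
        "model": "gpt-4o-mini-tts",  # assumed model name
        "voice": voice,              # one of the built-in voices, e.g. "coral"
        "input": text,               # the words to be spoken
        "instructions": style,       # delivery style for the spoken output
        "response_format": "mp3",    # audio container for the reply
    }


body = build_speech_request(
    text="I'm sorry your order arrived late. Let's get that fixed for you.",
    voice="coral",
    style="Speak as a sympathetic, patient customer service agent.",
)
print(json.dumps(body, indent=2))
```

Keeping the style in a separate field from the input text means the same script can be re-voiced in a different tone by changing only the instructions.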