Details
Scribe takes audio or video files as input and produces structured JSON transcripts as output, including speaker diarization (identifying who said what), character-level timestamps, and tagging of non-speech audio events such as laughter or applause. A real-time version, Scribe v2 Realtime, processes live speech with approximately 150 milliseconds of latency and is designed for use in conversational AI agents and meeting assistants. The tool is available via the web dashboard and API.
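To make the shape of such a transcript concrete, here is a minimal sketch of how a consumer might fold diarized output into speaker turns. The payload and its field names (`words`, `text`, `start`, `end`, `type`, `speaker_id`) are assumptions made for illustration and are not taken from the Scribe documentation. The helper `speaker_turns` is likewise hypothetical.

```python
import json
from itertools import groupby

# Hypothetical transcript payload. The schema is an assumption for
# illustration only; check the real API response before relying on it.
SAMPLE = json.loads("""
{
  "words": [
    {"text": "Hello", "start": 0.00, "end": 0.42, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.42, "end": 0.50, "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "there", "start": 0.50, "end": 0.91, "type": "word", "speaker_id": "speaker_0"},
    {"text": "(laughter)", "start": 1.10, "end": 1.80, "type": "audio_event", "speaker_id": "speaker_1"},
    {"text": " ", "start": 1.80, "end": 1.85, "type": "spacing", "speaker_id": "speaker_1"},
    {"text": "Hi", "start": 1.85, "end": 2.05, "type": "word", "speaker_id": "speaker_1"}
  ]
}
""")


def speaker_turns(transcript):
    """Group consecutive tokens that share a speaker into one turn each."""
    turns = []
    for speaker, tokens in groupby(transcript["words"], key=lambda w: w["speaker_id"]):
        tokens = list(tokens)
        # Spoken text excludes non-speech events such as laughter.
        spoken = "".join(t["text"] for t in tokens if t["type"] != "audio_event").strip()
        events = [t["text"] for t in tokens if t["type"] == "audio_event"]
        turns.append({
            "speaker": speaker,
            "start": tokens[0]["start"],   # start time of the first token in the turn
            "end": tokens[-1]["end"],      # end time of the last token in the turn
            "text": spoken,
            "events": events,
        })
    return turns


if __name__ == "__main__":
    for turn in speaker_turns(SAMPLE):
        print(turn)
```

Grouping only consecutive tokens preserves the order of the conversation, so a speaker who talks twice yields two separate turns rather than one merged block.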