Creative Generation | Replaces Human Labor | Verified

Every voice learners hear in the Duolingo app (characters speaking sentences, words, and dialogues) is produced by AI voice synthesis rather than taken directly from human recordings. Duolingo built custom synthetic voices for each of its animated characters by training AI on recordings made by human voice actors. Those voices now generate unlimited audio across 30+ languages automatically.

Details

Duolingo has used text-to-speech (TTS) technology since at least 2017, initially through Amazon Polly. In August 2021, the company announced custom AI voice synthesis built from recordings of auditioned voice actors, using Microsoft Azure Custom Neural Voice technology. The system uses cross-lingual transfer: once a high-quality voice is trained in one language, it can be adapted to additional languages without re-recording. Voice actors provided the original training data but are not involved in producing individual audio clips. All lesson audio is now AI-synthesized at production time.