Details
NVIDIA ACE combines multiple AI models—Riva for automatic speech recognition and text-to-speech, Nemotron large language models for contextual dialogue generation, and Audio2Face for realistic lip-sync and facial animation—into a unified platform. The pipeline receives the user's voice input, transcribes it with a speech-to-text model, feeds the transcript into an LLM to generate a response, and then converts that response back into synthesized speech with synchronized facial animation. ACE is designed for cloud deployment via NIM microservices and is also being extended to on-device deployment across an installed base of 100 million RTX AI PCs.
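The staged pipeline described above can be sketched in a few lines. This is a minimal illustration of the data flow only; every function name here is a hypothetical placeholder, not a real Riva, Nemotron, or Audio2Face API, and each stage is stubbed out rather than calling an actual model.

```python
# Hypothetical sketch of an ACE-style voice pipeline.
# None of these functions are real NVIDIA APIs; each stage is a stub
# standing in for the model the text names (Riva ASR/TTS, a Nemotron
# LLM, Audio2Face) to show how the stages chain together.

def speech_to_text(audio: bytes) -> str:
    """Stub for the ASR stage (Riva in ACE)."""
    return "placeholder transcript of the user's speech"

def generate_response(transcript: str) -> str:
    """Stub for the dialogue stage (a Nemotron LLM in ACE)."""
    return f"reply to: {transcript}"

def text_to_speech(text: str) -> bytes:
    """Stub for the TTS stage (Riva in ACE)."""
    return text.encode("utf-8")  # placeholder for a waveform

def animate_face(speech_audio: bytes) -> list[float]:
    """Stub for the lip-sync stage (Audio2Face in ACE)."""
    return [0.0] * 8  # placeholder blendshape weights

def ace_pipeline(user_audio: bytes) -> tuple[bytes, list[float]]:
    """Chain the stages: voice in -> transcript -> reply -> speech + animation."""
    transcript = speech_to_text(user_audio)
    reply = generate_response(transcript)
    speech = text_to_speech(reply)
    animation = animate_face(speech)
    return speech, animation
```

In a real deployment each stub would be a network call to the corresponding NIM microservice (or an on-device model), with the speech audio and animation data streamed to the client together.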