Details
Cosmos world foundation models are neural networks that accept text, image, video, or sensor data as input and generate photoreal, physics-aware synthetic videos of environments such as warehouses, factories, and road scenes. This synthetic data is used to train and evaluate AI models for robots and autonomous vehicles. Cosmos includes diffusion and autoregressive transformer models (ranging from 4 to 14 billion parameters), advanced video tokenizers, guardrails, and tools for fine-tuning on proprietary data. A major update in March 2025 added Cosmos Reason, a chain-of-thought reasoning model for understanding video data and predicting physical interactions.