
Before releasing new AI models, OpenAI runs a structured safety-testing process. This includes a formal risk framework with defined thresholds (the Preparedness Framework), a network of outside experts who try to find dangerous capabilities (red teamers), and an open-source benchmarking tool called Evals. Safety evaluations can block a model from being released if risks are deemed too high.

Details

The Preparedness Framework (Version 2, released April 2025) tracks three main risk areas: biological and chemical weapons capability, cybersecurity threats, and AI systems gaining the ability to improve themselves. Each area has defined "High" and "Critical" risk thresholds; a model rated Critical in any area cannot be deployed until the risks are reduced. The Red Teaming Network includes more than 100 domain experts across 29 countries and 45 languages. In a first-of-its-kind arrangement, OpenAI and Anthropic evaluated each other's models in the summer of 2025. System cards, which are public safety documents, are published for each major model release.
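The Critical-threshold rule described above can be sketched as a simple deployment gate. This is only an illustration of the rule as stated here, not OpenAI's actual tooling: the names `RiskLevel`, `TRACKED_AREAS`, `blocking_areas`, and `can_deploy` are hypothetical, and the ratings are assumed to be the post-mitigation result for each area.

```python
from enum import IntEnum
from typing import List, Mapping


class RiskLevel(IntEnum):
    """Hypothetical ordering of the thresholds named in the text."""
    BELOW_HIGH = 0
    HIGH = 1
    CRITICAL = 2


# The three tracked risk areas listed above (identifier names are assumptions).
TRACKED_AREAS = ("biological_chemical", "cybersecurity", "ai_self_improvement")


def blocking_areas(ratings: Mapping[str, RiskLevel]) -> List[str]:
    """Return the tracked areas whose rating is Critical.

    Raises ValueError if any tracked area has no rating, so a missing
    evaluation can never be mistaken for a safe one.
    """
    missing = [area for area in TRACKED_AREAS if area not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return [area for area in TRACKED_AREAS if ratings[area] >= RiskLevel.CRITICAL]


def can_deploy(ratings: Mapping[str, RiskLevel]) -> bool:
    """A model is deployable only if no tracked area is rated Critical."""
    return not blocking_areas(ratings)


if __name__ == "__main__":
    example = {
        "biological_chemical": RiskLevel.HIGH,
        "cybersecurity": RiskLevel.CRITICAL,
        "ai_self_improvement": RiskLevel.BELOW_HIGH,
    }
    print(can_deploy(example), blocking_areas(example))
```

The gate fails closed: an area with no rating raises an error rather than passing silently, which matches the idea that a safety evaluation can block a release.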