Anthropic operates a formal safety framework called the Responsible Scaling Policy, which sets rules for when and how the company can train and release more powerful AI models. Under this policy, each new Claude model is assigned a safety level and must pass specific safety tests before it can be deployed. The framework is now in its third version and has been updated as Claude's capabilities have grown.

Details

The Responsible Scaling Policy was first published in September 2023; the current revision, Version 3.0, was released in February 2026. The policy classifies AI models on a scale from ASL-1 (no meaningful risk) through ASL-4 (requiring the most stringent safeguards). In May 2025, Claude Opus 4 became the first model classified as ASL-3, triggering a set of enhanced deployment safeguards that include Constitutional Classifiers and hardened infrastructure security. For each major release, Anthropic publishes a detailed model system card (a technical document summarizing safety evaluations); these run up to 244 pages. Version 3.0 introduced public Frontier Safety Roadmaps, which state safety goals, along with Risk Reports, but it also drew criticism for removing a prior commitment to pause model training if adequate safety measures could not be confirmed. Approximately 8% of all Anthropic employees work in security-related areas, and a named Responsible Scaling Officer (Jared Kaplan) is accountable for the policy.