Details
The Responsible Scaling Policy was first published in September 2023 and has since been revised to Version 3.0, released in February 2026. It classifies AI models on a scale from ASL-1 (no meaningful risk) through ASL-4 (requiring the most stringent safeguards). In May 2025, Claude Opus 4 became the first model classified as ASL-3, triggering a set of enhanced deployment safeguards, including Constitutional Classifiers and hardened infrastructure security. For each major release, Anthropic publishes a detailed model system card (a technical document summarizing safety evaluations); these run up to 244 pages. Version 3.0 introduced public Frontier Safety Roadmaps, which state safety goals, and Risk Reports, but it also drew criticism for removing a prior commitment to pause model training if adequate safety measures could not be confirmed. Approximately 8% of Anthropic employees work on security-related areas, and a named Responsible Scaling Officer (Jared Kaplan) holds accountability for the policy.