AI Trace
Anthropic: Anthropic trains Claude to refuse harmful requests using a method called Constitutional AI, which teaches the model to evaluate its own responses against a written set of principles — like giving an AI a rulebook it must check before answering. A separate layer of automated filters screens conversations in real time, and a trust and safety team enforces usage policies that banned 1.45 million accounts in the second half of 2025. | AI Trace