Content Moderation

OpenAI offers a free Moderation API that developers can use to automatically check whether text or images contain harmful content — such as hate speech, graphic violence, or sexual material — before allowing it into their apps. It is used both inside OpenAI's own products and by third-party developers building on the OpenAI API.

Details

The Moderation API offers two models: a newer multimodal model (omni-moderation-latest) that analyzes both text and images, with a reported 42% improvement in multilingual accuracy over the older model, and that older text-only model. For each content category, including hate speech, harassment, self-harm, sexual content involving minors, and graphic violence, it returns a boolean flag along with a confidence score. Developers typically use it in two ways: to screen user-submitted prompts before sending them to an AI model (input moderation), and to review AI-generated outputs before displaying them to end users (output moderation).
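
The input and output moderation pattern described above can be sketched roughly as follows. This is a minimal illustration, not OpenAI's implementation. The names `is_blocked`, `moderated_reply`, and `REFUSAL` are hypothetical, as is the optional per-category threshold map. The `moderate` and `generate` callables are assumed wrappers that a developer would write around the Moderation API and an AI model respectively. Each moderation result is assumed to be a dict with a `flagged` boolean and a `category_scores` mapping.

```python
from typing import Callable, Dict, Optional

# Assumed shape of one moderation result (a simplification for this sketch):
#   {"flagged": bool, "category_scores": {"category-name": float, ...}}
ModerationResult = Dict[str, object]

# Hypothetical placeholder shown to the user when content is blocked.
REFUSAL = "Sorry, this content cannot be shown."


def is_blocked(result: ModerationResult,
               thresholds: Optional[Dict[str, float]] = None) -> bool:
    """Return True if the result is flagged, or if any category score
    meets or exceeds an optional stricter, app-specific threshold."""
    if result.get("flagged"):
        return True
    scores = result.get("category_scores") or {}
    for category, limit in (thresholds or {}).items():
        if scores.get(category, 0.0) >= limit:
            return True
    return False


def moderated_reply(prompt: str,
                    moderate: Callable[[str], ModerationResult],
                    generate: Callable[[str], str],
                    thresholds: Optional[Dict[str, float]] = None) -> str:
    """Run input moderation, generate a reply, then run output moderation."""
    # Input moderation: screen the user's prompt before it reaches the model.
    if is_blocked(moderate(prompt), thresholds):
        return REFUSAL

    reply = generate(prompt)

    # Output moderation: review the model's reply before showing it.
    if is_blocked(moderate(reply), thresholds):
        return REFUSAL
    return reply
```

In practice, `moderate` would send the text to the Moderation API (for example with the model set to omni-moderation-latest) and return the first result. Passing the two steps in as callables keeps the gating logic easy to test without network access.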