Details
The Moderation API offers two models: a newer multimodal model (omni-moderation-latest) that analyzes both text and images, with a 42% improvement in multilingual accuracy, and an older model that handles text only. For each content category (including hate speech, harassment, self-harm, sexual content involving minors, and graphic violence), it returns a flag indicating whether the content was judged to fall into that category, along with a confidence score. Developers typically use it in two ways: to screen user-submitted prompts before sending them to an AI model (input moderation), and to review AI-generated outputs before displaying them to end users (output moderation).
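A minimal Python sketch of both patterns follows. It assumes the official `openai` Python package (v1 or later), where `client.moderations.create(model=..., input=...)` returns a response whose `results[0].flagged` is true when any category is flagged, plus an `OPENAI_API_KEY` in the environment. The names `openai_moderate`, `guarded_reply`, `BLOCKED_INPUT`, and `BLOCKED_OUTPUT` are hypothetical and chosen for illustration, and `generate` stands in for whatever model call produces the reply.

```python
from typing import Callable

# A moderation check returns True when the content should be blocked.
Moderate = Callable[[str], bool]

# Hypothetical user-facing messages (illustrative only).
BLOCKED_INPUT = "Sorry, your message was flagged by moderation."
BLOCKED_OUTPUT = "Sorry, the generated reply was withheld by moderation."


def openai_moderate(text: str) -> bool:
    """Ask the hosted Moderation API whether `text` is flagged.

    Assumes the `openai` package and an OPENAI_API_KEY. The import is done
    lazily so the rest of this sketch runs without the SDK or network access.
    """
    from openai import OpenAI

    client = OpenAI()
    response = client.moderations.create(model="omni-moderation-latest", input=text)
    # `flagged` is True if any category was flagged for this input.
    return response.results[0].flagged


def guarded_reply(prompt: str, generate: Callable[[str], str], moderate: Moderate) -> str:
    """Wrap a model call with input and output moderation (hypothetical helper)."""
    # Input moderation: screen the user prompt before it reaches the model.
    if moderate(prompt):
        return BLOCKED_INPUT
    reply = generate(prompt)
    # Output moderation: review the model's reply before showing it to the user.
    if moderate(reply):
        return BLOCKED_OUTPUT
    return reply


if __name__ == "__main__":
    # Offline stand-ins so the sketch runs without an API key.
    def fake_moderate(text: str) -> bool:
        return "forbidden" in text

    def echo_model(prompt: str) -> str:
        return f"You said: {prompt}"

    print(guarded_reply("hello", echo_model, fake_moderate))  # prints "You said: hello"
```

Passing `moderate` and `generate` in as functions keeps the gating logic testable offline; in production you would pass `openai_moderate`, which makes one API request per check, so each exchange that passes input moderation costs two moderation calls.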