Details
Stable Audio 2.5 uses a latent diffusion architecture built on a diffusion transformer (DiT) and supports text-to-audio and audio-to-audio generation, including audio inpainting. Stability AI's proprietary Adversarial Relativistic-Contrastive (ARC) post-training method reduces generation from 50 diffusion sampling steps to 8, producing tracks in under 2 seconds on a GPU. The model was trained on a fully licensed dataset. A lighter open-source variant, Stable Audio Open Small (341M parameters), was co-developed with Arm for on-device mobile audio generation. Its predecessor, Stable Audio 2.0, was trained on a licensed dataset from the AudioSparx music library.
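To illustrate why cutting 50 steps to 8 speeds generation up roughly linearly, here is a minimal, hypothetical sketch of a few-step diffusion sampling loop. All names (`dit_denoiser`, `sample`) and the toy denoiser are illustrative assumptions, not Stability AI's actual API or the ARC method itself; a distilled model simply lets the same loop run with far fewer, larger denoising steps.

```python
import numpy as np

def dit_denoiser(latent, t):
    # Stand-in for the diffusion transformer: a toy function that
    # nudges the noisy latent toward a "clean" estimate as t -> 0.
    return latent * (1.0 - t)

def sample(num_steps=8, latent_shape=(64, 32), seed=0):
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(latent_shape)  # start from pure noise
    # Timesteps from 1.0 (all noise) down to 0.0 (clean audio latent);
    # a distilled model makes num_steps=8 viable instead of ~50.
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        denoised = dit_denoiser(latent, t_cur)
        # Euler update: move part of the way toward the denoised estimate
        latent = latent + (denoised - latent) * (t_cur - t_next) / max(t_cur, 1e-8)
    return latent

latent = sample(num_steps=8)
print(latent.shape)
```

Each loop iteration costs one forward pass through the denoiser, so the wall-clock cost scales directly with `num_steps`; the point of post-training methods like ARC is to keep output quality acceptable at the smaller step count.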