Creative GenerationInternal OnlyVerified

Meta released a family of AI audio generation models between June and November 2023: MusicGen and AudioGen (within the AudioCraft framework, open-sourced in August 2023) generate music and sound effects from text prompts; Voicebox generates speech in six languages but was withheld from public release; and Audiobox unifies speech, sound effects, and soundscapes. The AudioCraft models are available to researchers and developers as open-source code; Voicebox and Audiobox were released only to selected researchers.

Details

MusicGen (open-sourced June 2023) takes a text description or reference melody as input and outputs instrumental music clips; it was trained on 20,000 hours of Meta-owned and licensed audio. AudioGen, bundled in the AudioCraft framework (open-sourced August 2023 under MIT and CC-BY-NC 4.0 licenses), takes text prompts as input and outputs environmental sounds and sound effects. Voicebox (announced June 2023) takes text and a short audio sample as input and outputs speech in six languages; Meta declined to release the model or its code publicly, citing potential for voice impersonation and deepfakes. Audiobox (announced November 2023, demo released December 2023) is a unified model taking text prompts and voice inputs to generate speech, sound effects, and soundscapes; it was initially released only to a hand-selected group of researchers and academic institutions.