Details
MusicGen (open-sourced June 2023) takes a text description or a reference melody as input and outputs instrumental music clips; it was trained on 20,000 hours of Meta-owned and licensed audio. AudioGen, bundled in the AudioCraft framework (open-sourced August 2023, with code under the MIT license and model weights under CC-BY-NC 4.0), takes text prompts as input and outputs environmental sounds and sound effects. Voicebox (announced June 2023) takes text and a short audio sample as input and outputs speech in six languages; Meta declined to release the model or its code publicly, citing the potential for voice impersonation and deepfakes. Audiobox (announced November 2023, demo released December 2023) is a unified model that takes text prompts and voice inputs and generates speech, sound effects, and soundscapes; it was initially released only to a hand-selected group of researchers and academic institutions.