Other | Verified

Midjourney trains its AI image and video generation models on datasets that include billions of images crawled from the public web, licensed data, and user-submitted prompts and outputs. This training practice has been ongoing since 2022.

Details

According to Midjourney's AB2013 Documentation (a California regulatory disclosure), its models are trained on datasets comprising billions of images, text, and audiovisual content from multiple source categories: publicly crawled web content, licensed data, public domain data, and data potentially protected by copyright used under a fair use claim. User-submitted prompts and generated outputs are also covered by Midjourney's Terms of Service, which grant the company a perpetual license to use them for service improvement. Training data undergoes processing steps that include safety filtering (to remove CSAM and other sensitive content) and privacy processing (to filter out personal information). As of 2025, Midjourney faces multiple active copyright infringement lawsuits, including a class action by artists (Andersen et al., filed January 2023) and a lawsuit filed by Disney and Universal in June 2025, that challenge whether its training on copyrighted works constitutes fair use.