Other | Consumer Facing | Verified

In early 2024, Tumblr's parent company Automattic began selling public Tumblr posts (years of user-generated writing, images, and art) to AI companies including OpenAI and Midjourney for use as training data for their AI models. Users were enrolled in this data sharing by default; opting out required switching on a separate toggle for each individual blog.

Details

On February 27, 2024, investigative outlet 404 Media reported that Automattic had compiled all of Tumblr's public post content from 2014 to 2023 and was in advanced negotiations with OpenAI and Midjourney. Internal documents showed that the data dump erroneously included content that should have been excluded: posts from deleted or suspended accounts, private posts on public blogs, unanswered asks, and posts marked as explicit or mature. Automattic added a per-blog "Prevent Third-Party Sharing" toggle the following day, after the story broke, but sharing remained enabled by default (opt-out, not opt-in). Tumblr staff posted a public acknowledgment of the deals on February 27, 2024, framing the practice as a response to AI companies already scraping the web; the post received over 52,000 reblogs, reflecting the scale of user concern. A Tumblr product manager who had worked on the data preparation subsequently posted publicly that he was removing his own photos from the platform. A follow-up report in March 2024 revealed a separate "Firehose" pipeline that sold approximately one million WordPress posts per day through a data intermediary; Automattic later deprecated this pipeline. Critics noted that once content has been used to train an AI model, it cannot be meaningfully removed.