WordPress and Tumblr will sell user data to train AI models – report


Automattic, the parent company of Tumblr and WordPress, has a deal in place to sell user data for training artificial intelligence (AI) models, according to a report from 404media.

Internal documents obtained by 404media reveal that Automattic is preparing to sell data to companies such as Midjourney and OpenAI.

404media claims that a new feature will be introduced today to let users opt out of sharing their data with third parties, including AI firms.

Automattic’s AI Strategy

Automattic has published a statement on its website, titled “Protecting User Choice,” which details its approach to AI.

The statement acknowledges the impact of AI on content creation and consumption, stating, “AI is rapidly transforming nearly every aspect of our world, including the way we create and consume content. At Automattic, we’ve always believed in a free and open web and individual choice.”

Automattic says it currently blocks AI crawlers, and prevents search engines from indexing sites, to safeguard user content unless users have consented to public visibility.

The company also says it will be transparent when collaborating with third parties, and will ensure that such partnerships align with community interests such as attribution, opt-outs, and control of data.

The statement assures users that partnerships with AI firms will respect their opt-out preferences, and commits Automattic to updating partners on requests to remove content from past and future data sources.

While the exact impact of the collaboration with AI companies on user data remains to be seen, the link between the 404media report and the public statement indicates that Automattic is mindful of the implications of AI integration with public data.

Why AI Firms Seek User Data Access

AI companies rely on large data sources to train their models. In theory, a more comprehensive and diverse dataset produces more accurate models and improves their knowledge of specific topics.

As AI regulations evolve and the need for robust datasets grows, AI developers are increasingly seeking collaborations with companies that hold valuable data. Such data access arrangements can also help address data ownership rights, benefiting both AI model developers and platform owners like Tumblr and WordPress.

