As artificial intelligence becomes more prominent, the use of internet data to train AI systems has raised serious ethical concerns. Companies are using public data to train AI models such as ChatGPT, Midjourney, and Sora without seeking permission from the creators of that data.
In one such case, The New York Times is suing OpenAI for allegedly using its archives without consent. Similarly, Getty Images took legal action against Stability AI, the company behind Stable Diffusion, for alleged copyright infringement. On the other hand, The Associated Press has decided to license part of its archives to OpenAI, and Shutterstock has signed a long-term deal with OpenAI to provide training data.
Automattic, the parent company of Tumblr and WordPress.com, is also in talks to sell user data to OpenAI and Midjourney. Reddit has already sold access to its posts to Google in a $60 million deal, which shows how lucrative access to internet data for AI training has become.
Large AI models are known to have been trained on content from a wide range of internet platforms, including World of Warcraft message boards, Patreon, and Kickstarter. Meta, the parent company of Facebook and Instagram, also uses public posts from those two platforms to train its own AI models.
As companies continue to use internet data to train AI, important questions remain about privacy, consent, and ownership of online content. Data scraping and collection for AI training will likely remain a topic of debate in the years to come.