What you need to know
- Microsoft’s AI CEO claimed that content shared on the web is “freeware” that can be copied and used to create new content.
- The remarks centered on Microsoft and other companies using preexisting content to train AI models.
- The CEO said there is a separate, “gray area” category: content from organizations that have explicitly stated, “do not scrape or crawl me for any other reason than indexing me so that other people can find that content.”
Microsoft may have opened a can of worms with recent comments from the tech giant’s CEO of AI, Mustafa Suleyman. Suleyman spoke with CNBC’s Andrew Ross Sorkin at the Aspen Ideas Festival earlier this week. In his remarks, he claimed that all content shared on the open web is available to be used for AI training unless a content producer specifically says otherwise.
“With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That’s been the understanding,” said Suleyman.
“There’s a separate category where a website or a publisher or a news organization had explicitly said, ‘do not scrape or crawl me for any other reason than indexing me so that other people can find that content.’ That’s a gray area and I think that’s going to work its way through the courts.”
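In practice, the kind of directive Suleyman describes is usually expressed through a site’s robots.txt file, which can welcome search crawlers while turning away AI-training crawlers. Below is a minimal sketch of what that might look like for a hypothetical publisher; the user-agent names for Google’s, Bing’s, OpenAI’s, and Common Crawl’s bots are real, but whether AI companies actually honor such directives is exactly the question this article raises.

```
# robots.txt for a hypothetical publisher
# Allow traditional search indexing
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Disallow crawlers commonly used to gather AI training data
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Google's token for opting out of AI training while staying in search
User-agent: Google-Extended
Disallow: /
```

Whether a directive like this is legally binding or merely a polite request is part of the gray area Suleyman says will work its way through the courts.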
Suleyman’s quote raises several questions:
- Is it actually okay to use other people’s work to create new content?
- If so, is it okay to profit off those recreations or work derivative of preexisting content?
- How could websites and organizations “explicitly” say that their work cannot be used for AI training before AI became commonplace?
- Has Microsoft respected requests from organizations that specified their content should only be used for search indexing?
- Have Microsoft’s partners, including OpenAI, respected any demands that content not be used for AI training?
Microsoft AI CEO Mustafa Suleyman: the social contract for content that is on the open web is that it’s “freeware” for training AI models. pic.twitter.com/FN1xrqnJC0 (June 26, 2024)
Several ongoing lawsuits suggest that publishers do not agree with Suleyman’s take.
Training vs. stealing
Generative AI is one of the hottest topics in tech in 2024. It’s also a hot-button topic among creators. Some argue that training AI on other people’s work is a form of theft. Others equate it to artists studying existing work at school. Contention often centers on monetizing work that is derivative of other content.
YouTube has reportedly offered “lumps of cash” to major record labels to train its AI models on their music libraries. The difference in that situation is that the record labels and YouTube would have agreed to terms. Suleyman, by contrast, claims that a company can use any content on the web to train AI, as long as there is no explicit statement demanding that it not be done.
Microsoft and OpenAI have been on the receiving end of several copyright infringement lawsuits. Eight US-based publishers filed suits against OpenAI and Microsoft, joining The New York Times, which already had an ongoing suit.
AI-generated content is controversial in ways other than its source material. An AI-generated animated video stirred up Pink Floyd fans when it became a finalist in an animation competition.
Assuming I’ve understood Suleyman correctly, the CEO claimed that any content is freeware that anyone can use to make new content, unless the creator says otherwise. I’m not a lawyer, but Suleyman’s claims sound a lot like those viral chain messages that get forwarded around Facebook and Instagram saying, “I DO NOT CONSENT TO MY CONTENT BEING USED.” I always assumed copyright law was more complicated than a Facebook post.