The Business Times

AI chatbots sucked up troves of data. Now copyright holders want a cut

Published Sun, Apr 7, 2024 · 11:00 AM

How do ChatGPT and other chatbots generate written works, images and music to rival the output of a talented human? By ingesting content that people already created and identifying patterns in the material so they can produce something new. These generative artificial intelligence platforms have hoovered up 19th century novels, beat poetry, draft contracts, movie scripts, photo essays, millions of songs and everything in between on the way to becoming the most disruptive technological force since the invention of the internet. 

It turns out this vast trawling of mankind’s past endeavors doesn’t come for free. News organizations, novelists, music publishers and others whose copyrighted works were fed into the chatbots’ large language models as part of their training are demanding a share of the profits. 

Some have cut deals with ChatGPT’s owner, OpenAI, for using their work. Others are suing the company and other AI platform developers in US courts. The outcome will be a test of the “fair use” principle, which makes it possible – in certain circumstances – to use books, news stories, song lyrics and other copyrighted material without paying their creators. 

The Cases

Litigation is ongoing against AI companies in at least 20 cases, most of them in California or New York federal courts. Some have been brought by news organizations. Others involve authors trying to recover some income from the use of their stories. 

The case that legal experts are watching most closely is one brought by the New York Times in late December against OpenAI and its investor Microsoft Corp. The newspaper is seeking as much as US$450 billion in damages, claiming OpenAI infringed its rights by using Times articles to develop ChatGPT. The complaint is distinct from others in that it also accuses the chatbot maker of engineering its product to reproduce Times articles almost verbatim when prompted. The Times said the companies spent months negotiating before it filed suit. OpenAI said in a motion to dismiss that the examples of regurgitation cited in the complaint were “highly anomalous,” and the byproduct of a bug in the chatbot.




Another lawsuit accuses Facebook and Instagram owner Meta Platforms of illegally copying books by the authors Sarah Silverman and Richard Kadrey for its AI tools.

Stock photo supplier Getty Images claims its photographs shouldn’t have been used without permission to train the image generator Stable Diffusion, owned by Stability AI.

A pair of lawsuits brought by the online news outlets The Intercept and Raw Story Media accuse OpenAI and Microsoft of violating the 1998 Digital Millennium Copyright Act by stripping away copyright management information, such as author and title details, when they trained ChatGPT’s LLM.

The Authors Guild, which advocates copyright protection, has also sued OpenAI, as have individual authors like Julian Sancton and Nicholas Basbanes. Anthropic, the Amazon-backed AI startup, is also the defendant in a lawsuit related to its use of song lyrics to train its chatbot, Claude.

What is ‘fair use’?

Fair use is seen as a “safety valve” built into US copyright law. 

It’s meant to allow copyrighted works to be used without permission so long as it benefits the public in some way. It also ensures that copyright law abides by free speech protections. Parody is generally protected under fair use, for example. 

The doctrine is notoriously flexible, and the only way to know whether fair use applies in a particular case is to ask a federal judge. Judges weigh four criteria, including whether the use transforms the original work in some way and whether it causes monetary losses to the copyright holder. 

They also consider the nature of the work, including whether it was fictional or based on real-world facts. 

Most other countries don’t have a fair use doctrine. Some, like the UK, have a narrower “fair dealing” regime, while others, like Japan, have passed laws exempting AI training from copyright liability. Those more permissive rules elsewhere may erode what has long been seen as a competitive advantage for US-based companies. 

In the US, fair use has more recently been tested in technology disputes, as in a case brought by the Authors Guild against Google in 2005. The guild accused the search giant’s books platform of infringing copyright by showing snippets of published books in search results. After a decade of litigation, a federal appeals court delivered a major victory to Google in 2015, ruling that it had used only small sections of the books, and in a transformative way. 

So could ‘fair use’ be an effective defense for the AI platforms? 

The Google Books case provides the best precedent for some of the challenges against OpenAI, said Rebecca Tushnet, a First Amendment scholar and professor at Harvard Law School. It found that creating a database was fair use because the whole was greater than the sum of its parts, and the use of the excerpts increased public knowledge of the original works without being a full substitute for them.

“The practice of creating a training set is squarely in the fair uses of the past,” Tushnet said. What’s different is the question of whether an AI model can spit out large portions of text that mirror a copyrighted work without obliging the AI owner to pay for that work. That question has yet to be tested in court. 

Tushnet and other legal experts warn it’s risky to interpret individual rulings that sided with the principle of fair use as a guide to other cases, and much depends upon how the platform in question is ingesting and employing the copyrighted content. 

Some major news outlets have cut licensing deals with AI companies, but individual authors have mostly chosen to sue, a sign of how hard it is to strike lucrative deals for smaller bodies of work.

Potential solutions

While they argue their case in court, the AI companies continue to negotiate commercial terms with publishers. That suggests the eventual judgments are more likely to shape how money is split between the opposing parties than to overturn the practice of training chatbots on works protected by copyright. 

Rolling back tools like ChatGPT would disrupt an industry dominated by American companies that could end up generating hundreds of billions of dollars in revenue, and US judges may be reluctant to do that, according to legal experts not involved in the cases. If damages are imposed, they are unlikely to amount to hundreds of billions of dollars, they say.

The Associated Press already struck an agreement with OpenAI in July 2023 to license its content for training ChatGPT in exchange for an undisclosed sum. Germany’s Axel Springer reached a similar news content deal worth tens of millions of euros; under that agreement, OpenAI said, the chatbot’s responses to user prompts will include attribution and links to the full articles. 

CNN, Fox and Time have also had talks with OpenAI to license their content. 

Individual authors filing suit have sought monetary damages. Basbanes, for example, has asked for US$150,000 for each work he alleges was infringed. He is also asking for a share of the profits that he claims OpenAI has earned from using his works. 

Rulings in favor of copyright holders could potentially harm smaller AI startups that can’t afford expensive licensing deals, handing a potential advantage to the bigger players. On the other hand, claimants like the New York Times say they risk losing some of their paying audience if readers can prompt an online tool to reproduce portions of their work. 

“There’s no right answer,” said Jennifer Jenkins, a copyright law expert and professor at Duke University. “But if history is a guide, the sky hasn’t fallen yet, and I don’t think generative AI systems are going to disappear.” BLOOMBERG


