Anthropic Agrees To Pay $1.5 Billion To Authors For Using Pirated Books To Train AI Models

One of the most-watched cases in the world of tech appears to be nearing a resolution.

Anthropic has agreed to pay $1.5 billion to authors for using pirated copies of their books to train its AI models. The move will settle a class-action lawsuit brought by book authors who had accused the company of using their works without permission. If approved by the judge, the settlement will be the largest copyright recovery ever.

The case was brought by three authors, Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, who had alleged that the use of millions of books to train AI models amounted to “large-scale theft”. In June, a judge ruled that Anthropic training its models on millions of books didn’t infringe copyright and was protected under “fair use”. The judge also observed, however, that Anthropic’s use of pirated books from the internet to train its models amounted to a violation of copyright law. After its use of pirated works was highlighted, Anthropic bought copies of many of the books it had used.

Anthropic has now agreed to pay authors about $3,000 for each of an estimated 500,000 books covered by the settlement. The money would go to authors whose books were included in Pirate Library Mirror and Library Genesis, two databases of pirated books Anthropic used for training. “As best as we can tell, it’s the largest copyright recovery ever,” said Justin Nelson, a lawyer for the authors. “It is the first of its kind in the AI era.”

Experts had feared that had Anthropic not settled, it could have lost the case and been made to pay even more money. “We were looking at a strong possibility of multiple billions of dollars, enough to potentially cripple or even put Anthropic out of business,” said William Long, a legal analyst.

But a $1.5 billion payout will be seen as a win for Anthropic, which had announced a $13 billion fundraise at a valuation of $183 billion just last week. The lawsuit was being keenly watched by the AI industry, given that most players in the space had similarly trained their models on copyrighted, though not always pirated, material. The judge, however, held that Anthropic’s use of these books for training was fair. “Like any reader aspiring to be a writer, Anthropic’s [AI large language models] trained upon works not to race ahead and replicate or supplant them – but to turn a hard corner and create something different,” US District Judge William Alsup had said in June. Anthropic is nevertheless getting a rap on the knuckles for using pirated copies of these books, but will take it in its stride: the settlement likely paves the way for companies to use millions of copyrighted works together to create AI models. And given AI budgets for datacenters and infrastructure, paying for copyright usage could end up being a rounding error for these AI giants.
