Judge Rules That AI Training Doesn't Violate Copyright

What this case does NOT cover is the USE of the database/LLM.

This ruling is about a case where the AI company bought a huge number of books and then scanned them in a destructive manner. The books weren't resold, so the company keeps its right to a copy of each book. Those books could be loaded into a normal database and searched. The AI company jammed them into a really lossy database called an LLM.

That's the ruling in this case. Was it legal to buy a book and put it in a computer running the worst database in existence? And yeah, if you buy books, you can put them in a computer.

Again, what this case does NOT cover is the USE of the database/LLM.

If this database/LLM is for Sam Altman or Mark Zuckerberg personally, it's all good. Legit copyright use. If it's for internal business use, a corporate library, that's probably fine, but no making copies of the books for everyone, since you only started with one of each book. You want to sell access to it? That might be less fine. A private library is a real business model, but it has limits. Like, copying 3 pages from a book is OK but copying the whole book is bad. You can't use your right to a copy to de-value the marketplace.

An LLM using an author's work to compete with the author, de-valuing the marketplace? Totally not part of this case.
 


Are you a lawyer? I am not, but what you are describing seems very different from what the article is describing, in which the judge seems to be making an important ruling re. the transformative aspects of LLMs, which it seems to me (not a lawyer) could have far-reaching ramifications in US law. However, your characterization seems very dismissive.

If you’re a lawyer, can you explain why this case is unimportant?

Edit: this article from a few days ago has lawyers alluding to the legal implications but is not exactly rigorous. However, it does seem to be taken as an important ruling. I’m trying to wrap my head around whether this matters or not.

 

What this case does NOT cover is the USE of the database/LLM.

This ruling is about a case where the AI company bought a huge number of books and then scanned them in a destructive manner.

Again, something you seem to ignore - a lot of the buying of books was done AFTER they used MILLIONS of pirated copies to do the training.

Part of the point here is that buying books after the fact does not justify the piracy.

"We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness)," Alsup stated.
 

I would love to get @Snarf Zagyg's take on what this ruling and the judge’s comments mean, as well as the potential significance. It seems to me that there are some distinct issues at play. I am very loath to offer an opinion but fascinated by the topic.
 

Again, something you seem to ignore - a lot of the buying of books was done AFTER they used MILLIONS of pirated copies to do the training.
I'm not ignoring it, I'm saying that is a different trial than this one. This judgement JUST says that you can train an LLM using books you buy.

Hey look! You gave the quote saying that will be a separate trial!
"We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness)," Alsup stated.

And I'm not sure your "again" was directed at the right person, since that was my only post on the topic.
 

Are you a lawyer? I am not, but what you are describing seems very different from what the article is describing, in which the judge seems to be making an important ruling re. the transformative aspects of LLMs, which it seems to me (not a lawyer) could have far-reaching ramifications in US law. However, your characterization seems very dismissive.

Nope, not a lawyer, and even if I were, copyright law is a notable crapshoot because by statute it is case-by-case and very non-deterministic.

However, a) I follow former copyright lawyer & Verge editor Nilay Patel, who had a recent podcast on the subject, and b) Alsup is a judge who learned how to write code to be able to rule on software copyrights, so his rulings are full of commentary using analogies based on things I am very knowledgeable about.

He discusses how it's accepted precedent that loading a book you own into a database is legal. And he states that "training" an LLM is essentially loading data into a database. (An erratic, unreliable, lossy, and unpredictable database that sometimes lies, and where the difference between creativity and hallucination is in the eye of the beholder, but regardless, training = loading a database.)

So there is a legal route to making an LLM. And that's important, if for no other reason than that it points out there is a legal way to train an LLM and that companies who don't use that way do not have any precedents as a shield.

But it's only part of the overall LLM process, and the rest of those steps are not in this ruling. Like, there's the whole part about how you use and/or commercialize the LLM. You can make a database legally that you then use in an illegal way. I.e., you legally put copies of your book into a database, then you print copies you sell. That's a blatantly illegal use of a legal database.

What about printing and freely giving away out-of-print books that address some long-forgotten public health risk, and there's no in-print literature from any author? Call a lawyer and a judge, because that miiiight be fair use.

This didn't cover use of the LLM, just training.

Judges in LLM cases have started saying (in legalese) "this lawsuit against an LLM failed because it made a very stupid claim, but if they had said X and Y, I would have ruled very differently."
 
