Unbelievable Scale of AI’s Pirated-Books Problem

I 100% realize my anecdote isn't the point of what you wrote, but it made me think of something to add to this. Years ago, a friend of mine received a notification from his ISP that they had provided his contact information to Paramount at the request of Paramount's legal department. A few months went by, and then he received a letter in the mail from Paramount's legal team regarding a movie of theirs that they had determined he downloaded illegally. Checking his media server, he confirmed he did have the movie in question. The letter offered him 2 options: pay a legal fee of a few thousand dollars and they'd go away, or potentially end up being taken to court by them and pay much more. He consulted a lawyer and was advised to settle if he didn't want to risk things going further, so he did. He hasn't heard anything since and also wisely stopped torrenting media.

So while this didn't reach the courts to determine penalties against him, it ended up costing him far more than the $15 or so the movie would have cost him to buy. It would be interesting to see everyone who created something Meta torrented take them to court for damages and see how that shakes out.
I expect this will result in some class action suits.
 


Likely. Several authors I follow are really angry over this. I can't begin to imagine what the publishers and their lawyers are planning but I can't imagine them just sitting back and doing nothing.
If past is prologue, it will drag out for years and the mega-corps will either be let off the hook or they will be given a relatively minuscule fine that will, after lawyers’ fees, result in something like less than $1 for each affected author.
 

And to address something someone said upthread about Meta's legal team probably having a fit about this, I doubt it. They were likely consulted and presented the worst-case scenario, and Zuckerberg decided that was less than the profit he saw in going down this path, so it's basically a cost of doing business.
 

And we have to be clear, there are two separate violations here. The first was that they literally downloaded the archive via a torrent. They stole the works, plain and simple. That was them getting the data in the first place. The second is that by using those books against their licenses (stated in the copyright statements at the beginning of every book), each use of them is a further injury to the owners of the works.

At the end of the day though, I think we're going to see "AI" die down a bit "soon". These systems are too unreliable for a lot of business use, the IP issues are going to cause a mountain of litigation, they cost a ton of money to operate but don't seem to have much of a model for revenue, and then there are the environmental issues. Unless there's a significant, and I mean really significant, change in how they work and how much they cost to operate, I don't see them riding this big for long. It just costs way too much, and right now everyone's speculating that there's a killer app in there that will make a mountain of money to match the mountain of litigation.

Where they're doing really amazing things is in scientific applications and that doesn't have any of the IP problems we're talking about.

While I am in nearly 100% agreement with your posts to date, I have at least one disagreement with the above post. One big use of AI is to predict protein folding. I would argue that that is a scientific application, one which has huge commercial applications.

TomB

Edit: We should clarify what type of AI we are talking about. General AI seems weak, at the moment. Domain specific applications seem to be going strong.
 

That, however, doesn't require LLMs. The neural networks that predict protein folding weren't trained by stealing from books.
 


Exactly. Along with all the other “AI” companies who stole their training data. The eventual lawsuits, even if they lose, will amount to a drop in the bucket compared to their profits. This is why I’m glad for DeepSeek and the other Chinese open-source “AI” programs that will just keep stealing the code from Western companies and releasing it as open source, thus completely undermining the Western companies’ ability to profit off creatives’ stolen IP. It does nothing to get the creatives the justice they deserve, but at least it’ll damage or destroy a few of the parasitic companies trying to profit off global-scale IP theft.
 


