Unbelievable Scale of AI’s Pirated-Books Problem

There also seems to be an issue with not having any laws covering AI, and with being behind the curve on technology until it is too late. I'm not sure how piracy would be any different for AI than for me. If I read a book on a website, am I stealing it? I can see it if I copy it to my computer. Is that what AI does, or is it just plugged into everything so it can go reference that site anytime? A bit like me going to the library when I need to research something and using all those books for my paper.
 


Training AI on existing works isn't piracy in any reasonable definition of the term. It isn't necessarily ethical, but it isn't piracy. By intentionally using an incongruous term and trying to shoehorn it into your argument, you actually weaken your argument.

More simply: if I can't ask ChatGPT to replicate the PHB, it isn't piracy.
This isn't for you to proclaim. Copyright, etc., are all complex legal constructs and there are legal cases going through about these things as we speak. Whether it is "piracy in any reasonable definition of the term" is something that is being determined by the (admittedly slow) legal process, not hot takes on internet forums.

If even high-powered legal experts can't agree whether it's plagiarism or not, none of us here are in any position to legislate. It'll shake out in the wash, hopefully, and we'll have some clarity soon.
 

This is going to be an unpopular opinion:

Training AI on existing works isn't piracy in any reasonable definition of the term. It isn't necessarily ethical, but it isn't piracy. By intentionally using an incongruous term and trying to shoehorn it into your argument, you actually weaken your argument.

More simply: if I can't ask ChatGPT to replicate the PHB, it isn't piracy.

You have to download the works in order to ingest them into the LLM. Furthermore, each of these LLMs has paid models, so the companies are profiting off the data that they’ve downloaded. I disagree with you.
 

You have to download the works in order to ingest them into the LLM. Furthermore, each of these LLMs has paid models, so the companies are profiting off the data that they’ve downloaded. I disagree with you.
That said, for these multi-billion dollar megacorps, I'm sure if they simply pivoted to paying for one copy of each thing they scrape, it wouldn't badly hit their bottom lines. The article says millions of books, so tens of millions of dollars, maybe, but that's peanuts to them.

The question is more one of licensing the content, not purchasing a copy of it.
 

I'm no legal expert, but taking things without paying for them (or their license) in order to turn around and profit off of them seems to be, quite literally, the definition of piracy? I think even Long John Silver and Captain Hook would agree.
 

This is going to be an unpopular opinion:

Training AI on existing works isn't piracy in any reasonable definition of the term.

The thing is, thanks to history, it is actually piracy. If you don't like that, take it up with Napster, a quarter-century ago.

Making an electronic copy for reasons other than fair use is copyright infringement. So long as they were positioning their generative AI systems as "research" they had an argument that the electronic copies of works made in the course of the training process were fair use. Once they moved to commercial application, however, that excuse went away.
 

This is going to be an unpopular opinion:

Training AI on existing works isn't piracy in any reasonable definition of the term. It isn't necessarily ethical, but it isn't piracy. By intentionally using an incongruous term and trying to shoehorn it into your argument, you actually weaken your argument.

More simply: if I can't ask ChatGPT to replicate the PHB, it isn't piracy.

I've worked in software development for almost 40 years and deal with IP laws on a fairly regular basis as a result. Hey, software developers are authors too and the code they write is covered under the same IP laws as books and films. And you haven't had fun until you've had to really dive into a licensing agreement for a code library.

It's piracy. Period. There are emails now public that show they pursued getting the texts legally and opted not to because of cost and time. That's when they downloaded LibGen. I'm sure we all remember those stupid commercials about downloading a car. Downloading the torrent was the first illegal act, because at that moment they stole the works. They knew it, and did it intentionally. Whether you share them or not from there isn't the point; you've already committed piracy.

Next, there's what they did with it. I'm going to have a hard time with a fair use defense, because at the heart of fair use is how much of the original content you reused. That's why reviews and reactions only show short clips of things or quote small sections. They used 100% of each and every text. And they're trying to squeak in with a fair use argument under the "research and training" aspect, except that another aspect is whether you're making commercial use of the work you're using. As long as these companies were non-profits, they could kinda squeak by on the "research" aspect of that. But once you've got Meta, Google, and so on trying to monetize it, they're now making a profit off the works they've acquired illegally. Keep in mind, they still haven't acquired legal copies of the texts; they only have the copies they pirated off the net.

And to be specific to your point about the PHB... Does Wizards allow non-open content from the PHB to be reprinted elsewhere? I can look it up on ChatGPT. That's specifically theft of WotC's IP.
 

The thing is, thanks to history, it is actually piracy. If you don't like that, take it up with Napster, a quarter-century ago.
Speaking of Napster, it's interesting how some have changed their tune (pun intended) about piracy. Either because they got older and wiser or it's finally hitting close to home for them.
 

Speaking of Napster, it's interesting how some have changed their tune (pun intended) about piracy. Either because they got older and wiser or it's finally hitting close to home for them.

Yeah, it's totally comparable.

In one corner, we have the university student downloading a song, making no money.
In the other, we have the international, billionaire-driven mega-corp, downloading the internet, to then (somehow) spin a profit from it.

Totally the same.

Just wait till they find out about YouTube!
 

Yeah, it's totally comparable.

In one corner, we have the university student downloading a song, making no money.
In the other, we have the international, billionaire-driven mega-corp, downloading the internet, to then (somehow) spin a profit from it.

Totally the same.

Just wait till they find out about YouTube!
So if it wasn't a megacorp doing it, we'd be okay with it? At what dollar point is piracy not okay? Also, at which point are copyright laws/extensions okay? We used to decry Disney and others whenever they wanted copyright laws extended.
 
