The AI Red Scare is only harming artists and needs to stop.

So if we're going to reframe the argument as "yes it's breaking the law...but which law?" I'm okay with it. The point has always been: the law is being broken.
I want people to use a different word than stealing when... nothing has been stolen. Just like you don't accuse someone of theft when they burn down your house or hit you with a bat.

And more to the point: just because I might argue "arson isn't theft" does not mean I have stated "arson is not a crime".

I'm making a linguistic argument, not a legal one.
 



I don't know what they did, but if this is the basis of your objection, are you saying that if they fed the input directly into the training program without making two copies of it on the computer (indeed potentially just feeding it into a service of some sort and thus never making any permanent file of it) that in your mind the whole procedure would now be legal?
I don't know what you mean by feeding input directly into a training program without creating a copy of it. I mean, I understand that input can be parsed and processed a few bits at a time without ever storing a full copy of the data, but sequentially parsing and storing every bit of data from a file in a buffer is functionally equivalent to copying the file. If you hire a hundred people to photocopy one page each of a hundred page book, you've effectively copied the entire book, even if those pages are never assembled all at once in the same location. The actual act of copying the book has effectively occurred in full.
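The chunk-by-chunk point above can be illustrated with a minimal sketch (the file contents here are made up for the example): even if the program only ever holds a few bytes in memory at a time, the destination still ends up byte-identical to the source, so a full copy has effectively been made.

```python
# Sketch: reading a file a few bytes at a time and writing each chunk
# out still yields a byte-identical copy. No complete copy is ever
# held in memory at once, yet the copying has occurred in full.
import io

# Stand-ins for a source file and a destination file (contents invented
# for illustration).
source = io.BytesIO(b"the full text of a hundred-page book")
destination = io.BytesIO()

while True:
    chunk = source.read(4)  # parse only a few bytes at a time
    if not chunk:
        break
    destination.write(chunk)  # the whole file is never buffered at once

# The destination is a complete, exact copy of the source.
assert destination.getvalue() == source.getvalue()
```

This is the same point as the hundred-photocopiers analogy: splitting the act of copying into small pieces doesn't change what has been copied.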

Because if I had known that some judge was going to rule that way and I was running an AI firm, I totally would have had the programmers write things that way in order to follow the strict letter of the law. But I believe your technical approach to legal rights fundamentally breaks down as non-transparent law. The law should never be such that it involves unforeseen technicalities.
I'm not a lawyer, so I can't really have a formal "approach to the law." That being said, I don't think my philosophy regarding copyright law is particularly technical. I'm just considering the spirit of the law: If you hold a copyright on certain content, you basically have the exclusive, transferable right to profit from that content. That's all. If you give someone permission to read a copy of that content, they can create a copy of it for that permitted purpose if creating a copy is the only possible way to read it (as it is when reading a website). If you don't give someone permission to use a copy of that content for some other (non-Fair-Use) purpose, they don't get to use it for that other purpose.

A copyright isn't an opt-in right, where the copyright holder has to explicitly enumerate every possible process which might copy or distribute their content in order to prohibit others from using their content in that way. Copyright is an opt-out right, where every possible (non-Fair-Use) process which might copy or distribute the copyright holder's content is prohibited unless the copyright holder has given their express permission for their content to be copied or distributed in that way.

And there is also precedent for why your technical letter-of-the-law approach is flawed, which I mentioned before, and that is internet browsers. OpenAI is far from the first company to scan the whole internet into a database and then make a derivative work of it. Google was the first company to do that. And they are still doing it. They have web crawlers that go out and read all the words, put them into a database, and use that data as the basis for making search engines. They then use that search engine as the basis for developing revenue from ads. So your strict letter-of-the-law approach based on technicalities makes not only training an AI illegal, but also building a search index for web search.
I don't agree that a search engine and an AI training set are legally or morally equivalent in any way. When someone posts public-facing content online, they are implicitly giving permission for internet users to find and read that content via the internet. The express purpose of internet browsers and search engines is to enable internet users to find and read internet content. Those two technologies are making public-facing content available in the manner copyright holders intended, without doing anything further with that content.

AI training sets do absolutely nothing to help internet users find and read any copyrighted content used in the creation of those training sets. If those training sets copy or distribute copyrighted material in any way that isn't expressly Fair Use, they're violating the content creator's rights. The creators intended for their content to be available to read on the internet, with all of the necessary permission implied by that intent. They did not, at any point, give anyone permission to copy or distribute their work in any way aside from merely making it available to read online.

And further, since this is based on a technicality, as I said, I could just do this without storing a file at all. And heck, for all I know, they never stored a file. Maybe they just put this all into some sort of database structure immediately upon crawling the web with a custom web crawler.

The web browsers that you use to search the web are just one way of accessing and displaying the information on the internet. You can - and I have - write automated web crawlers that involve no human viewing at all and which just download information from websites. In my case, I was downloading genetic transcriptions from NCBI for use in things like automated annotation and eventually protein folding, but you can do this to any website. I have for example for a while now considered writing a simple crawler (with a suitable wait period between requests) to download all my past posts at EnWorld so that I'll have a copy in the event EnWorld blows a fuse.
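A crawler like the one described above (polite, with a wait period between requests) can be sketched in a few lines. This is a simplified illustration, not the poster's actual code; the `fetch` hook is an assumption added so the download step can be swapped out, and a real archiver would also honor robots.txt and handle errors.

```python
# Sketch of a polite crawler: fetch each page in turn, sleeping
# between requests so the server isn't hammered.
import time
import urllib.request

def crawl(urls, delay_seconds=2.0, fetch=None):
    """Download each URL in turn, pausing between requests.

    `fetch` is a hypothetical hook for tests/offline use; by default it
    does a plain HTTP GET via urllib.
    """
    if fetch is None:
        fetch = lambda url: urllib.request.urlopen(url).read()
    pages = {}
    for url in urls:
        pages[url] = fetch(url)        # download the page body
        time.sleep(delay_seconds)      # be polite: space out requests
    return pages
```

For a personal backup like the EnWorld example, the URL list would be the pages of one's own post history, and the results could simply be written to disk.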
Depending upon how they're used, I would say web crawlers may or may not be violating copyright laws. The NCBI is a government organization, so by my understanding of U.S. copyright law, the content it creates during the course of fulfilling its government function isn't protected by copyrights (and even if it were, non-commercial, academic use of its content is Fair Use).

Also, archiving a website to preserve its content in case of data loss is a long-established Fair Use case, so a web-crawler isn't violating any copyrights by, for example, creating a back-up copy of ENWorld for the express purpose of data preservation.

On the other hand, if I use a web crawler to find and download all bootleg copies of Disney films posted anywhere on the web because I want free copies of Disney films on my computer, I don't see any way that's not violating copyright law. Ditto if I use my web crawler to find and download all copyrighted images posted anywhere on the web because I want to use those images for some non-archival purpose (e.g., training an AI).

So is Google also guilty of a mass copyright violation and can be sued by every website it's ever crawled with its own web crawlers? Is every web search engine also guilty of copyright violation?
If Google starts using its web crawlers to do things which violate copyright law, then yes, I think every website creator should sue Google into the ground in a massive class-action lawsuit and win. (The EU courts might even let something like that happen, given their track record with tech companies.)

As I noted above, though, I don't see how search engines violate anyone's copyright. They are specifically enabling the permitted use of the copyrighted content in the manner the copyright holder intended for it to be used.

I think you are getting lost down a technical rabbit hole that doesn't really matter.
Since it seems you don't think my position in this debate matters, I guess we don't have anything further to discuss. I've given my two cents, so I'll bow out and give you the last word. Cheers.
 

Don't read into it. I didn't say it was inappropriate for you to say. The word "inappropriate" did not appear. Nor did any other admonishment.

I said that, if nobody here seems to have wisdom, maybe it isn't a thread for you.

Like, if you don't like smooth jazz, you can make negative statements about smooth jazz, but you really aren't going to get much out of a concert of smooth jazz.

Not the same thing. I don't think it's clear either that AI is doing the same thing (at least for contextually relevant purposes) or that it isn't, and I don't think settling that is necessary to have an opinion here. But I do think that if you're going to claim one or the other in an authoritative way for your argument, it's your obligation to prove it, not your opponent's obligation to prove you wrong.

There are ways to argue your point with qualification. That's not what happens in a lot of these threads.
 


Of course there are different standards at play here. Humans have personhood and rights under the law. AI does not have personhood and rights under the law.

That's not the argument I'm taking issue with. It's the argument that teaching an AI is in some fundamental and relevant way different from teaching a human with the same materials. If you're going to argue that, you get to present evidence, not just take it as a given.

There can be all kinds of arguments hostile to AI that are based on different criteria that are valid on the face of them. That isn't one of them.
 

Yeah. Because AI isn't about culture. It is about the data you put into it. Cultural bits are only one kind of data.

Thinking about AI (generative or otherwise) from only the point of view of writing fictional text or making pretty pictures is wearing blinders so you don't see most of the possibilities.

It doesn't help that some people don't even make a distinction between generative and other forms of AI in their arguments (though some, but not all, can be distinguished by context).
 

At the end of the day, the problem with AI IMO isn't this idea of theft; it's the fact that artists are losing jobs. I work closely with an amazing artist who loved AI at first and hates it now, because he thinks it'll ruin the job opportunities for artists in the future. I cannot say he is wrong for feeling this way. Book covers, posters, ads -- all those things can be done now with AI, and in a year or two, the quality will match some of the best living artists. The regulation we need is to ensure that people who want to do art can still do art and make money for it.
 

The regulation we need is to ensure that people who want to do art can still do art and make money for it.
I'd settle for transparency as a hard requirement. If a product uses AI, its makers should be required to state that fact on the label, along with pertinent info like which AI product (Photoshop filters, ChatGPT, etc.), what kind of AI (adaptive/generative/etc.), and so forth -- just like the ingredient list required on a consumable product. To make this easier, they could develop standardized symbols, or a rating system--a white "A" means Adaptive + Photoshop, while a blue "G" means Generative + filters. I'm just spitballing ideas here.

The goal would be to allow consumers who are passionate about the issue, one way or another, to make an informed decision before they purchase a product. And it would also be very informative to see who would oppose that transparency.
 

I'd settle for transparency as a hard requirement. If a product uses AI, its makers should be required to state that fact on the label, along with pertinent info like which AI product (Photoshop filters, ChatGPT, etc.), what kind of AI (adaptive/generative/etc.), and so forth -- just like the ingredient list required on a consumable product. To make this easier, they could develop standardized symbols, or a rating system--a white "A" means Adaptive + Photoshop, while a blue "G" means Generative + filters. I'm just spitballing ideas here.

The goal would be to allow consumers who are passionate about the issue, one way or another, to make an informed decision before they purchase a product. And it would also be very informative to see who would oppose that transparency.
Fantastic ideas.
 
