The AI Red Scare is only harming artists and needs to stop.

As multiple people have pointed out, copyright infringement is a major issue.
Hence my question about using public-domain material to train on.
And it being trained on public-domain material wouldn't change the fact that AI is plagiarism.
Is it, though, if it's doing the same thing you or I could (and would) do in writing a college essay: taking and synthesizing those source materials and producing what it's being asked for?

I mean, if I had the computing power and time I could go out, take a bazillion photos of different landscapes across the world, load them all into my megacomputer, and train an AI landscape generator solely off all those photos I took. Which would mean, in theory, I'd hold the copyright not only over the photos (never mind that photos IMO should not be copyrightable in the first place; that's another can o' worms) but over the AI thus produced.

And people would still find a way to complain about it.
 


I know a lot of people (not personally, just in general) are against AI art, but personally I love it. I have a lot of fun creating stuff for my games; it might be an image of an NPC, or maybe a location that I can show my players. Friends and I have also used it to create images of our PCs, and it works well (sometimes ruined by those extra digits or a crazy number of scabbards).
I think there's a difference ethically between using AI art "for fun" and using it professionally. I would consider using an AI-generated picture for a character illustration to be akin to using a photo of a celebrity. If this is something I'm doing for a home game, no-one cares. If it's something I do in a product I publish, I'm in trouble.

Digital piracy is obviously theft.
Piracy/copyright infringement is not theft. If I steal something from you, you no longer have it. If I infringe on your copyright, you still have the original product. If you're going to compare piracy to a "traditional" crime, the one you're looking for is counterfeiting. You're inflating the supply of something, thereby reducing the value of each unit, but no-one is losing the things they already have.
 

The question is whether they acquired the data set legally. As far as I know they did. They certainly wouldn't have the right to distribute that data set or sell it, but the data set itself - which I admit I haven't really given much thought to - doesn't strike me as critical to the discussion. I think we both agree they would have had a right to read the data or look at the data. I always assumed that they just picked a bunch of things on the internet to scan which they could legally read and then did so.
My objection isn't that OpenAI read a bunch of data they scanned from the internet. If all they were doing was having their AI read the contents of their browser cache as it was being displayed in the browser window, I don't have any problem with that. I don't think it's possible for a copyright holder to post publicly-facing internet content without implicitly giving permission for people (or entities) to create copies of that content for the express purpose of viewing those copies in internet browsers. Allowing those cached copies to exist is literally the express purpose of posting public-facing content online, so it would be nonsensical to say you're posting public-facing content online but also withholding that permission.

My objection would be if, as I suspect is the case, OpenAI copied the content they scraped from the internet into a training database separate from any browser cache. Saving a new copy of online content into a database separate from your browser cache is not standard practice when viewing a website, so there's no implicit permission to do so.

This has nothing to do with distribution or derivative works, and everything to do with reproducing a copyrighted work. You literally need permission to create a copy of a copyrighted work. Sure, that permission is implicitly granted to users of internet browsers for the purpose of reading a website on a browser. But at no point did OpenAI ever ask any copyright holder permission to make an additional, unauthorized copy of their work for an entirely different purpose, in an entirely different block of memory, unrelated in any way to reading it on a browser.

It would be interesting if the terms of service of any of those sites at the time specifically blocked the use of data for machine training, but I doubt it unless there was some blanket prohibition against using the text as the basis of scientific research. No one was thinking about those things at the time. No one was regularly saying anything in their terms of service like, "Sign here if you agree when accessing this information not to use it to train an AI."
It doesn't really matter what the terms of service on the websites were. Websites don't have to establish additional restrictions on the use of copyrighted content to prevent it from being copied to anything other than a browser cache. Copyright law already does that.

By default, you cannot legally create a copy of a copyrighted work without getting permission from the copyright holder. Permission to create a copy in your browser cache for the purpose of viewing a website is not permission to create an additional copy for some other purpose. You don't get to say, "Well, it's in my browser cache, so I'll just save a copy to my training database, too." By default, copyright law says you need to actively receive permission to make that second, non-browser-related copy of that data.

If, at any point, OpenAI copied copyrighted content stored in their browser cache and pasted it into a database (or other, non-browser-related file), I would contend they violated a copyright. My understanding of copyright law is that OpenAI isn't allowed to copy the data in their browser cache to a new location without first getting permission to do so. (And if they're skipping the browser and just scraping copyrighted material directly into a training file, I'd say that's an even more blatant copyright violation.)
 

Sam Bankman-Fried, Peter Thiel, Sam Altman, etc. being the ones at the top doesn't inspire confidence in the honesty of AI creation. The people who claimed "THE BLOCKCHAIN!", cryptocurrency, NFTs, etc. were the NEXT BIG THING are the same ones now claiming the same of AI.

Massive embezzlement, pseudoscientific blood transfusions (it's like a bad satire: the rich guy is literally trying to drain the blood from young people), and a con man. At what point do you admit AI's overrun with the least trustworthy people?

Which is more likely: Dishonest people are being dishonest again or they've somehow run into the one singular time they're right about a technology after endorsing fraud after fraud?
 

The problem with analogies is that unless there is a one-to-one and onto relationship between the analogy and the thing it is supposed to represent, the analogy exists to confuse and obfuscate truth, not illustrate it.

The analogy exists because someone used it. You perhaps should refrain from telling people why they did so, because you cannot read minds and may very well be incorrect.

For example, sausage is a "collage" or "mosaic" of meat, but as I've already pointed out, that analogy of a collage or a mosaic is itself deceitful and wrong about how AI text and art are produced, and those that use it are liars (and have to use human-produced collages to represent the supposed AI work). The AI doesn't really store pieces of what it consumes. It stores the contextual relationships it discovered between those things. The more it trains, the less information it has about any one text or image. It's learning patterns, and the patterns between the things that make up language or art are not and cannot be copyrighted. Nor does your analogy work, because it didn't take anything from the owner. It didn't even use a meat printer to make an illegal copy of the meat (if that even could be a thing, and even if the law protected against it). A better analogy would be that it looked at the meat and made a painting of it, but that analogy is so weird that, while it is more accurate than your sausage analogy, it's probably best that we just don't use analogies at all, because I doubt you understand why that analogy is more accurate.
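The "stores patterns, not pieces" point above can be illustrated with a deliberately toy sketch. This is not how any real AI system is built (the names, bucket count, and training scheme here are all made up for illustration); it just shows the structural property being claimed: a model of fixed capacity, trained on arbitrarily much text, ends up the same size regardless, so it cannot be holding the training texts themselves.

```python
# Hypothetical toy sketch: a fixed-size "model" trained on arbitrarily much text.
# Everything here is illustrative; no real AI system's API is being depicted.

HASH_BUCKETS = 256  # model capacity is fixed before training begins


def train(texts):
    """Accumulate character-pair statistics into a fixed-size weight table."""
    weights = [0.0] * HASH_BUCKETS
    for text in texts:
        for a, b in zip(text, text[1:]):
            # each adjacent character pair nudges one shared bucket;
            # many different pairs collide into the same bucket
            bucket = hash((a, b)) % HASH_BUCKETS
            weights[bucket] += 1.0
    return weights


small = train(["the cat sat"])
large = train(["the cat sat"] * 10_000 + ["entirely different text"] * 10_000)

# The model is the same size no matter how much text it saw...
assert len(small) == len(large) == HASH_BUCKETS
# ...and it contains no training string verbatim, only aggregated statistics.
assert "the cat sat" not in str(large)
```

The relevant point is the fixed `HASH_BUCKETS` capacity: more training data means more statistics crammed into the same space, which is why the post above says the model has *less* information about any one text the more it trains.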



No, I don't. That's the point. I see no (relevant) difference between a human mind being inspired by art and an AI mind being inspired by art.

With respect - AI minds are not currently provided any of the rights and privileges of humans, so either we are undertaking a grave miscarriage of justice, or there IS a difference between AI minds and human ones.

If you are not, in fact, an expert in human cognition, maybe you should wonder if you are in a position to have a relevant opinion on this line.

... But no piracy occurred in this case because no permanent copy of the information was made.

"Permanent" is not a criterion that is used to determine whether copyright has been violated.

Making a copy of intellectual property in digital format without permission - even into a cache that you will later erase - is generally not allowed unless it is specifically within "Fair Use". The RIAA has seen to that.

As long as the people making generative AIs were doing so for "academic purposes" and providing the results free for academic use, they had an argument that might hold up. Once you start selling the feature, and the results, however, that argument evaporates.

Until you have generative AIs required to attend school along with the 8-year-old kids, training an AI is not "education" of that AI. It isn't a sentient being that government and society have a vested interest in developing. It is a tool that a corporation makes to sell for cash.
 

Question for the AI detractors: if AI only used public-domain material to train on, would that make it acceptable?
This question wasn't addressed to me (I'm not necessarily on one "side" or the other in this debate), but I just wanted to go on record with this answer:

I believe in the public domain as a fundamental human right. Once something has properly entered the public domain, I believe people have the right to create derivative works from it, with any tool at their disposal, up to and including AI. Content in the public domain belongs to all of humanity.
 

I'm not sure how they did it but the key point is that as far as I know they obtained all the data legally. The point I was making was that merely obtaining an electronic copy doesn't violate copyright or everyone's browser cache would make them a felon.
Accessed, not Obtained. Accessed.
Obtaining implies ownership.
 

You're not literally copying the entire thing and then outputting part of it.
Actually, in many cases, musicians are. Garth Brooks. Uncle Kracker. Gwen Stefani.
Any band doing a cover, many of which do get just far enough to sustain a new copyright.

Every classicist doing the styles of Mozart, Beethoven, Bach, or Brahms - all of whom were largely formulaic. As in, given a piece minus one harmonic line, one can fill in the missing line note-perfect; that's the level of formulaic.
(Take two years of music theory and you're still just learning how to emulate them.)
Do you not know the difference between inspiration and plagiarism?
The difference is legally irrelevant.
Academically, 3 sources is inspiration, 1-2 is plagiarism (per all of my assorted instructors in Music, History (both to BA level), and Education (MA level)).
 

If you are not, in fact, an expert in human cognition, maybe you should wonder if you are in a position to have a relevant opinion on this line.

My problem is I'm remarkably unconvinced anyone making claims regarding that subject for any purpose is really as knowledgeable as they think they are here; there's way too many incentives for people to believe what they want to believe regarding this topic, including people who study one or the other half of it professionally.
 

I've followed this discussion (which is actually pretty informative, thank you everyone!) because I'm really not that knowledgeable about AI art, but very interested in getting an idea of how it works and what it implies ethically. My takeaway at this point, from a purely moral standpoint, is:

What seems morally (and maybe also legally) wrong is using art without permission for commercial purposes (I'm not even sure the question of where and for how long it is copied/stored is that essential). That seems to be beyond the realm of fair use to me. Someone is taking people's art and training an AI with it to later sell the services of said AI, without the creators of the art seeing any compensation. I don't think the question of whether the resulting art could be considered plagiaristic is that important.
 
