The AI Red Scare is only harming artists and needs to stop.

So, the generative AI itself does not contain an obvious copy. But a copy is made when the data is put in the training set. The process requires copying, even if the final AI doesn't contain that copy.

No...it doesn't make a copy. This statement is factually incorrect. A model catalogues information (kinda like metadata) about the material in the training set: it learns the characteristics or patterns present in that material. In supervised learning, labels are added to help the model classify the patterns it detects. In unsupervised learning, the model is exposed to unstructured data with no external context. Most training pipelines also include various evaluation methods to help the model detect patterns.

The system examines the training set for patterns representative of each attribute the ML model must predict. With art, this might include things like color palette, medium, artist name, etc. But there is no need to copy the original source material into the model. There is often a preprocessing step where the raw data undergoes normalization or tokenization (to reduce duplication, etc.). The statistical inferences about patterns or characteristics are what's used to train ML models to make predictions.

So when you tell a generative AI to give you a picture of a kitten in the style of Van Gogh, it is essentially making a speculative prediction about what such a thing might look like, based on what it "knows" about the patterns associated with each term in your request. It is mapping the input data in your request against a statistical model to generate diverse outputs.
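
To make that concrete, here's a deliberately tiny sketch of the kind of preprocessing and pattern-extraction described above. It's a toy illustration, not how any production model works - the `normalize` and `tokenize` helpers and the sample texts are all invented for this example. The point is what gets kept: a table of statistics about the sources, not the sources themselves.

```python
from collections import Counter

def normalize(text: str) -> str:
    # Toy normalization: lowercase and strip punctuation.
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace())

def tokenize(text: str) -> list[str]:
    # Toy tokenization: split on whitespace.
    return text.split()

# The "training set": a few short source texts.
sources = [
    "A kitten in the style of Van Gogh.",
    "Van Gogh painted swirling starry skies.",
    "The kitten sat on the mat.",
]

# What survives preprocessing is a frequency table, not the texts.
stats = Counter()
for doc in sources:
    stats.update(tokenize(normalize(doc)))

print(stats.most_common(5))
# e.g. [('the', 3), ('kitten', 2), ('van', 2), ('gogh', 2), ('a', 1)]
```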

It should be obvious that the source material in the training set is not "copied into" the model because of the sheer size difference between the source material and the model. At no point is a copy of the source data ingested into the "final AI". Such raw data is useless to it.
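
A quick back-of-the-envelope calculation makes the point. The figures below are rough, commonly cited ballpark numbers, not exact measurements: Stable Diffusion's v1 checkpoints are on the order of 4 GB, trained on roughly two billion image-text pairs. If the weights somehow contained the training images, each image would have to squeeze into about two bytes.

```python
# Rough, illustrative figures (assumptions, not exact measurements):
model_size_bytes = 4e9   # ~4 GB checkpoint
training_images = 2e9    # ~2 billion image-text pairs

bytes_per_image = model_size_bytes / training_images
print(f"{bytes_per_image:.1f} bytes per training image")  # -> 2.0
```

Two bytes can't hold a thumbnail, let alone a painting.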

There are good reasons to raise concerns about the rise of generative AI and its impact on the creative arts, but ignorance about how the technology works isn't helpful. It generates more heat than light. I would argue the issue is not that the model contains copies of the original material, but that the statistical predictions made by the ML model are so damned accurate these days that they can produce works remarkably similar to material in the training set.

Is this copying in the traditional sense? Probably not. Is it plagiarism? Maybe, maybe not. I suppose it depends on whether you believe a map is the same as the territory it represents. If making a map is the same as "stealing" or "copying" the terrain it represents, then sure. The model is effectively a map of the patterns detected in the training set.

Does this mean that generative art is entirely OK? No. Its existence raises some real moral and practical issues. But these are NEW issues. We've never had a technology like this before in human history. So we are figuring this out for the first time. I don't want to ignore the genuine concerns of artists. Some of those concerns are real and valid. But misrepresenting how the technology works isn't helping them in the long term - it will be used by the tech bros to discredit their case.
 


And you know what value, price, and labor are based on? Implicit and explicit social contracts. They are not some ideal result of a context-free negotiation between a buyer and a seller, but are based on power, hierarchies, and laws - which means that they are always already political. Which also means that they are subject to political struggle and regulation. Which is exactly what "protecting the rights and interests of artists" is. Struggling to get paid for your work is just as legitimate as developing new technology to depress the price of other people's work; on a societal level, both are political acts. Telling one side to just roll over and die can't be the answer.

This is absolutely true. And this is where the Silicon Valley ideology matters. Dotcom neoliberalism is all about destroying existing power structures that protect people against exploitation. That's a consistent pattern. It's why Amazon and Google are so hostile to unions. It's why Peter Thiel funded Cambridge Analytica to push Brexit. It's why Elon Musk is sinking so much money into the corpse of Twitter. It's why things like Uber always lead to a gig economy that strips away worker protections. And it's why the primary selling point of AI in certain circles is to disenfranchise artists and other creators. These are political problems rather than technical ones. And they demand political solutions.
 

"Not getting it" is very different from a hallucination. It happens to people all the time, due to oversimplification, misreadings, differing core assumptions ... but whoever wrong a human might get things, if, for example, you give them a list of Star Trek episodes and ask them to write an essay about Star Trek, they will not come up with the number and title of a non-existent episode to prove some point (apart from lying consciously to deceive the reader).
Erroneous conclusions with incorrect citations is how I'd categorize BOTH types of errors... and both happen in humans and in LLMs.
 

So, the generative AI itself does not contain an obvious copy. But a copy is made when the data is put in the training set. The process requires copying, even if the final AI doesn't contain that copy.
Nice strawman, Umbran. Time to torch it. And go way overboard on the technicalities, since you've triggered Pedant Mode.

With the exception of physical proximity to a tangible original, ALL, and I do mean ALL, other forms of access require copying at some level.

This page doesn't actually exist as a singular page; it's generated on the fly by the server from a bunch of database entries (one per post, one or more per poster, plus the topic entry record and the topic access records). Once it's created the HTML, it copies it into a series of network packets that get copied machine to machine until they reach my machine, where they get reassembled into either memory or a file on a storage medium, and then the HTML is transformed into the image on my screen, which dynamically updates as I type this response.
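
Here's that chain of copies in miniature, using nothing but Python's standard library (the URL and filename are placeholders, not real resources):

```python
import urllib.request

url = "https://example.com/thread/12345"  # placeholder URL

# Copy 1: the server's response bytes are copied across the network
# into this process's memory.
with urllib.request.urlopen(url) as resp:
    html_bytes = resp.read()

# Copy 2: decoding makes another in-memory copy as a string.
html_text = html_bytes.decode("utf-8", errors="replace")

# Copy 3: saving it writes yet another copy to the storage drive.
with open("page.html", "w", encoding="utf-8") as f:
    f.write(html_text)
```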

The act of copying is inherent in electronic media.

If the original art is a paint file on the artist's computer, it's been created in memory, then compressed, and the compressed copy stored on the storage drive as a file (which is a combination of raw data and an index entry so the raw data can be found). For them to see it, it has to be copied back to memory, usually being decompressed at the same time.
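
The same round trip in miniature, using Python's standard zlib module (the data and filename are made up for the example):

```python
import zlib

# Stand-in for raw image data created in memory.
raw_pixels = bytes(range(256)) * 1024  # ~256 KB of fake pixel data

# Compress and store: the file on disk is a (compressed) copy.
compressed = zlib.compress(raw_pixels)
with open("art.bin", "wb") as f:
    f.write(compressed)

# To view it again, the bytes are copied back into memory
# and decompressed into yet another in-memory copy.
with open("art.bin", "rb") as f:
    restored = zlib.decompress(f.read())

assert restored == raw_pixels
```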

For me to watch the Taylor Swift video on Disney+, they had to copy the master to their servers (plural)... and one of them sends me copies of chunks as I watch; the software does delete them from memory after they've been seen... but to see it at all, I was seeing partial copies.

Even Audio CDs and DAT have to copy to produce the sound - mind you, they tend to store between a fraction of a second and single-digit seconds of audio in what's called the "read-ahead buffer"... which is just a chunk of RAM holding a small partial copy of the data from the CD or DAT. The worst ones hold about 1/100 of a second, while some of the better ones hold upwards of 5 seconds in the RAB's RAM.

Likewise, playing an MP3 file from my drive still means continuously copying a read-ahead buffer's worth of data to actually produce the waveforms.
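
And the streaming/buffering case, sketched in a few lines (the filename and chunk size are arbitrary stand-ins): each chunk is a short-lived partial copy in RAM, discarded once it's been "played".

```python
CHUNK_SIZE = 64 * 1024  # arbitrary buffer size, like a read-ahead buffer

def play(chunk: bytes) -> None:
    # Stand-in for decoding and sending audio/video to the hardware.
    pass

with open("song.mp3", "rb") as media:
    while True:
        # A partial copy of the media lives briefly in this buffer...
        buffer = media.read(CHUNK_SIZE)
        if not buffer:
            break
        play(buffer)
        # ...and is discarded after it's been "seen".
```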

The act of copying for use is, of necessity, not in and of itself a violation of copyright, because it's a prerequisite for almost all uses of the covered material.

Oh, and getting the material from memory to screen? That involves sending data to the screen, which is also a copying process.

And, technically, due to persistence of vision, you've got a copy of everything you see stored in your optical physiology for at least 1/60 of a second, as the cells can't fire again for a certain amount of time after firing. Vision is literally based upon that neural copy from the rods and cones... So, at an absurd level, even direct vision of a tangible original in visual proximity is still an act of copying, since there's little evidence of neural processing in the eye itself - only recording and transmitting the stimulus in a different medium, itself (literally) a reflection of the tangible original.
 

Nice strawman, Umbran. Time to torch it. And go way overboard on the technicalities, since you've triggered Pedant Mode.

With the exception of physical proximity to a tangible original, ALL, and I do mean ALL, other forms of access require copying at some level.
This is also misleading. You are correct that the material in the model isn't a copy of the source material. Instead, it is a representation of the data, in the same way a map is a representation of some real-world relationships between places. Like a map, there is a process of selection and curation involved. This is why a topographical map is different to a subway map. Each model assigns different weights to the features or patterns present in the training set.
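
To put the map analogy in code terms, here's a toy example (every name in it is invented for illustration): the "model" keeps a few weighted summary statistics about an image's pixels, and the original can't be rebuilt from them - just as you can't reconstruct the terrain from a subway map.

```python
from statistics import mean

# Stand-in for an image: a list of (r, g, b) pixel tuples.
pixels = [(12, 200, 34), (15, 180, 40), (240, 10, 10), (250, 20, 5)]

# The "map": a handful of curated, weighted summary statistics.
representation = {
    "avg_red":   mean(p[0] for p in pixels),
    "avg_green": mean(p[1] for p in pixels),
    "avg_blue":  mean(p[2] for p in pixels),
    "warmth":    mean(p[0] - p[2] for p in pixels),  # a chosen "feature"
}

print(representation)
# Only these summary numbers survive; the original pixel values
# cannot be reconstructed from this representation.
```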

Given how contentious this topic is and how high emotions can be on both sides, it is very important to be precise in the technical description of how the models work. Umbran's position isn't a strawman argument, but reflects a common misconception. However, we can't move past the entrenched positions until both sides step down from their soapboxes and start communicating with one another respectfully.
 


There are actual lawsuits in progress about this very topic.

The fact is that the copyright status of these actions is currently being determined. And any of us who claims to know one way or the other is talking, if you'll pardon my French, complete bollox.

It's something for the courts to determine, not for randos to declare on the internet.
 

There are actual lawsuits in progress about this very topic.

The fact is that the copyright status of these actions is currently being determined. And any of us who claims to know one way or the other is talking, if you'll pardon my French, complete bollox.

It's something for the courts to determine, not for randos to declare on the internet.
That was my point earlier. I think it'll take several years for the legal system to sort out all the implications. But in the meantime, most publishers will continue to enforce a no-AI policy to reduce potential legal risk. It's not necessarily because they are implacably opposed to the technology (although some are). It's because they are businesses and don't want exposure to risk if they guess the future legal landscape wrong.
 

For now.

Once it starts learning for itself and/or becomes able to self-train from its own productions, look out.
I'd hardly call AI being able to steal intellectual property for itself progress.

And we've already seen what happens with self-training. AIs filling the internet with endless streams of ever more jumbled nonsense is not only not progress, it's a threat.

It's like when people kept claiming that "THE BLOCKCHAIN!" would completely change finance forever when in reality all it did was enable a lot of scams and help criminals move their money without being caught.
 

I suspect the problem is it doesn't really qualify as "plagiarism" in the traditional sense, as it's transformative - you're turning a large number of texts into relationships between tokens.
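
A toy version of that transformation (purely illustrative - real LLMs learn vastly richer relationships than bigram counts): a pile of texts becomes a table of which token follows which, and new text is sampled from those relationships rather than copied from any source.

```python
import random
from collections import defaultdict

texts = [
    "the wizard cast a spell",
    "the wizard read a scroll",
    "a dragon read the scroll",
]

# Transform the texts into relationships between tokens (bigram counts).
follows = defaultdict(list)
for text in texts:
    tokens = text.split()
    for a, b in zip(tokens, tokens[1:]):
        follows[a].append(b)

# Generate by speculative prediction, not by copying a source text.
word = "the"
output = [word]
for _ in range(4):
    word = random.choice(follows.get(word, ["."]))
    output.append(word)
print(" ".join(output))  # e.g. "the wizard read the scroll"
```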

Minor nitpick: Thiel's something of an outlier on the Silicon Valley right, as most companies don't like Brexit - it's a restriction on trade, after all. There are actually cleavages on the right between populist and ultracapitalist wings (for example, on immigration - compare Fox and the WSJ ed page), though I'm not sure this is worth going into here. I basically agree with everything else Prime_Evil says - they want to make money and don't care who they mess up on the way there. (I also agree it's a bit unfair to give people a hard time for not understanding what is, after all, a very technically complex process only made possible by exponentially increasing computing power.)

As Morrus says, the legality on this is still to be decided. And it'll have more to do with the opinions of the judges than any technical factors.

A personal note: I'm not an artist, so my livelihood hasn't been affected (...yet; the possibility of technological obsolescence was a major reason I've refused to upgrade my lifestyle, and it may have influenced my decision not to start a family, though I did not expect the day of obsolescence to come this soon). But I did have the fantasy of writing a novel once I had hoarded enough money to survive without working. Honestly, though, ChatGPT (and NovelAI) can write better than I can, so why bother? All this to say my sympathies are with the artists, though I don't know what to do - I think banning it is going to be extremely difficult given how many people have powerful computers and how widespread the knowledge of how to do it is.

Ashothero, the hour draws near.
Night falls.
 

