Prime_Evil
Adventurer
So, the generative AI itself does not contain an obvious copy. But, a copy is made, when the data is put in the training set. The process requires copying, even if the final AI doesn't contain that copy.
No...it doesn't make a copy. This statement is factually incorrect. Models catalogue information (kinda like metadata) about the material in the training set. They learn the characteristics or patterns present in that material. In supervised learning, labels are added to help the model classify the patterns it detects. In unsupervised learning, the model is exposed to unstructured data with no external context. Most systems include various evaluation models to help them detect patterns. The system examines the training set for patterns representative of each attribute the ML model must predict. With art, this might include things like color palette, medium, artist name, etc.
But there is no need to copy the original source material into the data set. There is often a preprocessing step where the raw data undergoes normalization or tokenization (to reduce duplication, etc.). The statistical inferences about patterns or characteristics are then used to train ML models to make predictions.
So when you tell a generative AI to give you a picture of a kitten in the style of Van Gogh, it is essentially making a speculative prediction about what such a thing might look like, based on what it "knows" about the patterns associated with each term in your request. It is mapping the input data in your request against a statistical model to generate diverse outputs.
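As a rough illustration of "learn the patterns, then predict from them", here is a toy Python sketch. It is nothing like a real image generator: the "training set" is just a list of made-up numbers, the "model" is two summary statistics, and the names fit and generate are hypothetical helpers invented for this example.

```python
import random
import statistics


def fit(samples):
    """Reduce a training set to two summary statistics (the 'model')."""
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
    }


def generate(model, n, seed=0):
    """Draw new values from the fitted distribution, not from the data."""
    rng = random.Random(seed)
    return [rng.gauss(model["mean"], model["stdev"]) for _ in range(n)]


# Made-up training data: 10,000 numbers drawn around 5.0 (an assumption
# for this sketch, standing in for any measurable feature).
data_rng = random.Random(42)
training_set = [data_rng.gauss(5.0, 2.0) for _ in range(10_000)]

model = fit(training_set)          # the training set is not stored here
new_samples = generate(model, 5)   # outputs come from the statistics alone

print(len(training_set), "training values ->", len(model), "stored parameters")
print(new_samples)
```

The generate function never sees the training set. It only has the mean and standard deviation, yet its outputs look like plausible members of the original data. Real generative models have vastly more parameters than two, so treat this as an analogy rather than a description of any actual system.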
It should be obvious that the source material in the training set is not "copied into" the model because of the size difference between the source material and the model. At no point is a copy of the source data ingested into the "final AI". Such raw data is useless to it.
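To put the size argument in numbers, here is a back-of-the-envelope calculation in Python. Every figure in it is an illustrative assumption I picked for the example (image count, average file size, parameter count, bytes per parameter), not a measurement of any particular model or data set.

```python
def bytes_per_training_item(num_params, bytes_per_param, num_items):
    """Average model storage available per training item, in bytes."""
    return num_params * bytes_per_param / num_items


# Illustrative assumptions only, not figures for any real system.
NUM_IMAGES = 2_000_000_000       # assumed number of training images
AVG_IMAGE_BYTES = 100_000        # assumed average size of one image (100 KB)
NUM_PARAMS = 1_000_000_000       # assumed number of model parameters
BYTES_PER_PARAM = 2              # assumed 16-bit weights

dataset_bytes = NUM_IMAGES * AVG_IMAGE_BYTES
model_bytes = NUM_PARAMS * BYTES_PER_PARAM
ratio = dataset_bytes / model_bytes
budget = bytes_per_training_item(NUM_PARAMS, BYTES_PER_PARAM, NUM_IMAGES)

print(f"Training set: {dataset_bytes / 1e12:.0f} TB")         # 200 TB
print(f"Model:        {model_bytes / 1e9:.0f} GB")            # 2 GB
print(f"Training set is {ratio:,.0f}x larger than the model")  # 100,000x
print(f"Model bytes per training image: {budget:.1f}")         # 1.0
```

Under these assumptions the model has about one byte of storage per training image, against roughly 100,000 bytes for the image itself. That is an average across the whole training set, so it shows why the model cannot hold the training set wholesale, and nothing more than that.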
There are good reasons to raise concerns about the rise of generative AI and its impact on the creative arts, but ignorance about how the technology works isn't helpful. It sheds more heat than light. I would argue the issue is not that the model contains copies of the original material, but that the statistical predictions made by the ML model are so damned accurate these days that they can produce works remarkably similar to material in the training set.
Is this copying in the traditional sense? Probably not. Is it plagiarism? Maybe and maybe not. I suppose it depends on whether you believe a map is the same as the territory it represents. If making a map is the same as "stealing" or "copying" the terrain it represents, then sure. The model is effectively a map of the patterns detected in the training set.
Does this mean that generative art is entirely OK? No. Its existence raises some real moral and practical issues. But these are NEW issues. We've never had a technology like this before in human history. So we are figuring this out for the first time. I don't want to ignore the genuine concerns of artists. Some of those concerns are real and valid. But misrepresenting how the technology works isn't helping them in the long term. It will be used by the tech bros to discredit their case.