EDIT: I found the quote I was trying to reply to
"It's still a plagiarism engine. Yes, ChatGPT copies text. That's what these programs do. Take from multiple sources, copy them, cobble them together, and spit out content."
That's, at best, a gross oversimplification and, at worst, flat-out wrong. The truth is, we don't exactly understand how LLMs work, and that makes them all the more frightening. These Large Language Models don't just rearrange words and regurgitate content. Not only are computer scientists not really sure how LLMs are capable of doing what they do, some even question whether LLMs "understand" what they are doing.
So I am with you in the sense that we need to put a hold on AI, but for a totally different reason that I will explain later. I don't think generative AI built on RNNs (Recurrent Neural Networks) or on NLP (Natural Language Processing) via Transformer models like BERT, LLaMA, or GPT is just a plagiarizer. I do believe these models "learn". Is it stealing for a human to study the works of the masters when learning how to paint? We humans learn by watching and studying others. Our styles are imprinted by those we have an affinity for. Are we all plagiarizers too?
If the argument is "they shouldn't have taken the data without the creators' consent", that's a bit hairier... but even then, it's no different from what humans do. Can you stop me from studying Van Gogh or Rembrandt to learn how to paint? From listening to Jimi Hendrix to learn how to play guitar? From imitating the dance moves of Michael Jackson?
These LLMs and generative AI are doing the same thing: learning. What makes them dangerous is that we don't know how they do what they do, the biases baked into the data they were trained on, and how realistic their output is, to the point that it can affect society (i.e., think deepfake news). Jobs have always been under threat from technology. This is just the first time in history that creatives and knowledge workers, and not just blue-collar types, have been affected.
About 4 months ago, a letter and petition was put out calling for a moratorium on new LLM training and research. Last I remember, it had over 12k signatories, some of them luminaries in data science, philosophy, and physics (one I recall sticking out was Max Tegmark). If you read it, the concern was that these LLMs are showing a lot of emergent behavior that can't really be explained. If any computer scientist tells you "LLMs aren't intelligent", they are full of it. We don't know how our own intelligence works, so how can they make the preposterous claim that these LLMs haven't achieved some kind of early AGI (Artificial General Intelligence)?
A hot area of research in machine learning is called explainability. Data scientists are scratching their heads over how some of these models work. In many ways, data science is a return to good old-fashioned empirical science: run experiments, observe the results, then try to come up with a hypothesis for how what happened, happened. Most science today runs the other way: you have a hypothesis, you design an experiment to test it, record the results, and compare them to your hypothesis. Machine learning inverts that. You start with data and try to learn what the "rules" are by testing out various statistical configurations (the models, or layers in deep learning).
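To give a flavor of what that research looks like in practice, here's a minimal sketch of one common explainability technique, permutation importance: shuffle one input feature at a time and see how much the model's score drops. The dataset and model below are arbitrary placeholders I picked for illustration, not anything specific to LLMs.

```python
# A minimal sketch of permutation importance: train a black-box model,
# then shuffle one feature at a time and measure how much the model's
# score drops. A big drop means the model was leaning on that feature.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
# Print the five features the model depends on most.
ranked = sorted(zip(X.columns, result.importances_mean),
                key=lambda pair: -pair[1])
for name, mean in ranked[:5]:
    print(f"{name}: {mean:.3f}")
```

Techniques like this tell you *which* inputs a model leans on, but for billion-parameter LLMs even that much insight is hard to come by, which is exactly why the field is scratching its head.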
In classic programming: rules + data => answers
In machine learning: data + answers => rules
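To make that contrast concrete, here's a toy sketch of my own (not from any particular source): the classic version hard-codes the Celsius-to-Fahrenheit rule, while the ML version recovers the same rule from example data with a simple line fit.

```python
import numpy as np

# Classic programming: rules + data => answers.
# The rule (Celsius -> Fahrenheit) is written by hand.
def c_to_f(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: data + answers => rules.
# Given example inputs and outputs, recover the rule. Here it's a
# least-squares line fit; deep learning does the same thing with
# millions of knobs instead of two.
celsius = np.array([-40.0, 0.0, 8.0, 15.0, 22.0, 38.0])
fahrenheit = c_to_f(celsius)  # the "answers"

slope, intercept = np.polyfit(celsius, fahrenheit, deg=1)
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")
# learned rule: F = 1.80 * C + 32.00 -- the model "figured out" the rule
```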
What machine learning is doing is figuring out "the rules" for how something works. Writing it off as plagiarism or regurgitation misses what's happening. It's figuring out patterns and relationships, and yes, the next most likely word (though in a way much, much more complicated than a simple Markov chain). Some of the tasks GPT-4 has been given are truly amazing to me, and they lit a fire under my ass: I need to learn how this stuff works or I'm going to be out of a job in the next 10 years.
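For comparison, here's my own sketch of what that "simple Markov chain" baseline looks like: count which word follows which in some text, then sample from those counts. An LLM conditions on vastly more context with billions of learned parameters; this is only the crudest point of reference.

```python
import random
from collections import defaultdict

# The naive next-word baseline: a first-order Markov chain. Count which
# word follows which in the training text, then sample from those counts
# to generate new text one word at a time.
text = ("the cat sat on the mat and the dog sat on the rug "
        "and the cat saw the dog")
words = text.split()

transitions = defaultdict(list)
for current, nxt in zip(words, words[1:]):
    transitions[current].append(nxt)

random.seed(0)
word = "the"
output = [word]
for _ in range(8):
    word = random.choice(transitions[word])  # pick a plausible next word
    output.append(word)
print(" ".join(output))
```

This only ever looks one word back. The leap from this to a Transformer attending over thousands of tokens is exactly why "it just predicts the next word" undersells what these models are doing.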