
Sarah Silverman leads class-action lawsuit against ChatGPT creator



DaedalusX51

Explorer
What makes them dangerous is that we don't know how they are doing what they are doing.
Let's be clear here that there are two reasons why this is being said. One is that they are focusing on results rather than documenting each evolution of the system, in order to get a working product out the door as fast as possible; and two, revealing the data the model was trained on and how it works would likely open them up to many lawsuits, since the data they used was not owned by them.
 

Maxperson

Morkus from Orkus
It never would work; someone always opens the box. Better to have the box out there in the open and develop the social and practical defences as needed.
Sure. I didn't say it could be put back into the bag. I just agreed that I'm one who, if he had his way, would have the tech buried so deeply that it would never see the light of day again.
 

RareBreed

Adventurer
Let's be clear here that there are two reasons why this is being said. One is that they are focusing on results rather than documenting each evolution of the system, in order to get a working product out the door as fast as possible; and two, revealing the data the model was trained on and how it works would likely open them up to many lawsuits, since the data they used was not owned by them.
No, this is not just a lack of documentation. The model layers have become so deep that the data scientists developing them can no longer explain how the program is able to solve the problem in the way that it does. Hence the emergent behavior in these LLMs. An example of this is that GPT-4 was not trained to perform arithmetic, but it can do it anyway.

That is what makes these LLMs scary. They have become so complex, with so many "moving parts", that while we have a general understanding, we can't explain them. It's not much different from how we know how the neurons in our brain fire and how they have limiters that can squelch the firing (in machine learning, the analog to this is the activation function, which takes a neuron's linear part, a dot product of inputs and learned weights plus a bias term, and passes it through a non-linearity, usually ReLU or some kind of sigmoid/softmax function). But even though we have a good idea how individual neurons work, we have zero clue how the mass interaction of them creates thoughts, emotions, memory, or our subjective sense of a "mind".
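To make that activation-function aside concrete, here's a minimal sketch of a single artificial neuron in plain Python/NumPy (illustrative only, not any particular framework's API; the weights and inputs are made up):

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: passes positive signal through, squelches negative signal
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=relu):
    # Linear part: dot product of inputs and weights, plus a bias term
    z = np.dot(w, x) + b
    # Non-linear part: the activation function decides how strongly the neuron "fires"
    return activation(z)

# Toy example
x = np.array([0.5, -1.2, 3.0])   # inputs from upstream neurons
w = np.array([0.8, 0.1, -0.4])   # learned weights
b = 0.2                          # learned bias
print(neuron(x, w, b, relu), neuron(x, w, b, sigmoid))
```

Swapping relu for sigmoid only changes how the neuron "fires"; the weighted-sum part stays the same, and stacking millions of these is where the inscrutability comes from.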

While it is true that the model architecture and the training data are proprietary, that's not the reason people can't figure out how they work. The scientists who created some of the new generation of Transformers don't even know how they are doing what they are doing. Some customers are demanding "explainability", and honestly, the developers don't know how it actually works; these models are literally called black boxes.
 

RareBreed

Adventurer
To be more accurate than my earlier post: some companies do purposefully obfuscate how their models work. But from what I have been reading, once you get past a million parameters (ChatGPT was over 100 million, and new LLMs are over a billion), the scientists basically can't explain how they work anymore. In some cases it's even fewer than a million. Unfortunately, the term "black box" conflates two different concepts (and they need not be mutually exclusive):

  • Purposeful obfuscation for proprietary reasons (or to hide biased data sets)
  • Scientists can't explain how the model can do what it does (e.g., emergent behavior)

 

DaedalusX51

Explorer
To be more accurate than my earlier post: some companies do purposefully obfuscate how their models work. But from what I have been reading, once you get past a million parameters (ChatGPT was over 100 million, and new LLMs are over a billion), the scientists basically can't explain how they work anymore. In some cases it's even fewer than a million. Unfortunately, the term "black box" conflates two different concepts (and they need not be mutually exclusive):

  • Purposeful obfuscation for proprietary reasons (or to hide biased data sets)
  • Scientists can't explain how the model can do what it does (e.g., emergent behavior)


They could know how they work, but they chose not to document and regression test after each new data point was included. It was their choice to do it this way so that they could get it done quicker. It would take a tremendous amount of time and they are more interested in beating their competitors to the market than the advancement of human knowledge. This emergent behavior is the same as in a human mind. We do not have the tools to build a functioning brain a neuron at a time, but we do have that ability with these algorithms. We could be discovering how the evolution of consciousness functions by watching a brain be built one neuron at a time, but they would rather try to make fat stacks of cash instead.
 

RareBreed

Adventurer
They could know how they work, but they chose not to document and regression test after each new data point was included. It was their choice to do it this way so that they could get it done quicker. It would take a tremendous amount of time and they are more interested in beating their competitors to the market than the advancement of human knowledge
Again, this is not true in all cases. In some cases, yes, for proprietary reasons, companies don't divulge the data used, the initial parameters, and/or the model architecture. But in other cases, we simply don't know how it works, only that it does, through experimentation. It's not just that data scientists won't tell you how their architecture works; it's that they are unable to tell you (even if they wanted to). This is all the more true once you start getting into the big leagues, with LLMs having millions of parameters trained on petabytes of data.

Also, traditional regression testing does not work for most machine learning predictions. Why? Because most QA done today relies on deterministic answers. For example, given input A, I always expect output B. Much of machine learning is really statistics on steroids.

At best, you can tell whether one model architecture is predicting better than another (or even the same model with tweaked initial weights, training epochs, learning rates, etc.). How best to QA-test machine learning is an active area of research.
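As a rough illustration of that difference (hypothetical models and data, not any specific QA framework): a traditional test asserts an exact output for a given input, while an ML "test" usually can only check aggregate behavior, e.g. that accuracy on held-out data clears a bar or beats a baseline.

```python
import numpy as np

# --- Traditional regression test: deterministic, exact expectation ---
def add(a, b):
    return a + b

assert add(2, 3) == 5  # given input A, we always expect output B

# --- ML-style evaluation: statistical, relative to a baseline ---
def accuracy(model, X, y):
    # Fraction of held-out examples the model predicts correctly
    preds = np.array([model(x) for x in X])
    return float((preds == y).mean())

# Hypothetical held-out data and two hypothetical "models"
rng = np.random.default_rng(0)
X_test = rng.normal(size=(200, 4))
y_test = (X_test.sum(axis=1) > 0).astype(int)

baseline = lambda x: 1                     # always predicts class 1
candidate = lambda x: int(x.sum() > 0.1)   # a noisy hand-written rule

# We can't assert any single prediction, only that the candidate
# beats the baseline in aggregate on held-out data.
assert accuracy(candidate, X_test, y_test) > accuracy(baseline, X_test, y_test)
```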

I do agree, however, that companies are rushing headlong into more training and not taking the time to truly understand how their models are working. That's why that petition was sent asking for a moratorium, to give the data science community time to develop better inference and explainability techniques. How many times do we have to play with matches, technologically speaking, without considering the "unforeseen consequences"?

We do not have the tools to build a functioning brain a neuron at a time, but we do have that ability with these algorithms. We could be discovering how the evolution of consciousness functions by watching a brain be built one neuron at a time, but they would rather try to make fat stacks of cash instead.
Sorry, but that's just not going to happen...at least not until we get quantum computers (then, probably). It's also questionable why we would need or want to "recreate" a human brain (it would be an imperfect model of our own brain, and may not be necessary for true AGI).

People don't know how much compute power it takes to train these models. Everyone thinks that Cloud Compute is infinite, but it isn't [link to a pdf]. People also usually don't talk about the gathering and cleaning of data for training, but that can also be prohibitively expensive (I have seen Spark cluster jobs that cost millions per week). In my experience at work, sometimes you simply can't get on-demand instances, and certain machine types (especially GPU instances) are in such high demand that spot instances are out of the question. So scaling up to human-brain levels of neural connections is simply not tractable with our current tech.

Quantum computers, on the other hand, thanks to the superposition of quantum bits, can act like massively parallel processors for the right kinds of problems. A single 64-qubit register is described by 2^64 complex amplitudes (that's 2 raised to the 64th power, which is huge), so in principle it explores an enormous state space at once, even though you only get one measurement out at the end. Granted, I'm not factoring in anything IO-bound (e.g., access to memory), but still. There's also renewed interest in analog computers due to certain advantages they have specifically for machine learning.
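Just to put a number on that 2^64 (a quick back-of-the-envelope, nothing more):

```python
n_qubits = 64
amplitudes = 2 ** n_qubits   # complex amplitudes needed to describe the register's state
print(f"{amplitudes:,}")     # 18,446,744,073,709,551,616 -- roughly 1.8e19
```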

Also, don't be fooled by the term "neural network" into assuming they really are like our own neurons. Some researchers, like the anesthesiologist Stuart Hameroff and the mathematician Sir Roger Penrose, think our neurons have a mechanism that operates, at least on some level, via quantum mechanics. As the eminent physicist Richard Feynman argued, our classical computers can simulate (albeit very slowly) everything a quantum computer can except one thing: nonlocality. If our brains work at some kind of quantum mechanical level, our classical computers won't be able to fully emulate our minds. That does not, however, mean they can't achieve their own form of intelligence. It just would not necessarily be like ours, even if we could have the same number of artificial perceptrons as our human brains have neurons.

This has been my huge bone of contention with other so-called Computer Science experts saying that LLMs are not AGI and don't "think", "understand" or have "true" intelligence. They are all comparing our computers to how our brains work, but 1) we don't know how our own intelligence works (we can't even properly define intelligence) and 2) AGI doesn't have to think like we do. For #2, jet airplanes don't have to flap their wings to fly like birds do, so why does AGI have to have the same kind of intelligence as our own mind/brain?

Lastly, the view that consciousness derives from the brain (i.e., is an epiphenomenon) is only one school of thought. The truth is, we don't know how consciousness is formed, though there are ideas aplenty. As I mentioned earlier, perhaps consciousness requires a quantum mechanical aspect...or maybe not. Would we even know that an AGI (whether built from LLMs or something else) is conscious? We can't detect it in ourselves, so how could we do it with machines?
 



People don't know how much compute power it takes to train these models. Everyone thinks that Cloud Compute is infinite, but it isn't [link to a pdf].
[attached image: the_cloud.png]

There's planned downtime every night when we turn on the Roomba and it runs over the cord
 
