How Generative AIs work

Generative AI has been much in the news, both in RPG-land and in the general media. I do a fair amount of work in this area, so I thought it might be helpful to give a worked example of how generative AI works. For text, the AI repeatedly tries to guess the next word in the sequence, and it keeps going until the special "<|end|>" token is its best guess, which means that it is done.
It has no idea of truth, falsehood, safety, appropriateness or tone. It just wants to give you a plausible next word; any qualities of that next word are determined by how it was trained and by the previous input. This is why, if you start being aggressive with an AI, or give it weird prompts, it gives aggressive or weird answers -- because that output best fits the input you have given it.
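To make that loop concrete, here's a minimal sketch in Python. The model() function is a made-up stand-in for the real neural network; all it's assumed to do is map the token sequence so far to a probability for each candidate next token:

```python
# Minimal sketch of autoregressive text generation. model() is a hypothetical
# stand-in for the real network: it takes the token sequence so far and returns
# a dict mapping each candidate next token to its probability.
def generate(prompt_tokens, model, end_token="<|end|>"):
    tokens = list(prompt_tokens)
    while True:
        probs = model(tokens)               # probabilities for the next token
        best = max(probs, key=probs.get)    # greedy: take the most likely one
        if best == end_token:               # "done" is just another token
            break
        tokens.append(best)                 # the choice becomes part of the input
    return tokens
```

This greedy version always takes the single most likely token; real systems usually sample instead, which is what the temperature parameter described below controls.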

However: commercial AIs have a "secret sauce" filter on top of this basic operation, which modifies the algorithm to make it more sane. I'm not covering that layer in this post.

So, let's ask an AI: What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush?

You and I might look at that list, go "huh, British pop/rock artists", generate a list of possibilities, and select one as an answer. That is not even slightly how GenAI works. Instead, it applies its (currently 7-70 billion or so) parameters to the words "What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush?" and comes up with a set of possibilities for the next word. Actually, not really a word, but a token, which can be part of a word, as we will see.

Things to note:
  • That is the complete description of what it does: choose a plausible next token from a sequence of input tokens.
  • From the set of possible next tokens, it chooses one at random, with probabilities proportional to how likely it thinks each token is.
  • You can control how random that choice is with the temperature parameter. At zero, it will always choose the most likely answer (and so becomes mostly deterministic). At values over 1 it gets a little wild (see the sketch after this list).
  • For each output token, it has to run the whole complex model -- all 70 billion or so parameters -- again. This is not cheap.
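Here's a rough sketch of how temperature-scaled sampling works, with the top-5 first-token probabilities from the worked example below plugged in as a test. This is a made-up setup: a real model scores its entire vocabulary, not just five candidates.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick one next token from {token: logit}, weighted by temperature-scaled probabilities."""
    if temperature == 0:
        # Temperature 0: always take the single most likely token (mostly deterministic).
        return max(logits, key=logits.get)
    # Softmax with temperature: dividing by a temperature above 1 flattens the
    # distribution, so unlikely tokens get picked more often ("a little wild").
    scaled = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    total = sum(scaled.values())
    weights = [v / total for v in scaled.values()]
    return random.choices(list(scaled), weights=weights, k=1)[0]

# The top-5 first-token probabilities from the worked example below. (They don't
# sum to 1 because the real model scores thousands of other candidates too.)
first_token = {"David": 0.2711, "The": 0.1912, "Mad": 0.1009, "No": 0.0973, "Led": 0.0501}
logits = {tok: math.log(p) for tok, p in first_token.items()}
print(sample_next_token(logits, temperature=1.2))  # usually "David", sometimes "Led"...
```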

-------

So, onto our worked example: What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush?

I'll set the temperature to 1.2 to get some extreme suggestions, and I'll ask for 5 different responses. For each response, I show the token chosen and its probability, together with the top 5 alternatives that were offered. Note that the possible responses all start from the same initial probabilities: David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%). That's because they all start with the same input token sequence; once the first token is chosen, they diverge.

Response 1: Not enough information given.
Not (2.29%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
enough (16.03%) -- possible (44.07%) • a (22.79%) • enough (16.03%) • able (2.26%) • sure (1.65%)
information (88.32%) -- information (88.32%) • context (6.28%) • data (3.24%) • info (1.63%) • details (0.18%)
given (80.25%) -- given (80.25%) • provided (17.45%) • to (1.12%) • available (0.85%) • for (0.21%)
. (93.24%) -- . (93.24%) • <|end|> (6.30%) • to (0.16%) • yet (0.07%) • here (0.06%)

Response 2: Led Zeppelin
Led (5.01%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Ze (99.99%) -- Ze (99.99%) • Z (0.01%) • ze (0.00%) • Ze (0.00%) • -Z (0.00%)
ppelin (100.00%) -- ppelin (100.00%) • pp (0.00%) • pl (0.00%) • ep (0.00%) • ppe (0.00%)

Response 3: The Beatles
The (19.12%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Beatles (61.99%) -- Beatles (61.99%) • Clash (24.43%) • Smith (3.23%) • Cure (3.03%) • Who (2.50%)

Response 4: The Velvet Underground
The (19.12%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Velvet (2.11%) -- Beatles (61.99%) • Clash (24.43%) • Smith (3.23%) • Cure (3.03%) • Who (2.50%)
Underground (99.99%) -- Underground (99.99%) • Under (0.01%) • underground (0.00%) • Und (0.00%) • (0.00%)

Response 5: David Bowie
David (27.11%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Bowie (100.00%) -- Bowie (100.00%) • Byrne (0.00%) • Bow (0.00%) • bow (0.00%) • (0.00%)
Response 5 is actually the most likely: 'David' has a 27% chance of being chosen (the highest), and then it's almost certain we'll go with "Bowie" rather than "Byrne" or the others.

One other thing to note: when the AI chooses "The" as the next token, it has no idea at that point what comes next. It's only AFTER we've added 'The' to our token sequence, making the new input "What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush? The", that it comes up with "Beatles (61.99%) • Clash (24.43%) • Smith (3.23%) • Cure (3.03%) • Who (2.50%)" and chooses "Beatles" (response 3) or "Velvet" (response 4) -- and that last one has a really low probability. If we lowered the temperature, we'd be hugely unlikely to ever see it chosen.
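If you want to peek at these distributions yourself, the Hugging Face transformers library makes it easy. A hedged sketch: "gpt2" here is just a small stand-in model, not the one used above, so the tokens and probabilities you see will differ.

```python
# Inspect the top-5 next-token probabilities for a prompt. Requires the
# "transformers" and "torch" packages; "gpt2" is a small stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("What is the next item in this sequence: "
          "The Sex Pistols, Rolling Stones, Kate Bush? The")
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token only
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([idx])!r}: {p:.2%}")
```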

---------

So, not necessarily RPG-focused, but I hope this post helps you understand a little about how this new tech works, how you can use it, and why it does weird things. If you want to post in this thread, please keep it focused on the technology rather than the ethics of AI. That is a hugely important discussion, but I'm hoping to keep this thread focused on how the tech works rather than on how we should manage it.
 


giant.robot

Adventurer
A small clarification -- you made this point, but I didn't find it clear in the description: when the GenAI is choosing the probability of the next word, it calculates that probability in relation to the current sequence of words, which includes the prompt. As the sequence changes, the probability of any given next token increases or decreases. The probable tokens after the word "The" at the beginning of a sentence are myriad; there are fewer good candidates for the sequence "The dog", and so on as the sequence grows.
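A toy illustration of that narrowing, with made-up numbers and no real model behind it:

```python
# Made-up numbers: the next-token distribution depends on the whole sequence so
# far, and it narrows as the sequence grows and pins down the context.
next_token_probs = {
    ("The",):       {"first": 0.04, "dog": 0.03, "best": 0.03, "world": 0.02},  # myriad weak options
    ("The", "dog"): {"barked": 0.22, "ran": 0.15, "is": 0.12, "wagged": 0.08},  # fewer, stronger ones
}
print(next_token_probs[("The",)])
print(next_token_probs[("The", "dog")])
```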

The training data is a bunch of plain-language sentences. So if you ask an AI about hobbits, the training data near the word "hobbit" likely had a bunch of statements about hobbits, and when the AI encounters the "hobbit" token, the tokens that appeared near "hobbit" in the training data become more probable and end up showing up in the output.

The AI doesn't actually "know" anything about hobbits. It pulls up tokens that were near the word "hobbit" in the training data. If you poisoned the training data and used the word "hobbit" to describe refrigerators, an AI trained on that poisoned data would give you stories of Maytag the Hobbit from the Home Depot Shire.
 


giant.robot

Adventurer
I was aware of these details. What I'm interested in seeing is reporting on how much electricity it costs to run these processes, as opposed to the energy cost of using a human mind.
That's only interesting if you account for all the electricity a human has ever used in order to draw a picture or write a paragraph. It takes less than a cent's worth of electricity for my laptop to generate a picture with Stable Diffusion. My power comes entirely from solar, so there are very few negative externalities for that fraction of a cent of electricity.

A data center running Stable Diffusion (or whatever) is even more efficient per image than my laptop. So...what do you care how much electricity is used? No one stole your electricity to do it. So long as whoever generated an image paid their bills...it's not your concern.
 

My understanding, which could admittedly be flawed, is that the current use of large-scale computing to generate images and text is less energy-efficient than human labor.

My concern is that we're getting mediocre outputs, and it's possible that even if green energy is being used for it, that's green energy not being used for a genuine need.
 

UngainlyTitan

Legend
Supporter
My understanding, which could admittedly be flawed, is that the current use of large-scale computing to generate images and text is less energy-efficient than human labor.

My concern is that we're getting mediocre outputs, and it's possible that even if green energy is being used for it, that's green energy not being used for a genuine need.
First off, this strikes me as a question about the social/economic utility of using AI, which is outside the scope of the thread. I am also not convinced that it is a good metric for judgement. One may as well ask about the utility of this discussion.
If we focus on the issues of resource consumption (power, CPU cycles, server utilization, etc.), perhaps it would be worth asking whether there is room for improvement in the computational efficiency of this process.
It has always struck me as a brute-force approach to having a computer hold a conversation.
I am also curious as to what the path is from a plausible-sounding chatbot to something that could give contextually relevant and meaningful answers.
Is this a matter of throwing more compute resources at the problem? Can different AI approaches be layered to improve things? What are the alternatives?
 

ichabod

Legend
I found this article, which seems to be a pretty good overview of the energy-usage issue. The brain apparently uses about 0.3 kWh per day. That's 100 times what a smartphone uses, so putting the two articles together, you could generate 100,000 AI images with the energy a brain uses in a day.

Edit: However, training the models takes way more energy than that, and we don't have good numbers on it.
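Taking those figures at face value, the arithmetic works out like this (a rough sketch only; real per-image costs vary widely by model and hardware):

```python
# Back-of-the-envelope check of the quoted figures (taken at face value).
brain_kwh_per_day = 0.3                                 # quoted estimate for a human brain
images_per_brain_day = 100_000                          # quoted claim
print(brain_kwh_per_day * 1000 / images_per_brain_day)  # 0.003 Wh per image
print(brain_kwh_per_day * 1000 / 100)                   # 3.0 Wh: "what a smartphone uses" per day
```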
 


giant.robot

Adventurer
Concern about the energy use of GenAI is disingenuous at best and insipid at worst. Computers use electricity. The CPU cycles, and thus power, used to generate this forum post, transfer it over the Internet, and render it on your screen are no more useful than the CPU/GPU cycles used to generate a paragraph of text or an image. It's a thing you chose to do with the electricity you paid for, both directly and by proxy of paying your utility and ISP bills. I don't get to dictate that maybe you shouldn't "waste" electricity by posting. Nor do I get to say watching a movie on your 50" 4K TV is a "waste" of electricity. Idling in Fortnite "wastes" far more electricity rendering a bunch of useless frames on your GPU than the handful of seconds it takes for Stable Diffusion to generate an image of a cat wearing a suit of armor. You paid your utility bill; I don't get to dictate what you do with it.

If you're worried about electricity use and pollution, worry more about smelting bauxite ore using coal power or other inefficient industrial processes. Eat less red meat to require fewer cows, which produce literal shitloads of methane, a greenhouse gas roughly 40x worse than CO2. Be more concerned about people running Bitcoin miners buying bulk electricity from coal power plants because it's cheap. Vote for green-energy initiatives in your state/country so that any electricity use is greener and pollutes less than coal and oil. Stop NIMBYs from opposing wind and solar projects. Do those things before getting your panties in a bunch because someone used a computer to make a picture.

Comparing the electricity use of training or inference of AI models to a human just doesn't make any naughty word sense. All of the training data for AIs already exists; the electricity used to generate it is a sunk cost, and you're not going to somehow get that power back. The actual training process doesn't use all that much electricity compared to a lot of other things. You can't train a large model in your garage on a single GPU, but it can be done by renting capacity from a cloud provider, which also already exists, is a sunk cost, and would be used for something else if not for training an AI model. Microsoft's investment in OpenAI comes mostly in the form of Azure credits that let them do training and inference on Azure hardware that would normally cost many millions of dollars at retail rates. But again, those data centers and that equipment already exist and are sunk costs. They'd be used for something else when not training an AI model.

The power a human takes to draw an image is never going to be less than an inference takes. My laptop can run Stable Diffusion locally and generate an image in a few seconds. The utility company wouldn't even be able to bill me for such little electricity usage. It's far less power than if I were drawing in Procreate. Even if I were a super-artist who could sketch a great picture of a cat in a suit of armor in a few minutes, I'd be using orders of magnitude more electricity than running Stable Diffusion on the same machine.
 

