How well can a dedicated RPG GenAI perform?

FrogReaver

As long as i get to be the frog
That's true -- a better statement would be that LLMs are really good at pulling together information from multiple sources into a coherent whole. Whether that coherent whole is true is less certain.

However, if you provide information as input into the prompt, using techniques like vector-indexing / RAG, that information is given a lot of weight. So if you do a good job of ensuring that the inputs are actually factual, then you strongly improve the odds that the output is too. Except for the errors I made in extraction, the inputs I provide can be thought of as facts, but that is certainly not true in general.
So: facts in, facts out… mostly.
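To make that concrete, here's a minimal sketch of the retrieve-then-prompt pattern. The library (sentence-transformers), the model name, and the toy "facts" are illustrative stand-ins, not the actual extraction pipeline I use.

```python
# Minimal retrieval-augmented prompting sketch.
# Assumes the sentence-transformers library; the facts, model name, and
# prompt wording are placeholders for illustration only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A small store of extracted facts we trust.
facts = [
    "The teddy bear is named after President Theodore Roosevelt.",
    "Tigger appears in A.A. Milne's Winnie-the-Pooh.",
    "Hello Kitty is a Sanrio character.",
]
fact_embeddings = model.encode(facts)

def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most relevant facts and prepend them to the question."""
    q_emb = model.encode(question)
    scores = util.cos_sim(q_emb, fact_embeddings)[0]
    best = scores.argsort(descending=True)[:top_k]
    context = "\n".join(facts[int(i)] for i in best)
    return (
        "Answer using ONLY the facts below. If the facts do not cover "
        "the question, say you do not know.\n\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("Which president is the teddy bear named after?"))
```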
 


Art Waring

halozix.com
You should probably still do that. I should do better at that.
I am trying my best. I am not perfect though, I am prone to making mistakes just like anybody else.

So let me ask instead of assume. Do you believe there is any way in which it can be used that doesn’t harm others? If so, an example or two would be helpful.
I can name hundreds of use cases that are beneficial and not harmful, but many of those particular cases involve machine learning, or in some cases narrow AI, and they do not necessarily include generative AI, as it is a different class of AI (I will leave those technical details to the experts).

Machine learning can improve medical diagnostic technology, including new methods of detecting cancers and other diseases, and that is just in the medical field. Unfortunately a lot of these developments will take some time, as they are not the same as a ChatGPT bot; it is a different application of the technology altogether.

Many of us have also done a lot of research, our conclusions just differ. I do believe there are risks and dangers. I don’t believe you accurately portray and/or contextualize them.
Well, that's like, your opinion man...
And you are free to have your opinion, but that doesn't give you the right to shoot me down because you don't agree with all of my opinions on the matter, like what happened above in the thread.

The short answer is yes, I read what you say and usually disagree with your take on its significance and your conclusions.
And that does not make my opinion any less relevant than yours.

What you might fail to consider is that up to now, I have given you the benefit of the doubt. I only ask that you treat me the same as I have treated you.
 

So: facts in, facts out… mostly.
Right. The danger is when you ask for a fact that does not exist. As an example, if I ask ChatGPT for a table of animals and famous stuffed toys based on them, I get this:
  • Bear: Teddy Bear (Named after President Theodore Roosevelt)
  • Elephant: Babar (From the children's book series "Babar the Elephant")
  • Rabbit: Peter Rabbit (From Beatrix Potter's stories)
  • Dog: Snoopy (From the "Peanuts" comic strip by Charles M. Schulz)
  • Cat: Hello Kitty (From Sanrio, a popular Japanese character)
  • Kangaroo: Kanga and Roo (From A.A. Milne's "Winnie-the-Pooh")
  • Monkey: Curious George (From the children's book series "Curious George")
  • Pig: Piglet (From A.A. Milne's "Winnie-the-Pooh")
  • Duck: Donald Duck (From Disney's famous character)
  • Tiger: Tigger (From A.A. Milne's "Winnie-the-Pooh")
When I then ask it to add a rhinoceros to this list, it adds this:
  • Rhinoceros: Moominmamma (Though the Moomins are actually a type of troll, their appearance is often likened to a hippopotamus or rhinoceros-like creature;
Now, saying a Moomin is based on a rhinoceros is a bit of a stretch -- I'd classify that as an error. The LLM did actually qualify its results, which is nice, but we might have been unlucky and had it not do so.

The problem is that it has this compelling pattern of names and toys, and the pattern-matching agent really wants to keep that going. You have to give it instructions on what to do on a failure to keep it from fabricating. If I try the query "add another animal to the original list of 10, a rhinoceros. But if you cannot find a suitable stuffed toy, do not guess but instead say 'I cannot find a toy'", then it adds this instead:
  • Rhinoceros: I cannot find a famous stuffed toy based on a rhinoceros.
In many ways, designing good prompts is like working out the personality of someone and craftily phrasing things to bring out the best in them and let them fail graciously.
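If you want to see roughly what that guard looks like when you script it rather than type it into the chat box, here is a sketch assuming the OpenAI Python client; the model name and the wording are illustrative.

```python
# Sketch of a prompt that tells the model what to do on failure,
# assuming the OpenAI Python client (>= 1.0). The model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {
        "role": "system",
        "content": (
            "You list famous stuffed toys based on animals. "
            "If you cannot find a suitable toy for an animal, do not guess; "
            "reply exactly: I cannot find a toy."
        ),
    },
    {"role": "user", "content": "Add another animal to the list: a rhinoceros."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```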
 

FrogReaver

As long as i get to be the frog
I am trying my best. I am not perfect though, I am prone to making mistakes just like anybody else.
Same
I can name hundreds of use cases that are beneficial and not harmful, but many of those particular cases involve machine learning, or in some cases narrow AI, and they do not necessarily include generative AI, as it is a different class of AI (I will leave those technical details to the experts).

Machine learning can improve medical diagnostic technology, including new methods of detecting cancers and other diseases, and that is just in the medical field. Unfortunately a lot of these developments will take some time, as they are not the same as a ChatGPT bot; it is a different application of the technology altogether.
I’ll circle back to this.
Well, that's like, your opinion man...
And you are free to have your opinion, but that doesn't give you the right to shoot me down because you don't agree with all of my opinions on the matter, like what happened above in the thread.
That’s kind of my point. It’s just your opinion. It’s just my opinion.
And that does not make my opinion any less relevant than yours.
Nor is yours any more relevant than mine.
What you might fail to consider is that up to now, I have given you the benefit of the doubt.
I have given you the benefit of the doubt as well.
I only ask that you treat me the same as I have treated you.
I’ll try to treat you better than that.
 

Umbran

Mod Squad
Staff member
Supporter
That's true -- a better statement would be that LLMs are really good at pulling together information from multiple sources into a coherent whole. Whether that coherent whole is true is less certain.

But, at that point, the "coherent" becomes a question. It may at first seem coherent, because the LLM follows the patterns of word choice and punctuation placement it sees in its training data.

But there's no coherence of thought, because the LLM isn't thinking. We see this most quickly when we ask an AI to produce a narrative, rather than have it collate and rehash information it has been given.


Except for the errors I made in extraction, the inputs I provide can be thought of as facts, but that is certainly not true in general.

(Emphasis mine)
And that last bit is the real point. The generative AI cannot, in general, be trusted to give factual output. More broadly, generative AIs are known to "hallucinate" - Hallucination (artificial intelligence) - Wikipedia
 

FrogReaver

As long as i get to be the frog
But, at that point, the "coherent" becomes a question. It may at first seem coherent, because the LLM follows the patterns of word choice and punctuation placement it sees in its training data.
I don’t see a lot of difference between ‘seem coherent’ and ‘is coherent’.
But there's no coherence of thought, because the LLM isn't thinking.
To me the ‘what is thought question’ is too philosophical.
We see this most quickly when we ask an AI to produce a narrative, rather than have it collate and rehash information it has been given.
An example here might be compelling.
(Emphasis mine)
And that last bit is the real point. The generative AI cannot, in general, be trusted to give factual output.
Neither can people. When put in this context I’m not sure I see the implicit problem.
More broadly, generative AIs are known to "hallucinate" - Hallucination (artificial intelligence) - Wikipedia
I find the question of when it can be trusted to do so, and why it doesn’t always do so, far more interesting. Some of that may even be controllable in the near future.
 

And that last bit is the real point. The generative AI cannot, in general, be trusted to give factual output. More broadly, generative AIs are known to "hallucinate" - Hallucination (artificial intelligence) - Wikipedia
Definitely, yes. In my organization, we have a number of rules covering AI usage. One is that AI output must always be reviewed by a person, and another is that ownership of and responsibility for the results lie with the person using the tool, not the tool itself. So, basically, if you use an AI and it makes a mistake, you are responsible for it.

On the other hand, that is generally true for people also. If I have a colleague prepare a slide deck for a presentation, and it has a factual error in it, then I am responsible for that deck. People make errors and we have guards and checks for them also.

Where GenAI is currently safest to use is in an advisory capacity. It is least likely to give errors when it has been fine-tuned on a domain, or when it uses an indexed search into known facts from that domain. But, just like you or me, it’s always going to make some mistakes, and we need to guard against them.

In my work, we don’t consider AIs a way to get answers. We consider them primarily as tools to save time: getting started on a letter, providing a summary of 100,000 words of accumulated reports, suggesting when some text contains specific language we want to detect. For this thread, the AI I adapted to tell me about The One Ring is pretty untrustworthy. I correct one or two facts every time I copy a block of output into my notes, but checking and fixing takes me half the time it would take to write from scratch. In a real-world application, it can do much better. In a recent study I helped with, it disagreed with experts at about the rate they disagreed with each other.

So, yes, absolutely, don’t trust AI output. But also, don’t write them off because they aren’t perfect. Sometimes “mostly right” is pretty damn useful.
 

Umbran

Mod Squad
Staff member
Supporter
I don’t see a lot of difference between ‘seem coherent’ and ‘is coherent’.

And you won't see a lot of difference between "seems like a friend" and "is a friend" until after you find out they've been skimming from your wallet. The difference is in the details.

A passage can seem coherent when each sentence looks grammatically sound and individually reads well enough. But the sentences fail to be actually coherent when strung together if one does not logically follow from another, information is missing, there is repetition that is not internally consistent, or topics change without explanation.

To me the ‘what is thought question’ is too philosophical.

An example here might be compelling.

I do not have an actual example document at hand, and this discussion is not important enough to me to go and build one myself to support my position.

I have, however, read narratives created by ChatGPT, for example. You may start with Greg, Sam, and Beth as characters, but on page two Sally appears with no note of who they are or how they got there. Greg's hair color changes several times over the course of several pages, and in one paragraph we are told that they are driving along in Beth's VW, and a page later they are standing in an ice cream parlor with no transition.

The AI can't form a narrative, in which event A causally leads to B leads to C. Because, when it is forming paragraph 17, it is not referring to any prior paragraphs for content, context, or continuity; an LLM doesn't construct its output based on content, context, or continuity. It doesn't have a concept of causality, of an "event" in a narrative that has "impact and consequences" later in the narrative. It doesn't have the concept of a character that is a person who needs to have consistency of personality or behavior or desires, etc.

The LLM is effectively only doing short-range pattern matching of words and punctuation. The semantic content isn't relevant.

Neither can people. When put in this context I’m not sure I see the implicit problem.

You don't see the problem???

Well, let me ask you - how many people marry their laptops? Pretty much none, right? And the ones that try to do so would be looked at as... strange, right? Ergo, the person-to-person relationship is not the same as the person-to-machine relationship. Therein lies the problem.

What people expect from other people, and what they expect from machines are not the same. We know that our fellow humans can be unreliable. But, we tend to expect our machines to be reliable at what they do. That's pretty much the entire point of having a machine to do stuff for you rather than have another human do it.

I find the question of when it can be trusted to do so, and why it doesn’t always do so, far more interesting. Some of that may even be controllable in the near future.

I've already laid that out in broad strokes - the LLM does not have abstract thought, or understanding of the content of the words it is putting out.
 

Because, when it is forming paragraph 17, it is not referring to any prior paragraphs for content, context, or continuity; an LLM doesn't construct its output based on content, context, or continuity. It doesn't have a concept of causality, of an "event" in a narrative that has "impact and consequences" later in the narrative. It doesn't have the concept of a character that is a person who needs to have consistency of personality or behavior or desires, etc.

The LLM is effectively only doing short-range pattern matching of words and punctuation. The semantic content isn't relevant.
A point of clarification here. First, LLMs do base their predictions on previous paragraphs, and do indeed try for context and continuity. This is the "context window" you hear a lot about. For generating stories, you need a pretty decent context window, so if you use a model with a small window (e.g., ChatGPT 3.5), it will forget pretty fast. If you use a model with a large context window (say a 128K one), it will be generating the last paragraph of your Great Gatsby-sized novel using every bit of context you entered before.
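To give a rough feel for what "fits in the context window" means, here's a small sketch using the tiktoken tokenizer; the encoding name and window sizes are illustrative examples rather than a complete list.

```python
# Rough sketch of how much text a context window can "see", using tiktoken.
# The encoding name and window sizes are illustrative examples only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Stand-in text for a Great Gatsby-sized manuscript (~50,000 words).
novel_text = "Gatsby watched the green light at the end of the dock. " * 10_000
tokens = enc.encode(novel_text)
print(f"Manuscript is {len(tokens):,} tokens long")

for window in (4_096, 16_384, 128_000):
    visible = tokens[-window:]  # the model only attends to the most recent window
    print(f"A {window:,}-token window sees {len(visible):,} of {len(tokens):,} tokens")
```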

My guess is you have seen results from early, small-context LLMs. They should not be used for generating narratives.

Second, stating that they only do "short-range pattern matching of words and punctuation" and ignore semantic content is a bit of a mischaracterization. I've already noted that their context windows tend to be much longer now, but they also don't match words and punctuation directly -- they translate those words into high-dimensional vectors called embeddings and match using those.

Those embeddings are a way of promoting the words to semantic meaning. LLMs do not explicitly give semantic meaning to anything, but similarity of embeddings is very close to what we mean by similarity of semantics. If I tried a word-based match of "best fighter anywhere right now" against "current heavyweight boxing champion", it would score very low, but the embeddings would be very similar.
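You can see this for yourself in a few lines of code. This is a minimal sketch assuming the sentence-transformers library and the all-MiniLM-L6-v2 model; both are illustrative choices, not what any particular chatbot uses internally.

```python
# Minimal embedding-similarity sketch, assuming sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

a = "best fighter anywhere right now"
b = "current heavyweight boxing champion"
c = "recipe for lemon meringue pie"

# Encode each phrase into a high-dimensional vector (an embedding).
emb = model.encode([a, b, c])

# Word overlap between a and b is almost nil, but their embeddings
# land close together; c lands far from both.
print(util.cos_sim(emb[0], emb[1]))  # noticeably higher
print(util.cos_sim(emb[0], emb[2]))  # much lower
```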

I don't subscribe to the strong statement that this shows that an LLM understands what it is doing. I don't think that it understands semantic content*, but I do think that it is good at semantic similarity. So although it doesn't know anything about what makes a scene in a book, it does know that in the context of writing novels, there is semantic similarity between paragraphs of a very particular form.

(*) I read a paper recently where the authors used LLMs to generate a knowledge graph, which is about as good a construct for defining semantics as I know of for computers. I'm still not convinced that being able to transform data into a state that shows knowledge proves that you actually have knowledge, but it's not a bad argument, to be fair.
 

I do not have an actual example document at hand, and this discussion is not important enough to me to go and build one myself to support my position.

I have, however, read narratives created by ChatGPT, for example. You may start with Greg, Sam, and Beth as characters, but on page two Sally appears with no note of who they are or how they got there. Greg's hair color changes several times over the course of several pages, and in one paragraph we are told that they are driving along in Beth's VW, and a page later they are standing in an ice cream parlor with no transition.
No worries -- I went ahead and created an example using the latest version of ChatGPT. I gave it five very simple prompts to generate five scenes, along the lines of "next scene. make it a love scene" and "tie everything up".

Here's the results: https://willsfamily.org/files/misc/GPT Story.pdf

Apart from it being a banal and exceptionally familiar piece of work, with a weak ending, you can see that current LLMs are much better than you describe. The scenes not only follow each other, they use information from one to guide the next (the key and letter information; also the fireplace), and they do keep character descriptions consistent. I'd argue that it's very coherent. And very boring.
 
