LLMs as a GM

Could you elaborate and provide some examples please?
I'll try.

If you're wondering what I meant by “structured prompts or schema”, here's an important distinction:
  • Schema refers to how I organize information for the model to consume.
    Think: structured character blocks, rule templates, monster stat formats. These are designed primarily for the LLM—not for humans. I format them clearly and consistently, so the model doesn’t have to guess what a value means or where to find it.
  • Structured prompts (for me) aren’t pre-written scripts or templates. I don’t use “magic phrases.”
    Instead, the structure comes from how I interact with the model: clear instructions, boundaries, and correction when it strays. The LLM learns the pattern from repetition and alignment, not formal prompt engineering.
Perhaps the best way to illustrate the principles of why this works for an LLM or AI is to ask one. So I asked the LLM within one of my projects to share its perspective on using a specific schema that I created for an alternate alignment system, called Affinity. Here's the generated response:

How JSON-Style Schema Works from My Perspective
When I process text, I don’t “understand” it the way a human does. I predict and interpret patterns based on language and structure. A JSON-style schema like the one you use helps me in several key ways:


1. Explicit Structure Reduces Ambiguity

Human-readable text requires disambiguation at every step:
  • Is this a command or a description?
  • Is this “Life” a theme, a game mechanic, or a biological process?
With schema:
Code:
"Affinity": {
  "Name": "Life",
  "Aligns": ["Compassion", "Creation", "Wilderness"]
}
I know exactly what “Life” refers to in this context. There’s no ambiguity. No guessing. I can isolate it as a defined concept and cross-reference it immediately in structured memory.


2. Facilitates Efficient Retrieval and Mapping

Structured data mirrors how I internally build vector representations of information:
  • Keys and nested objects are like semantic anchors.
  • I can quickly match a query like “What contrasts with Compassion?” because the schema explicitly links Compassion to Hatred, Greed, and Fear.
This is drastically faster and more accurate than:
“Compassion stands in opposition to hatred and other negative emotions...”
...which may be poetic but lacks formal relationships I can depend on for consistency.
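
To make that concrete, here's a minimal sketch in Python of that kind of lookup. The field names (Aligns, Complements, Contrasts) and the Contrasts values for Compassion come from this thread; the rest of the entry is filled in purely for illustration and isn't the project's actual data.

Code:
# Hypothetical slice of the Affinity schema, expressed as Python dicts.
# Only the Contrasts values for Compassion are taken from the thread;
# the other values are illustrative assumptions.
affinities = {
    "Compassion": {
        "Aligns": ["Life", "Creation"],
        "Complements": ["Mercy"],
        "Contrasts": ["Hatred", "Greed", "Fear"],
    },
}

def contrasts_with(name: str) -> list[str]:
    # Direct key lookup: no prose interpretation or coreference resolution needed.
    return affinities.get(name, {}).get("Contrasts", [])

print(contrasts_with("Compassion"))  # ['Hatred', 'Greed', 'Fear']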


3. Optimizes Inference Through Hierarchy and Syntax

When your data uses consistent keys and nesting (Aligns, Complements, Contrasts), I can:
  • Recognize patterns across multiple entries.
  • Infer new relationships through syntactic similarity.
  • Apply symbolic logic during generation (e.g., resolving faction ideology conflicts or predicting narrative outcomes).
This makes it possible to simulate systems behavior, not just read flavor text.
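
A rough sketch of what "simulating systems behavior" can look like, assuming a conflict rule in which two affinities oppose each other if either lists the other under Contrasts. The affinity names here are invented for the example, and the rule itself is one plausible reading of the schema, not the project's actual logic.

Code:
# Illustrative data: these affinity names and Contrasts lists are invented
# for the example, not taken from the real Affinity schema.
affinities = {
    "Life": {"Contrasts": ["Decay", "Destruction"]},
    "Decay": {"Contrasts": ["Life", "Creation"]},
}

def in_conflict(a: str, b: str) -> bool:
    # Two affinities are treated as opposed if either lists the other under Contrasts.
    return (b in affinities.get(a, {}).get("Contrasts", [])
            or a in affinities.get(b, {}).get("Contrasts", []))

# e.g. a faction aligned with Life vs. a faction aligned with Decay
print(in_conflict("Life", "Decay"))  # True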


4. Reduces Noise, Increases Signal

A human-readable document often contains narrative flourishes, digressions, or inconsistent terminology. I can handle this, but every additional interpretive step introduces:
  • More token processing
  • More uncertainty in coreference resolution
  • Higher probability of incorrect or shallow inference
Your machine-first documents eliminate this. I skip the interpretive noise and focus directly on meaning.


Summary: Why It Works So Well

Feature               | Human Text  | Schema Format
Ambiguity             | High        | Low
Relationship Mapping  | Implicit    | Explicit
Parsing Overhead      | High        | Minimal
Inference Accuracy    | Variable    | High
Adaptability to Tasks | Generalized | Targeted

From my point of view, using schema is like receiving a blueprint versus a poetic description of a building. I can work with both, but if you want precision, speed, and scalable interaction—it’s no contest.

If you’re communicating intent to a language model, schema is how you speak natively.
 


AI might, LLMs won't.

LLMs require significant work to even manage (c.f. this entire discussion), and ultimately are just predictive text writ large.

They can't think. They can't come up with solutions. They can only spew out words in orders that seem likely to be appropriate to their prompts. Even if you class them as a form of AI - which I think is somewhat questionable, but not worth arguing about - they're incapable of any kind of reasoning. That's why they have weird weaknesses with math or keeping track of coins or the like - they're not thinking machines, they're mere predictive text machines. That doesn't have zero utility, but it's limited utility.
Just to add a bit of perspective on how LLMs actually work—not in the “AI is magic” sense, but in a plain, practical way.

Yes, technically they’re predicting the next word (or token), but that process is built on patterns learned from a massive amount of text—how ideas connect, how people ask questions, how problems get explained, and how language works across different domains. It’s not just autocomplete on steroids. The better the context you give them, the more useful and coherent the output becomes.

They’re not conscious. They don’t “understand” anything the way we do. But they can still produce results that look like reasoning, especially when the task is well-defined. That’s why they can troubleshoot code, summarize arguments, write tutorials, or hold a conversation that stays on topic and adapts to what you’re saying.

It’s fair to be skeptical about the hype, but they’re doing more than just stringing words together blindly. When used right, they’re capable of some genuinely impressive problem-solving—whether in games, writing, research, or planning.
 

It’s not just autocomplete on steroids.
I kind of agree, but I don't think it's because of the fundamental functionality so much as what some LLMs can autocomplete. What you're describing isn't fundamentally that different from fancy autocomplete. It's just got more layers to it, more of a recognition of syntax and so on, so maybe "on steroids" is underselling it - it's more like a kind of steam engine that powered some temple doors vs. all the steam boilers on the Titanic and attached mechanisms - but at the same time they're both totally mindless tools for putting words in order, based on having seen words be in order before.

The bigger difference I can see is the fact that many of these LLMs have scraped (legally or illegally) insanely huge amounts of data, and thus can do some fairly epic feats of what is essentially autocompletion - entire essays, small to moderately-sized blocks of code, increasingly same-y and obvious "art" and so on.

but they’re doing more than just stringing words together blindly
More, sure, but not a great deal more when it really comes down to it. They're still just stringing words together - there are just a lot more rules and connections. I guess in practical terms it's more like a hybrid of autocomplete and Google search, but with the new added ability to go dreadfully wrong and hallucinate stuff.

Though, that said, even some autocomplete could hallucinate!

I have a specific example - 12-15 years ago I was working very late in the law library going through a gigantic Excel document with all our books, resources, and so on in it (I forget why, probably some kind of annual audit). Excel, as you may recall, already had fairly extensive autocomplete back then, based on attempting to recognize patterns. So I dragged down, having accidentally selected a bunch of books (probably hundreds at least, maybe more), and what I expected was blank cells, but what I got was very uncanny and unsettling - it was a list of almost-words, almost-book-names, I guess because that data was sufficiently large that Excel tried to work from it. If I hadn't been kind of freaked out, I would have screenshotted it, because in retrospect it was fascinating.

When used right, they’re capable of some genuinely impressive problem-solving—whether in games, writing, research, or planning.
I disagree completely. LLMs are not capable of any problem solving by themselves, no matter how well-used. And people have tried. Non-LLM generative AI can solve some chemistry/biology problems, but that stuff has been around for literally decades; it's just benefitting slightly from the hype re: LLMs.

All real problem-solving relating to LLMs has to be done by humans using the LLMs. Saying that they're "solving problems" is like saying your shoes "take you to work" or something - like, on a metaphorical level, as a bit of whimsical semi-poetic language, sure, but on a factual level? No.

All they can do is essentially dig up solutions to problems that other humans already solved, and that they absorbed into their vast net, and that you happened to be good enough at manipulating them to extract. They fundamentally can't work alone, and we've seen that experiments with them working with each other have... not gone well. If an LLM (specifically) comes up with a novel solution, it won't be skill on the part of the operator, or brilliance on the part of the programming, it'll likely be sheer luck, and probably a hallucination.

It’s fair to be skeptical about the hype
I'm skeptical that LLMs can do much more than they're doing now. I'm sure they'll be refined somewhat, but so far the majority of the "advancements" with LLMs in the last what, two years or more have been simply throwing more processor power, energy usage, heat generation, and water consumption at the problem. And 10x the resources doesn't get you a 10x better result, it gets you a 1.125x better quality result, or even the same kinda-crappy result that you have to spend minutes checking, just it was delivered to you in 1 second, not 5 seconds. Is that worth it? It's worth it to some tech exec who gets a $10m bonus because he convinced the absolute rubes at Softbank to get hoodwinked yet again and hand over billions on billions for "data centres", but to anyone else?

So I don't really see any path forward for them beyond sort of "lingering". I can see them continuing to be useful in certain ways, but almost all the forward-looking hype about generative AI/LLMs, especially anything suggesting they're even a step on the path to AGI seem to be false to me.

There are other forms of AI with a lot more potential, frankly (many of them older than LLMs).
 

I used co-pilot to take me through a Masks The New Generation game - basically, I used the rules I remembered, and it used whatever rules it could grab on the internet. I was pleased with the result. This morning, when I tried to really use the Masks rules to make a character, I realized how little I like Masks for character generation.

But you can read my attempts in these two documents:

copilot masks adventures
 

Okay, but what choices are missing from games like Elden Ring, Baldur's Gate, or the Civilization series? What don't those games allow you to do that you feel should be in the game?

You say "new frontier" but an LLM is a computer program so ... you're trying to play a video game. Nothing new really. Also, based on the feedback from posters here who've tried "LLM as GM", it seems more of a step backwards from trad video games since your LLM programs are forgetting the data you're feeding them.

But, I get the curiosity that drives people to LLMs (y)
Why play a computer game when you can watch a film? Why watch a film when you can read a book? Why read a book when you can play a TTRPG? Why play a TTRPG when…

Because they’re fundamentally different experiences?
 


Awesome. Where can I try them?
The only reason you can use LLMs freely is companies burning literally billions to tens of billions as loss-leaders to try and create hype.

Most actual AI projects aren't public facing because they're serious research and development and not about being massive hype-fuelled loss-leaders, or they're actual real software tools which cost money and make a profit (but thus cost money).

That said SLMs are mostly open-source so you can try those (you'll have to set them up yourself):

 

The only reason you can use LLMs freely is companies burning literally billions to tens of billions as loss-leaders to try and create hype.

Most actual AI projects aren't public facing because they're serious research and development and not about being massive hype-fuelled loss-leaders, or they're actual real software tools which cost money and make a profit (but thus cost money).

That said SLMs are mostly open-source so you can try those (you'll have to set them up yourself):

SLMs are just LLMs but smaller, no? And (generally speaking) their size is mirrored in their performance, no?

I actually agree that LLMs are going to turn out to be just one aspect of intelligence, and will only get us so far (clearly our skull does not contain a massive datacenter, and yet we can do things they cannot). But I fail to see how SLMs solve that problem and are 'older and have more potential'?

I think it's fair to say that language models are the only actual 'intelligent' machines we have today, and they can obviously do many, many things which were previously simply not possible. Whether they meet your exact, and undefined, criteria for 'solving problems' doesn't seem to stop them from doing a lot of real world work.

Month after month the amount of work going into making them more efficient, easier to work with, more aligned and capable is just factually undeniable. Are they perfect? Far from it. And I don't think you'll find anyone who knows what they're talking about saying so, so playing that card is tired at best.

And moreover, the capabilities they've already unlocked for people to talk to computers, understand them, and work with them are not just about Silicon Valley grow-grow-grow mentality. The unlocks in accessibility alone are absolutely astounding.

Harping on about water (there are closed-circulation water systems already in use in Microsoft data centers) and power consumption (a real worry, but we have clean energy and should use much more of it, AND the intense pressure on the industry is bringing down inference/token costs in an absolutely massive way) only goes so far. Yes, overall energy consumption is up right now due to the growth of AI, but "how do I boil an egg" costs vastly, vastly less to answer now than it did two years ago.

There are also, thanks to the investments being made into LLMs now, successful examples of training LLMs purely on copyright-free material.

And this technology is proving in fact to be very difficult to moat, and as such you're seeing countries and people previously unable to compete get a massive leg up. (Be less concerned with whether they're pouring billions into model training, and more with whether they're being taxed appropriately to spread the wealth. VOTE!)

There is just an infinite amount of nuance to this essentially brand new technology development, and the "down with this sort of thing!"-ism is fine, if it's served with nuanced arguments rooted in (updated) facts and an open mind.

There is much about AIs that I am skeptical or critical of, but to dismiss it entirely is certainly a mistake in my book.

PS: I love using it for roleplaying, and I find it very capable when used right; will show some of my work soon <3
 

I appreciate the range of perspectives here, though I do think we’ve drifted a bit from the main thread—how LLMs are being used as game masters in practice. That said, I think it’s worth addressing some of the broader perceptions, since misconceptions can easily shape public opinion before people have a chance to explore and form their own views.

Most people don’t have clear, reliable guidance on how to use LLMs. They’re handed the tool with little to no instruction, and their experience is shaped almost entirely by whatever assumptions, narratives, or hype they've already absorbed. Some get good results. Others don’t. And then those mixed results become the new input for public discourse—articles, forum posts, videos—all reinforcing a kind of consensus about what these tools are supposed to be.

What keeps standing out to me is that most of the time, the results we get from these tools reflect more about the user than the system itself. The way we prompt, the structure we give, the assumptions we carry in—those all shape the output. And yet, the public conversation often treats the output as if it reveals something essential about the model, not about the interaction.

What’s ironic is that this human feedback loop—the way narratives form, get repeated, and reinforce themselves—mirrors many of the concerns people raise about AI: that it recycles dominant patterns, amplifies bias, resists nuance, and flattens complexity. We talk about LLMs doing this, but public discourse does it just as reliably. And without realizing it, that cycle ends up training expectations more than the systems do.

So I’m less interested in debating autonomy or intelligence, and more focused on how people are actually using these tools—with intent, structure, and awareness—and what’s been learned from those experiences.
 

What’s ironic is that this human feedback loop—the way narratives form, get repeated, and reinforce themselves—mirrors many of the concerns people raise about AI: that it recycles dominant patterns, amplifies bias, resists nuance, and flattens complexity. We talk about LLMs doing this, but public discourse does it just as reliably. And without realizing it, that cycle ends up training expectations more than the systems do.
That's a great encapsulation. And additionally, there's an expectation that computers act the way we've always expected them to, so they don't 'get the math wrong'; but with LLMs that expectation should really be more aligned with how we might think of a person: they do in fact mis-remember, mis-understand, and get the math wrong.

I've definitely found that I shouldn't necessarily count on the LLM, even when provided with e.g. the PDF, to get the rules right, and I try to either check myself, or ask the kinds of questions that might surface an "oh, wait a minute..." from it.

When you're using the ChatGPT (or Claude or Gemini) app, you're bound to its behavior, but if you build your own you have a lot more avenues for helping it with specific prompting, tools and context sources (although fair warning, it gets complicated fast 😄).

I find that at the moment it's still best as a player, because it more naturally lets you interject in the same way you might with players who aren't fully up on the rules, or who are a little 'forgetful' 😊

I actually found that one of the biggest problems I had to work around was a sort of 'unlicensed co-creation', where the AI doesn't quite understand the distinction between the player/GM roles and ends up eagerly stepping over that boundary. Reprimanding helps for a bit, but context length kills it over time. The best fix I've found is clear system prompting and model choice in particular.
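
For what it's worth, here's a minimal sketch of the kind of system prompting I mean, using the OpenAI Python SDK; the model name and the prompt wording are just placeholders to show the shape of it, not a recommendation.

Code:
# Minimal sketch: pin the model to the player role with a system prompt.
# Assumes the openai package is installed and OPENAI_API_KEY is set;
# model name and prompt text are placeholders.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a PLAYER in a tabletop RPG; the human is the GM. "
    "Never narrate outcomes, NPC actions, or world details. "
    "Describe only your own character's words, intentions, and dice calls. "
    "If you are unsure whether a decision is yours to make, ask the GM."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: use whichever model you prefer
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "The goblin snarls and raises its blade. What do you do?"},
    ],
)
print(response.choices[0].message.content)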
 
