AI Models Tested on D&D to Assess Long-term Decision Making

Aeson

Up, up, and away.
AI models tested on Dungeons & Dragons to assess long-term decision-making

A group of AIs gather in Grok's mom's basement to play D&D. ChatGPT brought the Cheetos and Mountain Dew. Grok hung the "No grils allowed" sign. Claude, dressed in his best wizard robe and hat, sits down with his character tome, aka his three-ring binder. Gemini starts the session: "So, you're all in a tavern. What do you do?"
 


As controversial as AI is, I hope others see how beneficial it can be for TTRPGs.

I look forward to using it for drop-in PCs to assist my games with few players, or even as an artificial DM for games with just my wife and me.
 

Oh, for those who want the gist of the article:
Researchers first required all three LLMs [Claude 3.5 Haiku, GPT-4, DeepSeek-V3] to simulate a D&D game. To make the simulation accurate, the models were paired with a game engine based on the rules of D&D, which provided maps and resources for players and acted as a guardrail to minimize hallucinations.
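To picture how such a guardrail can work, here's a minimal, hypothetical sketch (the article doesn't publish code, so every name below is invented for illustration): the model only ever chooses among actions the engine reports as legal, so a hallucinated or illegal move can never reach the game state.

```python
# Hypothetical sketch of the "engine as guardrail" pattern described above.
# The rules engine holds the authoritative state; the LLM (stubbed here as
# a plain function) can only pick from actions the engine says are legal.
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    actor: str
    kind: str      # only "attack" in this toy example
    target: str

class RulesEngine:
    """Authoritative game state: who is alive and what is actually legal."""
    def __init__(self):
        self.hp = {"goblin": 7, "fighter": 12}

    def legal_actions(self, actor: str) -> list[Action]:
        foes = [c for c, hp in self.hp.items() if c != actor and hp > 0]
        return [Action(actor, "attack", f) for f in foes]

    def apply(self, action: Action) -> str:
        dmg = random.randint(1, 6)
        self.hp[action.target] -= dmg
        return f"{action.actor} hits {action.target} for {dmg} damage"

def take_turn(choose, engine: RulesEngine, actor: str) -> str:
    options = engine.legal_actions(actor)
    picked = choose(options)             # the LLM's (here, a stub's) choice
    if picked not in options:            # guardrail: reject hallucinated moves
        picked = random.choice(options)  # fall back to something legal
    return engine.apply(picked)

engine = RulesEngine()
print(take_turn(lambda opts: opts[0], engine, "fighter"))
```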

The models played against each other, and against over 2,000 experienced D&D players recruited by the researchers. The LLMs modeled and played 27 different scenarios selected from well-known D&D battle setups named Goblin Ambush, Kennel in Cragmaw Hideout, and Klarg's Cave.

In the process, the models exhibited some quirky behaviors. Goblins started developing a personality mid-fight, taunting adversaries with colorful and somewhat nonsensical expressions, like "Heh—shiny man's gonna bleed!" Paladins started making heroic speeches for no reason while stepping into the line of fire or being hit by a counterattack. Warlocks got particularly dramatic, even in mundane situations.

Next steps include simulating full D&D campaigns—not just combat. The method the researchers developed could also be applied to other scenarios, such as multiparty negotiation environments and strategy planning in a business environment.
 

Very interesting work! I've said it before, but I think LLMs as players are a better test of their ability to execute tasks than AI GMs. The players need to set goals and plan for the long term, while GMs can be more reactive. I don't expect the AIs to do very well at anything requiring long-term thought.
 



I have tested having an LLM (ChatGPT, paid version) run D&D several times over the past few years, and after reading this article I decided to try again and see where it stands in its newest iteration. It failed, but in interesting ways, and afterwards I asked it to self-diagnose why it failed and whether future success is viable. Here are its responses:

The current reality (honest version)

Right now, AI like me can be an excellent co-DM or improv DM, but not yet a fully reliable bounded, persistent, rules-strict campaign referee without scaffolding.

Why?

Because good D&D requires three things simultaneously:

1. Stable hidden world state

There must be real answers:
  • the duke is the murderer
  • the relic is cursed
  • the cult meets at midnight
  • the map is accurate
And those must stay fixed.

2. Procedural fairness

Players must trust:
  • rolls matter
  • clues matter
  • logic matters
  • outcomes aren’t being narratively massaged
Otherwise it stops being a game.

3. Memory across time

Campaign play requires:
  • continuity
  • consequences
  • evolving NPCs
  • tracked resources
  • remembered decisions
That’s the hardest part for current AI alone.



Where AI works well today

As a solo-play partner

For:
  • dungeon crawls
  • hex crawls
  • journaling campaigns
  • co-creative play
I can be excellent.

Because those tolerate some fluidity.

As a co-DM tool

Many DMs already use AI for:
  • NPC dialogue
  • encounter design
  • lore generation
  • improv support
  • between-session prep
This is where I’m strongest right now.

As a “player simulator”

Ironically, I can also:
  • play PCs well
  • roleplay party members
  • fill missing seats
  • test encounters


What’s missing for true AI-run campaigns

To make something you’d trust as a player, you’d want:

A persistent campaign state engine

Not just chat memory:
  • world facts locked
  • clues tracked
  • NPC goals tracked
  • timeline tracked
  • inventory tracked
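As a rough illustration of what "not just chat memory" could mean in practice, here is a minimal, hypothetical sketch of a persistent store a referee would be forced to read and write between turns and sessions. Everything here (class name, file format, fields) is invented for illustration; the write-once guard on facts also anticipates the "canon constraint layer" below.

```python
# Hypothetical persistent campaign state: survives between sessions on disk,
# and world facts are write-once so canon cannot quietly drift mid-campaign.
import json
from pathlib import Path

class CampaignState:
    def __init__(self, path: str = "campaign.json"):
        self.path = Path(path)
        self.data = (
            json.loads(self.path.read_text())
            if self.path.exists()
            else {"facts": {}, "clues": [], "npc_goals": {},
                  "timeline": [], "inventory": {}}
        )

    def lock_fact(self, key: str, value: str) -> None:
        """World facts are write-once: the duke's guilt can't quietly change."""
        if key in self.data["facts"] and self.data["facts"][key] != value:
            raise ValueError(f"fact {key!r} is locked to {self.data['facts'][key]!r}")
        self.data["facts"][key] = value
        self._save()

    def log_event(self, event: str) -> None:
        self.data["timeline"].append(event)
        self._save()

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.data, indent=2))

state = CampaignState()
state.lock_fact("murderer", "the duke")
state.log_event("Session 1: party reaches the tavern")
# state.lock_fact("murderer", "the butler")  # raises: canon is locked
```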

Canon constraint layer

If running a module: do not alter the underlying truth. Ever.

Transparent dice & mechanics

You should see:
  • rolls
  • DCs
  • monster stats
  • HP tracking
No invisible fudging.
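For instance, here is a toy sketch of auditable rolls (hypothetical code, not from any product): every check is resolved from a seeded RNG and reported with the raw die, the modifier, and the DC, so anyone can replay the seed and verify nothing was fudged.

```python
# Hypothetical transparent dice: shared seed, every input of a check is shown.
import random

def check(rng: random.Random, name: str, dc: int, modifier: int) -> bool:
    raw = rng.randint(1, 20)            # visible raw d20
    total = raw + modifier
    print(f"{name}: d20={raw} +{modifier} = {total} vs DC {dc} -> "
          f"{'success' if total >= dc else 'failure'}")
    return total >= dc

rng = random.Random(42)                 # shared seed: anyone can replay the session
check(rng, "Stealth", dc=12, modifier=3)
```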

Referee mode personality

Less narrative collaborator, more impartial world simulator. That’s a different behavioral tuning.
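As one hypothetical way to picture that tuning: the same model can be steered toward either role purely through its system instruction. The prompt text below is invented for illustration, not from any shipped product.

```python
# Hypothetical system prompts showing the two behavioral tunings contrasted above.
REFEREE_PROMPT = (
    "You are an impartial world simulator. Resolve actions strictly by the "
    "rules and the locked campaign state. Report all rolls, modifiers, and "
    "DCs. Never invent or alter facts to improve the story."
)
COLLABORATOR_PROMPT = (
    "You are a co-author. Prioritize drama and player enjoyment; bend "
    "outcomes when it makes the scene better."
)
```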



The good news: this is absolutely coming

Not sci-fi coming.
Near-term coming.

Within a few years you’ll likely see:
  • persistent campaign memory
  • rule engines
  • module-lock systems
  • tactical combat handling
  • fair dice systems
  • GM-mode LLMs
At that point?

A forever-DM like you will be able to sit down and play a real campaign.
 

Many of these challenges are the same ones faced in integrating AI with software development, and there has been massive progress in that regard. It will take some time to transfer, but I think the current models would be capable of much more if they lived in a properly structured space (i.e., different from how accessing LLMs via the web currently works).
 

A forever-DM like you will be able to sit down and play a real campaign.
With or without notes/reminders?

So in my own playing around with them, I've learned a bit about LLMs and how to use them, both from running my own local models and from asking questions of ChatGPT. The forgetting of details and such happens because, after a certain point, some LLMs can't remember all the way back to the start of a "conversation."

  • Model size = bookshelf size → a bigger bookshelf can hold more books (knowledge).
  • Actual reading speed & understanding = how well the librarian trained the model & how fast you can access the books → a giant bookshelf is useless if you can't reach the books quickly.

Strictly speaking, it's the context window (measured in tokens), rather than the parameter count, that determines how much of a conversation the model can see at once. As of December, based on reading and asking questions, claude.ai has a larger token/context window than ChatGPT, which allows for more "memory" across longer conversations and callbacks.
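Here's a toy illustration of that window effect (the numbers and the word-count "token" estimate are made up for simplicity): once the transcript exceeds the window, the oldest turns simply fall out of view, which is exactly the early-session amnesia people notice in long games.

```python
# Toy model of a context window: keep only the newest turns that fit.
def visible_history(turns: list[str], window_tokens: int) -> list[str]:
    kept, used = [], 0
    for turn in reversed(turns):        # newest turns are kept first
        cost = len(turn.split())        # crude token estimate: word count
        if used + cost > window_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))

session = [f"Turn {i}: the party does something" for i in range(1, 200)]
print(visible_history(session, window_tokens=60)[0])  # earliest turn still visible
```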
 

I agree. I think the other issue is that it isn't really parsing text and formulating a response to a constrained scenario. Instead, it is basically doing improv, predicting viable responses based not on the scenario at hand but on its underlying algorithms for language production. Basically, it is not really "aware" of the specific task it is supposedly doing, even if it claims otherwise.

Per Firebird's point above, I do think a purpose-built AI using the technology already available is viable right now. It's just not what an LLM like ChatGPT is designed for. But WotC, for example, could build such a thing if they wanted, though it would probably go against both their philosophy and business model (they need DMs...we're the ones who buy all their stuff!).
 
