D&D General Requesting permission to have something cool

EzekielRaiden · Nov 1, 2023

Snarf Zagyg said:
Based upon the press coverage, I am reasonably certain that any data mining of BG3 will only tell us what various race and class combinations will result in people saying, "Yeah, I'd totally hit that."

To be clear, I don't mean looking at player data. I mean using it as an in-house virtual testbed.

Full-time in-house playtesters can only do a handful of encounters per day. A machine running BG3 code (without actually showing the 3D models) could run hundreds of combats an hour easily, perhaps far more. Different setups would yield different kinds of data. E.g. create monsters that you are very confident do exactly the kinds of damage they're supposed to, on average, during a fight (various "yardstick" encounters, so to speak.) Then test different group compositions against these yardsticks. Run the same scenario ten thousand times each for two groups that are identical except for having a Champion vs a Battle Master, and then do a couple hypothesis tests to see if the two distributions are consistent with one another or not (e.g. a goodness-of-fit test). Iterate this over various different setups so you can get actual data on the performance of various characters in combat, and make appropriate tweaks as needed to ameliorate unintended weaknesses or strengths (every class and subclass should have weaknesses and strengths, but unintended ones are likely a problem.) Out-of-combat stuff, as stated, requires actual human brains behind it--but now you can have your playtesters focus on that stuff, rather than spreading their efforts around.
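To make the "ten thousand runs each, then a hypothesis test" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the monster's hit points and armor class, the attack and damage numbers, and the two fighter variants (one that crits on 19-20, one with four limited-use bonus dice) are made-up stand-ins, not real BG3 or 5e data. It also swaps in a two-sample z-test on mean rounds-to-kill rather than a goodness-of-fit test, simply because that is short to write with the standard library.

```python
import random
import statistics
from statistics import NormalDist

def simulate_fight(rng, crit_on=20, bonus_dice=0):
    """Return rounds needed for one fighter to drop a yardstick monster.

    All numbers here are made-up placeholders, not real game statistics.
    """
    hp, armor_class, attack_bonus, damage_bonus = 60, 15, 5, 3
    rounds = 0
    while hp > 0:
        rounds += 1
        roll = rng.randint(1, 20)
        if roll >= crit_on:  # critical hit: roll the damage die twice
            damage = rng.randint(1, 8) + rng.randint(1, 8) + damage_bonus
        elif roll > 1 and roll + attack_bonus >= armor_class:
            damage = rng.randint(1, 8) + damage_bonus
        else:
            continue  # miss: the round is spent, no damage dealt
        if bonus_dice > 0:  # spend one limited-use bonus die on a hit
            damage += rng.randint(1, 8)
            bonus_dice -= 1
        hp -= damage
    return rounds

def two_sample_z_test(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def compare(runs=10_000, seed=1):
    """Run both hypothetical variants and return (mean_a, mean_b, p_value)."""
    rng = random.Random(seed)
    variant_a = [simulate_fight(rng, crit_on=19) for _ in range(runs)]
    variant_b = [simulate_fight(rng, bonus_dice=4) for _ in range(runs)]
    return (statistics.fmean(variant_a),
            statistics.fmean(variant_b),
            two_sample_z_test(variant_a, variant_b))

if __name__ == "__main__":
    mean_a, mean_b, p_value = compare()
    print(f"variant A: {mean_a:.2f} rounds, variant B: {mean_b:.2f} rounds, p = {p_value:.4g}")
```

A small p-value only says the two variants differ against this one yardstick; whether that difference is intended is still a design question.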

Then, once that phase of testing is done, aka once you've shifted from asking how strong PCs are to knowing how strong they are, set up some standard party slates (say 10 parties with widely varying composition), and then use those as yardsticks for testing more interesting and diverse monsters. Again, some things cannot be tested in a code environment, they require human judgment to work correctly, and those are the exact things that you sic your live playtesters on. The stuff that doesn't critically depend on human judgment, however, can be run through the simulation a zillion times to check that the spread of results it generates fits the intended results. E.g., ten thousand runs each of parties A-J showing that ghouls are massively more deadly than they should be (aka, what actually bit the 5.0 design team in the ass when they tried some live playtesting) would mean that ghouls either need to be toned down a bit, or need to be marked as higher CR, or some other thing.

As stated, if we can get some randomization thrown into the mix as well, beyond just the dice roller (e.g. something that can generate reasonable random encounter maps, so it's not just five-a-side on a featureless flat plane), you could go even further. Use standardized monsters AND standardized parties, and then add terrain effects. Hazards, traps, muck, ice, kaiju stomach acid, the works. Check how these things compare to the exact same maps without said hazards, and boom--you can generate a mathematical representation of how much tougher features can make a fight. Naturally, as with all these numbers, it would be an approximate and statistical fit, not a diamond-perfect solution (in math jargon, it is a "numeric" solution rather than an "analytic" solution.)

And, to reiterate, this is not perfect. It cannot completely replace human playtesting, and it should never be expected to. Human judgment, preferences, needs, and interests will always be central to the process of making a good game. But an automated virtual testing environment does wonders for the parts of the game that really are just a lot of random-number generation and mathematical calculation, and combat balance is precisely that. Then, precious live-playtester time can be focused on the stuff that benefits most from live playtesters. What could have been months or even years of public playtesting can potentially be reduced to days, even hours of virtual playtesting.

nevin · Nov 1, 2023

EzekielRaiden said:
I do not share your confidence in this claim. Particularly given the apparent runaway success of BG3, which is nearly perfect for making a virtual testing environment that can run numbers and give a useful spread of statistical results. It includes terrain (obstacles, impassable barriers, hazards, slowing effects, height difference, etc.), a wide variety of implemented monsters, many abilities including instant-death ones (e.g. intellect devourers), utility magic, conditional effects, all sorts of stuff.

It--obviously!--cannot handle the sheer creative potential, and careful decision-making, of actual human beings. It cannot generate new ideas, and in all likelihood, even some things it could theoretically test will simply be too difficult or cumbersome to actually express within its engine's code. But it can do a hell of a lot, and it can collect that data at lightning speed, allowing you to do the equivalent of hundreds or thousands of hours of live playtesting with the push of a button. If coupled with some fancier tricks (like, say, something which automatically generates varied terrain and encounters indexed by intended difficulty), it can even be used to do something like actually getting some kind of feel for how impactful terrain features can be on encounter difficulty. If that's feasible, it could open up room for an entirely new set of tools and advice for DMs on how to make their encounters both better and more fitting to their vision for their campaigns.

This really isn't that big an ask in a world with computers. Particularly since they've already expressly said that they're making their own virtual tabletop. Basic statistical modeling. I'm not even talking ANOVA, I'm genuinely just saying basic tests like hypothesis testing, goodness-of-fit tests, and proportion tests. Basic survey design, e.g. you don't make push polls or shape your questions so that it's not actually possible to voice relevant criticism. And some basic consistency on their standards for what gets multiple attempts vs what gets crapcanned on the first pass, e.g. you don't spend six to eight months trying to make Specialties or that martial bonus dice thing work only to quietly abandon both (and suffer serious consequences as a result of dropping them), while literally completely abandoning two whole classes and never making another public attempt simply because things didn't go well on the first try.

None of this is hard. None of it is complicated. They're already doing some of it, and have access to tools that can do much of the stuff they haven't yet. A single survey consultant could fix up their survey design stuff right quick. You don't even need a stats consultant (though that would of course be incredibly useful)--just basic Stats 101 stuff is all you need.

It's all complicated. What the internet wants today can all be spun in a different direction by influencers tomorrow. As has been demonstrated on multiple occasions in the playtests, just asking the questions creates expectations that then stress the model and change things. There are an insane number of studies that talk about how the question you ask in polls can influence the answers. And we haven't even touched on the fact that WOTC and HASBRO don't always agree on what to do, let alone the separate teams and individual contributors.

There is nothing simple or easy about changing a product with a large user base.

EzekielRaiden · Nov 1, 2023

nevin said:
It's all complicated. What the internet wants today can all be spun in a different direction by influencers tomorrow. As has been demonstrated on multiple occasions in the playtests, just asking the questions creates expectations that then stress the model and change things. There are an insane number of studies that talk about how the question you ask in polls can influence the answers.

Yes...that's why I'm saying the survey design is bad. There are ways to ask the questions they want to ask that don't do that. This is a topic that has been studied in social science and psychology for, I am not joking, more than a full human lifetime. There is ample literature on the subject and a consultant would trivially identify many of the errors WotC is making in their survey design.

nevin said:
And we haven't even touched on the fact that WOTC and HASBRO don't always agree on what to do, let alone the separate teams and individual contributors.

There is nothing simple or easy about changing a product with a large user base.

I never said there was. I said that the things I am proposing are simple steps, in and of themselves actually quite easy. The overall task remains quite difficult. I would never say otherwise--it is, after all, something many people are doing as their career.

But when they do things as boneheaded as literally making polls where the only answers are versions of "yes," their survey design is SO busted that, yes, it really is extremely easy to do better! As in, almost anyone could do better.

Oofta · Nov 1, 2023

EzekielRaiden said:
Yes...that's why I'm saying the survey design is bad. There are ways to ask the questions they want to ask that don't do that. This is a topic that has been studied in social science and psychology for, I am not joking, more than a full human lifetime. There is ample literature on the subject and a consultant would trivially identify many of the errors WotC is making in their survey design.

I never said there was. I said that the things I am proposing are simple steps, in and of themselves actually quite easy. The overall task remains quite difficult. I would never say otherwise--it is, after all, something many people are doing as their career.

But when they do things as boneheaded as literally making polls where the only answers are versions of "yes," their survey design is SO busted that, yes, it really is extremely easy to do better! As in, almost anyone could do better.

But you still haven't answered the question I posted above. What are you measuring and where do you put your emphasis? Garbage in, garbage out and all of that. I also don't think it has a snowball's chance in heck of happening, but that's beside the point.

nevin · Nov 1, 2023

OK my mistake then.

EzekielRaiden said:
Yes...that's why I'm saying the survey design is bad. There are ways to ask the questions they want to ask that don't do that. This is a topic that has been studied in social science and psychology for, I am not joking, more than a full human lifetime. There is ample literature on the subject and a consultant would trivially identify many of the errors WotC is making in their survey design.

I never said there was. I said that the things I am proposing are simple steps, in and of themselves actually quite easy. The overall task remains quite difficult. I would never say otherwise--it is, after all, something many people are doing as their career.

But when they do things as boneheaded as literally making polls where the only answers are versions of "yes," their survey design is SO busted that, yes, it really is extremely easy to do better! As in, almost anyone could do better.

OK, I misunderstood the message and got it backwards.

I'd say they've decided where they are going and they are going through the motions so no one can say they didn't try. The biggest problem I see with them messing with DND is they don't really control the image or the narrative. Any big changes could change the narrative and they are probably terrified of the internet turning on them. DND has been around so long just changing spell descriptions starts wars on forums. I don't see them making any serious attempts at changes till the popularity dies down.

EzekielRaiden · Nov 1, 2023

Oofta said:
But you still haven't answered the question I posted above. What are you measuring and where do you put your emphasis? Garbage in, garbage out and all of that. I also don't think it has a snowball's chance in heck of happening, but that's beside the point.

...you measure the things that can actually be measured, based on whatever design goals you have. E.g., as mentioned, when checking CRs, you check to see if various party comps (run a zillion times each) get the right spread of "number of deaths per combat" or "number of spell slots spent" or whatever. When testing different subclasses, you'd look primarily at things like damage output, damage suffered, healing provided, stuff like that--things the computer can quite easily track.
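As a rough illustration of checking a measured spread against a design goal, the sketch below estimates how often a combat produces at least one PC death and puts a 95% confidence interval around that rate. The `fake_combat_deaths` function is a hypothetical placeholder for whatever the real simulation would report, and the 5% per-character death chance and the target rates are invented numbers, not anything from the game or the designers.

```python
import random
from statistics import NormalDist

def proportion_interval(successes, trials, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / trials
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * (p_hat * (1 - p_hat) / trials) ** 0.5
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

def fake_combat_deaths(rng, party_size=4, death_chance=0.05):
    """Hypothetical stand-in for one simulated combat: how many PCs died."""
    return sum(rng.random() < death_chance for _ in range(party_size))

def check_target(target_rate, runs=10_000, seed=3):
    """Return (low, high, target_is_plausible) for 'combats with a death'."""
    rng = random.Random(seed)
    deaths = [fake_combat_deaths(rng) for _ in range(runs)]
    with_death = sum(d > 0 for d in deaths)
    low, high = proportion_interval(with_death, runs)
    return low, high, low <= target_rate <= high

if __name__ == "__main__":
    low, high, plausible = check_target(target_rate=0.05)
    print(f"observed rate is between {low:.3f} and {high:.3f}; target plausible: {plausible}")
```

The same pattern works for any countable outcome the engine can log, such as spell slots spent or rounds survived; only the placeholder function changes.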

Plenty of things, as I said, cannot be tested this way, and those things will always require human judgment to test. Nobody's getting rid of human playtesters. But instead of waiting for ten human groups to run through an encounter you've set up, you can get at least approximate data from the virtual simulation--and then those ten groups can instead focus on the critical intangibles like "feel", on things like presentation and roleplay and utility effects etc., etc.

Dismissing the entire thing as "oh it's just garbage in, so it's garbage out" is foolish. There is enormous potential in things like the BG3 system to give designers real data. Yes, like all statistics, it means you must ask good questions and really carefully think about the answers. That's nothing new.

Oofta · Nov 1, 2023

EzekielRaiden said:
...you measure the things that can actually be measured, based on whatever design goals you have. E.g., as mentioned, when checking CRs, you check to see if various party comps (run a zillion times each) get the right spread of "number of deaths per combat" or "number of spell slots spent" or whatever. When testing different subclasses, you'd look primarily at things like damage output, damage suffered, healing provided, stuff like that--things the computer can quite easily track.

Plenty of things, as I said, cannot be tested this way, and those things will always require human judgment to test. Nobody's getting rid of human playtesters. But instead of waiting for ten human groups to run through an encounter you've set up, you can get at least approximate data from the virtual simulation--and then those ten groups can instead focus on the critical intangibles like "feel", on things like presentation and roleplay and utility effects etc., etc.

Dismissing the entire thing as "oh it's just garbage in, so it's garbage out" is foolish. There is enormous potential in things like the BG3 system to give designers real data. Yes, like all statistics, it means you must ask good questions and really carefully think about the answers. That's nothing new.

You don't need a simulator to do that though. Run it enough times and you'll just get the averages which you can figure out on a spreadsheet. But it still depends on assumptions, and there are far too many to plug in.

You need real people to get a feel for whether something is fun to play which is not 100% correlated to any particular aspect of the character that can be measured.

EzekielRaiden · Nov 1, 2023

Oofta said:
You don't need a simulator to do that though. Run it enough times and you'll just get the averages which you can figure out on a spreadsheet. But it still depends on assumptions, and there are far too many to plug in.

You don't need a simulator, no. But it can be incredibly useful--which is literally what I said. It can let you focus your attention on the things that truly need human attention. Which I've said. Repeatedly.

Oofta said:
You need real people to get a feel for whether something is fun to play which is not 100% correlated to any particular aspect of the character that can be measured.

How many times do I need to say "you will never replace human testers," "human testing is critical," and other such things, for it to count? How many words must I spend on it to say "this is just one tool in the toolbox, which can be incredibly useful, but never enough by itself"?

Because I said those things. At least three times each. With examples! What more could I possibly give you except instant surrender, except "oh yeah, you're right, this is totally useless garbage, I was stupid for ever thinking it might have any potential at all"?

In order for it to be a game at all it has to have assumptions. It really isn't that many to "plug in," as you say. BG3 works just fine for the stuff it's trying to do--and the stuff it isn't trying to do should be, as I have said repeatedly, tested in some other way that involves actual humans using their human judgment.

Oofta · Nov 1, 2023

EzekielRaiden said:
You don't need a simulator, no. But it can be incredibly useful--which is literally what I said. It can let you focus your attention on the things that truly need human attention. Which I've said. Repeatedly.

People say all sorts of things. Repeatedly. Doesn't make them right. If you're running the same scenario repeatedly it will average out. If it averages out you don't need a simulator, you can do the same with math and averages.

But it still doesn't matter because it just depends on what scenarios you're validating and what the input variables are.

EzekielRaiden · Nov 1, 2023

Oofta said:
People say all sorts of things. Repeatedly. Doesn't make them right. If you're running the same scenario repeatedly it will average out.

Yes...that's the point. With enough runs, the results will settle around the mean (that's the law of large numbers). You will get a central tendency and a standard deviation--statistics. Those statistics can then tell you whether things are performing as desired.
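A small sketch of why the spread matters and not just the average. Both damage profiles below average exactly 7 per round, but one is a flat 7 and the other is 2d6. Against a hypothetical monster with 22 hit points and a three-round limit (both numbers are assumptions picked for illustration), the flat profile can never finish the job, while the swingy one does so somewhat under half the time.

```python
import random

def kill_chance(damage_roll, monster_hp=22, rounds=3, runs=20_000, seed=5):
    """Estimate the chance of dealing monster_hp damage within the round limit."""
    rng = random.Random(seed)
    kills = 0
    for _ in range(runs):
        total = sum(damage_roll(rng) for _ in range(rounds))
        kills += total >= monster_hp
    return kills / runs

def flat_seven(rng):
    """Always deals 7 damage: mean 7, no variance."""
    return 7

def two_d6(rng):
    """Deals 2d6 damage: mean 7, but with real spread."""
    return rng.randint(1, 6) + rng.randint(1, 6)

if __name__ == "__main__":
    print("flat 7 per round:", kill_chance(flat_seven))
    print("2d6 per round:   ", kill_chance(two_d6))
```

An averages-only spreadsheet would score these two profiles as identical, which is the gap a simulation is meant to close.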

Oofta said:
If it averages out you don't need a simulator, you can do the same with math and averages.

No, you can't. That's why we actually collect statistical data, and do things like Monte Carlo simulations, rather than doing everything analytically. Some math questions cannot be easily answered analytically; some genuinely cannot be answered analytically at all. For example, polynomials of degree 5 or higher are not guaranteed to have closed-form solutions in radicals--meaning, if you have something of the form y=ax^5+bx^4+etc., it may in fact be impossible to write down a formula for the roots of that polynomial. It isn't that our math isn't good enough to do it; it is that we can prove that the roots of some degree-5 polynomials literally cannot be expressed using only arithmetic and root extraction (the Abel-Ruffini theorem). For another example, partial differential equations often do not have analytic solutions; you just have to get good approximations. These are necessary for literally anything involving flow, e.g. movement of gases or liquids.
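The difference between an analytic and a numeric answer can be shown in a few lines. The quintic x^5 - x - 1 = 0 is a standard example of an equation whose real root cannot be written in radicals, yet plain bisection pins it down to as many digits as you like. The function names and the starting bracket [1, 2] are choices made for this sketch.

```python
def quintic(x):
    """The polynomial x^5 - x - 1, which has exactly one real root."""
    return x**5 - x - 1

def bisect_root(f, low, high, tolerance=1e-12):
    """Find a root of f in [low, high] by repeatedly halving the bracket."""
    if f(low) * f(high) > 0:
        raise ValueError("f(low) and f(high) must have opposite signs")
    while high - low > tolerance:
        mid = (low + high) / 2
        if f(low) * f(mid) <= 0:
            high = mid  # sign change is in the lower half
        else:
            low = mid   # sign change is in the upper half
    return (low + high) / 2

if __name__ == "__main__":
    root = bisect_root(quintic, 1.0, 2.0)
    print(f"root = {root:.10f}, residual = {quintic(root):.2e}")
```

The result is an approximation with a known error bound rather than a formula, which is exactly the kind of answer a combat simulation gives.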

Oofta said:
But it still doesn't matter because it just depends on what scenarios you're validating and what the input variables are.

...except that it does matter. Because those are the exact questions the designers need to be asking. They need to know the input variables. They need to know the situations. That's how you test things! You're literally saying that because we can't get an analytic answer, no answer is possible. That's wrong! We can get numeric solutions, sometimes very very good ones. That's the whole point of modeling like this. Huge swathes of science today are, quite literally, built upon the back of creating very good computer simulations and then testing novel or unexpected variables to see what happens. That's how we do climate science, since we can't actually solve the differential equations involved and can't do meaningful experiments because we don't have a thousand other Earths to perform experiments on. That's how physicists test models of solar system formation, or the mechanics of how Earth's Moon formed, or literally anything at all involving the gravity of three or more bodies, because the three-body problem does not have a general closed-form solution.

Assumptions will go into it. By definition, they must, because some went into the design of the game to begin with. As stated, this requires that you think very carefully about what questions you ask, how you ask them, what data you use to answer them, and whether the data actually supports any conclusions at all (let alone the ones you're looking for.) That's how statistical modeling works.

Just because it's statistical and simulated doesn't mean it's useless. It is exactly the opposite: that it is statistical means we can apply many useful things to it, which can help us seek useful results. Statistics and simulation are powerful tools; like any powerful tool, they must be used with care and diligence.
