D&D General Lies, Darn Lies, and Statistics: Why DPR Isn't the Stat to Rule them All

Not to undercut your point too much, because I do think DPR can be useful for comparing class features, but I'd like to offer an example of where even this can go wrong: Great Weapon Master vs. a +2 to Strength.

Let's say we have a PC with 30 hit points who wields a standard greatsword (2d6) with a +3 Strength modifier and a +5 to hit, and for some reason they only have 13 AC. They're about to face a monster who also has 30 hit points, wields a standard battleaxe (1d8) with a +4 Strength modifier and a +6 to hit, and who also has only 13 AC for some reason. For simplicity, let's assume the monster goes second in the initiative order.

If we calculate the average damage for each, the PC deals 6.85 DPR against the monster, and the monster deals 6.175 DPR against the PC. If we run the numbers based on just that, we find the PC has a 66% chance of winning the fight.
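For anyone who wants to check those figures, here's a minimal sketch of the per-attack math, assuming standard 5e rules (a natural 1 always misses, a natural 20 always hits and doubles the damage dice):

```python
def dpr(attack_bonus, target_ac, dice_avg, dmg_mod):
    """Expected damage of one attack: nat 1 always misses, nat 20 always
    hits and adds one extra set of damage dice (a crit)."""
    need = target_ac - attack_bonus            # minimum d20 roll that hits
    hit_faces = max(1, min(19, 21 - need))     # d20 faces that hit (crit included)
    p_hit, p_crit = hit_faces / 20, 1 / 20
    return p_hit * (dice_avg + dmg_mod) + p_crit * dice_avg

print(dpr(5, 13, 7.0, 3))    # PC:      greatsword 2d6+3 at +5 to hit -> 6.85
print(dpr(6, 13, 4.5, 4))    # monster: battleaxe  1d8+4 at +6 to hit -> 6.175
```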

Now, let's look at the case where the PC has the option to take either a +2 to their Strength score or a feat that gives them a power attack for -5 to hit and +10 to damage (think Great Weapon Master, but without the bonus action attack on crits).

If the PC takes the +2 to Strength, their Strength modifier goes to +4, their attack bonus goes to +6, and their average DPR increases to 8.05. If they take the power attack feature instead, their attack bonus drops to +0 and their damage modifier increases to +13 when power attacking, giving them an average DPR of 8.35.
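Plugging both options into the same crit-inclusive formula reproduces those two averages:

```python
# +2 Strength: +6 to hit needs a 7+ (14 hitting faces), 2d6+4 damage
print(14/20 * 11 + 1/20 * 7)     # 8.05
# Power attack: +0 to hit needs a 13+ (8 hitting faces), 2d6+13 damage
print(8/20 * 20 + 1/20 * 7)      # 8.35
```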

Clearly, 8.35 DPR from the power attack is higher than 8.05 DPR from the Strength increase, so that must be the better choice. Right?

Well, no.

If we run the numbers on the encounter again, their chance of winning with the power attack is 71%. That's a nice improvement over their initial 66% chance of winning, but how does it compare to the +2 Strength option? Turns out, it's 6 points lower. Running the numbers on the +2 Strength build results in a 77% chance to win!
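Those win percentages aren't something a DPR formula spits out directly; they come from playing the duel to completion many times. A minimal Monte Carlo version of the fight (one attack per turn, no healing or other options, PC always first) looks something like this:

```python
import random

def attack(bonus, n_dice, sides, mod, ac=13):
    """One attack roll vs AC 13: nat 1 misses, nat 20 crits (doubled dice)."""
    roll = random.randint(1, 20)
    if roll == 1 or (roll != 20 and roll + bonus < ac):
        return 0
    dice = n_dice * (2 if roll == 20 else 1)
    return sum(random.randint(1, sides) for _ in range(dice)) + mod

def pc_wins(pc_bonus, pc_mod):
    """One 30-HP-each duel: the greatsword PC swings first every round."""
    pc_hp = mon_hp = 30
    while True:
        mon_hp -= attack(pc_bonus, 2, 6, pc_mod)    # PC: 2d6 + mod
        if mon_hp <= 0:
            return True
        pc_hp -= attack(6, 1, 8, 4)                 # monster: 1d8 + 4 at +6
        if pc_hp <= 0:
            return False

trials = 200_000
for label, bonus, mod in [("baseline", 5, 3), ("+2 Str", 6, 4), ("power attack", 0, 13)]:
    wins = sum(pc_wins(bonus, mod) for _ in range(trials))
    print(f"{label}: {wins / trials:.0%}")   # should land near 66%, 77%, and 71%
```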

It's not just better, it's quite a bit better. How is that possible?

The reason is pretty simple. While the average is slightly higher for the power attack than for the Strength improvement, the standard deviation is significantly larger (10.5 compared to 5.9 for the Strength improvement). In other words, the results are more variable for the PC who took the power attack option. When the inputs for an encounter are more variable, unlikely outcomes become more likely. And since this encounter already favored the PC, that means the odds of the monster winning were bound to increase.
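The spread figures can be checked exactly by enumerating every d20 roll and every damage-dice combination (same crit assumptions as above):

```python
from itertools import product
from statistics import mean, pstdev

def per_attack_outcomes(attack_bonus, dmg_mod, target_ac=13):
    """Every equally likely (d20, 4d6) outcome; only the first two d6s count
    unless the attack crits."""
    outcomes = []
    for d20 in range(1, 21):
        crit = d20 == 20
        hit = d20 != 1 and (crit or d20 + attack_bonus >= target_ac)
        for dice in product(range(1, 7), repeat=4):
            dmg = sum(dice if crit else dice[:2]) + dmg_mod if hit else 0
            outcomes.append(dmg)
    return outcomes

for label, bonus, mod in [("+2 Str", 6, 4), ("power attack", 0, 13)]:
    dmg = per_attack_outcomes(bonus, mod)
    print(f"{label}: mean {mean(dmg):.2f}, sd {pstdev(dmg):.1f}")
# +2 Str: mean 8.05, sd 5.9 / power attack: mean 8.35, sd 10.5
```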

DPR can be useful, but it can sometimes be misleading. Similarly, average outcomes can be useful, but they too can sometimes be misleading, especially when large differences in variability are involved. Control spells are great, but they often carry with them high levels of variability that need to be accounted for.
Far be it from me to complain about someone being loquacious, but this feels like the long way of saying, "DPR alone is like any raw statistic without error bars: not useless, but not as helpful as you might like."

So...yeah. If you're going to do a statistical comparison, you have to actually factor in the spread of the data, not just the center.

But in most of the things DPR is used for--that is, looking at how Class A handles a particular situation vs. Class B--the spread of the distribution rarely matters, because we're averaging over dozens to (low) hundreds of rolls. E.g., when I've shown that a Wizard rarely needs more than about 3/4 of their total spell levels for a day to keep up with the output of the entire Battle Master (let alone something as weak as a Champion!), all while leaving many spell slots open for utility effects and having plentiful ritual access, the fact that the Fighter's spread will often be a bit looser than the Wizard's (spells usually roll d6s or d8s, while the Superiority Dice grow to 6d12 per rest) has little impact: the Wizard's average "make fights end sooner" performance is consistently comparable to or better than the Fighter's.
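As a rough illustration of why the spread washes out at that scale (assuming roughly independent rolls), the standard deviation of an average over n rolls shrinks like the per-roll deviation divided by the square root of n:

```python
import math

per_attack_sd = 10.5   # the swingier power-attack build from the earlier example
for n in (1, 25, 100, 400):           # one swing -> an encounter -> a day -> an arc
    print(n, round(per_attack_sd / math.sqrt(n), 2))
# 1 10.5, 25 2.1, 100 1.05, 400 0.53: per-roll swinginess barely moves the long-run average
```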
 


One thing I want to point out about Monte Carlo sims is that they are highly dependent on accurately projecting decision trees, or, even worse, probabilistic decision trees.

Monte Carlo sims make it easier to be highly specific to a given scenario. However, the more specific one is to a given scenario (including its decision trees), the less universally applicable the results are.

One could try to overcome this problem by running a lot of different scenarios through Monte Carlo sims, but then you run into the problem of how to properly average the results. Not all scenarios (including their decision trees) are equally likely, after all.

DPR calcs are less precise at modeling a specific scenario, but they eliminate much of the decision-tree issue, making their results more universally applicable. DPR is still scenario-based, just with far fewer parameters needed to define the scenario.
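As a concrete (and entirely hypothetical) illustration of the decision-tree problem, here is the duel from earlier in the thread with the PC's turn-by-turn choice expressed as a pluggable policy; the win rate the sim reports depends on which policy you assume, which is exactly why the results don't generalize for free:

```python
import random

def swing(bonus, n_dice, sides, mod, ac=13):
    """One attack vs AC 13 with 5e-style crits (nat 1 misses, nat 20 doubles dice)."""
    roll = random.randint(1, 20)
    if roll == 1 or (roll != 20 and roll + bonus < ac):
        return 0
    dice = n_dice * (2 if roll == 20 else 1)
    return sum(random.randint(1, sides) for _ in range(dice)) + mod

def win_rate(policy, trials=100_000):
    """30-HP duel; policy(pc_hp, monster_hp) decides whether to power attack that turn."""
    wins = 0
    for _ in range(trials):
        pc_hp = mon_hp = 30
        while True:
            bonus, mod = (0, 13) if policy(pc_hp, mon_hp) else (5, 3)
            mon_hp -= swing(bonus, 2, 6, mod)
            if mon_hp <= 0:
                wins += 1
                break
            pc_hp -= swing(6, 1, 8, 4)
            if pc_hp <= 0:
                break
    return wins / trials

print(win_rate(lambda pc, mon: False))       # never power attack
print(win_rate(lambda pc, mon: True))        # always power attack
print(win_rate(lambda pc, mon: mon <= 12))   # only to try to finish a wounded monster
```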
 

I disagree with the premise. I watch a lot of optimization videos, and almost all of the major ones shifted away from DPR long ago. They usually value things like battlefield control, buffing, defense, tanking, etc. about as much as anything else. In fact, it's been a very long time since I've seen someone use DPR as the one stat to rule them all. They will mention it, they will use it, but they will put it into the context of the other things you can do well, so you can balance DPR against those things.

In other words, the most successful ones got more tools than just a hammer long ago, and I feel like you're behind the times, leaning on an old stereotype that isn't really accurate these days.
Perhaps you've missed the very lengthy thread(s) arguing how much worse the 2024 paladin is than the 2014 paladin due to its inability to nova to the same degree, despite receiving a host of buffs in other areas? Or the one about how wizards SUCK because they don't regularly out-damage sorcerers, despite arguments about their versatility outside of combat?

Optimizer YouTubers may have largely moved on from DPR as the end-all/be-all, but that mindset is certainly still present in these fora.
 

Far be it from me to complain about someone being loquacious, but this feels like the long way of saying, "DPR alone is like any raw statistic without error bars: not useless, but not as helpful as you might like."

So...yeah. If you're going to do a statistical comparison, you have to actually factor in the spread of the data, not just the center.

But in most of the things DPR is used for--that is, looking at how Class A handles a particular situation vs. Class B--the spread of the distribution rarely matters, because we're averaging over dozens to (low) hundreds of rolls. E.g., when I've shown that a Wizard rarely needs more than about 3/4 of their total spell levels for a day to keep up with the output of the entire Battle Master (let alone something as weak as a Champion!), all while leaving many spell slots open for utility effects and having plentiful ritual access, the fact that the Fighter's spread will often be a bit looser than the Wizard's (spells usually roll d6s or d8s, while the Superiority Dice grow to 6d12 per rest) has little impact: the Wizard's average "make fights end sooner" performance is consistently comparable to or better than the Fighter's.
I do wish it were as simple as that, but, in my experience, people are quick to dismiss that kind of general statement without concrete examples to back it up.

That said, my point wasn't just that error bars and distributions matter. I picked that example to illustrate the difference between large-scale averaging and practical application.

Combat in D&D isn't a boundless marathon of actions; it's made up of many discrete encounters that are finite in length. The goal of making a strong character isn't to top some damage chart at the end of a campaign, it's to maximize the group's chance of surviving the encounters they face.

Looking at simple averages can be useful, but we can't forget that the game we're trying to analyze has short-term boundary conditions that need to be considered.
 

Combat in D&D isn't a boundless marathon of actions; it's made up of many discrete encounters that are finite in length. The goal of making a strong character isn't to top some damage chart at the end of a campaign, it's to maximize the group's chance of surviving the encounters they face.
There's a difference between maximizing the chance of no TPK and maximizing the chance that no PC ever dies.

I tend to view strong characters as being about the latter.

Looking at simple averages can be useful, but we can't forget that the game we're trying to analyze has short-term boundary conditions that need to be considered.
I mean, I agree boundary conditions matter, but I think you're over-emphasizing them. As you increase in levels, monster HP goes way up while damage per attack doesn't usually increase all that much. Couple that with PCs winning 90%, maybe even 99%, of combats, and outside of cherry-picked examples where boundary conditions matter, what you'll find is that the performance of damage-dealing PCs almost always tracks DPR (assuming all else is mostly equal). Now, this doesn't tell us how to compare healing, control, AoE damage, buffing/debuffing, defense, face skills, initiative, stealth, ranged vs. melee, or nova vs. daily damage output, etc., but no one has really ever claimed it did (well, it's the internet, so someone has, but I think my point stands regardless).
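One hedged way to poke at that claim is to rerun the duel from upthread with both hit point pools scaled up (a crude stand-in for higher levels, since the attacks stay the same) and compare how the two options' win rates move at each scale:

```python
import random

def swing(bonus, n_dice, sides, mod, ac=13):
    """One attack vs AC 13 with 5e-style crits."""
    roll = random.randint(1, 20)
    if roll == 1 or (roll != 20 and roll + bonus < ac):
        return 0
    dice = n_dice * (2 if roll == 20 else 1)
    return sum(random.randint(1, sides) for _ in range(dice)) + mod

def win_rate(pc_bonus, pc_mod, hp, trials=50_000):
    """The same duel as upthread, but both sides start with `hp` hit points."""
    wins = 0
    for _ in range(trials):
        pc_hp = mon_hp = hp
        while True:
            mon_hp -= swing(pc_bonus, 2, 6, pc_mod)
            if mon_hp <= 0:
                wins += 1
                break
            pc_hp -= swing(6, 1, 8, 4)
            if pc_hp <= 0:
                break
    return wins / trials

for hp in (30, 60, 120):   # crude stand-in for rising hit point totals at higher levels
    print(hp, win_rate(6, 4, hp), win_rate(0, 13, hp))   # +2 Str vs power attack win rates
```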
 

I always liked the Battier effect. I recall a good article about him titled something like "The No-Stats All-Star."

In response, of course, basketball nerds made stats to capture the Battier effect. Advanced stats have come a long way. We now have things like Box Plus/Minus, which measures what happens in a game when you're on the floor vs. when you're off it, along with Value Over Replacement Player (VORP), Player Efficiency Rating (PER), Win Shares, and things like that.
Most of those stats don't even work for NCAA basketball. There is an underlying structure needed for advanced stats to really work, and D&D doesn't have it: a level playing field, comparable opponents, the same league, etc. Given how differently two DMs can run the same module, two games of D&D can be more different from each other than high school basketball is from NBA basketball, and we definitely cannot compare stats from high school players to NBA players.
You could capture some numerical data around causing a foe to miss their turn, causing a foe to make only one attack instead of a multiattack (as from a slow effect), giving them a much-increased chance of missing someone (as from mirror image), and a variety of other values. You could probably capture many of these factors in a single stat by looking at the same fight over and over again, with and without a caster casting a given set of control spells.
The problem is that the decision trees for each participant are inevitably entangled. Unless you make some serious assumptions about ally and enemy decision trees, as well as about the character in question, it doesn't work at all; and if you do, then at best all this tells me is that when this particular group of characters faces this particular group of monsters, and everyone acts under these specific decision trees, X led to the team winning x% more often while costing y more of some team resource.

Even if the assumptions remotely map to my actual situation, there's still probably an infinite (or at least very large) number of decision trees remaining for this particular set of PCs and monsters. And we still need some way of deciding whether an x% increase in win rate was worth the likely additional resource expenditure.
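For what it's worth, the with/without comparison described in the quoted post is easy enough to sketch, and the sketch itself makes this reply's point: every number in it is an assumption. Here the control effect is a hypothetical single-target "lose your next turn on a failed save," with a made-up save bonus and DC, dropped into the duel from earlier in the thread:

```python
import random

def swing(bonus, n_dice, sides, mod, ac=13):
    """One attack vs AC 13 with 5e-style crits."""
    roll = random.randint(1, 20)
    if roll == 1 or (roll != 20 and roll + bonus < ac):
        return 0
    dice = n_dice * (2 if roll == 20 else 1)
    return sum(random.randint(1, sides) for _ in range(dice)) + mod

def win_rate(open_with_control, save_bonus=2, save_dc=13, trials=100_000):
    """30-HP duel; optionally the PC's first action is a hypothetical
    'lose your next turn on a failed save' effect instead of an attack."""
    wins = 0
    for _ in range(trials):
        pc_hp = mon_hp = 30
        first_round, mon_loses_turn = True, False
        while True:
            if first_round and open_with_control:
                mon_loses_turn = random.randint(1, 20) + save_bonus < save_dc
            else:
                mon_hp -= swing(5, 2, 6, 3)       # baseline greatsword attack
                if mon_hp <= 0:
                    wins += 1
                    break
            first_round = False
            if mon_loses_turn:
                mon_loses_turn = False            # the monster sits out this round
            else:
                pc_hp -= swing(6, 1, 8, 4)
                if pc_hp <= 0:
                    break
    return wins / trials

print(win_rate(False), win_rate(True))   # without vs with the control opener
```

Change the save bonus, the DC, or the round the effect goes off and the delta changes, which is the entanglement problem in miniature.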
 
