It also tends to run counter to all sorts of spreadsheet-based conclusions about optimization, which can be interesting, and a fun reminder not to take those analyses too seriously.
I agree it can be interesting, but it doesn't say much one way or another about quantitative estimates. Monte Carlo simulation more generally would be an improvement over calculations on a spreadsheet. For me, the gold standard would be to write down a standardized adventuring day: a set of encounters varied in difficulty, mixing a few powerful monsters with swarms of little ones, with a good cross-section of casters and melee brutes, etc., plus some rules for how the monsters act. Then you'd simulate a few different party comps through those encounters and see how it turns out (and how many resources they have left at the end). I guess that's more or less what
@Asisreo is suggesting, but actually putting that into code in a way that can be automated enough times for the d20 variance to wash out is... pretty intractable.
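To give a sense of why, here's a minimal sketch of what even a toy version looks like in Python. Everything in it is a made-up simplification for illustration (the stat lines, one-attack-per-round combat, random targeting), not anyone's actual methodology --- and it already ignores spells, healing, resources, and tactics, which are exactly the intractable parts:

```python
import random
from dataclasses import dataclass

@dataclass
class Combatant:
    # Deliberately crude model: just to-hit, AC, damage, and HP.
    name: str
    hp: int
    ac: int
    attack_bonus: int
    damage_dice: tuple  # (num_dice, die_size, flat_bonus)

def roll_damage(num, size, bonus):
    return sum(random.randint(1, size) for _ in range(num)) + bonus

def attack(attacker, defender):
    # d20 attack roll vs AC; nat 20 always hits (crit dice doubling omitted).
    roll = random.randint(1, 20)
    if roll == 20 or roll + attacker.attack_bonus >= defender.ac:
        n, s, b = attacker.damage_dice
        defender.hp -= roll_damage(n, s, b)

def run_encounter(party_stats, monster_stats):
    """One crude encounter: everyone swings once per round until a side drops."""
    party = [Combatant(**c) for c in party_stats]
    monsters = [Combatant(**m) for m in monster_stats]
    while any(p.hp > 0 for p in party) and any(m.hp > 0 for m in monsters):
        for p in (x for x in party if x.hp > 0):
            targets = [m for m in monsters if m.hp > 0]
            if targets:
                attack(p, random.choice(targets))
        for m in (x for x in monsters if x.hp > 0):
            targets = [p for p in party if p.hp > 0]
            if targets:
                attack(m, random.choice(targets))
    # Crude stand-in for "resources left at the end": surviving party HP.
    return sum(max(p.hp, 0) for p in party)

# Roughly plausible but invented stat lines (level-5-ish fighter, ogre-ish brute).
FIGHTER = dict(name="fighter", hp=44, ac=18, attack_bonus=7, damage_dice=(1, 8, 4))
OGRE = dict(name="ogre", hp=59, ac=11, attack_bonus=6, damage_dice=(2, 8, 4))

trials = 10_000  # enough repetitions for the d20 variance to wash out
results = [run_encounter([FIGHTER] * 4, [OGRE] * 2) for _ in range(trials)]
print(f"mean party HP remaining: {sum(results) / trials:.1f}")
```

Even getting this far only models "four identical fighters trade basic attacks with two brutes" --- every feature that actually differentiates builds (spell selection, resource management, positioning, monster AI) multiplies the modeling work.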
So, realistically, it seems to me a person has three choices:
1. They can settle for good-faith attempts to quantify effectiveness in simpler ways. These necessarily abstract away a lot of detail, but if they're guided by reasonable estimates and iterated with input from people with lots of different table experiences, they're hopefully a decent approximation. Maybe that's computing some averages for benchmark encounters (see the sketch after this list)!
2. They can decide none of this matters --- they don't actually care about characters' objective power levels and just want to play the game.
3. They can decide they do care about power levels, but reject any attempt to quantify them objectively, 'go with their gut' instead, and express their opinions online as though they reflected a greater reality than the estimates of someone doing 1.
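To make option 1 concrete: the simplest version of "averages for benchmark encounters" is an expected damage-per-round calculation against a benchmark AC. Here's a minimal sketch --- the stat line is a hypothetical level-5 fighter, and this deliberately ignores advantage, save-based effects, and everything situational:

```python
def dpr(attack_bonus, avg_dice_damage, flat_bonus, target_ac, attacks_per_round=1):
    """Expected damage per round against a benchmark AC (no advantage, no riders)."""
    # Chance to hit on a d20, capped: nat 1 always misses, nat 20 always hits.
    p_hit = min(max((21 + attack_bonus - target_ac) / 20, 0.05), 0.95)
    p_crit = 0.05  # crits double the dice (not the flat bonus)
    per_attack = p_hit * (avg_dice_damage + flat_bonus) + p_crit * avg_dice_damage
    return attacks_per_round * per_attack

# Hypothetical level-5 fighter: +7 to hit, 1d8+4 longsword, Extra Attack.
print(dpr(attack_bonus=7, avg_dice_damage=4.5, flat_bonus=4,
          target_ac=15, attacks_per_round=2))  # ~11.5
```

It's obviously a white-room number, but it's a reproducible white-room number --- which is the whole point of option 1: anyone can check the arithmetic, argue about the assumptions, and refine them.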
I personally favor 1, and welcome constructive input from anyone who has a (realistic) way to refine the estimates. I also have no problem with someone who picks 2 --- D&D is fun in lots of different ways, and plenty of people have fun without ever getting into the quantitative side of it. I do take issue with 3, though --- specifically the second half of 3, where they loudly complain and take it personally if people give quantitative measures that conflict with their subjective experience.

Note, again, that dismissing quantitative analysis outright is entirely different from constructively criticizing the specific methodology being used. It's also different from saying that 'mechanical effectiveness' isn't what's important to you. I'm all for trying to improve the methodology, and I completely respect people with different priorities. But if you actually want to be part of a conversation about mechanics, you need to offer something constructive, not just say 'spreadsheet, spreadsheet, white room, white room' over and over.