Maybe you haven't seen this happen, but I absolutely have. Missing 5% more often is noticeable over the course of an adventuring day, especially in a bounded-accuracy system like 5e's.
It'll be pretty noticeable over several adventuring days, but based on the results alone, it feels like it shouldn't be that noticeable after a single day of 50 or 100 swings.
If one has a 50% chance of hitting and the other 55%, then the chance of having statistically significant evidence of a difference after 100 swings is only around 14%, and that assumes the weaker player suspects they're weaker. (The better one wins the day's total about 74% of the time, the worse one about 22%, and they tie around 4%.) Only 50 swings in the day drops the chance of significant evidence under 10%. (The better one wins the day about 65% of the time, the worse one about 27%, and they tie about 8%.)
If it's something hard, with 15% vs. 20% hit chances, then the chance of getting significant evidence goes up to 19% for 100 swings (the better one wins 80% of days, the worse one 15%, and they tie 5%). With 50 swings it's 10% (better wins 70% of days, worse 21%, tie 9%).
Crank it up to 500 swings and the 15% vs. 20% case gives a 64% chance of statistically significant evidence; the weaker one has only about a 2% chance of winning or tying the day.
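For anyone who wants to poke at numbers like these, here's a rough Monte Carlo sketch in Python. It isn't necessarily the exact calculation behind my figures above (the percentages shift a bit depending on which test you pick); this version uses a one-sided Fisher's exact test at alpha = 0.05, matching the "weaker player suspects they're weaker" framing, and the day's winner is just whoever lands more hits:

```python
# Rough sketch: estimate how often a day of swings yields statistically
# significant evidence of a difference, and who "wins the day" on total hits.
import numpy as np
from scipy.stats import fisher_exact

def simulate(p_better, p_worse, n_swings, days=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    sig = better = worse = tie = 0
    for _ in range(days):
        hits_b = rng.binomial(n_swings, p_better)
        hits_w = rng.binomial(n_swings, p_worse)
        # 2x2 table: rows = player, cols = (hits, misses)
        table = [[hits_b, n_swings - hits_b],
                 [hits_w, n_swings - hits_w]]
        # One-sided test: is the first player's hit rate genuinely higher?
        _, p = fisher_exact(table, alternative="greater")
        sig += p < alpha
        better += hits_b > hits_w
        worse += hits_b < hits_w
        tie += hits_b == hits_w
    return {"significant": sig / days, "better_wins": better / days,
            "worse_wins": worse / days, "tie": tie / days}

print(simulate(0.55, 0.50, 100))  # compare with the ~14% / 74% / 22% / 4% above
print(simulate(0.20, 0.15, 50))   # compare with the ~10% / 70% / 21% / 9% above
```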
Of course, if it's 0% vs. 5%, that's really noticeable... but it feels like they shouldn't be running into those that often.
Or, to put it another way: in terms of an Elo chess rating, a 50% chance of success vs. a 55% chance is only about a 36-point gap. For 15% vs. 20% it's about 54 points. Again, that doesn't seem like a lot.
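If you want to check the conversion yourself, here's a sketch. My figures seem to line up with the old normal-curve Elo model (rating gap of roughly 200√2 · Φ⁻¹(p) relative to a reference task); the logistic formula most rating sites quote, 400·log₁₀(p/(1−p)), gives slightly different numbers. Take the choice of curve as an assumption rather than gospel:

```python
# Sketch of converting success chances to an Elo-style rating gap.
# Two common curves; the normal-model numbers match the ones above.
from math import sqrt, log10
from statistics import NormalDist

def elo_gap_normal(p_hi, p_lo):
    """Rating gap under the classic normal-curve Elo model."""
    z = NormalDist().inv_cdf
    return 200 * sqrt(2) * (z(p_hi) - z(p_lo))

def elo_gap_logistic(p_hi, p_lo):
    """Rating gap under the logistic (FIDE-style) curve."""
    return 400 * log10((p_hi / (1 - p_hi)) / (p_lo / (1 - p_lo)))

print(round(elo_gap_normal(0.55, 0.50)))    # ~36 points
print(round(elo_gap_normal(0.20, 0.15)))    # ~55 points (close to the ~54 above)
print(round(elo_gap_logistic(0.55, 0.50)))  # ~35 with the logistic curve
```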
Edit: I left out that I was using the arbitrary alpha = 0.05 cutoff. (Insert rant here on why hypothesis testing is usually not what most folks actually want to do with their data.)