Race/Class combinations that were cool but you avoided due to mechanics?
Cadence said:

It doesn't seem obvious to me that relative risk is necessarily more useful in this case than the difference in absolute risk. The classic example of where looking at just relative risk breaks down is in the extremes. If event A occurs with probability 0.0001 and event B occurs with probability 0.001, B's probability is 10x larger (it's gone up 900%). Even if you have A and B compete 1,000 times, A will still win around 4% of the time and will tie B around 37% of the time, in spite of B's win probability being a massive 10x that of A. Looking at that same relative risk with P(A)=0.05 and P(B)=0.50, where there's a much bigger difference in absolute risk, B wins the match essentially all the time. So, in something like treatment effectiveness, looking at just relative risk as a descriptive measure feels like it can give a very odd picture of the actual overall impact of the treatment.
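(A quick way to double-check those extreme-case numbers without simulating, added here as a sketch rather than taken from the post: A's and B's totals over the 1,000 rounds are two independent binomials, so the win and tie probabilities fall out directly. The variable names are arbitrary.)

# Exact check of the 0.0001 vs. 0.001 example over 1,000 rounds
n <- 1000
pa <- 0.0001
pb <- 0.001
k <- 0:n
fa <- dbinom(k, n, pa)   # P(A ends with k successes)
fb <- dbinom(k, n, pb)   # P(B ends with k successes)
Fb <- cumsum(fb)         # P(B <= k)
sum(fa * (Fb - fb))      # P(A wins) -- about 0.04
sum(fa * fb)             # P(they tie) -- about 0.37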
As I noted in a previous post, if P(A)=50% vs. P(B)=55% were to compete 100 times, then B would have around a 74% chance of winning the title for the session, A around 22%, and they would tie around 4%. So there will certainly be a lot of times when, even after a full hundred rounds, B hasn't shown itself better than A, let alone clearly so. (The expected number of extra hits for B over A across the 100 is of course 5, so even some of the times A is losing, it is only by a few hits.) The probability of reaching the arbitrary alpha=0.05 level of statistical significance in this case is less than 20%, even if doing the one-sided test because you think you know which is better. It feels odd to say that something is clearly noticeable based on just the successes and failures if the best test for finding a difference would only reject 20% of the time. If that was enough power to be happy with a sample size, then that means one is happy with a 20% false discovery rate (0.05/(0.05+0.20)), right?

Making it more extreme, 15% vs. 20% does make it slightly more apparent, but even then A still has about a 15% chance of doing better over 100 trials, and a 5% chance of tying. The estimated power at alpha=0.05 is still only around 22%.

Going to 5% vs. 10%, A is down to winning or tying a total of only 10% of the time, but the estimated power is still just around 30%. So where you sit in the range certainly does matter out at the extremes. If everyone was 99% at something and I was only 94%, it feels like it would stand out and the party would groan when it was my turn and I missed. What percent of combat happens off in the tails like that?

<Slap-dash R code at bottom in case my numbers are off. Also, please insert a disclaimer about the arbitrariness of alpha=0.05 and how hypothesis tests aren't usually what you want... and also that power seems like a relevant idea here anyway.>

If Legolas had a 5% bonus over Gimli and they kept track over several game sessions, it feels like Gimli would be able to say he wasn't doing nearly as well after a few of them against hard-to-hit monsters. But against things in the middle, it feels like it would take a while longer before his inner statistician would let him concede. That it's hard to be confident in the difference after just 100 rounds, but easier when you get several times more, seems to fit with what you might get in baseball - how does a .250 vs. .300 batting average feel, for making long-term decisions, after only 100 plate appearances at the beginning of the season vs. after 500+ plate appearances? (Well, I mean, except for batting average being a horrible statistic.)

All that being said, it's hard for me to argue with the fact that the human brain isn't always big on caring what the probabilities say if it fits the story that it's working on.

# nsims = number of simulation runs; I didn't feel like digging up the convolution of
#   the different binomials
# sz is the number of trials A and B each have, where they succeed with probabilities
#   pa and pb
# The first three numbers output are the estimated probabilities that B wins, that A
#   wins, and that they tie.
# The next two are the estimated power at alpha=0.05 for rejecting the null hypothesis
#   that they're equal, using either the exact McNemar's test (since we know the order
#   they were in) or the usual two-sample z-test. As the pairing explains no variance,
#   I was a bit surprised the McNemar test was as different in a few cases.
nsims <- 100000
sz <- 100
pa <- 0.5
pb <- 0.55
aplus <- rep(0, nsims)
bplus <- rep(0, nsims)
abeq <- rep(0, nsims)
pmcn <- rep(0, nsims)
pind <- rep(0, nsims)
for (i in 1:nsims) {
  x <- rbinom(sz, 1, pa)   # A's attempts this session
  y <- rbinom(sz, 1, pb)   # B's attempts this session
  aplus[i] <- sum(x > y)   # rounds where A hit and B missed
  bplus[i] <- sum(y > x)   # rounds where B hit and A missed
  abeq[i] <- sum(y == x)   # rounds where they matched
  pmcn[i] <- binom.test(aplus[i], aplus[i] + bplus[i], p = 0.5,
                        alternative = "less")$p.value
  pind[i] <- prop.test(c(aplus[i], bplus[i]), c(sz, sz),
                       alternative = "less")$p.value
}
sum(bplus > aplus) / nsims   # estimated P(B wins the session)
sum(aplus > bplus) / nsims   # estimated P(A wins the session)
sum(aplus == bplus) / nsims  # estimated P(tie)
sum(pmcn < 0.05) / nsims     # estimated power, exact McNemar
sum(pind < 0.05) / nsims     # estimated power, two-sample z-test
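(The post mentions not wanting to dig up the convolution of the different binomials; for cross-checking the simulation, here is a sketch of that exact calculation. The function name exact_session is made up for this note, not from the post.)

exact_session <- function(sz, pa, pb) {
  k <- 0:sz
  fa <- dbinom(k, sz, pa)  # distribution of A's hit total
  fb <- dbinom(k, sz, pb)  # distribution of B's hit total
  c(b_wins = sum(fb * (cumsum(fa) - fa)),  # P(A's total < B's total)
    a_wins = sum(fa * (cumsum(fb) - fb)),  # P(B's total < A's total)
    tie    = sum(fa * fb))                 # P(totals are equal)
}
round(exact_session(100, 0.50, 0.55), 3)  # roughly 0.74 / 0.22 / 0.04, matching the simulation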
[QUOTE="Cadence, post: 8077096, member: 6701124"] It doesn't seem obvious to me that relative risk is necessarily more useful in this case than the difference in absolute risk. The classic example of where looking at just relative risk breaks down is in the extremes. If event A occurs with probability 0.0001 and event B occurs with probability 0.001. B's probability is 10x larger (it's gone up 900%). Even if you have A and B compete a 1,000 times, A will still win around 4% of the time and will tie B around 37% of the time in spite of B's win probability being a massive 10x that of A. Looking at that same relative risk with P(A)=0.05 and P(B)=0.50, where there's a much bigger difference in absolute risk, B wins the match essentially all the time. So, in something like treatment effectiveness, looking at just relative risk as a descriptive feels like it can give a very odd picture of the actual overall impact of the treatment. As I noted in a previous post, if P(A)=50% vs. P(B)=55% were to compete 100 times, then B would have around a 74% chance of winning the title for the session, A around 22%, and they would tie around 4%. So there will certainly be a lot of times when even after a full hundred rounds that B hasn't shown better than A, let alone clearly so. (The expected number of extra hits for B over A over the 100 is of course 5, so even some of the times A is losing it is only by a few hits). The probability of getting the arbitrary alpha=0.05 level statistical significance in this case is less than 20%, even if doing the one sided test because you think you know which is better. It feels odd to say that something is clearly noticeable based on just the successes and failures if the best test for finding a difference would only reject 20% of the time. If that was enough power to be happy with a sample size, then that means one is happy with a 20% false discovery rate (0.05/(0.05+0.20), right? Making it more extreme, 15% vs. 20% does make it slightly more apparent, but even then A still has about a 15% chance of doing better over 100 trials, and 5% chance of tying. The estimated power at alpha=0.05 is still only around 22%. Going to 5% vs. 10% A is down to winning or tying a total of only 10%, but the estimated power is still just around 30%. So it certainly does matter in the ends. If everyone was 99% at something and I was only 94%, it feels like it would stand out and the party would groan when it was my turn and I missed. What percent of combat happens off in the tails like that? <Slap-dash R code at bottom in case my numbers are off. Also, please insert disclaimer about the arbitrariness of alpha=0.05 and how hypothesis tests aren't usually what you want... and also a that power seems like a relevant idea here anyway.> If Legalos had a 5% bonus over Gimli and they kept track over several game sessions, it feels like Gimli would be able to say he wasn't doing nearly as well after a few of them against hard to hit monsters. But against things in the middle it feels like it would take a while longer before his inner statistician would let him concede. That it's hard to be confident in the difference after just 100, but easier when you get several times more, seems to fit in with what you might get in baseball - how does a .250 vs. .300 batting average feel after only 100 plate appearances at the beginning of the season for making long term decisions vs. after 500+ plate appearances? (Well, I mean except for batting average being a horrible statistic). 
All that being said, it's hard for me to argue with the fact that the human brain isn't always big on caring what the probabilities say if it fits the story that it's working on: #nsims=number of simulation runs, I didn't feel like digging up the convolution of # different binomials #sz is the number of trials a and b have, where they succeed with probabilities pa # and pb #The first three numbers that are output are the estimated probability b wins, estimated # probability a wins, and the estimated probability they tie. #The next are the estimated power at a=0.05 for rejecting the null hypothesis that # they're equal using either the exact McNemar's test (since we know the order they # were in) or the usual two-sample z-test. As the pairing explains no variance # I was a bit surprised the McNemar test was as different in a few cases. nsims=100000 sz=100 pa<-0.5 pb<-0.55 aplus<-rep(0,nsims) bplus<-rep(0,nsims) abeq<-rep(0,nsims) pmcn<-rep(0,nsims) pind<-rep(0,nsims) for (i in 1:nsims){ x<-rbinom(sz,1,pa) y<-rbinom(sz,1,pb) aplus[I]<-sum(x>y)[/I] bplus<-sum(y>x) abeq<-sum(y==x) pmcn<-binom.test(aplus,aplus+bplus,p=0.5,alternative="less")$p.value pind<-prop.test(c(aplus,bplus),c(sz,sz),alternative="less")$p.value } sum(bplus>aplus)/nsims sum(aplus>bplus)/nsims sum(aplus==bplus)/nsims sum(pmcn<0.05)/nsims sum(pind<0.05)/nsims [/QUOTE]
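(On the batting-average aside: base R's power.prop.test gives a quick read on how much 100 vs. 500 plate appearances buys when comparing .250 to .300, under the simplifying assumption that plate appearances are independent Bernoulli trials.)

# One-sided power to distinguish .250 from .300 at alpha = 0.05
power.prop.test(n = 100, p1 = 0.25, p2 = 0.30, sig.level = 0.05,
                alternative = "one.sided")  # power roughly 0.20
power.prop.test(n = 500, p1 = 0.25, p2 = 0.30, sig.level = 0.05,
                alternative = "one.sided")  # power roughly 0.55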