D&D (2024) Class spell lists and pact magic are back!

You do realize that your research there is entirely irrelevant, right? You can replace "lightbulb" with just about anything else, because the key part wasn't the lightbulb, it was 1 failure out of 10.

And the fundamental misunderstanding is that looking at a group of ten people and finding one failure DOES NOT indicate a failure rate of 1 out of 10 when discussing a body of 40,000.

Here's another good example. I was in a training class for work recently. Out of 16 people, there was one African American person. That would mean 6.25% of the US population is African American, right? Except we know from REAL census data that it is 13.6%.

Mamba keeps, correctly, saying you can't extrapolate out from this small sample size, but what you and they seem not to be getting is that finding ONE person in a subset of a subset of a population who has experienced something we absolutely expect people to experience (misunderstanding communication) is meaningless. You don't need to go and investigate whether or not your written communication is flawed because you found a single person who misunderstood it out of tens of thousands of people.
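As an aside, the training-class arithmetic above is easy to check numerically: with only 16 people, an observed share of 1/16 is entirely compatible with a true rate of 13.6%. A minimal sketch in plain Python (the 16 and 13.6% figures are from this post; everything else is standard binomial math):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Census rate quoted above: 13.6%; class size: 16.
p, n = 0.136, 16

# If the class were a random draw from the US population, how often
# would we see at most one African American person in it?
prob_at_most_one = sum(binom_pmf(k, n, p) for k in (0, 1))
print(f"P(X <= 1) = {prob_at_most_one:.3f}")  # about a 1-in-3 chance
```

In other words, the classroom observation is not even mildly surprising under the census rate, which is exactly why it can't be used to estimate anything.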
 


And the fundamental misunderstanding is that looking at a group of ten people and finding one failure DOES NOT indicate a failure rate of 1 out of 10 when discussing a body of 40,000.
He never claimed that it was. He said that if you find a failure in the first ten attempts, you are going to take a hard look at what's going on to see whether it's a fluke or whether it might indicate some sort of systemic flaw. At that point you don't care what the actual failure rate is. That there was a failure at all indicates that something has gone wrong somewhere to some degree, and you need to look harder to see what it was and how bad it actually is.
Here's another good example. I was in a training class for work recently. Out of 16 people, there was one African American person. That would mean 6.25% of the US population is African American, right? Except we know from REAL census data that it is 13.6%.
So again this demonstrates that you never understood what @mamba was telling you.
Mamba keeps, correctly, saying you can't extrapolate out from this small sample size
Correct.
but what you and they seem not to be getting is that finding ONE person in a subset of a subset of a population who has experienced something we absolutely expect people to experience (misunderstanding communication) is meaningless.
Incorrect. See my first paragraph above.
You don't need to go and investigate whether or not your written communication is flawed because you found a single person who misunderstood it out of tens of thousands of people.
You do need to investigate it, because you are a business and need to know if it was a fluke or something more. If you ignore it as you would do for your business and it turned out to be 10%, 5% or even 1% and you then widely distributed the flawed product, you could easily end up out of business. While you might want to risk the success of your business, most others will not.
 

as I said, you cannot extrapolate, but since you detected this in that small a sample size, it is worth looking into. You are simply wrong to dismiss it just because the sample is small.

It isn't just because the sample is small. It is also because we fully expect people to misunderstand written communication. Your one person who misunderstood this isn't some anomaly that indicates anything, it is 100% expected by anyone who understands sending out written questions to large groups of humans.

What percentage do we expect to stumble? Should we maybe work on making it harder to misunderstand the survey?

I don't know the percentage. There are dozens of academic papers written on the subject, and studies on the feasibility of fixing it. Here are some links. Go wild.



I'm sure your utter brilliance will outshine the people who have been studying this for decades.

Sure, because a sample size of 10 is too small to reliably detect a problem; it's just that we already managed to despite the small size. If we had a thousand and found the problem, it still would be a problem; we just managed to do so with 10 already.

Do you think they would say 'oh, it is only 10 out of the 1000 bulbs, we can ignore that'?

And here again, you don't seem to understand why sample size actually matters.

Let us say you find 1 out of 10 lightbulbs has a problem. That's bad, right? But then you take a larger, more relevant sample and find that it is actually 1 out of 1,000. Then that ISN'T bad; it can be easily ignored. This is why companies that do this sort of research into their products ALWAYS START with a significant sample size.
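For what it's worth, the gap between those two scenarios can be made concrete with error bars. A quick sketch in plain Python (the Wilson score interval is one standard way to bracket an observed proportion; the 1/10 and 1/1,000 figures are the ones from the paragraph above):

```python
from math import sqrt

def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for an observed proportion k/n."""
    phat = k / n
    denom = 1 + z**2 / n
    center = (phat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# 1 failure out of 10: the plausible range is enormous (roughly 2% to 40%)...
print(wilson_interval(1, 10))
# ...while 1 out of 1,000 pins the true rate down to well under 1%.
print(wilson_interval(1, 1000))
```

The small sample simply can't distinguish "disastrous" from "fine", which is the point of starting with a significant sample size.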

What you are doing is the equivalent of noticing that the third light bulb you put into your home burnt out too quickly, calling the company to demand to know why they sell such obviously faulty products that one in three of them burns out, and demanding they investigate the obvious problem they clearly have.

You are ignoring the fact that they likely did quality testing before you "noticed something". You are ignoring that they likely have better data than you. You are ignoring that they have absolutely sent out thousands of other products without any issue. You are assuming incompetence because you noticed a statistically insignificant event in a shallow sample size.

I gave a rationale; you are basically saying "you have motive, you have opportunity, you have circumstantial evidence, but you have no DNA at the crime scene, so it could have been anyone". I have no access to the proverbial crime scene... If you want to dispute the circumstantial evidence, be my guest.

What motive does WotC have to ruin their own playtest with garbage data? How could that in any possible way achieve their goals?

All you have is opportunity, and one guy who says "I swear that guy commits crimes". You don't even HAVE a crime, you want to go looking for a crime scene. In police work, you are doing the equivalent of demanding a fishing expedition. And I don't need evidence to prove that there is no crime scene, when there is no reason to assume that there is.

sure, but it still pales in comparison to all responses, and the % is the aggregate of all of them, so the few written opinions have only a small influence on the result

Can you prove that? You threw out 5% with no evidence. What if 50% of people leave comments, then what? What if ratings with comments are weighted at double the impact? What if they figure that at least 20% of non-comment responses likely had similar opinions to the comments?

All of that would make a difference. So, show me what WotC's process for sifting through their data is. Prove they don't know what they are doing.

yeah, that is your claim, but 'unfortunately' correlation is not the same as causation, so you still will have to show that this is due to how well the playtest is working

I don't need to show that their success is because of the playtest working. Firstly, the product this playtest is for isn't even out yet. Kind of hard to show the playtest gave us a successful product when the product isn't released.

Secondly, I CAN show that this same survey method can lead to successful products, because... we have multiple successful products that have been released that followed this survey method (Tasha's, Xanathar's, etc.)

Thirdly, even if I cannot show that the survey led to those successes... since they are successes, I can extrapolate that the survey didn't HURT them. It was not a negative. And being neutral is just as bad for your position, because if the surveys are neutral, then they are not causing harm, and your argument falls apart again.

No more or less than mine, or rather, if anything it is less so, because you are not making a case like I did, you just make a claim. And yet you are very comfortable with dismissing mine. Guess I feel the same way about yours.

I doubt we will get to an agreement here. How about we turn this around? Why are you so opposed to improving the process? After all that is all I am asking for here...

Because I'd rather they work on the game than theoretically improving a process that might theoretically not be perfect into a version that might theoretically be slightly less imperfect. They do not have infinite time and infinite money, after all.

Let's see if we can agree on something here...

1) What is WotC really interested in answering? To me it is A) do you like this idea better than what we have today (never mind the balancing)? B) Do you like the execution enough for us to add it as is, or does it need improvement?

Do you agree / disagree? If you disagree, what are they looking for?

I disagree that that is what they seem to be looking for. They especially do not seem to be asking us if they should improve their ideas or not.
 

No it is not. We are talking about a group selfishly weaponizing the survey and playtest process.
I don't think such a group exists and I don't think you've provided a scintilla of evidence it exists.

Shaping the playtest is an element incapable of being part of optimizing a character build. That is certainly not something the survey weighting should be providing an over 200% amplification to.
It's not providing any amplification to it, another claim you have not supported. And I still have no idea what the difference is between a power build and optimization.
 

He never claimed that it was. He said that if you find a failure in the first ten attempts, you are going to take a hard look at what's going on to see whether it's a fluke or whether it might indicate some sort of systemic flaw. At that point you don't care what the actual failure rate is. That there was a failure at all indicates that something has gone wrong somewhere to some degree, and you need to look harder to see what it was and how bad it actually is.

So again this demonstrates that you never understood what @mamba was telling you.

Sure, if I was making a product and there was a failure, I might look into it before putting it out to the public. But here's the difference. We aren't making the product. We are consuming it. So, since we know for a fact some people will misunderstand, and we know for a fact that all statistical polling data has an error margin... what does finding a single "failure" mean?

Nothing.

It means nothing, because finding a single failure does not indicate something is wrong with the product, because we EXPECT that there will be a small number of failures. Until you can demonstrate a statistically significant amount, you are using a small sample size to make false predictions. Because as we keep demonstrating, small sample sizes don't tell you anything about the larger product. And we are the consumers, not the producers, so with only a single failure, we cannot declare a fundamental error.

You do need to investigate it, because you are a business and need to know if it was a fluke or something more. If you ignore it as you would do for your business and it turned out to be 10%, 5% or even 1% and you then widely distributed the flawed product, you could easily end up out of business. While you might want to risk the success of your business, most others will not.

Okay. So can you prove that WotC has never investigated this over the last 10 years? Can you prove that they are ignorant of the fact that they may have some respondents who give junk data? Can you prove that they never account for this, and have no clue how much of a problem this might be? Can you prove that this MUST be a large problem, because we found a single instance of this problem?

What you are talking about is business 101. Something even dumb lay-people like me know. So why are we assuming a multimillion-dollar company, owned by a multibillion-dollar company, has been using this method for a decade WITHOUT DOING THE MOST BASIC QUALITY CONTROL? If this is business 101, under what possible set of ideas do we imagine a company like Hasbro has no conception that they should have done this?
 

Sure, if I was making a product and there was a failure, I might look into it before putting it out to the public. But here's the difference. We aren't making the product. We are consuming it. So, since we know for a fact some people will misunderstand, and we know for a fact that all statistical polling data has an error margin... what does finding a single "failure" mean?
You know he's discussing WotC's product and its polling flaws, right? He's not talking about himself and what he does.
It means nothing, because finding a single failure does not indicate something is wrong with the product, because we EXPECT that there will be a small number of failures. Until you can demonstrate a statistically significant amount, you are using a small sample size to make false predictions. Because as we keep demonstrating, small sample sizes don't tell you anything about the larger product. And we are the consumers, not the producers, so with only a single failure, we cannot declare a fundamental error.
Again, this is wrong. It means something because it CAN mean something is wrong with the product. It's a very foolish company that ignores a failure found in the first 10 QC attempts and doesn't investigate further.
Okay. So can you prove that WotC has never investigated this over the last 10 years? Can you prove that they are ignorant of the fact that they may have some respondents who give junk data? Can you prove that they never account for this, and have no clue how much of a problem this might be? Can you prove that this MUST be a large problem, because we found a single instance of this problem?
An example was given in the last video. Crawford expressed confusion that voting on the individual abilities of a class or subclass (I can't remember which now) indicated a neutral view (i.e. one got somewhat favorable and the next somewhat unfavorable), yet the class as a whole was voted somewhat unfavorable.

That confusion shows very clearly that they don't understand their polling very well, because of the flaws in the way they go about it.
What you are talking about is business 101. Something even dumb lay-people like me know. So why are we assuming a multimillion-dollar company, owned by a multibillion-dollar company, has been using this method for a decade WITHOUT DOING THE MOST BASIC QUALITY CONTROL? If this is business 101, under what possible set of ideas do we imagine a company like Hasbro has no conception that they should have done this?
The methodology which many have been talking about as being flawed for years AND the video showing that their main designer doesn't understand it. Their main designer not understanding it is pretty bad.
 

I don't think such a group exists and I don't think you've provided a scintilla of evidence it exists.


It's not providing any amplification to it, another claim you have not supported. And I still have no idea what the difference is between a power build and optimization.
The pages and pages you've spent arguing to ensure that the survey weighting gives them an outsized say in striking down power reductions, no matter why they were made or what the reduction improves or allows, say otherwise about the existence of said group.
 

You know he's discussing WotC's product and its polling flaws, right? He's not talking about himself and what he does.

Again, this is wrong. It means something because it CAN mean something is wrong with the product. It's a very foolish company that ignores a failure found in the first 10 QC attempts and doesn't investigate further.

An example was given in the last video. Crawford expressed confusion that voting on the individual abilities of a class or subclass (I can't remember which now) indicated a neutral view (i.e. one got somewhat favorable and the next somewhat unfavorable), yet the class as a whole was voted somewhat unfavorable.

That confusion shows very clearly that they don't understand their polling very well, because of the flaws in the way they go about it.

The methodology which many have been talking about as being flawed for years AND the video showing that their main designer doesn't understand it. Their main designer not understanding it is pretty bad.


In the interest of being fair to JC... While I agree with everything in this quote, there is a nonzero chance that the Jeremy Crawford quote in question is not a matter of JC personally believing that completely bonkers statement. The other possibility is the common scenario where the public face/spokesperson for a company is directed from on high (i.e. some level of senior management) because the company has decided to ignore all sanity in polling and adopt a view that aligns with some MBA's vision or the company's bottom line. The alternative in such scenarios is often to quit and fall victim to some contract already in place, while the company just pays someone else to say it.

It eventually came out that some of 4e's peculiar choices were due to management on high pushing the team to develop new IP rather than use existing well-loved material, so it's not outside the realm of possibility.
 

And the fundamental misunderstanding is that looking at a group of ten people and finding one failure DOES NOT indicate a failure rate of 1 out of 10 when discussing a body of 40,000.
no one claimed it did. We all understand that

Mamba keeps, correctly, saying you can't extrapolate out from this small sample size, but what you and they seem not to be getting is that finding ONE person in a subset of a subset of a population who has experienced something we absolutely expect people to experience (misunderstanding communication) is meaningless.
that is not true. If you expect to find, say, 1 case in 10,000 and you find one on your first try in a sample of 10, then you absolutely should investigate, because if the ratio really were 1:10,000, that would be an extraordinary outlier

Does it tell you anything definitively? No, but it is a good reason to take a closer look. Not doing so is where you go wrong
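The arithmetic behind "extraordinary outlier" is simple to check: with a true failure rate p, the chance of seeing at least one failure in n independent tries is 1 - (1 - p)^n. A quick sketch in plain Python (the 1-in-10,000 and 1-in-5 rates are just the illustrative figures from this thread):

```python
def p_at_least_one(p, n):
    """Chance of at least one failure in n independent trials with failure rate p."""
    return 1 - (1 - p) ** n

# If failures really were 1 in 10,000, a hit in the first 10 tries
# would be roughly a 1-in-1,000 event: a strong reason to look closer.
print(f"{p_at_least_one(1 / 10000, 10):.5f}")  # ~0.001

# If failures are actually common (say 1 in 5), a hit in 10 tries is
# nearly certain, so the single observation can't tell the two apart.
print(f"{p_at_least_one(1 / 5, 10):.3f}")  # ~0.893
```

That asymmetry is the whole disagreement in a nutshell: a single early failure is weak evidence for any particular rate, but strong evidence against a vanishingly small one.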
 

It isn't just because the sample is small. It is also because we fully expect people to misunderstand written communication.
this is irrelevant. If you are not comfortable with a possible error rate as high as 1 in 5, then the fact that you found one in a sample of 10 means you should take a closer look rather than ignore it

And here again, you don't seem to understand why sample size actually matters.
I fully understand that; you seem not to get that finding an issue is what matters here, not the sample size

Let us say you find 1 out of 10 lightbulbs has a problem. That's bad, right? But then you take a larger, more relevant sample and find that it is actually 1 out of 1,000. Then that ISN'T bad; it can be easily ignored.
I am fully aware. Please provide the sample of 1,000 that shows this; until then, this is at most wishful thinking

What motive does WotC have to ruin their own playtest with garbage data? How could that in any possible way achieve their goals?
none, I do not believe I ever said they are doing this intentionally

Can you prove that? You threw out 5% with no evidence. What if 50% of people leave comments, then what?
then 1) I would be very surprised and 2) it would alleviate my concern. Now show me the 50%; I believe 5% is pretty high in reality.

I don't need to show that their success is because of the playtest working.
it was your claim, so yes you do

Firstly, the product this playtest is for isn't even out yet.
then why even bring up the past success and the 2014 playtest, if you want to use this cop-out?

Secondly, I CAN show that this same survey method can lead to successful products,
This is obviously what I was asking for in the first place. But 'can lead' is not enough; whether it is the reason for the success is what we are looking for. Otherwise you have shown absolutely nothing. The most ass-backwards playtest can lead to a successful product too... so can no playtest

Thirdly, even if I cannot show that the survey led to those successes... since they are successes, I can extrapolate that the survey didn't HURT them
no, you absolutely cannot do that. Well, I guess you can, but it is wrong to do so

I disagree that that is what they seem to be looking for. They especially do not seem to be asking us if they should improve their ideas or not.
then why have the 70% threshold…
 

