The Pitfalls of D&D Beyond Data

FrogReaver

As long as i get to be the frog
So there have been a couple of recent threads about some new D&D Beyond stats. This thread is meant to talk about the major issues with their data. To kick it off, I'm going to start a list.

1. Multiclassing - There is never an explanation for how multiclassing is handled in the data and even if there were, any single method is going to inevitably skew the data in some direction. For example does a cleric 1/fighter 1 count as a cleric or a fighter or both or neither?

2. Subclasses - Classes that have a subclass at level 1 will be overrepresented compared with classes that get a subclass at level 3. (At least for datasets that include levels 1 and 2)

3. Product Costs - Any free content will skew results toward what is free.

4. Made vs Played - It's impossible to distinguish the purpose a user has for creating any given character.

Those are the 4 big issues I can think of. Any other issues anyone else can think of?

So to put this in perspective, clerics are one of the most multiclassed classes, they get a subclass at level 1, and they are included even in the most basic free content. In the D&D Beyond data that's being released, clerics have a lot of positive data skewing.

Now consider the case for the Circle of the Shepherd druid. Druids are rarely multiclassed, and the subclass in question isn't available until level 2. The subclass is also locked behind an additional purchase. The Circle of the Shepherd druid is a subclass that has a lot of negative data skewing.

The point is that the data D&D Beyond provides isn't going to be a very accurate representation of reality when it comes to playing D&D.
 

Leatherhead

Possibly a Idiot.
Unlike others who merely read the headlines, I have been keeping up with the D&D Beyond broadcasts. :p
For example does a cleric 1/fighter 1 count as a cleric or a fighter or both or neither?
Both.
2. Subclasses - Classes that have a subclass at level 1 will be overrepresented compared with classes that get a subclass at level 3. (At least for datasets that include levels 1 and 2)
Yeah, point. But keep in mind, the vast majority (about 66%) of characters are from levels 1-4 to begin with. Tossing out that much data is going to cause problems in the other direction.
3. Product Costs - Any free content will skew results toward what is free.
In the latest charts, controlling for that factor didn't change the results significantly. While there is no doubt a significant number of people who make a free character, then buy (which would inflate the numbers for those results), you have to remember the free content is also the Basic PDF, which is a free PDF you can get from WotC, so such data would reflect the reality of someone who didn't use D&D Beyond.
4. Made vs Played - It's impossible to distinguish the purpose a user has for creating any given character.
They already have a method to account for that, which gives an even better data set than the question you proposed: When the data says "Active Characters" it means they have controlled for characters who haven't been updated in the past X days. It doesn't matter if the character is meant to be an NPC or PC, it just matters if someone is actually going back to update it. All other characters are considered to be abandoned or a test character.
 



There's also a massive chunk of data missing, in that plenty of players don't use D&D Beyond. Example: I've made probably 20+ 5E characters, for one-shots, helping out friends make characters, and my own characters. I've only ever used an online character sheet two or three times, and that was always Roll20.

It might be the official D&D online tool for Wizards, but it's basically useless for someone like me. I prefer the hardcover books instead of online, I use a lot of homebrew, and I already have a VTT that I like.
 

FrogReaver

As long as i get to be the frog
Unlike others who merely read the headlines, I have been keeping up with the D&D Beyond broadcasts. :p

Both.

Maybe, but if a fighter 1/cleric 1 counts as both a fighter and a cleric, and given that all their class percentages add up to 100%, that means that single fighter 1/cleric 1 character also got counted as 2 total characters. Otherwise they would have had the sum of all their class percentages come out over 100%. That's a big issue!
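To make the arithmetic concrete, here is a rough sketch with a made-up roster of four characters (the roster and the `class_shares` helper are my own illustration, not anything D&D Beyond has published). If a multiclass character is tallied once per class, the only way the percentages still sum to 100% is if the denominator is the number of tallies rather than the number of characters.

```python
from collections import Counter

# Hypothetical roster: each character is the list of classes it has levels in.
characters = [
    ["Fighter"],
    ["Cleric"],
    ["Wizard"],
    ["Fighter", "Cleric"],  # the fighter 1/cleric 1 from the example
]

def class_shares(chars):
    """Tally a multiclass character once per class, then divide by total tallies."""
    tallies = Counter(cls for char in chars for cls in char)
    total = sum(tallies.values())
    shares = {cls: n / total for cls, n in tallies.items()}
    return shares, tallies, total

shares, tallies, total_tallies = class_shares(characters)

# Dividing by the real number of characters instead pushes the sum past 100%.
per_character = {cls: n / len(characters) for cls, n in tallies.items()}

print(len(characters), total_tallies)  # 4 characters, 5 tallies
print(sum(shares.values()))            # 1.0
print(sum(per_character.values()))     # 1.25
```

With four characters there are five tallies, so the multiclass character carries twice the weight of anyone else whenever the shares are forced to sum to 100%.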

Yeah, point. But keep in mind, the vast majority (about 66%) of characters are from levels 1-4 to begin with. Tossing out that much data is going to cause problems in the other direction.

That's exactly why character level needs to be controlled for when comparing subclasses.
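As a minimal sketch of what I mean, assume a made-up set of records of the form (class, level, subclass). The data and the `subclass_counts` helper are hypothetical, and filtering by a minimum level is only one possible way to control for level. Counting subclasses over every level favors the class that picks at level 1; restricting to levels where both classes have a subclass removes that particular skew, at the cost of throwing away the low-level characters.

```python
# Hypothetical records: (class, level, subclass or None if not yet chosen).
characters = [
    ("Cleric", 1, "Life"),
    ("Cleric", 2, "Life"),
    ("Cleric", 5, "Life"),
    ("Fighter", 1, None),
    ("Fighter", 2, None),
    ("Fighter", 5, "Champion"),
]

def subclass_counts(chars, min_level=1):
    """Count chosen subclasses among characters at or above min_level."""
    counts = {}
    for _cls, level, sub in chars:
        if level >= min_level and sub is not None:
            counts[sub] = counts.get(sub, 0) + 1
    return counts

all_levels = subclass_counts(characters)               # every character counted
level_3_up = subclass_counts(characters, min_level=3)  # only levels where both classes have a subclass

print(all_levels)
print(level_3_up)
```

In this toy roster there are three clerics and three fighters, yet the unfiltered count shows Life three to one over Champion. From level 3 up they are even.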

In the latest charts, controlling for that factor didn't change the results significantly. While there is no doubt a significant number of people who make a free character, then buy (which would inflate the numbers for those results), you have to remember the free content is also the Basic PDF, which is a free PDF you can get from WotC, so such data would reflect the reality of someone who didn't use D&D Beyond.

They didn't control for free vs not free.

They already have a method to account for that, which gives an even better data set than the question you proposed: When the data says "Active Characters" it means they have controlled for characters who haven't been updated in the past X days. It doesn't matter if the character is meant to be an NPC or PC, it just matters if someone is actually going back to update it. All other characters are considered to be abandoned or a test character.

Are they also excluding characters that were made and then never updated?
 

Guest 6801328

Guest
I won't believe that conclusions drawn from D&DBeyond datasets are accurate until they reflect my own personal preferences.

EDIT: Actually, strike "D&DBeyond" from that.
 


Blue

Ravenous Bugblatter Beast of Traal
@FrogReaver, I agree with all of your points and I'd like to add another: self-selection. Just like on ENworld, where we discuss a lot of points but are only a subset of all players, D&D Beyond users are not necessarily a representative sample.

I hadn't thought at all about the skewing, either from subclass choice level or multiclassing. That's a really good catch.

I wish they would release their data on github or somewhere so we could all examine it.
 

FrogReaver

As long as i get to be the frog
@FrogReaver, I agree with all of your points and I'd like to add another: self-selection. Just like on ENworld, where we discuss a lot of points but are only a subset of all players, D&D Beyond users are not necessarily a representative sample.

I hadn't thought at all about the skewing, either from subclass choice level or multiclassing. That's a really good catch.

I wish they would release their data on github or somewhere so we could all examine it.

Agreed, that's definitely a problem if anyone attempts to make a claim about all of D&D based on the data. I don't think they are directly guilty of that one, but threads on this forum and others often pop up that make such claims (like "90% of games stop at level 10").
 
