D&D General Data from a million DnDBeyond character sheets?




It just supports something I've said for years; the game needs a simple pew-pew "mage" that doesn't use bespoke spells for new players who like the wizardly image but not the complex gameplay, and a complex warrior type (doesn't have to be fighter) for the complexity-loving players who also enjoy martial tropes.

With respect, the data set doesn't speak to how long the player has been playing, or show us that after playing a simple fighter, the players go for more complex options. The "new player" thing is a supposition.

This scraped data set does align with what I remember from the times D&D Beyond spoke to "active players" (they can categorize that way by looking at sheets that have taken short/long rests, are levelled up over time, and so on). Fighters are popular. They work for folks. That's good.
 

So here's some info from fiddling with names:

The ten most common names are:

  1. <blank/> (4364 PCs)
  2. Test (1206 PCs)
  3. Bob (1003 PCs)
  4. test (505 PCs)
  5. Varis (469 PCs)
  6. Rhogar (461 PCs)
  7. DamienDM's Character (350 PCs)
  8. Steve (322 PCs)
  9. Ragnar (320 PCs)
  10. Jack (299 PCs)
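For anyone wanting to reproduce a tally like this, here is a minimal sketch. It assumes the names have already been read into a list of strings (the sample list below is made up, not taken from the data set). It counts names exactly as written, which is why "Test" and "test" show up as separate entries.

```python
from collections import Counter

# Made-up sample; in practice this would be the name column of the data set.
names = ["Bob", "Test", "test", "", "Bob", "Varis", "Test", "Bob"]

def most_common_names(names, n=10):
    """Return the n most frequent names, counted exactly as written."""
    return Counter(names).most_common(n)

top = most_common_names(names, n=3)
for name, count in top:
    label = name if name else "<blank/>"  # show empty names the way the list does
    print(f"{label} ({count} PCs)")
```

Lower-casing the names before counting would merge "Test" and "test", at the cost of merging any legitimately different names that differ only in case.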
From my first look at the data set after downloading, I've been thinking it needs some cleaning. There are several PCs in the first few rows with either all 8's for abilities or all 0's, which I was thinking should be excluded from any analysis. I would also exclude blank and test names. But what about this Damien guy? And how many people name their test PCs Bob?
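A rough sketch of that first cleaning pass might look like the following. The column names (`name`, `str`, `dex`, and so on) are hypothetical, since I'm not assuming anything about how the file actually labels them. It flags rather than deletes, and it deliberately leaves Bob and DamienDM's characters alone, because those are judgment calls.

```python
# Hypothetical column names; adjust to whatever the file actually uses.
ABILITIES = ("str", "dex", "con", "int", "wis", "cha")

def is_suspect(row):
    """Flag rows that look like placeholders rather than real characters."""
    name = row.get("name", "").strip()
    scores = [row[ability] for ability in ABILITIES]
    blank_or_test = name == "" or name.lower() == "test"
    untouched_scores = all(s == 8 for s in scores) or all(s == 0 for s in scores)
    return blank_or_test or untouched_scores

# Made-up rows for illustration only.
rows = [
    {"name": "Test", "str": 15, "dex": 14, "con": 13, "int": 12, "wis": 10, "cha": 8},
    {"name": "Rhogar", "str": 8, "dex": 8, "con": 8, "int": 8, "wis": 8, "cha": 8},
    {"name": "Bob", "str": 16, "dex": 12, "con": 14, "int": 10, "wis": 13, "cha": 8},
]
kept = [row for row in rows if not is_suspect(row)]
print([row["name"] for row in kept])
```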

Being a fan of gnomes with lots of names, I next looked at the number of words in each name (actually the number of spaces, so these counts could be a bit off):
  1. 538,739
  2. 565,092
  3. 70,796
  4. 18,805
  5. 6,177
  6. 2,332
  7. 1,102
  8. 492
  9. 284
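To show why counting spaces can be a bit off, here is a small comparison. I'm assuming the rough method amounts to "spaces plus one"; the alternative splits on runs of whitespace, so padding and double spaces don't inflate the count.

```python
def words_by_spaces(name):
    """Rough method: every space is assumed to separate two words."""
    return name.count(" ") + 1

def words_by_split(name):
    """Split on runs of whitespace, ignoring leading and trailing padding."""
    return len(name.split())

# Made-up names for illustration.
for name in ["Rhogar", "Bob   ", "Steve  the Bold"]:
    print(repr(name), words_by_spaces(name), words_by_split(name))
```

The two methods agree on tidy names and disagree on names with trailing or doubled spaces, which is where the rough counts would be inflated.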
When looking at the PCs with at least five words in their name, I was disappointed to find that gnomes only came in fourth (after humans, elves, and half-elves). So then I just decided to look at the longest names in the dataset by character count. The winner is Gragnok "Bob" Stone Crusher, Last Heir of the Thundering Holds, Keeper of the Tome of Rebirth, Herald of the Returning Steps. So clearly, not everyone named Bob is a test character. What interested me more was the second longest name: Zorkxire, Shield of the Land, Protector of the People, Servant to Skadrea, Folk Hero of the Realm, and Holder of the Shards. What is interesting is that it's in the data set three times. I really wish there was an anonymized player ID so I could see if Gragnok and Zorkxire were made by the same person. There are four more names like that, each with multiple entries, and then you get some names that only rank as long because of tons of trailing spaces.

In terms of data cleaning, it might seem reasonable to keep only one Zorkxire blah blah blah. But it doesn't seem to be a good idea to keep only one Rhogar. How detailed does a name have to be before you can call it a duplicate? And if you automate that, you could take out all of DamienDM's PCs, which may actually be unique.
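One possible heuristic, purely as a sketch: only treat repeated names as duplicates when the name is long enough that two people are unlikely to have come up with it independently. The five-word cutoff below is a hypothetical threshold, not something the data justifies, and the sample names are shortened for illustration.

```python
from collections import defaultdict

MIN_WORDS = 5  # hypothetical cutoff; "detailed enough" is a judgment call

def normalize(name):
    """Collapse whitespace and ignore case so padded copies still match."""
    return " ".join(name.split()).casefold()

def likely_duplicates(names, min_words=MIN_WORDS):
    """Map each long, repeated name to the row indices where it appears."""
    groups = defaultdict(list)
    for index, name in enumerate(names):
        groups[normalize(name)].append(index)
    return {
        key: indices
        for key, indices in groups.items()
        if len(indices) > 1 and len(key.split()) >= min_words
    }

names = [
    "Zorkxire, Shield of the Land, Protector of the People",
    "Rhogar",
    "DamienDM's Character",
    "Zorkxire, Shield of the Land, Protector of the People   ",
    "Rhogar",
    "DamienDM's Character",
]
dupes = likely_duplicates(names)
print(dupes)
```

With this cutoff the repeated Rhogars and DamienDM's characters are left alone, but a different threshold would draw the line somewhere else, so the flagged groups are still worth checking by hand.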

Anyway, further data exploration is needed, and I would be hesitant to analyze this data raw.
 



Yeah. I wonder how to match up "Player Sheet" to "Time Played". Also, there are different ways characters might be used which would impact the analysis: Which characters are NPCs? How many are for one-shot sessions, or for organized play, or for campaigns? How many are for beginner players and how many for more experienced players?

I'm thinking additional information is needed to put the data to use for particular decision-making.

TomB
 

I think DnDBeyond would have that internally so it’s nice to see a sorta match with them.

Also can you tell how many sheets don’t have max HP? Might be a proxy for a lower bound of how many saw play.
 

I mean, that's all true. But I was speaking specifically of their graph which showed the delta of class occurrence between forum surveys and the scraped Beyond data. I think making the supposition (which is all it is) that people who answer forum surveys on dedicated gaming sites are more likely to be experienced players than the average D&D Beyond user is a logical extrapolation.

"New players gravitate towards simpler options" isn't really something that needs to be proven, right? Almost every complex game I have, whether board game or computer game, provides some kind of marked new player option, assuming that there are options that exist on a gradient of complexity.
 


The abilities are harder to analyze. Subrace is not given, so we cannot distinguish between variant humans and normal humans, or different subraces of dwarves, which would have different ability adjustments. We can calculate how many ASIs they should have gotten based on their levels, but again you have the variant Human problem, and some people may be giving free feats at first level. So it's hard to estimate the starting abilities, to possibly analyze ability generation methods.
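The "how many ASIs should they have gotten" part can be sketched like this, assuming the 2014 Player's Handbook progression: every class gets an Ability Score Improvement at class levels 4, 8, 12, 16, and 19, fighters get extra ones at 6 and 14, and rogues get an extra one at 10. Note that this works on class level, so a multiclass character would need it applied per class. It says nothing about whether each ASI was spent on scores or swapped for a feat.

```python
BASE_ASI_LEVELS = (4, 8, 12, 16, 19)
EXTRA_ASI_LEVELS = {"fighter": (6, 14), "rogue": (10,)}

def asi_count(class_name, class_level):
    """Number of ASIs a single class grants by the given class level."""
    levels = BASE_ASI_LEVELS + EXTRA_ASI_LEVELS.get(class_name.lower(), ())
    return sum(1 for level in levels if level <= class_level)

for class_name, level in [("Wizard", 3), ("Wizard", 8), ("Fighter", 8), ("Rogue", 20)]:
    print(class_name, level, asi_count(class_name, level))
```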

I can tell you this: 12.6% of the data has all 8's or lower for their ability scores, at least in terms of point buy. That is, the point buy cost for their abilities is 0 or less. Oddly enough, 52% of PCs have a point buy score of 27, with 32% of the PCs having the standard array (the 32% is a subset of that group). This suggests racial bonuses were not applied. However, I am not seeing blank races in the data. There are also several point buy values below 27. They go as low as 18 before you start getting 0's. There are 70,018 of those, or about 5.8% of the data.
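For reference, here is one way a point buy cost could be computed from a sheet's six scores. The cost table for 8 through 15 and the 27-point budget are the standard 5e rules. How to price scores outside that range is not defined by the rules, so the extension below (minus 1 per point under 8, plus 2 per point over 15) is purely an assumption for illustration and may not match how the figures above were calculated.

```python
# Standard 5e point buy costs for scores 8 through 15.
POINT_BUY_COST = {8: 0, 9: 1, 10: 2, 11: 3, 12: 4, 13: 5, 14: 7, 15: 9}
STANDARD_ARRAY = [8, 10, 12, 13, 14, 15]

def score_cost(score):
    """Point buy cost of one score, extended outside 8-15 by assumption."""
    if score in POINT_BUY_COST:
        return POINT_BUY_COST[score]
    if score < 8:
        return score - 8            # ASSUMPTION: -1 per point below 8
    return 9 + 2 * (score - 15)     # ASSUMPTION: +2 per point above 15

def point_buy_total(scores):
    """Total point buy cost of a full set of ability scores."""
    return sum(score_cost(score) for score in scores)

def is_standard_array(scores):
    """True if the scores are the standard array in any order."""
    return sorted(scores) == STANDARD_ARRAY

print(point_buy_total([15, 14, 13, 12, 10, 8]))
print(point_buy_total([8] * 6))
```

The standard array costs exactly 27 under these rules, which is why it shows up as a subset of the 27-point group.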

Edit: Duh, the low (but above 0) point buys could be bad rolls on 4d6 drop low.
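To get a feel for what rolled scores look like, here is a quick exact enumeration of 4d6 drop lowest for a single ability score. It only covers one score at a time, not a full set of six, and it doesn't say anything about how many sheets in the data were actually rolled.

```python
from collections import Counter
from itertools import product

def four_d6_drop_lowest_counts():
    """Count how many of the 6**4 equally likely rolls give each score."""
    counts = Counter()
    for dice in product(range(1, 7), repeat=4):
        counts[sum(dice) - min(dice)] += 1
    return counts

counts = four_d6_drop_lowest_counts()
total = sum(counts.values())
mean = sum(score * n for score, n in counts.items()) / total
share_below_8 = sum(n for score, n in counts.items() if score < 8) / total

print(f"mean score: {mean:.2f}")
print(f"chance a single score is below 8: {share_below_8:.1%}")
```

Rolled scores can land below 8 or above 15, which point buy never allows, so rolled sheets are one plausible source of totals other than 27.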
 