D&D General Data from a million DnDBeyond character sheets?




ichabod

Legned
Looking at the link @ichabod provided above, there's a ton of data in there. Far more than what the data dump has. However, without a schema definition it would take significant effort to decipher. It seems to have basically all the data you need to recreate your character sheet, including all the class descriptive text.
Yeah, there's a lot of data there. It seems to have all the class features to 20th level even if you don't have a 20th level character. From what I can see, though, I think we can reconstruct the character's original abilities, all classes and levels for multiclass characters, and which blank backgrounds are custom backgrounds that D&D Beyond allows.
 

ichabod

Legned
Maybe it will help some to understand what I’m setting up on my end.

I’m starting with the dup-removed set. I just use it as the starting point for further trims.

I then want the next dataset to be what we broadly agree on trimming. Let’s call this the ‘Type 1 error dataset’. The goal is for it not to exclude any data that should be there, which basically means: when in doubt, include.

I’m also good with a ‘Type 2 error dataset’ where we trim the data to the point where we are more or less certain what’s remaining is valid.

I’m good posting results based on the Type 2 dataset unless I want to talk about some of the data we excluded from it. For example, it would be interesting to know that, hypothetically, 200 of 300,000 characters had an 18 in every stat.

Does that work for a compromise?
That sounds like a good compromise. I think before we start we should clearly define "data that should be there." I was thinking about another dataset while at lunch, a "by the rules" dataset. This may end up being the Type 2 dataset, but if it isn't, I think it would be a good dataset to make as well.

I see no problems with discussing odd subsets of the data as long as it's clear what's going on. I did a double check and found 1,623 characters with all 18's in the UID data.

I think we should take a few more days poking around in the data for potential issues before working out the Type 1 & 2 criteria.
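
For anyone following along, here is a rough sketch of how the UID data, the two trimmed datasets, and the all-18s count could fit together. The record layout (`id`, `stats`, `classes`) and both rule lists are placeholders I made up for illustration; the real export fields and the actual Type 1 & 2 criteria have not been worked out yet.

```python
# Hypothetical record layout: a dict with "id", "stats", and "classes".
# The real D&D Beyond field names are not documented in this thread.
ABILITIES = ("str", "dex", "con", "int", "wis", "cha")


def dedupe_by_id(chars):
    """Keep the first record seen for each character ID (the 'UID data')."""
    seen = set()
    unique = []
    for c in chars:
        if c["id"] not in seen:
            seen.add(c["id"])
            unique.append(c)
    return unique


def all_eighteens(c):
    """True when every ability score is exactly 18."""
    return all(c["stats"][a] == 18 for a in ABILITIES)


def has_no_class(c):
    """Placeholder rule: a sheet with no class at all."""
    return len(c["classes"]) == 0


def trim(chars, exclude_rules):
    """Drop every character matched by at least one exclusion rule."""
    return [c for c in chars if not any(rule(c) for rule in exclude_rules)]


# Placeholder criteria only. Type 1 excludes as little as possible
# ("when in doubt, include"); Type 2 adds stricter rules on top.
TYPE1_RULES = [has_no_class]
TYPE2_RULES = TYPE1_RULES + [all_eighteens]


def make(cid, score, classes):
    """Build a tiny made-up character for the demo below."""
    return {"id": cid, "stats": {a: score for a in ABILITIES}, "classes": classes}


sample = [
    make(1, 18, ["Fighter"]),
    make(1, 18, ["Fighter"]),  # duplicate character ID
    make(2, 12, ["Wizard"]),
    make(3, 10, []),           # no class picked
]

uid = dedupe_by_id(sample)
type1 = trim(uid, TYPE1_RULES)
type2 = trim(uid, TYPE2_RULES)

# Characters kept by Type 1 but dropped by Type 2 can still be reported on.
excluded_all_18 = [c for c in type1 if all_eighteens(c)]
print(len(uid), len(type1), len(type2), len(excluded_all_18))
```

Keeping the rules as plain lists of functions means the Type 1 and Type 2 criteria can be argued about and swapped without touching the rest of the pipeline.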
 


FrogReaver

As long as i get to be the frog
Agreed on all!
 

Lanefan

Victoria Rules
True, but since they are houseruling right out of the gate like that, then aren’t they automatically outliers?
Depends how common that houserule or variant turns out to be, doesn't it?

I mean, personally I'd say the true outliers would be those who play exactly by the rules as written with no variance whatsoever.
IOW if we’re looking at this data to see trends in how people are playing the game, shouldn’t we start out by ignoring people who aren’t actually using the rules of the game?
Absolutely not!

If you want to see trends in how people are playing the game then you need to look at how people are actually playing the game, which includes houserules and kitbashes they might have applied in order to make the game their own.
 

Lanefan

Victoria Rules
That is my point about the inventory. It is reasonable to conclude that a character intended for play, not just as an experiment, will have all the starting bases covered: race, background (even if custom), class, maybe subclass (depending on class and level), proficiencies, starting equipment, starting gold, bio stats (height, weight, etc.).

The question is: how many characters not meeting the above criteria are experiments, and how many are future frameworks for existing characters?
Another thing to keep in mind is that some of those incomplete-looking character sheets might be bare-bones versions for online-play purposes only, or for the DM's quick reference, with the real character sheets kept physically by their players.

We don't use DDB but with roll20 that's what some of us have: quickie online sheets for the DM to reference while the full-ride sheets are on paper with the players.
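
To make that concrete, here is a minimal sketch of a completeness check that flags sheets instead of deleting them, so bare-bones reference sheets stay available for a separate look. The field names are assumptions on my part, not the actual DDB export schema, and the required list is only a subset of the criteria quoted above.

```python
# Hypothetical field names; the real export may call these something else.
REQUIRED_FIELDS = ("race", "background", "classes", "proficiencies", "inventory")


def missing_fields(c):
    """List the required fields that are absent or empty on a sheet."""
    return [f for f in REQUIRED_FIELDS if not c.get(f)]


def looks_play_ready(c):
    """True when none of the required fields are missing."""
    return not missing_fields(c)


full = {
    "race": "Elf",
    "background": "Sage",
    "classes": ["Wizard"],
    "proficiencies": ["Arcana"],
    "inventory": ["Spellbook"],
}
bare = {"race": "Dwarf", "classes": ["Fighter"], "inventory": []}

for sheet in (full, bare):
    print(looks_play_ready(sheet), missing_fields(sheet))
```

Reporting which fields are missing, not just a yes/no, would let us see whether "no inventory" sheets behave differently from "no background" sheets before deciding to exclude either.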
 

Lanefan

Victoria Rules
I think it's good to look at house rule characters and see if they deviate significantly from the other data in other ways. One thing I am really concerned with is the 44.9% of characters who are level one. I think that's where your mass of unplayed characters are. Fifth edition is not lethal enough to kill half of all starting characters.

If we look at the full data set, but trimmed of duplicate character IDs (which I am going to call the UID data from now on, and I think would be a good baseline for discussions), 44.9% are level 1. If we look at the ones without backgrounds, 50.5% are level 1. That's suggestive, but not a huge difference. However, if we look at characters with the default name ("<username>'s Character"), 61.9% are level 1. I think that's enough of a variation to exclude those characters, at least if they are level 1.
There's no way of knowing whether those were true experiments or whether they were actual characters rolled up for campaigns that then never got off the ground. Or, less likely but still possible, the campaigns were specifically intended as one-shots. I'd say that characters rolled up for campaigns that quickly collapsed are legit for data purposes.

IME anyway, if for whatever reason a campaign's going to collapse, that collapse happens within the first few sessions, before the characters have got past 1st level.
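
For reference, the level-1 comparison in the quoted post could be reproduced with something like the sketch below. The field names (`username`, `name`, `level`, `background`) are assumed for illustration, and the default-name test assumes the "<username>'s Character" pattern described in the quote. It only measures the share of level-1 characters in each subset; it cannot tell an abandoned experiment from a campaign that folded early.

```python
def level1_share(chars):
    """Fraction of characters at level 1 (0.0 for an empty list)."""
    if not chars:
        return 0.0
    return sum(1 for c in chars if c["level"] == 1) / len(chars)


def has_default_name(c):
    """Assumed default naming pattern: "<username>'s Character"."""
    return c["name"] == f"{c['username']}'s Character"


def has_no_background(c):
    """True when no background is set on the sheet."""
    return not c.get("background")


# Made-up sample records, not real data.
sample = [
    {"username": "ann", "name": "ann's Character", "level": 1, "background": None},
    {"username": "bob", "name": "Thorin", "level": 1, "background": "Soldier"},
    {"username": "cy", "name": "cy's Character", "level": 1, "background": "Sage"},
    {"username": "dee", "name": "Mira", "level": 5, "background": None},
]

overall = level1_share(sample)
default_named = level1_share([c for c in sample if has_default_name(c)])
no_background = level1_share([c for c in sample if has_no_background(c)])
print(overall, default_named, no_background)
```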
 

