D&D General Data from a million DnDBeyond character sheets?

I'd be curious how this data was actually derived.
I've been trying to avoid posting this, but screw it. This link gets you the JSON for the character that I've been talking about: https://character-service.dndbeyond.com/character/v5/character/XXX. Just replace XXX with the character ID. This allows you to see how the abilities in the CSV data match the abilities in the JSON, and that the ASIs and racial bonuses are stored separately in the 'choices' section. So I believe that's how they built the data set.

It's my understanding that there are limits on how often D&D Beyond lets you pull that link, but I haven't tried any actual web scraping to see what those limits are. I was waiting until we had a consensus on a trimmed data set before doing that.
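For anyone who wants to poke at this themselves, here's a minimal sketch of pulling one character's JSON in Python. The endpoint is the one above; the character ID is a placeholder, and the key names ('data', 'stats', 'choices') are my guesses from eyeballing a response, so verify them against what you actually get back.

```python
import json
import urllib.request

# Placeholder ID -- substitute a real character ID for XXX.
CHAR_ID = "12345678"
URL = f"https://character-service.dndbeyond.com/character/v5/character/{CHAR_ID}"

# Pull the raw character JSON. The schema is undocumented, so the
# key names below ('data', 'stats', 'choices') are assumptions.
with urllib.request.urlopen(URL) as resp:
    payload = json.load(resp)

data = payload.get("data", payload)

# Base ability scores, before ASIs and racial bonuses are applied.
print("stats:", data.get("stats"))

# ASIs and racial bonuses are reportedly stored separately in here.
print("choices:", json.dumps(data.get("choices"), indent=2)[:500])
```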
 



I can’t speak for everyone, but if I go through the trouble of creating a character, it’s likely one I’m at least interested in playing, even if I never end up playing it due to real-life constraints (e.g. time, no campaign to play him in, etc.).

So in my mind it’s probably just as important to see what characters people are interested in playing as what they actually played, and even more so when we don’t have a clear delimiter between the two.
But it's no trouble to create a character on D&D Beyond. A couple of clicks and you have a character. I think there's a lot of junk in there from people just playing around with the character generation system.
I am. Especially since the number of such characters is low enough that their inclusion won't really affect the other results.
That argument goes both ways. Their exclusion won't affect the other results much either.
 

I've been trying to avoid posting this, but screw it. This link gets you the JSON for the character that I've been talking about: https://character-service.dndbeyond.com/character/v5/character/XXX. Just replace XXX with the character ID. This allows you to see how the abilities in the CSV data match the abilities in the JSON, and that the ASIs and racial bonuses are stored separately in the 'choices' section. So I believe that's how they built the data set.

It's my understanding that there are limits on how often D&D Beyond lets you pull that link, but I haven't tried any actual web scraping to see what those limits are. I was waiting until we had a consensus on a trimmed data set before doing that.
Yeah, if I were to do this, I'd be careful to set up something on a timer. They may not have a limit on how many you can pull, but they may track how many bad calls you make, and I certainly don't want to accidentally launch a DoS attack.
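If anyone does go down this road, here's roughly what I'd sketch: one call at a time on a timer, with a back-off if the service ever pushes back. The delay and back-off numbers are made up, since we don't know the actual limits; feed it the list of known-valid IDs so the bad-call count stays near zero.

```python
import time
import urllib.error
import urllib.request

BASE = "https://character-service.dndbeyond.com/character/v5/character/{}"
DELAY_SECONDS = 2.0  # assumed polite delay; the real limit is unknown

def fetch_all(char_ids):
    """Fetch character JSON one ID at a time, sleeping between calls
    and backing off hard on errors so we never hammer the service."""
    results, failures = {}, []
    for cid in char_ids:
        try:
            with urllib.request.urlopen(BASE.format(cid)) as resp:
                results[cid] = resp.read()
        except urllib.error.URLError as err:
            code = getattr(err, "code", None)  # HTTPError carries a status
            failures.append((cid, code))
            if code == 429:      # explicit "slow down" from the server
                time.sleep(60)   # arbitrary long back-off
        time.sleep(DELAY_SECONDS)
    return results, failures
```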
 

But it's no trouble to create a character on D&D Beyond. A couple of clicks and you have a character. I think there's a lot of junk in there from people just playing around with the character generation system.
I’m sure there are. I don’t think there’s a good way with the data we were given to parse which is which.
That argument goes both ways. Their exclusion won't affect the other results much either.
That’s true but IMO it’s easier to have them included in the base dataset and then later exclude them if desired than it is to remove them from the base dataset and then include them later.
 

Yeah, if I were to do this, I'd be careful to set up something on a timer. They may not have a limit on how many you can pull, but they may track how many bad calls you make, and I certainly don't want to accidentally launch a DoS attack.
Theoretically we have a list of what should be valid calls. So maybe that helps a bit!
 

I’m sure there are. I don’t think there’s a good way with the data we were given to parse which is which.
That's exactly what we're doing if we remove characters with all 8's in their abilities, which I think we are agreed on. Certainly we're agreed on all 0's. The disagreement we are having is how much further to go.

Here's the thing, FrogReaver. I think we both agree that the data needs to be trimmed, but we disagree how. I think it would be helpful if we were working from the same trimmed data set when we talk about the data. If you agree with that, how can we come to a compromise on what to trim from the data and what not to trim from the data?
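To make the discussion concrete, here's a sketch of the two trims we seem to agree on, assuming a pandas DataFrame loaded from the CSV dump; the ability column names are hypothetical, since I don't have the dump's exact headers in front of me.

```python
import pandas as pd

# Hypothetical column names for the six abilities in the CSV dump.
ABILITY_COLS = ["str", "dex", "con", "int", "wis", "cha"]

def trim_agreed(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that look like untouched defaults: all abilities 0
    (clearly invalid) or all abilities 8 (point buy never spent)."""
    abilities = df[ABILITY_COLS]
    all_zero = abilities.eq(0).all(axis=1)
    all_eight = abilities.eq(8).all(axis=1)
    return df[~(all_zero | all_eight)]
```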
 

Looking at the link @ichabod provided above, there's a ton of data in there. Far more than what the data dump has. However, without a schema definition it would take significant effort to decipher. It seems to have basically all the data you need to recreate your character sheet, including all the class descriptive text.

If I get really, really bored I may try parsing out one or two of my characters just to see what exactly we could get.
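If someone beats me to the boredom threshold, a quick way to get a feel for the undocumented schema is to walk the JSON and print the key structure, without committing to what any of it means (the ID below is a placeholder again):

```python
import json
import urllib.request

def summarize(obj, depth=0, max_depth=3):
    """Recursively print the key structure of a parsed JSON blob,
    sampling the first element of each list, down to max_depth."""
    if depth > max_depth:
        return
    if isinstance(obj, dict):
        for key, val in obj.items():
            print("  " * depth + str(key))
            summarize(val, depth + 1, max_depth)
    elif isinstance(obj, list) and obj:
        summarize(obj[0], depth + 1, max_depth)

# Hypothetical ID; point this at one of your own characters.
url = "https://character-service.dndbeyond.com/character/v5/character/12345678"
with urllib.request.urlopen(url) as resp:
    summarize(json.load(resp))
```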
 

That's exactly what we're doing if we remove characters with all 8's in their abilities, which I think we are agreed on. Certainly we're agreed on all 0's. The disagreement we are having is how much further to go.

Here's the thing, FrogReaver. I think we both agree that the data needs to be trimmed, but we disagree how. I think it would be helpful if we were working from the same trimmed data set when we talk about the data. If you agree with that, how can we come to a compromise on what to trim from the data and what not to trim from the data?
Maybe it will help some to understand what I’m setting up on my end.

I’m starting with the dup-removed set; I just use it as the starting point for further trims.

I then want the next dataset to reflect the trims we broadly agree on. Let’s call this the ‘Type 1 error dataset’: the goal is for it not to exclude any data that should be there, which basically means when in doubt, include.

I’m also good with a ‘Type 2 error dataset’ where we trim the data to the point where we are more or less certain that what remains is valid.

I’m good with posting results based on the Type 2 dataset unless I want to talk about some of the data we excluded from it. For example, it would be interesting to know that, hypothetically, 200 of 300,000 characters had all 18s in their stats.

Does that work for a compromise?
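In code terms (pandas again, with the same hypothetical ability columns as the earlier sketch), I read the proposal as something like this, where the specific predicates are just placeholders for whatever we agree on:

```python
def build_datasets(base: pd.DataFrame):
    """Tiered trims starting from the duplicates-removed set: a
    conservative Type 1 trim (when in doubt, include) and an
    aggressive Type 2 trim (keep only rows we trust)."""
    abilities = base[ABILITY_COLS]
    all_zero = abilities.eq(0).all(axis=1)
    all_eight = abilities.eq(8).all(axis=1)
    in_range = abilities.ge(3).all(axis=1) & abilities.le(20).all(axis=1)

    type1 = base[~all_zero]                          # only clear junk removed
    type2 = base[~all_zero & ~all_eight & in_range]  # confident-valid only
    return type1, type2

# The excluded rows stay queryable for side questions, e.g. how many
# characters had all 18s:
#   base[ABILITY_COLS].eq(18).all(axis=1).sum()
```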
 

Looking at the link @ichabod provided above, there's a ton of data in there. Far more than what the data dump has. However, without a schema definition it would take significant effort to decipher. It seems to have basically all the data you need to recreate your character sheet, including all the class descriptive text.

If I get really, really bored I may try parsing out one or two of my characters just to see what exactly we could get.
Please have it on my desk by EOD. Thanks.
 
