So here's some info from fiddling with names:
The ten most common names are:
- <blank/> (4364 PCs)
- Test (1206 PCs)
- Bob (1003 PCs)
- test (505 PCs)
- Varis (469 PCs)
- Rhogar (461 PCs)
- DamienDM's Character (350 PCs)
- Steve (322 PCs)
- Ragnar (320 PCs)
- Jack (299 PCs)
From my first look at the data set after downloading, I've been thinking it needs some cleaning. There are several PCs in the first few rows with either all 8's for abilities or all 0's, which I was thinking should be excluded from any analysis. I would also exclude blank and test names. But what about this Damien guy? And how many people name their test PCs Bob?
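To make the proposed exclusions concrete, here is a rough sketch of the filter. The record layout (a `name` string plus a list of six ability scores) and the helper name `is_placeholder` are my own assumptions, not the data set's actual column names, and it deliberately leaves Bob and DamienDM alone since those are still open questions.

```python
def is_placeholder(name, abilities):
    """Return True if a PC looks like a blank/test entry (assumed rules)."""
    # Assumed rule 1: the name is blank or "test", ignoring case and padding.
    if name.strip().lower() in {"", "test"}:
        return True
    # Assumed rule 2: every ability score is 8, or every one is 0.
    return len(set(abilities)) == 1 and abilities[0] in (0, 8)


# Made-up rows for illustration only; not taken from the real data set.
pcs = [
    {"name": "Test", "abilities": [10, 12, 14, 8, 13, 15]},
    {"name": "Varis", "abilities": [8, 8, 8, 8, 8, 8]},
    {"name": "Bob", "abilities": [15, 14, 13, 12, 10, 8]},
]
kept = [pc for pc in pcs if not is_placeholder(pc["name"], pc["abilities"])]
print([pc["name"] for pc in kept])  # prints ['Bob']
```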
Being a fan of gnomes with lots of names, I next looked at the number of words in each name (actually the number of spaces, so these counts could be a bit off):
- 538,739
- 565,092
- 70,796
- 18,805
- 6,177
- 2,332
- 1,102
- 492
- 284
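For anyone wondering why counting spaces can be off, here is a small sketch comparing the two approaches. The sample names are made up for illustration (a couple are borrowed from the list above with padding added), and this is not necessarily how the counts above were produced.

```python
from collections import Counter

# Illustrative names only; note the trailing space and the double space.
names = ["Bob", "Rhogar ", "Jack  Sparrow", "DamienDM's Character", ""]

# Spaces + 1 treats every space as a word boundary, so padding inflates it.
by_spaces = Counter(name.count(" ") + 1 for name in names)

# str.split() with no arguments collapses runs of whitespace and ignores
# leading/trailing padding, so blank names come out as zero words.
by_words = Counter(len(name.split()) for name in names)

print(sorted(by_spaces.items()))  # prints [(1, 2), (2, 2), (3, 1)]
print(sorted(by_words.items()))   # prints [(0, 1), (1, 2), (2, 2)]
```

With the space-based count, "Rhogar " shows up as two words and the blank name as one, which is the kind of drift the names with tons of trailing spaces would cause.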
When looking at the PCs with at least five words in their name, I was disappointed to find that gnomes only came in fourth (after humans, elves, and half-elves). So then I just decided to look at the longest names in the dataset by character count. The winner is Gragnok "Bob" Stone Crusher, Last Heir of the Thundering Holds, Keeper of the Tome of Rebirth, Herald of the Returning Steps. So clearly, not everyone named Bob is a test character. What interested me more was the second longest name: Zorkxire, Shield of the Land, Protector of the People, Servant to Skadrea, Folk Hero of the Realm, and Holder of the Shards. That one is in the data set three times. I really wish there was an anonymized player ID so I could see if Gragnok and Zorkxire were made by the same person. There are four more names like that, each with multiple entries, and then you get some names with tons of spaces after them.
In terms of data cleaning it might seem reasonable to keep only one Zorkxire blah blah blah. But it doesn't seem to be a good idea to keep only one Rhogar. How detailed does a name have to be before you can call it a duplicate? And if you automate that, you could take out all of DamienDM's PCs, which may actually be unique.
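One way to experiment with that question is to only collapse repeats when the name is "detailed" enough. Below is a minimal sketch where "detailed" means at least `min_words` words. Both the function name `drop_detailed_duplicates` and the default threshold of 5 are arbitrary assumptions on my part, and a real version would probably also want to compare race, class, and ability scores rather than the name alone.

```python
def drop_detailed_duplicates(names, min_words=5):
    """Keep every short name, but only the first copy of each long name.

    min_words is an assumed cutoff for "detailed enough to be a duplicate".
    """
    seen = set()
    kept = []
    for name in names:
        # Normalize case and whitespace so padded copies compare equal.
        key = " ".join(name.split()).lower()
        is_detailed = len(key.split()) >= min_words
        if is_detailed and key in seen:
            continue  # a repeat of a long, distinctive name
        seen.add(key)
        kept.append(name)
    return kept


# Shortened, made-up sample; the second long name has trailing spaces.
sample = [
    "Rhogar",
    "Rhogar",
    "Zorkxire, Shield of the Land",
    "Zorkxire, Shield of the Land  ",
    "DamienDM's Character",
    "DamienDM's Character",
]
print(len(drop_detailed_duplicates(sample)))  # prints 5
```

With this cutoff both Rhogars and both of DamienDM's PCs survive, and only the repeated long name is dropped. Lowering `min_words` to 2 would start removing DamienDM's entries, which is exactly the risk mentioned above.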
Anyway, further data exploration is needed, and I would be hesitant to analyze this data raw.