If we asked the DMs in this forum to each report, to the best of their knowledge, how many characters of each species and of each class their games have seen over the years, with luck we'd end up with a pretty decent amount of numbers which in aggregate would give us a pretty good idea of what's been/being played in the wild.
The actual study of data collection suggests otherwise. Eyewitness testimony is some of the most trusted evidence around, but it's also some of the most flawed. For example, a large majority of the convictions later overturned on the basis of DNA evidence had rested heavily on eyewitness testimony in the first place.
I don't know whether the VTTs track the number of sessions played by each game; if they do, it wouldn't be much further to track by character-session.
To the best of my knowledge, they do not. Even if they do, most "sessions" are simply...people being active on their account, and that kind of data is often not carefully tracked. You could probably find it by combing through the database, but it would take rather a while even if you could just sit with a static copy of said database.
It's not that it isn't computationally possible--it totally is, and depending on how you do it, perhaps even relatively easy--it's that the baseline of "code it so it collects that data" is often not done because it's not a useful end-user feature, nor a useful back-end feature for debugging or the like. It's only useful for people wanting to do the kind of analysis we're talking about here, and thus it just...doesn't get collected.
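To make the "relatively easy" part concrete, here's a minimal sketch of a character-session tally, assuming a VTT kept one log row per character per session. Everything in it (the `SessionRecord` name, its fields, the sample rows) is made up for illustration; I'm not describing any real VTT's schema or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRecord:
    """One character showing up to one session (hypothetical schema)."""
    game_id: str
    session_no: int
    character: str
    species: str
    char_class: str


def character_sessions(log):
    """Count how many sessions each (game, character) pair attended."""
    return Counter((r.game_id, r.character) for r in log)


def sessions_by(log, field):
    """Total character-sessions grouped by a field such as 'species' or 'char_class'."""
    return Counter(getattr(r, field) for r in log)


# Invented sample rows, purely for illustration.
log = [
    SessionRecord("g1", 1, "Brakka", "dwarf", "fighter"),
    SessionRecord("g1", 1, "Liss", "elf", "wizard"),
    SessionRecord("g1", 2, "Brakka", "dwarf", "fighter"),
    SessionRecord("g2", 1, "Tam", "elf", "rogue"),
]

print(character_sessions(log))
print(sessions_by(log, "species"))
```

The counting is the trivial part. The hard part is that somebody has to decide to write those rows in the first place, and decide what counts as a "session."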
Yes, there'd be things like that, but they would apply (one would think) roughly equally across the board in aggregate.
Which is a major assumption about the data. "I'm going to assume this has no impact on the data," without a specific reason why (other than "because it doesn't seem like it should have a bias," I mean), is exactly the kind of assumption that makes sociology and human psychology so fraught as sciences. It's not that they can't be useful, but that even when an assumption seems small and simple and unproblematic, it can very, very easily be...not any of those things.
We'd be looking for subtle differences, but I very much suspect they'd be there to find.
Okay, but the point stands. If they're subtle differences, and we're already allowing for a known and serious fuzz factor (premature campaign ending), it's quite possible for those subtle differences to disappear in the noise.
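Here's a rough simulation sketch of what "disappear in the noise" looks like. All the numbers are invented: a baseline lifespan of 20 sessions, a noise spread standing in for things like campaigns ending early, and a crude "difference bigger than about two standard errors" check rather than a proper statistical test. `detection_rate` is just a name I made up for this.

```python
import random
import statistics


def detection_rate(effect, noise_sd, n_per_group, trials=500, seed=0):
    """Fraction of simulated surveys in which group B's advantage stands out.

    Each trial draws n_per_group character lifespans (in sessions) for two
    groups. Group B truly lasts `effect` sessions longer on average; both
    groups share the same Gaussian noise. A trial counts as a detection when
    the observed difference in means exceeds two estimated standard errors.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(20.0, noise_sd) for _ in range(n_per_group)]
        b = [rng.gauss(20.0 + effect, noise_sd) for _ in range(n_per_group)]
        diff = statistics.mean(b) - statistics.mean(a)
        se = (statistics.variance(a) / n_per_group
              + statistics.variance(b) / n_per_group) ** 0.5
        if diff > 2 * se:
            hits += 1
    return hits / trials


# A subtle real effect versus a blatant one, same noise, same sample size.
print("subtle effect:", detection_rate(effect=0.2, noise_sd=10.0, n_per_group=30))
print("large effect: ", detection_rate(effect=10.0, noise_sd=10.0, n_per_group=30))
```

With these made-up settings the subtle effect should almost never clear the bar even though it is genuinely there, while the large one should clear it most of the time. Self-report error would only add to `noise_sd`.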
There's a quantum mechanics experiment that comes to mind which fell prey to this. TL;DR: people claimed that QM violates the "pigeonhole principle." In layman's terms, the pigeonhole principle says, "if you have N boxes to put objects in, and you put at least N+1 objects in those boxes, where N is a natural number (1, 2, 3, etc.), then at least one box MUST have more than one object in it." The experimenters created a setup that they claimed violated this principle: they sent three electrons down two pathways, but the detector at the far end showed no measurable drift compared to sending each electron one at a time. Hence, somehow, three electrons were flying down two paths, but all three acted like they were alone on their path, contradicting the pigeonhole principle.
Then a second group came along and said, "Hey, wait a minute. What would it look like if the electrons DID push each other apart, exactly?" And it turns out that the amount of disturbance you would expect from these three electron beams was orders of magnitude smaller than the pixel size of the detector they were using. (Equivalently, they would have needed electrons that had orders of magnitude stronger forces pushing each other apart.) The whole experimental setup was betrayed, not by its concept, not by the physics or the way they collected the data, but by the imprecision of the data they collected. Their tools couldn't see the difference between a positive result and a negative one.
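As a toy illustration of the resolution problem (not a model of the actual experiment; the pixel size and shifts are invented numbers in arbitrary units), here's a sketch of a detector that only reports which pixel a hit landed in:

```python
import math


def pixel_index(position, pixel_size):
    """Which pixel a hit at `position` lands in (same units as pixel_size)."""
    return math.floor(position / pixel_size)


def fraction_of_hits_that_move(shift, pixel_size, samples=10_000):
    """Share of hits, spread evenly across one pixel, that land in a
    different pixel once every hit is displaced by `shift`."""
    moved = 0
    for i in range(samples):
        position = (i + 0.5) / samples * pixel_size
        if pixel_index(position, pixel_size) != pixel_index(position + shift, pixel_size):
            moved += 1
    return moved / samples


# A shift far smaller than a pixel versus one larger than a pixel.
print(fraction_of_hits_that_move(shift=0.001, pixel_size=1.0))
print(fraction_of_hits_that_move(shift=2.0, pixel_size=1.0))
```

When the shift is a tiny fraction of a pixel, nearly every hit reports the same pixel it would have reported anyway, so "shifted" and "not shifted" produce essentially the same readout.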
We agree that the effects of various character options (race, class, whatever) on PC longevity would be real subtle. We also agree that there are confounding variables that would be difficult or even impossible to control for. Those two are the one-two punch of "this question, even if it has a clear answer, may not be discoverable because of the limits of how we collect our data." Add in the further imprecision of relying on DM self-reports and, frankly, all bets are off.