Except it doesn't. That's my whole point. It DOES NOT do that. It makes ABSOLUTELY ALL characters of a given race have that trait. But that's ridiculous SPECIFICALLY for the reasons I've cited. NOT all members should have that trait. That's literally the point. Compare: even though real-world men are on average meaningfully stronger than real-world women, it is trivial to find men who are not just "not as strong as the average woman" but comparable to the bottom end of female strength.
That's what I'm saying. You are simply, factually INCORRECT to say that this +2 whatever ACTUALLY represents the difference in central tendency. Because you SHOULD see basically the entire spectrum. That's the point. You SHOULD see Str 8 orcs sometimes. You SHOULD see Dex 8 elves sometimes. They should not be common, but the fact that they aren't common IS what "the average Orc has +2 Str" MEANS. It does not, and never has, meant that absolutely every orc has an innate +2 Str.
THAT is the gamist abstraction I am railing against. Because it DOES NOT conform to the way actual, living populations work. It elides the real, measurable behavior of actual populations for a gamist simplification, abstracting all "is an X" characters in the exact same way.
Averages represent what is likely. That's the whole point of central tendencies. They do not, and cannot, represent the spread of the data. That's not what averages (of any kind--means, medians, whatever) DO. They literally do not perform that mathematical function.
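To put numbers on "averages do not represent spread", here is a minimal sketch that enumerates two dice rolls with the same mean. The 3d6 vs. 1d20 pairing and the helper name `all_totals` are my own illustration, not anything from the rules being argued about.

```python
from itertools import product
from statistics import mean, pstdev


def all_totals(*dice):
    """Every equally likely total from rolling one die of each listed size."""
    return [sum(faces) for faces in product(*(range(1, d + 1) for d in dice))]


three_d6 = all_totals(6, 6, 6)   # bell-shaped, clustered near the middle
one_d20 = all_totals(20)         # flat, every result equally likely

for name, totals in (("3d6", three_d6), ("1d20", one_d20)):
    # Same mean for both, but the spread and the range differ.
    print(name, "mean", mean(totals), "sd", round(pstdev(totals), 2),
          "min", min(totals), "max", max(totals))
```

Both rolls average 10.5, yet one ranges over 3 to 18 and the other over 1 to 20, which is exactly the information the mean alone throws away.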
Certainly, there is no reason a location shift by a constant should work that well for modeling the difference between two arbitrary distributions.
Does it work sort of OK for going between the heights of adult men and adult women, as they are roughly normal and the standard deviations aren't ridiculously far apart? (With any truncation for biological constraints so far out that it doesn't particularly come into play for general modeling.)
On the other hand, it is ludicrous for going between the height of 3yo boys and adult human males. Not only is the mean very different, but so are the standard deviations.
Is the case of halfling strength vs. human strength vs. Goliath strength better modeled for the die rolling case by using different dice? Maybe the Halfling is 2d4+d6, the human 3d6, and the Goliath d6+2d8 (or whatever). Perhaps let each roll an extra of the most common die type and discard the lowest roll. That gives the means and standard deviations both growing with size, as well as the maximums but not minimums.
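A quick way to check the claim about means, standard deviations, and maximums is to enumerate every roll. The sketch below assumes the dice pools named above and reads "discard the lowest roll" as dropping the single lowest die out of four; the function names are made up for illustration.

```python
from itertools import product
from statistics import mean, pstdev

# Dice pools as suggested in the post (sides of each die).
POOLS = {"halfling": (4, 4, 6), "human": (6, 6, 6), "goliath": (6, 8, 8)}


def outcomes(dice):
    """Every equally likely combination of faces for the given dice."""
    return product(*(range(1, sides + 1) for sides in dice))


def plain_totals(dice):
    return [sum(roll) for roll in outcomes(dice)]


def drop_lowest_totals(dice):
    """Add one extra die of the most common type, then drop the lowest die."""
    extra = max(set(dice), key=dice.count)
    return [sum(roll) - min(roll) for roll in outcomes(dice + (extra,))]


def summarize(totals):
    return {"mean": round(mean(totals), 2), "sd": round(pstdev(totals), 2),
            "min": min(totals), "max": max(totals)}


for race, dice in POOLS.items():
    print(race, "plain", summarize(plain_totals(dice)))
    print(race, "drop lowest", summarize(drop_lowest_totals(dice)))
```

For the plain pools the minimum stays at 3 for everyone while the mean, standard deviation, and maximum all rise with size. With the extra die, the Goliath maximum becomes 24 rather than 22, because the die discarded on a perfect roll is the d6.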
Coming up with a percentile-based table for STR point buy based on that shouldn't be hard if the goal is maintaining percentiles. If the goal of point buy is to control total modifiers, then it could be done by limiting Halflings to, say, 14, Humans to 18, and Goliaths to 24.
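One possible reading of "maintaining percentiles" is to take a score on the human 3d6 scale, find its percentile, and look up the score sitting at the same percentile in another race's distribution. This is only a sketch under that assumption: the dice pools are the ones floated above, the function names are hypothetical, and point costs are left out entirely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

HALFLING, HUMAN, GOLIATH = (4, 4, 6), (6, 6, 6), (6, 8, 8)


def cdf(dice):
    """Map each total to P(roll <= total), in ascending score order."""
    counts = Counter(sum(r) for r in product(*(range(1, d + 1) for d in dice)))
    total = sum(counts.values())
    running, table = 0, {}
    for score in sorted(counts):
        running += counts[score]
        table[score] = Fraction(running, total)
    return table


def same_percentile(score, from_dice, to_dice):
    """Lowest score in to_dice whose percentile reaches that of score in from_dice."""
    target = cdf(from_dice)[score]
    for candidate, p in cdf(to_dice).items():
        if p >= target:
            return candidate


# Build the lookup table from the human scale to the other two.
for human_score in range(3, 19):
    print(human_score,
          same_percentile(human_score, HUMAN, HALFLING),
          same_percentile(human_score, HUMAN, GOLIATH))
```

A point-buy table could then charge the human cost for whatever score lands in the same row, though that pricing step is a design decision this sketch does not make.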
If it's a concern that the distributions are skewed, you could take the best of some dice, or disregard some numbers rolled, but that seems like a lot of work to model only that approximately.
Assuming one isn't capping things or using extra dice, is a location shift down for halflings (minimum 3) and location shift up for goliaths (maybe not capping) closer than doing nothing?
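One way to make that question concrete is to pick a reference distribution and measure how far each candidate is from it. The sketch below assumes the different-dice pools (2d4+d6 and d6+2d8) are the "true" distributions, assumes a shift of 2 in each direction, and uses total variation distance as the yardstick. All three choices are mine, purely for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pmf(dice, shift=0, floor=None):
    """Exact distribution of the dice total, optionally shifted and floored."""
    counts = Counter()
    for roll in product(*(range(1, sides + 1) for sides in dice)):
        total = sum(roll) + shift
        if floor is not None:
            total = max(total, floor)
        counts[total] += 1
    n = sum(counts.values())
    return {score: Fraction(c, n) for score, c in counts.items()}


def tv_distance(p, q):
    """Total variation distance: half the summed absolute difference."""
    return sum(abs(p.get(s, 0) - q.get(s, 0)) for s in set(p) | set(q)) / 2


halfling_ref = pmf((4, 4, 6))   # assumed reference for halflings
goliath_ref = pmf((6, 8, 8))    # assumed reference for goliaths
human = pmf((6, 6, 6))

halfling_plain = tv_distance(human, halfling_ref)
halfling_shifted = tv_distance(pmf((6, 6, 6), shift=-2, floor=3), halfling_ref)
goliath_plain = tv_distance(human, goliath_ref)
goliath_shifted = tv_distance(pmf((6, 6, 6), shift=2), goliath_ref)

print("halfling: no shift", float(halfling_plain), "shift -2", float(halfling_shifted))
print("goliath:  no shift", float(goliath_plain), "shift +2", float(goliath_shifted))
```

With these particular pools as the yardstick, the shifted 3d6 comes out closer than the unshifted 3d6 for both races. That only says the shift beats doing nothing against this one assumed reference, not that it captures the change in spread.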