For what it's worth, here's my opinion. For a frame of reference, I'm coming at this from a 3/3.5E standpoint. There are some differences between 3E and 4E, and I'll point them out where they are significant, but for the most part Strength scores are comparable between editions. So...
I don't think mechanical representations of gender differences are necessary, but if a group wants them, they're not that difficult to implement without overly penalizing anyone. But if one is going to impose "realism" based on gender, one should also impose realism in the form of reasonable Strength limits and minimum body weights.
I did a quick and dirty (somewhat scientific) comparison of Human Male and Female Strengths based on body weight, and in reference to current Olympic Weightlifting World Records (Clean & Jerk). Unfortunately, the IOC doesn't keep track of such records for other character races.

Here are the results:
Maximum real-world Strength in D&D terms: Males = 23, Females = 21
Amazingly, D&D agrees with the real world here: the maximum Human Strength score in D&D is 23 (at 20th level, per the character advancement charts for both 3E and 4E). If you count Epic levels (above 20th in both 3E and 4E), it can go as high as 26. That's significantly greater than the real world, but since we're talking about mythical Hercules types, it still works for me.
In the real world, I'm assuming that men and women have essentially the same physical potential where Dexterity is concerned. I base this on defining Dexterity as only the efficiency of a person's mind-muscle neural connections and "fast-twitch" muscle response. Granted, men have greater running-speed potential than women, but real-world running speed depends as much on "strength" and cardiovascular/pulmonary capacity as it does on "quickness" (unlike D&D, which bases it mostly on Dexterity).
Also in the real world: though there are differences between male and female brains in how we process information, view the world, etc., I don't believe there is a quantifiable mechanical difference between men and women where Intelligence and Wisdom are concerned. So I'm assuming men and women are equal in this regard as well.
Constitution is a very general Ability (but then again, so are all the others) that combines many things into physical toughness, such as the ability to resist disease (bacteria, viruses, environmental damage, etc.), the ability to resist poison, a quantification of structural/physical toughness, etc. If you look at each component of Constitution, there may be some that men are more resistant to than women, but I think the opposite may be true for others. And if you break it down further into specific things (different viruses, diseases, etc.), you'll find differences there too. So in the end, I believe it's a wash.
Charisma combines too many things (personality, charm, attractiveness, etc.) that are all so subjective, depending on the individual viewer, that there's no way to say men or women are "objectively" more or less Charismatic. There's no doubt that men and women differ in how they project and use Charisma, but I don't think it's possible to pin down an objectively quantifiable difference. So I'll call this one a wash too.
So, IMO, the only Ability that shows a clear, objective difference, both mechanically and quantifiably, is Strength.
But for balance purposes, if one is instituting a penalty, one should probably also institute an offsetting bonus. So, with the above in mind, here are my Human Gender Adjustment Houserule(s):
Human Male: Race as written.
Human Female: -2 Strength, +2 Dexterity
or
Maximum Strength at 20th level (without magical or other enhancement): Male = 23, Female = 21
Maximum Dexterity at 20th level (without magical or other enhancement): Male and Female = 23
Strength Score / Male Minimum Weight / Female Minimum Weight
18 / 100+ lbs. / 100+ lbs.
19 / 110+ lbs. / 125+ lbs.
20 / 120+ lbs. / 160+ lbs.
21 / 135+ lbs. / 200+ lbs.
22 / 160+ lbs. / n/a (above Female maximum)
23 / 210+ lbs. / n/a (above Female maximum)
Height should also be set accordingly. It's highly unlikely that a 120 lb., 6' tall man is going to have a 20 Strength (5' tall would be more realistic).
This gives starting scores that make Males stronger and Females more dextrous, and carries those differences through character advancement (unless a player decides not to focus on those Abilities), while still capping those abilities at real-world maximums. It also requires a realistic body weight for a given Strength. So you don't get an average Female Human (weighing, say, 150 lbs.) with a 21 Strength (like Xena).
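For anyone who wants to enforce these limits at character creation, here is a minimal Python sketch. The maximums, the minimum-weight table and the Female adjustment are transcribed from the houserule above; the helper names are hypothetical, and the checks assume unenhanced scores (no magical or other enhancement), with no minimum weight below 18 Strength since the table starts there.

```python
# Values transcribed from the houserule above; helper names are hypothetical.
# All checks assume unenhanced scores (no magical or other enhancement).
MAX_STRENGTH = {"male": 23, "female": 21}
MAX_DEXTERITY = 23  # same cap for both sexes

# Minimum body weight (lbs.) for each Strength score of 18+.
# Scores of 17 or lower have no minimum under this houserule.
MIN_WEIGHT = {
    "male": {18: 100, 19: 110, 20: 120, 21: 135, 22: 160, 23: 210},
    "female": {18: 100, 19: 125, 20: 160, 21: 200},
}

# Starting-score adjustment: males use the race as written.
ADJUSTMENT = {"male": {}, "female": {"str": -2, "dex": 2}}


def apply_adjustment(sex, scores):
    """Return a copy of the starting scores with the gender adjustment applied."""
    adjusted = dict(scores)
    for ability, delta in ADJUSTMENT[sex].items():
        adjusted[ability] += delta
    return adjusted


def check_character(sex, strength, dexterity, weight_lbs):
    """Return a list of houserule violations; an empty list means the character is legal."""
    problems = []
    if strength > MAX_STRENGTH[sex]:
        problems.append(
            f"Strength {strength} exceeds the {sex} maximum of {MAX_STRENGTH[sex]}"
        )
    elif strength >= 18 and weight_lbs < MIN_WEIGHT[sex][strength]:
        problems.append(
            f"Strength {strength} requires {MIN_WEIGHT[sex][strength]}+ lbs. "
            f"(character weighs {weight_lbs})"
        )
    if dexterity > MAX_DEXTERITY:
        problems.append(f"Dexterity {dexterity} exceeds the maximum of {MAX_DEXTERITY}")
    return problems


print(apply_adjustment("female", {"str": 16, "dex": 14}))  # {'str': 14, 'dex': 16}
print(check_character("female", 21, 16, 150))
# ['Strength 21 requires 200+ lbs. (character weighs 150)']
```

The second call is the 150 lb., 21 Strength heroine from the paragraph above: she fails only the body-weight check, not the Strength cap.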
Or, you could ignore the bonus/penalty portion and enforce only the Maximum Strength limits. This way, female characters can start the game just as strong as male characters (with 18 Strength) but still adhere to real-world limits.
Also, for your enjoyment or fodder (depending on each individual's preference), attached are the charts I used to record my "scientific" research. (Source: the list of current Olympic Weightlifting World Records for the Clean & Jerk, from Wikipedia.)
The Snatch and the Clean & Jerk are the lifts most comparable to the D&D "Lift Over Head", with "Lift" corresponding to a deadlift and "Drag/Pull" being self-explanatory. I used the Clean & Jerk category because the weights lifted were universally higher than in the Snatch (and we're talking about Heroes here; they're obviously going to use the method with the best results).
These charts are based on 3/3.5E (where applicable). 3/3.5E and 4E are mostly comparable for ability scores and carrying capacities; 4E just uses a simpler calculation (which does cause some differences at certain Strength scores, though none significant enough to change a Strength score when compared against real-world World Records). Also, 4E doesn't distinguish a "Lift Over Head" the way 3/3.5E does, so, since the basic carry and lift weights are mostly comparable, I've assumed the "Lift Over Head" weights would be comparable as well.
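The record-to-Strength comparison can be sketched in Python, assuming the 3.5 SRD Carrying Capacity rules: a character can lift its maximum (heavy) load over its head, twice that off the ground, and push or drag five times that, with loads multiplied by 4 for every 10 points of Strength above 20. The function names are hypothetical, and rounding up to the first score whose limit covers the lift is an assumption; it is not necessarily how the attached charts were built.

```python
# Sketch of the 3.5E "Lift Over Head" conversion, assuming the 3.5 SRD
# Carrying Capacity rules. Function names are hypothetical.

LBS_PER_KG = 2.20462  # weightlifting records are listed in kilograms

# Upper bound of a heavy load (lbs.) for Strength 11-20, from the 3.5 SRD table.
HEAVY_LOAD_11_TO_20 = [115, 130, 150, 175, 200, 230, 260, 300, 350, 400]


def max_load(strength):
    """Maximum (heavy) load in lbs., which is also the Lift Over Head limit."""
    if strength < 1:
        raise ValueError("Strength must be at least 1")
    if strength <= 10:
        return 10 * strength
    if strength <= 20:
        return HEAVY_LOAD_11_TO_20[strength - 11]
    # Above 20, each +10 Strength multiplies the load by 4.
    return 4 * max_load(strength - 10)


def lift_limits(strength):
    """Lift Over Head, Lift off Ground (2x) and Push/Drag (5x) limits in lbs."""
    load = max_load(strength)
    return {"over_head": load, "off_ground": 2 * load, "push_drag": 5 * load}


def strength_for_overhead_lift(lift_kg):
    """Smallest Strength whose Lift Over Head limit covers the lift (rounds up)."""
    lift_lbs = lift_kg * LBS_PER_KG
    strength = 1
    while max_load(strength) < lift_lbs:
        strength += 1
    return strength


print(lift_limits(18))  # {'over_head': 300, 'off_ground': 600, 'push_drag': 1500}
print(strength_for_overhead_lift(200))  # hypothetical 200 kg lift -> 21
```

The 200 kg figure is only an illustration, not an actual record. Rounding down instead (the highest score whose overhead limit the lift still meets) is an equally defensible choice and would produce lower maximums.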
Enjoy.
