I have to say, this would be incredibly unscientific with just "a group". I take methodological issue with:
1. The construct of "masculine challenges". How is that defined?
2. The lack of a control group. I think it'd be important to have 3 groups: all female, all male, and mixed. (Studies have shown strong differences in the behavior of both genders in mixed groups compared with homogeneous groups.)
3. A small "N". With only a single group, I think you'd really be testing the personalities of those few women.
4. A well-designed study would also include "feminine challenges" to counterbalance "masculine challenges".
5. To understand "masculine and feminine challenges", one would need to read the latest gender research. Unsurprisingly, a lot of the earlier findings were pretty flawed (boys are not better than girls at math, for example, nor are they more aggressive overall; mayyyybe more violent, but not more aggressive).
6. Following on from point 5, if designing this, it might be more appropriate to develop "challenges of various types, assessing different constructs"... so rather than "masculine" or "feminine", have in-game constructs that assess "time-limited problem solving" or "multimodal problem solving", etc. Then use statistical analysis to determine whether there is a gender factor.
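To make point 6 a bit more concrete, here's a rough sketch of what "use statistical analysis" could look like for one construct. Everything in it is my own assumption: the scores are made up, the `permutation_test` helper is hypothetical, and a permutation test on the difference in means is just one simple option. It also ignores group composition and the fact that players at the same table aren't independent, which a real analysis would have to handle.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an approximate p-value: how often a random relabelling of
    players produces a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the gender labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Made-up scores on a hypothetical "time-limited problem solving" challenge.
women = [14, 17, 12, 16, 15, 13, 18, 15]
men = [15, 13, 16, 14, 17, 12, 15, 16]

p_value = permutation_test(women, men)
print(f"approximate p-value: {p_value:.3f}")
```

You'd run something like this once per construct, and with eight players per gender it has very little power, which is really just point 3 again.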
Just as an aside, most of the research shows that individuals don't differ much by gender, but that social situations often call for "expected" gender differences to be expressed.
An additional factor in roleplaying games that would confound this study would be whether the players were playing characters of their own gender, the other gender, or a group with a mix of their own versus the other gender. I could imagine imposed stereotypes shaping how the "character" responds to a situation, rather than the "player" responding to a situation.
Even with the above, avoiding the question of "what gender is the character"... often (and especially for newbs to RPGs) characters are cliches or stereotypes. If I'm playing my very first elvish archer, I might be playing Legolarias... who just responds to situations as I would imagine Legolas might. I honestly think that'd be a much larger factor than the gender of the character or the gender of the player.
There's a lot to think about, and I don't see a viable study being done without a lot of research being done to carefully design the study as well as a lot of time and effort running numerous games with a large number of people.
EDIT: To further confound this, if you're selecting a group of your female friends, that's pretty clearly a "non-random" group: this will say little about "women as a whole" and much more about "your friends". Also, rereading your OP, and I'm trying to say this gently... I don't mean to cast blame... there's some bias evident. By this, I mean that your study defines "masculine challenges", and your very hypothesis, based upon societal conceptions (and misconceptions). "Will girls act girly?" seems to be the question, with the hypothesis being "yes". It doesn't seem to draw upon what is "girly" from the literature (i.e. actual gender differences) so much as what marketing has tossed out there.
One additional confound that occurred to me: the gender of the DM, and in your case, the fact that the DM has a previously established relationship with the subjects.
It might be an interesting blog. I don't think it's offensive. But a scientific study? I'd ask you to be cautious in framing it that way (with the intimation that one might be able to extrapolate results to a larger population). An interesting project, yes. Scientific method? Maybe not so much.
