Hey, I was wondering if you could make the raw data available? The analysis, while useful, is still flawed. Means are susceptible to outliers, and I know there are several in the MM. I'm interested in seeing the median, the quartiles, and the standard deviation of the data.
This, I feel, is more useful for the DM, to learn what really is a challenge, what is exceptionally weak, etc. In addition, I have a gut feeling the data isn't centered around a single median, but has a few clumps.
The raw data I used is just the full text of the MM from the PDF (available here or here). I selected the entire PDF, then copied and pasted it into a text file.
What I can provide is the Perl script I used for parsing. My first script, which I used for most of the statistics, operated on a text file that I had heavily pre-processed by hand, so it wouldn't be much use to anyone else. The latest analysis script (for monster attacks) is much cleaner and operates directly on the PDF text, so I've included that one.
I'm hoping to spend a little more time on this soon and do some of the extra analysis that has been requested. In the process I'm planning to convert the rest of the analysis over to the new style and add comments. If/when that happens, I'll have a much more thorough Perl script I can post. Until then, this script at least contains the basics for separating out monster entries and such.
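For anyone who wants to try the requested statistics in the meantime, here is a minimal sketch of the median, quartiles, standard deviation, and a crude text histogram for spotting clumps. It is not part of the script below. The sub names (quantile, std_dev, histogram) and the sample numbers are made up for illustration, and the quartiles use linear interpolation, which is only one of several common conventions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quantile by linear interpolation between the two closest ranks.
# Other conventions exist and give slightly different quartiles.
sub quantile {
    my ($q, @values) = @_;
    return undef unless @values;
    my @sorted = sort { $a <=> $b } @values;
    my $pos  = $q * (@sorted - 1);
    my $low  = int($pos);
    my $frac = $pos - $low;
    return $sorted[$low] if $low >= $#sorted;
    return $sorted[$low] + $frac * ($sorted[$low + 1] - $sorted[$low]);
}

# Sample standard deviation (divides by n - 1).
sub std_dev {
    my @values = @_;
    my $n = @values;
    return 0 if $n < 2;
    my $mean = 0;
    $mean += $_ / $n for @values;
    my $sum_sq = 0;
    $sum_sq += ($_ - $mean) ** 2 for @values;
    return sqrt($sum_sq / ($n - 1));
}

# Count how often each value occurs, so clumps become visible.
sub histogram {
    my %count;
    $count{$_}++ for @_;
    return %count;
}

# Placeholder numbers for illustration only, NOT taken from the MM.
my @bonus_minus_level = (3, 4, 5, 5, 6, 7, 12);

printf "median %.2f, Q1 %.2f, Q3 %.2f, std dev %.2f\n",
    quantile(0.5,  @bonus_minus_level),
    quantile(0.25, @bonus_minus_level),
    quantile(0.75, @bonus_minus_level),
    std_dev(@bonus_minus_level);

# One row per distinct value, one '#' per occurrence.
my %hist = histogram(@bonus_minus_level);
printf "%3d %s\n", $_, '#' x $hist{$_} for sort { $a <=> $b } keys %hist;
```

To use something like this on the real data, the parsing script would have to keep each individual attack bonus minus level value in a list, rather than only the running sums it accumulates now.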
parse_MM_attacks.pl:
[sblock]
Code:
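#!/usr/bin/perl
# Usage: parse_MM_attacks.pl <text file pasted from the MM PDF>
open MMFILE, '<', $ARGV[0] or die("usage: $0 <filename>\n");
read MMFILE, $mm_text, 1e7 or die;

# Each entry runs from the line containing "Level N <role>" through the Cha score.
@entries = ($mm_text =~ /(^[^\n]*Level \d+ (?:Elite |Solo )?(?:Artillery|Brute|Controller|Lurker|Minion|Skirmisher|Soldier).*?Cha \d+ \(.?\d+\))/gsm);

for $entry (@entries)
{
    # Collapse whitespace and rejoin "Refl ex", which the PDF copy splits in two.
    $entry =~ s/\s+/ /g;
    $entry =~ s/Refl ex/Reflex/g;

    $entry =~ /Level (\d+)/;
    $level = $1;

    # Tier index: 0 = level 10 and below, 1 = levels 11-20, 2 = level 21 and up.
    # Index 3 accumulates totals across all tiers.
    if($level <= 10) {
        $tier = 0;
    } elsif($level <= 20) {
        $tier = 1;
    } else {
        $tier = 2;
    }
    $count[$tier]++;
    $count[3]++;

    @attacks = ($entry =~ /(\+\d+ vs\.? (?:AC|Fortitude|Reflex|Will))/g);
    @def_seen = ();
    for $attack (@attacks) {
        $attack =~ /\+(\d+) vs\.? (AC|Fortitude|Reflex|Will)/;

        # Defense index: 0 = AC, 1 = Fortitude, 2 = Reflex, 3 = Will.
        if($2 eq "AC") {
            $def = 0;
        } elsif($2 eq "Fortitude") {
            $def = 1;
        } elsif($2 eq "Reflex") {
            $def = 2;
        } else {
            $def = 3;
        }

        # Count each monster at most once per defense it targets.
        if(!$def_seen[$def]) {
            $def_seen[$def] = 1;
            $def_count[$tier][$def]++;
            $def_count[3][$def]++;
        }

        # Running sums for the mean of (attack bonus - level).
        $attack_minus_level[$tier][$def] += $1 - $level;
        $attack_minus_level[3][$def] += $1 - $level;
        $attack_count[$tier][$def]++;
        $attack_count[3][$def]++;
    }

    # Index 4 counts monsters whose attacks target AC only.
    if($def_seen[0] == 1 && $def_seen[1] == 0 && $def_seen[2] == 0 && $def_seen[3] == 0) {
        $def_count[$tier][4]++;
        $def_count[3][4]++;
    }
}

# Table 1: percentage of monsters that target each defense.
# Rows: AC, Fortitude, Reflex, Will, AC only. Columns: the three tiers, then all.
for $def (0 .. 4) {
    for $tier (0 .. 3) {
        printf "%10.1f", 100 * $def_count[$tier][$def] / $count[$tier];
    }
    print "\n";
}
print "\n";

# Table 2: mean of (attack bonus - level) per defense, same column layout.
for $def (0 .. 3) {
    for $tier (0 .. 3) {
        printf "%10.2f", $attack_minus_level[$tier][$def] / $attack_count[$tier][$def];
    }
    print "\n";
}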