As I stated earlier, you have been looking for a way to "justify" your belief, not for a way to actually "measure" anything for a true comparison.
That is a flawed basis.
I've defended my dislike of the Theurge style classes on a variety of bases. I needed no additional methods. In the end, though, arguments pro and con still were matters of opinion.
SPR wasn't conceived with an intent to justify my dislike; it was discovered. I didn't create the feat, and it's not like I cooked the books or anything.
Anyone using the same methodology and doing the math (correctly, of course) would receive the same answers.
IOW, no bias.
My dislike enters into things only in the sense of a revelation, as in "Perhaps that's what's really been bugging me."
Perhaps a different measurement <snip>
That analysis has much merit, though you still have the problem that not all spells of the same level have the same quality.
Another issue with the SPR is that it is using feats that have built in limiting factors as a basis.
Sure, but so does comparing individual spells- again, which is better, Summon Monster 1, Magic Missile, or Sleep?
Each spell has limiting factors- range, duration, stacking limits, SR and even the nature of their intended targets. Some are more effective on fighters than on wizards and vice versa. Certain energy types are more likely to be resisted than others. Some spells are more effective on undead, while others are utterly useless against them. What is the value of immediate damage versus damage over time? (OK, that last one gets a LOT of analysis by the guys at Goodman Games.)
All of those variables make it difficult if not impossible to directly compare one spell's quality to another- they lack common statistically comparable ground, a common denominator.
If this were a scientific experiment, what would you do if the evidence didn't support your theory?
What a good scientist does- go back to the beginning and formulate a new theory.
The thing is, as I pointed out with the physics example, there is support. The Theurge classes may lack a certain punch, like the ton of feathers, but the arcane potential is still there, like the 1 ton steel weight.
Had 4Ed not come along, it's entirely possible that that potential could have been released in an abusive way by a new 3.5 product (if it hasn't already- I have lots of stuff, but not everything).
People, as I've said often enough, this was never intended as the end-all-be-all measurement. Goodman Games' product is full of metrics that let you compare the Wizard and Sorcerer- it doesn't venture beyond the 3.5PHB- and it still has no definitive answer as to which is "better."
This is just another metric, as (potentially) are the Innate Spell analysis or Spell Level Squared.