That would certainly be ideal, for somebody who knows a lot more than I do about machine learning. If you want to do this seriously, you'd probably want to brew up an AI learning system. You'd have them fight a sequence of encounters of random toughness; maybe you'd feed the AI the CR of the encounter, as PCs know that an ancient dragon is different from 3 goblins.
For a given loadout of a party, you'd let the AI optimize how long they last against such a random sequence.
Feed something like that to Alpha or similar learning AIs and you should get resource management falling out of it. I mean, they can feed it StarCraft and it beats grandmasters.

That said, it might be possible to do a much less sophisticated variant using an evolutionary strategy. Give the simulated players a set of weights to plug into a formula for determining when to use a spell (number of targets available, CR, et cetera), and then do as you propose and run a series of "gauntlets" to see which strategy gets you farthest before a TPK. Randomly adjust the weights, re-run, and keep whichever variant performs best. Repeat to zero in on good values (there's no guarantee it finds the true optimum, but it should keep improving).
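To make the idea concrete, here is a rough Python sketch of that loop. Everything in it is made up for illustration: the damage model, the starting HP and spell slots, the linear formula in `should_cast`, and all the function names are my own assumptions, not anything from an actual ruleset or simulator. It is a simple "mutate and keep the best" hill climb, which is about the most basic evolutionary strategy there is.

```python
import random

def should_cast(weights, num_targets, cr, slots_left):
    """Hypothetical decision formula: cast when the weighted sum is positive."""
    w_targets, w_cr, w_slots, bias = weights
    return w_targets * num_targets + w_cr * cr + w_slots * slots_left + bias > 0

def run_gauntlet(weights, rng, hp=100, slots=6, max_encounters=200):
    """Count encounters survived before a TPK, using an invented toy model."""
    survived = 0
    while survived < max_encounters:
        cr = rng.randint(1, 10)            # random encounter toughness
        num_targets = rng.randint(1, 6)    # random number of enemies
        damage = cr * 2 + num_targets      # assumed damage taken by the party
        if slots > 0 and should_cast(weights, num_targets, cr, slots):
            slots -= 1
            damage //= 3                   # assumed: a spell cuts damage taken
        hp -= damage
        if hp <= 0:
            break                          # TPK
        survived += 1
    return survived

def fitness(weights, seed, trials=30):
    """Average survival over several gauntlets.

    Each trial is seeded separately, so every candidate faces the same
    encounter sequences and the comparison between them is fair.
    """
    total = 0
    for t in range(trials):
        total += run_gauntlet(weights, random.Random(seed * 1000 + t))
    return total / trials

def evolve(generations=40, children=8, sigma=0.5, seed=0):
    """Randomly perturb the weights and keep a variant only if it scores better."""
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0, 0.0]            # starts out never casting
    best_score = fitness(best, seed)
    for _ in range(generations):
        for _ in range(children):
            candidate = [w + rng.gauss(0, sigma) for w in best]
            score = fitness(candidate, seed)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    weights, score = evolve()
    print("weights:", [round(w, 2) for w in weights])
    print("average encounters survived:", score)
```

A real version would swap `run_gauntlet` for an actual combat simulation and add more inputs to the formula (party HP, spell level, and so on). The outer loop would stay pretty much the same.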
I might try it sometime. Not right now, though.