I made an encounter simulator in Python. The program calculates the average damage output of two parties in 4e. I put four level 3 PCs (a cleric, fighter, ranger, and wizard) against a young green dragon. Before I added 'frightful presence' to the simulator, the PCs did badly against the dragon on average. After I added that feature, they did a *lot* worse. This power really makes a big difference in the potential outcome.
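To show the kind of effect I mean, here is a minimal sketch of an expected-damage calculation. It is not the simulator itself: the function names, the attack bonus, the defense, the damage value, and the size of the penalty are all placeholder assumptions for illustration, and it ignores extra critical-hit damage. It only shows how a to-hit penalty plus a lost turn cuts into a PC's average output.

```python
def hit_chance(attack_bonus, defense, penalty=0):
    """Chance that d20 + attack_bonus - penalty meets or beats the defense.

    A natural 1 always misses and a natural 20 always hits.
    """
    hits = 0
    for roll in range(1, 21):
        if roll == 1:
            continue  # automatic miss
        if roll == 20 or roll + attack_bonus - penalty >= defense:
            hits += 1
    return hits / 20


def expected_damage(attack_bonus, defense, avg_damage, rounds,
                    penalty=0, lost_turns=0):
    """Average damage over a fight, ignoring extra critical-hit damage.

    penalty and lost_turns are hypothetical stand-ins for whatever a
    debuffing power does to the attacker (numbers are NOT from the rules).
    """
    active_rounds = max(rounds - lost_turns, 0)
    return active_rounds * hit_chance(attack_bonus, defense, penalty) * avg_damage


# Placeholder numbers: +7 to hit against defense 20, 8 average damage, 5 rounds.
baseline = expected_damage(7, 20, 8, rounds=5)
debuffed = expected_damage(7, 20, 8, rounds=5, penalty=2, lost_turns=1)
print(f"baseline: {baseline:.1f}, debuffed: {debuffed:.1f}")
```

Even with made-up numbers, losing one turn and a couple of points of accuracy removes a large share of the party's damage, which matches what I saw when I switched the feature on.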
A party of four level 3 PCs against a young green dragon (a level 5 solo) is an n+3 encounter. That is considered hard, so by definition a TPK is a likely outcome.
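For anyone who wants to check the n+3 figure, here is a rough sketch of the arithmetic. The XP values and the solo multiplier are as I recall them from the DMG tables, so treat them as assumptions and verify them against the book; the function name is just for illustration.

```python
# XP for one standard monster by level, as recalled from the 4e DMG
# (assumption: verify against the book before relying on these).
STANDARD_XP = {1: 100, 2: 125, 3: 150, 4: 175, 5: 200, 6: 250, 7: 300, 8: 350}
SOLO_MULTIPLIER = 5  # assumption: a solo counts as five standard monsters


def encounter_level(total_xp, party_size):
    """Highest level whose standard-monster XP fits within each PC's share."""
    per_pc = total_xp / party_size
    return max(
        (level for level, xp in STANDARD_XP.items() if xp <= per_pc),
        default=1,
    )


party_level = 3
party_size = 4
dragon_xp = STANDARD_XP[5] * SOLO_MULTIPLIER  # level 5 solo

level = encounter_level(dragon_xp, party_size)
print(f"encounter level {level}, which is n+{level - party_level}")
```

With those table values the dragon is worth 1000 XP, or 250 per PC, which is the standard-monster value at level 6: three levels above the party.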
Does your program heal unconscious PCs, keep PCs out of range of area attacks, have a defender try to hold down the dragon while the other PCs either take pot shots at it or aid the defender in some way, allow the PCs to use Daily items, etc.?
One has to take such results with a grain of salt unless the program is designed to seriously emulate player equipment, tactics, and racial abilities.
Your "do not go nova" results seem less newsworthy. Players will almost always go nova against a dragon, so those results are interesting, but not important.
Every dragon encounter in our campaigns has resulted in the PCs winning, and sometimes winning quickly. Sure, they get beaten up and use up a lot of resources, and it is definitely a challenge. But we have not run into the string of bad rolls that would result in a TPK. We have hit such strings in lesser encounters, but because they were lesser encounters, it still did not result in a TPK. It just cost a lot of resources because the dice were cold.
I have noticed that at the beginning of a campaign, when the players are not yet familiar with their own powers and the abilities of their fellow PCs, the characters are merely sets of numbers on character sheets. Everyone more or less does their own thing. As the players gain experience with each other, a teamwork develops that is greater than the sum of its parts, and PC tactics get seriously stronger. I suspect that your program does not take this into account and just compares damage and conditions.