Unfortunately, your solution is untenable, for the reason you outlined (WotC won't make the changes you want). If you won't accept any other solution, you've put yourself in a tough space.
I do not share your confidence in this claim.
Particularly given the apparent runaway success of BG3, which is nearly perfect for making a virtual testing environment that can run numbers and give a useful spread of statistical results. It includes terrain (obstacles, impassable barriers, hazards, slowing effects, height difference, etc.), a wide variety of implemented monsters, many abilities including instant-death ones (e.g. intellect devourers), utility magic, conditional effects, all sorts of stuff.
It--obviously!--cannot handle the sheer creative potential, and careful decision-making, of actual human beings. It cannot generate new ideas, and in all likelihood, even some things it could theoretically test will simply be too difficult or cumbersome to actually express within its engine's code. But it can do a hell of a lot, and it can collect that data at lightning speed, allowing you to do the equivalent of hundreds or thousands of hours of live playtesting with the push of a button. If coupled with some fancier tricks (like, say, something which automatically generates varied terrain and encounters indexed by intended difficulty), it could even be used to get an actual feel for how much terrain features affect encounter difficulty. If that's feasible, it could open up room for an entirely new set of tools and advice for DMs on how to make their encounters both better and more fitting to their vision for their campaigns.
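To make the "push of a button" idea concrete, here's a deliberately tiny toy sketch. To be clear, this has nothing to do with BG3's actual engine or any real WotC tool: the function names, hit chances, hit points, and the notion that high ground just lowers the party's hit chance are all assumptions I made up for illustration. It only shows the shape of the approach: run the same fight thousands of times under two terrain conditions and compare win rates.

```python
import random

def party_wins(party_hit, monster_hit, rng, hp=40, dmg=8, max_rounds=100):
    """Play one toy fight; return True if the party drops the monsters first.

    All numbers are hypothetical placeholders, not real game statistics.
    """
    party_hp, monster_hp = hp, hp
    for _ in range(max_rounds):
        # The party side swings first each round.
        if rng.random() < party_hit:
            monster_hp -= dmg
            if monster_hp <= 0:
                return True
        if rng.random() < monster_hit:
            party_hp -= dmg
            if party_hp <= 0:
                return False
    return False  # A stalemate after max_rounds counts as a non-win.

def win_rate(party_hit, monster_hit, trials=5000, seed=0):
    """Fraction of simulated fights the party wins."""
    rng = random.Random(seed)
    wins = sum(party_wins(party_hit, monster_hit, rng) for _ in range(trials))
    return wins / trials

# Assumption: monsters holding high ground is modeled as a flat
# penalty to the party's chance to hit, and nothing else.
open_field = win_rate(party_hit=0.65, monster_hit=0.50)
high_ground = win_rate(party_hit=0.45, monster_hit=0.50)
print(f"open field: {open_field:.1%}, monsters on high ground: {high_ground:.1%}")
```

A real testing environment would obviously need actual positioning, abilities, and some kind of decision-making for both sides. The point is only that once the fight itself can be run by a machine, collecting thousands of results per terrain setup is the cheap part.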
This really isn't that big an ask in a world with computers. Particularly since they've already expressly said that they're making their own virtual tabletop. Basic statistical modeling. I'm not even talking ANOVA, I'm genuinely just saying basic tests like hypothesis testing, goodness-of-fit tests, and proportion tests. Basic survey design, e.g. you don't make push polls, and you don't shape your questions so that it's not actually possible to voice relevant criticism. And some basic consistency in their standards for what gets multiple attempts vs. what gets crapcanned on the first pass, e.g. you don't spend six to eight months trying to make Specialties or that martial bonus dice thing work only to quietly abandon both (and suffer serious consequences as a result of dropping them), while literally completely abandoning two whole classes and never making another public attempt simply because things didn't go well on the first try.
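For a sense of how basic "basic" is here, this is roughly what a two-proportion test looks like, using nothing but Python's standard library. The function name and the survey counts below are hypothetical, made up purely for illustration and not real playtest data. It's the textbook pooled z-test with a normal approximation, so it assumes reasonably large samples.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided pooled z-test for the difference between two proportions.

    Returns (z, p_value). Uses the normal approximation, so it is only
    appropriate when both samples are reasonably large.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("test is undefined when every response is the same")
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: respondents who rated version A vs. version B
# of some class feature as "satisfied".
z, p = two_proportion_z_test(success_a=700, n_a=1000, success_b=450, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

That's the whole thing. Whether a given difference in approval actually matters for design is still a judgment call, but at least you'd know whether it's distinguishable from noise.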
None of this is hard. None of it is complicated. They're already doing some of it, and have access to tools that can do much of the stuff they haven't yet. A single survey consultant could fix up their survey design stuff right quick. You don't even need a stats consultant (though that would of course be incredibly useful)--just basic Stats 101 stuff is all you need.