To be clear, I don't mean looking at player data. I mean using it as an in-house virtual testbed. Based upon the press coverage, I am reasonably certain that any data mining of BG3 will only tell us what various race and class combinations will result in people saying, "Yeah, I'd totally hit that."
Full-time in-house playtesters can only do a handful of encounters per day. A machine running BG3 code (without actually showing the 3D models) could run hundreds of combats an hour easily, perhaps far more. Different setups would yield different kinds of data. E.g. create monsters that you are very confident do exactly the kinds of damage they're supposed to, on average, during a fight (various "yardstick" encounters, so to speak). Then test different group compositions against these yardsticks. Run the same scenario ten thousand times each for two groups that are identical except for having a Champion vs a Battle Master, and then do a couple of hypothesis tests to see whether the two result distributions are consistent with one another or not (e.g. a two-sample test such as Kolmogorov-Smirnov, or a chi-squared test of homogeneity). Iterate this over various different setups so you can get actual data on the performance of various characters in combat, and make appropriate tweaks as needed to ameliorate unintended weaknesses or strengths (every class and subclass should have weaknesses and strengths, but unintended ones are likely a problem). Out-of-combat stuff, as stated, requires actual human brains behind it--but now you can have your playtesters focus on that stuff, rather than spreading their efforts around.
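To make the Champion vs Battle Master comparison concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the two "builds" are made-up stat blocks (the Battle Master's flat +1 damage is just a stand-in for maneuver dice), the yardstick monster is a bag of 60 hit points, and I've used a permutation test on mean rounds-to-kill rather than a textbook goodness-of-fit test, since it needs nothing beyond the standard library. A real testbed would run the actual game rules instead of `rounds_to_kill`.

```python
import random

# Hypothetical stat blocks, NOT real 5e/BG3 numbers.
CHAMPION = {"attacks": 2, "hit_on": 9, "crit_on": 19, "bonus": 4}
BATTLE_MASTER = {"attacks": 2, "hit_on": 9, "crit_on": 20, "bonus": 5}


def rounds_to_kill(rng, build, monster_hp=60):
    """Rounds one fighter needs to drop a yardstick monster (toy model)."""
    hp, rounds = monster_hp, 0
    while hp > 0:
        rounds += 1
        for _ in range(build["attacks"]):
            d20 = rng.randint(1, 20)
            if d20 >= build["crit_on"]:      # critical hit: roll the d8 twice
                hp -= rng.randint(1, 8) + rng.randint(1, 8) + build["bonus"]
            elif d20 >= build["hit_on"]:     # ordinary hit
                hp -= rng.randint(1, 8) + build["bonus"]
    return rounds


def permutation_p_value(a, b, rng, n_perm=200):
    """Two-sided permutation test on the difference of sample means."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                  # relabel the runs at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            at_least_as_extreme += 1
    # +1 correction so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_perm + 1)


def compare_builds(build_a, build_b, n_runs=2000, n_perm=200, seed=1):
    """Run both builds against the same yardstick and test for a difference."""
    rng = random.Random(seed)
    a = [rounds_to_kill(rng, build_a) for _ in range(n_runs)]
    b = [rounds_to_kill(rng, build_b) for _ in range(n_runs)]
    return sum(a) / n_runs, sum(b) / n_runs, permutation_p_value(a, b, rng, n_perm)


if __name__ == "__main__":
    mean_a, mean_b, p = compare_builds(CHAMPION, BATTLE_MASTER)
    print(f"Champion: {mean_a:.2f} rounds, Battle Master: {mean_b:.2f} rounds, p ~ {p:.3f}")
```

A small p-value here only says the two builds perform differently against this one yardstick; whether the gap is intended is still a design call.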
Then, once that phase of testing is done, aka once you've shifted from asking how strong PCs are to knowing how strong they are, set up some standard party slates (say 10 parties with widely varying composition), and then use those as yardsticks for testing more interesting and diverse monsters. Again, some things cannot be tested in a code environment; they require human judgment to work correctly, and those are the exact things that you sic your live playtesters on. The stuff that doesn't critically depend on human judgment, however, can be run through the simulation a zillion times to check that the spread of results it generates fits the intended results. E.g., ten thousand runs each of parties A-J showing that ghouls are massively more deadly than they should be (aka, what actually bit the 5.0 design team in the ass when they tried some live playtesting) would mean that ghouls either need to be toned down a bit, or need to be marked as higher CR, or some other thing.
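The party-slate check could look something like the sketch below. Again, this is a toy under loud assumptions: `party_wins` is an invented abstract fight (a flat `stun_chance` stands in for something like ghoul paralysis), the slate and monster numbers are made up, and the "intended" wipe rate is whatever the designers decide it is. The one real piece of statistics is the Wilson score lower bound, used so a monster only gets flagged when the data clearly says its wipe rate is above target.

```python
import math
import random


def party_wins(rng, party_size, pc_hp, pc_hit, pc_dmg,
               n_monsters, mon_hp, mon_hit, mon_dmg, stun_chance, max_rounds=100):
    """One abstract fight; True if any PC is still standing at the end."""
    pcs = [pc_hp] * party_size
    stunned = [False] * party_size
    mons = [mon_hp] * n_monsters
    for _ in range(max_rounds):              # cap so a stalemate cannot loop forever
        if not pcs or not mons:
            break
        for i in range(len(pcs)):            # party turn: focus fire on first monster
            if stunned[i]:
                stunned[i] = False           # a stunned PC loses this turn
                continue
            if mons and rng.random() < pc_hit:
                mons[0] -= pc_dmg
                if mons[0] <= 0:
                    mons.pop(0)
        for _ in range(len(mons)):           # monsters' turn: random targets
            if not pcs:
                break
            t = rng.randrange(len(pcs))
            if rng.random() < mon_hit:
                pcs[t] -= mon_dmg
                if pcs[t] <= 0:
                    pcs.pop(t)
                    stunned.pop(t)
                elif rng.random() < stun_chance:
                    stunned[t] = True
    return bool(pcs)


def wilson_lower(k, n, z=1.96):
    """Lower end of the Wilson score interval for k successes in n trials."""
    if n == 0:
        return 0.0
    p = k / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def flag_overtuned(slates, monster, intended_wipe_rate, n_runs=1000, seed=1):
    """Names of party slates whose wipe rate is credibly above the target."""
    rng = random.Random(seed)
    flagged = []
    for name, party in slates.items():
        wipes = sum(not party_wins(rng, **party, **monster) for _ in range(n_runs))
        if wilson_lower(wipes, n_runs) > intended_wipe_rate:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical slates and a hypothetical ghoul-like pack.
    slates = {
        "A": {"party_size": 4, "pc_hp": 20, "pc_hit": 0.65, "pc_dmg": 7},
        "B": {"party_size": 4, "pc_hp": 14, "pc_hit": 0.60, "pc_dmg": 9},
    }
    pack = {"n_monsters": 4, "mon_hp": 22, "mon_hit": 0.5, "mon_dmg": 7, "stun_chance": 0.5}
    print(flag_overtuned(slates, pack, intended_wipe_rate=0.05))
```

If most of parties A-J come back flagged, that is the "tone it down or raise the CR" signal; if only one does, the problem may be that party rather than the monster.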
As stated, if we can get some randomization thrown into the mix as well, beyond just the dice roller (e.g. something that can generate reasonable random encounter maps, so it's not just five-a-side on a featureless flat plane), you could go even further. Use standardized monsters AND standardized parties, and then add terrain effects. Hazards, traps, muck, ice, kaiju stomach acid, the works. Check how these things compare to the exact same maps without said hazards, and boom--you can generate a mathematical representation of how much tougher such features can make a fight. Naturally, as with all these numbers, it would be an approximate and statistical fit, not a diamond-perfect solution (in math jargon, it is a "numerical" solution rather than an "analytic" solution).
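As a sketch of what that "mathematical representation" might be: run the same fight with and without the hazard, take the ratio of average hit points lost, and put a bootstrap interval around it so you know how fuzzy the number is. The fight model below (`hp_lost`, with a flat per-round hazard chance and damage) is entirely hypothetical; only the with/without comparison and the percentile bootstrap are the point.

```python
import random


def hp_lost(rng, rounds=5, monster_hit=0.5, monster_dmg=7,
            hazard_chance=0.0, hazard_dmg=0):
    """Party HP lost in one toy fight; the hazard adds its own damage rolls."""
    lost = 0
    for _ in range(rounds):
        if rng.random() < monster_hit:
            lost += monster_dmg
        if rng.random() < hazard_chance:
            lost += hazard_dmg
    return lost


def hazard_multiplier(hazard_chance, hazard_dmg, n_runs=2000, n_boot=200, seed=1):
    """Ratio of mean HP lost with vs without the hazard, plus a 95% bootstrap interval."""
    rng = random.Random(seed)
    plain = [hp_lost(rng) for _ in range(n_runs)]
    hazardous = [hp_lost(rng, hazard_chance=hazard_chance, hazard_dmg=hazard_dmg)
                 for _ in range(n_runs)]
    if sum(plain) == 0:
        raise ValueError("baseline fights dealt no damage; ratio is undefined")
    point = sum(hazardous) / sum(plain)      # equal sample sizes, so sums suffice

    ratios = []
    for _ in range(n_boot):                  # resample each arm with replacement
        p = rng.choices(plain, k=n_runs)
        h = rng.choices(hazardous, k=n_runs)
        if sum(p) > 0:
            ratios.append(sum(h) / sum(p))
    ratios.sort()
    low = ratios[int(0.025 * len(ratios))]
    high = ratios[min(len(ratios) - 1, int(0.975 * len(ratios)))]
    return point, low, high


if __name__ == "__main__":
    # Hypothetical "acid pool": 30% chance per round of 5 extra damage.
    point, low, high = hazard_multiplier(hazard_chance=0.3, hazard_dmg=5)
    print(f"hazard makes the fight about {point:.2f}x as costly ({low:.2f} to {high:.2f})")
```

The interval is the honest part: it is the difference between "acid pools make fights about 1.4x as costly" and "somewhere between 1.1x and 1.9x, run more sims."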
And, to reiterate, this is not perfect. It cannot completely replace human playtesting, and it should never be expected to. Human judgment, preferences, needs, and interests will always be central to the process of making a good game. But an automated virtual testing environment does wonders for the parts of the game that really are just a lot of random-number generation and mathematical calculation, and combat balance is precisely that. Then, precious live-playtester time can be focused on the stuff that benefits most from live playtesters. What could have been months or even years of public playtesting can potentially be reduced to days, even hours of virtual playtesting.