Let's try it this way.
A player shows up with two characters, with identical stats. One of them was made with point-buy (and therefore is balanced, by your definition) and the other was rolled (and therefore not balanced, by your definition). The player won't tell you which one was which; only that one set was generated randomly and the other was selected. How will you know which character is balanced and which one isn't?
I imagine your response will be something like "that would never happen because because because," and skip the problem itself: there are two identical sets of numbers that were generated in two different ways. To me, this suggests it's not the ability scores themselves that are unbalanced, as it sounds like you're saying, but actually your perception of their creation that feels unbalanced.
@CreamCloud0 already covered this, but I thought I'd throw my hat in the ring too.
Your premise is flawed, by way of a fallacy of equivocation.
We can talk about the balance of a method, which will need to take into account the spread of possible results. A single character cannot, even in principle, have a "spread of possible results". They have the results they have. Hence, the sense of "balance" which applies to a generation method cannot be 1:1 identical to the sense of "balance" which applies to a generated result.
We can talk about the balance of a stat array, which will need to take into account its relative strength in context. A method--which does not have a singular result, other than Standard Array--cannot, even in principle, have a "relative strength in context" because it is of its very nature decontextualized. That's what makes it a method and not a result. Hence, the sense of "balance" which applies to a specific result cannot be 1:1 identical to the sense of "balance" which applies to a generation method.
Two characters which coincidentally have the same stats, despite generating them via different methods, will have identical balance-in-context, even though the methods are not identically balanced. Similarly, two different methods may or may not be balanced against one another, but I would presume that method A at least possibly generating results that method B would generate is a necessary prerequisite for the two methods to be balanced with one another.
Or, for a perhaps silly but hopefully illustrative example, consider the "All 18s method" vs "3d6 strict method". Technically speaking, both methods can produce the same result. It's nearly impossible for the latter to do this, i.e. odds worse than 1 in 101 trillion, but it is technically possible. Would you say that the two characters, the results, are balanced, given the results are identical? Would you say "all 18s" is balanced, as a method, when compared to "3d6 strict"? As I said, this is intentionally extreme to prove a point: two methods can be wildly out of balance with one another, despite the fact that each method might (regardless of probability) potentially produce the same result.
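For anyone who wants to check the "1 in 101 trillion" figure, here is a quick sketch of the arithmetic. It assumes "3d6 strict" means six ability scores, each an independent roll of three fair six-sided dice; the variable names are just mine for illustration.

```python
from fractions import Fraction

# Chance that one 3d6 roll comes up 18 (all three dice show a 6).
p_single_18 = Fraction(1, 6) ** 3   # 1/216

# Chance that all six ability scores come up 18, assuming independent rolls.
p_all_18s = p_single_18 ** 6        # 1/216**6, i.e. 1/6**18

# Express the result as "1 in N".
one_in = p_all_18s.denominator
print(f"1 in {one_in:,}")
```

The denominator is 6^18, a bit over 101.5 trillion, so "worse than 1 in 101 trillion" holds under those assumptions. "All 18s" produces the same array every time, which is exactly the gap between the two methods.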