In my experience at the table, the answer and the most important litmus test is success percentage.
It really doesn't matter whether the PCs are OP or underpowered, someone has a powerful magic item, etc., so long as you understand that normally you want to give the players a 45%-70% chance of succeeding at anything in the game. When the circumstances warrant a probability outside that range, skew the probabilities accordingly.
The reason for this system of thought is that I found through playtesting that my players and I had the most fun within this range. Understanding the math allows you to tweak things a little in the background when you think it will improve the players' experience. DCs, armor class, NPC stats, ability checks, whatever you can think of may benefit from this strategy, so I've explored that a bit.
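For anyone who wants to check those percentages, here is a minimal sketch. It assumes a plain d20 roll where d20 + modifier must meet or beat the DC, with no advantage/disadvantage and no special natural 1 or natural 20 rules. The function names are made up for illustration and don't come from any rulebook or library.

```python
def succeeding_faces(modifier: int, dc: int) -> int:
    """Count the d20 faces (0 to 20) on which d20 + modifier >= dc.

    Assumption: a plain roll, no auto-success or auto-failure on 20 or 1.
    """
    return max(0, min(20, 21 + modifier - dc))


def success_chance(modifier: int, dc: int) -> float:
    """Probability of meeting or beating the DC on a single d20 roll."""
    return succeeding_faces(modifier, dc) / 20


def dcs_in_band(modifier: int, low_pct: int = 45, high_pct: int = 70) -> list[int]:
    """List the DCs that give this modifier a success chance inside the band.

    Each d20 face is worth 5 percentage points, so the comparison stays
    in whole numbers and avoids floating-point edge cases.
    """
    return [
        dc
        for dc in range(1, 41)
        if low_pct <= succeeding_faces(modifier, dc) * 5 <= high_pct
    ]


if __name__ == "__main__":
    print(success_chance(5, 15))  # 0.55
    print(dcs_in_band(5))         # [12, 13, 14, 15, 16, 17]
```

So a character with a +5 bonus sits inside the 45%-70% band only for DCs 12 through 17. Anything outside that window is a deliberate skew, in the sense the quoted post describes.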
Success percentage is only useful as a metric if:
1. All relevant actions are equally attempt-able by all participants;
2. All relevant actions require a roll of this kind in order to succeed;
3. No actions which do not meet the previous two requirements are relevant to play.
Each of these assumptions is false for D&D. There are a great many actions which can only be attempted by people who can cast spells; there are very few (I would argue zero, but I'm sure others will balk at this claim) which can only be attempted by people who cannot. Even among people who can use magic, most classes cannot equally access all spells, so an imbalance is still possible among magic-users themselves. In this and all following paragraphs, please note that a "character who takes a spellcasting subclass," a "character who has received items created by a spellcaster," and a "character who has collected enough bling that they can emulate a spellcaster to a meaningful degree" are all, as far as I'm concerned, weak spellcasters rather than non-magic-using characters who have "expanded" or whatever.
Spells are just the most obvious example of actions that often (though far from always) do not require a roll for success. Fly simply works; it does not require a roll to see if the target gets to fly. Tongues simply works; there is no need to roll to determine whether the spell actually permits speech. Etc. Notably, there are essentially zero things non-spellcaster classes can do which achieve automatic successes of this nature, except by... finding ways to cast spells themselves. They are stuck in the "did you succeed or fail" paradigm; magic-users are capable of transcending that paradigm, and do so quite frequently. (Hell, they can even transcend it in the area of combat; that's literally what magic missile does!)
And even if we aren't considering spells, there's a whole host of touchy-feely actions that cannot be accounted for under the "does it have at least a 60% chance to succeed?" metric. For example, do you permit your martials to perform crazy, not-physically-possible-IRL stunts with the aforementioned percentage chance of success? Do you let your magic-users invent new spells, or creatively re-use spells that exist? Whether or not it is even possible to do these things is, in fact, part of the game experience and can absolutely lead to terrible, terrible imbalance if poorly handled. It's one of several reasons why people fight so bitterly for actual, in-mechanics enforcement of making martial characters awesome: DMs doubting the possibility of things that are actually physically doable IRL is a huge, HUGE problem with D&D as it stands. Many groups must endure DM adjudications that don't even let Fighters do things IRL Olympic athletes can do, let alone things actual fantasy-physics warriors could theoretically achieve. It's just not "realistic" to so, so many DMs, who are thus actively perpetuating the problem.
-----
As for my own definition of "balance"? It is when a game has:
1. Established clear, testable design goals,
2. Determined the range of reasonable/acceptable results around those goals,
3. Iteratively tested these design goals through rigorous playtesting and serious statistical data-gathering, and
4. Altered the design and numbers of the system until the results fall within the reasonable/acceptable range.
Actual, serious design, which sets a goal and genuinely pursues it, is a pretty major undertaking. It requires a concerted effort and, ideally, both a solid background in statistics and someone on hand who is actually trained in constructing and analyzing surveys for truly useful feedback. Few games have ever gone through actual, serious design. 4e is one of them, and that's why it's still my favorite system to this day. Some of the chosen design goals were not wise for what players wanted from the system, e.g. monsters were designed to be too "safe" and thus fights tended to drag early in the edition. However, by taking that feedback, the designers were able to go back, adjust the numbers, and produce new results that did in fact meet the new design goals (in this case, combats tending to last 3-4 rounds, rarely 5, as opposed to 4-6 rounds, rarely 7+). And just in case someone takes me as a 4e partisan here: OD&D was also a pretty damn well-balanced game. It just had zero interest in transparency (whereas 4e was very transparent); indeed, it almost sought to be as opaque as possible, hence the advice in some early version of the DMG (I want to say it was the 1e version, but I could be mistaken) which basically said "if your players have ever read the DMG, punish them, they have forbidden knowledge and should not be allowed to exploit it!" But OD&D is very, very well-designed for its intended goals.
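To make the "set a goal, test, adjust" loop concrete, here is a toy sketch. To be clear, this is not how any edition's designers actually worked. The only number taken from the discussion above is the 3-4 round target; the hit chance, damage, attacks per round, and starting hit points are all invented for illustration, and real playtesting involves actual players and surveys rather than a random number generator.

```python
import random


def simulate_fight_rounds(monster_hp, rng, hit_chance=0.6,
                          damage_per_hit=10, attacks_per_round=4):
    """Rounds a toy party needs to drop one monster (all numbers hypothetical)."""
    rounds = 0
    while monster_hp > 0:
        rounds += 1
        for _ in range(attacks_per_round):
            if rng.random() < hit_chance:
                monster_hp -= damage_per_hit
    return rounds


def mean_rounds(monster_hp, trials=2000, seed=0):
    """Step 3: gather data. Average fight length over many simulated fights."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    total = sum(simulate_fight_rounds(monster_hp, rng) for _ in range(trials))
    return total / trials


def tune_monster_hp(start_hp, target=(3.0, 4.0), step=5, max_iters=100):
    """Step 4: adjust the numbers until the measured result lands in the band."""
    low, high = target  # steps 1 and 2: the goal and its acceptable range
    hp = start_hp
    for _ in range(max_iters):
        avg = mean_rounds(hp)
        if avg < low:
            hp += step   # fights end too quickly: make the monster tougher
        elif avg > high:
            hp -= step   # fights drag: make the monster less "safe"
        else:
            return hp, avg
    raise RuntimeError("did not reach the target band; revisit the goal or the step")


if __name__ == "__main__":
    hp, avg = tune_monster_hp(150)
    print(f"monster HP {hp} gives about {avg:.2f} rounds per fight")
```

The point is only the shape of the process: the target band is written down before any numbers get changed, and the numbers keep changing until the measurements land inside it.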
5e was not designed in this way, and it is really, really easy to see that it wasn't. Even if we hadn't had things like the "ghoul surprise," there have been Significant Issues in the math and design of 5e since before it was ever officially called "5e." What is particularly frustrating about this is that the designers wasted almost two years faffing around doing almost-random stuff when they very easily COULD have been doing actually serious design. Almost all of the so-called "surveys" they collected were absolutely atrocious push-polls designed to confirm what they wanted to hear, and absolute garbage for learning how people really feel.
Serious game balance does not come at the cost of making a game dry or dull, despite the many assertions to the contrary. Serious game balance just comes at the cost of not permitting some options to be dramatically superior to others.