Ovinomancer
Okay, but this seems a distinction that isn't teasing out a real difference -- it's still a menu, it's just whether or not the players feel bound to pick from the printed menu or if they make choices that are checked against the hidden menu. You still aren't going to be doing anything the GM hasn't prepped in these games, yes?
The difference is table expectations: if, for a given strategic decision, one is expected to pick from those options laid out by the GM (obligating the PCs to follow the GM's lead on campaign direction), then that decision would be non-sandboxy and lower the campaign's sandbox percentage. If instead the expectation is that it's ok for the PCs to make any strategic choice they want (obligating the DM to follow the PCs' lead on campaign direction), then that decision would be sandboxy and increase the campaign's sandbox percentage. (Campaigns run in a style where "pick from a laid-out list" versus "make an open-ended decision" is not a valid dichotomy for how IC strategic decisions are made simply wouldn't fall anywhere on the spectrum I've specified.)
I'm not sure this is useful unless there's an analysis of what off-adventure choices are actually available to the players. You seem to be comparing two things, either follow the GM's breadcrumbs OR do whatever you want and the GM will oblige. I'm not sure this is realistic, or that a good sandbox has a high percentage of the latter as a matter of course. If I pull out a setting book with lots of details, for instance, and let players pick which things they want to go engage with, then I'm not really letting them do anything, I'm providing exposition based on the menu from the setting book. But, if I'm running a game like Dungeon World, where the game is entirely reactive to what the players are doing, I don't think this actually qualifies as even more sandboxy. But it does under your criteria.
That's why I'm focusing on campaigns rather than modules/APs/systems. Any given table running a module can decide how often to expect the PCs to follow the module, vs how often to expect the GM to adapt/expand the module's setting to accommodate whatever the PCs decide to do.
For example, a table could use a module and decide that when making top-level strategic decisions on what to do, the players are expected to choose to engage with the module's content. But that same table could simultaneously expect the GM to adapt to unorthodox ways to tackle the content in the book, including (e.g.) travelling off the module map to go on a diplomatic tour to raise a multinational army. That campaign would have a much higher sandbox percentage than a campaign where the table instead expects the players to not only choose to engage with the module's content, but also to stick to one of the expected paths through that content. Conversely, it would have a lower sandbox percentage than a campaign where the GM is expected to follow the players even if the players decide to ignore the module's content entirely.
So, I think there's something lacking in the evaluation, here, and that this missing part is actually 1) quite important and 2) kinda disables the evaluation you're trying to make.
I'm cool with discussions on, and disagreements about, the spectrum's merits as an analytical/discussion tool. I'm just trying to show, against argument to the contrary, that the sandbox spectrum exists and has utility.

I think many games will be in the middle, though, so a tool that only effectively categorizes the extremes is -- well, not a very good spectrum.
And sure, the middle of the spectrum is messy. It's definitely too messy (i.e. imprecise) to make it a useful large-scale cataloging tool, but I think it's still useful for comparing a small number of campaigns to each other (if all such campaigns are of a type that fits on the spectrum, of course). If there happens to be disagreement about which of two campaigns has a higher sandbox percentage, then the further analysis provoked by trying to place the campaigns on the spectrum will itself likely be illuminating, revealing disagreement about what the table expectations are for a given campaign, conceptual differences about what counts as an open-ended decision, or differences over how/whether to weight certain types of decisions over others. In other words, I think the spectrum is useful despite its messiness/imprecision.