You seem to be assuming that each playtester will play one build of one class over 20 levels in the time remaining. I suspect that the "focused, directed play" that Mearls mentioned will be "fairly boring" in comparison. Something closer to each tester playing the same two levels of the same class over and over again with different builds, weapons, and items. Or running the same encounter four times with different amounts of resources already expended, or adding one monster each time, etc.
I imagine there'll be some of that, and it is very important. Running a party through the same fight twice, seeing how it works, rebalancing and revising the rules, and then running them through it again is valuable and necessary.
But they're likely doing that internally, and not getting the Alpha/Friends 'n' family testers doing it as well. And there's no real reason they couldn't do that and continue the public test.
Plus, a rigorous, regimented test doesn't give you feedback on creative spell uses, combos, unusual monster combinations, unusual tactics and the like. There are so many things that work fine under ideal circumstances but might break or play weirdly in nonstandard play.
For example, the 4e paladin's mark worked and played fine by the book, until you stopped acting in the expected way and could run around kiting a monster or getting free damage.
There are other worrying things, such as the absence of so many races and classes. So far they've worked to make sure the classes feel right, but they've overlooked the bard, which might be one of the hardest classes to get right. They have to get the bard right the first time, because they have to use that feedback to design the final draft of the bard. If it's terrible, there's no new packet coming to reveal a revised version of the bard that meets with people's approval.