It would be a tool that I and a group of people worked on from 2008 to 2010, before the project ended and was left in a largely vaporware state.
That VTT could use a 2d map, scanned or otherwise imported. It would attempt to recognize objects and then prompt the user to flag changes in case something identified as a wall, floor, door, window, grass, cobbles, roof, tree, etc. needs correction.
The tool would then convert that 2d map to a 3d map, with wall heights, etc. The user can then drag over seamless textures to make corrections to the resulting geometry. A breathtakingly LARGE asset base of textures is required though, else everything starts looking the same. I need variation, and I need that randomized and implemented via AI so I don't have to spend a lot of time on it. That's the key to 3D. Once you need specialized modeling skills or have to spend a lot of time technically tweaking things? It changes from a Tool of the Many to a Tool of the Few.
That's what has to be avoided with 3d. Fail that test? It's not a useful tool for gaming other than for a specialized niche-of-a-niche of customers.
A tool like this can take any 2d map, including one in blue and white from an old module, and make a 3d playable map of it in a modest amount of time. Lighting can be dragged over as well, as you can do now in Foundry VTT.
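To make the 2d-to-3d step a bit more concrete, here is a minimal sketch, assuming the recognizer has already turned the map into a grid of labels. The names `HEIGHTS`, `Box`, `apply_corrections` and `extrude`, and the height numbers themselves, are made up for illustration; nothing here is taken from the old project or from Foundry VTT.

```python
from dataclasses import dataclass

# Hypothetical heights in feet for each recognized label (assumed values).
HEIGHTS = {
    "floor": 0.0, "grass": 0.0, "cobbles": 0.0,
    "door": 7.0, "window": 10.0, "wall": 10.0,
}


@dataclass(frozen=True)
class Box:
    """One extruded grid cell: its position, its height and its label."""
    x: int
    y: int
    height: float
    label: str


def apply_corrections(labels, corrections):
    """Return a copy of the label grid with the user's fixes applied.

    `corrections` maps (x, y) to the label the user says is correct.
    """
    fixed = [row[:] for row in labels]
    for (x, y), label in corrections.items():
        fixed[y][x] = label
    return fixed


def extrude(labels, heights=HEIGHTS):
    """Turn every cell whose label has a height above zero into a Box."""
    boxes = []
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            height = heights.get(label, 0.0)  # unknown labels stay flat
            if height > 0:
                boxes.append(Box(x, y, height, label))
    return boxes
```

In use, the recognizer's guess would go through `apply_corrections` first (say, `{(1, 2): "door"}` to turn a misread wall cell into a door) and the result would go to `extrude`. A real tool would merge neighbouring cells into wall segments and pick textures per segment, which this sketch skips.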
I wouldn't need much more than that grafted onto the existing elements in 3d Foundry VTT to make me insanely happy. Improved animations, spells, and sound effects. And a VAST library of 3d minis to use and populate my battlefields, too? Sure. That would be great.
Some other nice tweaks? Be able to mark an area on a map with a bounding box, flag it as "public", and adjust a slider for population density. The software will populate that area with a VAST assortment of commoners just going about their business, depending on the time of day. Allow me to define the sort of area the "public" area is in terms of who is most likely to be there (sailors and fishermen, monks, woodsmen and farmers, etc.). This would bring city streets and other buildings alive without requiring my attention to do it.
I used to want an awesome voice modulator, thinking that this combined with a virtual face to send as an image of the NPC over Zoom/Skype doing the talking would be great, too. And it would be great if it worked -- but my satisfaction level with current modulation and virtualization of faces has been very poor to date. Still, a tool that does this, is amazingly easy to use, and uses excellent AI models for visual depiction and infinite variation in faces and voices would be awesome -- if it worked.
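The "public" area idea can be sketched roughly like this, assuming the slider is a density in people per map square and each kind of area has a weighted mix of who shows up. The tables `CROWD_MIX` and `TIME_FACTOR`, their numbers, and the function `populate` are all hypothetical, just to show the shape of it.

```python
import random

# Hypothetical mix of who is likely to be in each kind of area (weights).
CROWD_MIX = {
    "docks": {"sailor": 5, "fisherman": 4, "commoner": 1},
    "abbey": {"monk": 8, "commoner": 2},
    "village": {"farmer": 5, "woodsman": 3, "commoner": 2},
}

# Hypothetical share of the full crowd present at each time of day.
TIME_FACTOR = {"night": 0.1, "morning": 0.6, "midday": 1.0, "evening": 0.7}


def populate(box, density, area_type, time_of_day, seed=None):
    """Scatter NPCs inside a bounding box.

    box         -- (x0, y0, x1, y1) in map squares
    density     -- slider value, people per square at the busiest time
    area_type   -- key into CROWD_MIX
    time_of_day -- key into TIME_FACTOR
    Returns a list of (kind, x, y) tuples.
    """
    x0, y0, x1, y1 = box
    rng = random.Random(seed)  # a seed makes the crowd repeatable
    area = (x1 - x0) * (y1 - y0)
    count = round(area * density * TIME_FACTOR[time_of_day])
    mix = CROWD_MIX[area_type]
    kinds = rng.choices(list(mix), weights=list(mix.values()), k=count)
    return [(kind, rng.uniform(x0, x1), rng.uniform(y0, y1)) for kind in kinds]
```

Calling `populate((0, 0, 10, 10), 0.2, "docks", "midday")` would put 20 people on a 10 by 10 dock, mostly sailors and fishermen, and the same box at night would thin out to 2. Having them actually walk around and go about their business is a separate problem this does not touch.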
Add all of that to a Microsoft Game Table style setup? Even better.