Even if their initial prompt were a perfect description of the composition and contents, the generator would almost certainly fail to deliver it in one try. But they could eventually get there by masking out various regions (either to alter or to preserve), carefully calibrating how far the changed portions are allowed to drift from what's being fed in, and iterating; and/or by bringing the image into GIMP, doing a paintover in places, and feeding it back in with refinement prompts (masking or not masking sections to re-render). The iterative process they described was doable before these cloud generators existed; I tested it out on my aging GPU back then. That's why I assumed Maxperson was running an image generator on their own hardware. It would be impractical to do all that through cloud generation services.
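For anyone curious what that local iterative loop actually looks like, here is a minimal sketch using Hugging Face's diffusers library with a Stable Diffusion inpainting checkpoint. The model name, file paths, prompts, and strength values are placeholders I made up for illustration, not a reconstruction of anyone's actual settings; the point is only to show masked re-rendering and the "how much can it change" knob (strength) being tuned across passes.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Assumed checkpoint; any local inpainting-capable model would do.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("draft.png").convert("RGB")  # current render, or a GIMP paintover of it
mask = Image.open("mask.png").convert("RGB")    # white = regions to re-render, black = keep as-is

# Each pass re-renders only the masked regions; `strength` controls how far
# the changed portion may drift from what's fed in, which is the setting
# being calibrated between passes.
passes = [
    ("a cluttered wizard's study, warm lantern light", 0.75),
    ("same scene, sharper detail on the desk and books", 0.40),
]

for step, (prompt, strength) in enumerate(passes):
    result = pipe(
        prompt=prompt,
        image=image,
        mask_image=mask,
        strength=strength,
        num_inference_steps=40,
        guidance_scale=7.5,
    ).images[0]
    result.save(f"pass_{step}.png")
    image = result  # feed the output back in for the next refinement pass
```

In practice you would swap in a new mask (and possibly a hand-edited image) between passes, which is exactly the back-and-forth with GIMP described above; doing that round trip through a cloud service's web UI is what makes it so impractical there.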