DALL·E 3 does amazing D&D art

Wenta of Greyhawk prototypes:

[3 attached images]


And the final result:

Priests with vestments decorated by wheat symbols offer a mass with beer and large loaves of bread on a wooden altar with a giant painted wooden statue of Wenta in a fantasy tavern filled with diverse worshippers cheering and chugging beer. Wenta is a young, rosy-cheeked, robust woman with straw in her hair, holding a large mug of beer. high quality digital painting.

[4 attached images]
 


Greed...is God.

Puritan minister in a pulpit in front of a giant statue of Zilchus in an incense filled gilded mosque chastises diverse medieval merchants in German garb. Zilchus is a well-dressed Medieval merchant of plain appearance, middle-aged, with brown hair and eyes, tanned skin, and a dignified demeanor, and smiling slyly, clothing is expensive without being ostentatious, carries a gold purse and a flail with which to chastise the dishonest. high quality digital painting

[4 attached images]
 


@Reynard: Would you be willing to share the different art styles for these?

I don't get why "floating eyeballs dripping with ichor" is acceptable but "bare feet" is not.

While D3 is good at avoiding concept bleed, we don't really know what's under the hood. Depending on the computing power available to the language-analysis part of the image generator, it can be good or bad at identifying which words go together. I suspect Microsoft is using a lot of expensive hardware for this, so it's good, but it might not be perfect and can still make mistakes. That's when you accidentally get NSFW images.

For example, there is nothing wrong with generating a man with a blue shirt, a green tie and black pants when the prompt asked for a man with a green shirt, a black tie and blue pants. It's not what was wanted, but that's just the engine associating the words wrongly. There is also nothing wrong with the engine drawing someone "breathing fire" with his head in a campfire when you wanted dragonlike fire breath. It's just the engine failing to correctly understand the word salad we call natural language.

HOWEVER, imagine what can happen if you use innocuous words and they get applied in another context. "A girl with bare feet with her hands crossed over her chest" can become embarrassing if "bare" gets applied to another word.

Concept bleed can also happen when the prompt is just not precise enough. The AI doesn't know the objects themselves. It is just statistically generating something that is often close to what was prompted and "hoping" to get the desired outcome. So, if your prompt gives no information about the girl's clothes and you just mention "bare feet", in your mind that means she is clothed except for her feet. But the AI draws bare feet and then statistically decides that the feet are connected to legs, about which it has no information. Statistically, the AI will decide that the legs should have the same skin tone as the feet (because when it doesn't, the meatbag isn't happy), but... sure, next to bare feet there is a good chance that the legs are covered by trousers. So most of the time it will draw trousers, even if you didn't mention them. But there is a statistical chance that the bare feet are connected to bare legs. Then it keeps drawing upward and reaches the crotch... and, well, you can see what the problem is.

Even if it happens only once in a thousand images, Microsoft can't let Bing become the latest teenage tool for drawing nude girls and get banned by schools and parental-control filters.


So I think they took measures to prevent anything bad from slipping through in case of a goof, and those measures can also block perfectly innocuous images.
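To put the "once in a thousand" figure in perspective, here is a small Python sketch of the arithmetic. It assumes, purely for illustration, that each generation independently has the same probability p of going wrong (the 1/1000 rate is the hypothetical from the post, not a measured number). The chance of at least one bad image among n generations is then 1 - (1 - p)^n.

```python
# Illustration only: assumes every generation independently has the same
# probability p of producing an unwanted image (p = 1/1000 is hypothetical).

def chance_of_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent generations goes wrong."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "all n generations are fine".
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 1 / 1000  # the "one time in a thousand" rate from the post
    for n in (1, 100, 1000, 10_000):
        print(f"{n:>6} images: {chance_of_at_least_one(p, n):.1%}")
```

With p = 1/1000, a single user making 1,000 images has roughly a 63% chance of hitting at least one bad one, and a service generating millions of images a day would hit such cases constantly.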
 


In my long-running Dungeon of the Mad Mage campaign, the PCs have a rival named Rex the Hammer. He is a bully and views his party members as expendable, so every time the PCs meet him, he's got different allies. Halaster uses Rex and co to finish up what the PCs start but can't seem to finish.

Because the PCs are now 16th level, I've given Rex some beefier allies. He's still a warlord, but I'm going to give him a Red Wizard (using the free Sofina stats), a shadar-kai gloom weaver, and a MotM abjuration wizard.

Here's my first attempt at generating some female Red Wizard of Thay art for the not-Sofina character:

Gritty comic book
[4 attached images]

Oil on canvas
[4 attached images]

Some tough choices, but I think I might go with the second or third oil on canvas image.
 

muscular brahmins offer puja to a giant statue of Kord in a classical gymnasium, while muscular Greco-Roman wrestlers clap and sing uproariously stomping their feet in a rhythm. Kord is a hugely muscular god standing nine feet tall, with a red beard and long red hair, wears a fighting girdle made from a red dragon's hide, gauntlets from a white dragon's hide, and boots from a blue dragon's hide. high quality digital painting

[4 attached images]
 

Thanks!


Is it really?! I did not know that.
Yes, that's the answer I was dancing around. It's a surprisingly popular fetish topic, as referenced in the hilarious "Welcome to the Internet" song by Bo Burnham. I strongly suggest you do not go down the rabbit hole on this one, but it is something of an open secret that "<famous person> feet pics" is popular enough as a search term for Google to recommend it to people some of the time. The incidence appears to be about 3-4 times greater among men and LGBTQ folks than among heterosexual women, so... yeah, the <famous person> tends to be a woman.
 

fantasy clerics offer puja to altar with a 20 foot tall marble statue of Murlynd with various fantasy knights and wizards and nobles bowing in worship. Murlynd is wearing a cowboy hat and denim jacket and jeans holding two six-shooters and wearing a sheriff's badge. sunlight cathedral with geometric stained glass. high quality digital painting

[5 attached images]
 

Also, someone mentioned that they don't have a lot of control over the output. Honestly, they could. It would require taking each generated image and having an AI trained on NSFW images check whether it is OK for general viewing. It is something that can be done. From what I understand, it is also a way of training the models: an adversarial AI learns to tell the good outcomes from the bad. But training requires a lot more computing power than just inferring images, so I doubt they can afford to run such a system with both great effectiveness and very short generation times.
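The "check the finished image with a second AI" idea can be sketched roughly like this in Python. Everything here is hypothetical: `moderated_generate`, the stub generator and classifier, and the 0.5 threshold are illustrative assumptions, not how Bing or DALL·E 3 actually work. It only shows the shape of the idea: generate first, score the output with a separate classifier, and withhold the image if the score is too high. Note that every image costs one extra classifier pass, which is the kind of overhead the post is worried about.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces: a real system would pass image tensors or bytes;
# here an "image" is whatever the generator callable returns.
Generator = Callable[[str], object]
Classifier = Callable[[object], float]  # estimated NSFW probability in [0, 1]


@dataclass
class ModerationResult:
    image: Optional[object]  # None when the image was withheld
    score: float             # classifier's NSFW score for the generated image
    blocked: bool


def moderated_generate(prompt: str, generate: Generator, nsfw_score: Classifier,
                       threshold: float = 0.5) -> ModerationResult:
    """Generate an image, then run a separate classifier on the output before returning it."""
    image = generate(prompt)
    score = nsfw_score(image)  # the extra inference pass this approach costs
    if score >= threshold:
        # Fail closed: a false positive only costs one innocuous image,
        # which is how perfectly harmless prompts end up blocked.
        return ModerationResult(image=None, score=score, blocked=True)
    return ModerationResult(image=image, score=score, blocked=False)


if __name__ == "__main__":
    # Stand-ins for real models, for demonstration only.
    def fake_generate(prompt: str) -> str:
        return f"<image for {prompt!r}>"

    def jumpy_classifier(image: object) -> float:
        return 0.7  # pretend the classifier is unsure about this one

    result = moderated_generate("a girl with bare feet", fake_generate, jumpy_classifier)
    print(result.blocked, result.score)
```

Lowering the threshold blocks more bad images but also more innocuous ones; the gate can only trade one kind of mistake against the other.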

For Stable Diffusion installed at home, you can do things with the SDXL model with 8 GB of VRAM and a card from the RTX 30x0 series or newer. But it will push your machine to its limit. The older SD models, especially the fine-tuned ones, can give good results with much lower requirements. Avoid AMD graphics cards for now (support just isn't optimized, and it takes an agonizingly long time to generate an image).

At the moment, I think the best use is to get a good image from D3 as a starting point with a basic prompt, then improve the details in SD (like faces, when they all come out as clones in D3) and remove odd things, like the time one of "the students taking notes" in a magical university was taking notes on a MacBook. This "touch-up" work is easier on hardware. Also, the strength of SD is the large number of fine-tuned models and model "extensions". While a lot of them are NSFW by design, and others are just... odd (there is an extension that turns characters into croissant texture, what the...), you can get very good specialized results, like differentiating between kinds of elves by loading an extension that teaches the model that HouseElf is a word associated with the Potter imagery, while DDElf is another valid word pointing toward our more regular elves.

There are also many tools for composing the scene, but I think starting from D3 is probably easier.
 

