DALL·E 3 does amazing D&D art

So I discovered that I can run Flux locally.

Flux gives better results with long, detailed prompts, unlike SD or SDXL. In what it wants, it's closer to DALL·E's developed prompts. Do not hesitate to mention all the details you want, or use another AI to elaborate a 200-word description around your initial prompt (in my experience, that's Flux's sweet spot: 150-200 words. Above that, it starts to miss a lot of things).
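If you want to check a draft against that rule of thumb before generating, here is a minimal Python sketch. It assumes words are simply whitespace-separated, and it treats 150-200 words as the anecdotal sweet spot described above, not as a documented Flux limit. The helper names are made up for illustration.

```python
# Hypothetical helpers: check a prompt against the 150-200 word range
# reported in this thread (personal experience, not an official Flux limit).
SWEET_SPOT = (150, 200)


def word_count(prompt: str) -> int:
    """Count words, assuming they are separated by whitespace."""
    return len(prompt.split())


def check_prompt_length(prompt: str, low: int = SWEET_SPOT[0], high: int = SWEET_SPOT[1]) -> str:
    """Return 'too short', 'ok', or 'too long' for the given prompt."""
    n = word_count(prompt)
    if n < low:
        return "too short"
    if n > high:
        return "too long"
    return "ok"


if __name__ == "__main__":
    draft = "A weathered dwarf blacksmith hammers a glowing axe head in a cramped forge."
    print(word_count(draft), check_prompt_length(draft))  # prints: 13 too short
```

A "too short" result is the cue to expand the description by hand or with another AI, as suggested above.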
 


Kannik

AI is bad at centaurs. DALL·E isn't the only one to struggle; even the current SOTA model (Flux) can't do them. They need specialized training; we probably don't have enough centaurs taking selfies to post on their social media.
Totally, 'taurs of all kinds need to start posting more selfies! ;) I've recently started playing around a bit with SD locally and have created one half-decent 'taur so far (below; it's an experiment and still needs work). I'm really intrigued to give it more attention, both with some 'taur LoRAs and by prompting with an img2img conversion (sketching out a rough 'taur shape and seeing if it can fill in the gaps) and/or by generating the upper and lower bodies separately and stitching them together through img2img.

[Image attachment: Simbataur - In Progress.png]
 




Flux gives better results with long, detailed prompts, unlike SD or SDXL. In what it wants, it's closer to DALL·E's developed prompts. Do not hesitate to mention all the details you want, or use another AI to elaborate a 200-word description around your initial prompt (in my experience, that's Flux's sweet spot: 150-200 words. Above that, it starts to miss a lot of things).
Can you give me an example? I'm still dumb on this.
 



Can you give me an example? I'm still dumb on this.

Well, of course :)

In an earlier post, you used this prompt: "(((full body visible))) 20 yr old woman with pink dyed hair dressed as an assassin in a burning factory, intricate, highly detailed,8k ultra-realistic, colorful, painting burst, beautiful symmetrical face, a nonchalant kind look, realistic round eyes, tone mapped, intricate, elegant, highly detailed, digital painting, art station, concept art, smooth, sharp focus, illustration, dreamy magical atmosphere,4k, looking at the viewer"

This style of prompt generates tokens that the model uses, but the model doesn't know the relationships between them. That's why earlier models performed well only with a single subject (all the keywords pointed to that subject or to the image in general) and suffered concept bleed otherwise. If you needed two women, adding "wearing a yellow totebag" would lead the AI to randomly decide which one should wear it. You could improve the odds by removing commas and putting related keywords close together, but the limitations of the "text encoder" part of the model showed quickly. The original text encoder (CLIP) was about 500 MB in size and could only do a limited job of turning text into something the image-generating model could use.
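To make the concept-bleed point concrete, here is a deliberately simplified Python toy. It does not model how CLIP actually encodes text: it treats a comma-separated tag prompt as an unordered set of keywords, which is an assumption made purely for illustration. It shows that two different scenes can flatten into the same bag of keywords, so nothing records which woman carries the totebag.

```python
# Toy illustration only. Real encoders such as CLIP do see word order,
# so this exaggerates the problem; it just shows what a tag list loses.

def to_tag_prompt(scene: dict[str, list[str]]) -> str:
    """Flatten per-subject attributes into one old-style, comma-separated prompt."""
    tags = [f"{len(scene)} women"]
    for attributes in scene.values():
        tags.extend(attributes)
    return ", ".join(tags)


def tag_bag(prompt: str) -> frozenset[str]:
    """Split a comma-separated prompt into an unordered set of trimmed keywords."""
    return frozenset(tag.strip().lower() for tag in prompt.split(",") if tag.strip())


# Two different scenes: who carries the totebag changes.
scene_a = {"woman 1": ["red hair", "wearing a yellow totebag"], "woman 2": ["blue hair"]}
scene_b = {"woman 1": ["red hair"], "woman 2": ["blue hair", "wearing a yellow totebag"]}

# The flattened prompts differ only in order; as keyword sets they are identical.
print(tag_bag(to_tag_prompt(scene_a)) == tag_bag(to_tag_prompt(scene_b)))  # True
```

Once the attributes are flattened into a list, the "owner" of each attribute is gone, which is the ambiguity the model then resolves at random.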

Newer models use a much larger text encoder (T5-XXL in the case of Flux), which is around 45 GB in size unpruned. It can understand much more natural text and the relationships between words, so it can produce encodings that represent more complex things. However, if you prompt this newer encoder in the old style, it still doesn't know which woman is holding the yellow totebag, because that information wasn't in the prompt. Also, to make full use of the large encoder, the image-generation part of the model was trained on images with much longer descriptions, so it learns concepts more easily and responds better when prompted in natural language. I'll post a few images and prompts to illustrate later today.
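One way to follow that advice is to give each subject its own sentence, so the prompt itself says who has what. Below is a minimal Python sketch, assuming you write the subject descriptions by hand; `to_natural_prompt` and the scene text are made up for illustration (the pink-haired assassin and burning factory are borrowed from the prompt quoted earlier). The result is just a string to paste into whatever Flux front end you use.

```python
# Hypothetical prompt builder: one sentence per subject, so the text
# states explicitly which attributes belong to whom.

def to_natural_prompt(setting: str, subjects: dict[str, str]) -> str:
    """Join a setting sentence and one sentence per subject into a single prompt."""
    sentences = [setting.rstrip(".") + "."]
    for subject, description in subjects.items():
        sentences.append(f"{subject} {description.rstrip('.')}.")
    return " ".join(sentences)


prompt = to_natural_prompt(
    "Two women stand inside a burning factory at night",
    {
        "The woman on the left": "has pink dyed hair and is dressed as an assassin",
        "The woman on the right": "has short black hair and carries a yellow totebag on her shoulder",
    },
)
print(prompt)
```

In practice you would keep adding sentences (lighting, pose, style) until the description reaches the length Flux seems to like.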
 

