D&D General AI Art for D&D: Experiments

Jfdlsjfd · Apr 26, 2025

Kichwas said:
So yesterday I wanted to make a pic for a token for an upcoming one-shot.

I fed the same prompt into ChatGPT, Microsoft Designer, and a local copy of Stable Diffusion using the 'DynaVision XL' SDXL model checkpoint (essentially the base training model).

Got three extremely different results:

ChatGPT:

View attachment 403467

Microsoft Designer:

View attachment 403468

Local Stable Diffusion:

View attachment 403469

"A vibrant 1970s fantasy style RPG illustration of a freckled dark brown-skinned dwarf woman, age 29, in a fullbody shot. She wears a platemail breastplate, and her black wavy hair is tied into a ponytail, adding to her fierce and majestic appearance. She holds a giant fantasy warhammer in both hands, standing confidently in a bustling underground dwarven cavern fortress. The cobblestone streets are lined with lively market stalls, filled with various goods from weapons to food. The underground town is filled with dwarves in medieval merchant attire, engaging in various activities, creating a lively and dynamic atmosphere. The lighting is moody, casting a deep tone over the scene. The dwarves are seen bartering and socializing, with the overall atmosphere being one of camaraderie and industriousness. The camera captures her full body."

While I am not sure it will give you the style you're after, SDXL-based models usually don't respond well to longish prompts.

This prompt does well with more recent models, but also fails to describe the style. Maybe more description of the actual style would help?

cbwjm · Apr 26, 2025

Jfdlsjfd said:
While I am not sure it will give you the style you're after, SDXL-based models usually don't respond well to longish prompts.

View attachment 403540

This prompt does well with more recent models, but also fails to describe the style. Maybe more description of the actual style would help?

This looks more like a human hanging out with a bunch of dwarves; like Snow White if she was more likely to smash your skull in instead of eating an apple.

Nightfly · Apr 26, 2025

Twiggly the Gnome said:
Did ChatGPT see the word "socializing" and turn the dwarves into Communists?

Socializing, hammer, industrious, goods, camaraderie.

I can actually sort of understand how the A.I. got "communism" out of that. Kind of almost maybe. Like if you can imagine the sensibility of a robot, but crossed with a student barely paying attention in Poli Sci, you get something that would read that prompt and say, "Oh! You mean communism!"

Jfdlsjfd · Apr 26, 2025

cbwjm said:
This looks more like a human hanging out with a bunch of dwarves; like Snow White if she was more likely to smash your skull in instead of eating an apple.

Yeah, I think there aren't enough images of female dwarves for the model to have a clear idea of what one is. Maybe one should actually describe a short woman with a beard instead of "female dwarf".

Scribe · Apr 26, 2025

Nightfly said:
Socializing, hammer, industrious, goods, camaraderie.

I can actually sort of understand how the A.I. got "communism" out of that. Kind of almost maybe. Like if you can imagine the sensibility of a robot, but crossed with a student barely paying attention in Poli Sci, you get something that would read that prompt and say, "Oh! You mean communism!"

I actually love that it made that leap.

Hammer? Clearly you mean Hammer and Sickle.
Socializing? Nay, SOCIALISM.
Industrious? Take control of the tools of Industry!
Camaraderie?! Comrade amirite?!?

Jfdlsjfd · Apr 26, 2025

Lanefan said:
What's with the stupid-big weapons in the second and third images there?

I think it's adhering to the prompt correctly. The prompt mentions "a giant fantasy warhammer" held in both hands. It kind of missed the "both hands" part, but it did produce a giant fantasy hammer, the hammer equivalent of the huge, oversized manga swords.

Kichwas · Apr 26, 2025

Jfdlsjfd said:
While I am not sure it will give you the style you're after, SDXL-based models usually don't respond well to longish prompts.

View attachment 403540

This prompt does well with more recent models, but also fails to describe the style. Maybe more description of the actual style would help?

What were your prompt, models, and settings for this one? Looks like you've landed on a much better set of models and LoRAs than I had.

Jfdlsjfd · Apr 26, 2025

Kichwas said:
What were your prompt, models, and settings for this one? Looks like you've landed on a much better set of models and LoRAs than I had.

I used your prompt, minus the elements that don't really represent graphical elements:

"A vibrant 1970s fantasy style RPG illustration of a freckled dark brown-skinned dwarf woman, age 29, in a fullbody shot. She wears a platemail breastplate, and her black wavy hair is tied into a ponytail, adding to her fierce and majestic appearance. She holds a giant fantasy warhammer in both hands, standing confidently in a bustling underground dwarven cavern fortress. The cobblestone streets are lined with lively market stalls, filled with various goods from weapons to food. The underground town is filled with dwarves in medieval merchant attire, engaging in various activities. The lighting is moody, casting a deep tone over the scene. The dwarves are seen bartering and socializing. The camera captures her full body."

The model was HiDream-Full, one of the best open-source models currently available.
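
For anyone who wants to repeat that kind of trimming, here is a rough sketch in Python. It is not the process actually used for the image above: the clause list is simply the two phrases that differ between the original prompt and the edited one, and `trim_prompt` is a made-up helper name.

```python
import re

# Assumption: these are the two "mood" clauses dropped from the original prompt.
# They describe atmosphere rather than anything a model can actually draw.
NON_VISUAL_CLAUSES = [
    ", creating a lively and dynamic atmosphere",
    ", with the overall atmosphere being one of camaraderie and industriousness",
]


def trim_prompt(prompt, clauses=NON_VISUAL_CLAUSES):
    """Remove each listed clause verbatim, then tidy up leftover whitespace."""
    for clause in clauses:
        prompt = prompt.replace(clause, "")
    return re.sub(r"\s{2,}", " ", prompt).strip()


# Only the tail of the original prompt, to keep the example short.
original_tail = (
    "The underground town is filled with dwarves in medieval merchant attire, "
    "engaging in various activities, creating a lively and dynamic atmosphere. "
    "The dwarves are seen bartering and socializing, with the overall atmosphere "
    "being one of camaraderie and industriousness."
)

trimmed_tail = trim_prompt(original_tail)
print(trimmed_tail)
```

Deciding which clauses count as "non-graphical" is still a judgment call; the code only does the mechanical part.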

Kannik · Apr 26, 2025

Scribe said:
Is that the same thing as Bing/Dall-E?

Adding to what @Kichwas said: from what I can tell, bing.com/create is the purely text-based Dall-E 3, so you can't feed it an image, ask for edits, or tell it to take the previous image and iterate on it. You do get 15 fast credits per day before it switches to slow mode, though.

Hmm, looks like there's a Flux model now for generating RPG stuff, wonder how well it works. Could be useful to generate a base to then i2i it in another model for the style desired?

Jfdlsjfd · Apr 26, 2025

Kannik said:
Hmm, looks like there's a Flux model now for generating RPG stuff, wonder how well it works. Could be useful to generate a base to then i2i it in another model for the style desired?

In my experience, to significantly alter the style of an image with an image-to-image workflow, you need enough denoise that only the basic composition of the image remains. I have had good success with i2i by first generating a basic outline with a model that is very good at understanding my prompt, then running i2i with a less precise but prettier model. But I don't think starting with Flux and doing i2i in a model with the right style would work that well: the fine details would be lost during denoising.
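
To put a rough number on "enough denoise": as far as I understand it, img2img pipelines in the Hugging Face diffusers library only run about `steps * strength` of the scheduled denoising steps on the noised input image, so a higher strength means more of the image is regenerated. The sketch below just does that arithmetic. The function name is made up, and other tools (ComfyUI, A1111) may compute it differently.

```python
def denoising_steps_run(num_inference_steps: int, strength: float) -> int:
    """Approximate how many denoising steps an img2img pass actually runs.

    Assumption: mirrors the min(int(steps * strength), steps) rule that
    diffusers-style img2img pipelines appear to use.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


# With a 40-step schedule:
low = denoising_steps_run(40, 0.25)   # light touch-up, original details mostly kept
mid = denoising_steps_run(40, 0.50)
high = denoising_steps_run(40, 0.75)  # mostly composition survives
full = denoising_steps_run(40, 1.0)   # effectively a fresh generation

print(low, mid, high, full)
```

At the high end, most of the schedule is re-run, which fits the point above that only the basic composition survives a real style change.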

I tried doing an i2i by telling ChatGPT 4o to identify the style of the second image in the original post and apply it to the image I had made earlier with HiDream, and the result was this one: