The older CLIP model was limited to 77 tokens (theoretically), and the weight of those tokens diminished quickly.
Let's imagine I want a weretiger (a concept I know the model isn't trained on) battling two heroines in a cemetery at dusk, one woman a redhead ninja in a blue outfit and the other woman wearing a formal dress, all fighting with swords, in a Pre-Raphaelite painting style. I'm deliberately pushing the model to its limits.
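To get a feel for that 77-token window, here is a rough sketch in Python. It is not the real CLIP tokenizer: `rough_token_count` and `fits_in_clip` are hypothetical helpers that count one token per word or punctuation mark, whereas the real tokenizer is BPE-based and can split rare words into several pieces. Treat the result as a ballpark only.

```python
import re

# CLIP's text encoder has a fixed context of 77 tokens,
# two of which are the start and end markers.
CLIP_CONTEXT = 77


def rough_token_count(prompt: str) -> int:
    """Crude estimate of a prompt's token count (hypothetical helper).

    Counts each word and each punctuation mark as one token, then adds 2
    for the start/end markers. The real BPE tokenizer may split rare
    words into several pieces, so this is only an approximation.
    """
    pieces = re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", prompt)
    return len(pieces) + 2


def fits_in_clip(prompt: str) -> bool:
    """True if the rough estimate stays within the 77-token window."""
    return rough_token_count(prompt) <= CLIP_CONTEXT


if __name__ == "__main__":
    sample = "cemetery, pre-raphaelite painting, dusk"
    print(rough_token_count(sample), fits_in_clip(sample))
```

CLIP-based pipelines typically truncate whatever falls outside the window, so a keyword placed at the very end of a long prompt may never reach the model at all.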
If I try with this:
An hybrid of man and tiger, with bloody broadsword, battling two woman, wielding swords, one woman red hair and blue ninja outfit, the other woman brunette with a white robe, the hybrid wears a loincloth, cemetery, pre-raphaelite painting, dusk
It contains the information needed, but the best image out of 8 that I got was:
I think it got lots of things right, but my ninja outfit is forgotten, no sword is bloody, and my weretiger, while nice, isn't what I had in my head: it's more of a loincloth-wearing man with a tiger head. TBH, I can't swear that he is indeed wearing a loincloth, but I'll say so if the Internet police ask. The keyword "dusk", at the end of the prompt, had so little influence on the image that it... isn't dusk at all. The cemetery is there, but only minimally. Also, they don't really seem to be fighting, but the model is weak on violent scenes; it's only a month old, and few finetunes are available yet.
But the larger text model understands more words, so I can be more descriptive. Let's try with this longer prompt and apply the same best-out-of-8 selection.
The scene is set in a cemetery at dusk, painted in the style of a Pre-Raphaelite artwork. At the center of the image, a hybrid creature, half-man and half-tiger, faces one woman at each side. The hybrid figure, muscular and fierce, grips a large, jagged broadsword in one hand, dressed only in a loincloth, showcasing his tiger-like face with fur, claws, and a menacing snarl. He holds a huge broadsword covered in blood.
The women at the left has striking red hair, wearing a sleek, form-fitting blue ninja outfit with intricate black accents, a high collar, and a hood that completely conceals her face except for the eyes, enhancing her stealthy appearance. She wields a slender katana.
The other woman on the right, a brunette, grips a curved sword while dressed in a flowing white robe, creating a graceful yet powerful presence.
Their swords clash under the eerie light of the cemetery, surrounded by ancient, weathered tombstones and overgrown vines. The sky above is a fading orange and purple, signaling dusk, with a mystical, almost surreal atmosphere enveloping the entire scene.
There are errors. The redhead's outfit isn't really a ninja outfit, and the model thought form-fitting meant tight at the bum (I guess the polite English word is "lower back"?). They still don't really look like they're battling, but outside of that, there are several more details that it got right: the weretiger is closer to my idea of the beast, the large broadsword is indeed covered in blood, it's more golden hour than dusk but at least it's better, the cemetery is more present, and the ninja has a face mask (or more exactly, the dress concept didn't bleed onto her).
Statistically, longer prompts get interpreted better, and they have the added benefit of letting you fit in more details than previous models could handle.