Every Night I Dream Deeply

@deepdreamnights / deepdreamnights.tumblr.com

The Deep Dream/Deep Style/Neural Network art of Trent Troop, yeah, that guy. Still. Primary Tumblr is @therobotmonster; the talking dinosaurs are at @dinoknights.

If you've enjoyed a big swig of my Gatorade Sports Broth...

Or any of the other stuff I hurl into the void to baffle and amuse, whether through here, Melinoë Labs, or DeepDreamNights: money is never not-tight, and tips/gifts are welcome.

Thank you for enjoying, and remember:

Doom Toots as He Pleases.

I need help with rent and with my phone bill: $300 for the former, $50 for the latter.

Please help. I'm doing the best clowning I can under the circumstances.

Literally anything helps.

In a hilarious irony...

A robot mistook me for one of its own: I caught a ban from Midjourney, of all places, for using third-party access/automation, when it was just me, in pain and with insomnia, stimming with MJ and sorting my recent project's raw gens.

I've filed an appeal, but because a ban blocks you from the Discord server, and all the tech support goes through Discord, it could take anywhere up to two weeks(!?!) to be processed.

Normally, I'm very positive on Midjourney as an easy-access AI art system, but this customer service snafu is a poor oversight and/or user-hostile design.

If you catch a ban in error you should be able to contact the support team.

Assembling your Cast in Vidu: "My Reference" First Impressions and Tutorial

(Disclaimer: I am in the Vidu Creative Partner Program)

For a long time, one of the major barriers for any kind of generative video has been character consistency. The solutions have been either using image-to-video with start frames or having a general reference option; usually the former, sometimes both.

Vidu uses both, but has recently updated its reference system. We're starting to get to the level of interface complexity I've been talking about in previous blogs.

Vidu's reference-to-video and other similar features were a good start, but they had important limitations: you only had three images to reference per generation, and the AI had to guess who was what based on your text prompt. Combine that with some quirks around how it handles model-sheet-type images (more on that later), and it worked, but required a lot of finagling.

Vidu has since rolled out a major improvement to the reference tool: it now lets you build profiles for each person or location you're using, consisting of up to three images, a short style prompt (about one line), and a description prompt. These are saved to your profile so you can call them up at any time (saving a lot of redundant prompting of character details).

The robot will fill in the description for you, but I suggest editing it to your needs.

You can use up to three references per shot (scale between references is a little tricky), and, in a very helpful feature, you reference those characters by tag in the prompt body.

In the old reference system you'd generally have to use both a single shot of a character and a model sheet, because with just the model sheet the character would tend to "twirl" while animating in chaotic and unnatural ways. That meant you'd usually lose two slots to one character; here, you can load everything in one place.

I've only played with it a little bit thus far, but here are my basic recommendations:

  • The first image should be a single shot of the character, and the second and third images can be model-sheet character assemblages. If you're doing a reference specifically for talking-head shots, you'll want the face shot in the first slot and full-body reference in the latter ones. (See the sketch after this list for how I keep these organized.)
  • Try to use multiple-angle reference wherever possible. This increases the stability of the character in general, even if the alternate angles are never actually shown. I have a tutorial for using Vidu to produce this kind of reference here.
  • Use the extra prompt space for specific visual instruction. The more direction you give, the less chaos and weirdness you get. (Assuming that's your goal.)
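For my own bookkeeping, I've started jotting each profile's ingredients down before I build it in Vidu, so a character can be rebuilt later if needed. Here's a minimal sketch of that record in Python. The field names, file paths, and the validation helper are my own convention, not Vidu's API, and the character details are just placeholders borrowed from the DeinoSteve prompts later in this blog; the exact tag syntax in the shot prompt is whatever Vidu's interface inserts for you.

# Minimal sketch of my offline notes for a Vidu reference profile.
# Names, paths, and helper are my own bookkeeping, not Vidu's API.
from dataclasses import dataclass, field

@dataclass
class ReferenceProfile:
    tag: str                                      # the name you call the character by in the prompt body
    images: list = field(default_factory=list)    # up to three: single shot first, model sheets after
    style_prompt: str = ""                        # roughly one line of style direction
    description: str = ""                         # the auto-filled description, edited to your needs

    def validate(self) -> None:
        # Vidu profiles take one to three images per character or location
        assert 1 <= len(self.images) <= 3, "use one to three reference images"

# Hypothetical example, purely for illustration (file names are placeholders)
steve = ReferenceProfile(
    tag="DeinoSteve",
    images=["steve_single_shot.png", "steve_sheet_front.png", "steve_sheet_turnaround.png"],
    style_prompt="flat cel-shaded retro TV animation",
    description="velociraptor-anthro, red scales, yellow shirt, black leather jacket",
)
steve.validate()

# Per generation you can load up to three profiles, then reference them by tag
# in the prompt body (Vidu's UI handles the actual tag insertion):
print("DeinoSteve walks toward camera through a painted jungle backdrop, camera tracking")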

Making a Monster - A Midjourney/Photoshop Tutorial

Today, I'm going to be breaking down how I use Midjourney for character design.

I've recently figured out what I want to do with my joust at the Poke/Digi/Rancher-of-Mon type concept. A lot will be coming from that soon. I don't have a name yet, but the mathematical formula is:

"oops, all Gardevoirs" multiplied by LadyDeviMon + the square root of MOTU over Bluth.

A couple of BioCritters are getting ported over to this new concept, specifically the Waifusaurus evolution branch:

Which was itself a parody of pokemon that are essentially just ladies anyhow, so it's probably more accurate to say Waifusaurus spawned the unnamed LadyMon Project.

I made most of the first BioCritters in Dall-E 3 through Bing Image Creator, and Bing makes finding old prompts a pain. Straifu, the Lois Griffin parody form, is the subject of today's process, but they all followed the same prompt format I used for the Flintstones-inspired base Waifusaurus form:

vintage animation cell, a slender dinosaur-anthro housewife on flinstones, resembles humanoid dino the dinosaur, blue dinosaur-lady, purple tigerskin housedress, holding rolling-pin-made-of-rock 1963, in the style of 1960s hanna-barbera TV animation, character cel on white background, posed in a determined ready fighting stance

However, I have a specific look for this new project in mind, so it's time to evolve the design.

Step one was to start with basic prompting. I built a new prompt that described what I wanted:

fullbody original production cel, white border all around, vintage animation cel, lavender humanoid woman-creature with large t-rex legs and tail, fan of feathers at the end, wearing teal button up blouse, bob haircut, clawed hands, lois griffin as a pokemon, female character design vintage cartoon screen capture (1993) by AKOM and TOEI , white background, beautiful variable-width black line art with cel shaded vintage cartoon color, painted backdrop, official media, UHQ 1996, official media, UHQ

I ran this in Niji 6, using the style moodboard I'd made for the purpose: --p m7298241701452185637 - largely a mix of full body character designs I'd generated in the style I wanted and 1980s animation model sheets. Moodboards are an expanded version of style prompting, which I outline here.

Examples from runs that produced nothing remotely like what I wanted. I could spend some time tinkering with the prompt to get closer, but character prompting is right there, so why not just load the original design in?

Yeah, Midjourney/NijiJourney cannot, as of v6, grok abstract cartoony art styles, and one of character reference's limitations is that it always pulls in at least a bit of art style, so the general grotesqueness of its interpretations of these designs leaks through.

Now, there are still several things I could do here. The easiest would be to take the Straifu on the right and use a combination of in-painting and gradually swapping out the character prompt for a mix of "closer" options over a series of variations until it became something relatively close.

But right around now, I started getting ideas on how I wanted her to look. I liked the idea of a sort of "dinotaur," so I rendered up a regular rex and, in Photoshop, combined it with one of the first-wave prompt-only failures and a recolored head in the style I was going for.

This was a quick-and-dirty mockup, only intended for prompting purposes.

Using a combination of the mockups, the original design, and various results from their iteration chain, I was able to get very close to the basic concept, only to run into two major issues: one, the huge lower-body effect wasn't coming across as intentional (it either disappeared into standard thicc-cartoon milfness or looked like AI screwups), and two, the design was boring when divorced from its bug-eyed cartoon aesthetics.

Now, you can do a lot with a flawed design, but a boring one means you need to start re-conceptualizing.

Which begins under the fold:

I notice you never go into detail about the prompts, models, or tools you use. Is this intentional?


Well, it's not intentional because I'd argue that assessment isn't correct. Go through my backlog and check under the folds.

But to address in general:

I don't think my tool sets are particularly mysterious: a handful of background removers, upscalers, and face-mod tools on Hugging Face; Midjourney for most of my direct image-gen, occasionally Dall-E 3; ancient versions of Photoshop, Audition, and Premiere; Suno (with audio remastering from Bandcamp) for most of my music gen; and for video I mainly use Vidu and occasionally MiniMax/Hailuo (still not sure which one's the company and which one's the generator there).

When it comes down to it, for most of the stuff I'm showing off there usually isn't a prompt, there are bunches, with the prompt changing as I iterate, as I inpaint, as I composite multiple images together, and as I use others as character references, style references, etc.

At that point any single prompt isn't going to be representative. I don't run a local Stable Diffusion install or the like, so I can't really go into models much more than indicating whether I've used Midjourney v6.1 or Niji 6. My prompts tend to be big, at least a full paragraph long, and quickly fill up a Tumblr post.

For music and videos you don't just have to deal with a huge prompt or three, but with very involved processes.

Each one of those little boxes is a clip or a merged sequence of clips that went into my Open Letter to the SCP Foundation video. Each of those started with at least one image made from one or more prompts plus manual editing, combined with the video prompts that went along with the start-frame and reference-to-video passes. All of that is then combined with a manually edited and refined soundtrack made using multiple Suno passes, with SFX elements added in post.

I don't think a novella of prompt information for my goofy three-minute comedy song is of interest to enough people to be worth spending all the time compiling it.

Especially since even the song was typically generated in chunks, with both lyric-prompt and prompt-prompt changes at every stage, then remastered inside Suno with another prompt entirely, and then covered, re-extended, etc., with dozens upon dozens of false starts and scrapped verses.

I left the specific prompts off a few of my recent Rom video tutorials because my focus was more on the general process and my thoughts around the creative exercise under it, but most of my blog's front page is tutorials right now, and my image posts almost always have full prompts.

Template Prompting

This post is mostly to show off gifs but will get wordy; enjoy this tune while you read.

"Did the Academy award me this... award? No. Did the henchmen who have you at disintegrator-point? Yes. And isn't that what really matters?" - Dr. Underfang

One of the ongoing hurdles for any generative AI service is that prompting up something cool tends to take practice, yet most people will get frustrated if their first few attempts don't bear at least promising fruit.

The answer: Several Sizes Fit Most

A lot of services, Vidu included, do this via templates, where most of the prompting is already done for you and you just add a few words or an image reference. A menu of these builds up over time as an easy "try out" feature.
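Mechanically, a template like that is just a prompt with a blank or two left for the user. Here's a rough sketch of the idea in Python; the template wording and file name are mine, purely illustrative, not Vidu's actual template text.

# Rough sketch of what a prompt template boils down to: the service pre-writes
# most of the prompt, the user supplies a subject and/or an image reference.
AWARDS_TEMPLATE = (
    "{subject}, dressed in fancy formalwear, walks down the red carpet "
    "at a glamorous awards ceremony, flashing cameras, cinematic lighting"
)

def fill_template(subject: str, reference_image: str | None = None) -> dict:
    """Bundle the finished prompt with an optional image reference."""
    return {
        "prompt": AWARDS_TEMPLATE.format(subject=subject),
        "reference_image": reference_image,  # e.g. a character model sheet
    }

print(fill_template("a flat-shaded cartoon robot", "model_sheet.png"))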

I normally don't use those, but as part of Vidu's Creative Partner Program, they gave me about a music video's worth of credits to show their Oscar templates off (full disclosure) and so I'm here to opine.

For this one, you just dropped in a pic and whatever it was either walked down the red carpet or accepted an award in fancy duds. This is all straight-to-the-meme type stuff, but we can still learn some tricks from it.

On the left is the result of using the painted Pizazz Jem & the Holograms character art, and on the right is the result of using the Transformer Steeljaw's animation model sheet.

The more painted and shaded look of Pizazz gives the AI enough to interpret it as a real, albeit very vibrant, person and fills in the blanks accordingly.

Steeljaw, by comparison, is interpreted with all his animation lines in place, and with the flat coloration he gets reinterpreted as a sort of half-toon or living pepakura creature. That's an effect you can lean into by pairing obviously illustrated reference with live-action backgrounds or prompting, to produce a particularly unique, dreamlike effect.

But it's not super-consistent across different types of designs. Both the Baroness and Full-Boar here were flat animation model sheets with no shading. The Baroness, however, comes out as more of a person with makeup and costuming trying to make her look like a toon, while Full-Boar is a fully cel-shaded toon composited into the scene.

You can control that process more if you're controlling all the prompting, but templates don't really let you do that. On the other hand, all my red carpet pics are first attempts, while the in-scene ones took multiple attempts and tweaks to the process.

And it is surprisingly robust in what it can adapt to. Even if Deadeye's doing the "trying not to shit myself" walk because that toy's legs were purely decorative coverings for wheels, that's impressive.

The sense of scale is way off in hilarious ways; even if one assumes they're people in costumes, they're like two feet tall.

the process

a lot of people like to ask me about my process and how ai can be "creative" because they're under the impression that it's just kind of a big slot machine. you pull a lever and art uncontrollably comes out. well, let me show you my process

this is going to be a long thread tagged with #long post, blacklist that if you want to skip it.

so how it starts like most art is that i have an idea. in this case, earlier i made a post about witch-knights "surfing" on swords, so i'm going to try and make that - a witch-knight flying through the air atop one of her swords.

it starts with this picture.

i think this picture is dogshit so i discard basically all of it to try and find something closer to my original intent. there's a couple of uninteresting regenerations so it's clear i have to go back to the drawing board and teach the machine what it is i'm trying to do

let's start with a witch-knight on a broom. it's definitely not great but it gives us a better pose that i can work with.

i start by erasing the broom and replacing it with a skateboard - the machine understands skating better for what i need it to do.

there's a ton of small, subtle errors in this image and it overall looks like dogshit but the most important part right now is blocking and the overall pose structure - i need her "surfing" a large, lengthwise object, in the sky. i start by erasing pieces of the skateboard

now we have a sword, which is good. but the sword itself looks... bad. i'll spare you the abortive attempts at selective regeneration of the sword and just show you what happened when i rolled it back a couple of times from this pose and let it regen entirely.

again, tons of small little shitty errors, but this is something i can work with. i do another regen for a less shitty sword. her boob armor gets replaced with, like, generic scale mail.

this image has a great sword and decent pose but like... everything else is kind of futzy and i don't like it. instead of trying to pick and choose i just throw it back into the oven for a second. much better! but now she's going to cut herself on the sword, oh no!

again, i'll save you the agonizing thirty minutes of trying to get it to understand where the foot should go. unlike before i didn't really have a choice except to muscle through. there! now she's surfing safely :)

so it's done, right? well, i mean, i could post this. and it would probably do okay. but *i'm* not satisfied with it. there's stiffness. dozens of minor errors. the eyes look weird when you zoom in. let's start by fixing her hat, and then maybe her hands?

but she's missing fingers on her left hand so let's go ahead and fix that too. and i don't really like the tip of her sword and the ocean looks really flat and boring. so, VERY CAREFULLY, i have to etch out the parts of the sword and her body i have to keep, and also write an entirely new prompt to tell it "i want an ocean w/ rolling waves please :)"

this is better but not great. i try again - serendipitously, it makes this really cool variant with a shadow over the water, but i know working with that will take more wrangling so i'm considering it an evolutionary dead end and discarding it for now.

i proceed to spend 30 minutes trying to make the ocean look better but it's really not working imo. i'm gonna go back to the shadow version and see how that works

i'll spare you the other 8 minutes - i'm satisfied with the following picture. the sword isn't *perfectly* straight, her eyes aren't perfectly textured, the scale mail is... weird, in texture, but anything else would be greasing the wheel and i think beyond the machine's ability to do fine detail.

i've also attached the starting picture for comparison - it has better, "higher quality" clouds and ocean but i personally cared more about the pose and the sword surfing - the background is mostly tangential. could i get back ocean and clouds of that quality with another two hours of painstakingly cutting and re-generating bits of the background without destroying any of my existing work on the pose? probably. but i don't want to.

total time spent on this piece from start to finish was one hour and twenty one minutes. and now you know!

Among the many downsides of AI-generated art: it's bad at revising. You know, the biggest part of the process when working on commissioned art.

Original "deer in a grocery store" request from chatgpt (which calls on dalle3 for image generation):

revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover):

I had it restore its own Jesus fresco.

Original:

Erased the face, asked it to restore the image to as good as when it was first painted:

Wait tumblr makes the image really low-res, let me zoom in on Jesus's face.

Original:

Restored:

One revision later:

Here's the full "restored" face in context:

Every time AI is asked to revise an image, it either wipes it and starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system - or go in with a mentality that anything that fits the brief is good enough.

I'm not surprised that there are some places looking for cheap filler images that don't mind the problems with AI-generated imagery. But for everyone else I think it's quickly becoming clear that you need a real artist, not a knockoff.

A lot of this is spot on, but I think the choice of tools has skewed the data here, and may give an incorrect impression of how AI art tools on the whole work, because Dall-E 3 isn't representative of the options at large.

Dall-E 3 is pretty (if you make it be), has good coherency, and works with more naturally written prompts, but it is wholly uncooperative and has almost zero workflow accommodations for artistic applications. Heck, it can't even do non-square aspect ratios (at least through Bing).

Playground AI beats it out on those fronts, and Midjourney and Stable Diffusion piledrive it into the concrete.

The only reason to ever use Dall-E 3's ChatGPT interface is if you want to run up to three jobs simultaneously on a free Bing account. It routinely misinterprets input, even direct "replace this word in the prompt with this other word" requests. It's essentially playing whisper-telephone with two hallucinating robots instead of one.

Here's how the process would go down in Midjourney:

Gonna be making a 90s fighting game version of my DeinoSteve character. Goal is to keep him largely on-model, minus changes that work for the style change from 70s comic/80s cartoon to 90s fighting game.

Prompt: fullbody character design, a velociraptor-anthro fighting game character, in the style of 1996 Capcom (Darkstalkers 3/Street Fighter Alpha 3) promotional art, fullbody on white background. Red scales, yellow shirt, black leather jacket, long tail, clawed feet, long tail, retrosaur asthetics, vector art inks with flat anime shading

Aspect ratio 3:4, Niji model v6, style 50:
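If you're typing this into Midjourney rather than clicking through the web settings, those choices ride along as parameters at the end of the prompt. The --ar (aspect ratio), --niji (model version), and --s (stylize) flags are Midjourney's own; the little helper below is just my own string-gluing, shown as a sketch with the prompt abbreviated.

# Sketch: assembling a Midjourney/Niji prompt with its trailing parameters.
# --ar, --niji, and --s are real Midjourney flags; the helper is just string-gluing.
def build_command(prompt_body: str, aspect_ratio: str, niji_version: int, stylize: int) -> str:
    return f"{prompt_body} --ar {aspect_ratio} --niji {niji_version} --s {stylize}"

deinosteve_prompt = (
    "fullbody character design, a velociraptor-anthro fighting game character, "
    "in the style of 1996 Capcom promotional art..."  # abbreviated; full prompt above
)

print(build_command(deinosteve_prompt, aspect_ratio="3:4", niji_version=6, stylize=50))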

Some pretty good stabs at DeinoSteve (left image), but nobody's quite right. I've got several ways to get him kitted out right, but we'll use the same techniques CGPT was trying to use: inpainting and variations.

Upper right is closer to the character concept, but I like upper left's pose better, so we'll start by running some variations on the Subtle setting (right image).

You'll notice the variation problem from Dall-E 3 isn't there. Using ChatGPT to change the prompt means you're at the mercy of two hallucinating robots instead of one, and it's probably been altering the prompt wildly with each revision, hence the slow degradation of mural-Jesus.

Now, I'm just demonstrating in-system tools here, but on a piece that I was doing for finalization, I'd be upscaling all the ones that had features I liked, for later composition.

Steve looks... okay here, but he's off-model in several major ways: pants the wrong color, no full T-shirt, the spines on the tail, etc.

So here we'll do some inpainting. Unlike the GPT setup, I'm laying out which areas I specifically want changed at each go. Starting with the pants: I forgot to mention in the first prompt that he wears blue jeans, so I add that to the prompt as well. If you're out to do your own post-editing, you can hit more parts at once and just composite from fewer gens.

I like #4 (left)'s swagger, so we'll repeat the process to get him full sleeves on his jacket, and to remove the spiked wrist cuffs. #4, again, is my winner. Now, I can keep varying individual bits, but I can also return to doing general variations, and the influence from the current version will carry over.

Now, it will mean re-doing some bits, but #2 is pretty sweet, so he'll be where I tinker next. Note how a bit of his tail and claws are cropped out. I can fix that with outpainting.

If I were instead going for an edit in post, I'd probably have taken the best 5-6 chunks, merged, dropped to line art, then recolored by this point.

Now, I can keep tinkering on him bit by bit, but part of using the AI system is knowing when you're going to have to go manual. I know from experience that getting his raptor-foot claws right is going to require me to go in and do 'em manually.

If it's not the horns and claws on dinosaurs, it's hairstyles and clothing details on humans; the nature of using a randomized system is that you're going to get random hiccups.

But there are ways to mitigate that, depending on your toolset. Stable Diffusion has ControlNet, different versions of which let you control things like poses, character details, and composition more directly (as I understand it; I haven't messed with it myself, since I don't have that kind of beastly graphics card).

MJ's answer is presently in alpha: character reference. It's an extension of their image-prompting system (which isn't the same as image-to-image), wherein an image is examined by the AI's image-identification/checking processes and the results are used as part of the prompt. For character reference, it tries to drop everything that isn't connected to character design.
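In practice, character reference is another trailing parameter. As I understand it, the flag is --cref, with --cw (character weight) controlling how much of the reference carries over; the URL below is a placeholder, not a real upload.

# Sketch: appending a character reference to a prompt. --cref and --cw are the
# Midjourney flags as I understand them at time of writing; the image URL is a
# placeholder for wherever your reference image is hosted.
base_prompt = "fullbody character design, a velociraptor-anthro fighting game character ..."
character_ref = "https://example.com/deinosteve_semifinal.png"  # hypothetical link
character_weight = 100  # as I understand it: 100 pulls face, hair, and outfit; lower focuses on the face

print(f"{base_prompt} --cref {character_ref} --cw {character_weight}")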

A quickie iteration using a handful of previous DeinoSteve pics made the image on the left, while re-running the prompt with the semi-final DeinoSteve above as a character prompt produced the images on the right.

Of course, even with the additional workflow tools you get with non-DallE generators, doing anything with long-term consistency is going to require manual editing.

Anything with narrative? Well, this panel has over 20 individual gens in it, from across two generators (MJ and Dall-E 3)

The AI systems will get better over time, but there's an inherent paradox that I don't think they'll escape: to get to complex results you need complex control, and to gain complex control you need, well, complex controls.

The more stuff an AI generator can do, the more literal and figurative buttons and/or menus you need to use those features. The more complex the features, the more knowledge and practice it takes to utilize them. Essentially: Any tool capable of (heavy air quotes) "replacing" an artist will wind up requiring an artist to operate it at that level.

As with other force-multipliers for art (photography, digital image manipulation, automated image touchup/filtering, etc) the skill gradient never goes away.

original character designs vs sakimichan fanart

Time to offer an actual breakdown of what's going on here for those who are anti or disinterested in AI and aren't familiar with how these systems work, because the replies are filled with discussions of how this is somehow a down-step in the engine or a sign of model collapse.

And the situation is, as always, not so simple.

Different versions of the same generator always prompt differently. This has nothing to do with their specific capabilities, and everything to do with the way the technology works. New dataset means new weights, and as the text parser AI improves (and with it prompt understanding) you will get wildly different results from the same prompt.

The screencap was in .webp (ew) so I had to download it for my old man eyes to zoom in close enough to see the prompt, which is:

An expressive oil painting of a chocolate chip cookie being dipped in a glass of milk, represented as an explosion of flavors.

Now, the first thing we do when people make a claim about prompting is try it ourselves, so here's what I got with the most basic version of Dall-E 3, the one I can access free through Bing:

Huh, that's weird. Minus a problem with the rim of the glass melting a bit, this certainly seems to be much more painterly than the example for #3 shown, and seems to be a wildly different style.

The other three generated with that prompt are closer to what the initial post shows, only with much more stability about the glass. See, most AI generators generate results in sets, usually four, and, as always:

Everything you see AI-wise online is curated, both positive and negative.*

You have no idea how many times the OP ran each prompt before getting the samples they used.

Earlier I had mentioned that the text parser improvements are an influence here, and here's why-

The original prompt reads:

An expressive oil painting of a chocolate chip cookie being dipped in a glass of milk, represented as an explosion of flavors.

V2 (at least in the sample) seems to have emphasized the bolded section, interpreting "expressive" as "expressionist."

On the left, Dall-E2's cookie, on the right, Emil Nolde, Autumn Sea XII (Blue Water, Orange Clouds), 1910. Oil on canvas.

Expressionism is a specific school of art most people know from artists like Edvard Munch. But "expressive" does not mean "expressionist" in art; there's lots of art that's very expressive but not expressionist.

V3 still makes this error, but less so, because its understanding of English is better.

V3 seems to have focused more on the section of the prompt I underlined, attempting to represent an explosion of flavors, and that language is going to point it toward advertising aesthetics because that's where the term 'explosion of flavor' is going to come up the most.

And because sex and food use the same advertising techniques, and are used as visual metaphors for one another, there's bound to be crossover.

But when we update the prompt to ask specifically for something more like the v2 image, things change fast:

An oil painting in the expressionist style of a chocolate chip cookie being dipped in a glass of milk, represented as an explosion of flavors, impasto, impressionism, detail from a larger work, amateur

We've moved the most important part (oil painting) to the front, stated that it's in an expressionist style, and added "impasto" to tell it we want visible brush strokes. I added "impressionism" because the original gen had some Picasso-ish touches and there's a lot of blurring between impressionism and expressionism, then "detail from a larger work" so it would be of a similar zoom-in quality, and "amateur" because the original had a sort of rough learner's vibe.

Shown here with all four gens for full disclosure.

Boy howdy, that's a lot closer to the original image. Still higher detail, stronger light/dark balance, better textbook composition, and a much stronger attempt to look like an oil painting.

TL;DR: The robot got smarter and stopped mistaking adjectives for art styles.

Also, this is just Dall-E 3, and Dall-E is not a measure for what AI image generators are capable of.

I've made this point before in this thread about actual AI workflows, and in many, many other places, but OpenAI makes tech demos, not products. They have powerful datasets because they have a lot of investor money and can train by raw brute force, but they don't really make any efforts to turn those demos into useful programs.

Midjourney and the other services staying afloat on actual user subscriptions, on the other hand, may not have as powerful a core dataset, but you can do a lot more with them because they have tools and workflows for those applications.

The "AI look" is really just the "basic settings" look.

It exists because of user image rating feedback that's used to refine the model. It tends to emphasize a strong light/dark contrast, shininess, and gloss, because those are appealing to a wide swath of users who just want to play with the fancy etch-a-sketch and make some pretty pictures.

Thing is, it's a numerical setting, one you can adjust on any generator worth its salt. And you can get out of it with prompting and with a half dozen style reference and personalization options the other generators have come up with.

As a demonstration, I set Midjourney's --s (stylize) setting to 25 (my preferred "what I ask for but not too unstable" level when I'm not using moodboards or my personalization profile) and ran the original prompt:

3 of 4 are believably paintings at first glance (albeit a highly detailed one in #4), and any one could be iterated or inpainted to get even more painterly, something MJ's upscaler can also add with its "creative" setting.

And with the revised prompt:

Couple of double cookies, but nothing remotely similar to what the original screencap showed for DE3.

*Unless you're looking at both the prompt and raw output.

Suno had a breakdown and I rolled with it.

We all make mistakes.

Lyrics under the fold.

Suno's ReMi lyric engine occasionally makes believable pop songs... provided they're pop songs being sung in English as a second language, à la Eurovision.

Most of the time I massage the lyrics into something like Plastic Spiders or Venus Can't Wait. Or I use the generated lyrics for meter and rhyme scheme and completely rewrite.

But this one (prompted with "Plop plop, fizz fizz, oh what a relief it is") only required a few adjustments for tastefulness (adding a few bleeps).

Pairing it with some of my stored outtake-gens was the next step.

Enjoy.
