Assembling Your Cast in Vidu: “My Reference” First Impressions and Tutorial
(Disclaimer: I am in the Vidu Creative Partner Program)
For a long time, one of the major barriers to any kind of generative video has been character consistency. The usual solutions were either image-to-video with a start frame or a general reference option: usually the former, sometimes both.
Vidu offers both, and has recently updated its reference system. We’re starting to reach the level of complex interface that I’ve been talking about in previous posts.
Vidu’s reference-to-video feature, like similar features elsewhere, was a good start, but it had important limitations: you only had three reference images per generation, and the AI had to guess which image was which based on your text prompt. Add some quirks in how it handles model-sheet style images (more on that later), and it took a lot of finagling to get good results.
Vidu’s update makes a major improvement to the reference tool: you can now build a profile for each person or location you’re using, consisting of up to three images, a short style prompt (about one line), and a description prompt. These profiles are saved, so you can call them up at any time, which saves a lot of redundant prompting of character details.
The robot will fill in the description for you, but I suggest editing it to suit your needs.
You can use up to three references per shot (getting the scale right between references is a little tricky), and, very helpfully, you call those characters up by tag in the body of the prompt.
In the old reference system, you usually had to use both a single shot of a character and a model sheet, because with the model sheet alone, characters tended to “twirl” while animating in chaotic and unnatural ways. That meant you generally lost two of your three slots to one character. Here, you can load everything in one place.
I’ve only played with it a little bit thus far, but here are my basic recommendations:
- The first image should be a single shot of the character, and the second and third images can be model-sheet character assemblages. If you’re building a reference specifically for talking-head shots, put the face shot in the first slot and full-body references in the other two.
- Use multiple-angle references wherever possible. This increases the character’s stability in general, even if the alternate angles never appear in the shot. I have a tutorial for using Vidu to produce this kind of reference here.
- Use the extra prompt space for specific visual instructions. The more direction you give, the less chaos and weirdness you get. (Assuming that’s your goal.)