Stable Video Diffusion to create videos from a single picture

Blurry George Washington
Game: try to find the reason why I posted this preview that is obviously bad (I don't need much to be entertained).

How did I use Stable Video Diffusion to generate videos from still pictures?

StabilityAI, the creator of "Stable Diffusion", recently released 2 generative models that create videos from a single image. Unlike "Stable Diffusion", these models don't work with the Automatic1111 interface (at least not for now, please check this "feature request").


To use these models (here and here), it seems that you have to use ComfyUI, another GUI for Stable Diffusion (link to a workflow for ComfyUI in this page).
To use Stable Video Diffusion, I personally installed the portable Nvidia version of ComfyUI (new_ComfyUI_windows_portable_nvidia_cu121_or_cpu.zip) by clicking on "direct link to download" here. Then I loaded the workflow SVD Workflow.json linked by the author of this YouTube video (you should watch this video too).
I downloaded the models listed here and here and put them in the directory "ComfyUI_windows_portable\ComfyUI\models\checkpoints", then I restarted the interface to select the model "svd_xt". Then I loaded a picture that I wanted to animate and clicked on "Queue Prompt" to start the work.
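Before restarting the interface, it can help to check that the model files really landed in the right directory. Here is a minimal sketch of such a check. The directory comes from the steps above; the two file names are assumptions (use the names of the files you actually downloaded), and `missing_checkpoints` is a hypothetical helper, not part of ComfyUI.

```python
from pathlib import Path

# Directory from the steps above; adjust it to where you unpacked ComfyUI.
CHECKPOINT_DIR = Path("ComfyUI_windows_portable") / "ComfyUI" / "models" / "checkpoints"

# Assumed file names: replace them with the names of your downloaded files.
EXPECTED_FILES = ["svd.safetensors", "svd_xt.safetensors"]


def missing_checkpoints(directory, expected):
    """Return the expected file names that are not present in the directory."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints(CHECKPOINT_DIR, EXPECTED_FILES)
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All expected model files are in place.")
```

If a file is reported as missing, it is probably in another folder or has a different name than the one assumed here.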

Alternatively, if you are looking for some kind of Txt2Vid, you can have a look at this page (or here). It provides a "workflow" that creates a picture from a prompt and then animates the picture into a video.
Honestly, I played less with this option, but the results are also pretty decent with this ComfyUI workflow.

Are the results good?
Well, some people recently tried to animate memes with a similar technique using Stable Video Diffusion, and here are the results. (link to Gizmodo)



So the best way to illustrate my content was to make a video out of it.
Here are my comments on the video:

So what did I learn about Stable Video Diffusion?

- First, the video output of this model is far from perfect.
One thing that "Stable Video Diffusion" does very often is that, instead of really animating the picture, it seems to take a partial rectangular selection of the original and make the video by moving that selection across the picture.
(At least it gives you this impression.)
Very often, too, it takes your picture and hallucinates that the view is changing, as if the view were turning around a center.
However, it would be unfair to say that Stable Video Diffusion just emulates a change of the point of view.

- The second example, with the Gatsbie's picture, demonstrates that artefacts and things that are not really part of the "real" picture are going to be used too. So if there is something that you don't want to animate, it is better if it is not there.
Also, I suggest using the same dimensions for the picture and the animation (otherwise, it is likely that you will just change the POV of the picture).

- For the Van Gogh painting, I more or less expected to get a similar result. But you need to be lucky with the seed to get it.
I feel that there is a lack of control (too many possibilities).

- Here is a big "no-no": the model tends to change faces and make people look deformed when they weren't before. You will struggle with this.

- I hope you'll then enjoy Louis XIV (I guess pantyhose are not so common anymore for men). His face is modified.
- Then the arm of a character disappeared.
- The face of Napoleon is changed in a very ugly manner (as for Louis).
- The AI is not for an emperor, not for a king, and also not for a president, because it again changes the face of President Macron in a very ugly manner.
(I tried with George Washington too, and it is the same thing, so it is not that the characters are fleeing because they are French.)
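
Coming back to the suggestion to use the same dimensions for the picture and the animation: one way to do it is to crop the picture to the aspect ratio of the video before loading it. Here is a minimal sketch that only computes the crop rectangle. The function name `center_crop_box` is hypothetical, and the 1024x576 target in the example is an assumption: use the width and height set in your own workflow.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Return (left, top, right, bottom) of the largest centered crop of a
    src_w x src_h picture that has the aspect ratio dst_w : dst_h."""
    if src_w * dst_h > src_h * dst_w:
        # The picture is wider than the target: keep the height, trim the sides.
        crop_h = src_h
        crop_w = src_h * dst_w // dst_h
    else:
        # The picture is taller than (or matches) the target: keep the width.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# Example: a square 1024x1024 picture for an assumed 1024x576 animation.
print(center_crop_box(1024, 1024, 1024, 576))
```

The resulting box has the (left, upper, right, lower) layout that an image library such as Pillow expects for cropping, so the cropped picture can then be resized to the exact video size.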

The face deformity problem is a very recurrent one.

- For animations with basic movements, I find it not that bad, at least since I am more a coder than an artist. I feel that I could express myself with this kind of creation. But I am not an artist...
Maybe for a real artist, this is lame.

And finally, as I said before: not enough control, too many possibilities...
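
Since the result depends so much on the seed, one partial workaround is to queue the same picture with several seeds and keep the best video. Here is a minimal sketch, under several assumptions: ComfyUI is running locally on its default address, the workflow was exported in ComfyUI's API format (a dictionary of nodes, each with an "inputs" dictionary, which is not the same file as the regular workflow JSON), and the sampler node exposes an input called "seed". The helper names are hypothetical.

```python
import copy
import json
import urllib.request

# Assumed default address of a local ComfyUI instance.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"


def with_seed(workflow, seed):
    """Return a copy of an API-format workflow with every 'seed' input replaced."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        inputs = node.get("inputs", {})
        if "seed" in inputs:
            inputs["seed"] = seed
    return patched


def queue_prompt(workflow, url=COMFYUI_URL):
    """Send one workflow to a running ComfyUI instance and return the raw reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def sweep(workflow, seeds, send=queue_prompt):
    """Queue the same workflow once per seed and collect the replies."""
    return [send(with_seed(workflow, seed)) for seed in seeds]
```

With a workflow loaded through `json.load`, something like `sweep(workflow, range(1, 6))` would queue five variations. It doesn't give more control, but it makes the lucky draw less tedious.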

Cool?: Yes
Artistic?: I think so
Useful?: Not sure...