Part 1: How to restore blurry/pixelated pictures with basic img2img in Automatic1111?

 


I want to be able to do this famous Hollywood "zoom - enhance" trope trick at home.
I used various tools to get a result, and this result varies a lot depending on the technique that I used.

In this article I will cover how to use the img2img tab of Automatic1111, an interface for using Stable Diffusion models, to generate enhanced pictures.

 

Unlike Hollywood, I don't want to do this "zoom - enhance" trick to reveal REAL hidden details.

I want to explore how to implement this concept in order to aesthetically improve the look of an existing image.

 

I see 3 main usages for that:

- Improve the look of an image that is pixelated and/or blurry (you want to remove something that is not pleasant to see)

- Increase the resolution of an existing non-blurry picture.

(This way, one could revamp old-school video games.)

- Add non-existing details to a picture to create an image at a higher resolution.

(For example, if you magnify a pixelated picture of a face, you have to reconstruct the picture with elements that are no longer there, to make the face look realistic.)

 

During my tests, when I added non-existing details, I noticed that the result I got depended heavily on how I proceeded.
So I am perfectly aware that the details I am going to add describe the models and the conditions I used to get the result more than they describe anything real about the starting picture.

To say it clearly: you CAN'T reveal hidden truths in pictures (without precautions) like Hollywood does.

****

What is this tutorial NOT about?

 


If you came to this tutorial because you generated a picture where the background is too blurry:

Like this:


depth of field 1


But you would rather have a picture where the background is not blurry, like this:


depth of field 2



The concept you are looking for is "depth of field": you can just regenerate your picture with a Lora that handles this.
I am happy with this one, for example.


If you change the weight of this lora, you will also likely change other aspects of the picture besides the depth of field. This can be a problem.
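For reference, in Automatic1111 a Lora is applied directly in the prompt with a weight; a minimal example (the Lora name here is just a placeholder for whichever depth-of-field Lora you picked):

a portrait of a woman in a park, <lora:your_dof_lora:0.8>

Lowering the weight (0.8 down to 0.4, for example) weakens the effect, but as said above, it can also shift other aspects of the picture.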

 

****

What is this tutorial about?

 


So if I take this picture:


reference picture



I downscale it to 25%


downscaled original picture

 

And upscale it again to 100% (i.e. back to its original size):


rescaled original picture


The resulting picture is blurry, and I don't like this result... because it is blurry.

 

So in this tutorial I'll try to retrieve my original picture, starting from the downscaled one.
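(For reproducibility, this is roughly how I degrade the picture; a minimal sketch using Pillow, with placeholder file names:)

from PIL import Image

img = Image.open("original_512x768.png")                        # placeholder file name
small = img.resize((img.width // 4, img.height // 4), Image.LANCZOS)   # downscale to 25%
small.save("downscaled_128x192.png")
blurry = small.resize((img.width, img.height), Image.BICUBIC)   # naive upscale back to 100%
blurry.save("rescaled_blurry.png")                              # this is the blurry picture I want to fix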


What did we use to get the original picture?


To get this picture:


original picture again


I used:


- a prompt
- a negative prompt
- a generation model
- a lora to correct the model
- a number of steps
- a sampling method
- a schedule type
- a CFG scale
- a seed

 

Roughly, most of this data, apart from the prompt, depends on the generation model that is used. When the prompt is not available, I can try to reconstruct it with the "Interrogator" of Automatic1111.
Besides that, you also need to know what kinds of models (and Loras?) are able to generate the kind of picture that you are looking for.
Civitai is a good place to see what a model can generate, and with what CFG, steps, sampling method and schedule type. Civitai displays examples of generations, and that is what you need to choose a model.

 

First thing that you have to do to try to restore a picture:

 

First, I try to gather generation data. I go to a website that displays example outputs of generation models (like CivitAI). The model needs to match the kind of output that I want.

Once I have the model, I need to find suitable parameters to use it with: especially the number of steps, the CFG scale and the best sampling method.
For the schedule type, I set it to Automatic. The seed is there to create some variation, but good settings should narrow the effect of the seed as much as possible.

Besides this, we need to find the prompt:

 

interrogator tab


With the interrogator tab (prompt) of Automatic1111 and with the picture above, I get the following prompt:

 


blond woman in armor holding a candle in front of stained glass windows, artgerm ; 3d unreal engine, trending on artstatioin, portrait sophie mudd, cinematic bust shot, inspired by Ludwik Konarzewski Jr, mighty princess of the wasteland, amouranth, priest

Which I will correct to:

blond woman in armor holding a candle in front of stained glass windows, artgerm ; 3d unreal engine, trending on artstation, portrait sophie mudd, cinematic bust shot, inspired by Ludwik Konarzewski Jr, mighty princess of the wasteland, amouranth, priest
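(As an aside: if you start the webui with the --api flag, the same CLIP interrogation can be scripted. This is only a rough sketch, with a placeholder file name and the default local address:)

import base64, requests

URL = "http://127.0.0.1:7860"          # default local webui address, assuming --api is enabled

with open("downscaled_128x192.png", "rb") as f:    # placeholder: the picture to interrogate
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(f"{URL}/sdapi/v1/interrogate",
                     json={"image": image_b64, "model": "clip"})
print(resp.json()["caption"])          # the raw prompt guess, which I then correct by hand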

 


For the model, I will keep the one I used to generate the picture: dreamshaper_8.safetensors (later I'll show what happens if I select other models).

 

The Analyze tab of "Interrogator" returns these main keywords:

a photorealistic painting
inspired by Mandy Jurgens
fantasy art
trending on Artstation
imogen poots as holy paladin



From this, I decide that I'll use the following prompt:

A photorealistic painting of a blond woman in armor holding a candle in front of stained glass windows, imogen poots as holy paladin, fantasy art, artgerm ; 3d unreal engine, trending on artstation, portrait sophie mudd, cinematic bust shot, inspired by Mandy Jurgens, inspired by Ludwik Konarzewski Jr, mighty princess of the wasteland, amouranth, priest

Negative prompt: easynegative, bad-hands-5, nsfw, nude, big breasts

(The negative prompt is needed because Dreamshaper can generate adult content, which is not useful here.)

 

If, besides the prompt, I keep the following data (similar to what was used to generate the initial picture):

A photorealistic painting of a blond woman in armor holding a candle in front of stained glass windows, imogen poots as holy paladin, fantasy art, artgerm ; 3d unreal engine, trending on artstation, portrait sophie mudd, cinematic bust shot, inspired by Mandy Jurgens, inspired by Ludwik Konarzewski Jr, mighty princess of the wasteland, amouranth, priest
Negative prompt: easynegative, bad-hands-5, nsfw, nude, big breasts
Steps: 30, Sampler: Euler a, Schedule type: Karras, CFG scale: 7, Seed: 1094900797, Size: 512x768, Model hash: 879db523c3, Model: dreamshaper_8, Clip skip: 2


I get this:

 

picture generated with just the settings

 

If I just retrieve the prompt from the interrogator, then unless I use a model that generates something very specific, it is unlikely that I'll get my original picture back.


Since a picture generation uses constraints, let's see what other constraints I can use...
So I use the img2img tab, because my main starting point to generate a picture is another picture.


Go to "Generation", "img2img" tab, and load this picture: (resolution: 128x192)

 

downscaled picture used for picture generation

According to my tests, here are the best sampling methods for Dreamshaper 8 (SD 1.5):
DPM++ SDE (actually gives good results in img2img)
DPM++ 2S a
Euler a
Restart



I am going to set my sampling method to Euler a, because it is a very generic sampling method, and select 30 steps, because this amount of steps seems acceptable. Schedule type: Automatic
Resize: from 128x192 to 512x768
CFG Scale: 7 (default)
Denoising strength: 0.75
Seed: 1094900797 (I select the same seed as before, because I still want to be able to compare the results)
Complete the rest with the data I set above.
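(If you prefer scripting this step, the same img2img call can be sent to the webui API; a sketch under the assumption that the webui runs locally with --api, with the prompt shortened and the file name as a placeholder. Either way, UI or script, the parameters are the same.)

import base64, requests

URL = "http://127.0.0.1:7860"

with open("downscaled_128x192.png", "rb") as f:    # the small blurry picture (128x192)
    init_b64 = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_b64],
    "prompt": "A photorealistic painting of a blond woman in armor ...",   # full prompt from above
    "negative_prompt": "easynegative, bad-hands-5, nsfw, nude, big breasts",
    "steps": 30,
    "sampler_name": "Euler a",
    "cfg_scale": 7,
    "denoising_strength": 0.75,
    "seed": 1094900797,
    "width": 512,                       # the resize to 512x768 happens here
    "height": 768,
    # optional: select the checkpoint from the script instead of the UI
    "override_settings": {"sd_model_checkpoint": "dreamshaper_8"},
}

resp = requests.post(f"{URL}/sdapi/v1/img2img", json=payload)
with open("img2img_result.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))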

This is the result that I get:

just img2img and a prompt


The final picture is obviously too different from the original picture. If I reduce the denoising strength, my final picture will be closer to the original one, but it appears that my final picture will also stay blurry if I reduce this parameter too much,
so the lowest denoising strength that I can use is 0.5:

with denoising strength of 0.5



Second thing that you have to do to try to restore a picture: tune the denoising strength. Too much: the resulting picture is too different. Not enough: the resulting picture is not corrected.
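(To find that sweet spot faster, you can sweep the denoising strength and compare the outputs side by side; a small sketch, with the same assumptions and placeholders as the previous one:)

import base64, requests

URL = "http://127.0.0.1:7860"

with open("downscaled_128x192.png", "rb") as f:
    init_b64 = base64.b64encode(f.read()).decode()

base_payload = {
    "init_images": [init_b64],
    "prompt": "A photorealistic painting of a blond woman in armor ...",   # shortened
    "negative_prompt": "easynegative, bad-hands-5, nsfw, nude, big breasts",
    "steps": 30, "sampler_name": "Euler a", "cfg_scale": 7,
    "seed": 1094900797, "width": 512, "height": 768,
}

# try several denoising strengths and save each result for comparison
for strength in (0.4, 0.5, 0.6, 0.75):
    payload = dict(base_payload, denoising_strength=strength)
    resp = requests.post(f"{URL}/sdapi/v1/img2img", json=payload)
    with open(f"denoise_{strength}.png", "wb") as f:
        f.write(base64.b64decode(resp.json()["images"][0]))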

Just for reference, this is how close I can get with the same denoising strength of 0.5, with the DPM++ SDE sampling method:

dpm++sde

It looked pretty close to me, with just this data.

(back to Euler_a)

I am still quite far from the original picture. So what else can I use to add information to the generation of the final picture?

The only panel that seems useful in my version of Automatic1111 is the ControlNet tab.
A useful ControlNet is one that is suitable for the picture you want to enhance.

Example: if you can guess the lines of a building in the picture, MLSD may be useful.
If there is a character, OpenPose can provide some help.
ControlNets are used to add information that guides the conversion of the initial picture into the resulting picture.
Our temporary result is what you get when you just play with the denoising strength.

without controlnets
This is how far we got, so far (this is the same picture as before).

For the moment, I am not happy with the hair, and the stained glass in the background and the candles are a bit different.
Intuitively, I would add a layer of lineart, but let's look at the intermediate states of the ControlNets:

Here you have a good reference for the controlnets

Canny:

canny

Depth:

depth anything

Depth anything

depth midas

Depth Midas

depth leres

Depth Leres

depth leres ++

Depth Leres ++

Normalmap:

normal bae

Normal Bae

normal midas

Normal Midas

Openpose: it is used to set the face and the posture of the character; it is not very useful here.

openpose

Dw openpose

Mlsd: I hoped that it could catch the stained glass in the background.

mlsd
No luck

Lineart: lineart standard seems to be the most suitable to keep the general layout

lineart standard

So far, this is the only intermediate picture that shows her braided hair. Usually I don't really need to understand what a ControlNet model does internally: if its intermediate picture keeps the features that are important to me, that is enough.
Of all the pictures displayed here, this is the one thing that you really have to remember: the ControlNet has to capture meaningful data.
From here, I wouldn't need to test anything else, because for the moment it is the only model that preserved the property that I need. This doesn't mean that it will be interpreted correctly.
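(If you want to preview several of these annotator pictures in batch instead of clicking through the UI, the ControlNet extension also exposes a detect endpoint. The field and module names below are how I understand the sd-webui-controlnet API; check the /docs page of your own instance, and list the exact module names with GET /controlnet/module_list:)

import base64, requests

URL = "http://127.0.0.1:7860"

with open("downscaled_128x192.png", "rb") as f:    # placeholder file name
    img_b64 = base64.b64encode(f.read()).decode()

# a few annotators to compare; the exact names come from /controlnet/module_list
for module in ("canny", "lineart_standard", "softedge_hed", "mlsd", "openpose"):
    resp = requests.post(f"{URL}/controlnet/detect", json={
        "controlnet_module": module,
        "controlnet_input_images": [img_b64],
    })
    for i, out in enumerate(resp.json()["images"]):
        with open(f"preview_{module}_{i}.png", "wb") as f:
            f.write(base64.b64decode(out))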

But since this is a tutorial, let's continue to explore the other controlnet models.

lineart anime

Lineart anime: less accurate for the hair.

lineart coarse

Lineart coarse: seems interesting for the shape of the armor, but it is redundant with lineart standard, which has more information.

lineart realistic

And this is lineart realistic, that has probably enough information.

Softedge:

softedge hed

Softedge HED

Softedge HED safe

Softedge HED safe

Softedge Pidinet

Softedge Pidinet

Softedge Pidisafe

Softedge Pidisafe

softedge teed

Softedge teed


Some options seem usable. Softedge is a less accurate version of lineart. Sometimes, it is better to be less accurate.

Scribble/sketch

This is even more basic than softedge; it is not very useful here.

scribble hed

Scribble HED

Scribble pidinet

Scribble pidinet

scribble t2ia sketch midi

Scribble T2ia sketch Midi

scribble xdog

Scribble xdog

Segmentation: categorizes objects into categories. Generated objects will be from the same category.
Good for composition.

seg anime face

Seg anime face

seg ufade20k

Seg ufade20k

seg of coco

Seg of coco


Shuffle: the generation is always different; it seems to add some chaos to the picture and generates a new picture from it
(the same kind of image, but different).

shuffle

Shuffle

Tile:
Tile doesn't seem to pay attention to details. It uses the overall structure of the picture and redoes the details. This is the most useful ControlNet if you are restoring a pixelated picture.
Since the structure of the original picture has been well preserved here, that is not what we need it for. If you use the blur preprocessor, you can adjust the intensity of the blur. If you feed it a blurry picture directly (without a preprocessor), it can probably work with that.
This is what I am going to do.

tile blur

Blur: adjusted blur

The results of the other tile preprocessors are versions of the picture itself.


Inpaint: I don't have anything to inpaint.

Instructp2p: this is used to change some parts of the picture by giving instructions. I don't think this is the place to do that.

Reference: I don't know what it is.
Recolor: seems to be color related, so off topic here.
Revision: I don't know what it is.
Ip-adapter: generally uses a reference image to influence a picture. This is not for here.
Instant id: I was unable to make it work.

Yes, there are many that I don't know; not all of them work, and (for the moment) I can't find anything about them.


So, from all these preprocessed pictures and from the descriptions of the ControlNets, for me the best mixture is:

Lineart standard + Tile/blur with no preprocessor

And this is often going to be the best mix; however, it may need to be adapted.
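(In API terms, this mixture corresponds to two ControlNet units passed through alwayson_scripts. Again a sketch: the exact unit fields depend on your sd-webui-controlnet version, and the two model names below are placeholders for the SD 1.5 lineart and tile ControlNet models you actually have installed:)

import base64, requests

URL = "http://127.0.0.1:7860"

with open("downscaled_128x192.png", "rb") as f:
    init_b64 = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_b64],
    "prompt": "A photorealistic painting of a blond woman in armor ...",   # shortened
    "negative_prompt": "easynegative, bad-hands-5, nsfw, nude, big breasts",
    "steps": 30, "sampler_name": "Euler a", "cfg_scale": 7,
    "denoising_strength": 0.5, "seed": 1094900797,
    "width": 512, "height": 768,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {   # unit 1: lineart standard preprocessor + lineart model
                    "image": init_b64,
                    "module": "lineart_standard",
                    "model": "control_v11p_sd15_lineart",   # placeholder model name
                    "weight": 1.0,
                },
                {   # unit 2: tile model fed the blurry picture directly, no preprocessor
                    "image": init_b64,
                    "module": "none",
                    "model": "control_v11f1e_sd15_tile",    # placeholder model name
                    "weight": 1.0,
                },
            ]
        }
    },
}

resp = requests.post(f"{URL}/sdapi/v1/img2img", json=payload)
with open("lineart_plus_tile.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))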


Below is the result and it is as good as it can be.
(With the optimal models and the known seeds)


reconstructed picture original picture
Rebuilt picture Original picture



All of this from this starting picture:

starting picture

At the end, if I compare the starting picture with the resulting picture, I can see that the color of the armor is different. Since the armor has more than one color, this is hard to specify in the prompt.
Otherwise, the pictures feel a bit different, but overall the result can be convincing enough.
It doesn't look perfect, but mostly, everything is there.
I consider this result to be close to optimal; I wouldn't try to make it look better, because there is nothing more that I can do here. The color of the armor can easily be corrected in another way.

Third thing that you have to do to try to restore a picture: add more information (for example with ControlNet) to enhance the resulting picture.

Now let's see if the seed makes a difference:

 

Seed 1 seed 2 seed 3
Seed 1 Seed 2 Seed 3



What you should see is that the seed doesn't change the result much. If we know what we are doing, this is expected.

Now let's try with another model.


The Seed will now be ignored.

With Analogmadnessv70, I get a decent result.


generation with analog madness

I am still happy with the result.


perfectWorld_v6Baked, below

Perfect world v6



MistoonAnime, below: for this one, you clearly see the anime effect

Misstoon

This is with westernAnimation_v1, below:

Western animation


Finally, let's switch back to Dreamshaper 8: can we use other ControlNets?


First, I'll keep the tile/blur ControlNet, because it is very suitable for a blurry picture.
However, if you try to use the lineart model on a pixelated picture, you will notice that this ControlNet tends to keep the pixels instead of reconstructing the picture. In this case, if you feel that your resulting
picture should follow the layout of the original picture a bit less closely (you want to get rid of the noise), you can use a less accurate ControlNet, like canny, scribble, softedge or t2ia_sketch.

So all the pictures below are done with the tile/blur ControlNet, with or without another ControlNet.

Tile/blur alone:

tile blur alone


As you can see, the hair is now different and the face of the woman is less similar than it was. The picture has fewer details than with lineart.


I will use scribble_hed because it seems to be the scribble variant with the most information:

Scribble controlnet hed


The correction does not seem sufficient.
Here, the armor looks a bit better.


Here is with Canny:

canny plus tile

So with canny, I was able to keep the hairstyle of the woman and also her face; however, the picture looks a bit like a painting and seems less accurate.

Softedge teed seems to have some accuracy, according to the results of the preprocessors:

softedge teed bis

It is colorful, but obviously less accurate than lineart (note that mixing softedge with lineart, with the default settings, doesn't add any accuracy to the picture: I can't have the color of softedge and the accuracy of lineart just like that).

And finally, according to the result of the preprocessor, I could also use t2ia_sketch:

t2ia_sketch

It feels a bit more blurry, but it is elegant.


Can I use smaller pictures with lineart and tile/blur?


Again, let's go back to our best ControlNet settings: Tile/blur + Lineart

How small can we go before we can't recover the starting picture anymore?

From a picture that has been divided by 8:

size divided by 8

With a target resolution of 512x768

I get this:

lineart plus tile by 8

Besides the face, which doesn't look that blurry for some reason, everything else is blurrier than before. (I could adjust the denoising strength, but to get a better picture, you have to reduce the weight of lineart.)

From this previously generated picture, I do a second generation (target resolution: 1024x1536), with canny and tile/blur as ControlNets. This is as close as I can get:

resampling 2 with canny and tile


The armor starts to look different, but I am not so unhappy with this result.
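(This two-pass idea, first up to 512x768 and then feeding that result back in at 1024x1536, is easy to chain in a script; a compact sketch with the ControlNet units left out for brevity and placeholder file names:)

import base64, requests

URL = "http://127.0.0.1:7860"

def img2img_pass(in_path, out_path, width, height, denoising_strength):
    # one img2img pass through the webui API (assuming --api); ControlNet units omitted here
    with open(in_path, "rb") as f:
        init_b64 = base64.b64encode(f.read()).decode()
    payload = {
        "init_images": [init_b64],
        "prompt": "A photorealistic painting of a blond woman in armor ...",   # shortened
        "negative_prompt": "easynegative, bad-hands-5, nsfw, nude, big breasts",
        "steps": 30, "sampler_name": "Euler a", "cfg_scale": 7,
        "denoising_strength": denoising_strength,
        "width": width, "height": height,
    }
    resp = requests.post(f"{URL}/sdapi/v1/img2img", json=payload)
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(resp.json()["images"][0]))

# pass 1: from the tiny picture up to 512x768; pass 2: refine and double the size again
img2img_pass("divided_by_8.png", "pass1_512x768.png", 512, 768, 0.5)
img2img_pass("pass1_512x768.png", "pass2_1024x1536.png", 1024, 1536, 0.5)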

From a picture that has been divided by 16:

divided by 16

Here it becomes difficult to get something that looks like the original picture, and my results with lineart are too blurry.

To generate the picture below, I used tile/blur alone:

tile alone

Of course, here I can't really use the details of the original picture as a starting point to draw the fingers, and the "bad-hands-5" in the negative prompt doesn't work well at this level.