
Support Inpainting #105


Closed
10undertiber opened this issue Dec 5, 2023 · 17 comments · Fixed by #511

Comments

@10undertiber

10undertiber commented Dec 5, 2023

It would be great to add input parameters to the current sd CLI that specify an input image and a mask file for inpainting. For example:

./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely dog" --image ../input/alovelybench.png --mask ../input/alovelybench.mask.png

The input image:
[image: alovelybench]

The input mask:
[image: alovelybench mask]

The output:
[image: alovelybench output]

Here are some references:

  1. https://stable-diffusion-art.com/inpainting_basics
  2. https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint
@FSSRepo
Contributor

FSSRepo commented Dec 5, 2023

@leejet I believe that is done by adding noise only to the white part of the latent image and, in the decoder, keeping the pixels of the black part unchanged. However, the image-to-image mode is also having quality issues: the images come out overly smoothed, blurred, and distorted. Even with a strength setting of 0.05, the final image bears little resemblance to the original.
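The masking idea described above can be sketched in a few lines. This is an illustrative plain-Python sketch of the latent blending, not sd.cpp code: a mask value of 1.0 (white) means "repaint", 0.0 (black) means "keep the original".

```python
# Illustrative sketch (plain Python, toy 1-D "latent"): blend a denoised
# latent with the source latent so only the masked region is repainted.
def blend(orig, denoised, mask):
    """Per element: mask 1.0 = repaint (white), 0.0 = keep original (black)."""
    return [m * d + (1.0 - m) * o for o, d, m in zip(orig, denoised, mask)]

# Keep the first half of the "image", repaint the second half.
orig     = [0.2, 0.4, 0.6, 0.8]   # latent of the source image
denoised = [9.0, 9.0, 9.0, 9.0]   # latent produced by the sampler
mask     = [0.0, 0.0, 1.0, 1.0]

print(blend(orig, denoised, mask))  # → [0.2, 0.4, 9.0, 9.0]
```

In a real sampler this blend is applied at every denoising step (with the kept region re-noised to the current timestep), not just once at the end.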

@leejet
Owner

leejet commented Dec 6, 2023

> @leejet I believe that is done by adding noise only to the white part of the latent image and, in the decoder, keeping the pixels of the black part unchanged. However, the image-to-image mode is also having quality issues: the images come out overly smoothed, blurred, and distorted. Even with a strength setting of 0.05, the final image bears little resemblance to the original.

It seems that img2img has some issues and the results are inconsistent with sd-webui.

@FSSRepo
Contributor

FSSRepo commented Dec 6, 2023

@leejet I think we should first solve that problem before considering adding the inpainting feature.

Inpainting models require a latent image with 9 input channels: 4 for the usual channels, 4 more for the latent noise with the applied mask, and 1 for the mask. There may also be a need for a slight modification in the autoencoder, but I will continue researching.
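The 9-channel layout can be sketched as a simple channel concatenation. This is a plain-Python sketch with lists standing in for channel tensors; the channel order below follows the diffusers convention (noisy latent, mask, masked-image latent), which is an assumption here, not something this thread pins down.

```python
# Sketch: assemble the 9-channel UNet input an SD inpainting model expects.
# Lists stand in for per-channel latent tensors; order assumed per diffusers.
def build_inpaint_input(noisy_latent, mask, masked_latent):
    assert len(noisy_latent) == 4    # the usual 4 latent channels
    assert len(mask) == 1            # 1 downscaled mask channel
    assert len(masked_latent) == 4   # 4 channels of the VAE-encoded masked image
    return noisy_latent + mask + masked_latent  # concat along the channel axis

channels = build_inpaint_input(["z0", "z1", "z2", "z3"],
                               ["m"],
                               ["y0", "y1", "y2", "y3"])
print(len(channels))  # → 9
```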

@leejet
Owner

leejet commented Dec 6, 2023

> @leejet I think we should first solve that problem before considering adding the inpainting feature.
>
> Inpainting models require a latent image with 9 input channels: 4 for the usual channels, 4 more for the latent noise with the applied mask, and 1 for the mask. There may also be a need for a slight modification in the autoencoder, but I will continue researching.

Yes, the first step is to support the inpaint model. We can determine whether the currently loaded weights belong to an inpaint model from the shape of the weights.

@Amin456789

Hope someone makes a nice GUI for it, as inpainting will be much easier when you can paint the mask through a GUI. Not to mention that C++ with a GUI would be great.

@Amin456789

@FSSRepo I see you're working on a webui (can't wait for it). If possible, please add outpainting as well; it would be great to have. Also, a question: I use Dark Reader in my browser. Will Dark Reader make your webui's background dark too? Working with a white background at night is very hard, in my opinion.

@programmbauer

Would it be possible to implement a simple form of inpainting where the user specifies a rectangular region via command-line parameters, and only that region of the input image is changed? For example, the user could pass four integers (x, y, height, width) defining the top-left corner and the dimensions of the rectangular region.
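This rectangle-based variant could reuse the regular mask path: the four integers just need to be expanded into a white-on-black mask image. A plain-Python sketch (one value per pixel, 255 = repaint; the parameter order here is an illustrative choice):

```python
# Sketch: turn a rectangle given as CLI integers into the mask image the
# full inpainting path would consume (255 = repaint, 0 = keep).
def rect_mask(img_w, img_h, x, y, w, h):
    return [[255 if x <= px < x + w and y <= py < y + h else 0
             for px in range(img_w)]
            for py in range(img_h)]

mask = rect_mask(8, 8, 2, 2, 4, 3)  # 4x3 rectangle at (2, 2) in an 8x8 image
print(sum(v == 255 for row in mask for v in row))  # → 12
```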

@balisujohn

@leejet I'm pretty interested in fixing img2img and adding inpainting; do you have any pointers as to why it's currently not matching stable diffusion webui?

@msglm

msglm commented Dec 1, 2024

Does this project take bug bounties? I am willing to put money down to make this happen.

@Amin456789

@aagdev could you please implement inpainting here? I saw you added it to your project mlimgsynth. Thank you.

@stduhpf
Contributor

stduhpf commented Dec 4, 2024

I think I got it. PR incoming soon-ish

@stduhpf
Contributor

stduhpf commented Dec 5, 2024

@msglm If you were serious about the bounty, lmk 😅

@msglm

msglm commented Dec 6, 2024

> @msglm If you were serious about the bounty, lmk 😅

Yeah, I'm looking to put some money down to make it happen, main concerns are:

  • Inpainting works with SDXL models (my primary example being Pony Diffusion)
  • The system works with no artifacting, absurd slowdowns, or other such jank on both ROCm and Vulkan
  • The interface just requires passing in an image of some kind (in code, it should be easy to integrate into GUIs, so projects like https://github.com/fszontagh/sd.cpp.gui.wx can take advantage of it)
  • In general, it should feel like the CLI-style Python-based implementations of Stable Diffusion
  • The project should not need any major changes, so it can still compile on GNU Guix (this means no pre-generating anything to get it to work, like with the Vulkan backend)

There's probably some other stuff that may come up during development, but if it works on my machine I'm willing to pay for it. I usually pay in XMR, but I could do BTC or some other payment method. Alternatively, some kind of issue-based bug-bounty arrangement, where issues get funding that's released upon completion, would work.

Long-term, my goal is an easy-to-compile, single-binary application that makes working with Stable Diffusion a good workflow. This is a step toward that.

@stduhpf
Contributor

stduhpf commented Dec 6, 2024

  • It works with SDXL models
  • It works with Vulkan (I can't get ROCm to work at all with my old GPU); other backends should work too
  • Performance is OK; the only noticeable slowdown comes from the repeated VAE encoding (once for the base image, and once again for the masked image)
  • The interface is simple enough, I think (same as img2img mode, with the --mask argument pointing to the image mask path)
  • No extra dependencies or weird code patterns, so it should work on any system that already supports sdcpp

I don't have any crypto wallet though 😔

@Green-Sky
Contributor

@stduhpf A wallet costs nothing. 😏

@stduhpf
Contributor

stduhpf commented Dec 7, 2024

> @stduhpf A wallet costs nothing. 😏

Good point.

@stduhpf
Contributor

stduhpf commented Dec 30, 2024

@msglm
Now that it's merged....
42CcDxbASzWQe5hAryPffZZtwWVmSm1oqdSKwa87hTENBWf1dwHUWLD6wQ1pKtz2ejC3oqZBrwXyzQNzRBmnC9kV6VH9F92
[image]
(Don't feel obligated)
