Instruct-Pix2pix support #679


Draft: stduhpf wants to merge 7 commits into master

Conversation

stduhpf (Contributor) commented May 15, 2025

ref: #61

sd.exe -M img2img --model instruct-pix2pix-00-22000.safetensors -p "turn him into a cyborg" --color --strength 1 -i .\example.jpg --steps 50 --cfg-scale 7.5 --guidance 1.2 --sampling-method euler_a

(input and output example images)

sd.exe -M img2img --model instruct-pix2pix-00-22000.safetensors -p "Make it a cat" --strength 1 -i input.png --steps 100 --cfg-scale 7.5 --guidance 1.5 --sampling-method euler_a --schedule karras

(input and output example images)

TODOs:

  • Classifier-free guidance (CFG) for two conditionings (see the sketch after this list)
  • Fix UX (it's probably best not to reuse the distilled guidance parameter for something completely different like image conditioning)
  • Check if the implementation is correct
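For reference, the InstructPix2Pix paper combines three noise predictions (fully unconditional, image-conditioned only, and image+text-conditioned) with two guidance scales. Below is a minimal sketch of that combination over plain float buffers; the function and parameter names are illustrative and not this repo's actual API.

```cpp
// Sketch of the two-conditioning CFG combination from the InstructPix2Pix paper:
//   eps = eps(z, 0, 0)
//       + s_img  * (eps(z, c_img, 0)     - eps(z, 0, 0))
//       + s_txt  * (eps(z, c_img, c_txt) - eps(z, c_img, 0))
#include <cstddef>

void combine_ip2p_cfg(const float* eps_uncond,  // eps(z_t, no image cond, no text cond)
                      const float* eps_img,     // eps(z_t, image cond,    no text cond)
                      const float* eps_full,    // eps(z_t, image cond,    text cond)
                      float*       eps_out,
                      size_t       n,
                      float        cfg_text,    // text guidance scale  (e.g. --cfg-scale 7.5)
                      float        cfg_image) { // image guidance scale (e.g. --guidance 1.5)
    for (size_t i = 0; i < n; i++) {
        eps_out[i] = eps_uncond[i]
                   + cfg_image * (eps_img[i]  - eps_uncond[i])
                   + cfg_text  * (eps_full[i] - eps_img[i]);
    }
}
```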

rmatif commented May 15, 2025

Awesome! Could you please take a look at cosxl-edit as well? It acts as an ip2p model, if I understood correctly. I think we're just missing the EDM VPred schedule.

stduhpf (Contributor, Author) commented May 15, 2025

> Awesome! Could you please take a look at cosxl-edit as well? It acts as an ip2p model, if I understood correctly. I think we're just missing the EDM VPred schedule.

I may take a look at it later.

stduhpf (Contributor, Author) commented May 16, 2025

For some reason, the "image CFG" (controlled by the --guidance flag for now) needs to be very high (>10) to get anything resembling the input image. This behavior does not match the Hugging Face demo or the example on their GitHub. I can't figure out what I'm doing wrong.

stduhpf (Contributor, Author) commented May 16, 2025

Ah, I think I found the issue. By default, the init latent is sampled from the VAE distribution, but pix2pix expects the mean of the distribution.
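For context, here is a sketch of the difference, assuming the encoder outputs a per-element mean and log-variance of a diagonal Gaussian (illustrative names, not this repo's actual VAE code): regular img2img samples z = mean + exp(0.5 * logvar) * eps, while the InstructPix2Pix image conditioning should take the mean directly.

```cpp
#include <cstddef>
#include <cmath>
#include <random>

void encode_image_latent(const float* mean, const float* logvar,
                         float* z, size_t n, bool sample_posterior,
                         std::mt19937& rng) {
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    for (size_t i = 0; i < n; i++) {
        if (sample_posterior) {
            // regular img2img init latent: z = mean + sigma * eps, sigma = exp(0.5 * logvar)
            z[i] = mean[i] + std::exp(0.5f * logvar[i]) * gauss(rng);
        } else {
            // InstructPix2Pix image conditioning: use the mean of the distribution
            z[i] = mean[i];
        }
    }
}
```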

stduhpf (Contributor, Author) commented May 16, 2025

I'm pretty sure it's working properly now. I think inpainting might be slightly improved too, especially when strength is set below 1 and with higher CFG.

stduhpf (Contributor, Author) commented May 16, 2025

> Awesome! Could you please take a look at cosxl-edit as well? It acts as an ip2p model, if I understood correctly. I think we're just missing the EDM VPred schedule.

These might also be interesting, and they may be even easier to implement:
https://huggingface.co/diffusers/sdxl-instructpix2pix-768
https://huggingface.co/CaptainZZZ/sd3-instructpix2pix/tree/main

Edit: The SDXL one was pretty easy. Now I can't figure out how to easily convert SD3.x models from the diffusers format to the original format, so I can't test whether it would work...
