Instruct-Pix2pix support #679

Status: Open (12 commits to merge into base branch master)


stduhpf (Contributor) commented May 15, 2025

ref: #61

sd.exe -M img2img --model instruct-pix2pix-00-22000.safetensors -p "turn him into a cyborg" --color --strength 1 -i .\example.jpg --steps 50 --cfg-scale 7.5 --img-cfg-scale 1.2 --sampling-method euler_a

[Images: input and output for the command above]

sd.exe -M img2img --model instruct-pix2pix-00-22000.safetensors -p "Make it a cat" --strength 1 -i input.png --steps 100 --cfg-scale 7.5 --img-cfg-scale 1.5 --sampling-method euler_a --schedule karras

[Images: input and output for the command above]

TODOs:

  • Classifier-free guidance (CFG) for two conditionings
  • Fix UX (probably best not to reuse distilled guidance for something completely different like image conditioning)
  • Check if implementation is correct
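
For context on the first TODO item, here is a minimal sketch of classifier-free guidance over two conditionings, following the combination rule described in the InstructPix2Pix paper. It is not the code from this PR: the function and parameter names are hypothetical, latents are flattened to plain lists for brevity, and the mapping of `txt_scale` to `--cfg-scale` and `img_scale` to `--img-cfg-scale` is an assumption based on the commands above.

```python
from typing import List, Sequence


def dual_cfg(
    eps_uncond: Sequence[float],  # prediction with neither image nor text conditioning
    eps_img: Sequence[float],     # prediction with image conditioning only
    eps_full: Sequence[float],    # prediction with both image and text conditioning
    txt_scale: float,             # assumed to correspond to --cfg-scale
    img_scale: float,             # assumed to correspond to --img-cfg-scale
) -> List[float]:
    """Combine three noise predictions using separate text and image scales."""
    if not (len(eps_uncond) == len(eps_img) == len(eps_full)):
        raise ValueError("all predictions must have the same length")
    return [
        u + img_scale * (i - u) + txt_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_img, eps_full)
    ]
```

With both scales set to 1 the result reduces to the fully conditioned prediction, which is a quick sanity check for any implementation of this rule. Each sampling step needs three model evaluations instead of the usual two.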

(rebased on top of #683 for CosXL edit support)

rmatif commented May 15, 2025

Awesome! Could you please take a look at cosxl-edit as well? It acts as an ip2p, if I understood correctly. I think we're just missing the EDM VPred schedule
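
As background for the "EDM VPred" remark, the following is a rough sketch of how a v-prediction output is usually turned into a denoised estimate when noise is parameterized by a continuous sigma (x = x0 + sigma * noise). The function name is hypothetical, and `sigma_data = 1.0` is an assumption; the thread does not confirm which constants CosXL or this PR actually use.

```python
import math
from typing import List, Sequence


def edm_vpred_denoise(
    x: Sequence[float],       # noisy latent, x = x0 + sigma * noise
    v: Sequence[float],       # model output interpreted as a v-prediction
    sigma: float,             # current noise level
    sigma_data: float = 1.0,  # assumed data scale, not confirmed by the thread
) -> List[float]:
    """Return the denoised estimate x0 from a v-prediction at noise level sigma."""
    denom = sigma * sigma + sigma_data * sigma_data
    c_skip = sigma_data * sigma_data / denom          # weight of the noisy input
    c_out = -sigma * sigma_data / math.sqrt(denom)    # weight of the model output
    return [c_skip * xi + c_out * vi for xi, vi in zip(x, v)]
```

At sigma = 0 the input is returned unchanged, and as sigma grows the estimate relies almost entirely on the model output. In this parameterization the model input is typically also scaled by 1 / sqrt(sigma^2 + sigma_data^2) before the forward pass.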

stduhpf (Contributor, Author) commented May 15, 2025

> Awesome! Could you please take a look at cosxl-edit as well? It acts as an ip2p, if I understood correctly. I think we're just missing the EDM VPred schedule

I may take a look at it later.

stduhpf force-pushed the ip2p branch 2 times, most recently from 1e25a9b to 75af1bd (May 16, 2025 01:48)

stduhpf (Contributor, Author) commented May 16, 2025

For some reason, the "image CFG" (controlled by the --guidance flag for now) needs to be very high (>10) to get anything resembling the input image. This behavior does not match the HuggingFace demo or the example on their GitHub. I can't figure out what I'm doing wrong.

stduhpf (Contributor, Author) commented May 16, 2025

Ah, I think I found the issue. By default, the model samples the VAE distribution, but pix2pix expects the mean of the distribution.
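
To illustrate the distinction, here is a small sketch of the two ways a latent can be obtained from a VAE encoder that outputs a per-element mean and log-variance. The function name and the `sample` switch are hypothetical and do not reflect the actual stable-diffusion.cpp code; latents are flattened to plain lists for brevity.

```python
import math
import random
from typing import List, Optional, Sequence


def latent_from_moments(
    mean: Sequence[float],
    logvar: Sequence[float],
    sample: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return either a random draw from N(mean, exp(logvar)) or just the mean."""
    if not sample:
        # Deterministic latent: what an image-conditioned model such as
        # pix2pix expects, according to the comment above.
        return list(mean)
    rng = rng or random.Random()
    # Reparameterized draw: mean + std * standard normal noise.
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, logvar)
    ]
```

Sampling adds noise to the conditioning latent on every encode, so a model that was conditioned on the mean sees a slightly perturbed reference image. That may explain why a very high image guidance scale was needed before the fix.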

stduhpf (Contributor, Author) commented May 16, 2025

I'm pretty sure it's working properly now. I think inpaint might be slightly improved too, especially when strength is set to <1 and with higher CFG.

stduhpf (Contributor, Author) commented May 16, 2025

> Awesome! Could you please take a look at cosxl-edit as well? It acts as an ip2p, if I understood correctly. I think we're just missing the EDM VPred schedule

These ones might also be interesting, and they may be even easier to implement:
https://huggingface.co/diffusers/sdxl-instructpix2pix-768
https://huggingface.co/CaptainZZZ/sd3-instructpix2pix/tree/main

Edit: The SDXL one was pretty easy. However, I can't figure out how to easily convert sd3.x models from diffusers format to the original format, so I can't test whether it would work...

stduhpf (Contributor, Author) commented May 23, 2025

CosXL edit is now working properly.

stduhpf marked this pull request as ready for review (May 25, 2025 22:20)
stduhpf added a commit to stduhpf/stable-diffusion.cpp that referenced this pull request May 26, 2025
cosxl: smol cleanup

CosXL: fix schedule choice

Rename EDMVDenoiser

Avoid inf for EDMVDenoiser + discrete schedule

make parametrization flags public

Fix CosXL with empty negative prompts

Instruct-p2p support

support 2 conditionings cfg

Do not re-encode the exact same image twice

pix2pix: fixes for 2-cfg

Fix pix2pix latent inputs + improve inpainting a bit + fix naming

prepare for other pix2pix-like models

Support sdxl ip2p

CoxXL edit: fix reference image embeddings

Support 2-cond cfg properly in cli

fix typo in help
stduhpf added a commit to stduhpf/stable-diffusion.cpp that referenced this pull request May 26, 2025
cosxl: smol cleanup

CosXL: fix schedule choice

Rename EDMVDenoiser

Avoid inf for EDMVDenoiser + discrete schedule

make parametrization flags public

Fix CosXL with empty negative prompts

Instruct-p2p support

support 2 conditionings cfg

Do not re-encode the exact same image twice

pix2pix: fixes for 2-cfg

Fix pix2pix latent inputs + improve inpainting a bit + fix naming

prepare for other pix2pix-like models

Support sdxl ip2p

CoxXL edit: fix reference image embeddings

Support 2-cond cfg properly in cli

fix typo in help

Support masks for ip2p models