
support sd3 fp8 safetensors #329


Open
taotaow opened this issue Aug 5, 2024 · 8 comments

Comments

@taotaow

taotaow commented Aug 5, 2024

[ERROR] stable-diffusion.cpp:173 - init model loader from file failed: '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'

The safetensors file is from:
https://huggingface.co/adamo1139/stable-diffusion-3-medium-ungated/

@rhjdvsgsgks

The full error is:

[DEBUG] model.cpp:807  - init from 'sd3_medium_incl_clips_t5xxlfp8.safetensors'
[ERROR] model.cpp:873  - unsupported dtype 'F8_E4M3'
[ERROR] stable-diffusion.cpp:182  - init model loader from file failed: 'sd3_medium_incl_clips_t5xxlfp8.safetensors'

@red-scorp

+1 to this topic. I'm seeing more and more FP8 models, especially since the introduction of the huge FLUX models. I hope SD.cpp will make it possible to run new models more comfortably on old hardware.

@SkutteOleg
Contributor

#359 seems to have added this functionality?

@Green-Sky
Contributor

I totally forgot to check open issues, so I guess others had this issue before me :).
I did not test f8 for t5xxl specifically, but it should work. Keep in mind that it upcasts to f16, so you should convert the model down to something like q8_0.

@Green-Sky
Contributor

It got merged into master, go ahead and try it.

@taotaow
Author

taotaow commented Aug 28, 2024

@Green-Sky thank you, it works.
[DEBUG] stable-diffusion.cpp:169 - Using CPU backend
[INFO ] stable-diffusion.cpp:184 - loading model from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
[INFO ] model.cpp:789 - load .\sd3_medium_incl_clips_t5xxlfp8.safetensors using safetensors format
[DEBUG] model.cpp:857 - init from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
[INFO ] stable-diffusion.cpp:224 - Version: SD3 2B
[INFO ] stable-diffusion.cpp:255 - Weight type: f16
[INFO ] stable-diffusion.cpp:256 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:257 - Diffsuion model weight type: f16
[INFO ] stable-diffusion.cpp:258 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:260 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1029 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1029 - clip params backend buffer size = 1329.29 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1029 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1029 - mmdit params backend buffer size = 4114.77 MB(RAM) (491 tensors)
[DEBUG] ggml_extend.hpp:1029 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:387 - loading weights
[DEBUG] model.cpp:1526 - loading tensors from .\sd3_medium_incl_clips_t5xxlfp8.safetensors
[INFO ] model.cpp:1681 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file
[INFO ] stable-diffusion.cpp:486 - total params memory size = 14857.47MB (VRAM 0.00MB, RAM 14857.47MB): clip 10648.13MB(RAM), unet 4114.77MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:490 - loading model from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors' completed, taking 129.76s
[INFO ] stable-diffusion.cpp:504 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:552 - finished loaded file
[DEBUG] stable-diffusion.cpp:1358 - txt2img 1024x1024
[DEBUG] stable-diffusion.cpp:1107 - prompt after extract and remove lora: "a lovely cat holding a sign says "Stable diffusion 3""
[INFO ] stable-diffusion.cpp:635 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1112 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:687 - parse 'a lovely cat holding a sign says "Stable diffusion 3"' to [['a lovely cat holding a sign says "Stable diffusion 3"', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:930 - computing condition graph completed, taking 33220 ms
[DEBUG] conditioner.hpp:687 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:930 - computing condition graph completed, taking 9685 ms
[INFO ] stable-diffusion.cpp:1236 - get_learned_condition completed, taking 42985 ms
[INFO ] stable-diffusion.cpp:1259 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1263 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:980 - mmdit compute buffer size: 1784.58 MB(RAM)
|==================================================| 20/20 - 240.22s/it
[INFO ] stable-diffusion.cpp:1295 - sampling completed, taking 4743.67s
[INFO ] stable-diffusion.cpp:1303 - generating 1 latent images completed, taking 4744.97s
[INFO ] stable-diffusion.cpp:1306 - decoding 1 latents
[DEBUG] ggml_extend.hpp:980 - vae compute buffer size: 6656.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:967 - computing vae [mode: DECODE] graph completed, taking 122.87s
[INFO ] stable-diffusion.cpp:1316 - latent 1 decoded, taking 122.87s
[INFO ] stable-diffusion.cpp:1320 - decode_first_stage completed, taking 122.87s
[INFO ] stable-diffusion.cpp:1429 - txt2img completed in 4910.89s
save result image to 'output.png'

@taotaow
Author

taotaow commented Aug 28, 2024

[INFO ] model.cpp:1681 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file

Maybe this unknown tensor is not being handled correctly.

@Green-Sky
Contributor

[INFO ] model.cpp:1681 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file

maybe this unknown tensor is not perfect

I have seen this with flux too, but it works regardless.

Keep in mind that it upconverts f8_e4m3 to f16 in place, so you might want to experiment with converting it to q8_0: https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/quantization_and_gguf.md


5 participants