
Chroma support (pruned Flux model) #696


Open
wants to merge 17 commits into master

Conversation

stduhpf
Contributor

@stduhpf stduhpf commented May 28, 2025

https://huggingface.co/lodestones/Chroma

Chroma is a Flux model with the modulation layers pruned off, which gives it a smaller memory footprint. Unlike Flux, it doesn't use Clip-L, only t5-xxl.

Usage

.\build\bin\Release\sd.exe --diffusion-model .\models\diffusion-models\chroma-unlocked-v33-Q5_0.gguf --t5xxl .\models\t5\t5xxl_q4_k.gguf --vae .\models\vae\flux\ae.f16.gguf -p "A Cute cat holding a sign that says: \`"Stable diffusion.cpp Now supports Chroma!\`"" --cfg-scale 4 --sampling-method euler --vae-tiling -W 1024 -H 1024

output

Advanced usage

The following environment variables can be set to change the behavior:

  • SD_CHROMA_USE_T5_MASK (defaults to "ON")
  • SD_CHROMA_USE_DIT_MASK (defaults to "ON")
  • SD_CHROMA_MASK_PAD_OVERRIDE (defaults to 1)
  • SD_CHROMA_ENABLE_GUIDANCE (defaults to "OFF"; setting it to "ON" without the --guidance 0 arg seems to break inference)
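
For reference, a minimal sketch of how such an ON/OFF toggle could be read at runtime (hypothetical helper, not necessarily the code in this PR):

#include <cstdlib>
#include <cstring>

// Returns true if the variable is set to "ON", false if it is set to anything else,
// and the given default when it is unset.
static bool sd_env_flag(const char* name, bool default_on) {
    const char* v = std::getenv(name);
    if (v == nullptr) {
        return default_on;
    }
    return std::strcmp(v, "ON") == 0;
}

// e.g. bool use_t5_mask = sd_env_flag("SD_CHROMA_USE_T5_MASK", true);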

(closes #690)

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Huh, it seems to kind of work (slowly) on the CPU backend... The image is not good, but at least it kind of looks like the prompt if you squint your eyes. Maybe there's an issue with the Vulkan implementation of GGML, or there's something I'm doing that breaks when using the GPU?

Can someone test with Cuda?

prompt:

'Extreme close-up photograph of a single tiger eye, direct frontal view. The iris is very detailed and the pupil resembling a dark void. The word "Chroma" is across the lower portion of the image in large white stylized letters, with brush strokes 
resembling those made with Japanese calligraphy. Each strand of the thick fur is highly detailed and distinguishable. Natural lighting to capture authentic eye shine and depth.'

(--cfg-scale 1 --sampling-method euler --vae-tiling --steps 16 --guidance 0)
output
(same settings produce a black image on Vulkan)

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Still not working on Vulkan, but at least you don't have to squint to see the CPU result:
output

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Ok, running the Vulkan build with preview on, I can say there's something very wrong going on. Sometimes (very rarely, I got this like twice in a hundred tests) the output looks correct after the first step, then it turns to noise, then a full black image (Probably NaN/inf). About half of the time it looks like noise from the first step already and then turns black after a few steps. The rest of the time it starts off with a black image and stays like that. It's extremely inconsistent.

Edit: it seems inconsistent on CPU too, but it works more often

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Yippie I finally got a non-black image with Vulkan
output

@Green-Sky
Contributor

Green-Sky commented May 30, 2025

Ran it on cuda.
output
(different prompt, but I don't think that's important 😄)

edit: worth noting that I do have d20f77f

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

@Green-Sky Is it the same broken image every time you run it with the same settings, or is it inconsistent too?

@rmatif

rmatif commented May 30, 2025

@stduhpf Ran it on CUDA too and I got this. It's inconsistent too; I ran it 10 times and I got the same thing each time.

output_chroma

@Green-Sky
Contributor

It slightly varies.

trun2
trun1
test_run

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Yeah that's odd. Why would it vary? It's supposed to be deterministic

@Green-Sky
Contributor

Green-Sky commented May 30, 2025

I tried compiling it with ubsan and asan, but

[DEBUG] stable-diffusion.cpp:165  - Using CUDA backend
ggml_cuda_init: failed to initialize CUDA: out of memory
ggml_backend_cuda_init: invalid device 0
[DEBUG] stable-diffusion.cpp:188  - Using CPU backend

and that's the first thing that happens. Looks like an upstream issue. Also, we should update ggml <.<

Oh, and using the CPU backend, it crashes with ERROR: AddressSanitizer: heap-use-after-free

Details
==84706==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d000006e80 at pc 0x7f4358071892 bp 0x7ffc2a93a850 sp 0x7ffc2a93a010
READ of size 1376 at 0x61d000006e80 thread T0
    #0 0x7f4358071891 in __interceptor_memcpy (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0x71891)
    #1 0x8217f7 in memcpy /nix/store/81awch8mhqanda1vy0c09bflgra4cxh0-glibc-2.40-66-dev/include/bits/string_fortified.h:29
    #2 0x8217f7 in ggml_backend_cpu_buffer_set_tensor /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml/src/ggml-backend.cpp:1877
    #3 0x821fda in ggml_backend_tensor_set /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml/src/ggml-backend.cpp:266
    #4 0x48eef9 in GGMLRunner::cpy_data_to_backend_tensor() /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml_extend.hpp:1138
    #5 0x48efa6 in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml_extend.hpp:1239
    #6 0x48f62d in Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/flux.hpp:1115
    #7 0x4987b6 in FluxModel::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, int, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, float, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/diffusion_model.hpp:178
    #8 0x49ab92 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:883
    #9 0x49b309 in ggml_tensor* std::__invoke_impl<ggml_tensor*, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*, float, int>(std::__invoke_other, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*&&, float&&, int&&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
    #10 0x49b309 in std::enable_if<is_invocable_r_v<ggml_tensor*, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*, float, int>, ggml_tensor*>::type std::__invoke_r<ggml_tensor*, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*, float, int>(StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*&&, float&&, int&&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:114
    #11 0x49b309 in std::_Function_handler<ggml_tensor* (ggml_tensor*, float, int), StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}>::_M_invoke(std::_Any_data const&, ggml_tensor*&&, float&&, int&&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:290
    #12 0x4801fa in std::function<ggml_tensor* (ggml_tensor*, float, int)>::operator()(ggml_tensor*, float, int) const /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:591
    #13 0x4c36ff in sample_k_diffusion /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/denoiser.hpp:543
    #14 0x4c63df in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:994
    #15 0x478088 in generate_image(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, float, float, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:1454
    #16 0x478948 in txt2img /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:1601
    #17 0x4242aa in main /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/examples/cli/main.cpp:948
    #18 0x7f432f23227d in __libc_start_call_main (/nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/libc.so.6+0x2a27d) (BuildId: ff927b1b82bf859074854af941360cb428b4c739)
    #19 0x7f432f232338 in __libc_start_main_alias_1 (/nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/libc.so.6+0x2a338) (BuildId: ff927b1b82bf859074854af941360cb428b4c739)
    #20 0x40d324 in _start (/nix/store/7kva3sp08b4pl8ll40lchlnnqr061nqn-stable-diffusion.cpp/bin/sd+0x40d324)

0x61d000006e80 is located 0 bytes inside of 2048-byte region [0x61d000006e80,0x61d000007680)
freed by thread T0 here:
    #0 0x7f43580dda88 in operator delete(void*, unsigned long) (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0xdda88)
    #1 0x4496fb in std::__new_allocator<float>::deallocate(float*, unsigned long) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/new_allocator.h:172
    #2 0x4e2d34 in Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}::operator()() const /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/flux.hpp:1112
    #3 0x4e2da7 in ggml_cgraph* std::__invoke_impl<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(std::__invoke_other, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
    #4 0x4e2da7 in std::enable_if<is_invocable_r_v<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>, ggml_cgraph*>::type std::__invoke_r<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:114
    #5 0x4e2da7 in std::_Function_handler<ggml_cgraph* (), Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:290

previously allocated by thread T0 here:
    #0 0x7f43580dcb88 in operator new(unsigned long) (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0xdcb88)
    #1 0x44bb97 in std::__new_allocator<float>::allocate(unsigned long, void const*) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/new_allocator.h:151
    #2 0x4e2d34 in Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}::operator()() const /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/flux.hpp:1112
    #3 0x4e2da7 in ggml_cgraph* std::__invoke_impl<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(std::__invoke_other, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
    #4 0x4e2da7 in std::enable_if<is_invocable_r_v<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>, ggml_cgraph*>::type std::__invoke_r<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:114
    #5 0x4e2da7 in std::_Function_handler<ggml_cgraph* (), Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:290

SUMMARY: AddressSanitizer: heap-use-after-free (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0x71891) in __interceptor_memcpy
Shadow bytes around the buggy address:
  0x61d000006c00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000006c80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61d000006d00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61d000006d80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61d000006e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x61d000006e80:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000006f00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000006f80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000007000: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000007080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000007100: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==84706==ABORTING

@Green-Sky
Contributor

Output with ubsan (no asan)

trun3

Since this looks like the Vulkan output, I am guessing the issue is one and the same.

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Thank you for trying that @Green-Sky . I believe it's working now.

Vulkan backend, 16 steps with cfg (cfg_scale =4), so 32 forward passes without anything breaking:
output

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Ok, it's very important to keep the distilled guidance scale at 0. For some reason the model still accepts it as an input, but it completely breaks apart if it's not zero (I double-checked with ComfyUI; it's not an issue with my code). Maybe I should just force it to zero for Chroma to keep things simple?
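
A minimal sketch of what forcing it to zero could look like (hypothetical function, not the actual flux.hpp code; the SD_CHROMA_ENABLE_GUIDANCE escape hatch matches the env variable listed above):

#include <cstdlib>
#include <cstring>

// If the model is Chroma, ignore the user-supplied distilled guidance unless the
// SD_CHROMA_ENABLE_GUIDANCE override is explicitly set to "ON".
static float effective_guidance(float requested, bool is_chroma) {
    const char* v = std::getenv("SD_CHROMA_ENABLE_GUIDANCE");
    bool enabled = (v != nullptr) && std::strcmp(v, "ON") == 0;
    return (is_chroma && !enabled) ? 0.0f : requested;
}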

@Green-Sky
Contributor

Ok, it's very important to keep the distilled guidance scale at 0. For some reason the model still accepts it as an input, but it completely breaks apart if it's not zero (I double-checked with ComfyUI; it's not an issue with my code). Maybe I should just force it to zero for Chroma to keep things simple?

Go for it. Down the line we should put recommended/forced values into the gguf file.

@Green-Sky
Contributor

Green-Sky commented May 30, 2025

chroma1

The q4_k is hurting it somewhat, as expected.

--sampling-method euler --steps 16 --guidance 0 --cfg-scale 4 --diffusion-fa -W 1024 -H 1024

[DEBUG] ggml_extend.hpp:1174 - flux params backend buffer size =  4824.80 MB(VRAM) (643 tensors)
[DEBUG] ggml_extend.hpp:1126 - flux compute buffer size: 854.00 MB(VRAM)
[INFO ] stable-diffusion.cpp:1628 - txt2img completed in 208.26s

edit: I used v33

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

The q4_k is hurting it somewhat, as expected.

Keeping the distilled_guidance_layer weights at high precision seems to help a lot. (For example, https://huggingface.co/silveroxides/Chroma-GGUF uses BF16 for distilled_guidance_layer, txt_in, img_in and final_layer, so compatibility with some GGML backends might be an issue; I had to update GGML to get bf16 to work on Vulkan.)

v32:

silveroxides q4_0 (+bf16): output
full q4_0 (requantized): output

Which begs the question: should we consider making the convert tool "smarter", like it is in llama.cpp, with different quant types depending on the role of each tensor?
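
A rough sketch of what per-tensor type selection could look like in the convert tool (hypothetical function; the tensor-name substrings follow the silveroxides example above):

#include <string>
#include "ggml.h"

// Keep precision-sensitive tensors at higher precision and quantize everything else.
static ggml_type pick_tensor_type(const std::string& name, ggml_type default_type) {
    if (name.find("distilled_guidance_layer") != std::string::npos ||
        name.find("txt_in") != std::string::npos ||
        name.find("img_in") != std::string::npos ||
        name.find("final_layer") != std::string::npos) {
        return GGML_TYPE_BF16;
    }
    return default_type;
}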

@Green-Sky
Contributor

Which begs the question: should we consider making the convert tool "smarter", like it is in llama.cpp, with different quant types depending on the role of each tensor?

We should.

I find this model with current sd.cpp quantization incredibly hard to prompt too, but that is probably the token padding / masking.

chroma_apls-q5_k

This is supposed to be q5_k quality; normal flux looks way better, even flux light looks better.

@kgigitdev

Hi @stduhpf and @Green-Sky ,

Apologies for the question, but is this branch intended to work solely with vanilla master and master's version of ggml? I've been maintaining a little script for personal use that merges specific branches from here and there to get a more up-to-date build, and no matter what I do I can't get this branch to work, either on its own or with other branches.

Here's the extract from my script that shows the current state of what I have generally been merging in (with a few of them commented out, as you can see, but I've included them anyway since they do work in some other combinations):

BRANCHES="zhouwg/sync_with_latest_ggml"
BRANCHES="${BRANCHES} wbruna/fix_sdxl_lora"
BRANCHES="${BRANCHES} stduhpf/sdxl-embd"
BRANCHES="${BRANCHES} stduhpf/tiled-vae-encode"
BRANCHES="${BRANCHES} stduhpf/imatrix"
BRANCHES="${BRANCHES} stduhpf/lcppT5"
BRANCHES="${BRANCHES} stduhpf/unchained"
BRANCHES="${BRANCHES} stduhpf/dt"
# BRANCHES="${BRANCHES} stduhpf/diffusers"
BRANCHES="${BRANCHES} stduhpf/override-te"
BRANCHES="${BRANCHES} stduhpf/concat-controls"
# BRANCHES="${BRANCHES} stduhpf/ip2p"
BRANCHES="${BRANCHES} ImKyra/master"
# BRANCHES="${BRANCHES} Green-Sky/large_file_hardening"
# BRANCHES="${BRANCHES} rmatif/sigmas"

Now, I already suspected that this branch would probably not work directly in my script, since you said, "I had to update GGML to get bf16 to work on vulkan", and that probably conflicts with zhouwg/sync_with_latest_ggml. But in general, it has been harder and harder to get a working build with all the latest features and fixes in place.

Now, Chroma being a Flux derivative, I should also mention that for a while now I've not been able to get Flux to work either; the problems are generally of the form:

A gajillion of these:

[ERROR] model.cpp:1938 - tensor 'first_stage_model.decoder.conv_in.bias' not in model file

or a gajillion of these (with the --diffusion-model argument):

[INFO ] model.cpp:1897 - unknown tensor 'model.diffusion_model.model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale | f8_e4m3 | 1 [128, 1, 1, 1, 1]' in model file

(the doubled prefix probably indicating that something has automatically prefixed model.diffusion_model. where one already existed).

followed by:

[ERROR] stable-diffusion.cpp:441  - load tensors from model loader failed

From looking at ModelLoader::get_sd_version(), it looks like there's a ton of heuristics to determine the model type based on the tensor names, but that's all probably doomed to fail anyway if the tensor loading has previously failed. Or, I'm missing a branch somewhere that adds new entries to enum SDVersion.

And that's the point where I give up, since I don't know enough about the expected naming of the tensors.

Which also brings us to the elephant in the room that you're all too polite to talk about :-) : vis-a-vis #686 , given that I'm probably not the only one with the above problems, I'm sure that nobody would take offence if a temporary (friendly, and prominently attributed) fork were created to contain suitably-approved merged and conflict-resolved branches from all the developers who have pending PRs, as well as lots of useful work that is currently being duplicated across lots of forks.

@Green-Sky
Contributor

@kgigitdev I feel you. For my use case I depend on the webserver api.
Can you open a separate issue for this? It seems to be a general Flux problem, like you said.

Now, I already suspected that this branch would probably not work directly in my script, since you said, "I had to update GGML to get bf16 to work on vulkan"

This does not seem to be in this PR.

Which also brings us to the elephant in the room that you're all too polite to talk about :-) : vis-a-vis #686 , given that I'm probably not the only one with the above problems, I'm sure that nobody would take offence if a temporary (friendly, and prominently attributed) fork were created to contain suitably-approved merged and conflict-resolved branches from all the developers who have pending PRs, as well as lots of useful work that is currently being duplicated across lots of forks.

If we ended up doing this, I would ask @ggerganov to host that project at the ggml org.
But let's wait a while longer and see what @leejet ends up doing.

@kgigitdev

Hi @Green-Sky ,

Thanks for your swift answer.

If we ended up doing this, I would ask @ggerganov to host that project at the ggml org.

Gosh, yes, I hadn't even thought of that option. I think I had subconsciously assumed that stable-diffusion.cpp was only ever on the periphery of the llama.cpp people, because llama == serious work whereas stable diffusion == frippery and frivolity.

@stduhpf
Contributor Author

stduhpf commented Jun 1, 2025

A gajillion of these:

[ERROR] model.cpp:1938 - tensor 'first_stage_model.decoder.conv_in.bias' not in model file

That probably means the VAE is missing, or that the tensors from the VAE can't be found because their names are not the ones expected.

or a gajillion of these (with the --diffusion-model argument):

[INFO ] model.cpp:1897 - unknown tensor 'model.diffusion_model.model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale | f8_e4m3 | 1 [128, 1, 1, 1, 1]' in model file

(the doubled prefix probably indicating that something has automatically prefixed model.diffusion_model. where one already existed).

Yes, the doubled prefix means the tensor names are already prefixed in the model file you're using, which means you should use --model instead of --diffusion-model. This often happens with versions of Flux with built-in vae, in which case that would also fix the other issue.

@stduhpf
Contributor Author

stduhpf commented Jun 1, 2025

With attention masking: output copy 6
Without: output copy 7

Somehow the results I get with the attention masking look a bit worse than what I had before without it. Maybe once I implement the mask modification to attend some padding tokens, it will fix itself? For now, if you just want to generate better pictures, use an earlier commit.
The compute buffer size is also significantly increased, so I can no longer generate 1024x1024 with Vulkan (it reaches the allocation limit).

Edit: I fixed the compute buffer size issue, and after trying different prompts I'm not sure which is the best actually. Maybe using masking is not so bad after all.

@wbruna

wbruna commented Jun 2, 2025

Which begs the question: should we consider making the convert tool "smarter", like it is in llama.cpp, with different quant types depending on the role of each tensor?

FWIW, I've been working on an option for sd.cpp to choose the quant by tensor pattern, a la llama.cpp's overridetensors: e128cfa . The conversion itself already works; could be useful for testing.

@stduhpf
Contributor Author

stduhpf commented Jun 3, 2025

@Green-Sky In t5.hpp, line 438, I used -HUGE_VALF, which evaluates to -(float)INFINITY.
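
In other words, masked positions get an additive bias of negative infinity before the softmax, so they end up with zero attention weight. A tiny illustrative sketch (not the actual t5.hpp code):

#include <cmath>

// Additive attention bias: 0 for tokens that may be attended to, -inf for masked ones.
static float attn_mask_bias(bool attend) {
    return attend ? 0.0f : -HUGE_VALF; // -HUGE_VALF is negative float infinity
}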

@Green-Sky
Contributor

chroma_modify_mask_to_attend_padding() gets called multiple times, ending up unmasking more and more.

e.g. two-step sampling:

[INFO ] stable-diffusion.cpp:1413 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1450 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:819  - Sample
[DEBUG] flux.hpp:1108 - Forcing guidance to 0 for chroma model (SD_CHROMA_ENABLE_GUIDANCE env variable to "ON" to enable)
[DEBUG] flux.hpp:725  - PAD: 1
[DEBUG] flux.hpp:734  - MASKED: 492
[DEBUG] ggml_extend.hpp:1138 - flux compute buffer size: 324.34 MB(RAM)
[DEBUG] flux.hpp:1108 - Forcing guidance to 0 for chroma model (SD_CHROMA_ENABLE_GUIDANCE env variable to "ON" to enable)
[DEBUG] flux.hpp:725  - PAD: 1
[DEBUG] flux.hpp:734  - MASKED: 491
  |=========================>                        | 1/2 - 124.44s/it[DEBUG] flux.hpp:1108 - Forcing guidance to 0 for chroma model (SD_CHROMA_ENABLE_GUIDANCE env variable to "ON" to enable)
[DEBUG] flux.hpp:725  - PAD: 1
[DEBUG] flux.hpp:734  - MASKED: 490
  |==================================================| 2/2 - 124.99s/it
[INFO ] stable-diffusion.cpp:1489 - sampling completed, taking 249.50s
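
The log above suggests the pad-attend step keeps modifying an already-modified mask. A sketch of the fix idea (hypothetical signature, not the actual code): always derive the per-step mask from a pristine copy of the original T5 mask instead of mutating it in place.

#include <vector>

// Build the mask from the original each time, so repeated calls cannot accumulate
// extra unmasked tokens across sampling steps. Convention here: 1 = attend, 0 = masked.
static std::vector<float> mask_with_padding(const std::vector<float>& original_mask, int pad_tokens) {
    std::vector<float> mask = original_mask; // copy; never touch the cached original
    int unmasked = 0;
    for (size_t i = 0; i < mask.size() && unmasked < pad_tokens; i++) {
        if (mask[i] == 0.0f) {
            mask[i] = 1.0f; // allow attending to this padding token
            unmasked++;
        }
    }
    return mask;
}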

@stduhpf
Contributor Author

stduhpf commented Jun 4, 2025

chroma_modify_mask_to_attend_padding() gets called multiple times, ending up unmasking more and more.

Nice catch!

@Amin456789

@stduhpf Could you please release this in your repo for now?
I really want to try it; I need an AVX2-compiled exe.
Thank you!

@Amin456789

@LostRuins Please add this to your great app too.

@LostRuins
Contributor

Indeed I will, once it's merged here. stduhpf is doing very fine work.

@stduhpf
Contributor Author

stduhpf commented Jun 4, 2025

@Amin456789 Done.

@LostRuins
Contributor

LostRuins commented Jun 8, 2025

Well it works, sort of.
I dunno if it's because I am using lousy settings or a poor quant.

cat holding sign that says EVIL CAT

Prompt is cat holding sign that says EVIL CAT
This is chroma v35 at q4_0. Euler at 20 steps, cfg 5.0. 768x768
The cat's face looks really mashed.

Edit: Also, I had to double this to get it to work at higher resolutions

params.mem_size = static_cast<size_t>(10 * 1024 * 1024); // 10 MB
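
That is, presumably something along these lines (hypothetical, just showing the doubled size):

params.mem_size = static_cast<size_t>(20 * 1024 * 1024); // 20 MB, doubled from the original 10 MB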

@Green-Sky
Contributor

@LostRuins try adding a simple negative prompt.

@stduhpf
Contributor Author

stduhpf commented Jun 8, 2025

@LostRuins try adding a simple negative prompt.

Ah maybe my fix to avoid including more and more padding at each diffusion step messed up the empty negative prompts. I can't check it today.

@LostRuins
Contributor

Hmm it is possible. I use the exact same seed and settings, but with the negative prompt ugly
parse 'ugly' to [['ugly', 1], ]

this is the result
777800-cat holding sign that says EVIL CAT ### ugly

@LostRuins
Contributor

I will be happy to test any fix you have on the same prompt and seed again.

@stduhpf
Contributor Author

stduhpf commented Jun 8, 2025

I will be happy to test any fix you have on the same prompt and seed again.

You could try with the penultimate commit (efaa137). The padding is technically incorrect with this one, but it doesn't seem to matter in practice.

@LostRuins
Contributor

777800-cat holding sign that says EVIL CAT

777800-cat holding sign that says EVIL CAT ### ugly

@stduhpf @Green-Sky Significant improvement for the case with no negative prompt

I used the same seeds for both cases. Top one has no negative prompt, bottom one does (same as earlier test).

Oddly, the one with negative prompt has changed too, it seems a bit worse now? Could just be variance though.

@stduhpf
Contributor Author

stduhpf commented Jun 8, 2025

Oddly, the one with negative prompt has changed too, it seems a bit worse now? Could just be variance though.

It could be because of the incorrect padding.

@LostRuins
Contributor

I see. I will save my settings and try again to compare after it's fixed then. Will provide a good benchmark. Thanks!

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

@LostRuins Did you quantize v35 to q4_0 and convert to GGUF yourself or did you use someone else's quantized model?

(I'm trying to reproduce your issue. On my machine, on the latest commit, this is what it looks like using silveroxides' v29 q4_0, with the other settings the same as yours, without a negative prompt.)
output
That doesn't look as broken to me. So maybe it's a problem with the model? I need to try with v35, but I'd like to know which one to download.

@LostRuins
Contributor

LostRuins commented Jun 9, 2025

I used a preconverted model. You can try it here: https://huggingface.co/silveroxides/Chroma-GGUF/blob/main/chroma-unlocked-v35/chroma-unlocked-v35-Q4_0.gguf

T5 from t5xxl_fp8_e4m3fn.safetensors from Flux Dev
No Clip-L was used.
ae.safetensors VAE from Flux Dev
No LoRAs were used.

Generation params:
Prompt: cat holding sign that says "EVIL CAT"
Seed: 777800
Steps: 20
Sampler: Euler
Width: 768
Height: 768
CFG Scale: 5
Clip-Skip: 0 (which becomes default 2 by sd.cpp logic)
NO negative prompt (important)

The images in both cases are reproducible by me with the same settings.

@LostRuins
Contributor

LostRuins commented Jun 9, 2025

But actually, even for your example it does look a bit mashed too. Look at the ears; compare them if you revert the newest commit on the same seed.

image

Compare your cat ears with mine above

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

sd.exe --diffusion-model models/chroma-unlocked-v35-Q4_0.gguf --t5xxl models/t5xxl_q4_k.gguf --vae models/ae.f16.gguf -p "cat holding sign that says EVIL CAT" --cfg-scale 5 --sampling-method euler -W 768 -H 768 -s 777800
output

I'm using t5xxl_q4_k.gguf though, maybe that could be the cause of the difference? Or maybe you're using flash attention?

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

sd.exe --diffusion-model models/chroma-unlocked-v35-Q4_0.gguf --t5xxl models/t5xxl_fp8_e4m3fn.safetensors --vae models/ae.f16.gguf -p "cat holding sign that says EVIL CAT" --cfg-scale 5 --sampling-method euler -W 768 -H 768 -s 777800
output

Ok, that's a bigger difference than I expected from just choosing a different T5 quant. But it's still not looking too broken on my end...

@LostRuins
Contributor

LostRuins commented Jun 9, 2025

um, the paws in your last image are completely mashed. This is not flux quality heheh... Unless chroma itself is broken, but even SD1.5 can generate better paws than that 😂

image

Can you try the last option, but first revert your stduhpf@e22c57c , keep the same seed and everything else the same, and compare the result? How does it look?

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

Can you try the last option, but first revert your stduhpf@e22c57c , keep the same seed and everything else the same, and compare the result? How does it look?

output

um, the paws in your last image are completely mashed. This is not flux quality heheh... Unless chroma itself is broken, but even Sd1.5 can generate better paws that that 😂
image

This could just be an effect of the quantization to be honest. q4_0 is not the best for quality.

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

With chroma-unlocked-v35-Q8_0.gguf (and t5xxl fp8)
output

@LostRuins
Contributor

Haha well I dunno... maybe someone else can chip in. Could indeed just be the model or quant maybe?

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

@LostRuins Are you using flash attention or not?

Because after looking quickly into it, it seems like flash attention only works with either causal attention masks or no mask at all, and in the case of Chroma, the mask is weirdly shaped.
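
If that is the case, the safe behavior is presumably to skip flash attention whenever a non-causal custom mask is in play; a sketch of that check (hypothetical, not the actual sd.cpp logic):

// Flash attention is only known to handle a causal mask or no mask at all, so fall back
// to regular attention when a custom mask (e.g. Chroma's T5 padding mask) is present.
static bool can_use_flash_attention(bool has_custom_mask, bool mask_is_causal) {
    return !has_custom_mask || mask_is_causal;
}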

@LostRuins
Contributor

That requires SD_USE_FLASH_ATTENTION to be manually defined, right? If so, I am not using it.

@stduhpf
Copy link
Contributor Author

stduhpf commented Jun 9, 2025

Then I'm clueless.

@LostRuins
Contributor

Alright, well we'll go with whatever solution you think works best. I'm just rather surprised at how wonky the chroma outputs are. Maybe it is just the model. But I think you'd agree that these are a big step down from Flux subjectively speaking.

Development

Successfully merging this pull request may close these issues.

Support other fine tuned flux models and gguf versions