
Error loading SDXL model #552

Open
@kashimAstro

Description


Hi guys, thanks for the great work.

I recently downloaded the latest master from git to try out the new features like inpainting.

I noticed that when I compile with CUDA or Vulkan and try to load SDXL models I get a segmentation fault.

I think it all happens around model.cpp; when I compile with the CPU backend this error does not appear.

Attached is the verbose output for the CUDA backend:

./build.cuda/bin/sd -m /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors --vae /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors -p 'a lovely cat' --vae-on-cpu -v

Option:
n_threads: 6
mode: txt2img
model_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:163 - Using CUDA backend
[INFO ] stable-diffusion.cpp:195 - loading model from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:230 - loading vae from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:280 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4900.07 MiB on device 0: cudaMalloc failed: out of memory
[ERROR] ggml_extend.hpp:1101 - unet alloc params backend buffer failed, num_tensors = 1680
[INFO ] stable-diffusion.cpp:354 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1107 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:417 - loading weights
[DEBUG] model.cpp:1698 - loading tensors from /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
|=============> | 713/2641 - 11.36it/s
Segmentation fault (core dumped)

Attached is the verbose output for the Vulkan backend:

./build.vulkan/bin/sd -m /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors --vae /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors -p 'a lovely cat' --vae-on-cpu -v
Option:
n_threads: 6
mode: txt2img
model_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:172 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce GTX 1070 Ti (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
ggml_vulkan: Compiling shaders..............................Done!
[INFO ] stable-diffusion.cpp:195 - loading model from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:230 - loading vae from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:280 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
ggml_vulkan: Device memory allocation of size 847096320 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:1101 - unet alloc params backend buffer failed, num_tensors = 1680
[INFO ] stable-diffusion.cpp:354 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1107 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:417 - loading weights
[DEBUG] model.cpp:1698 - loading tensors from /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
|=============> | 713/2641 - 71.43it/s
Segmentation fault (core dumped)

I ran sd.cpp under gdb to try to trace the error, but I'm not sure if I'm looking in the right place.
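
For reference, this is roughly how I ran it under gdb (same arguments as the plain run above):

gdb --args ./build.cuda/bin/sd -m /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors --vae /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors -p 'a lovely cat' --vae-on-cpu -v
(gdb) run
(gdb) where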

gdb cuda:

Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x0000555555940af2 in ggml_fp16_to_fp32_row ()
(gdb) where
#0 0x0000555555940af2 in ggml_fp16_to_fp32_row ()
#1 0x00005555555eb689 in ModelLoader::load_tensors(std::function<bool (TensorStorage const&, ggml_tensor**)>, ggml_backend*) ()
#2 0x00005555555ec8b9 in ModelLoader::load_tensors(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, ggml_tensor*> > >&, ggml_backend*, std::set<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >)
()
#3 0x00005555556bf935 in StableDiffusionGGML::load_from_file(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool, ggml_type, schedule_t, bool, bool, bool, bool) ()
#4 0x0000555555627ccc in new_sd_ctx ()
#5 0x00005555555736bc in main ()

gdb vulkan:

Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x00005555557a81d5 in ggml_backend_tensor_set ()
(gdb) where
#0 0x00005555557a81d5 in ggml_backend_tensor_set ()
#1 0x00005555555f4593 in ModelLoader::load_tensors(std::function<bool (TensorStorage const&, ggml_tensor**)>, ggml_backend*) ()
#2 0x00005555555f4d59 in ModelLoader::load_tensors(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, ggml_tensor*> > >&, ggml_backend*, std::set<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >)
()
#3 0x00005555556c7d35 in StableDiffusionGGML::load_from_file(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool, ggml_type, schedule_t, bool, bool, bool, bool) ()
#4 0x000055555563015c in new_sd_ctx ()
#5 0x00005555555991cc in main ()

In the CUDA case, following gdb's hint, the crash seems to happen in convert_tensor in model.cpp:

https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L735

while for Vulkan I end up here:
https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L1822
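
My reading (and I may be wrong) is that the earlier out-of-memory error on the UNet buffer is not treated as fatal, so load_tensors() keeps going and writes into tensors that never got a backing buffer, which is exactly where both backtraces end up. A minimal sketch of the kind of guard I mean, assuming a failed allocation leaves the tensor's buffer field NULL (hypothetical helper, not the project's actual code):

#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical guard: refuse to write into a tensor whose backend buffer
// was never allocated (e.g. because cudaMalloc / vkAllocateMemory failed).
static bool safe_set_tensor(struct ggml_tensor * dst, const void * src, size_t nbytes) {
    if (dst == NULL || dst->buffer == NULL) {
        return false; // allocation failed upstream, nothing to copy into
    }
    if (nbytes != ggml_nbytes(dst)) {
        return false; // size mismatch between file tensor and graph tensor
    }
    ggml_backend_tensor_set(dst, src, 0, nbytes);
    return true;
}

If something like this (or a hard abort right after the failed buffer allocation) triggers, it would at least confirm that the segfault is just a consequence of the out-of-memory error rather than a separate bug in the SDXL loading path.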

I'm not sure how to debug this further right now. The verbose output makes me think it is a loading error:

CUDA:
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4900.07 MiB on device 0: cudaMalloc failed: out of memory

Vulkan:
ggml_vulkan: Device memory allocation of size 847096320 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
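
Doing the arithmetic on the buffers above (a rough estimate on my side): 469.44 MiB + 2649.92 MiB for the two CLIP encoders plus the 4900.07 MiB f16 UNet comes to roughly 8019 MiB, which does not fit in the 8 GB of my GTX 1070 Ti once driver/desktop usage is subtracted. So running out of memory looks expected here; what surprises me is the segfault instead of a clean error. In case it helps, keeping CLIP on the CPU and/or quantizing the weights should reduce the VRAM footprint, something along these lines (untested on my side):

./build.cuda/bin/sd -m /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors --vae /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors -p 'a lovely cat' --vae-on-cpu --clip-on-cpu --type q8_0 -v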

Any ideas on how to investigate this?

Thanks, Dario
