Description
Hi guys, thanks for the great work.
I recently pulled the latest master from git to try out the new features, such as inpainting.
I noticed that when I build with either the CUDA or the Vulkan backend and try to load an SDXL model, I get a segmentation fault.
I think the problem is somewhere in model.cpp.
When I build with the CPU backend, the error does not occur.
Verbose output, CUDA backend:
Option:
n_threads: 6
mode: txt2img
model_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:163 - Using CUDA backend
[INFO ] stable-diffusion.cpp:195 - loading model from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:230 - loading vae from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:280 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4900.07 MiB on device 0: cudaMalloc failed: out of memory
[ERROR] ggml_extend.hpp:1101 - unet alloc params backend buffer failed, num_tensors = 1680
[INFO ] stable-diffusion.cpp:354 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1107 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:417 - loading weights
[DEBUG] model.cpp:1698 - loading tensors from /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
|=============> | 713/2641 - 11.36it/s
Segmentation fault (core dumped)
Verbose output, Vulkan backend:
./build.vulkan/bin/sd -m /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors --vae /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors -p 'a lovely cat' --vae-on-cpu -v
Option:
n_threads: 6
mode: txt2img
model_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:172 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce GTX 1070 Ti (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
ggml_vulkan: Compiling shaders..............................Done!
[INFO ] stable-diffusion.cpp:195 - loading model from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:230 - loading vae from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:280 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
ggml_vulkan: Device memory allocation of size 847096320 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:1101 - unet alloc params backend buffer failed, num_tensors = 1680
[INFO ] stable-diffusion.cpp:354 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1107 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:417 - loading weights
[DEBUG] model.cpp:1698 - loading tensors from /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
|=============> | 713/2641 - 71.43it/s
Segmentation fault (core dumped)
I ran sd.cpp under gdb to try to trace the crash, but I'm not sure the frame it reports is where the actual problem is.
gdb, CUDA backend:
Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x0000555555940af2 in ggml_fp16_to_fp32_row ()
(gdb) where
#0 0x0000555555940af2 in ggml_fp16_to_fp32_row ()
#1 0x00005555555eb689 in ModelLoader::load_tensors(std::function<bool (TensorStorage const&, ggml_tensor**)>, ggml_backend*) ()
#2 0x00005555555ec8b9 in ModelLoader::load_tensors(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ggml_tensor*> > >&, ggml_backend*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) ()
#3 0x00005555556bf935 in StableDiffusionGGML::load_from_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, ggml_type, schedule_t, bool, bool, bool, bool) ()
#4 0x0000555555627ccc in new_sd_ctx ()
#5 0x00005555555736bc in main ()
gdb, Vulkan backend:
Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x00005555557a81d5 in ggml_backend_tensor_set ()
(gdb) where
#0 0x00005555557a81d5 in ggml_backend_tensor_set ()
#1 0x00005555555f4593 in ModelLoader::load_tensors(std::function<bool (TensorStorage const&, ggml_tensor**)>, ggml_backend*) ()
#2 0x00005555555f4d59 in ModelLoader::load_tensors(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ggml_tensor*> > >&, ggml_backend*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) ()
#3 0x00005555556c7d35 in StableDiffusionGGML::load_from_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, ggml_type, schedule_t, bool, bool, bool, bool) ()
#4 0x000055555563015c in new_sd_ctx ()
#5 0x00005555555991cc in main ()
In the CUDA case, following the gdb backtrace, the crash seems to be in convert_tensor in model.cpp:
https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L735
while for Vulkan I end up here:
https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L1822
I'm not sure how to debug this further.
The verbose output makes me think it is a loading error: both backends report a failed device memory allocation just before the crash.
CUDA:
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4900.07 MiB on device 0: cudaMalloc failed: out of memory
Vulkan:
ggml_vulkan: Device memory allocation of size 847096320 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
Any ideas on how to investigate further?
Thanks,
Dario