
Bug in convert mode "ggml-quants.c:3929: fatal error" #689


Open
Disonantemus opened this issue May 25, 2025 · 3 comments

Comments

@Disonantemus

Summary

  • Converting a model to iq3_s or iq3_xxs fails with a fatal error and aborts.

Error

sd -M convert -m realDream_sdxl6.safetensors --type iq3_s
[INFO ] model.cpp:908  - load realDream_sdxl6.safetensors using safetensors format
[INFO ] model.cpp:1985 - model tensors mem size: 2183.06MB
  |=>                                                | 55/2641 - 0.00it/sOops: found point 103 not on grid: 103 0 0 0
/usr/src/debug/stable-diffusion.cpp-vulkan-git/stable-diffusion.cpp/ggml/src/ggml-quants.c:3929: fatal error
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
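For context, the abort comes from ggml's IQ3 quantizers: each group of quantized values is snapped onto a fixed codebook grid, and if a point (even after a neighbour search) is still not on the grid, ggml prints the "Oops: found point ... not on grid" message and aborts. Below is a minimal sketch of that failure path, with names, the lookup table, and the packing scheme all simplified; it is not the actual ggml-quants.c code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* kmap maps a packed point to its index in the fixed IQ3 codebook
 * grid; a negative entry means "this point is not on the grid".
 * Here it is a tiny dummy table; ggml builds it from its real iq3
 * grid tables. */
#define KMAP_SIZE 256

static void quantize_point(const int16_t * kmap, const uint8_t L[4]) {
    /* Pack the 4 quantized components into one lookup key
     * (packing simplified; ggml uses a different layout). */
    uint16_t u = (uint16_t)(L[0] | (L[1] << 2) | (L[2] << 4) | (L[3] << 6));
    if (kmap[u % KMAP_SIZE] < 0) {
        /* In ggml a neighbour search first tries to snap the point
         * back onto the grid; when that also fails, it aborts,
         * producing the kind of log shown above. Note that 103 is
         * far outside the small value range the grid expects. */
        printf("Oops: found point %u not on grid:", (unsigned) u);
        for (int i = 0; i < 4; ++i) printf(" %d", L[i]);
        printf("\n");
        abort(); /* -> "ggml-quants.c:3929: fatal error" */
    }
    /* ... otherwise emit the grid index into the quantized block ... */
}

int main(void) {
    int16_t kmap[KMAP_SIZE];
    for (int i = 0; i < KMAP_SIZE; ++i) kmap[i] = -1; /* nothing on grid */
    const uint8_t L[4] = {103, 0, 0, 0};              /* point from the log */
    quantize_point(kmap, L);
    return 0;
}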

Command to test quants:

sd -M convert -m realDream_sdxl6.safetensors --type q4_0

Test model: realDream_sdxl6 ( SDXL | F16 | 6.46GB )

Conversion speed per quant type (almost all of them):

quant     model tensors mem size    it/s
tq1_0     1565.20 MB                14.49
tq2_0     1697.60 MB                13.51
q2_K      1896.20 MB                 4.52
iq3_xxs   2050.66 MB                fails (fatal error)
iq3_s     2183.06 MB                fails (fatal error)
q3_K      2183.06 MB                 8.26
iq4_xs    2469.92 MB                 1.61
iq4_nl    2479.52 MB                 1.78
q4_0      2479.52 MB                10.00
q4_K      2558.19 MB                 5.46
q4_1      2659.47 MB                 5.56
q5_0      2839.42 MB                 9.80
q5_K      2911.25 MB                 5.26
q5_1      3019.38 MB                 5.52
q6_K      3286.38 MB                 5.95
q8_0      3919.13 MB                13.70

Time to finish conversion:

  • q8_0: 4m16s
  • iq4_xs: 19m52s (very very slow)

Conclusions

  • Fatal error when converting to iq3_xxs and iq3_s; possibly more types are affected.
  • Conversion uses only one CPU core; multithreaded optimization, maybe? (see the sketch after this list)
  • q8_0 converts 4.65x faster than iq4_xs.
  • q8_0 is faster than q4_0 (13.70 vs 10.00 it/s).
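On the multithreading point: ggml-style per-row quantizers (the quantize_row_* family) are independent across rows, so in principle the conversion loop could fan rows out across cores. A hedged sketch of that idea follows; this is not how stable-diffusion.cpp currently works, and quantize_row_fn and the row layout are assumptions:

#include <omp.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of a ggml-style per-row quantizer:
 * quantize n_per_row floats from src into dst. */
typedef void (*quantize_row_fn)(const float * src, void * dst, int64_t n_per_row);

/* Quantize a whole tensor, one row per OpenMP task. Rows are
 * independent, so this is safe for the plain q*_0/q*_K quantizers;
 * imatrix-based iq* quantizers would need the same per-row inputs. */
static void quantize_tensor_parallel(quantize_row_fn quantize_row,
                                     const float * src, uint8_t * dst,
                                     int64_t n_rows, int64_t n_per_row,
                                     size_t row_size_bytes) {
    #pragma omp parallel for schedule(dynamic)
    for (int64_t r = 0; r < n_rows; ++r) {
        quantize_row(src + r * n_per_row,
                     dst + (size_t) r * row_size_bytes,
                     n_per_row);
    }
}

For comparison, llama.cpp's quantize tool already splits quantization work across threads in a broadly similar per-chunk fashion, which suggests comparable speedups are plausible here.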

System:

OS: Arch Linux x86_64
Kernel: Linux 6.12.24-1-lts
Shell: bash 5.2.37
WM: dwm (X11)
Terminal: tmux 3.5a
CPU: Intel(R) Core(TM) i7-4790 (8) @ 3.60 GHz
GPU: NVIDIA GeForce GTX 1660 SUPER [Discrete] (6GB)
Memory: 2.47 GiB / 15.56 GiB (16%)
Locale: en_US.UTF-8

@stduhpf
Contributor

stduhpf commented May 25, 2025

Does the crash happen with other models too? This looks like a bug in upstream GGML, and I've never seen this one before.

And yeah, conversion is slow, but it seems a bit hard to optimize, and it's not something that's typically used a lot, so I don't think improving it is a high priority.

EDIT:
I tried with the same model you linked, and I can confirm the issue. The other SDXL models I tried didn't have this issue. I don't understand enough about the quantization process to figure out what's causing it though.

@idostyle
Contributor

Might be related to ggml-org/llama.cpp#11773 and the linked issues within. You could try the fix proposed by compilade.

@stduhpf
Copy link
Contributor

stduhpf commented May 26, 2025

> Might be related to ggml-org/llama.cpp#11773 and the linked issues within. You could try the fix proposed by compilade.

It really looks like the same kind of issue, but that patch doesn't fix the issue here (no changes for iq3_s, and iq3_xxs still fails in the same way despite the eps check).
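For readers following along, the "eps check" mentioned above refers to guarding a near-zero denominator in the quantizer so that degenerate inputs cannot produce NaN/inf scales and points that fall off the codebook grid. The snippet below only illustrates the general shape of such a guard; it is NOT compilade's actual patch (see ggml-org/llama.cpp#11773 for the real change), and the threshold is an assumption:

/* Guard a scale computation of the form sumlx / suml2 (the pattern
 * used by ggml's make_q*_quants helpers) against a near-zero
 * denominator. */
static float safe_scale(float sumlx, float suml2) {
    const float eps = 1e-9f; /* assumed epsilon, not the upstream value */
    return suml2 > eps ? sumlx / suml2 : 0.0f; /* avoid 0/0 -> NaN */
}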
