
Bug in convert mode "ggml-quants.c:3929: fatal error" #689


Open
Disonantemus opened this issue May 25, 2025 · 3 comments

Comments

@Disonantemus

Summary

  • Converting a model to iq3_s or iq3_xxs fails with a fatal error and aborts.

Error

sd -M convert -m realDream_sdxl6.safetensors --type iq3_s
[INFO ] model.cpp:908  - load realDream_sdxl6.safetensors using safetensors format
[INFO ] model.cpp:1985 - model tensors mem size: 2183.06MB
  |=>                                                | 55/2641 - 0.00it/sOops: found point 103 not on grid: 103 0 0 0
/usr/src/debug/stable-diffusion.cpp-vulkan-git/stable-diffusion.cpp/ggml/src/ggml-quants.c:3929: fatal error
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
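For context, the abort comes from ggml's IQ3 quantizers: each group of quantized values is snapped onto a fixed codebook grid, and if a point (even after a neighbour search) is still not on the grid, ggml prints the "Oops: found point ... not on grid" message and aborts. Below is a minimal sketch of that failure path, with names, the lookup table, and the packing scheme all simplified; it is not the actual ggml-quants.c code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* kmap maps a packed point to its index in the fixed IQ3 codebook
 * grid; a negative entry means "this point is not on the grid".
 * Here it is a tiny dummy table; ggml builds it from its real iq3
 * grid tables. */
#define KMAP_SIZE 256

static void quantize_point(const int16_t * kmap, const uint8_t L[4]) {
    /* Pack the 4 quantized components into one lookup key
     * (packing simplified; ggml uses a different layout). */
    uint16_t u = (uint16_t)(L[0] | (L[1] << 2) | (L[2] << 4) | (L[3] << 6));
    if (kmap[u % KMAP_SIZE] < 0) {
        /* In ggml a neighbour search first tries to snap the point
         * back onto the grid; when that also fails, it aborts,
         * producing the kind of log shown above. Note that 103 is
         * far outside the small value range the grid expects. */
        printf("Oops: found point %u not on grid:", (unsigned) u);
        for (int i = 0; i < 4; ++i) printf(" %d", L[i]);
        printf("\n");
        abort(); /* -> "ggml-quants.c:3929: fatal error" */
    }
    /* ... otherwise emit the grid index into the quantized block ... */
}

int main(void) {
    int16_t kmap[KMAP_SIZE];
    for (int i = 0; i < KMAP_SIZE; ++i) kmap[i] = -1; /* nothing on grid */
    const uint8_t L[4] = {103, 0, 0, 0};              /* point from the log */
    quantize_point(kmap, L);
    return 0;
}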

Command to test quants:

sd -M convert -m realDream_sdxl6.safetensors --type q4_0

Test model: realDream_sdxl6 ( SDXL | F16 | 6.46GB )

Conversion speed per quant type (almost all of them):

quant     model tensors mem size    it/s
tq1_0     1565.20 MB                14.49
tq2_0     1697.60 MB                13.51
q2_K      1896.20 MB                 4.52
iq3_xxs   2050.66 MB                fails (fatal error)
iq3_s     2183.06 MB                fails (fatal error)
q3_K      2183.06 MB                 8.26
iq4_xs    2469.92 MB                 1.61
iq4_nl    2479.52 MB                 1.78
q4_0      2479.52 MB                10.00
q4_K      2558.19 MB                 5.46
q4_1      2659.47 MB                 5.56
q5_0      2839.42 MB                 9.80
q5_K      2911.25 MB                 5.26
q5_1      3019.38 MB                 5.52
q6_K      3286.38 MB                 5.95
q8_0      3919.13 MB                13.70

Time to finish conversion:

  • q8_0: 4m16s
  • iq4_xs: 19m52s (very very slow)

Conclusions

  • Fatal error when converting to iq3_xxs and iq3_s; possibly more types are affected.
  • Conversion uses only one CPU core; multithreaded optimization, maybe? (see the sketch after this list)
  • q8_0 converts 4.65x faster than iq4_xs.
  • q8_0 is faster than q4_0 (13.70 vs 10.00 it/s).
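On the multithreading point: ggml-style per-row quantizers (the quantize_row_* family) are independent across rows, so in principle the conversion loop could fan rows out across cores. A hedged sketch of that idea follows; this is not how stable-diffusion.cpp currently works, and quantize_row_fn and the row layout are assumptions:

#include <omp.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of a ggml-style per-row quantizer:
 * quantize n_per_row floats from src into dst. */
typedef void (*quantize_row_fn)(const float * src, void * dst, int64_t n_per_row);

/* Quantize a whole tensor, one row per OpenMP task. Rows are
 * independent, so this is safe for the plain q*_0/q*_K quantizers;
 * imatrix-based iq* quantizers would need the same per-row inputs. */
static void quantize_tensor_parallel(quantize_row_fn quantize_row,
                                     const float * src, uint8_t * dst,
                                     int64_t n_rows, int64_t n_per_row,
                                     size_t row_size_bytes) {
    #pragma omp parallel for schedule(dynamic)
    for (int64_t r = 0; r < n_rows; ++r) {
        quantize_row(src + r * n_per_row,
                     dst + (size_t) r * row_size_bytes,
                     n_per_row);
    }
}

For comparison, llama.cpp's quantize tool already splits quantization work across threads in a broadly similar per-chunk fashion, which suggests comparable speedups are plausible here.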

System:

OS: Arch Linux x86_64
Kernel: Linux 6.12.24-1-lts
Shell: bash 5.2.37
WM: dwm (X11)
Terminal: tmux 3.5a
CPU: Intel(R) Core(TM) i7-4790 (8) @ 3.60 GHz
GPU: NVIDIA GeForce GTX 1660 SUPER [Discrete] (6GB)
Memory: 2.47 GiB / 15.56 GiB (16%)
Locale: en_US.UTF-8

@stduhpf
Contributor

stduhpf commented May 25, 2025

Does the crash happen with other models too? This looks like a bug in upstream GGML, and I've never seen this one before.

And yeah, conversion is slow, but it seems a bit hard to optimize, and it's not something that's typically used a lot, so I don't think improving it is a high priority.

EDIT:
I tried with the same model you linked, and I can confirm the issue. The other SDXL models I tried didn't have this issue. I don't understand enough about the quantization process to figure out what's causing it though.

@idostyle
Contributor

Might be related to ggml-org/llama.cpp#11773 and the linked issues within. You could try the fix proposed by compilade.

@stduhpf
Copy link
Contributor

stduhpf commented May 26, 2025

> Might be related to ggml-org/llama.cpp#11773 and the linked issues within. You could try the fix proposed by compilade.

It really looks like the same kind of issue, but that patch doesn't fix the issue here (no changes for iq3_s, and iq3_xxs still fails in the same way despite the eps check).
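For readers following along, the "eps check" mentioned above refers to guarding a near-zero denominator in the quantizer so that degenerate inputs cannot produce NaN/inf scales and points that fall off the codebook grid. The snippet below only illustrates the general shape of such a guard; it is NOT compilade's actual patch (see ggml-org/llama.cpp#11773 for the real change), and the threshold is an assumption:

/* Guard a scale computation of the form sumlx / suml2 (the pattern
 * used by ggml's make_q*_quants helpers) against a near-zero
 * denominator. */
static float safe_scale(float sumlx, float suml2) {
    const float eps = 1e-9f; /* assumed epsilon, not the upstream value */
    return suml2 > eps ? sumlx / suml2 : 0.0f; /* avoid 0/0 -> NaN */
}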
