ggml : fix SSM_SCAN for n_groups > 1 #15625

compilade · 2025-08-27T21:56:55Z

This fixes a problem noticed by @gabe-l-hart in #15507 (comment).

The upstream implementation of the SSM scan repeats the grouped parts of B and C the way repeat_interleave does, not the way repeat does. SSM_SCAN in ggml did the latter.

Since most Mamba2 models use n_groups == 1, and since Mamba-Codestral-7B-v0.1 (which uses n_groups == 8) still had a non-extreme perplexity, this was not really noticed until #15507.
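
As a rough illustration of the difference (a minimal sketch, not the actual ggml code), the two repetition schemes can be viewed as two ways of mapping a head index to the group whose B and C it reads. The function names and the sizes (8 heads, 2 groups) are hypothetical and chosen only for the example; the sketch assumes the number of heads is a multiple of the number of groups.

```python
def group_repeat_interleave(head: int, n_head: int, n_group: int) -> int:
    """Group index when groups are repeated like repeat_interleave:
    consecutive heads share the same group."""
    heads_per_group = n_head // n_group  # assumes n_head % n_group == 0
    return head // heads_per_group


def group_repeat(head: int, n_head: int, n_group: int) -> int:
    """Group index when groups are repeated like repeat (tiling):
    the groups cycle from one head to the next."""
    return head % n_group


# Illustrative sizes only (not taken from any real model).
n_head, n_group = 8, 2

interleave = [group_repeat_interleave(h, n_head, n_group) for h in range(n_head)]
tile = [group_repeat(h, n_head, n_group) for h in range(n_head)]

print(interleave)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(tile)        # [0, 1, 0, 1, 0, 1, 0, 1]

# With a single group both mappings agree, so the mismatch stays hidden.
single = [
    (group_repeat_interleave(h, n_head, 1), group_repeat(h, n_head, 1))
    for h in range(n_head)
]
print(all(a == b == 0 for a, b in single))  # True
```

With n_groups == 1 every head maps to group 0 under both schemes, which is consistent with the issue only showing up on models with n_groups > 1.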

On CPU, this reduces the perplexity of a Q8_0 Mamba-Codestral-7B-v0.1 on the first 10 chunks of wiki.test.raw from 16.1435 to 9.1851 (roughly 43% lower):

Before:

[1]8.2788,[2]10.8075,[3]12.6548,[4]14.6535,[5]14.1671,[6]14.0561,[7]14.6714,[8]14.8880,[9]15.2977,[10]16.1435,
Final estimate: PPL = 16.1435 +/- 0.88152

After:

[1]5.2122,[2]6.4318,[3]7.0763,[4]8.0914,[5]8.1234,[6]8.1319,[7]8.4960,[8]8.6492,[9]8.7738,[10]9.1851,
Final estimate: PPL = 9.1851 +/- 0.46159

To be clear, there's no need for reconversion of the affected models because this is purely a problem in the SSM_SCAN operation.

However, if imatrix was used, it would make sense to recompute that.

TODO:

Test with Metal
Test with CUDA

gabe-l-hart

The changes look good to me, but I don't have them fully verified yet. When merged into my NemotronH branch, the results are much better than previously, but they still don't match the transformers CUDA output. The remaining mismatch is likely somewhere else in the model implementation and unrelated to this bug, though. I have run Mamba-Codestral-7B with the fix and it shows good quality output.

gabe-l-hart

I now have this working for NemotronH. Without this fix, the results are still garbage, but with this fix, I get coherent results that match (with --temp 0) across CPU, Metal, and CUDA. I've also verified that results on Mamba-Codestral-7B match across CPU and Metal and show good quality, and I checked granite-4.0-tiny-preview as well (to sanity check an n_groups == 1 model).

It's good to ship in my book!

gabe-l-hart · 2025-08-28T14:07:37Z

@compilade Assuming you don't have any further testing you want to do, let's merge this one!

…nemotron-nano-15409

* origin/master:
ggml : fix SSM_SCAN for n_groups > 1 (ggml-org#15625)
kv-cache : fix find_slot to not search for continuous slot (ggml-org#15638)
model : jina-embeddings-v3 support (ggml-org#13693)

…upport

* origin/master:
ggml : fix SSM_SCAN for n_groups > 1 (ggml-org#15625)
kv-cache : fix find_slot to not search for continuous slot (ggml-org#15638)
model : jina-embeddings-v3 support (ggml-org#13693)

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

ggml : fix SSM_SCAN for n_groups > 1

dc2187d

compilade requested a review from gabe-l-hart August 27, 2025 21:56

compilade added the labels generation quality (Quality of model output) and bugfix (fixes an issue or bug) on Aug 27, 2025

github-actions bot added the labels Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning), and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) on Aug 27, 2025

gabe-l-hart reviewed Aug 28, 2025

gabe-l-hart approved these changes Aug 28, 2025

compilade merged commit 7380414 into master Aug 28, 2025
48 checks passed

compilade mentioned this pull request Aug 28, 2025

Feature Request: Support Codestral Mamba #8519

Closed

Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 29, 2025

ggml : fix SSM_SCAN for n_groups > 1 (ggml-org#15625)

221a8c5
