Eval bug: Nemotron v2 Nano always reprocesses prompt

### Name and Version

load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /devel/tools/llama.cpp/build/bin/libggml-cuda.so
load_backend: loaded CPU backend from /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so
version: 6323 (ad8916634)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu


### Operating systems

Linux

### GGML backends

CUDA

### Hardware

RTX 3080 + i7-9700K

### Models

Nemotron Nano v2 9B

### Problem description & steps to reproduce

Nemotron always reprocesses the entire context. I thought this might be because the thinking content gets included in the prompt, but even after I added proper reasoning support, so that the thinking is correctly parsed into `reasoning_content`, the context is still reprocessed on every request.

@ggerganov any chance to add that SWA snapshot mechanism here as well?

### First Bad Commit

_No response_

### Relevant log output

```shell
slot launch_slot_: id  0 | task 18510 | processing task
slot update_slots: id  0 | task 18510 | new prompt, n_ctx_slot = 120064, n_keep = 0, n_prompt_tokens = 26680
slot update_slots: id  0 | task 18510 | n_past = 26582, cache_tokens.size() = 27102, seq_id = 0, pos_min = 27101, n_swa = 0
slot update_slots: id  0 | task 18510 | forcing full prompt re-processing due to lack of cache data (likely due to SWA, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 18510 | kv cache rm [0, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.076762
slot update_slots: id  0 | task 18510 | kv cache rm [2048, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.153523
slot update_slots: id  0 | task 18510 | kv cache rm [4096, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 6144, n_tokens = 2048, progress = 0.230285
slot update_slots: id  0 | task 18510 | kv cache rm [6144, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 8192, n_tokens = 2048, progress = 0.307046
slot update_slots: id  0 | task 18510 | kv cache rm [8192, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 10240, n_tokens = 2048, progress = 0.383808
slot update_slots: id  0 | task 18510 | kv cache rm [10240, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 12288, n_tokens = 2048, progress = 0.460570
slot update_slots: id  0 | task 18510 | kv cache rm [12288, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 14336, n_tokens = 2048, progress = 0.537331
slot update_slots: id  0 | task 18510 | kv cache rm [14336, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 16384, n_tokens = 2048, progress = 0.614093
slot update_slots: id  0 | task 18510 | kv cache rm [16384, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 18432, n_tokens = 2048, progress = 0.690855
slot update_slots: id  0 | task 18510 | kv cache rm [18432, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 20480, n_tokens = 2048, progress = 0.767616
slot update_slots: id  0 | task 18510 | kv cache rm [20480, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 22528, n_tokens = 2048, progress = 0.844378
slot update_slots: id  0 | task 18510 | kv cache rm [22528, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 24576, n_tokens = 2048, progress = 0.921139
slot update_slots: id  0 | task 18510 | kv cache rm [24576, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 26624, n_tokens = 2048, progress = 0.997901
slot update_slots: id  0 | task 18510 | kv cache rm [26624, end)
slot update_slots: id  0 | task 18510 | prompt processing progress, n_past = 26680, n_tokens = 56, progress = 1.000000
slot update_slots: id  0 | task 18510 | prompt done, n_past = 26680, n_tokens = 56
slot      release: id  0 | task 18510 | stop processing: n_past = 30775, truncated = 0
```
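To make the numbers in the log easier to read, here is a minimal sketch of the check that appears to trigger the full re-processing. The condition in `must_reprocess` is an assumption inferred from the logged fields (`n_past`, `pos_min`, `n_swa`), not copied from the server source, and both helper names are hypothetical.

```python
import re

# Values copied from the log above.
LOG_LINE = (
    "slot update_slots: id  0 | task 18510 | n_past = 26582, "
    "cache_tokens.size() = 27102, seq_id = 0, pos_min = 27101, n_swa = 0"
)
N_PROMPT_TOKENS = 26680  # from the "new prompt" log line


def parse_fields(line):
    """Extract `name = integer` pairs from a server log line."""
    return {k: int(v) for k, v in re.findall(r"([\w.()]+) = (-?\d+)", line)}


def must_reprocess(n_past, pos_min, n_swa):
    """Hypothetical helper. ASSUMPTION: the cache is treated as unusable when
    the oldest position still held for the sequence (pos_min) is newer than
    the point the server wants to resume from (n_past - n_swa)."""
    return pos_min > max(0, n_past - n_swa)


fields = parse_fields(LOG_LINE)
forced = must_reprocess(fields["n_past"], fields["pos_min"], fields["n_swa"])

# Tokens that actually differ from the cached prefix.
new_tokens = N_PROMPT_TOKENS - fields["n_past"]

print(f"forced full re-processing: {forced}")
print(f"tokens that needed processing: {new_tokens} of {N_PROMPT_TOKENS}")
```

If that reading is right, 26582 of the 26680 prompt tokens matched the cached prefix, so only 98 tokens were new, yet all 26680 were processed again because `pos_min` (27101) lies past `n_past` while `n_swa` is 0.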

Eval bug: Nemotron v2 Nano always reprocesses prompt #15677
