Description
Name and Version
load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /devel/tools/llama.cpp/build/bin/libggml-cuda.so
load_backend: loaded CPU backend from /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so
version: 6323 (ad89166)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 3080 + i7-9700K
Models
Nemotron Nano v2 9B
Problem description & steps to reproduce
Nemotron always reprocesses the entire context. I thought this might be because the thinking contents get included in the prompt, but even after I added proper reasoning support, so that the thinking is correctly parsed into reasoning_content, the context still gets reprocessed every time.
@ggerganov any chance to add that SWA snapshot mechanism here as well?
First Bad Commit
No response
Relevant log output
slot launch_slot_: id 0 | task 18510 | processing task
slot update_slots: id 0 | task 18510 | new prompt, n_ctx_slot = 120064, n_keep = 0, n_prompt_tokens = 26680
slot update_slots: id 0 | task 18510 | n_past = 26582, cache_tokens.size() = 27102, seq_id = 0, pos_min = 27101, n_swa = 0
slot update_slots: id 0 | task 18510 | forcing full prompt re-processing due to lack of cache data (likely due to SWA, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 18510 | kv cache rm [0, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.076762
slot update_slots: id 0 | task 18510 | kv cache rm [2048, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.153523
slot update_slots: id 0 | task 18510 | kv cache rm [4096, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 6144, n_tokens = 2048, progress = 0.230285
slot update_slots: id 0 | task 18510 | kv cache rm [6144, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 8192, n_tokens = 2048, progress = 0.307046
slot update_slots: id 0 | task 18510 | kv cache rm [8192, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 10240, n_tokens = 2048, progress = 0.383808
slot update_slots: id 0 | task 18510 | kv cache rm [10240, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 12288, n_tokens = 2048, progress = 0.460570
slot update_slots: id 0 | task 18510 | kv cache rm [12288, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 14336, n_tokens = 2048, progress = 0.537331
slot update_slots: id 0 | task 18510 | kv cache rm [14336, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 16384, n_tokens = 2048, progress = 0.614093
slot update_slots: id 0 | task 18510 | kv cache rm [16384, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 18432, n_tokens = 2048, progress = 0.690855
slot update_slots: id 0 | task 18510 | kv cache rm [18432, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 20480, n_tokens = 2048, progress = 0.767616
slot update_slots: id 0 | task 18510 | kv cache rm [20480, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 22528, n_tokens = 2048, progress = 0.844378
slot update_slots: id 0 | task 18510 | kv cache rm [22528, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 24576, n_tokens = 2048, progress = 0.921139
slot update_slots: id 0 | task 18510 | kv cache rm [24576, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 26624, n_tokens = 2048, progress = 0.997901
slot update_slots: id 0 | task 18510 | kv cache rm [26624, end)
slot update_slots: id 0 | task 18510 | prompt processing progress, n_past = 26680, n_tokens = 56, progress = 1.000000
slot update_slots: id 0 | task 18510 | prompt done, n_past = 26680, n_tokens = 56
slot release: id 0 | task 18510 | stop processing: n_past = 30775, truncated = 0
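To make the numbers in the log above easier to read, here is a minimal Python sketch of what appears to be happening. It is not the server's actual code: the function names are hypothetical, and the reuse condition (`pos_min > max(0, n_past - n_swa)`) is my assumption about the check behind the "forcing full prompt re-processing" message, inferred from the values it prints. The batch size of 2048 is read off the `n_tokens` field in the log.

```python
def forces_full_reprocess(n_past: int, pos_min: int, n_swa: int) -> bool:
    """Assumed check (hypothetical simplification, not the real server code):
    the cached prefix is only usable if the earliest position still held in
    the cache is at or before the point where reuse would resume."""
    return pos_min > max(0, n_past - n_swa)


def chunk_sizes(n_prompt_tokens: int, n_batch: int = 2048) -> list[int]:
    """Split a prompt into batches of at most n_batch tokens."""
    sizes = []
    done = 0
    while done < n_prompt_tokens:
        step = min(n_batch, n_prompt_tokens - done)
        sizes.append(step)
        done += step
    return sizes


# Values copied from the log for task 18510.
n_prompt_tokens = 26680
n_past = 26582      # tokens shared with the cached prompt
pos_min = 27101     # earliest position the cache still holds
n_swa = 0

# pos_min is far beyond n_past, so the assumed check discards the cache.
print(forces_full_reprocess(n_past, pos_min, n_swa))

# With a usable cache only the non-shared suffix would need processing.
print("tokens needed with cache reuse:", n_prompt_tokens - n_past)

# Instead, the whole prompt is processed again in 2048-token batches.
sizes = chunk_sizes(n_prompt_tokens)
print("batches:", len(sizes), "last batch:", sizes[-1])

# The progress field in the log matches n_past / n_prompt_tokens.
print(f"progress after first batch: {sizes[0] / n_prompt_tokens:.6f}")
```

Under these assumptions, the request needed only 98 new tokens but ended up processing all 26680 in 14 batches. Note that pos_min equals cache_tokens.size() - 1, which suggests the cache only retains the most recent position; that would fit a model with recurrent state, but that part is a guess on my side.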