Name and Version
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
llama.cpp-b4702
llama.cpp-b4751
llama.cpp-b4756 <-- last good build (see below)
llama.cpp-b4759 <-- first degraded build (see below)
llama.cpp-b4761
llama.cpp-b4762
llama.cpp-b4769
llama.cpp-b4775
llama.cpp-b4800
llama.cpp-b4900
llama.cpp-b4940
llama.cpp-b4990
llama.cpp-b5026
llama.cpp-b5030
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server, llama-cli
Command line
./build/bin/llama-server -m /data/qwq-32b-q8_0-00001-of-00009.gguf -fa -s 3047 --temp 0.6 --top-p 0.95 -ngl 100 --host 0.0.0.0 -c 131072
Problem description & steps to reproduce
Testing different versions of the llama.cpp server on the same inference task.
Two versions of the llama.cpp server were used to solve the same problem:
llama.cpp-b4756
llama.cpp-b4759
Both runs use identical parameters and the same model, yet produce results of noticeably different quality.
Key observations:
Performance degradation:
b4759 is noticeably less capable than b4756, in some cases performing more than twice as badly on the same task.
Token consumption for the same task:
b4756: ~3,000 tokens
b4759: ~6,000 tokens
Version comparison:
b4702 (an older version) performs even better than b4756.
The test problem used:
Can you help me decrypt this cipher I received?
"K nkmg rncakpi hqqvdcnn."
This behavior is reproducible across multiple tests. After extensive testing, b4759 was identified as the first version with the drastically degraded performance.
If you can reproduce similar findings, please share your test cases!
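To make comparisons across builds easier, here is a minimal sketch of how the token counts above can be collected from a running llama-server instance through its OpenAI-compatible /v1/chat/completions endpoint. It assumes the server was started with the command line above (default port 8080) and that the build returns the standard usage object in non-streaming responses:

```python
# Minimal sketch: send the test prompt to a running llama-server instance
# and report how many completion tokens the model used. Assumes the default
# port 8080 and the OpenAI-compatible chat completions endpoint.
import json
import urllib.request

PROMPT = ('Can you help me decrypt this cipher I received?\n'
          '"K nkmg rncakpi hqqvdcnn."')

payload = {
    "messages": [{"role": "user", "content": PROMPT}],
    "temperature": 0.6,   # matches --temp 0.6 from the command line above
    "top_p": 0.95,        # matches --top-p 0.95
    "seed": 3047,         # matches -s 3047 (assumed to be honored per request)
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# "usage" is part of the standard non-streaming response; completion_tokens
# is the figure compared between builds in this report (~3,000 vs ~6,000).
print("completion_tokens:", body["usage"]["completion_tokens"])
print(body["choices"][0]["message"]["content"])
```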
First Bad Commit
No response