Releases · dumpmemory/llama.cpp
b5907
b5903
gguf-py : dump bpw per layer and model in markdown mode (#14703)
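The metric this dump reports is bits per weight (bpw): on-disk bits divided by element count. A minimal sketch of the arithmetic in C++ (the actual feature lives in Python in gguf-py; the function and numbers here are illustrative, not the gguf-py code):

```cpp
#include <cstdint>
#include <cstdio>

// bpw = bits occupied on disk / number of weights
double bits_per_weight(uint64_t tensor_bytes, uint64_t n_elements) {
    return 8.0 * (double) tensor_bytes / (double) n_elements;
}

int main() {
    // e.g. a quantized layer: 18 MiB holding 32 Mi weights -> 4.50 bpw
    printf("%.2f bpw\n", bits_per_weight(18ull * 1024 * 1024, 32ull * 1024 * 1024));
    return 0;
}
```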
b5900
vulkan: add RTE variants for glu/add/sub/mul/div (#14653)
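RTE here is round-to-nearest-even, the IEEE-754 default tie-breaking mode that the new shader variants presumably request explicitly for reduced-precision stores. A host-side C++ analogy of the rounding behavior (not the Vulkan/SPIR-V shader code itself):

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
    std::fesetround(FE_TONEAREST); // RTE: ties round to the even neighbor
    printf("%.0f %.0f\n", std::nearbyint(0.5), std::nearbyint(1.5)); // 0 2
    printf("%.0f\n", std::rint(2.5));                                // 2, not 3
    return 0;
}
```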
b5898
cuda: fix build warnings in set-rows.cu (unused variable) (#14687)
b5585
CUDA: fix FTZ in FA for Gemma 3 (#13991)
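FTZ is flush-to-zero: subnormal floats are replaced with signed zero, a mode GPU kernels such as flash attention (FA) often run in. A portable C++ sketch of the behavior (illustrative only, not the CUDA fix itself):

```cpp
#include <cmath>
#include <cstdio>

// Emulate flush-to-zero: subnormals become signed zero, everything else passes through.
float ftz(float x) {
    return (std::fpclassify(x) == FP_SUBNORMAL) ? std::copysign(0.0f, x) : x;
}

int main() {
    float tiny = 1e-41f;                   // subnormal in IEEE-754 binary32
    printf("%g -> %g\n", tiny, ftz(tiny)); // flushed to 0
    return 0;
}
```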
b5581
opencl: add `backend_synchronize` (#13939). Not needed for normal use, where results are read via `tensor_get`, but it allows the perf mode of `test-backend-ops` to measure performance properly.
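The reason a synchronize hook matters for perf measurement: graph compute may return once work is enqueued, so a wall-clock timer must wait for the device queue to drain. A sketch against the public ggml-backend API (`ggml_backend_graph_compute` and `ggml_backend_synchronize` are real functions; the timing harness around them is illustrative):

```cpp
#include <chrono>
#include "ggml-backend.h"

double time_graph_ms(ggml_backend_t backend, struct ggml_cgraph * graph) {
    auto t0 = std::chrono::steady_clock::now();
    ggml_backend_graph_compute(backend, graph); // may return before the device finishes
    ggml_backend_synchronize(backend);          // drain queued work before stopping the clock
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```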
b5579
server : disable speculative decoding for SWA models (#13970). Also switches the draft context to swa-full.
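For context, the gating idea in sketch form: speculative decoding needs the draft and target contexts to agree on cache state, which sliding-window attention (SWA) complicates, so the server now refuses the combination. The type and function below are illustrative, not the server's actual code:

```cpp
// Hypothetical gate: SWA models fall back to plain decoding.
struct model_info { bool uses_swa; };

bool allow_speculative(const model_info & m) {
    return !m.uses_swa;
}
```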
b5574
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13…
b5568
sync : ggml
b5558
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Win…
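On the threading entry: a low scheduling priority typically maps to below-normal on Windows and a positive nice value on POSIX. The enum name GGML_SCHED_PRIO_LOW comes from ggml, but the mapping below is an illustrative sketch, not ggml's implementation:

```cpp
#ifdef _WIN32
#include <windows.h>
static void set_low_priority() {
    // Below-normal lets background inference yield to interactive work.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
}
#else
#include <sys/resource.h>
static void set_low_priority() {
    setpriority(PRIO_PROCESS, 0, 10); // positive nice value = lower priority
}
#endif
```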