Releases · dumpmemory/llama.cpp
b5907
b5903
gguf-py : dump bpw per layer and model in markdown mode (#14703)
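The metric this dump reports is bits per weight (bpw): on-disk bits divided by element count. A minimal sketch of the arithmetic in C++ (the actual feature lives in Python in gguf-py; the function and numbers here are illustrative, not the gguf-py code):

```cpp
#include <cstdint>
#include <cstdio>

// bpw = bits occupied on disk / number of weights
double bits_per_weight(uint64_t tensor_bytes, uint64_t n_elements) {
    return 8.0 * (double) tensor_bytes / (double) n_elements;
}

int main() {
    // e.g. a quantized layer: 18 MiB holding 32 Mi weights -> 4.50 bpw
    printf("%.2f bpw\n", bits_per_weight(18ull * 1024 * 1024, 32ull * 1024 * 1024));
    return 0;
}
```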
b5900
vulkan: add RTE variants for glu/add/sub/mul/div (#14653)
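RTE here is round-to-nearest-even, the IEEE-754 default tie-breaking mode that the new shader variants presumably request explicitly for reduced-precision stores. A host-side C++ analogy of the rounding behavior (not the Vulkan/SPIR-V shader code itself):

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
    std::fesetround(FE_TONEAREST); // RTE: ties round to the even neighbor
    printf("%.0f %.0f\n", std::nearbyint(0.5), std::nearbyint(1.5)); // 0 2
    printf("%.0f\n", std::rint(2.5));                                // 2, not 3
    return 0;
}
```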
b5898
cuda: fix build warnings in set-rows.cu (unused variable) (#14687)
b5585
CUDA: fix FTZ in FA for Gemma 3 (#13991)
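FTZ is flush-to-zero: subnormal floats are replaced with signed zero, a mode GPU kernels such as flash attention (FA) often run in. A portable C++ sketch of the behavior (illustrative only, not the CUDA fix itself):

```cpp
#include <cmath>
#include <cstdio>

// Emulate flush-to-zero: subnormals become signed zero, everything else passes through.
float ftz(float x) {
    return (std::fpclassify(x) == FP_SUBNORMAL) ? std::copysign(0.0f, x) : x;
}

int main() {
    float tiny = 1e-41f;                   // subnormal in IEEE-754 binary32
    printf("%g -> %g\n", tiny, ftz(tiny)); // flushed to 0
    return 0;
}
```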
b5581
opencl: add `backend_synchronize` (#13939). Not needed for normal use, where results are read via `tensor_get`, but it allows the perf mode of `test-backend-ops` to measure performance properly.
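The reason a synchronize hook matters for perf measurement: graph compute may return once work is enqueued, so a wall-clock timer must wait for the device queue to drain. A sketch against the public ggml-backend API (`ggml_backend_graph_compute` and `ggml_backend_synchronize` are real functions; the timing harness around them is illustrative):

```cpp
#include <chrono>
#include "ggml-backend.h"

double time_graph_ms(ggml_backend_t backend, struct ggml_cgraph * graph) {
    auto t0 = std::chrono::steady_clock::now();
    ggml_backend_graph_compute(backend, graph); // may return before the device finishes
    ggml_backend_synchronize(backend);          // drain queued work before stopping the clock
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```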
b5579
server : disable speculative decoding for SWA models (#13970). Also switches the draft context to swa-full.
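For context, the gating idea in sketch form: speculative decoding needs the draft and target contexts to agree on cache state, which sliding-window attention (SWA) complicates, so the server now refuses the combination. The type and function below are illustrative, not the server's actual code:

```cpp
// Hypothetical gate: SWA models fall back to plain decoding.
struct model_info { bool uses_swa; };

bool allow_speculative(const model_info & m) {
    return !m.uses_swa;
}
```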
b5574
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13…
b5568
sync : ggml
b5558
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Win…
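On the threading entry: a low scheduling priority typically maps to below-normal on Windows and a positive nice value on POSIX. The enum name GGML_SCHED_PRIO_LOW comes from ggml, but the mapping below is an illustrative sketch, not ggml's implementation:

```cpp
#ifdef _WIN32
#include <windows.h>
static void set_low_priority() {
    // Below-normal lets background inference yield to interactive work.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
}
#else
#include <sys/resource.h>
static void set_low_priority() {
    setpriority(PRIO_PROCESS, 0, 10); // positive nice value = lower priority
}
#endif
```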