Releases: ggml-org/llama.cpp

b5590

04 Jun 20:23
0d39844
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)

* ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position
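
Below is a minimal sketch of what the newly supported op looks like when built with the ggml C API. The tensor layouts follow the kernel/input convention I believe `test-backend-ops` uses and are illustrative; they are not taken from this PR.

```c
#include "ggml.h"

// Build a CONV_TRANSPOSE_1D node; with this release the resulting graph can
// be offloaded to the Vulkan backend instead of falling back to the CPU.
static struct ggml_tensor * build_conv_transpose_1d(struct ggml_context * ctx) {
    // kernel: [kernel_size, out_channels, in_channels] (assumed layout)
    struct ggml_tensor * kernel = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 3, 8, 4);
    // input:  [length, in_channels, batch] (assumed layout)
    struct ggml_tensor * input  = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 16, 4, 1);

    // stride = 2, padding = 0, dilation = 1
    return ggml_conv_transpose_1d(ctx, kernel, input, 2, 0, 1);
}
```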

b5589

04 Jun 16:34
3e63a58
kv-cache : refactor the update/defrag mechanism (#13988)

* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci
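
The refactor itself is internal, but for context, this is roughly how the update/defrag path is driven from the application side. The `llama_kv_self_*` names are what I believe `llama.h` exposes around this release; nothing in the sketch is specific to this PR.

```c
#include "llama.h"

// Schedule a KV-cache defragmentation and apply pending updates right away
// (K-shift, defrag) rather than waiting for the next llama_decode() call.
static void defrag_now(struct llama_context * ctx) {
    llama_kv_self_defrag(ctx); // mark the cache for defragmentation
    llama_kv_self_update(ctx); // apply any pending cache updates immediately
}
```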

b5588

04 Jun 14:21
2589ad3
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)

b5587

04 Jun 11:53
4825487
releases : use dl backend for linux release, remove arm64 linux relea…
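
The Linux binaries now ship with the dynamically loadable (dl) ggml backends. As a hedged illustration, this is roughly how an application discovers those backends at runtime through the ggml-backend registry; the calls are part of ggml's existing API, not something added by this change.

```c
#include <stdio.h>
#include "ggml-backend.h"

// Load the dynamic backend libraries found next to the executable and list
// them; which backends are present depends on the release package.
int main(void) {
    ggml_backend_load_all();

    for (size_t i = 0; i < ggml_backend_reg_count(); ++i) {
        ggml_backend_reg_t reg = ggml_backend_reg_get(i);
        printf("backend: %s\n", ggml_backend_reg_name(reg));
    }
    return 0;
}
```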

b5586

04 Jun 08:27
3ac6753
llama-graph : use ggml_repeat_4d (#13998)
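
For context: `ggml_repeat` needs a second tensor just to carry the target shape, while `ggml_repeat_4d` takes the four target dimensions directly. The sketch below shows the difference with illustrative shapes and the signature as I read it from `ggml.h` around this release.

```c
#include "ggml.h"

// Broadcast a column vector a: [n, 1] to [n, 8] in two equivalent ways.
static struct ggml_tensor * broadcast_cols(struct ggml_context * ctx,
                                           struct ggml_tensor  * a) {
    // classic form: a dummy tensor exists only to provide the target shape
    struct ggml_tensor * shape = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, a->ne[0], 8);
    struct ggml_tensor * r0    = ggml_repeat(ctx, a, shape);

    // new form: pass the target dimensions directly, no dummy allocation
    struct ggml_tensor * r1    = ggml_repeat_4d(ctx, a, a->ne[0], 8, 1, 1);

    (void) r0; // r0 and r1 describe the same result
    return r1;
}
```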

b5585

04 Jun 07:50
0b4be4c
CUDA: fix FTZ in FA for Gemma 3 (#13991)

b5584

04 Jun 07:45
e0e806f
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)

ggml-ci
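
The convention being fixed here is the one documented in `llama.h`: a negative `seq_id` matches every sequence, and negative positions mean the whole range. A hedged sketch follows; the function name is the `llama_kv_self_*` form I believe this release series uses.

```c
#include "llama.h"

// Drop everything past `keep_up_to` for *all* sequences in the unified KV
// cache: seq_id = -1 matches any sequence, p1 = -1 means "to the end".
static void drop_tail_all_seqs(struct llama_context * ctx, llama_pos keep_up_to) {
    llama_kv_self_seq_rm(ctx, /*seq_id =*/ -1, /*p0 =*/ keep_up_to, /*p1 =*/ -1);
}
```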

b5581

03 Jun 00:49
71e74a3
opencl: add `backend_synchronize` (#13939)

* This is not needed for normal use, where the result is read using `tensor_get`, but it allows the perf mode of `test-backend-ops` to measure performance properly.
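
To make that concrete: `ggml_backend_graph_compute` may return before the device has finished, so a wall-clock measurement has to wait on `ggml_backend_synchronize`. A small sketch with an illustrative timing helper:

```c
#include "ggml.h"
#include "ggml-backend.h"

// Time one graph evaluation (assumes ggml_time_init() has been called).
// Without the synchronize, the measurement would only cover kernel
// submission, not execution.
static int64_t time_graph_us(ggml_backend_t backend, struct ggml_cgraph * graph) {
    const int64_t t0 = ggml_time_us();

    ggml_backend_graph_compute(backend, graph); // may return asynchronously
    ggml_backend_synchronize(backend);          // wait for the device to finish

    return ggml_time_us() - t0;
}
```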

b5580

03 Jun 00:38
bfb1e01
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)

* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes
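
These are existing ggml graph ops that the OpenCL backend can now execute. A hedged sketch combining a few of them follows; shapes are illustrative, and whether a node actually lands on the OpenCL backend still depends on its `supports_op` checks.

```c
#include "ggml.h"

// Concatenate two [64, 32] tensors along dim 1, pad the second dimension by
// two rows of zeros, then apply tanh elementwise.
static struct ggml_tensor * demo_new_ops(struct ggml_context * ctx,
                                         struct ggml_tensor  * a,
                                         struct ggml_tensor  * b) {
    struct ggml_tensor * cat = ggml_concat(ctx, a, b, /*dim =*/ 1); // [64, 64]
    struct ggml_tensor * pad = ggml_pad(ctx, cat, 0, 2, 0, 0);      // [64, 66]
    return ggml_tanh(ctx, pad);
}
```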

b5579

02 Jun 19:30
3637576
server : disable speculative decoding for SWA models (#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models