Skip to content

Releases: ggml-org/llama.cpp

b6140

12 Aug 20:38
b049315
Compare
Choose a tag to compare
HIP: disable sync warp shuffel operators from clr amd_warp_sync_funct…

b6139

12 Aug 12:16
f4586ee
Compare
Choose a tag to compare
sycl: Fix and disable more configurations of mul_mat (#15151)

* sycl: Fix and disable more configurations of mul_mat

* Disable more configurations

b6138

12 Aug 10:17
60a7658
Compare
Choose a tag to compare
opencl: allow mixed f16/f32 `add` (#15140)

b6137

12 Aug 10:02
efe3a90
Compare
Choose a tag to compare
CUDA cmake: add `-lineinfo` for easier debug (#15260)

b6136

12 Aug 08:26
bbd57b7
Compare
Choose a tag to compare
CANN: GGML_OP_CPY optimization (#15070)

Signed-off-by: noemotiovon <757486878@qq.com>

b6135

12 Aug 03:01
25ff6f7
Compare
Choose a tag to compare
musa: fix failures in test-backend-ops for mul_mat_id op (#15236)

* musa: fix failures in test-backend-ops for mul_mat_id op

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

b6134

11 Aug 15:25
be48528
Compare
Choose a tag to compare
CANN: Add broadcast for softmax and FA (#15208)

* refactor softmax

* fix fa

* fix mask shape

* format

* add comments

* Remove whitespace

b6133

11 Aug 15:30
cf9e564
Compare
Choose a tag to compare
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode.…

b6132

11 Aug 14:42
fba5c0d
Compare
Choose a tag to compare
chat : hotfix gpt-oss jinja raising an exception (#15243)

* chat : hotfix gpt-oss jinja raising an exception

* fix

b6131

11 Aug 13:06
53d0a12
Compare
Choose a tag to compare
server : allow specifying reasoning_format in HTTP request (#15238)