Releases: ggml-org/llama.cpp
b6140
b6139
sycl: Fix and disable more configurations of mul_mat (#15151)
b6138
opencl: allow mixed f16/f32 `add` (#15140)
b6137
CUDA cmake: add `-lineinfo` for easier debug (#15260)
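A minimal sketch of what the `-lineinfo` flag enables: when passed to `nvcc`, it embeds source-line correlation data in the compiled kernel so tools like Nsight Compute and cuda-gdb can map GPU instructions back to source lines. The file name below is a placeholder, not something from this release.

```shell
# Build-flag sketch (assumes nvcc is installed; kernel.cu is a
# hypothetical source file). -lineinfo adds line-number debug info
# without disabling optimizations, unlike -G.
nvcc -lineinfo -O2 -c kernel.cu -o kernel.o
```

In a CMake build such as llama.cpp's, the equivalent effect comes from appending `-lineinfo` to the CUDA compile options, which is what this release's change does at the build-system level.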
b6136
CANN: GGML_OP_CPY optimization (#15070)
b6135
musa: fix failures in test-backend-ops for mul_mat_id op (#15236)
b6134
CANN: Add broadcast for softmax and FA (#15208)
b6133
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode.…
b6132
chat : hotfix gpt-oss jinja raising an exception (#15243)
b6131
server : allow specifying reasoning_format in HTTP request (#15238)
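A hedged sketch of what a per-request `reasoning_format` might look like in a chat completion payload sent to the llama.cpp HTTP server. The message content and the chosen value are illustrative; the set of accepted values mirrors the server's `--reasoning-format` CLI option, and the exact request-level semantics should be checked against the server docs for this release.

```python
import json

# Build a chat completion request body with a per-request
# reasoning_format override (illustrative value: "deepseek").
payload = {
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "reasoning_format": "deepseek",
}

# Serialize to JSON as it would be sent in the POST body.
body = json.dumps(payload)
print(body)
```

The body would typically be POSTed to the server's chat completion endpoint with `Content-Type: application/json`.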