Releases: ggml-org/llama.cpp

b5590

04 Jun 20:23
0d39844
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)

* ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position
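
Below is a minimal sketch of what the newly supported op looks like when built with the ggml C API. The tensor layouts follow the kernel/input convention I believe `test-backend-ops` uses and are illustrative; they are not taken from this PR.

```c
#include "ggml.h"

// Build a CONV_TRANSPOSE_1D node; with this release the resulting graph can
// be offloaded to the Vulkan backend instead of falling back to the CPU.
static struct ggml_tensor * build_conv_transpose_1d(struct ggml_context * ctx) {
    // kernel: [kernel_size, out_channels, in_channels] (assumed layout)
    struct ggml_tensor * kernel = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 3, 8, 4);
    // input:  [length, in_channels, batch] (assumed layout)
    struct ggml_tensor * input  = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 16, 4, 1);

    // stride = 2, padding = 0, dilation = 1
    return ggml_conv_transpose_1d(ctx, kernel, input, 2, 0, 1);
}
```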

b5589

04 Jun 16:34
3e63a58
kv-cache : refactor the update/defrag mechanism (#13988)

* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci
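
The refactor itself is internal, but for context, this is roughly how the update/defrag path is driven from the application side. The `llama_kv_self_*` names are what I believe `llama.h` exposes around this release; nothing in the sketch is specific to this PR.

```c
#include "llama.h"

// Schedule a KV-cache defragmentation and apply pending updates right away
// (K-shift, defrag) rather than waiting for the next llama_decode() call.
static void defrag_now(struct llama_context * ctx) {
    llama_kv_self_defrag(ctx); // mark the cache for defragmentation
    llama_kv_self_update(ctx); // apply any pending cache updates immediately
}
```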

b5588

04 Jun 14:21
2589ad3
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)

b5587

04 Jun 11:53
4825487
releases : use dl backend for linux release, remove arm64 linux relea…
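
The Linux binaries now ship with the dynamically loadable (dl) ggml backends. As a hedged illustration, this is roughly how an application discovers those backends at runtime through the ggml-backend registry; the calls are part of ggml's existing API, not something added by this change.

```c
#include <stdio.h>
#include "ggml-backend.h"

// Load the dynamic backend libraries found next to the executable and list
// them; which backends are present depends on the release package.
int main(void) {
    ggml_backend_load_all();

    for (size_t i = 0; i < ggml_backend_reg_count(); ++i) {
        ggml_backend_reg_t reg = ggml_backend_reg_get(i);
        printf("backend: %s\n", ggml_backend_reg_name(reg));
    }
    return 0;
}
```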

b5586

04 Jun 08:27
3ac6753
llama-graph : use ggml_repeat_4d (#13998)
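
For context: `ggml_repeat` needs a second tensor just to carry the target shape, while `ggml_repeat_4d` takes the four target dimensions directly. The sketch below shows the difference with illustrative shapes and the signature as I read it from `ggml.h` around this release.

```c
#include "ggml.h"

// Broadcast a column vector a: [n, 1] to [n, 8] in two equivalent ways.
static struct ggml_tensor * broadcast_cols(struct ggml_context * ctx,
                                           struct ggml_tensor  * a) {
    // classic form: a dummy tensor exists only to provide the target shape
    struct ggml_tensor * shape = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, a->ne[0], 8);
    struct ggml_tensor * r0    = ggml_repeat(ctx, a, shape);

    // new form: pass the target dimensions directly, no dummy allocation
    struct ggml_tensor * r1    = ggml_repeat_4d(ctx, a, a->ne[0], 8, 1, 1);

    (void) r0; // r0 and r1 describe the same result
    return r1;
}
```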

b5585

04 Jun 07:50
0b4be4c
CUDA: fix FTZ in FA for Gemma 3 (#13991)

b5584

04 Jun 07:45
e0e806f
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)

ggml-ci
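
The convention being fixed here is the one documented in `llama.h`: a negative `seq_id` matches every sequence, and negative positions mean the whole range. A hedged sketch follows; the function name is the `llama_kv_self_*` form I believe this release series uses.

```c
#include "llama.h"

// Drop everything past `keep_up_to` for *all* sequences in the unified KV
// cache: seq_id = -1 matches any sequence, p1 = -1 means "to the end".
static void drop_tail_all_seqs(struct llama_context * ctx, llama_pos keep_up_to) {
    llama_kv_self_seq_rm(ctx, /*seq_id =*/ -1, /*p0 =*/ keep_up_to, /*p1 =*/ -1);
}
```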

b5581

03 Jun 00:49
71e74a3
opencl: add `backend_synchronize` (#13939)

* This is not needed for normal use, where the result is read using `tensor_get`, but it allows the perf mode of `test-backend-ops` to measure performance properly.
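
To make that concrete: `ggml_backend_graph_compute` may return before the device has finished, so a wall-clock measurement has to wait on `ggml_backend_synchronize`. A small sketch with an illustrative timing helper:

```c
#include "ggml.h"
#include "ggml-backend.h"

// Time one graph evaluation (assumes ggml_time_init() has been called).
// Without the synchronize, the measurement would only cover kernel
// submission, not execution.
static int64_t time_graph_us(ggml_backend_t backend, struct ggml_cgraph * graph) {
    const int64_t t0 = ggml_time_us();

    ggml_backend_graph_compute(backend, graph); // may return asynchronously
    ggml_backend_synchronize(backend);          // wait for the device to finish

    return ggml_time_us() - t0;
}
```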

b5580

03 Jun 00:38
bfb1e01
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)

* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes
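
These are existing ggml graph ops that the OpenCL backend can now execute. A hedged sketch combining a few of them follows; shapes are illustrative, and whether a node actually lands on the OpenCL backend still depends on its `supports_op` checks.

```c
#include "ggml.h"

// Concatenate two [64, 32] tensors along dim 1, pad the second dimension by
// two rows of zeros, then apply tanh elementwise.
static struct ggml_tensor * demo_new_ops(struct ggml_context * ctx,
                                         struct ggml_tensor  * a,
                                         struct ggml_tensor  * b) {
    struct ggml_tensor * cat = ggml_concat(ctx, a, b, /*dim =*/ 1); // [64, 64]
    struct ggml_tensor * pad = ggml_pad(ctx, cat, 0, 2, 0, 0);      // [64, 66]
    return ggml_tanh(ctx, pad);
}
```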

b5579

02 Jun 19:30
3637576
server : disable speculative decoding for SWA models (#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models