Tags: ggml-org/llama.cpp

b5590

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)

* ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position
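For context on what the new Vulkan op computes: a transposed 1D convolution scatters each input element across the kernel footprint, producing an output of length `(len(x) - 1) * stride + len(w)`. The following is a minimal single-channel numpy sketch of those semantics only — it is not ggml's Vulkan shader, and the function name is ours:

```python
import numpy as np

def conv_transpose_1d(x, w, stride=1):
    # Naive single-channel transposed convolution (illustrative only).
    # Each input sample x[i] is scaled by the kernel w and accumulated
    # into the output starting at position i * stride.
    out = np.zeros((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        out[i * stride : i * stride + len(w)] += xi * w
    return out

# x = [1, 2], w = [1, 1, 1], stride 1 -> overlapping contributions sum:
# [1, 1+2, 1+2, 2] = [1, 3, 3, 2]
print(conv_transpose_1d(np.array([1.0, 2.0]), np.array([1.0, 1.0, 1.0])))
```

The multi-channel/batched cases that the new `test-backend-ops` tests cover follow the same scatter-accumulate pattern per input/output channel pair.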

b5589

kv-cache : refactor the update/defrag mechanism (#13988)

* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci

b5588

ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)

b5587

releases : use dl backend for linux release, remove arm64 linux release (#13996)

b5586

llama-graph : use ggml_repeat_4d (#13998)
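`ggml_repeat_4d` lets the graph request a repeat directly to explicit target dimensions `ne0..ne3` rather than matching another tensor's shape. A hedged numpy sketch of the assumed repeat-to-shape semantics (note that ggml orders dimensions fastest-first, `ne0` innermost, so the numpy shape tuple is reversed; the function name and dim handling here are our illustration, not ggml's code):

```python
import numpy as np

def repeat_4d(a, ne0, ne1, ne2, ne3):
    # Sketch: tile `a` up to the target shape. Each target dim is assumed
    # to be an integer multiple of the corresponding source dim.
    # numpy shape order is (d3, d2, d1, d0), the reverse of ggml's ne[].
    d3, d2, d1, d0 = a.shape
    reps = (ne3 // d3, ne2 // d2, ne1 // d1, ne0 // d0)
    return np.tile(a, reps)

# Repeat a (1,1,1,2) tensor to ne0=4, ne1=1, ne2=1, ne3=2,
# i.e. numpy shape (2, 1, 1, 4).
r = repeat_4d(np.ones((1, 1, 1, 2)), 4, 1, 1, 2)
print(r.shape)
```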

b5585

CUDA: fix FTZ in FA for Gemma 3 (#13991)
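FTZ here refers to flush-to-zero: hardware (or a compiled kernel) treating subnormal floating-point values as exactly zero, which can change results in numerically sensitive paths like flash attention. A small stdlib-only illustration of the behavior itself — this models the concept, not the CUDA kernel that was fixed:

```python
import sys

def ftz(x):
    # Flush-to-zero: any nonzero value smaller in magnitude than the
    # smallest *normal* float is replaced by 0.0. (Illustration only;
    # on GPUs this is a hardware/compiler mode, not a branch like this.)
    return 0.0 if 0 < abs(x) < sys.float_info.min else x

print(ftz(1e-320))  # subnormal for float64 -> flushed to 0.0
print(ftz(1.0))     # normal value -> unchanged
```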

b5584

kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)

ggml-ci

b5581

opencl: add `backend_synchronize` (#13939)

* This is not needed in the normal use case, where the result is read
  via `tensor_get`, but it allows the perf mode of `test-backend-ops`
  to properly measure performance.
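The point of the bullet above: with an asynchronous backend queue, timing only the enqueue call measures submission cost, not execution. A `synchronize` hook lets the benchmark wait for completion before stopping the clock. A toy stand-in using a thread as the "device" (the class and names are ours, not the OpenCL backend's API):

```python
import threading
import time

class AsyncBackend:
    # Toy async device queue: enqueue returns immediately,
    # synchronize blocks until all submitted work has finished.
    def __init__(self):
        self._threads = []

    def enqueue(self, fn):
        t = threading.Thread(target=fn)
        t.start()
        self._threads.append(t)

    def synchronize(self):
        for t in self._threads:
            t.join()
        self._threads.clear()

backend = AsyncBackend()
t0 = time.perf_counter()
backend.enqueue(lambda: time.sleep(0.05))   # "kernel" runs asynchronously
enqueue_time = time.perf_counter() - t0     # tiny: only submission cost
backend.synchronize()                        # wait for actual completion
total_time = time.perf_counter() - t0        # includes the kernel's 0.05 s
```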

b5580

OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)

* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes

b5579

server : disable speculative decoding for SWA models (#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models