Tags · rpatil524/llama.cpp

b5902

model : add Kimi-K2 support (ggml-org#14654)

* Kimi-K2 conversion

* add Kimi_K2 pre type

* Kimi-K2

* Kimi-K2 unicode

* Kimi-K2

* LLAMA_MAX_EXPERTS 384

* fix vocab iteration

* regex space fix

* add kimi-k2 to pre_computed_hashes

* Updated with kimi-k2 get_vocab_base_pre hash

* fix whitespaces

* fix flake errors

* remove more unicode.cpp whitespaces

* change set_vocab() flow

* add moonshotai-Kimi-K2.jinja to /models/templates/

* update moonshotai-Kimi-K2.jinja

* add kimi-k2 chat template

* add kimi-k2

* update NotImplementedError

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* except Exception

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Jul 15, 2025
4a4f426
zip
tar.gz
Downloads

b5898

cuda: fix build warnings in set-rows.cu (unused variable) (ggml-org#14687)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

Jul 15, 2025
cbc68be
zip
tar.gz
Downloads

b5897

sycl: Hotfix for non dnnl codepath (ggml-org#14677)

Jul 14, 2025
bdca383
zip
tar.gz
Downloads

b5583

vulkan: fix warnings in perf logger querypool code (ggml-org#13937)

Jun 3, 2025
7e00e60
zip
tar.gz
Downloads

b5581

opencl: add `backend_synchronize` (ggml-org#13939)

* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.

Jun 2, 2025
71e74a3
zip
tar.gz
Downloads

b5579

server : disable speculative decoding for SWA models (ggml-org#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

Jun 2, 2025
3637576
zip
tar.gz
Downloads

b5575

mtmd : fix memory leak in mtmd_helper_eval_chunk_single (ggml-org#13961)

* mtmd : fix memory in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Jun 2, 2025
bfd3227
zip
tar.gz
Downloads

b5572

gguf: fix failure on version == 0 (ggml-org#13956)

Jun 1, 2025
7675c55
zip
tar.gz
Downloads

b5569

ggml: check if non-native endian model is being loaded (ggml-org#13943)

* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: update error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Jun 1, 2025
e57bb87
zip
tar.gz
Downloads
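
For readers unfamiliar with what a non-native endian check involves, the following is a minimal sketch and not the code from the commit above. It assumes the GGUF header starts with the 4-byte magic `GGUF` followed by a 32-bit version field, and that a byte-swapped small version number shows up as a value whose low 16 bits are zero. The function name `check_gguf_header` and the error messages are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"


def check_gguf_header(header: bytes) -> int:
    """Return the version if the header looks native-endian, else raise ValueError.

    Illustrative only: the low-16-bit heuristic is an assumption, not a
    statement of what ggml actually does.
    """
    if len(header) < 8:
        raise ValueError("header too short")
    if header[:4] != GGUF_MAGIC:
        raise ValueError("bad magic")
    # "=I" reads an unsigned 32-bit integer in the host's byte order.
    (version,) = struct.unpack("=I", header[4:8])
    if version == 0:
        raise ValueError("bad GGUF version: 0")
    # A small version written in the opposite byte order lands in the high
    # bytes, leaving the low 16 bits empty.
    if version & 0xFFFF == 0:
        raise ValueError(
            "implausibly large version; host and model endianness may differ"
        )
    return version
```

A header written in the host's byte order returns its version, while the same header with the version bytes reversed raises `ValueError`.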

b5561

readme : update bindings (ggml-org#13950)

Jun 1, 2025
8726392
zip
tar.gz
Downloads
