Tags: rpatil524/llama.cpp

b5902

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
model : add Kimi-K2 support (ggml-org#14654)

* Kimi-K2 conversion

* add Kimi_K2  pre type

* Kimi-K2

* Kimi-K2 unicode

* Kimi-K2

* LLAMA_MAX_EXPERTS 384

* fix vocab iteration

* regex space fix

* add kimi-k2 to pre_computed_hashes

* Updated with kimi-k2 get_vocab_base_pre hash

* fix whitespaces

* fix flake errors

* remove more unicode.cpp whitespaces

* change set_vocab() flow

* add moonshotai-Kimi-K2.jinja to /models/templates/

* update moonshotai-Kimi-K2.jinja

* add kimi-k2 chat template

* add kimi-k2

* update NotImplementedError

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* except Exception

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

b5898

Verified

cuda: fix build warnings in set-rows.cu (unused variable) (ggml-org#14687)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

b5897

Verified

sycl: Hotfix for non dnnl codepath (ggml-org#14677)

b5583

Verified

vulkan: fix warnings in perf logger querypool code (ggml-org#13937)

b5581

Verified

opencl: add `backend_synchronize` (ggml-org#13939)

* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.

b5579

Verified

server : disable speculative decoding for SWA models (ggml-org#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

b5575

Verified

mtmd : fix memory leak in mtmd_helper_eval_chunk_single (ggml-org#13961)

* mtmd : fix memory in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

b5572

Verified

gguf: fix failure on version == 0 (ggml-org#13956)

b5569

Verified

ggml: check if non-native endian model is being loaded (ggml-org#13943)

* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: update error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
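
For readers unfamiliar with what a non-native endian check (b5569) and a version == 0 guard (b5572) might involve, here is a minimal Python sketch. It is not the code from either commit. It assumes only that a GGUF file begins with the 4-byte magic `GGUF` followed by a 32-bit version field. The function name `check_gguf_header` and the low-16-bits heuristic are illustrative assumptions.

```python
import struct
import sys

GGUF_MAGIC = b"GGUF"  # assumed: the first four bytes of a GGUF file


def check_gguf_header(header: bytes) -> int:
    """Return the version field, or raise ValueError for a suspicious header.

    Hypothetical helper for illustration; not part of llama.cpp or ggml.
    """
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")

    # Read the version in host byte order, as a native loader would.
    (version,) = struct.unpack("=I", header[4:8])

    # Handle zero explicitly so it is not mistaken for an endianness problem.
    if version == 0:
        raise ValueError("bad GGUF version: 0")

    # Heuristic (assumption): real versions are small integers, so a value
    # whose low 16 bits are all zero was most likely written byte-swapped.
    if (version & 0x0000FFFF) == 0:
        raise ValueError(
            f"GGUF version {version} is implausibly large; "
            "possible host/model endianness mismatch"
        )
    return version


if __name__ == "__main__":
    native = GGUF_MAGIC + struct.pack("=I", 3)
    swapped_fmt = ">I" if sys.byteorder == "little" else "<I"
    swapped = GGUF_MAGIC + struct.pack(swapped_fmt, 3)
    print(check_gguf_header(native))
    try:
        check_gguf_header(swapped)
    except ValueError as err:
        print("rejected:", err)
```

The zero check comes first because zero also has all-zero low bits; without that ordering, a corrupt version field would be reported as an endianness mismatch.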

b5561

Verified

readme : update bindings (ggml-org#13950)