Tags: gitgat/llama.cpp

b1621

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
grammar : revert the replacement of llama_token_to_piece with id_to_token (ggml-org#4396)

b1620

sync : ggml (new ops, tests, backend, etc.) (ggml-org#4359)

* sync : ggml (part 1)

* sync : ggml (part 2, CUDA)

* sync : ggml (part 3, Metal)

* ggml : build fixes

ggml-ci

* cuda : restore lost changes

* cuda : restore lost changes (StableLM rope)

* cmake : enable separable compilation for CUDA

ggml-ci

* ggml-cuda : remove device side dequantize

* Revert "cmake : enable separable compilation for CUDA"

This reverts commit 09e35d0.

* cuda : remove assert for rope

* tests : add test-backend-ops

* ggml : fix bug in ggml_concat

* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`

* ci : try to fix macOS

* ggml-backend : remove backend self-registration

* ci : disable Metal for macOS cmake build

ggml-ci

* metal : fix "supports family" call

* metal : fix assert

* metal : print resource path

ggml-ci

---------

Co-authored-by: slaren <slarengh@gmail.com>

b1619

llama : per-layer KV cache + quantum K cache (ggml-org#4309)

* per-layer KV

* remove unnecessary copies

* less code duplication, offload k and v separately

* llama : offload KV cache per-layer

* llama : offload K shift tensors

* llama : offload for rest of the model arches

* llama : enable offload debug temporarily

* llama : keep the KV related layers on the device

* llama : remove mirrors, perform Device -> Host when partial offload

* common : add command-line arg to disable KV cache offloading

* llama : update session save/load

* llama : support quantum K cache (ggml-org#4312)

* llama : support quantum K cache (wip)

* metal : add F32 -> Q8_0 copy kernel

* cuda : add F32 -> Q8_0 copy kernel

ggml-ci

* cuda : use mmv kernel for quantum cache ops

* llama : pass KV cache type through API

* llama : fix build

ggml-ci

* metal : add F32 -> Q4_0 copy kernel

* metal : add F32 -> Q4_1 copy kernel

* cuda : wip

* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels

* llama-bench : support type_k/type_v

* metal : use mm kernel only for quantum KV cache

* cuda : add comment

* llama : remove memory_f16 and kv_f16 flags

---------

Co-authored-by: slaren <slarengh@gmail.com>

* readme : add API change notice

---------

Co-authored-by: slaren <slarengh@gmail.com>

b1618

train : fix ggml-org#4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (ggml-org#4351)

On commit b1108 (44c117f) xaedes added

    ggml_allocr * alloc = NULL;

    ... (many lines in between)

    if (alloc) {
        ggml_allocr_free(alloc);
    }

This is correct, but it is easy to lose track of the state of alloc across the many lines in between.

On commit b1287 (0e76a899) xaedes made a big change. From here on, alloc is freed eagerly.

    alloc = ggml_allocr_new(...)
    ... (short lines of code)
    ggml_allocr_free(alloc)

This happens a few times, but alloc is never set to NULL, and many lines below,
we still have

    if (alloc) {
        ggml_allocr_free(alloc);
    }

which causes a double-free.
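
The pattern above can be sketched in isolation. This is a minimal illustration, not the code from the actual fix: mock_allocr, mock_allocr_new and mock_allocr_free are hypothetical stand-ins for ggml_allocr and its functions, and resetting the pointer after each eager free is one common way to keep the final guarded cleanup safe.

```cpp
#include <cassert>

// Hypothetical stand-in for ggml_allocr; it only records how often it is freed.
struct mock_allocr {
    int * free_count;
};

static mock_allocr * mock_allocr_new(int * free_count) {
    return new mock_allocr{free_count};
}

static void mock_allocr_free(mock_allocr * alloc) {
    ++*alloc->free_count; // count the free before releasing the object
    delete alloc;
}

// Eager alloc/free cycles followed by the guarded cleanup from the snippet above.
// Returns the number of times the mock allocator was freed.
static int run_with_reset() {
    int frees = 0;
    mock_allocr * alloc = nullptr;

    for (int i = 0; i < 3; ++i) {
        alloc = mock_allocr_new(&frees);
        // ... (short lines of code)
        mock_allocr_free(alloc);
        alloc = nullptr; // without this reset, the guard below would free a dangling pointer
    }

    // ... (many lines in between)

    if (alloc) {
        mock_allocr_free(alloc); // skipped, because alloc was reset after each eager free
    }
    return frees;
}
```

Calling run_with_reset() performs three allocations and exactly three frees; the final guard does nothing because alloc is NULL by the time it is reached.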

b1617

server : recognize cache_prompt parameter in OAI API (ggml-org#4347)

b1616

Verified

This commit was signed with the committer’s verified signature.
ggerganov Georgi Gerganov
common : fix compile warning

b1615

speculative : support `--color` (ggml-org#4343)

* speculative: add some colors

* minor : add braces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

b1614

grammar : pre-computed pieces + reserve mem + less string copies (ggml-org#4330)

* reserve space for codepoints

* improvement for the appended 0

* used precomputed token text for grammar sample

* reserve candidates_decoded

* reserve candidates_grammar

* remove candidates_decoded

* Revert "remove candidates_decoded"

This reverts commit 3773328.

* changed decode_utf8 to take src by ref

b1613

llama : allow overriding GGUF metadata when loading model (ggml-org#4092)

* feat: Allow overriding GGUF metadata when loading model

* Fix the one time GCC is stricter than clang about something

* Step1

* Refactor... basically everything!

* Nuke obsolete GetArrayLen struct

* simplify std::string specialization

* Various cleanups

Add informational output when overrides are applied

Warn user when an override with the wrong type is specified

* Fix broken logic for parsing bool KV overrides
Fix issue where overrides didn't apply when key missing in GGUF metadata
Resolve merge changes

* llama : rearrange model params

* Update new GET_KEY call

Add note that metadata KV overrides aren't reflected in initial metadata KV info dump

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

b1612

sampling : custom samplers order (ggml-org#4285)

* Samplers sequence order w parameter

* Cleaned commented code

* Fixed formatting

* Rewrote with unordered_map

* Revert and rewrite, too many problems and safeguards would be needed

* Fixed code style

* Code style fixes according to review

* More readable samplers input string, fixed help

* Style fix in sampler_queue

* Formatting fixes

* Fixing whitespaces