Tags · gitgat/llama.cpp

b1621

grammar : revert the replacement of llama_token_to_piece with id_to_t…

…oken (ggml-org#4396)

Dec 9, 2023
e18f734
zip
tar.gz

b1620

sync : ggml (new ops, tests, backend, etc.) (ggml-org#4359)

* sync : ggml (part 1)

* sync : ggml (part 2, CUDA)

* sync : ggml (part 3, Metal)

* ggml : build fixes

ggml-ci

* cuda : restore lost changes

* cuda : restore lost changes (StableLM rope)

* cmake : enable separable compilation for CUDA

ggml-ci

* ggml-cuda : remove device side dequantize

* Revert "cmake : enable separable compilation for CUDA"

This reverts commit 09e35d0.

* cuda : remove assert for rope

* tests : add test-backend-ops

* ggml : fix bug in ggml_concat

* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`

* ci : try to fix macOS

* ggml-backend : remove backend self-registration

* ci : disable Metal for macOS cmake build

ggml-ci

* metal : fix "supports family" call

* metal : fix assert

* metal : print resource path

ggml-ci

---------

Co-authored-by: slaren <slarengh@gmail.com>

Dec 7, 2023
fe680e3
zip
tar.gz

b1619

llama : per-layer KV cache + quantum K cache (ggml-org#4309)

* per-layer KV

* remove unnecessary copies

* less code duplication, offload k and v separately

* llama : offload KV cache per-layer

* llama : offload K shift tensors

* llama : offload for rest of the model arches

* llama : enable offload debug temporarily

* llama : keep the KV related layers on the device

* llama : remove mirrors, perform Device -> Host when partial offload

* common : add command-line arg to disable KV cache offloading

* llama : update session save/load

* llama : support quantum K cache (ggml-org#4312)

* llama : support quantum K cache (wip)

* metal : add F32 -> Q8_0 copy kernel

* cuda : add F32 -> Q8_0 copy kernel

ggml-ci

* cuda : use mmv kernel for quantum cache ops

* llama : pass KV cache type through API

* llama : fix build

ggml-ci

* metal : add F32 -> Q4_0 copy kernel

* metal : add F32 -> Q4_1 copy kernel

* cuda : wip

* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels

* llama-bench : support type_k/type_v

* metal : use mm kernel only for quantum KV cache

* cuda : add comment

* llama : remove memory_f16 and kv_f16 flags

---------

Co-authored-by: slaren <slarengh@gmail.com>

* readme : add API change notice

---------

Co-authored-by: slaren <slarengh@gmail.com>

Dec 7, 2023
bcc0eb4
zip
tar.gz

b1618

train : fix ggml-org#4227 (double free in examples/train-text-from-sc…

…ratch/train-text-from-scratch.cpp) (ggml-org#4351)

On commit b1108 (44c117f) xaedes added

    ggml_allocr * alloc = NULL;

    ... (many lines in between)

    if (alloc) {
        ggml_allocr_free(alloc);
    }

Which is correct, but it's easy to lose context after many lines in between.

On commit b1287 (0e76a899) xaedes made a big change. From here on, alloc is freed eagerly.

    alloc = ggml_allocr_new(...)
    ... (short lines of code)
    ggml_allocr_free(alloc)

This happens a few times, but alloc is never set to NULL, and many lines below,
we still have

    if (alloc) {
        ggml_allocr_free(alloc);
    }

which causes a double-free.

Dec 7, 2023
81bc921
zip
tar.gz

b1617

server : recognize cache_prompt parameter in OAI API (ggml-org#4347)

Dec 6, 2023
05cd6e5
zip
tar.gz

b1616

common : fix compile warning

Dec 6, 2023
caa9249
zip
tar.gz

b1615

speculative : support `--color` (ggml-org#4343)

* speculative: add some colors

* minor : add braces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Dec 6, 2023
da5eaef
zip
tar.gz

b1614

grammar : pre-computed pieces + reserve mem + less string copies (ggm…

…l-org#4330)

* reserve space for codepoints

* improvement for the appended 0

* used precomputed token text for grammar sample

* reserve canidates_decoded

* reserve canidates_grammar

* remove candidates_decoded

* Revert "remove candidates_decoded"

This reverts commit 3773328.

* changed decode_utf8 to take src by ref

Dec 5, 2023
5f6e0c0
zip
tar.gz

b1613

llama : allow overriding GGUF metadata when loading model (ggml-org#4092

)

* feat: Allow overriding GGUF metadata when loading model

* Fix the one time GCC is stricter than clang about something

* Step1

* Refactor... basically everything!

* Nuke obsolete GetArrayLen struct

* simplify std::string specialization

* Various cleanups

Add informational output when overrides are applied

Warn user when an override with the wrong type is specified

* Fix broken logic for parsing bool KV overrides
Fix issue where overrides didn't apply when key missing in GGUF metadata
Resolve merge changes

* llama : rearrange model params

* Update new GET_KEY call

Add note that metadata KV overrides aren't reflected in initial metadata KV info dump

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Dec 5, 2023
5aa365d
zip
tar.gz

b1612

sampling : custom samplers order (ggml-org#4285)

* Samplers sequence order w parameter

* Cleaned commented code

* Fixed formatting

* Rewrote with unordered_map

* Revert and rewrite, too many problems and safeguards would be needed

* Fixed code style

* Code style fixes according to review

* More readable samplers input string, fixed help

* Style fix in sampler_queue

* Formatting fixes

* Fixing whitespaces

Dec 5, 2023
52c8bc3
zip
tar.gz

PreviousNext

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

b1621

b1620

b1619

b1618

b1617

b1616

b1615

b1614

b1613

b1612

Tags: gitgat/llama.cpp