Tags: gitgat/llama.cpp
Tags
grammar : revert the replacement of llama_token_to_piece with id_to_t… …oken (ggml-org#4396)
sync : ggml (new ops, tests, backend, etc.) (ggml-org#4359) * sync : ggml (part 1) * sync : ggml (part 2, CUDA) * sync : ggml (part 3, Metal) * ggml : build fixes ggml-ci * cuda : restore lost changes * cuda : restore lost changes (StableLM rope) * cmake : enable separable compilation for CUDA ggml-ci * ggml-cuda : remove device side dequantize * Revert "cmake : enable separable compilation for CUDA" This reverts commit 09e35d0. * cuda : remove assert for rope * tests : add test-backend-ops * ggml : fix bug in ggml_concat * ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()` * ci : try to fix macOS * ggml-backend : remove backend self-registration * ci : disable Metal for macOS cmake build ggml-ci * metal : fix "supports family" call * metal : fix assert * metal : print resource path ggml-ci --------- Co-authored-by: slaren <slarengh@gmail.com>
llama : per-layer KV cache + quantum K cache (ggml-org#4309) * per-layer KV * remove unnecessary copies * less code duplication, offload k and v separately * llama : offload KV cache per-layer * llama : offload K shift tensors * llama : offload for rest of the model arches * llama : enable offload debug temporarily * llama : keep the KV related layers on the device * llama : remove mirrors, perform Device -> Host when partial offload * common : add command-line arg to disable KV cache offloading * llama : update session save/load * llama : support quantum K cache (ggml-org#4312) * llama : support quantum K cache (wip) * metal : add F32 -> Q8_0 copy kernel * cuda : add F32 -> Q8_0 copy kernel ggml-ci * cuda : use mmv kernel for quantum cache ops * llama : pass KV cache type through API * llama : fix build ggml-ci * metal : add F32 -> Q4_0 copy kernel * metal : add F32 -> Q4_1 copy kernel * cuda : wip * cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels * llama-bench : support type_k/type_v * metal : use mm kernel only for quantum KV cache * cuda : add comment * llama : remove memory_f16 and kv_f16 flags --------- Co-authored-by: slaren <slarengh@gmail.com> * readme : add API change notice --------- Co-authored-by: slaren <slarengh@gmail.com>
train : fix ggml-org#4227 (double free in examples/train-text-from-sc… …ratch/train-text-from-scratch.cpp) (ggml-org#4351) On commit b1108 (44c117f) xaedes added ggml_allocr * alloc = NULL; ... (many lines in between) if (alloc) { ggml_allocr_free(alloc); } Which is correct, but it's easy to lose context after many lines in between. On commit b1287 (0e76a899) xaedes made a big change. From here on, alloc is freed eagerly. alloc = ggml_allocr_new(...) ... (short lines of code) ggml_allocr_free(alloc) This happens a few times, but alloc is never set to NULL, and many lines below, we still have if (alloc) { ggml_allocr_free(alloc); } which causes a double-free.
server : recognize cache_prompt parameter in OAI API (ggml-org#4347)
speculative : support `--color` (ggml-org#4343) * speculative: add some colors * minor : add braces --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
grammar : pre-computed pieces + reserve mem + less string copies (ggm… …l-org#4330) * reserve space for codepoints * improvement for the appended 0 * used precomputed token text for grammar sample * reserve canidates_decoded * reserve canidates_grammar * remove candidates_decoded * Revert "remove candidates_decoded" This reverts commit 3773328. * changed decode_utf8 to take src by ref
llama : allow overriding GGUF metadata when loading model (ggml-org#4092 ) * feat: Allow overriding GGUF metadata when loading model * Fix the one time GCC is stricter than clang about something * Step1 * Refactor... basically everything! * Nuke obsolete GetArrayLen struct * simplify std::string specialization * Various cleanups Add informational output when overrides are applied Warn user when an override with the wrong type is specified * Fix broken logic for parsing bool KV overrides Fix issue where overrides didn't apply when key missing in GGUF metadata Resolve merge changes * llama : rearrange model params * Update new GET_KEY call Add note that metadata KV overrides aren't reflected in initial metadata KV info dump --------- Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
sampling : custom samplers order (ggml-org#4285) * Samplers sequence order w parameter * Cleaned commented code * Fixed formatting * Rewrote with unordered_map * Revert and rewrite, too many problems and safeguards would be needed * Fixed code style * Code style fixes according to review * More readable samplers input string, fixed help * Style fix in sampler_queue * Formatting fixes * Fixing whitespaces
PreviousNext