Insights: ggml-org/llama.cpp
Overview
17 Releases published by 1 person
- b5578 (published Jun 2, 2025)
- b5579 (published Jun 2, 2025)
- b5580 (published Jun 3, 2025)
- b5581 (published Jun 3, 2025)
- b5584 (published Jun 4, 2025)
- b5585 (published Jun 4, 2025)
- b5586 (published Jun 4, 2025)
- b5587 (published Jun 4, 2025)
- b5588 (published Jun 4, 2025)
- b5589 (published Jun 4, 2025)
- b5590 (published Jun 4, 2025)
- b5591 (published Jun 5, 2025)
- b5592 (published Jun 5, 2025)
- b5593 (published Jun 5, 2025)
- b5595 (published Jun 5, 2025)
- b5596 (published Jun 5, 2025)
- b5598 (published Jun 5, 2025)
20 Pull requests merged by 12 people
- gguf-py : add add_classifier_output_labels method to writer (#14031, merged Jun 5, 2025)
- vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001, merged Jun 5, 2025)
- Fix CUDA build failure on AutoDL cloud platforms (#14005, merged Jun 5, 2025)
- memory : migrate from llama_kv_cache to more generic llama_memory (#14006, merged Jun 5, 2025)
- llama : allow using mmap without PrefetchVirtualMemory (#14013, merged Jun 5, 2025)
- chore: added badge and link to release (#13938, merged Jun 5, 2025)
- vocab : warn about missing mask token (#14022, merged Jun 5, 2025)
- context : fix pos_min initialization upon decode error (#14008, merged Jun 5, 2025)
- vulkan: automatically deduce size of push constants (#13936, merged Jun 5, 2025)
- ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813, merged Jun 4, 2025)
- kv-cache : refactor the update/defrag mechanism (#13988, merged Jun 4, 2025)
- ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997, merged Jun 4, 2025)
- releases : use dl backend for linux release, remove arm64 linux release (#13996, merged Jun 4, 2025)
- llama-graph : use ggml_repeat_4d (#13998, merged Jun 4, 2025)
- CUDA: fix FTZ in FA for Gemma 3 (#13991, merged Jun 4, 2025)
- kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985, merged Jun 4, 2025)
- vulkan: fix warnings in perf logger querypool code (#13937, merged Jun 3, 2025)
- docs : add "Quick start" section for new users (#13862, merged Jun 3, 2025)
- opencl: add backend_synchronize (#13939, merged Jun 2, 2025)
- OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840, merged Jun 2, 2025)
17 Pull requests opened by 14 people
- Hybrid recurrent cache (#13979, opened Jun 2, 2025)
- llama : allow building all tests on windows when not using shared libs (#13980, opened Jun 2, 2025)
- chore(server): split context-server to its own file (#13987, opened Jun 3, 2025)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, opened Jun 4, 2025)
- opencl: preliminary support for Q4_0 mul_mat_id using matvec (#14003, opened Jun 4, 2025)
- llama-chat : Do not throw when tool parsing fails (#14012, opened Jun 4, 2025)
- llama: Attempt to add ModernBert (#14014, opened Jun 4, 2025)
- server: Enable mtmd in llama-server `/completion` endpoint (#14016, opened Jun 4, 2025)
- tests : add test-tokenizers-repo (#14017, opened Jun 4, 2025)
- ggml-cpu: fix uncaught underscore terminators for s390x (#14023, opened Jun 5, 2025)
- llama : support qwen3 rerank and embeddings (#14029, opened Jun 5, 2025)
- llama : deprecate llama_kv_self_ API (#14030, opened Jun 5, 2025)
- cpu: Update RISC-V condition to require GCC version 14 or higher (#14032, opened Jun 5, 2025)
- cuda : fix device sync on buffer clear (#14033, opened Jun 5, 2025)
- sycl: Adding additional cpy dbg print output (#14034, opened Jun 5, 2025)
- llama : add thread safety test (#14035, opened Jun 5, 2025)
- ggml-cpu: optimise assembly calls for hsum on s390x (#14037, opened Jun 5, 2025)
14 Issues closed by 7 people
- Eval bug: std::runtime_error Invalid diff: (#13876, closed Jun 5, 2025)
- Compile bug: Race condition during compilation; works with -j 1 but not with -j 8 (#13993, closed Jun 5, 2025)
- Bug: MinGW build fails to load models with "error loading model: PrefetchVirtualMemory unavailable" (#9311, closed Jun 5, 2025)
- Eval bug: llama-server -hf nomic-ai/nomic-embed-text-v2-moe-GGUF --embeddings, broken on latest version (#14021, closed Jun 5, 2025)
- Compile bug: Prooted Debian in Droid Termux only (#12452, closed Jun 5, 2025)
- [Build] Some build options/definitions seem missing in ggml-base (#13017, closed Jun 5, 2025)
- Feature Request: Ability to pack multiple GGUFs into a single one (#13028, closed Jun 5, 2025)
- Eval bug: Error when loading `bge-reranker-v2-gemma` model (#13041, closed Jun 5, 2025)
- Misc. bug: new kv cell seq implementation does not handle "seq_id = -1" specified in the API (#13983, closed Jun 4, 2025)
- Eval bug: OpenAI-incompatible image handling in server multimodal (#12947, closed Jun 4, 2025)
- Perplexity script for non-GGUF quantization (#13015, closed Jun 4, 2025)
- Eval bug: RWKV inference issue with llama-server (#13018, closed Jun 4, 2025)
- Container images in GHCR registry are not multi-arch (#13995, closed Jun 3, 2025)
- Misc. bug: llama-server hasn't displayed the thought process since b5576 (#13981, closed Jun 3, 2025)
14 Issues opened by 12 people
- Misc. bug: Mistral NeMo Instruct 2407 - Failed to infer a tool call example (possible template bug) (#14038, opened Jun 5, 2025)
- Feature Request: add a new repo for conversion of GGUF (#14027, opened Jun 5, 2025)
- Feature Request: support FP8 data type in llama.cpp (#14020, opened Jun 5, 2025)
- Misc. bug: "error: invalid argument: /bin/sh" when using Docker image (#14019, opened Jun 5, 2025)
- llama.cpp error when using the snowflake-arctic-embed-v2 model (#14018, opened Jun 4, 2025)
- Feature Request: Support Llama-Nemotron-Nano-VL-8B-V1 (#14015, opened Jun 4, 2025)
- Compile bug: numerous deprecation warnings when compiling in Termux (#14011, opened Jun 4, 2025)
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007, opened Jun 4, 2025)
- Feature Request: allow spacebar to confirm web UI prompts [like the delete-chat confirmation] (#13999, opened Jun 3, 2025)
- Compile bug: (#13992, opened Jun 3, 2025)
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990, opened Jun 3, 2025)
- Feature Request: (#13989, opened Jun 3, 2025)
- Misc. bug: sentencepiece not included in requirements.txt (#13982, opened Jun 3, 2025)
40 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- sycl: Add reorder to Q6_K mmvq implementation (#13885, commented on Jun 5, 2025; 17 new comments)
- finetune.cpp command-line arg (#13873, commented on Jun 5, 2025; 13 new comments)
- ggml-cpu : split arch-specific implementations (#13892, commented on Jun 5, 2025; 7 new comments)
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973, commented on Jun 4, 2025; 4 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, commented on Jun 5, 2025; 3 new comments)
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects (#13792, commented on Jun 5, 2025; 3 new comments)
- llama : support multiple classifier outputs and labels (#13940, commented on Jun 5, 2025; 3 new comments)
- SYCL: Implement few same quantized type copy kernels (#13739, commented on Jun 4, 2025; 1 new comment)
- Need to undefine "hz" on AIX (#13894, commented on Jun 4, 2025; 0 new comments)
- remove WIP since PR has been merged (#13912, commented on Jun 4, 2025; 0 new comments)
- [Ascend NPU] Enable labeler (#13914, commented on Jun 4, 2025; 0 new comments)
- musa: enable fp16 mma (all) and cublas on qy2 (#13842, commented on Jun 4, 2025; 0 new comments)
- ggml : add ggml_fill() (#13772, commented on Jun 4, 2025; 0 new comments)
- add GGML_USE_NUMA_MIGRATE feature to optimize cross-NUMA op computation (#13649, commented on Jun 3, 2025; 0 new comments)
- Granite Four (#13550, commented on Jun 4, 2025; 0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529, commented on Jun 5, 2025; 0 new comments)
- feat(server): Add tool call support to WebUI (Llama Server) (#13501, commented on Jun 3, 2025; 0 new comments)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jun 5, 2025; 0 new comments)
- ci: add LoongArch cross-compile build (#13944, commented on Jun 4, 2025; 0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" (#13963, commented on Jun 2, 2025; 0 new comments)
- Compile bug: paths with spaces fail on Unix with Vulkan backend (#13288, commented on Jun 3, 2025; 0 new comments)
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models (#13247, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: Compilation with OpenCL on latest build (#13300, commented on Jun 4, 2025; 0 new comments)
- Eval bug: SIGILL (#13161, commented on Jun 4, 2025; 0 new comments)
- Compile bug: Cannot convert from char8_t to char* in llama-chat.cpp (#12740, commented on Jun 4, 2025; 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Can't run Qwen3-32B Q4_K_XL (#13298, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server (#13825, commented on Jun 4, 2025; 0 new comments)
- Misc. bug: -TS doesn't support more than ? Devices (#13293, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Custom model error. (#13318, commented on Jun 5, 2025; 0 new comments)
- Eval bug: Qwen3 30B A3B is slow with CUDA (#13211, commented on Jun 5, 2025; 0 new comments)
- Feature Request: Tensor parallelism (--split-mode row) over RPC (#13083, commented on Jun 5, 2025; 0 new comments)
- Feature Request: s390x CI (#13243, commented on Jun 5, 2025; 0 new comments)
- Eval bug: Cannot load Qwen3 ranking models (#13820, commented on Jun 5, 2025; 0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310, commented on Jun 5, 2025; 0 new comments)
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366, commented on Jun 5, 2025; 0 new comments)