Insights: ggml-org/llama.cpp
Overview
17 Releases published by 1 person
- b5578 (published Jun 2, 2025)
- b5579 (published Jun 2, 2025)
- b5580 (published Jun 3, 2025)
- b5581 (published Jun 3, 2025)
- b5584 (published Jun 4, 2025)
- b5585 (published Jun 4, 2025)
- b5586 (published Jun 4, 2025)
- b5587 (published Jun 4, 2025)
- b5588 (published Jun 4, 2025)
- b5589 (published Jun 4, 2025)
- b5590 (published Jun 4, 2025)
- b5591 (published Jun 5, 2025)
- b5592 (published Jun 5, 2025)
- b5593 (published Jun 5, 2025)
- b5595 (published Jun 5, 2025)
- b5596 (published Jun 5, 2025)
- b5598 (published Jun 5, 2025)
20 Pull requests merged by 12 people
- gguf-py : add add_classifier_output_labels method to writer (#14031, merged Jun 5, 2025)
- vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001, merged Jun 5, 2025)
- Fix CUDA build failure on AutoDL cloud platforms (#14005, merged Jun 5, 2025)
- memory : migrate from llama_kv_cache to more generic llama_memory (#14006, merged Jun 5, 2025)
- llama : allow using mmap without PrefetchVirtualMemory (#14013, merged Jun 5, 2025)
- chore: added badge and link to release (#13938, merged Jun 5, 2025)
- vocab : warn about missing mask token (#14022, merged Jun 5, 2025)
- context : fix pos_min initialization upon decode error (#14008, merged Jun 5, 2025)
- vulkan: automatically deduce size of push constants (#13936, merged Jun 5, 2025)
- ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813, merged Jun 4, 2025)
- kv-cache : refactor the update/defrag mechanism (#13988, merged Jun 4, 2025)
- ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997, merged Jun 4, 2025)
- releases : use dl backend for linux release, remove arm64 linux release (#13996, merged Jun 4, 2025)
- llama-graph : use ggml_repeat_4d (#13998, merged Jun 4, 2025)
- CUDA: fix FTZ in FA for Gemma 3 (#13991, merged Jun 4, 2025)
- kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985, merged Jun 4, 2025)
- vulkan: fix warnings in perf logger querypool code (#13937, merged Jun 3, 2025)
- docs : add "Quick start" section for new users (#13862, merged Jun 3, 2025)
- opencl: add backend_synchronize (#13939, merged Jun 2, 2025)
- OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840, merged Jun 2, 2025)
17 Pull requests opened by 14 people
- Hybrid recurrent cache (#13979, opened Jun 2, 2025)
- llama : allow building all tests on windows when not using shared libs (#13980, opened Jun 2, 2025)
- chore(server): split context-server to its own file (#13987, opened Jun 3, 2025)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, opened Jun 4, 2025)
- opencl: preliminary support for Q4_0 mul_mat_id using matvec (#14003, opened Jun 4, 2025)
- llama-chat : Do not throw when tool parsing fails (#14012, opened Jun 4, 2025)
- llama: Attempt to add ModernBert (#14014, opened Jun 4, 2025)
- server: Enable mtmd in llama-server `/completion` endpoint (#14016, opened Jun 4, 2025)
- tests : add test-tokenizers-repo (#14017, opened Jun 4, 2025)
- ggml-cpu: fix uncaught underscore terminators for s390x (#14023, opened Jun 5, 2025)
- llama : support qwen3 rerank and embeddings (#14029, opened Jun 5, 2025)
- llama : deprecate llama_kv_self_ API (#14030, opened Jun 5, 2025)
- cpu: Update RISC-V condition to require GCC version 14 or higher (#14032, opened Jun 5, 2025)
- cuda : fix device sync on buffer clear (#14033, opened Jun 5, 2025)
- sycl: Adding additional cpy dbg print output (#14034, opened Jun 5, 2025)
- llama : add thread safety test (#14035, opened Jun 5, 2025)
- ggml-cpu: optimise assembly calls for hsum on s390x (#14037, opened Jun 5, 2025)
14 Issues closed by 7 people
- Eval bug: std::runtime_error Invalid diff: (#13876, closed Jun 5, 2025)
- Compile bug: Race condition during compilation; works with -j 1 but not with -j 8 (#13993, closed Jun 5, 2025)
- Bug: MinGW build fails to load models with "error loading model: PrefetchVirtualMemory unavailable" (#9311, closed Jun 5, 2025)
- Eval bug: llama-server -hf nomic-ai/nomic-embed-text-v2-moe-GGUF --embeddings, broken on latest version (#14021, closed Jun 5, 2025)
- Compile bug: Prooted Debian in Droid Termux only (#12452, closed Jun 5, 2025)
- [Build] Some build options/definitions seem missing in ggml-base (#13017, closed Jun 5, 2025)
- Feature Request: Ability to pack multiple GGUFs into a single one (#13028, closed Jun 5, 2025)
- Eval bug: Error when loading `bge-reranker-v2-gemma` model (#13041, closed Jun 5, 2025)
- Misc. bug: new kv cell seq implementation does not handle "seq_id = -1" specified in the API (#13983, closed Jun 4, 2025)
- Eval bug: OpenAI-incompatible image handling in server multimodal (#12947, closed Jun 4, 2025)
- Perplexity script for non-GGUF quantization (#13015, closed Jun 4, 2025)
- Eval bug: RWKV inference issue with llama-server (#13018, closed Jun 4, 2025)
- Container images in GHCR registry are not multi-arch (#13995, closed Jun 3, 2025)
- Misc. bug: llama-server hasn't displayed the thought process since b5576 (#13981, closed Jun 3, 2025)
14 Issues opened by 12 people
- Misc. bug: Mistral NeMo Instruct 2407 - Failed to infer a tool call example (possible template bug) (#14038, opened Jun 5, 2025)
- Feature Request: add a new repo for conversion of GGUF (#14027, opened Jun 5, 2025)
- Feature Request: support FP8 data type in llama.cpp (#14020, opened Jun 5, 2025)
- Misc. bug: "error: invalid argument: /bin/sh" when using Docker image (#14019, opened Jun 5, 2025)
- llama.cpp error when using the snowflake-arctic-embed-v2 model (#14018, opened Jun 4, 2025)
- Feature Request: Support Llama-Nemotron-Nano-VL-8B-V1 (#14015, opened Jun 4, 2025)
- Compile bug: numerous deprecation warnings when compiling in Termux (#14011, opened Jun 4, 2025)
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007, opened Jun 4, 2025)
- Feature Request: allow spacebar to confirm web UI prompts [like the delete-chat confirmation] (#13999, opened Jun 3, 2025)
- Compile bug: (#13992, opened Jun 3, 2025)
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990, opened Jun 3, 2025)
- Feature Request: (#13989, opened Jun 3, 2025)
- Misc. bug: sentencepiece not included in requirements.txt (#13982, opened Jun 3, 2025)
40 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- sycl: Add reorder to Q6_K mmvq implementation (#13885, commented on Jun 5, 2025; 17 new comments)
- finetune.cpp command-line arg (#13873, commented on Jun 5, 2025; 13 new comments)
- ggml-cpu : split arch-specific implementations (#13892, commented on Jun 5, 2025; 7 new comments)
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973, commented on Jun 4, 2025; 4 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, commented on Jun 5, 2025; 3 new comments)
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects (#13792, commented on Jun 5, 2025; 3 new comments)
- llama : support multiple classifier outputs and labels (#13940, commented on Jun 5, 2025; 3 new comments)
- SYCL: Implement few same quantized type copy kernels (#13739, commented on Jun 4, 2025; 1 new comment)
- Need to undefine "hz" on AIX (#13894, commented on Jun 4, 2025; 0 new comments)
- remove WIP since PR has been merged (#13912, commented on Jun 4, 2025; 0 new comments)
- [Ascend NPU] Enable labeler (#13914, commented on Jun 4, 2025; 0 new comments)
- musa: enable fp16 mma (all) and cublas on qy2 (#13842, commented on Jun 4, 2025; 0 new comments)
- ggml : add ggml_fill() (#13772, commented on Jun 4, 2025; 0 new comments)
- add GGML_USE_NUMA_MIGRATE feature to optimize cross-NUMA op computation (#13649, commented on Jun 3, 2025; 0 new comments)
- Granite Four (#13550, commented on Jun 4, 2025; 0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529, commented on Jun 5, 2025; 0 new comments)
- feat(server): Add tool call support to WebUI (Llama Server) (#13501, commented on Jun 3, 2025; 0 new comments)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jun 5, 2025; 0 new comments)
- ci: add LoongArch cross-compile build (#13944, commented on Jun 4, 2025; 0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" (#13963, commented on Jun 2, 2025; 0 new comments)
- Compile bug: paths with spaces fail on Unix with Vulkan backend (#13288, commented on Jun 3, 2025; 0 new comments)
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models (#13247, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552, commented on Jun 3, 2025; 0 new comments)
- Misc. bug: Compilation with OpenCL on latest build (#13300, commented on Jun 4, 2025; 0 new comments)
- Eval bug: SIGILL (#13161, commented on Jun 4, 2025; 0 new comments)
- Compile bug: Cannot convert from char8_t to char* in llama-chat.cpp (#12740, commented on Jun 4, 2025; 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Can't run Qwen3-32B Q4_K_XL (#13298, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server (#13825, commented on Jun 4, 2025; 0 new comments)
- Misc. bug: -TS doesn't support more than ? Devices (#13293, commented on Jun 4, 2025; 0 new comments)
- Eval bug: Custom model error. (#13318, commented on Jun 5, 2025; 0 new comments)
- Eval bug: Qwen3 30B A3B is slow with CUDA (#13211, commented on Jun 5, 2025; 0 new comments)
- Feature Request: Tensor parallelism (--split-mode row) over RPC (#13083, commented on Jun 5, 2025; 0 new comments)
- Feature Request: s390x CI (#13243, commented on Jun 5, 2025; 0 new comments)
- Eval bug: Cannot load Qwen3 ranking models (#13820, commented on Jun 5, 2025; 0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310, commented on Jun 5, 2025; 0 new comments)
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366, commented on Jun 5, 2025; 0 new comments)