Insights: ggml-org/llama.cpp
Overview
49 Releases published by 1 person
- b5512 published May 27, 2025
- b5513 published May 27, 2025
- b5514 published May 27, 2025
- b5515 published May 27, 2025
- b5516 published May 27, 2025
- b5517 published May 28, 2025
- b5519 published May 28, 2025
- b5522 published May 28, 2025
- b5524 published May 28, 2025
- b5526 published May 28, 2025
- b5527 published May 28, 2025
- b5529 published May 29, 2025
- b5530 published May 29, 2025
- b5532 published May 29, 2025
- b5533 published May 29, 2025
- b5534 published May 29, 2025
- b5535 published May 29, 2025
- b5537 published May 29, 2025
- b5538 published May 29, 2025
- b5539 published May 30, 2025
- b5540 published May 30, 2025
- b5541 published May 30, 2025
- b5543 published May 30, 2025
- b5544 published May 30, 2025
- b5545 published May 30, 2025
- b5546 published May 30, 2025
- b5547 published May 30, 2025
- b5548 published May 30, 2025
- b5551 published May 31, 2025
- b5552 published May 31, 2025
- b5554 published May 31, 2025
- b5555 published May 31, 2025
- b5556 published May 31, 2025
- b5558 published May 31, 2025
- b5559 published Jun 1, 2025
- b5560 published Jun 1, 2025
- b5568 published Jun 1, 2025
- b5569 published Jun 1, 2025
- b5571 published Jun 1, 2025
- b5572 published Jun 1, 2025
- b5573 published Jun 2, 2025
- b5574 published Jun 2, 2025
- b5575 published Jun 2, 2025
- b5576 published Jun 2, 2025
- b5577 published Jun 2, 2025
- b5578 published Jun 2, 2025
- b5579 published Jun 2, 2025
- b5580 published Jun 3, 2025
- b5581 published Jun 3, 2025
62 Pull requests merged by 27 people
- docs : add "Quick start" section for new users #13862 merged Jun 3, 2025
- opencl: add backend_synchronize #13939 merged Jun 2, 2025
- OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat #13840 merged Jun 2, 2025
- server : disable speculative decoding for SWA models #13970 merged Jun 2, 2025
- metal : use F32 attention accumulators in FA kernels #13975 merged Jun 2, 2025
- gemma : more consistent attention scaling for v2 and v3 #13951 merged Jun 2, 2025
- server : update deepseek reasoning format (pass reasoning_content as diffs) #13933 merged Jun 2, 2025
- mtmd : fix memory leak in mtmd_helper_eval_chunk_single #13961 merged Jun 2, 2025
- "Fix: Handle mixed-case 'Power' strings in POWER CPU detection" #13966 merged Jun 2, 2025
- sycl: quantize and reorder the input to q8_1 when reorder is enabled #13826 merged Jun 2, 2025
- gguf: fix failure on version == 0 #13956 merged Jun 1, 2025
- convert : fix nomic-bert-moe mask token #13757 merged Jun 1, 2025
- convert : fix vocab padding code for bert models #13954 merged Jun 1, 2025
- ggml: check if non-native endian model is being loaded #13943 merged Jun 1, 2025
- sync : ggml #13953 merged Jun 1, 2025
- add easy-llama Python bindings to README #13950 merged Jun 1, 2025
- parallel : fix n_junk == 0 #13952 merged Jun 1, 2025
- kv-cache : split implementation in separate sources #13920 merged Jun 1, 2025
- threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling #12995 merged May 31, 2025
- Note about necessity of having libcurl installed for standard build #13945 merged May 31, 2025
- chat : allow unclosed thinking tags #13931 merged May 31, 2025
- llama : deprecate explicit kv_self defrag/update calls #13921 merged May 31, 2025
- llama : use n_swa + n_ubatch cells for SWA cache #13833 merged May 31, 2025
- Replace alert and confirm with custom modals. #13711 merged May 31, 2025
- llama : auto-batch preparation #13845 merged May 31, 2025
- mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) #13917 merged May 31, 2025
- kv-cache : refactor + add llama_memory_state_i #13746 merged May 31, 2025
- CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) #13895 merged May 31, 2025
- CUDA: fix typo in FlashAttention code #13926 merged May 30, 2025
- sched : avoid changing cur_copy when a graph is already allocated #13922 merged May 30, 2025
- parallel : increase the variability of the prompt lengths #13927 merged May 30, 2025
- cuda : prevent using split buffers with 3d/4d matrices #13919 merged May 30, 2025
- SYCL: Add mrope kernel #13755 merged May 30, 2025
- sync : vendor #13901 merged May 30, 2025
- convert : fix rwkv bos/eos token #13844 merged May 30, 2025
- convert : allow partial update to the chkhsh pre-tokenizer list #13847 merged May 30, 2025
- Add support for DistilBert #13907 merged May 30, 2025
- model: minicpm should use llm_build_granite #13911 merged May 30, 2025
- cmake: Guard GGML_CPU_ALL_VARIANTS by architecture #13890 merged May 29, 2025
- llama : add support for jina-reranker-v2 #13900 merged May 29, 2025
- gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method #13561 merged May 29, 2025
- arm64: optimize q4_k_q8_k kernel with i8mm #13886 merged May 29, 2025
- cmake: Factor out CPU architecture detection #13883 merged May 29, 2025
- ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm #13882 merged May 29, 2025
- tests : remove json.hpp from a test #13880 merged May 29, 2025
- convert : workaround for AutoConfig dummy labels #13881 merged May 29, 2025
- llama : add RobertaForSequenceClassification reranker support #13875 merged May 29, 2025
- ggml: aarch64: Implement SVE F32 kernels for vector functions #13843 merged May 29, 2025
- gguf/utility: return full content on size < 0 #13841 merged May 28, 2025
- llama : fix KV shift for qwen2vl #13870 merged May 28, 2025
- mtmd : move helpers to dedicated library (⚠️ breaking change) #13866 merged May 28, 2025
- ci: disable LLAMA_CURL for Linux cross-builds #13871 merged May 28, 2025
- Add support for BertForSequenceClassification reranking #13858 merged May 28, 2025
- convert: small addition to support LlamaModel #13838 merged May 28, 2025
- convert : fix qwen omni conversion #13859 merged May 28, 2025
- Change umlaut test #11600 merged May 28, 2025
- CUDA: fix FA tg at long context for CC >= 8.9 #13852 merged May 28, 2025
- convert : fix tensor naming conflict for llama 4 vision #13836 merged May 28, 2025
- [CANN]: Add SOC TYPE printing in cmake configuration processing #13837 merged May 28, 2025
- opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm #13787 merged May 27, 2025
- opencl: mark MUL_MAT supports non-contiguous tensors for f32 #13790 merged May 27, 2025
28 Pull requests opened by 24 people
- kv-cache : avoid modifying recurrent cells when setting inputs #13834 opened May 27, 2025
- musa: enable fp16 mma (all) and cublas on qy2 #13842 opened May 28, 2025
- tests : add test-tokenizers-remote #13846 opened May 28, 2025
- finetune.cpp command-line arg #13873 opened May 28, 2025
- sycl: Add reorder to Q6_K mmvq implementation #13885 opened May 29, 2025
- musa: extract ggml_cuda_mul_mat_batched_cublas_gemm_batched_ex #13887 opened May 29, 2025
- ggml-cpu : split arch-specific implementations #13892 opened May 29, 2025
- Need to undefine "hz" on AIX #13894 opened May 29, 2025
- ci(intel): venv for python & pip installation for intel docker #13898 opened May 29, 2025
- convert: add eagle2 draft arch #13908 opened May 30, 2025
- remove WIP since PR has been merged #13912 opened May 30, 2025
- [Ascend NPU] Enable labeler #13914 opened May 30, 2025
- [CANN] Support Acl Graph #13915 opened May 30, 2025
- Add plamo2 #13930 opened May 30, 2025
- `chat`: improve llama 3.x handling of <|python_tag|> (+ allow --special combo) #13932 opened May 30, 2025
- vulkan: automatically deduce size of push constants #13936 opened May 31, 2025
- vulkan: fix warnings in perf logger querypool code #13937 opened May 31, 2025
- chore: added badge and link to release #13938 opened May 31, 2025
- llama : support multiple classifier outputs and labels #13940 opened May 31, 2025
- ci: add LoongArch cross-compile build #13944 opened May 31, 2025
- ci: Update windows-2019 to windows-2022 #13960 opened Jun 1, 2025
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices #13973 opened Jun 2, 2025
- Hybrid recurrent cache #13979 opened Jun 2, 2025
- llama : allow building all tests on windows when not using shared libs #13980 opened Jun 2, 2025
- kv-cache : fix unified::seq_rm to work with seq_id < 0 #13985 opened Jun 3, 2025
- chore(server): split context-server to its own file #13987 opened Jun 3, 2025
- kv-cache : refactor the update/defrag mechanism #13988 opened Jun 3, 2025
- CUDA: fix FTZ in FA for Gemma 3 #13991 opened Jun 3, 2025
52 Issues closed by 16 people
- Misc. bug: llama-server didn't display thought process since b5576 #13981 closed Jun 3, 2025
- Misc. bug: Reasoning content is not separated when streaming #13867 closed Jun 2, 2025
- Misc. bug: memory leak in mtmd ? (mtmd_helper_eval_chunk_single) #13958 closed Jun 2, 2025
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment #12655 closed Jun 2, 2025
- gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1 #12998 closed Jun 2, 2025
- Eval bug: Segmentation fault when running gemma3-cli on Android #13000 closed Jun 2, 2025
- Eval bug: why Gemma 3 model has run into CPU inference #13004 closed Jun 2, 2025
- Eval bug: default system prompt in llama-server #13948 closed Jun 1, 2025
- Eval bug: Quad P40 unable to run 70B models on recent releases #12990 closed Jun 1, 2025
- Eval bug: Not support DeepSeek-R1-0528-GGUF-Q8_0 #13916 closed May 31, 2025
- mtmd: cmake: C API broken since last change, static linking always broken #13902 closed May 31, 2025
- Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use #13812 closed May 31, 2025
- CUDA illigal memory bug 75 fixed? #13906 closed May 31, 2025
- Misc. bug: what(): Unexpected empty grammar stack after accepting piece: <unused32> #13341 closed May 31, 2025
- Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size' #12325 closed May 31, 2025
- Eval bug: convert_hf_to_gguf.py AttributeError: #12847 closed May 31, 2025
- Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj #12899 closed May 31, 2025
- Compile bug: how to enable opencl in termux #12911 closed May 31, 2025
- Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple #12968 closed May 31, 2025
- Feature Request: multi model cli tools: Convert submitted images to best size and format for model #12981 closed May 31, 2025
- Feature Request: Make chat sessions possible with multi model cli tools #12982 closed May 31, 2025
- Misc. bug: Potential memory leak in backend registry #12986 closed May 31, 2025
- Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue #13877 closed May 30, 2025
- `CUDA error: an illegal memory access was encountered` on DeepSeek-R1-0528 #13909 closed May 30, 2025
- CUDA error: an illegal memory access was encountered (with large prompts) #13851 closed May 30, 2025
- Eval bug: "GGML_ASSERT(!(split && ne02 > 1)) failed" when loading DeepSeek-R1T with --split-mode row #13372 closed May 30, 2025
- Feature Request: Splitting layers according to VRAM usage on multi GPUs setups #12654 closed May 30, 2025
- Misc. bug: Excessive power draw on the second GPU in dual RTX 3090 setup when idle #12958 closed May 30, 2025
- Why does /ggml/CMakeLists.txt add_subdirectory(examples)? #12963 closed May 30, 2025
- Misc. bug: gguf-new-metadata and gguf-editor-gui changes all integer arrays to INT32 #13557 closed May 29, 2025
- Eval bug: stream with tool_call fix in b5478 crash in container and issues with calls from apps #13766 closed May 29, 2025
- Misc. bug: ALL gguf models fail to run (no log, docker exit code 139), #12205 closed May 29, 2025
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} #12591 closed May 29, 2025
- Compile bug: ggml-cuda/opt-step-adamw.cu error: identifier "__Poly8x8_t" is undefined on Jetson Orin AGX #12826 closed May 29, 2025
- CUDA: implementation of mul_mat_id #12859 closed May 29, 2025
- what *tool/framework* to use if testing performance of .gguf models #12901 closed May 29, 2025
- Misc. bug: llama-bench --tensor-split handling is broken #12917 closed May 29, 2025
- Compile bug: macro "DECL_FATTN_MMA_F16_CASE" requires 3 arguments, but only 2 given #12921 closed May 29, 2025
- Misc. bug: llama-server "terminate called after throwing an instance of 'std::runtime_error'" #12939 closed May 29, 2025
- Model conversion issue #12941 closed May 29, 2025
- Eval bug: KV cache shifting does not work for Qwen2.5VL #13865 closed May 28, 2025
- CI: build-linux-cross failing #13869 closed May 28, 2025
- Eval bug: qwen2.5-vl related bugs #13848 closed May 28, 2025
- Unable to deploy the fine-tuned qwen2.5-vl-7b using llama.cpp. #13723 closed May 28, 2025
- Misc. bug: Streaming tool calls does not return "type": "function", unlike non-stream #13798 closed May 28, 2025
- Feature Request: Free up VRAM when llama-server not in use #11703 closed May 28, 2025
- Eval bug: ggml_vulkan: Device memory allocation of size N failed with ub > 4096 and c > 4096 and b > 4096 #12817 closed May 28, 2025
- Eval bug: ROCm error: CUBLAS_STATUS_INTERNAL_ERROR #12878 closed May 28, 2025
- Misc. bug: gguf-my-repo doesn't work - [Errno 2] No such file or directory: './llama.cpp/llama-quantize' #12925 closed May 28, 2025
- Misc. bug: The llama-server not read the "--keep" param that user input in the cli #12927 closed May 28, 2025
31 Issues opened by 30 people
- Compile bug: Race condition during compilation, compilation works with -j 1 but not with -j 8 #13993 opened Jun 3, 2025
- Compile bug: #13992 opened Jun 3, 2025
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call #13990 opened Jun 3, 2025
- Feature Request: #13989 opened Jun 3, 2025
- Misc. bug: new kv cell seq implementation does not handle "seq_id = -1" specified in the API #13983 opened Jun 3, 2025
- Misc. bug: sentencepiece not included in requirements.txt #13982 opened Jun 3, 2025
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU #13978 opened Jun 2, 2025
- Eval bug: Unexpected failure converting Mistral 7B v0.2 to f32 GGUF #13976 opened Jun 2, 2025
- Misc. bug: llama-bench improper tensor split #13972 opened Jun 2, 2025
- context shifting should be default option? #13971 opened Jun 2, 2025
- make using shifting context easier. #13969 opened Jun 2, 2025
- Eval bug: Unable to load the model on GPU #13967 opened Jun 2, 2025
- Eval bug: llama.cpp crashes in string comparison when using a reasoning model for long periods of time #13965 opened Jun 2, 2025
- Feature Request: WINA #13964 opened Jun 2, 2025
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" #13963 opened Jun 2, 2025
- Eval bug: llama-mtmd-cli : option --image failed to load image #13959 opened Jun 1, 2025
- Eval bug: llama-tts abort #13955 opened Jun 1, 2025
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) #13947 opened May 31, 2025
- Feature Request: Generate Image Embeddings with llama.cpp #13913 opened May 30, 2025
- android built on GPU cannot comparable with CPU? #13910 opened May 30, 2025
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' #13893 opened May 29, 2025
- Misc. bug: linux/arm64 does not exist for the server docker image #13891 opened May 29, 2025
- Eval bug: std::runtime_error Invalid diff: #13876 opened May 28, 2025
- Feature Request: Make the `/completion` endpoint in `llama-server` work with multimodal models #13872 opened May 28, 2025
- Automatic optimization of runtime parameters such as -ngl given memory constraints #13860 opened May 28, 2025
- Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture #13856 opened May 28, 2025
- Eval bug: Embeddings Always returned as non #13854 opened May 28, 2025
- Feature Request: Set default of --numa to distribute #13850 opened May 28, 2025
- Dequantize function: Row misalignment in dequantized tensors - only first column matches original #13839 opened May 28, 2025
68 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- ggml-vulkan: adds support for op CONV_TRANSPOSE_1D #13813 commented on Jun 3, 2025 • 5 new comments
- ggml: improve ggml_backend_cuda_cpy_tensor_async #13818 commented on Jun 1, 2025 • 3 new comments
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196 commented on Jun 3, 2025 • 3 new comments
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects. #13792 commented on May 28, 2025 • 2 new comments
- SYCL: Implement few same quantized type copy kernels #13739 commented on Jun 3, 2025 • 1 new comment
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp #12326 commented on Jun 3, 2025 • 0 new comments
- [Draft] Tensor Parallel support to llama.cpp #9648 commented on May 31, 2025 • 0 new comments
- llama : initial Mamba-2 support #9126 commented on May 30, 2025 • 0 new comments
- ggml: avoid rebuild of GGML graph for each token (#7456) #8366 commented on Jun 1, 2025 • 0 new comments
- Add PaliGemma Support #7553 commented on Jun 1, 2025 • 0 new comments
- Llama cpp low level python bindings #1660 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used #10860 commented on Jun 3, 2025 • 0 new comments
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) #12717 commented on Jun 3, 2025 • 0 new comments
- Feature Request: s390x CI #13243 commented on Jun 3, 2025 • 0 new comments
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models #13247 commented on Jun 3, 2025 • 0 new comments
- Compile bug: paths with spaces fail on Unix with Vulkan backend #13288 commented on Jun 3, 2025 • 0 new comments
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf #12997 commented on Jun 2, 2025 • 0 new comments
- Slow token generation speed of Gemma 3 QAT Models #13048 commented on Jun 2, 2025 • 0 new comments
- Misc. bug: OpenCL: Issue with Adreno 610 #13115 commented on Jun 2, 2025 • 0 new comments
- Eval bug: sentencepiece tokenizer generates incorrect tokens #13256 commented on Jun 2, 2025 • 0 new comments
- WIP: Add support for CogAgent #12679 commented on May 29, 2025 • 0 new comments
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications #12727 commented on May 29, 2025 • 0 new comments
- convert : write tensors in parallel #12837 commented on Jun 2, 2025 • 0 new comments
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) #13206 commented on Jun 3, 2025 • 0 new comments
- feat(server): Add tool call support to WebUI (LLama Server) #13501 commented on Jun 3, 2025 • 0 new comments
- webui: Add editing assistant messages (#11849) #13522 commented on May 29, 2025 • 0 new comments
- Granite Four #13550 commented on May 30, 2025 • 0 new comments
- scripts: update pyproject.toml - deprecated poetry config + support uv #13615 commented on May 28, 2025 • 0 new comments
- add GGML_USE_NUMA_MIGRATE feature to optimize cross NUMA op computation #13649 commented on Jun 3, 2025 • 0 new comments
- model : jina-embeddings-v3 support #13693 commented on Jun 1, 2025 • 0 new comments
- remove templates from soft_max_f32_submitter to allow SYCL graph updates #13724 commented on May 28, 2025 • 0 new comments
- ggml : add ggml_fill() #13772 commented on May 31, 2025 • 0 new comments
- server: args for draft model cache types (#11200) #13782 commented on May 30, 2025 • 0 new comments
- examples : support MiniCPM-V-2 #13828 commented on May 28, 2025 • 0 new comments
- Eval bug: Can't utilize all 16 threads / 8 CPU cores for prompt processing when using llama-server. works fine with llama-cli #13197 commented on May 31, 2025 • 0 new comments
- Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server #13825 commented on May 30, 2025 • 0 new comments
- Compile bug: Vulkan Cross compile for arm64 #13068 commented on May 30, 2025 • 0 new comments
- Misc. bug: Shared libraries don't properly contain /common/ functions #13156 commented on May 30, 2025 • 0 new comments
- Eval bug: Unreadable output when using qwen2-vl model. #13165 commented on May 30, 2025 • 0 new comments
- Misc. bug: llama-parallel segmentation fault #13172 commented on May 30, 2025 • 0 new comments
- Eval bug: Persistent <think> Tags in Qwen3-32B Output Despite enable_thinking: False and --reasoning-format none in llama.cpp #13189 commented on May 30, 2025 • 0 new comments
- Feature Request: Falcon-H1 #13681 commented on May 29, 2025 • 0 new comments
- Feature Request: Installable package via winget #8188 commented on May 29, 2025 • 0 new comments
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates #13694 commented on May 29, 2025 • 0 new comments
- Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions. #12816 commented on May 29, 2025 • 0 new comments
- Feature request: Graphical GGUF viewer #6715 commented on May 29, 2025 • 0 new comments
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 #13145 commented on May 29, 2025 • 0 new comments
- Eval bug: llama-mtmd-cli doesn't support system prompts #13454 commented on May 28, 2025 • 0 new comments
- Feature Request: video support in mtmd-cli / server #13754 commented on May 28, 2025 • 0 new comments
- webui: First user prompt sometimes disappears after sending #13622 commented on May 28, 2025 • 0 new comments
- Eval bug: Llama 4 Scout/Maverick crash when processing images with certain aspect ratio #13827 commented on May 27, 2025 • 0 new comments
- Misc. bug: the output file of llama-quantize is not gguf format #13258 commented on Jun 2, 2025 • 0 new comments
- Misc. bug: Server does not always cancel requests for disconnected connections #13262 commented on Jun 2, 2025 • 0 new comments
- Feature Request: add to llama-bench device info reporting of "bf16:1", if built with VK_KHR_bfloat16 support and driver also supports it.. #13274 commented on Jun 2, 2025 • 0 new comments
- Misc. bug: vulkan prompt processing suddenly slows down once I reach a certain prompt size #13765 commented on Jun 1, 2025 • 0 new comments
- Eval bug: Cannot load Qwen3 ranking models #13820 commented on Jun 1, 2025 • 0 new comments
- Feature Request: Support Codestral Mamba #8519 commented on Jun 1, 2025 • 0 new comments
- Feature Request: (webui) Implement a experimental features on webui #11662 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB' #12923 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3 #13025 commented on Jun 1, 2025 • 0 new comments
- Eval bug: llama-server stays in unresponsive state- CUDA error: out of memory - #13085 commented on Jun 1, 2025 • 0 new comments
- Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout #13240 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError' #13248 commented on Jun 1, 2025 • 0 new comments
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA #13747 commented on Jun 1, 2025 • 0 new comments
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend #13310 commented on May 31, 2025 • 0 new comments
- Misc. bug: Decreased success rate for tool calling #13769 commented on May 31, 2025 • 0 new comments
- Misc. bug: xcframework does not contain support for Catalyst #12751 commented on May 31, 2025 • 0 new comments
- Eval bug: SIGILL #13161 commented on May 31, 2025 • 0 new comments