Insights: ggml-org/llama.cpp
Overview
44 Releases published by 1 person
- b5537, published May 29, 2025
- b5538, published May 29, 2025
- b5539, published May 30, 2025
- b5540, published May 30, 2025
- b5541, published May 30, 2025
- b5543, published May 30, 2025
- b5544, published May 30, 2025
- b5545, published May 30, 2025
- b5546, published May 30, 2025
- b5547, published May 30, 2025
- b5548, published May 30, 2025
- b5551, published May 31, 2025
- b5552, published May 31, 2025
- b5554, published May 31, 2025
- b5555, published May 31, 2025
- b5556, published May 31, 2025
- b5558, published May 31, 2025
- b5559, published Jun 1, 2025
- b5560, published Jun 1, 2025
- b5568, published Jun 1, 2025
- b5569, published Jun 1, 2025
- b5571, published Jun 1, 2025
- b5572, published Jun 1, 2025
- b5573, published Jun 2, 2025
- b5574, published Jun 2, 2025
- b5575, published Jun 2, 2025
- b5576, published Jun 2, 2025
- b5577, published Jun 2, 2025
- b5578, published Jun 2, 2025
- b5579, published Jun 2, 2025
- b5580, published Jun 3, 2025
- b5581, published Jun 3, 2025
- b5584, published Jun 4, 2025
- b5585, published Jun 4, 2025
- b5586, published Jun 4, 2025
- b5587, published Jun 4, 2025
- b5588, published Jun 4, 2025
- b5589, published Jun 4, 2025
- b5590, published Jun 4, 2025
- b5591, published Jun 5, 2025
- b5592, published Jun 5, 2025
- b5593, published Jun 5, 2025
- b5595, published Jun 5, 2025
- b5596, published Jun 5, 2025
56 Pull requests merged by 25 people
- vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001), merged Jun 5, 2025
- Fix CUDA build failure on AutoDL cloud platforms (#14005), merged Jun 5, 2025
- memory : migrate from llama_kv_cache to more generic llama_memory (#14006), merged Jun 5, 2025
- llama : allow using mmap without PrefetchVirtualMemory (#14013), merged Jun 5, 2025
- chore: added badge and link to release (#13938), merged Jun 5, 2025
- vocab : warn about missing mask token (#14022), merged Jun 5, 2025
- context : fix pos_min initialization upon decode error (#14008), merged Jun 5, 2025
- vulkan: automatically deduce size of push constants (#13936), merged Jun 5, 2025
- ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813), merged Jun 4, 2025
- kv-cache : refactor the update/defrag mechanism (#13988), merged Jun 4, 2025
- ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997), merged Jun 4, 2025
- releases : use dl backend for linux release, remove arm64 linux release (#13996), merged Jun 4, 2025
- llama-graph : use ggml_repeat_4d (#13998), merged Jun 4, 2025
- CUDA: fix FTZ in FA for Gemma 3 (#13991), merged Jun 4, 2025
- kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985), merged Jun 4, 2025
- vulkan: fix warnings in perf logger querypool code (#13937), merged Jun 3, 2025
- docs : add "Quick start" section for new users (#13862), merged Jun 3, 2025
- opencl: add backend_synchronize (#13939), merged Jun 2, 2025
- OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840), merged Jun 2, 2025
- server : disable speculative decoding for SWA models (#13970), merged Jun 2, 2025
- metal : use F32 attention accumulators in FA kernels (#13975), merged Jun 2, 2025
- gemma : more consistent attention scaling for v2 and v3 (#13951), merged Jun 2, 2025
- server : update deepseek reasoning format (pass reasoning_content as diffs) (#13933), merged Jun 2, 2025
- mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961), merged Jun 2, 2025
- Fix: Handle mixed-case 'Power' strings in POWER CPU detection (#13966), merged Jun 2, 2025
- sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826), merged Jun 2, 2025
- gguf: fix failure on version == 0 (#13956), merged Jun 1, 2025
- convert : fix nomic-bert-moe mask token (#13757), merged Jun 1, 2025
- convert : fix vocab padding code for bert models (#13954), merged Jun 1, 2025
- ggml: check if non-native endian model is being loaded (#13943), merged Jun 1, 2025
- sync : ggml (#13953), merged Jun 1, 2025
- add easy-llama Python bindings to README (#13950), merged Jun 1, 2025
- parallel : fix n_junk == 0 (#13952), merged Jun 1, 2025
- kv-cache : split implementation in separate sources (#13920), merged Jun 1, 2025
- threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995), merged May 31, 2025
- Note about necessity of having libcurl installed for standard build (#13945), merged May 31, 2025
- chat : allow unclosed thinking tags (#13931), merged May 31, 2025
- llama : deprecate explicit kv_self defrag/update calls (#13921), merged May 31, 2025
- llama : use n_swa + n_ubatch cells for SWA cache (#13833), merged May 31, 2025
- Replace alert and confirm with custom modals. (#13711), merged May 31, 2025
- llama : auto-batch preparation (#13845), merged May 31, 2025
- mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917), merged May 31, 2025
- kv-cache : refactor + add llama_memory_state_i (#13746), merged May 31, 2025
- CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (#13895), merged May 31, 2025
- CUDA: fix typo in FlashAttention code (#13926), merged May 30, 2025
- sched : avoid changing cur_copy when a graph is already allocated (#13922), merged May 30, 2025
- parallel : increase the variability of the prompt lengths (#13927), merged May 30, 2025
- cuda : prevent using split buffers with 3d/4d matrices (#13919), merged May 30, 2025
- SYCL: Add mrope kernel (#13755), merged May 30, 2025
- sync : vendor (#13901), merged May 30, 2025
- convert : fix rwkv bos/eos token (#13844), merged May 30, 2025
- convert : allow partial update to the chkhsh pre-tokenizer list (#13847), merged May 30, 2025
- Add support for DistilBert (#13907), merged May 30, 2025
- model: minicpm should use llm_build_granite (#13911), merged May 30, 2025
- cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890), merged May 29, 2025
- llama : add support for jina-reranker-v2 (#13900), merged May 29, 2025
24 Pull requests opened by 22 people
- Need to undefine "hz" on AIX (#13894), opened May 29, 2025
- ci(intel): venv for python & pip installation for intel docker (#13898), opened May 29, 2025
- convert: add eagle2 draft arch (#13908), opened May 30, 2025
- remove WIP since PR has been merged (#13912), opened May 30, 2025
- [Ascend NPU] Enable labeler (#13914), opened May 30, 2025
- [CANN] Support Acl Graph (#13915), opened May 30, 2025
- Add plamo2 (#13930), opened May 30, 2025
- `chat`: improve llama 3.x handling of <|python_tag|> (+ allow --special combo) (#13932), opened May 30, 2025
- llama : support multiple classifier outputs and labels (#13940), opened May 31, 2025
- ci: add LoongArch cross-compile build (#13944), opened May 31, 2025
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973), opened Jun 2, 2025
- Hybrid recurrent cache (#13979), opened Jun 2, 2025
- llama : allow building all tests on windows when not using shared libs (#13980), opened Jun 2, 2025
- chore(server): split context-server to its own file (#13987), opened Jun 3, 2025
- [CANN] Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002), opened Jun 4, 2025
- opencl: preliminary support for Q4_0 mul_mat_id using matvec (#14003), opened Jun 4, 2025
- llama-chat : Do not throw when tool parsing fails (#14012), opened Jun 4, 2025
- llama: Attempt to add ModernBert (#14014), opened Jun 4, 2025
- server: Enable mtmd in llama-server `/completion` endpoint (#14016), opened Jun 4, 2025
- tests : add test-tokenizers-repo (#14017), opened Jun 4, 2025
- ggml-cpu: fix uncaught underscore terminators for s390x (#14023), opened Jun 5, 2025
- llama : support qwen3 rerank and embeddings (#14029), opened Jun 5, 2025
- llama : deprecate llama_kv_self_ API (#14030), opened Jun 5, 2025
- gguf-py : add add_classifier_output_labels method to writer (#14031), opened Jun 5, 2025
42 Issues closed by 12 people
- Bug: MinGW build fails to load models with "error loading model: PrefetchVirtualMemory unavailable" (#9311), closed Jun 5, 2025
- Eval bug: llama-server -hf nomic-ai/nomic-embed-text-v2-moe-GGUF --embeddings, broken on latest version (#14021), closed Jun 5, 2025
- Compile bug: Prooted Debian in Droid Termux only (#12452), closed Jun 5, 2025
- [Build] Some Build Options/Definitions seems Missing in ggml-base (#13017), closed Jun 5, 2025
- Feature Request: Ability to pack multiple GGUFs into single one (#13028), closed Jun 5, 2025
- Eval bug: Error when load `bge-reranker-v2-gemma` model (#13041), closed Jun 5, 2025
- Misc. bug: new kv cell seq implementation does not handle "seq_id = -1" specified in the API (#13983), closed Jun 4, 2025
- Eval bug: OpenAI incompatible image handling in server multimodal (#12947), closed Jun 4, 2025
- Perplexity script for non GGUF quantization (#13015), closed Jun 4, 2025
- Eval bug: RWKV inference issue with llama-server (#13018), closed Jun 4, 2025
- Container images in GHCR registry are not multi arch (#13995), closed Jun 3, 2025
- Misc. bug: llama-server didn't display thought process since b5576 (#13981), closed Jun 3, 2025
- Misc. bug: Reasoning content is not separated when streaming (#13867), closed Jun 2, 2025
- Misc. bug: memory leak in mtmd? (mtmd_helper_eval_chunk_single) (#13958), closed Jun 2, 2025
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment (#12655), closed Jun 2, 2025
- gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1 (#12998), closed Jun 2, 2025
- Eval bug: Segmentation fault when running gemma3-cli on Android (#13000), closed Jun 2, 2025
- Eval bug: why Gemma 3 model has run into CPU inference (#13004), closed Jun 2, 2025
- Eval bug: default system prompt in llama-server (#13948), closed Jun 1, 2025
- Eval bug: Quad P40 unable to run 70B models on recent releases (#12990), closed Jun 1, 2025
- Eval bug: Not support DeepSeek-R1-0528-GGUF-Q8_0 (#13916), closed May 31, 2025
- mtmd: cmake: C API broken since last change, static linking always broken (#13902), closed May 31, 2025
- Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use (#13812), closed May 31, 2025
- CUDA illegal memory bug 75 fixed? (#13906), closed May 31, 2025
- Misc. bug: what(): Unexpected empty grammar stack after accepting piece: <unused32> (#13341), closed May 31, 2025
- Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size' (#12325), closed May 31, 2025
- Eval bug: convert_hf_to_gguf.py AttributeError: (#12847), closed May 31, 2025
- Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj (#12899), closed May 31, 2025
- Compile bug: how to enable opencl in termux (#12911), closed May 31, 2025
- Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple (#12968), closed May 31, 2025
- Feature Request: multi model cli tools: Convert submitted images to best size and format for model (#12981), closed May 31, 2025
- Feature Request: Make chat sessions possible with multi model cli tools (#12982), closed May 31, 2025
- Misc. bug: Potential memory leak in backend registry (#12986), closed May 31, 2025
- Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue (#13877), closed May 30, 2025
- `CUDA error: an illegal memory access was encountered` on DeepSeek-R1-0528 (#13909), closed May 30, 2025
- CUDA error: an illegal memory access was encountered (with large prompts) (#13851), closed May 30, 2025
- Eval bug: "GGML_ASSERT(!(split && ne02 > 1)) failed" when loading DeepSeek-R1T with --split-mode row (#13372), closed May 30, 2025
- Feature Request: Splitting layers according to VRAM usage on multi GPUs setups (#12654), closed May 30, 2025
- Misc. bug: Excessive power draw on the second GPU in dual RTX 3090 setup when idle (#12958), closed May 30, 2025
- Why does /ggml/CMakeLists.txt add_subdirectory(examples)? (#12963), closed May 30, 2025
29 Issues opened by 27 people
- Feature Request: add a new repo for convertion of gguf (#14027), opened Jun 5, 2025
- Feature Request: support FP8 data type in llama.cpp (#14020), opened Jun 5, 2025
- Misc. bug: "error: invalid argument: /bin/sh" when using Docker image (#14019), opened Jun 5, 2025
- llama.cpp error when using the snowflake-arctic-embed-v2 model (#14018), opened Jun 4, 2025
- Feature Request: Support Llama-Nemotron-Nano-VL-8B-V1 (#14015), opened Jun 4, 2025
- Compile bug: numerous deprecation warnings when compiling in Termux (#14011), opened Jun 4, 2025
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007), opened Jun 4, 2025
- Feature Request: allow spacebar to confirm web UI prompts [like the deleting a chat confirmation] (#13999), opened Jun 3, 2025
- Compile bug: Race condition during compilation, compilation works with -j 1 but not with -j 8 (#13993), opened Jun 3, 2025
- Compile bug: (#13992), opened Jun 3, 2025
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990), opened Jun 3, 2025
- Feature Request: (#13989), opened Jun 3, 2025
- Misc. bug: sentencepiece not included in requirements.txt (#13982), opened Jun 3, 2025
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU (#13978), opened Jun 2, 2025
- Eval bug: Unexpected failure converting Mistral 7B v0.2 to f32 GGUF (#13976), opened Jun 2, 2025
- Misc. bug: llama-bench improper tensor split (#13972), opened Jun 2, 2025
- context shifting should be default option? (#13971), opened Jun 2, 2025
- make using shifting context easier. (#13969), opened Jun 2, 2025
- Eval bug: Unable to load the model on GPU (#13967), opened Jun 2, 2025
- Eval bug: llama.cpp crashes in string comparison when using a reasoning model for long periods of time (#13965), opened Jun 2, 2025
- Feature Request: WINA (#13964), opened Jun 2, 2025
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" (#13963), opened Jun 2, 2025
- Eval bug: llama-mtmd-cli : option --image failed to load image (#13959), opened Jun 1, 2025
- Eval bug: llama-tts abort (#13955), opened Jun 1, 2025
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) (#13947), opened May 31, 2025
- Feature Request: Generate Image Embeddings with llama.cpp (#13913), opened May 30, 2025
- android built on GPU cannot comparable with CPU? (#13910), opened May 30, 2025
72 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- finetune.cpp command-line arg (#13873), commented on Jun 5, 2025 (70 new comments)
- sycl: Add reorder to Q6_K mmvq implementation (#13885), commented on Jun 5, 2025 (20 new comments)
- ggml-cpu : split arch-specific implementations (#13892), commented on Jun 5, 2025 (7 new comments)
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792), commented on Jun 5, 2025 (3 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196), commented on Jun 3, 2025 (3 new comments)
- SYCL: Implement few same quantized type copy kernels (#13739), commented on Jun 4, 2025 (1 new comment)
- llama : initial Mamba-2 support (#9126), commented on May 30, 2025 (0 new comments)
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366), commented on Jun 5, 2025 (0 new comments)
- Add PaliGemma Support (#7553), commented on Jun 1, 2025 (0 new comments)
- Llama cpp low level python bindings (#1660), commented on Jun 1, 2025 (0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310), commented on Jun 5, 2025 (0 new comments)
- Eval bug: Cannot load Qwen3 ranking models (#13820), commented on Jun 5, 2025 (0 new comments)
- Feature Request: s390x CI (#13243), commented on Jun 5, 2025 (0 new comments)
- Feature Request: Tensor paralellism (--split-mode row) over rpc (#13083), commented on Jun 5, 2025 (0 new comments)
- Eval bug: Qwen3 30B A3B is slow with CUDA (#13211), commented on Jun 5, 2025 (0 new comments)
- Eval bug: Custom model error. (#13318), commented on Jun 5, 2025 (0 new comments)
- Eval bug: std::runtime_error Invalid diff: (#13876), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: -TS doesn't support more than ? Devices (#13293), commented on Jun 4, 2025 (0 new comments)
- Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server (#13825), commented on Jun 4, 2025 (0 new comments)
- Eval bug: Can't run Qwen3-32B Q4_K_XL (#13298), commented on Jun 4, 2025 (0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694), commented on Jun 4, 2025 (0 new comments)
- [Draft] Tensor Parallel support to llama.cpp (#9648), commented on May 31, 2025 (0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326), commented on Jun 3, 2025 (0 new comments)
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications (#12727), commented on May 29, 2025 (0 new comments)
- convert : write tensors in parallel (#12837), commented on Jun 2, 2025 (0 new comments)
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) (#13206), commented on Jun 3, 2025 (0 new comments)
- feat(server): Add tool call support to WebUI (LLama Server) (#13501), commented on Jun 3, 2025 (0 new comments)
- webui: Add editing assistant messages (#11849) (#13522), commented on May 29, 2025 (0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529), commented on Jun 5, 2025 (0 new comments)
- Granite Four (#13550), commented on Jun 4, 2025 (0 new comments)
- add GGML_USE_NUMA_MIGRATE feature to optimize cross NUMA op computation (#13649), commented on Jun 3, 2025 (0 new comments)
- model : jina-embeddings-v3 support (#13693), commented on Jun 1, 2025 (0 new comments)
- ggml : add ggml_fill() (#13772), commented on Jun 4, 2025 (0 new comments)
- server: args for draft model cache types (#11200) (#13782), commented on May 30, 2025 (0 new comments)
- kv-cache : avoid modifying recurrent cells when setting inputs (#13834), commented on May 31, 2025 (0 new comments)
- musa: enable fp16 mma (all) and cublas on qy2 (#13842), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB' (#12923), commented on Jun 1, 2025 (0 new comments)
- Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3 (#13025), commented on Jun 1, 2025 (0 new comments)
- Eval bug: llama-server stays in unresponsive state - CUDA error: out of memory (#13085), commented on Jun 1, 2025 (0 new comments)
- Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout (#13240), commented on Jun 1, 2025 (0 new comments)
- Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError' (#13248), commented on Jun 1, 2025 (0 new comments)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747), commented on Jun 1, 2025 (0 new comments)
- Misc. bug: Decreased success rate for tool calling (#13769), commented on May 31, 2025 (0 new comments)
- Misc. bug: xcframework does not contain support for Catalyst (#12751), commented on May 31, 2025 (0 new comments)
- Eval bug: Can't utilize all 16 threads / 8 CPU cores for prompt processing when using llama-server; works fine with llama-cli (#13197), commented on May 31, 2025 (0 new comments)
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' (#13893), commented on May 30, 2025 (0 new comments)
- Compile bug: Vulkan Cross compile for arm64 (#13068), commented on May 30, 2025 (0 new comments)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156), commented on May 30, 2025 (0 new comments)
- Eval bug: Unreadable output when using qwen2-vl model. (#13165), commented on May 30, 2025 (0 new comments)
- Misc. bug: llama-parallel segmentation fault (#13172), commented on May 30, 2025 (0 new comments)
- Eval bug: Persistent <think> Tags in Qwen3-32B Output Despite enable_thinking: False and --reasoning-format none in llama.cpp (#13189), commented on May 30, 2025 (0 new comments)
- Automatic optimization of runtime parameters such as -ngl given memory constraints (#13860), commented on May 30, 2025 (0 new comments)
- Feature Request: Falcon-H1 (#13681), commented on May 29, 2025 (0 new comments)
- Feature Request: Installable package via winget (#8188), commented on May 29, 2025 (0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860), commented on Jun 4, 2025 (0 new comments)
- Compile bug: Cannot convert from char8_t to char* in llama-chat.cpp (#12740), commented on Jun 4, 2025 (0 new comments)
- Eval bug: SIGILL (#13161), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: Compilation with openCL on latest build (#13300), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552), commented on Jun 3, 2025 (0 new comments)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717), commented on Jun 3, 2025 (0 new comments)
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models (#13247), commented on Jun 3, 2025 (0 new comments)
- Compile bug: paths with spaces fail on Unix with Vulkan backend (#13288), commented on Jun 3, 2025 (0 new comments)
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf (#12997), commented on Jun 2, 2025 (0 new comments)
- Slow token generation speed of Gemma 3 QAT Models (#13048), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115), commented on Jun 2, 2025 (0 new comments)
- Eval bug: sentencepiece tokenizer generates incorrect tokens (#13256), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: the output file of llama-quantize is not gguf format (#13258), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: Server does not always cancel requests for disconnected connections (#13262), commented on Jun 2, 2025 (0 new comments)
- Feature Request: add to llama-bench device info reporting of "bf16:1", if built with VK_KHR_bfloat16 support and the driver also supports it (#13274), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: vulkan prompt processing suddenly slows down once I reach a certain prompt size (#13765), commented on Jun 1, 2025 (0 new comments)
- Feature Request: Support Codestral Mamba (#8519), commented on Jun 1, 2025 (0 new comments)
- Feature Request: (webui) Implement experimental features on webui (#11662), commented on Jun 1, 2025 (0 new comments)