Insights: ggml-org/llama.cpp
Overview
44 Releases published by 1 person
- b5537, published May 29, 2025
- b5538, published May 29, 2025
- b5539, published May 30, 2025
- b5540, published May 30, 2025
- b5541, published May 30, 2025
- b5543, published May 30, 2025
- b5544, published May 30, 2025
- b5545, published May 30, 2025
- b5546, published May 30, 2025
- b5547, published May 30, 2025
- b5548, published May 30, 2025
- b5551, published May 31, 2025
- b5552, published May 31, 2025
- b5554, published May 31, 2025
- b5555, published May 31, 2025
- b5556, published May 31, 2025
- b5558, published May 31, 2025
- b5559, published Jun 1, 2025
- b5560, published Jun 1, 2025
- b5568, published Jun 1, 2025
- b5569, published Jun 1, 2025
- b5571, published Jun 1, 2025
- b5572, published Jun 1, 2025
- b5573, published Jun 2, 2025
- b5574, published Jun 2, 2025
- b5575, published Jun 2, 2025
- b5576, published Jun 2, 2025
- b5577, published Jun 2, 2025
- b5578, published Jun 2, 2025
- b5579, published Jun 2, 2025
- b5580, published Jun 3, 2025
- b5581, published Jun 3, 2025
- b5584, published Jun 4, 2025
- b5585, published Jun 4, 2025
- b5586, published Jun 4, 2025
- b5587, published Jun 4, 2025
- b5588, published Jun 4, 2025
- b5589, published Jun 4, 2025
- b5590, published Jun 4, 2025
- b5591, published Jun 5, 2025
- b5592, published Jun 5, 2025
- b5593, published Jun 5, 2025
- b5595, published Jun 5, 2025
- b5596, published Jun 5, 2025
56 Pull requests merged by 25 people
- vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001), merged Jun 5, 2025
- Fix CUDA build failure on AutoDL cloud platforms (#14005), merged Jun 5, 2025
- memory : migrate from llama_kv_cache to more generic llama_memory (#14006), merged Jun 5, 2025
- llama : allow using mmap without PrefetchVirtualMemory (#14013), merged Jun 5, 2025
- chore: added badge and link to release (#13938), merged Jun 5, 2025
- vocab : warn about missing mask token (#14022), merged Jun 5, 2025
- context : fix pos_min initialization upon decode error (#14008), merged Jun 5, 2025
- vulkan: automatically deduce size of push constants (#13936), merged Jun 5, 2025
- ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813), merged Jun 4, 2025
- kv-cache : refactor the update/defrag mechanism (#13988), merged Jun 4, 2025
- ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997), merged Jun 4, 2025
- releases : use dl backend for linux release, remove arm64 linux release (#13996), merged Jun 4, 2025
- llama-graph : use ggml_repeat_4d (#13998), merged Jun 4, 2025
- CUDA: fix FTZ in FA for Gemma 3 (#13991), merged Jun 4, 2025
- kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985), merged Jun 4, 2025
- vulkan: fix warnings in perf logger querypool code (#13937), merged Jun 3, 2025
- docs : add "Quick start" section for new users (#13862), merged Jun 3, 2025
- opencl: add backend_synchronize (#13939), merged Jun 2, 2025
- OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840), merged Jun 2, 2025
- server : disable speculative decoding for SWA models (#13970), merged Jun 2, 2025
- metal : use F32 attention accumulators in FA kernels (#13975), merged Jun 2, 2025
- gemma : more consistent attention scaling for v2 and v3 (#13951), merged Jun 2, 2025
- server : update deepseek reasoning format (pass reasoning_content as diffs) (#13933), merged Jun 2, 2025
- mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961), merged Jun 2, 2025
- Fix: Handle mixed-case 'Power' strings in POWER CPU detection (#13966), merged Jun 2, 2025
- sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826), merged Jun 2, 2025
- gguf: fix failure on version == 0 (#13956), merged Jun 1, 2025
- convert : fix nomic-bert-moe mask token (#13757), merged Jun 1, 2025
- convert : fix vocab padding code for bert models (#13954), merged Jun 1, 2025
- ggml: check if non-native endian model is being loaded (#13943), merged Jun 1, 2025
- sync : ggml (#13953), merged Jun 1, 2025
- add easy-llama Python bindings to README (#13950), merged Jun 1, 2025
- parallel : fix n_junk == 0 (#13952), merged Jun 1, 2025
- kv-cache : split implementation in separate sources (#13920), merged Jun 1, 2025
- threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995), merged May 31, 2025
- Note about necessity of having libcurl installed for standard build (#13945), merged May 31, 2025
- chat : allow unclosed thinking tags (#13931), merged May 31, 2025
- llama : deprecate explicit kv_self defrag/update calls (#13921), merged May 31, 2025
- llama : use n_swa + n_ubatch cells for SWA cache (#13833), merged May 31, 2025
- Replace alert and confirm with custom modals. (#13711), merged May 31, 2025
- llama : auto-batch preparation (#13845), merged May 31, 2025
- mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917), merged May 31, 2025
- kv-cache : refactor + add llama_memory_state_i (#13746), merged May 31, 2025
- CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (#13895), merged May 31, 2025
- CUDA: fix typo in FlashAttention code (#13926), merged May 30, 2025
- sched : avoid changing cur_copy when a graph is already allocated (#13922), merged May 30, 2025
- parallel : increase the variability of the prompt lengths (#13927), merged May 30, 2025
- cuda : prevent using split buffers with 3d/4d matrices (#13919), merged May 30, 2025
- SYCL: Add mrope kernel (#13755), merged May 30, 2025
- sync : vendor (#13901), merged May 30, 2025
- convert : fix rwkv bos/eos token (#13844), merged May 30, 2025
- convert : allow partial update to the chkhsh pre-tokenizer list (#13847), merged May 30, 2025
- Add support for DistilBert (#13907), merged May 30, 2025
- model: minicpm should use llm_build_granite (#13911), merged May 30, 2025
- cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890), merged May 29, 2025
- llama : add support for jina-reranker-v2 (#13900), merged May 29, 2025
24 Pull requests opened by 22 people
- Need to undefine "hz" on AIX (#13894), opened May 29, 2025
- ci(intel): venv for python & pip installation for intel docker (#13898), opened May 29, 2025
- convert: add eagle2 draft arch (#13908), opened May 30, 2025
- remove WIP since PR has been merged (#13912), opened May 30, 2025
- [Ascend NPU] Enable labeler (#13914), opened May 30, 2025
- [CANN] Support Acl Graph (#13915), opened May 30, 2025
- Add plamo2 (#13930), opened May 30, 2025
- `chat`: improve llama 3.x handling of <|python_tag|> (+ allow --special combo) (#13932), opened May 30, 2025
- llama : support multiple classifier outputs and labels (#13940), opened May 31, 2025
- ci: add LoongArch cross-compile build (#13944), opened May 31, 2025
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973), opened Jun 2, 2025
- Hybrid recurrent cache (#13979), opened Jun 2, 2025
- llama : allow building all tests on windows when not using shared libs (#13980), opened Jun 2, 2025
- chore(server): split context-server to its own file (#13987), opened Jun 3, 2025
- [CANN] Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002), opened Jun 4, 2025
- opencl: preliminary support for Q4_0 mul_mat_id using matvec (#14003), opened Jun 4, 2025
- llama-chat : Do not throw when tool parsing fails (#14012), opened Jun 4, 2025
- llama: Attempt to add ModernBert (#14014), opened Jun 4, 2025
- server: Enable mtmd in llama-server `/completion` endpoint (#14016), opened Jun 4, 2025
- tests : add test-tokenizers-repo (#14017), opened Jun 4, 2025
- ggml-cpu: fix uncaught underscore terminators for s390x (#14023), opened Jun 5, 2025
- llama : support qwen3 rerank and embeddings (#14029), opened Jun 5, 2025
- llama : deprecate llama_kv_self_ API (#14030), opened Jun 5, 2025
- gguf-py : add add_classifier_output_labels method to writer (#14031), opened Jun 5, 2025
42 Issues closed by 12 people
- Bug: MinGW build fails to load models with "error loading model: PrefetchVirtualMemory unavailable" (#9311), closed Jun 5, 2025
- Eval bug: llama-server -hf nomic-ai/nomic-embed-text-v2-moe-GGUF --embeddings, broken on latest version (#14021), closed Jun 5, 2025
- Compile bug: Prooted Debian in Droid Termux only (#12452), closed Jun 5, 2025
- [Build] Some Build Options/Definitions seems Missing in ggml-base (#13017), closed Jun 5, 2025
- Feature Request: Ability to pack multiple GGUFs into single one (#13028), closed Jun 5, 2025
- Eval bug: Error when load `bge-reranker-v2-gemma` model (#13041), closed Jun 5, 2025
- Misc. bug: new kv cell seq implementation does not handle "seq_id = -1" specified in the API (#13983), closed Jun 4, 2025
- Eval bug: OpenAI incompatible image handling in server multimodal (#12947), closed Jun 4, 2025
- Perplexity script for non GGUF quantization (#13015), closed Jun 4, 2025
- Eval bug: RWKV inference issue with llama-server (#13018), closed Jun 4, 2025
- Container images in GHCR registry are not multi arch (#13995), closed Jun 3, 2025
- Misc. bug: llama-server didn't display thought process since b5576 (#13981), closed Jun 3, 2025
- Misc. bug: Reasoning content is not separated when streaming (#13867), closed Jun 2, 2025
- Misc. bug: memory leak in mtmd? (mtmd_helper_eval_chunk_single) (#13958), closed Jun 2, 2025
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment (#12655), closed Jun 2, 2025
- gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1 (#12998), closed Jun 2, 2025
- Eval bug: Segmentation fault when running gemma3-cli on Android (#13000), closed Jun 2, 2025
- Eval bug: why Gemma 3 model has run into CPU inference (#13004), closed Jun 2, 2025
- Eval bug: default system prompt in llama-server (#13948), closed Jun 1, 2025
- Eval bug: Quad P40 unable to run 70B models on recent releases (#12990), closed Jun 1, 2025
- Eval bug: Not support DeepSeek-R1-0528-GGUF-Q8_0 (#13916), closed May 31, 2025
- mtmd: cmake: C API broken since last change, static linking always broken (#13902), closed May 31, 2025
- Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use (#13812), closed May 31, 2025
- CUDA illegal memory bug 75 fixed? (#13906), closed May 31, 2025
- Misc. bug: what(): Unexpected empty grammar stack after accepting piece: <unused32> (#13341), closed May 31, 2025
- Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size' (#12325), closed May 31, 2025
- Eval bug: convert_hf_to_gguf.py AttributeError: (#12847), closed May 31, 2025
- Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj (#12899), closed May 31, 2025
- Compile bug: how to enable opencl in termux (#12911), closed May 31, 2025
- Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple (#12968), closed May 31, 2025
- Feature Request: multi model cli tools: Convert submitted images to best size and format for model (#12981), closed May 31, 2025
- Feature Request: Make chat sessions possible with multi model cli tools (#12982), closed May 31, 2025
- Misc. bug: Potential memory leak in backend registry (#12986), closed May 31, 2025
- Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue (#13877), closed May 30, 2025
- `CUDA error: an illegal memory access was encountered` on DeepSeek-R1-0528 (#13909), closed May 30, 2025
- CUDA error: an illegal memory access was encountered (with large prompts) (#13851), closed May 30, 2025
- Eval bug: "GGML_ASSERT(!(split && ne02 > 1)) failed" when loading DeepSeek-R1T with --split-mode row (#13372), closed May 30, 2025
- Feature Request: Splitting layers according to VRAM usage on multi GPUs setups (#12654), closed May 30, 2025
- Misc. bug: Excessive power draw on the second GPU in dual RTX 3090 setup when idle (#12958), closed May 30, 2025
- Why does /ggml/CMakeLists.txt add_subdirectory(examples)? (#12963), closed May 30, 2025
29 Issues opened by 27 people
- Feature Request: add a new repo for convertion of gguf (#14027), opened Jun 5, 2025
- Feature Request: support FP8 data type in llama.cpp (#14020), opened Jun 5, 2025
- Misc. bug: "error: invalid argument: /bin/sh" when using Docker image (#14019), opened Jun 5, 2025
- llama.cpp error when using the snowflake-arctic-embed-v2 model (#14018), opened Jun 4, 2025
- Feature Request: Support Llama-Nemotron-Nano-VL-8B-V1 (#14015), opened Jun 4, 2025
- Compile bug: numerous deprecation warnings when compiling in Termux (#14011), opened Jun 4, 2025
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007), opened Jun 4, 2025
- Feature Request: allow spacebar to confirm web UI prompts [like the deleting a chat confirmation] (#13999), opened Jun 3, 2025
- Compile bug: Race condition during compilation, compilation works with -j 1 but not with -j 8 (#13993), opened Jun 3, 2025
- Compile bug: (#13992), opened Jun 3, 2025
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990), opened Jun 3, 2025
- Feature Request: (#13989), opened Jun 3, 2025
- Misc. bug: sentencepiece not included in requirements.txt (#13982), opened Jun 3, 2025
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU (#13978), opened Jun 2, 2025
- Eval bug: Unexpected failure converting Mistral 7B v0.2 to f32 GGUF (#13976), opened Jun 2, 2025
- Misc. bug: llama-bench improper tensor split (#13972), opened Jun 2, 2025
- context shifting should be default option? (#13971), opened Jun 2, 2025
- make using shifting context easier. (#13969), opened Jun 2, 2025
- Eval bug: Unable to load the model on GPU (#13967), opened Jun 2, 2025
- Eval bug: llama.cpp crashes in string comparison when using a reasoning model for long periods of time (#13965), opened Jun 2, 2025
- Feature Request: WINA (#13964), opened Jun 2, 2025
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" (#13963), opened Jun 2, 2025
- Eval bug: llama-mtmd-cli : option --image failed to load image (#13959), opened Jun 1, 2025
- Eval bug: llama-tts abort (#13955), opened Jun 1, 2025
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) (#13947), opened May 31, 2025
- Feature Request: Generate Image Embeddings with llama.cpp (#13913), opened May 30, 2025
- android built on GPU cannot comparable with CPU? (#13910), opened May 30, 2025
72 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- finetune.cpp command-line arg (#13873), commented on Jun 5, 2025 (70 new comments)
- sycl: Add reorder to Q6_K mmvq implementation (#13885), commented on Jun 5, 2025 (20 new comments)
- ggml-cpu : split arch-specific implementations (#13892), commented on Jun 5, 2025 (7 new comments)
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792), commented on Jun 5, 2025 (3 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196), commented on Jun 3, 2025 (3 new comments)
- SYCL: Implement few same quantized type copy kernels (#13739), commented on Jun 4, 2025 (1 new comment)
- llama : initial Mamba-2 support (#9126), commented on May 30, 2025 (0 new comments)
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366), commented on Jun 5, 2025 (0 new comments)
- Add PaliGemma Support (#7553), commented on Jun 1, 2025 (0 new comments)
- Llama cpp low level python bindings (#1660), commented on Jun 1, 2025 (0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310), commented on Jun 5, 2025 (0 new comments)
- Eval bug: Cannot load Qwen3 ranking models (#13820), commented on Jun 5, 2025 (0 new comments)
- Feature Request: s390x CI (#13243), commented on Jun 5, 2025 (0 new comments)
- Feature Request: Tensor paralellism (--split-mode row) over rpc (#13083), commented on Jun 5, 2025 (0 new comments)
- Eval bug: Qwen3 30B A3B is slow with CUDA (#13211), commented on Jun 5, 2025 (0 new comments)
- Eval bug: Custom model error. (#13318), commented on Jun 5, 2025 (0 new comments)
- Eval bug: std::runtime_error Invalid diff: (#13876), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: -TS doesn't support more than ? Devices (#13293), commented on Jun 4, 2025 (0 new comments)
- Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server (#13825), commented on Jun 4, 2025 (0 new comments)
- Eval bug: Can't run Qwen3-32B Q4_K_XL (#13298), commented on Jun 4, 2025 (0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694), commented on Jun 4, 2025 (0 new comments)
- [Draft] Tensor Parallel support to llama.cpp (#9648), commented on May 31, 2025 (0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326), commented on Jun 3, 2025 (0 new comments)
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications (#12727), commented on May 29, 2025 (0 new comments)
- convert : write tensors in parallel (#12837), commented on Jun 2, 2025 (0 new comments)
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) (#13206), commented on Jun 3, 2025 (0 new comments)
- feat(server): Add tool call support to WebUI (LLama Server) (#13501), commented on Jun 3, 2025 (0 new comments)
- webui: Add editing assistant messages (#11849) (#13522), commented on May 29, 2025 (0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529), commented on Jun 5, 2025 (0 new comments)
- Granite Four (#13550), commented on Jun 4, 2025 (0 new comments)
- add GGML_USE_NUMA_MIGRATE feature to optimize cross NUMA op computation (#13649), commented on Jun 3, 2025 (0 new comments)
- model : jina-embeddings-v3 support (#13693), commented on Jun 1, 2025 (0 new comments)
- ggml : add ggml_fill() (#13772), commented on Jun 4, 2025 (0 new comments)
- server: args for draft model cache types (#11200) (#13782), commented on May 30, 2025 (0 new comments)
- kv-cache : avoid modifying recurrent cells when setting inputs (#13834), commented on May 31, 2025 (0 new comments)
- musa: enable fp16 mma (all) and cublas on qy2 (#13842), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB' (#12923), commented on Jun 1, 2025 (0 new comments)
- Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3 (#13025), commented on Jun 1, 2025 (0 new comments)
- Eval bug: llama-server stays in unresponsive state - CUDA error: out of memory (#13085), commented on Jun 1, 2025 (0 new comments)
- Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout (#13240), commented on Jun 1, 2025 (0 new comments)
- Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError' (#13248), commented on Jun 1, 2025 (0 new comments)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747), commented on Jun 1, 2025 (0 new comments)
- Misc. bug: Decreased success rate for tool calling (#13769), commented on May 31, 2025 (0 new comments)
- Misc. bug: xcframework does not contain support for Catalyst (#12751), commented on May 31, 2025 (0 new comments)
- Eval bug: Can't utilize all 16 threads / 8 CPU cores for prompt processing when using llama-server; works fine with llama-cli (#13197), commented on May 31, 2025 (0 new comments)
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' (#13893), commented on May 30, 2025 (0 new comments)
- Compile bug: Vulkan Cross compile for arm64 (#13068), commented on May 30, 2025 (0 new comments)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156), commented on May 30, 2025 (0 new comments)
- Eval bug: Unreadable output when using qwen2-vl model. (#13165), commented on May 30, 2025 (0 new comments)
- Misc. bug: llama-parallel segmentation fault (#13172), commented on May 30, 2025 (0 new comments)
- Eval bug: Persistent <think> Tags in Qwen3-32B Output Despite enable_thinking: False and --reasoning-format none in llama.cpp (#13189), commented on May 30, 2025 (0 new comments)
- Automatic optimization of runtime parameters such as -ngl given memory constraints (#13860), commented on May 30, 2025 (0 new comments)
- Feature Request: Falcon-H1 (#13681), commented on May 29, 2025 (0 new comments)
- Feature Request: Installable package via winget (#8188), commented on May 29, 2025 (0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860), commented on Jun 4, 2025 (0 new comments)
- Compile bug: Cannot convert from char8_t to char* in llama-chat.cpp (#12740), commented on Jun 4, 2025 (0 new comments)
- Eval bug: SIGILL (#13161), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: Compilation with openCL on latest build (#13300), commented on Jun 4, 2025 (0 new comments)
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552), commented on Jun 3, 2025 (0 new comments)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717), commented on Jun 3, 2025 (0 new comments)
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models (#13247), commented on Jun 3, 2025 (0 new comments)
- Compile bug: paths with spaces fail on Unix with Vulkan backend (#13288), commented on Jun 3, 2025 (0 new comments)
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf (#12997), commented on Jun 2, 2025 (0 new comments)
- Slow token generation speed of Gemma 3 QAT Models (#13048), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115), commented on Jun 2, 2025 (0 new comments)
- Eval bug: sentencepiece tokenizer generates incorrect tokens (#13256), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: the output file of llama-quantize is not gguf format (#13258), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: Server does not always cancel requests for disconnected connections (#13262), commented on Jun 2, 2025 (0 new comments)
- Feature Request: add to llama-bench device info reporting of "bf16:1", if built with VK_KHR_bfloat16 support and the driver also supports it (#13274), commented on Jun 2, 2025 (0 new comments)
- Misc. bug: vulkan prompt processing suddenly slows down once I reach a certain prompt size (#13765), commented on Jun 1, 2025 (0 new comments)
- Feature Request: Support Codestral Mamba (#8519), commented on Jun 1, 2025 (0 new comments)
- Feature Request: (webui) Implement experimental features on webui (#11662), commented on Jun 1, 2025 (0 new comments)