Insights: ggml-org/llama.cpp
Overview
49 Releases published by 1 person
- b5512 published May 27, 2025
- b5513 published May 27, 2025
- b5514 published May 27, 2025
- b5515 published May 27, 2025
- b5516 published May 27, 2025
- b5517 published May 28, 2025
- b5519 published May 28, 2025
- b5522 published May 28, 2025
- b5524 published May 28, 2025
- b5526 published May 28, 2025
- b5527 published May 28, 2025
- b5529 published May 29, 2025
- b5530 published May 29, 2025
- b5532 published May 29, 2025
- b5533 published May 29, 2025
- b5534 published May 29, 2025
- b5535 published May 29, 2025
- b5537 published May 29, 2025
- b5538 published May 29, 2025
- b5539 published May 30, 2025
- b5540 published May 30, 2025
- b5541 published May 30, 2025
- b5543 published May 30, 2025
- b5544 published May 30, 2025
- b5545 published May 30, 2025
- b5546 published May 30, 2025
- b5547 published May 30, 2025
- b5548 published May 30, 2025
- b5551 published May 31, 2025
- b5552 published May 31, 2025
- b5554 published May 31, 2025
- b5555 published May 31, 2025
- b5556 published May 31, 2025
- b5558 published May 31, 2025
- b5559 published Jun 1, 2025
- b5560 published Jun 1, 2025
- b5568 published Jun 1, 2025
- b5569 published Jun 1, 2025
- b5571 published Jun 1, 2025
- b5572 published Jun 1, 2025
- b5573 published Jun 2, 2025
- b5574 published Jun 2, 2025
- b5575 published Jun 2, 2025
- b5576 published Jun 2, 2025
- b5577 published Jun 2, 2025
- b5578 published Jun 2, 2025
- b5579 published Jun 2, 2025
- b5580 published Jun 3, 2025
- b5581 published Jun 3, 2025
62 Pull requests merged by 27 people
- docs : add "Quick start" section for new users #13862 merged Jun 3, 2025
- opencl: add backend_synchronize #13939 merged Jun 2, 2025
- OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat #13840 merged Jun 2, 2025
- server : disable speculative decoding for SWA models #13970 merged Jun 2, 2025
- metal : use F32 attention accumulators in FA kernels #13975 merged Jun 2, 2025
- gemma : more consistent attention scaling for v2 and v3 #13951 merged Jun 2, 2025
- server : update deepseek reasoning format (pass reasoning_content as diffs) #13933 merged Jun 2, 2025
- mtmd : fix memory leak in mtmd_helper_eval_chunk_single #13961 merged Jun 2, 2025
- "Fix: Handle mixed-case 'Power' strings in POWER CPU detection" #13966 merged Jun 2, 2025
- sycl: quantize and reorder the input to q8_1 when reorder is enabled #13826 merged Jun 2, 2025
- gguf: fix failure on version == 0 #13956 merged Jun 1, 2025
- convert : fix nomic-bert-moe mask token #13757 merged Jun 1, 2025
- convert : fix vocab padding code for bert models #13954 merged Jun 1, 2025
- ggml: check if non-native endian model is being loaded #13943 merged Jun 1, 2025
- sync : ggml #13953 merged Jun 1, 2025
- add easy-llama Python bindings to README #13950 merged Jun 1, 2025
- parallel : fix n_junk == 0 #13952 merged Jun 1, 2025
- kv-cache : split implementation in separate sources #13920 merged Jun 1, 2025
- threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling #12995 merged May 31, 2025
- Note about necessity of having libcurl installed for standard build #13945 merged May 31, 2025
- chat : allow unclosed thinking tags #13931 merged May 31, 2025
- llama : deprecate explicit kv_self defrag/update calls #13921 merged May 31, 2025
- llama : use n_swa + n_ubatch cells for SWA cache #13833 merged May 31, 2025
- Replace alert and confirm with custom modals. #13711 merged May 31, 2025
- llama : auto-batch preparation #13845 merged May 31, 2025
- mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) #13917 merged May 31, 2025
- kv-cache : refactor + add llama_memory_state_i #13746 merged May 31, 2025
- CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) #13895 merged May 31, 2025
- CUDA: fix typo in FlashAttention code #13926 merged May 30, 2025
- sched : avoid changing cur_copy when a graph is already allocated #13922 merged May 30, 2025
- parallel : increase the variability of the prompt lengths #13927 merged May 30, 2025
- cuda : prevent using split buffers with 3d/4d matrices #13919 merged May 30, 2025
- SYCL: Add mrope kernel #13755 merged May 30, 2025
- sync : vendor #13901 merged May 30, 2025
- convert : fix rwkv bos/eos token #13844 merged May 30, 2025
- convert : allow partial update to the chkhsh pre-tokenizer list #13847 merged May 30, 2025
- Add support for DistilBert #13907 merged May 30, 2025
- model: minicpm should use llm_build_granite #13911 merged May 30, 2025
- cmake: Guard GGML_CPU_ALL_VARIANTS by architecture #13890 merged May 29, 2025
- llama : add support for jina-reranker-v2 #13900 merged May 29, 2025
- gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method #13561 merged May 29, 2025
- arm64: optimize q4_k_q8_k kernel with i8mm #13886 merged May 29, 2025
- cmake: Factor out CPU architecture detection #13883 merged May 29, 2025
- ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm #13882 merged May 29, 2025
- tests : remove json.hpp from a test #13880 merged May 29, 2025
- convert : workaround for AutoConfig dummy labels #13881 merged May 29, 2025
- llama : add RobertaForSequenceClassification reranker support #13875 merged May 29, 2025
- ggml: aarch64: Implement SVE F32 kernels for vector functions #13843 merged May 29, 2025
- gguf/utility: return full content on size < 0 #13841 merged May 28, 2025
- llama : fix KV shift for qwen2vl #13870 merged May 28, 2025
- mtmd : move helpers to dedicated library (⚠️ breaking change) #13866 merged May 28, 2025
- ci: disable LLAMA_CURL for Linux cross-builds #13871 merged May 28, 2025
- Add support for BertForSequenceClassification reranking #13858 merged May 28, 2025
- convert: small addition to support LlamaModel #13838 merged May 28, 2025
- convert : fix qwen omni conversion #13859 merged May 28, 2025
- Change umlaut test #11600 merged May 28, 2025
- CUDA: fix FA tg at long context for CC >= 8.9 #13852 merged May 28, 2025
- convert : fix tensor naming conflict for llama 4 vision #13836 merged May 28, 2025
- [CANN]: Add SOC TYPE printing in cmake configuration processing #13837 merged May 28, 2025
- opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm #13787 merged May 27, 2025
- opencl: mark MUL_MAT supports non-contiguous tensors for f32 #13790 merged May 27, 2025
28 Pull requests opened by 24 people
- kv-cache : avoid modifying recurrent cells when setting inputs #13834 opened May 27, 2025
- musa: enable fp16 mma (all) and cublas on qy2 #13842 opened May 28, 2025
- tests : add test-tokenizers-remote #13846 opened May 28, 2025
- finetune.cpp command-line arg #13873 opened May 28, 2025
- sycl: Add reorder to Q6_K mmvq implementation #13885 opened May 29, 2025
- musa: extract ggml_cuda_mul_mat_batched_cublas_gemm_batched_ex #13887 opened May 29, 2025
- ggml-cpu : split arch-specific implementations #13892 opened May 29, 2025
- Need to undefine "hz" on AIX #13894 opened May 29, 2025
- ci(intel): venv for python & pip installation for intel docker #13898 opened May 29, 2025
- convert: add eagle2 draft arch #13908 opened May 30, 2025
- remove WIP since PR has been merged #13912 opened May 30, 2025
- [Ascend NPU] Enable labeler #13914 opened May 30, 2025
- [CANN] Support Acl Graph #13915 opened May 30, 2025
- Add plamo2 #13930 opened May 30, 2025
- `chat`: improve llama 3.x handling of <|python_tag|> (+ allow --special combo) #13932 opened May 30, 2025
- vulkan: automatically deduce size of push constants #13936 opened May 31, 2025
- vulkan: fix warnings in perf logger querypool code #13937 opened May 31, 2025
- chore: added badge and link to release #13938 opened May 31, 2025
- llama : support multiple classifier outputs and labels #13940 opened May 31, 2025
- ci: add LoongArch cross-compile build #13944 opened May 31, 2025
- ci: Update windows-2019 to windows-2022 #13960 opened Jun 1, 2025
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices #13973 opened Jun 2, 2025
- Hybrid recurrent cache #13979 opened Jun 2, 2025
- llama : allow building all tests on windows when not using shared libs #13980 opened Jun 2, 2025
- kv-cache : fix unified::seq_rm to work with seq_id < 0 #13985 opened Jun 3, 2025
- chore(server): split context-server to its own file #13987 opened Jun 3, 2025
- kv-cache : refactor the update/defrag mechanism #13988 opened Jun 3, 2025
- CUDA: fix FTZ in FA for Gemma 3 #13991 opened Jun 3, 2025
52 Issues closed by 16 people
- Misc. bug: llama-server didn't display thought process since b5576 #13981 closed Jun 3, 2025
- Misc. bug: Reasoning content is not separated when streaming #13867 closed Jun 2, 2025
- Misc. bug: memory leak in mtmd ? (mtmd_helper_eval_chunk_single) #13958 closed Jun 2, 2025
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment #12655 closed Jun 2, 2025
- gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1 #12998 closed Jun 2, 2025
- Eval bug: Segmentation fault when running gemma3-cli on Android #13000 closed Jun 2, 2025
- Eval bug: why Gemma 3 model has run into CPU inference #13004 closed Jun 2, 2025
- Eval bug: default system prompt in llama-server #13948 closed Jun 1, 2025
- Eval bug: Quad P40 unable to run 70B models on recent releases #12990 closed Jun 1, 2025
- Eval bug: Not support DeepSeek-R1-0528-GGUF-Q8_0 #13916 closed May 31, 2025
- mtmd: cmake: C API broken since last change, static linking always broken #13902 closed May 31, 2025
- Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use #13812 closed May 31, 2025
- CUDA illigal memory bug 75 fixed? #13906 closed May 31, 2025
- Misc. bug: what(): Unexpected empty grammar stack after accepting piece: <unused32> #13341 closed May 31, 2025
- Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size' #12325 closed May 31, 2025
- Eval bug: convert_hf_to_gguf.py AttributeError: #12847 closed May 31, 2025
- Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj #12899 closed May 31, 2025
- Compile bug: how to enable opencl in termux #12911 closed May 31, 2025
- Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple #12968 closed May 31, 2025
- Feature Request: multi model cli tools: Convert submitted images to best size and format for model #12981 closed May 31, 2025
- Feature Request: Make chat sessions possible with multi model cli tools #12982 closed May 31, 2025
- Misc. bug: Potential memory leak in backend registry #12986 closed May 31, 2025
- Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue #13877 closed May 30, 2025
- `CUDA error: an illegal memory access was encountered` on DeepSeek-R1-0528 #13909 closed May 30, 2025
- CUDA error: an illegal memory access was encountered (with large prompts) #13851 closed May 30, 2025
- Eval bug: "GGML_ASSERT(!(split && ne02 > 1)) failed" when loading DeepSeek-R1T with --split-mode row #13372 closed May 30, 2025
- Feature Request: Splitting layers according to VRAM usage on multi GPUs setups #12654 closed May 30, 2025
- Misc. bug: Excessive power draw on the second GPU in dual RTX 3090 setup when idle #12958 closed May 30, 2025
- Why does /ggml/CMakeLists.txt add_subdirectory(examples)? #12963 closed May 30, 2025
- Misc. bug: gguf-new-metadata and gguf-editor-gui changes all integer arrays to INT32 #13557 closed May 29, 2025
- Eval bug: stream with tool_call fix in b5478 crash in container and issues with calls from apps #13766 closed May 29, 2025
- Misc. bug: ALL gguf models fail to run (no log, docker exit code 139), #12205 closed May 29, 2025
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} #12591 closed May 29, 2025
- Compile bug: ggml-cuda/opt-step-adamw.cu error: identifier "__Poly8x8_t" is undefined on Jetson Orin AGX #12826 closed May 29, 2025
- CUDA: implementation of mul_mat_id #12859 closed May 29, 2025
- what *tool/framework* to use if testing performance of .gguf models #12901 closed May 29, 2025
- Misc. bug: llama-bench --tensor-split handling is broken #12917 closed May 29, 2025
- Compile bug: macro "DECL_FATTN_MMA_F16_CASE" requires 3 arguments, but only 2 given #12921 closed May 29, 2025
- Misc. bug: llama-server "terminate called after throwing an instance of 'std::runtime_error'" #12939 closed May 29, 2025
- Model conversion issue #12941 closed May 29, 2025
- Eval bug: KV cache shifting does not work for Qwen2.5VL #13865 closed May 28, 2025
- CI: build-linux-cross failing #13869 closed May 28, 2025
- Eval bug: qwen2.5-vl related bugs #13848 closed May 28, 2025
- Unable to deploy the fine-tuned qwen2.5-vl-7b using llama.cpp. #13723 closed May 28, 2025
- Misc. bug: Streaming tool calls does not return "type": "function", unlike non-stream #13798 closed May 28, 2025
- Feature Request: Free up VRAM when llama-server not in use #11703 closed May 28, 2025
- Eval bug: ggml_vulkan: Device memory allocation of size N failed with ub > 4096 and c > 4096 and b > 4096 #12817 closed May 28, 2025
- Eval bug: ROCm error: CUBLAS_STATUS_INTERNAL_ERROR #12878 closed May 28, 2025
- Misc. bug: gguf-my-repo doesn't work - [Errno 2] No such file or directory: './llama.cpp/llama-quantize' #12925 closed May 28, 2025
- Misc. bug: The llama-server not read the "--keep" param that user input in the cli #12927 closed May 28, 2025
31 Issues opened by 30 people
- Compile bug: Race condition during compilation, compilation works with -j 1 but not with -j 8 #13993 opened Jun 3, 2025
- Compile bug: #13992 opened Jun 3, 2025
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call #13990 opened Jun 3, 2025
- Feature Request: #13989 opened Jun 3, 2025
- Misc. bug: new kv cell seq implementation does not handle "seq_id = -1" specified in the API #13983 opened Jun 3, 2025
- Misc. bug: sentencepiece not included in requirements.txt #13982 opened Jun 3, 2025
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU #13978 opened Jun 2, 2025
- Eval bug: Unexpected failure converting Mistral 7B v0.2 to f32 GGUF #13976 opened Jun 2, 2025
- Misc. bug: llama-bench improper tensor split #13972 opened Jun 2, 2025
- context shifting should be default option? #13971 opened Jun 2, 2025
- make using shifting context easier. #13969 opened Jun 2, 2025
- Eval bug: Unable to load the model on GPU #13967 opened Jun 2, 2025
- Eval bug: llama.cpp crashes in string comparison when using a reasoning model for long periods of time #13965 opened Jun 2, 2025
- Feature Request: WINA #13964 opened Jun 2, 2025
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" #13963 opened Jun 2, 2025
- Eval bug: llama-mtmd-cli : option --image failed to load image #13959 opened Jun 1, 2025
- Eval bug: llama-tts abort #13955 opened Jun 1, 2025
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) #13947 opened May 31, 2025
- Feature Request: Generate Image Embeddings with llama.cpp #13913 opened May 30, 2025
- android built on GPU cannot comparable with CPU? #13910 opened May 30, 2025
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' #13893 opened May 29, 2025
- Misc. bug: linux/arm64 does not exist for the server docker image #13891 opened May 29, 2025
- Eval bug: std::runtime_error Invalid diff: #13876 opened May 28, 2025
- Feature Request: Make the `/completion` endpoint in `llama-server` work with multimodal models #13872 opened May 28, 2025
- Automatic optimization of runtime parameters such as -ngl given memory constraints #13860 opened May 28, 2025
- Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture #13856 opened May 28, 2025
- Eval bug: Embeddings Always returned as non #13854 opened May 28, 2025
- Feature Request: Set default of --numa to distribute #13850 opened May 28, 2025
- Dequantize function: Row misalignment in dequantized tensors - only first column matches original #13839 opened May 28, 2025
68 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- ggml-vulkan: adds support for op CONV_TRANSPOSE_1D #13813 commented on Jun 3, 2025 • 5 new comments
- ggml: improve ggml_backend_cuda_cpy_tensor_async #13818 commented on Jun 1, 2025 • 3 new comments
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196 commented on Jun 3, 2025 • 3 new comments
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects. #13792 commented on May 28, 2025 • 2 new comments
- SYCL: Implement few same quantized type copy kernels #13739 commented on Jun 3, 2025 • 1 new comment
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp #12326 commented on Jun 3, 2025 • 0 new comments
- [Draft] Tensor Parallel support to llama.cpp #9648 commented on May 31, 2025 • 0 new comments
- llama : initial Mamba-2 support #9126 commented on May 30, 2025 • 0 new comments
- ggml: avoid rebuild of GGML graph for each token (#7456) #8366 commented on Jun 1, 2025 • 0 new comments
- Add PaliGemma Support #7553 commented on Jun 1, 2025 • 0 new comments
- Llama cpp low level python bindings #1660 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used #10860 commented on Jun 3, 2025 • 0 new comments
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) #12717 commented on Jun 3, 2025 • 0 new comments
- Feature Request: s390x CI #13243 commented on Jun 3, 2025 • 0 new comments
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models #13247 commented on Jun 3, 2025 • 0 new comments
- Compile bug: paths with spaces fail on Unix with Vulkan backend #13288 commented on Jun 3, 2025 • 0 new comments
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf #12997 commented on Jun 2, 2025 • 0 new comments
- Slow token generation speed of Gemma 3 QAT Models #13048 commented on Jun 2, 2025 • 0 new comments
- Misc. bug: OpenCL: Issue with Adreno 610 #13115 commented on Jun 2, 2025 • 0 new comments
- Eval bug: sentencepiece tokenizer generates incorrect tokens #13256 commented on Jun 2, 2025 • 0 new comments
- WIP: Add support for CogAgent #12679 commented on May 29, 2025 • 0 new comments
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications #12727 commented on May 29, 2025 • 0 new comments
- convert : write tensors in parallel #12837 commented on Jun 2, 2025 • 0 new comments
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) #13206 commented on Jun 3, 2025 • 0 new comments
- feat(server): Add tool call support to WebUI (LLama Server) #13501 commented on Jun 3, 2025 • 0 new comments
- webui: Add editing assistant messages (#11849) #13522 commented on May 29, 2025 • 0 new comments
- Granite Four #13550 commented on May 30, 2025 • 0 new comments
- scripts: update pyproject.toml - deprecated poetry config + support uv #13615 commented on May 28, 2025 • 0 new comments
- add GGML_USE_NUMA_MIGRATE feature to optimize cross NUMA op computation #13649 commented on Jun 3, 2025 • 0 new comments
- model : jina-embeddings-v3 support #13693 commented on Jun 1, 2025 • 0 new comments
- remove templates from soft_max_f32_submitter to allow SYCL graph updates #13724 commented on May 28, 2025 • 0 new comments
- ggml : add ggml_fill() #13772 commented on May 31, 2025 • 0 new comments
- server: args for draft model cache types (#11200) #13782 commented on May 30, 2025 • 0 new comments
- examples : support MiniCPM-V-2 #13828 commented on May 28, 2025 • 0 new comments
- Eval bug: Can't utilize all 16 threads / 8 CPU cores for prompt processing when using llama-server. works fine with llama-cli #13197 commented on May 31, 2025 • 0 new comments
- Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server #13825 commented on May 30, 2025 • 0 new comments
- Compile bug: Vulkan Cross compile for arm64 #13068 commented on May 30, 2025 • 0 new comments
- Misc. bug: Shared libraries don't properly contain /common/ functions #13156 commented on May 30, 2025 • 0 new comments
- Eval bug: Unreadable output when using qwen2-vl model. #13165 commented on May 30, 2025 • 0 new comments
- Misc. bug: llama-parallel segmentation fault #13172 commented on May 30, 2025 • 0 new comments
- Eval bug: Persistent <think> Tags in Qwen3-32B Output Despite enable_thinking: False and --reasoning-format none in llama.cpp #13189 commented on May 30, 2025 • 0 new comments
- Feature Request: Falcon-H1 #13681 commented on May 29, 2025 • 0 new comments
- Feature Request: Installable package via winget #8188 commented on May 29, 2025 • 0 new comments
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates #13694 commented on May 29, 2025 • 0 new comments
- Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions. #12816 commented on May 29, 2025 • 0 new comments
- Feature request: Graphical GGUF viewer #6715 commented on May 29, 2025 • 0 new comments
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 #13145 commented on May 29, 2025 • 0 new comments
- Eval bug: llama-mtmd-cli doesn't support system prompts #13454 commented on May 28, 2025 • 0 new comments
- Feature Request: video support in mtmd-cli / server #13754 commented on May 28, 2025 • 0 new comments
- webui: First user prompt sometimes disappears after sending #13622 commented on May 28, 2025 • 0 new comments
- Eval bug: Llama 4 Scout/Maverick crash when processing images with certain aspect ratio #13827 commented on May 27, 2025 • 0 new comments
- Misc. bug: the output file of llama-quantize is not gguf format #13258 commented on Jun 2, 2025 • 0 new comments
- Misc. bug: Server does not always cancel requests for disconnected connections #13262 commented on Jun 2, 2025 • 0 new comments
- Feature Request: add to llama-bench device info reporting of "bf16:1", if built with VK_KHR_bfloat16 support and driver also supports it.. #13274 commented on Jun 2, 2025 • 0 new comments
- Misc. bug: vulkan prompt processing suddenly slows down once I reach a certain prompt size #13765 commented on Jun 1, 2025 • 0 new comments
- Eval bug: Cannot load Qwen3 ranking models #13820 commented on Jun 1, 2025 • 0 new comments
- Feature Request: Support Codestral Mamba #8519 commented on Jun 1, 2025 • 0 new comments
- Feature Request: (webui) Implement a experimental features on webui #11662 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB' #12923 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3 #13025 commented on Jun 1, 2025 • 0 new comments
- Eval bug: llama-server stays in unresponsive state- CUDA error: out of memory - #13085 commented on Jun 1, 2025 • 0 new comments
- Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout #13240 commented on Jun 1, 2025 • 0 new comments
- Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError' #13248 commented on Jun 1, 2025 • 0 new comments
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA #13747 commented on Jun 1, 2025 • 0 new comments
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend #13310 commented on May 31, 2025 • 0 new comments
- Misc. bug: Decreased success rate for tool calling #13769 commented on May 31, 2025 • 0 new comments
- Misc. bug: xcframework does not contain support for Catalyst #12751 commented on May 31, 2025 • 0 new comments
- Eval bug: SIGILL #13161 commented on May 31, 2025 • 0 new comments