[pull] master from ggml-org:master #407

pull · 2025-06-04T13:49:20Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)



Update oneMath commit to merged PR uxlfoundation/oneMath#669, which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2
UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```


* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values

Currently, when a model generates output that looks like a tool call but is invalid, an exception is thrown and not handled, causing the CLI or llama-server to bail. Instead, handle the chat parser exception and simply return the generated text in such cases.

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
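The fallback described above can be illustrated with a minimal sketch. This is not the code from the commit: `ParsedReply`, `parse_strict`, `parse_or_passthrough`, and the `<tool_call>` markers are hypothetical names chosen for illustration, and the real chat parser in llama.cpp is considerably more involved. The sketch only shows the pattern of catching the parser's exception and returning the raw generated text.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type; the real chat message struct in llama.cpp differs.
struct ParsedReply {
    std::string content;    // plain text returned to the caller
    std::string tool_name;  // empty when no valid tool call was parsed
};

// Hypothetical strict parser: throws when the text starts like a tool call
// but is not properly terminated.
inline ParsedReply parse_strict(const std::string & text) {
    const std::string prefix = "<tool_call>";
    const std::string suffix = "</tool_call>";
    if (text.compare(0, prefix.size(), prefix) != 0) {
        return {text, ""};  // ordinary text, nothing to parse
    }
    if (text.size() < prefix.size() + suffix.size() ||
        text.compare(text.size() - suffix.size(), suffix.size(), suffix) != 0) {
        throw std::runtime_error("malformed tool call");
    }
    return {"", text.substr(prefix.size(),
                            text.size() - prefix.size() - suffix.size())};
}

// Fallback wrapper: on a parse failure, return the generated text unchanged
// instead of letting the exception propagate and terminate the process.
inline ParsedReply parse_or_passthrough(const std::string & text) {
    try {
        return parse_strict(text);
    } catch (const std::exception &) {
        return {text, ""};
    }
}
```

With this shape, a truncated tool call such as `<tool_call>get_wea` comes back as plain content, so the caller still receives the model output.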

* batch : verify multi-sequence input batches
  ggml-ci
* cont : auto-gen positions + verify multi-seq input
  ggml-ci
* cont : first print debug info, then perform validation
  ggml-ci
* cont : fix position auto-gen + add comments
  ggml-ci


**Important** LFM2 was [merged](huggingface/transformers#39340) into transformers, but has not yet been released. To convert into gguf, install transformers from source:

```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```

* vulkan: allow unclamped loads in coopmat2 mul_mat_id shader
* vulkan: increase coopmat2 mul_mat_id tile size
* vulkan: optimize mat_mul_id row_ids search to batch loads, and port to coopmat1 path
* vulkan: use smaller FA row size when head size is large. Applies to both scalar and CM2 paths (CM1 isn't used due to shared memory limits)

* vulkan: support SET_ROWS

  Add variants of the copy_to_quant shader that do the SET_ROWS operation. Change these shaders to spread the work across the workgroup. The memory access pattern is probably not great (one thread per quant block), but should be fine for now.

* vulkan: optimize set_rows

  Larger workgroups for non-quant types. Set "norepeat" (there is manual repeat logic). Use fastmod.


* llama : add jinja template for rwkv-world

  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update convert_hf_to_gguf.py

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>


Remove unnecessary templates from the class definition and packing functions.
Reduce deeply nested conditionals and if-else switching in the mnpack function.
Replace repetitive code with inline functions in the packing functions.

2 ~ 7% improvement in Q8 model
15 ~ 50% improvement in Q4 model

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>


pull bot added the ⤵️ pull label Jun 4, 2025

github-actions bot added the ggml, devops, testing, Vulkan, build, python, examples, server, android, SYCL, and Nvidia GPU labels Jun 4, 2025

ggerganov and others added 18 commits June 13, 2025 08:03

vocab : prevent heap overflow when vocab is too small (#14145)

c33fe8b

ggml-ci

cmake : Improve build-info.cpp generation (#14156)

09cf2c7

* cmake: Simplify build-info.cpp generation

  The rebuild of build-info.cpp still gets triggered when .git/index changes.

* cmake: generate build-info.cpp in build dir

sycl: Adding additional cpy dbg print output (#14034)

0889eba

server : fix SWA condition for full context reprocess (#14163)

ffad043

ggml-ci

pooling : make cls_b and cls_out_b optional (#14165)

d714dad

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>

cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)

cc8d081

* cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT
* cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*

readme : remove survey link (#14168)

b7cc774

batch : rework llama_batch_allocr (#14153)

60c6663

* batch : rework llama_batch_allocr
  ggml-ci
* cont : move validation inside class
  ggml-ci
* cont : move output counting to class
  ggml-ci
* cont : minor
  ggml-ci
* batch : add TODOs
  ggml-ci

docs : Update multimodal.md (#14122)

26ff368

* Update multimodal.md
* Update multimodal.md

batch : add LLAMA_BATCH_DEBUG environment variable (#14172)

80709b7

* batch : add LLAMA_BATCH_DEBUG environment variable
  ggml-ci
* cont : improve seq_id display

Merge commit from fork

3cfbbdb

* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
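The commit body above mentions only a static cast and `GGML_ABORT`; the actual change is not shown here. As a rough sketch of the general technique, assuming a size read from a model file has to be narrowed to a 32-bit signed integer, a range check before the cast avoids silent wrap-around. `checked_narrow_i32` is a hypothetical helper, and it throws an exception as a stand-in for the abort used in the real code.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper, not llama.cpp code: narrows a size to int32_t and
// fails loudly when the value does not fit, instead of wrapping around.
inline int32_t checked_narrow_i32(std::size_t n) {
    if (n > static_cast<std::size_t>(std::numeric_limits<int32_t>::max())) {
        // The commit message refers to GGML_ABORT; an exception stands in here.
        throw std::overflow_error("value does not fit in int32_t");
    }
    return static_cast<int32_t>(n);
}
```

The check runs before the `static_cast`, so an out-of-range length is rejected at load time rather than turning into a negative or truncated count later.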

sycl: fix docker image (#14144)

40643ed

vocab : fix build (#14175)

fb85a28

ggml-ci

docs : remove WIP since PR has been merged (#13912)

00ba772

ggerganov and others added 29 commits July 11, 2025 13:46

llama : move enum llama_vocab_pre_type to implementation (#14631)

0d5375d

ggml-ci

readme : add hot PRs (#14636)

aaa088d

* readme : add hot PRs
* cont
* readme : update title
* readme : hot PRs links
* cont

HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (#14634)

756aa10

server : fix pooled embedding output (#14645)

0c1df14

vulkan : implement ggml_roll (ggml/1290)

3e303b1

ggml-ci

vulkan : implement bilinear interpolation (ggml/1291)

74bb294

ggml-ci

sync : ggml

2155357

ggml-ci

vulkan : remove unused vars (#0)

3120413

ggml-ci

sync : ggml

8eff955

CUDA: add set rows for f32 and f16 (#14551)

7de5c7c

* CUDA: add set rows for f32 and f16
* Review: change kernel params, use strides from host
* Use 1-d kernel
* Review: use int64_t for blockDim.x, rename nb->s for clarity

docs : add LFM2 to models section (#14650)

67eade1

* readme : add LFM2 to models section
* fix copy paste...

tests : cover lfm2 cases in test_ssm_conv (#14651)

c31e606

cmake : Add CMake presets for Linux and GCC (#14656)

84b396e

metal : Add missing unary ops Metal support (#14660)

dcf7f2e

ggml : add build-time message to remind about ggml_set_rows (#14661)

05fec5b

ggml-ci

cuda : add ELU support (#14657)

e743cdd

cuda : add set rows for bf16 (#14664)

923e3ea

quantize : fix minor logic flaw in --tensor-type (#14572)

982e347

sycl: Batched mulmat rework for oneDNN dispatch (#14617)

65a3ebb

SYCL: use 1D kernel for set_rows (#14618)

0f4c6ec

* SYCL: Use 1D kernel for set_rows
* Remove dangling comment
* Refactor and use ceil_div
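The kernel itself is not shown in this commit list. As a small illustration of why a ceiling division is the usual companion of a 1D launch, the sketch below computes how many work-groups are needed to cover every element when the total is not a multiple of the work-group size. The names `ceil_div` and `num_work_groups` are assumptions for this example and may not match the helper in the SYCL backend.

```cpp
#include <cassert>
#include <cstdint>

// Integer division rounded up; assumes a >= 0 and b > 0.
constexpr int64_t ceil_div(int64_t a, int64_t b) {
    return (a + b - 1) / b;
}

// Hypothetical helper: number of work-groups a 1D launch needs so that
// every one of total_elements is covered by some work-item.
inline int64_t num_work_groups(int64_t total_elements, int64_t work_group_size) {
    assert(total_elements >= 0 && work_group_size > 0);
    return ceil_div(total_elements, work_group_size);
}
```

Because the last work-group may be only partly full, a kernel launched this way normally needs a bounds check on the global index before it touches memory.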

scripts: benchmark for HTTP server throughput (#14668)

494c589

* scripts: benchmark for HTTP server throughput
* fix server connection reset

llama-context: add ability to get logits (#14672)

9c9e4fc

sycl: Hotfix for non dnnl codepath (#14677)

bdca383

cuda: fix build warnings in set-rows.cu (unused variable) (#14687)

cbc68be

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

pull bot merged commit cbc68be into dumpmemory:master Jul 15, 2025
1 check failed
