[pull] master from ggml-org:master #480

pull · 2025-07-16T15:22:07Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.3)

Can you help keep this open source service alive? 💖 Please sponsor : )

ggml-ci

* ggml : add asserts ggml-ci * cont : fix constant type Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Support diffusion models: Add Dream 7B * Move diffusion to examples * Move stuff to examples. Add patch to not use kv-cache * Address review comments * Make sampling fast * llama: remove diffusion functions * Add basic timings + cleanup * More cleanup * Review comments: better formating, use LOG instead std::cerr, re-use batch, use ubatch instead of max_length * fixup! * Review: move everything to diffusion-cli for now

* kv-cache : prepare K/V buffers for separation ggml-ci * batched-bench : fix oob write ggml-ci * llama : add "virtual sequences" ggml-ci * llama : use "stream" vs "virtual sequence" ggml-ci * graph : fix stream splitting when KV cache is not used ggml-ci * kv-cache : add multi-stream save/load support ggml-ci * llama : add "--attn-streams" flag ggml-ci * kv-cache : fix handling when find_slot fails ggml-ci * kv-cache : restore find_slot impl ggml-ci * kv-cache : add comments * kv-cache : add bounds checks for sequence id ggml-ci * cont : add n_seq_max to batch allocr ggml-ci * kv-cache : perform stream copies lazily after llama_synchronize ggml-ci * kv-cache : avoid throwing exceptions across the C boundary ggml-ci * CUDA: 4D FlashAttention support (#14628) * CUDA: 4D FlashAttention support * CUDA: fix WMMA FA kernel * llama : rename attn_streams -> kv_unified ggml-ci * common : rename kv_split -> kv_unified ggml-ci --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

Co-authored-by: qwaqrm <qwaqrm@126.com>

* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults * Initialize webgpu device * Making progress on setting up the backend * Finish more boilerplate/utility functions * Organize file and work on alloc buffer * Add webgpu_context to prepare for actually running some shaders * Work on memset and add shader loading * Work on memset polyfill * Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it * Implement get_tensor and buffer_clear * Finish rest of setup * Start work on compute graph * Basic mat mul working * Work on emscripten build * Basic WebGPU backend instructions * Use EMSCRIPTEN flag * Work on passing ci, implement 4d tensor multiplication * Pass thread safety test * Implement permuting for mul_mat and cpy * minor cleanups * Address feedback * Remove division by type size in cpy op * Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends * Fix name * Fix macos dawn prefix path

mitmul and others added 7 commits July 16, 2025 12:12

llama : fix parallel processing for plamo2 (#14716)

e4841d2

server : pre-calculate EOG logit biases (#14721)

6ffd4e9

ggml-ci

ggml : add asserts (#14720)

6497834

* ggml : add asserts ggml-ci * cont : fix constant type Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>

model : support output bias for qwen2 (#14711)

b0f0ecc

Co-authored-by: qwaqrm <qwaqrm@126.com>

pull bot locked and limited conversation to collaborators Jul 16, 2025

pull bot added the ⤵️ pull label Jul 16, 2025

pull bot merged commit 21c0217 into rpatil524:master Jul 16, 2025

github-actions bot added documentation Improvements or additions to documentation Nvidia GPU testing examples devops python server ggml labels Jul 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[pull] master from ggml-org:master #480

[pull] master from ggml-org:master #480

pull bot commented Jul 16, 2025 •

edited

Loading

Uh oh!

Uh oh!

[pull] master from ggml-org:master #480

[pull] master from ggml-org:master #480

Conversation

pull bot commented Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

pull bot commented Jul 16, 2025 •

edited

Loading