[pull] master from ggml-org:master #411

pull · 2025-07-16T13:49:18Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.3)

Can you help keep this open source service alive? 💖 Please sponsor : )

* Support diffusion models: Add Dream 7B
* Move diffusion to examples
* Move stuff to examples. Add patch to not use kv-cache
* Address review comments
* Make sampling fast
* llama: remove diffusion functions
* Add basic timings + cleanup
* More cleanup
* Review comments: better formatting, use LOG instead of std::cerr, re-use batch, use ubatch instead of max_length
* fixup!
* Review: move everything to diffusion-cli for now

* kv-cache : prepare K/V buffers for separation
  ggml-ci
* batched-bench : fix oob write
  ggml-ci
* llama : add "virtual sequences"
  ggml-ci
* llama : use "stream" vs "virtual sequence"
  ggml-ci
* graph : fix stream splitting when KV cache is not used
  ggml-ci
* kv-cache : add multi-stream save/load support
  ggml-ci
* llama : add "--attn-streams" flag
  ggml-ci
* kv-cache : fix handling when find_slot fails
  ggml-ci
* kv-cache : restore find_slot impl
  ggml-ci
* kv-cache : add comments
* kv-cache : add bounds checks for sequence id
  ggml-ci
* cont : add n_seq_max to batch allocr
  ggml-ci
* kv-cache : perform stream copies lazily after llama_synchronize
  ggml-ci
* kv-cache : avoid throwing exceptions across the C boundary
  ggml-ci
* CUDA: 4D FlashAttention support (#14628)
* CUDA: 4D FlashAttention support
* CUDA: fix WMMA FA kernel
* llama : rename attn_streams -> kv_unified
  ggml-ci
* common : rename kv_split -> kv_unified
  ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

ggerganov and others added 6 commits July 16, 2025 12:13

server : fix handling of the ignore_eos flag (#14710)

538cc77

ggml-ci

llama : fix parallel processing for plamo2 (#14716)

e4841d2

server : pre-calculate EOG logit biases (#14721)

6ffd4e9

ggml-ci

ggml : add asserts (#14720)

6497834

* ggml : add asserts
  ggml-ci
* cont : fix constant type

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

pull bot locked and limited conversation to collaborators Jul 16, 2025

pull bot added the ⤵️ pull label Jul 16, 2025

pull bot merged commit 225e7a1 into dumpmemory:master Jul 16, 2025
48 of 51 checks passed

github-actions bot added Nvidia GPU examples server testing python ggml labels Jul 16, 2025
