Releases: struct/llama.cpp

b5884

12 Jul 23:28
c31e606
tests : cover lfm2 cases in test_ssm_conv (#14651)

b5849

08 Jul 23:14
6efcd65
vulkan: optimize flash attention split_k_reduce (#14554)

* vulkan: allow FA split_k with smaller KV values

* vulkan: spread split_k_reduce work across more threads

k_num can get rather large, so use the whole workgroup to reduce the M/L values.

Launch a thread for each element in the HSV (V head size) dimension of the
output. This helps a lot for large HSV (as in DeepSeek).
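
For readers unfamiliar with the reduction step, here is a minimal CPU sketch in C++ of what a split_k reduce computes for one query row. It is not the Vulkan shader from this PR. The `SplitPartial` struct and the `split_k_reduce` signature are hypothetical, and the sketch assumes each split stores an unnormalized partial output together with its running max (M) and exponential sum (L), and that at least one split is present. The outer loop over `d` corresponds to the work the commit message assigns to one thread per HSV element.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container for one split's partial result for a single query row.
struct SplitPartial {
    float m;               // max attention logit seen by this split (the "M" value)
    float l;               // sum of exp(logit - m) over this split's KV range (the "L" value)
    std::vector<float> o;  // unnormalized output, length hsv: sum of exp(logit - m) * v
};

// Combine k_num partial results into the final attention output for one row.
// Assumes parts is non-empty and every parts[k].o has at least hsv elements.
std::vector<float> split_k_reduce(const std::vector<SplitPartial>& parts, std::size_t hsv) {
    // Step 1: global max across splits, so every exponent below is <= 0.
    float m_max = -std::numeric_limits<float>::infinity();
    for (const SplitPartial& p : parts) {
        m_max = std::max(m_max, p.m);
    }

    // Step 2: rescale each split's L to the global max and accumulate.
    float l_total = 0.0f;
    for (const SplitPartial& p : parts) {
        l_total += std::exp(p.m - m_max) * p.l;
    }

    // Step 3: each output element is independent of the others, which is why
    // a GPU version can assign one thread per element of the HSV dimension.
    std::vector<float> out(hsv, 0.0f);
    for (std::size_t d = 0; d < hsv; ++d) {
        float acc = 0.0f;
        for (const SplitPartial& p : parts) {
            acc += std::exp(p.m - m_max) * p.o[d];
        }
        out[d] = acc / l_total;
    }
    return out;
}
```

Steps 1 and 2 loop over all k_num splits, so their cost grows with k_num; in a GPU version these are the reductions that benefit from being shared across the workgroup rather than repeated serially.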