sycl: Batched mulmat rework for oneDNN dispatch #14617

Merged 1 commit on Jul 14, 2025

ShanoToni
Contributor

This PR reworks how the SYCL backend dispatches batched mul_mat operations to oneDNN. The new dispatch makes better use of broadcasting when the inputs have non-matching batch sizes, and it handles non-contiguous data passed in the tensors. This reduces the number of calls to oneDNN matmul.
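To illustrate what broadcasting over non-matching batch sizes means, here is a minimal plain C++ sketch. It is not the PR's code and does not use the oneDNN API; the function name `batched_matmul_broadcast`, the row-major layout, and the rule that the larger batch count is a multiple of the smaller one are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the PR: multiplies batches of row-major
// matrices, reusing (broadcasting) each A matrix across several B matrices.
// `a` holds batch_a matrices of shape m x k, `b` holds batch_b matrices of shape k x n.
std::vector<float> batched_matmul_broadcast(const std::vector<float>& a, std::size_t batch_a,
                                            const std::vector<float>& b, std::size_t batch_b,
                                            std::size_t m, std::size_t k, std::size_t n) {
    // Assumed broadcast rule: batch_b is a whole multiple of batch_a.
    assert(batch_a > 0 && batch_b % batch_a == 0);
    assert(a.size() == batch_a * m * k);
    assert(b.size() == batch_b * k * n);

    const std::size_t ratio = batch_b / batch_a;  // number of B batches sharing one A batch
    std::vector<float> c(batch_b * m * n, 0.0f);

    for (std::size_t ib = 0; ib < batch_b; ++ib) {
        const std::size_t ia = ib / ratio;  // broadcast: map each B batch to its A batch
        const float* pa = a.data() + ia * m * k;
        const float* pb = b.data() + ib * k * n;
        float* pc = c.data() + ib * m * n;
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (std::size_t p = 0; p < k; ++p) {
                    acc += pa[i * k + p] * pb[p * n + j];
                }
                pc[i * n + j] = acc;
            }
        }
    }
    return c;
}
```

In this sketch a single call covers every batch, because the smaller operand is indexed through `ib / ratio` instead of being copied or multiplied in a separate call per batch.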

Additionally, the PR includes a small fix for the ggml_sycl_mul_mat_vec_nc case so that src1 can be non-contiguous as well.
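For readers unfamiliar with the term, a non-contiguous tensor is one whose strides do not match a densely packed layout, for example after a permute or a view. The sketch below shows one way to test this from a shape and byte strides. The `TensorLayout` struct and `is_contiguous_layout` function are hypothetical names for illustration, and the check is a simplification that assumes a fixed element size (no quantized block types).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified tensor description (illustration only).
struct TensorLayout {
    std::array<std::int64_t, 4> ne;  // number of elements per dimension, innermost first
    std::array<std::size_t, 4> nb;   // stride in bytes per dimension
};

// Returns true when every stride equals the stride of a densely packed layout:
// nb[0] == elem_size and nb[i] == nb[i-1] * ne[i-1].
bool is_contiguous_layout(const TensorLayout& t, std::size_t elem_size) {
    assert(elem_size > 0);
    std::size_t expected = elem_size;
    for (std::size_t i = 0; i < 4; ++i) {
        if (t.nb[i] != expected) {
            return false;
        }
        expected *= static_cast<std::size_t>(t.ne[i]);
    }
    return true;
}
```

A kernel that assumes dense packing can only be used when such a check passes; otherwise it has to index through the strides explicitly.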

test-backend-ops passes all tests.

llama-bench with a qwen2 model shows no performance regression compared to master when running on Intel Battlemage.

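  • Master

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 8531.35 ± 62.72 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 157.24 ± 0.27 |

  • PR branch

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 8602.93 ± 39.79 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 157.20 ± 0.76 |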

github-actions bot added the ggml and SYCL labels on Jul 10, 2025
Collaborator

Alcpz left a comment

Thanks for changing the semantics of the dnn calls

ShanoToni force-pushed the sycl_mul_mat_batched_rework branch from 747c12e to b2785f8 on July 11, 2025 10:57
Alcpz merged commit 65a3ebb into ggml-org:master on Jul 14, 2025
48 checks passed