sycl: quantize and reorder the input to q8_1 when reorder is enabled #13826

AD2605 · 2025-05-27T12:32:43Z

Description

This PR adds a kernel that quantizes and reorders the input in a single pass when the src1 tensor is converted to q8_1 and the reorder optimization is enabled.

All test cases pass when running with the environment variable GGML_SYCL_DISABLE_OPT set to 0.
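To make the fused step concrete, here is a minimal CPU-side sketch of quantizing one row to q8_1 while writing it in a reordered layout. It is not the SYCL kernel from this PR. The struct `ReorderedQ8_1` and the function `quantize_and_reorder_row` are hypothetical names introduced for illustration, and the layout (all quants of the row first, per-block scale and sum stored separately) is an assumption about what "reordered" means here. The sketch also stores the scale and sum as `float`, whereas q8_1 in ggml packs them as a pair of fp16 values, and it takes the per-block sum over the original float values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// q8_1 quantizes values in blocks of 32 (QK8_1 in ggml).
constexpr std::size_t kBlock = 32;

// Hypothetical container for one reordered row (illustration only).
// Instead of interleaving {d, s, qs[32]} per block, the quants are stored
// contiguously and the per-block metadata is kept in separate arrays.
struct ReorderedQ8_1 {
    std::vector<int8_t> qs;  // quants for the whole row, block after block
    std::vector<float>  d;   // per-block scale (fp16 in the real format)
    std::vector<float>  s;   // per-block sum of the original values
};

// Quantize a row of floats to 8 bits and emit the reordered layout in one pass.
ReorderedQ8_1 quantize_and_reorder_row(const std::vector<float>& x) {
    assert(x.size() % kBlock == 0 && "row length must be a multiple of 32");
    const std::size_t nblocks = x.size() / kBlock;

    ReorderedQ8_1 out;
    out.qs.resize(x.size());
    out.d.resize(nblocks);
    out.s.resize(nblocks);

    for (std::size_t ib = 0; ib < nblocks; ++ib) {
        const float* xb = x.data() + ib * kBlock;

        // Find the largest magnitude and the sum of the block.
        float amax = 0.0f;
        float sum  = 0.0f;
        for (std::size_t i = 0; i < kBlock; ++i) {
            amax = std::fmax(amax, std::fabs(xb[i]));
            sum += xb[i];
        }

        // Scale so that the largest magnitude maps to +/-127.
        const float d = amax / 127.0f;
        for (std::size_t i = 0; i < kBlock; ++i) {
            const float q = (amax == 0.0f) ? 0.0f : std::round(xb[i] / d);
            out.qs[ib * kBlock + i] = static_cast<int8_t>(q);
        }

        out.d[ib] = d;
        out.s[ib] = sum;
    }
    return out;
}
```

Calling `quantize_and_reorder_row` on a row of 64 floats yields 64 quants and two scale/sum entries. Producing this layout directly avoids a separate pass that would first write interleaved blocks and then shuffle them.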

Performance data. All measurements were gathered with the above environment variable set to 0, with the 2025.1 toolkit, and with llama-bench run with the parameters -ngl 99 -t 8 -r 10.

Battlemage

| model | size | params | backend | ngl | test | t/s master (`f9cd683`) | t/s (this branch) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 7454.58 ± 27.22 | 7410.26 ± 24.63 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 137.00 ± 2.06 | 135.75 ± 1.88 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | pp512 | 5713.46 ± 16.98 | 5690.11 ± 14.11 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | tg128 | 89.00 ± 1.50 | 88.76 ± 1.57 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | pp512 | 3171.06 ± 5.99 | 3163.50 ± 5.43 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | tg128 | 69.67 ± 0.51 | 69.88 ± 0.72 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | pp512 | 2069.41 ± 2.58 | 2066.22 ± 2.69 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | tg128 | 47.36 ± 0.31 | 47.25 ± 0.26 |

Lunar Lake

| model | size | params | backend | ngl | test | t/s master (`f9cd683`) | t/s (this branch) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 1802.52 ± 75.95 | 1887.25 ± 11.63 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 55.43 ± 0.17 | 57.56 ± 1.20 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | pp512 | 1095.61 ± 28.74 | 1386.69 ± 15.77 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | tg128 | 28.35 ± 0.11 | 29.42 ± 0.37 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | pp512 | 736.88 ± 2.24 | 710.65 ± 17.88 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | tg128 | 22.36 ± 0.27 | 24.75 ± 0.43 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | pp512 | 422.54 ± 15.98 | 447.95 ± 2.62 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | tg128 | 12.99 ± 0.05 | 14.11 ± 0.12 |

Alcpz

Minor comments. This is great work! Thanks

…gml-org#13826)

* [WIP]: fuse q8 quantization and reorder
* wip2: fuse q8 quantization and reorder
* working q8 reorder commit
* restored common.hpp
* remove debug prints
* remove unnecessary headers and remove trailing whitespace
* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>

---------

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>

AD2605 added 6 commits May 23, 2025 16:46

[WIP]: fuse q8 quantization and reorder

0d71ffa

wip2: fuse q8 quantization and reorder

6096ff8

working q8 reorder commit

acd80ec

restored common.hpp

03bd1a6

Merge remote-tracking branch 'origin/master' into ad/quantize_and_reorder_q8_1

7903264

remove debug prints

ade12bf

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels May 27, 2025

Alcpz reviewed May 29, 2025

ggml/src/ggml-sycl/mmvq.cpp (outdated, resolved)

ggml/src/ggml-sycl/ggml-sycl.cpp (resolved)

ggml/src/ggml-sycl/ggml-sycl.cpp (outdated, resolved)

ggml/src/ggml-sycl/vecdotq.hpp (resolved)

AD2605 and others added 2 commits May 29, 2025 10:46

remove unnecessary headers and remove trailing whitespace

79eede6

Update ggml/src/ggml-sycl/ggml-sycl.cpp

5f8bc74

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>

Alcpz approved these changes May 29, 2025

Rbiessy approved these changes May 30, 2025

Alcpz merged commit 663445b into ggml-org:master Jun 2, 2025
46 checks passed
