[ZENDNN] Integrate ZenDNN library, implement Linear op, add unit-tests #156599
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/156599.
Note: links to docs will display an error until the doc builds have completed. ✅ No failures as of commit 0ef40c6 with merge base e273ff0. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Why does this need to be in core? torch.aten.ops.zendnn_linear can very easily be added as an extension.
#include <ATen/native/zendnn/Linear_utils.hpp>
#if !AT_ZENDNN_ENABLED()
namespace at::native {
at::Tensor zendnn_linear(const at::Tensor &input, const at::Tensor &weight,
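The snippet above is the entry point for the fallback path when ZenDNN is disabled. Semantically, a linear op computes `y = x @ w.T + b`. As a framework-free illustration of that contract (a pure-Python sketch with invented names, not the PR's kernel):

```python
def linear_reference(x, w, b=None):
    """Reference semantics of a linear op: y = x @ w.T + b.

    x: list of input rows (m x k), w: list of weight rows (n x k),
    b: optional bias of length n. Pure-Python stand-in used only to
    illustrate what the ZenDNN-backed kernel must compute.
    """
    k, n = len(w[0]), len(w)
    return [[sum(xi[t] * w[j][t] for t in range(k)) + (b[j] if b else 0.0)
             for j in range(n)] for xi in x]

out = linear_reference([[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]], [0.5, -0.5])
assert out == [[11.5, 16.5]]
```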
Can you please point me to where the weight is prepacked? If it is done in ZenDNN, I would suggest we expose the API to zentorch as well, and then have an API to unpack.
Once we have pack/unpack APIs for the weight, please add the unit tests in this PR as well.
We are going to add a linear weight prepack API in the revised PR and will add unit tests for it. We will get back to you on the unpack op.
We already support weight prepacking for quantized weights in zentorch. Please refer to the links provided for more information.
cc: @amukho
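The pack/unpack pair the reviewer asks for must be mutual inverses. The sketch below shows the general idea with a toy blocked layout — pad the output dimension to a block multiple and split rows into blocks, with unpack inverting that. All names are hypothetical; real ZenDNN blocked formats are considerably more involved (cache-line and ISA aware):

```python
def prepack(w, block=4):
    """Toy blocked prepack: pad the output dim (rows of w) to a
    multiple of `block`, then split rows into blocks. Illustrative
    only, not ZenDNN's actual layout."""
    n, k = len(w), len(w[0])
    pad = (-n) % block
    padded = w + [[0.0] * k for _ in range(pad)]
    return [padded[i:i + block] for i in range(0, n + pad, block)], n

def unpack(packed, n):
    """Invert prepack: flatten the blocks and drop the padding rows."""
    flat = [row for blk in packed for row in blk]
    return flat[:n]

w = [[float(i + j) for j in range(3)] for i in range(5)]
packed, n = prepack(w, block=4)
assert unpack(packed, n) == w  # round trip recovers the original weight
```

A unit test for the real ops would follow the same shape: pack, unpack, and compare against the original tensor.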
Force-pushed from b7eed5e to d84394b
Just to make sure you're not spending too much time on a PR that is not going to be merged.
I've created this PR to demonstrate potential perf improvements with ZenDNN linear ops on EPYC servers. As suggested, I'll split it into smaller, self-contained PRs. Note that the performance improvements depend on fusions, so gains will become evident only after the first few PRs are merged.
Force-pushed from d84394b to 779b34b
With this POC PR, the impact on binary size is as below.
FP32 perf data on Genoa machines is as below.
Will generate and add BF16 data too, and will also work to reproduce the numbers on the inductor perf dashboard.
- Add ZenDNN as a third-party library and link it to ATen
- Add ZenDNN (caffe2::zendnn) as a caffe2 dependency
- Introduce the USE_ZENDNN CMake/env option: default OFF, user-enableable, valid only on x86_64 hosts
- Provide torch._C.has_zendnn() to query availability from Python
- Define the AT_ZENDNN_ENABLED macro to gate C/C++ code
- Add ZenDNN as a dependency of the ATen library
- Add ZenDNN as a submodule
- Extend the CMake summary to report ZenDNN build status
- Add the USE_ZENDNN setting and its status to the build settings string in __config__.show()
- Add variable substitutions for the Bazel build
- Mark ZenDNN inductor perf builds with the "zen" keyword; USE_ZENDNN is set to 1 for *zen* build environments

Co-authored-by: Dinesh Mareedu <Dinesh.Mareedu@amd.com>
Co-authored-by: Aakar Dwivedi <aakar.dwivedi@amd.com>
Change-Id: I8fdd93e11384d3550557f163faf34a7b8f18a6a9
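The gating described in this commit — an opt-in flag that only takes effect on x86_64 hosts — can be sketched as plain Python logic (hypothetical helper, not the actual CMake/setup.py code; the env-var spellings accepted here are an assumption):

```python
import os
import platform

def zendnn_requested(environ=os.environ, machine=platform.machine):
    """Mirror the build-time gate described above: USE_ZENDNN defaults
    to off, and even when set it is honored only on x86_64 hosts.
    `environ` and `machine` are injectable for testing."""
    enabled = environ.get("USE_ZENDNN", "0") in ("1", "ON", "on")
    return enabled and machine() == "x86_64"

assert zendnn_requested(environ={"USE_ZENDNN": "1"}, machine=lambda: "x86_64")
assert not zendnn_requested(environ={}, machine=lambda: "x86_64")
assert not zendnn_requested(environ={"USE_ZENDNN": "1"}, machine=lambda: "aarch64")
```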
- Implement the linear op

Co-authored-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Change-Id: Ie62ea7c102ec7d48280f8858695a92137ebca04c
Change-Id: Iedd2988244b35d44f596dd427f2a5378154143ee
- Implement the zendnn_weight_prepack_for_linear op to prepack weights into ZenDNN's optimized blocked format
- Update the zendnn_linear op to support prepacked weights
- Add unit tests to validate weight prepacking with zendnn_linear

Change-Id: Ie1e2a2bb3561eb5f8f4dc64431e2c7a9ab2d434d
- Register zendnn_linear and weight prepack support in native_functions.yaml
- Add fake tensor support via meta registrations
- Add a shim file for AOT-Inductor support

Change-Id: I8ecc4685d666dc7ff4364d13ed8dd48d02d98afe
- Add required infra
- Add optimize in joint_graph_passes

Co-authored-by: Dinesh Mareedu <Dinesh.Mareedu@amd.com>
Co-authored-by: Charan Ponnada <charan.ponnada@amd.com>
Change-Id: I371d50ca958e75bb048b60ab7521affa39617ae6
- Add an AMD Zen4 detection function
- Add a Python binding for is_amd_zen4()
- Check ZenDNN availability (torch._C.has_zendnn), user configuration (the USE_ZENDNN env var), and CPU compatibility (torch._C._cpu._is_amd_zen4_or_newer())

Change-Id: Ia57243efdd92a4966b38ae2de715d45e87b9579f
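The runtime dispatch decision in this commit is a conjunction of three independent checks. A minimal sketch of that combination (predicates injected so the example stays framework-free; the function name is invented):

```python
def zendnn_usable(has_zendnn, env_enabled, is_amd_zen4_or_newer):
    """Combine the three conditions listed above: build support
    (torch._C.has_zendnn), user opt-in (the USE_ZENDNN env var), and
    CPU compatibility (_is_amd_zen4_or_newer). All three must hold
    for the ZenDNN path to be taken."""
    return has_zendnn() and env_enabled() and is_amd_zen4_or_newer()

# ZenDNN compiled in and CPU is Zen4+, but the user did not opt in:
assert not zendnn_usable(lambda: True, lambda: False, lambda: True)
# All three conditions hold:
assert zendnn_usable(lambda: True, lambda: True, lambda: True)
```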
- Register graph replacement patterns to insert the weight prepack op into the AOT-Inductor graph
- Enable the weight prepack optimization through ZenDNN's optimize API when the inductor weight_prepack config is True on inductor's freezing path
- Add tests to validate accuracy when the prepack op is inserted, and to validate that the weight prepack op is inserted into the graph

Change-Id: Ib4722e816cf19d6dadddc0a060067eca698d1063
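The pattern-replacement pass described above can be illustrated on a toy graph representation: every linear node is rewritten into a prepack node plus a linear node that consumes the packed weight. Node shapes and op names here are invented for illustration; the real pass operates on FX graphs:

```python
def insert_weight_prepack(graph):
    """Toy graph pass in the spirit of the replacement pattern above:
    ("linear", x, w) becomes ("weight_prepack", w, w_packed) followed
    by ("linear_prepacked", x, w_packed). Other nodes pass through."""
    out = []
    for node in graph:
        if node[0] == "linear":
            _, x, w = node
            out.append(("weight_prepack", w, w + "_packed"))
            out.append(("linear_prepacked", x, w + "_packed"))
        else:
            out.append(node)
    return out

g = [("relu", "a"), ("linear", "a", "w0")]
assert insert_weight_prepack(g) == [
    ("relu", "a"),
    ("weight_prepack", "w0", "w0_packed"),
    ("linear_prepacked", "a", "w0_packed"),
]
```

The accompanying tests then check both numerics (accuracy unchanged) and structure (the prepack op actually appears in the transformed graph), matching the two validations listed in the commit.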
Change-Id: If9cc90f54cb584430fc1d740594bb7292092fc97
- Enable linear unary and binary fusions
- Implement the linear_unary_binary fusion op

Change-Id: I5439323196a71fe0adc5eba22440456f9d688829
- Add shim and meta functions for post-op and unary/binary fusions

Change-Id: I517963d8021890be50e47abdd041bd67f6ccd785
- Add relu, silu, gelu, tanh, and sigmoid fusions with zendnn_linear

Co-authored-by: Priyansh Jain <priyansh.jain2@amd.com>
Change-Id: I49c735b7e848f1c2f04d0dbe85b9a93e29a01c52
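What a unary post-op fusion buys is that the intermediate `y = x @ w.T + b` is never materialized before the activation is applied. A pure-Python sketch of a fused linear+relu (illustrative names, not the ZenDNN kernel):

```python
def linear_relu_fused(x, w, b):
    """One-pass linear + relu: each output element is clamped at zero
    as it is produced, so the pre-activation matrix never exists as a
    separate buffer. Pure-Python sketch of the fusion's semantics."""
    k, n = len(w[0]), len(w)
    return [[max(0.0, sum(xi[t] * w[j][t] for t in range(k)) + b[j])
             for j in range(n)] for xi in x]

out = linear_relu_fused([[1.0, -2.0]], [[1.0, 1.0], [2.0, 0.0]], [0.0, 0.0])
# x.w0 = 1 - 2 = -1 -> 0 after relu; x.w1 = 2 -> 2
assert out == [[0.0, 2.0]]
```

The other unary fusions (silu, gelu, tanh, sigmoid) follow the same shape, with the activation applied in the same pass.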
- Add tests for unary and binary fusions with zendnn_linear
- Add tests for the export path
- Remove unnecessary comments from test_zendnn_linear.py and test_zendnn_linear_fusions.py

Change-Id: Ide4e649a8fc510ea567f07b45823cb21e921fa6d
Force-pushed from 779b34b to 5c1f88b
Hi @malfet, can you please add the ciflow/inductor-perf-test-nightly-x86-zen label to this PR to enable triggering the inductor perf dashboard upload with the changes in this PR?
Updated ZenDNN from the work week 28 commit to the work week 32 commit

Change-Id: Ia91e671a9ba017d03060c493f7f24020d672dde6
🚀 [ZENDNN] Integrate and Optimize zendnn_linear with Fusion and Prepack Support
📌 Summary
This PR introduces the zendnn_linear operator into PyTorch, with full support for fusions and weight prepacking:
🔧 Key Features
- Implemented the zendnn_linear op
  - Registered in PyTorch with meta and shim support
  - Integrated into native_functions.yaml and the AOT-Inductor backend
- Enabled unary and binary post-op fusions
  - Implemented the linear_unary_binary fusion op
  - Added meta and shim functions for fusion support
- Introduced the zendnn_weight_prepack_for_linear op
  - Added a graph pass to insert the prepack op into the AOT-Inductor graph
  - Enabled via the zendnn.optimize() API when weight_prepack=True
  - Validated via unit tests and graph inspection
- Added unit tests for zendnn_linear and its fusions
  - Model-level tests for export and accuracy validation
  - Cleaned up redundant comments in test files
- Added ZenDNN as an optional third-party library
  - Introduced the USE_ZENDNN CMake flag
  - Added the Python API torch._C.has_zendnn() for runtime checks
  - Added Zen4 CPU detection and compatibility validation
👥 Authors and Contributors
Naveen Kumar T — NAVEEN.THANGUDU@amd.com
Ankit Jaiswal — ankit.jaiswal@amd.com
Mrigank Srivastava — Mrigank.Srivastava@amd.com
Priyansh Jain — priyansh.jain2@amd.com
Dinesh Mareedu — dinesh.mareedu@amd.com
Harshal Adhav — harshal.adhav@amd.com
Charan Ponnada — charan.ponnada@amd.com
Chinmay Kulkarni — Chinmay.Kulkarni@amd.com
Aakar Dwivedi — aakar.dwivedi@amd.com
RFC
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela