[ZENDNN] Integrate ZenDNN library, implement Linear op, add unit-tests #156599
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/156599.
Note: links to docs will display an error until the doc builds have completed. ✅ No failures as of commit 0ef40c6 with merge base e273ff0. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Why does this need to be in core? torch.aten.ops.zendnn_linear can very easily be added as an extension.
#include <ATen/native/zendnn/Linear_utils.hpp>
#if !AT_ZENDNN_ENABLED()
namespace at::native {
at::Tensor zendnn_linear(const at::Tensor &input, const at::Tensor &weight,
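The snippet above is the entry point for the fallback path when ZenDNN is disabled. Semantically, a linear op computes `y = x @ w.T + b`. As a framework-free illustration of that contract (a pure-Python sketch with invented names, not the PR's kernel):

```python
def linear_reference(x, w, b=None):
    """Reference semantics of a linear op: y = x @ w.T + b.

    x: list of input rows (m x k), w: list of weight rows (n x k),
    b: optional bias of length n. Pure-Python stand-in used only to
    illustrate what the ZenDNN-backed kernel must compute.
    """
    k, n = len(w[0]), len(w)
    return [[sum(xi[t] * w[j][t] for t in range(k)) + (b[j] if b else 0.0)
             for j in range(n)] for xi in x]

out = linear_reference([[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]], [0.5, -0.5])
assert out == [[11.5, 16.5]]
```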
Can you please point me to where the weight is prepacked? If it is done in ZenDNN, I would suggest we expose the API to zentorch as well, and then have an API to unpack.
Once we have pack/unpack APIs for the weight, please add the unit tests in this PR as well.
We are going to add a linear weight prepack API in the revised PR and will add unit tests for it. We will get back to you on the unpack op.
We already support weight prepacking for quantized weights in zentorch. Please refer to the links provided for more information.
cc: @amukho
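The pack/unpack pair the reviewer asks for must be mutual inverses. The sketch below shows the general idea with a toy blocked layout — pad the output dimension to a block multiple and split rows into blocks, with unpack inverting that. All names are hypothetical; real ZenDNN blocked formats are considerably more involved (cache-line and ISA aware):

```python
def prepack(w, block=4):
    """Toy blocked prepack: pad the output dim (rows of w) to a
    multiple of `block`, then split rows into blocks. Illustrative
    only, not ZenDNN's actual layout."""
    n, k = len(w), len(w[0])
    pad = (-n) % block
    padded = w + [[0.0] * k for _ in range(pad)]
    return [padded[i:i + block] for i in range(0, n + pad, block)], n

def unpack(packed, n):
    """Invert prepack: flatten the blocks and drop the padding rows."""
    flat = [row for blk in packed for row in blk]
    return flat[:n]

w = [[float(i + j) for j in range(3)] for i in range(5)]
packed, n = prepack(w, block=4)
assert unpack(packed, n) == w  # round trip recovers the original weight
```

A unit test for the real ops would follow the same shape: pack, unpack, and compare against the original tensor.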
Force-pushed from b7eed5e to d84394b
Just to make sure you're not spending too much time on a PR that is not going to be merged.
I've created this PR to demonstrate potential perf improvements with ZenDNN linear ops on EPYC servers. As suggested, I'll split it into smaller, self-contained PRs. Note that the performance improvements depend on fusions, so gains will become evident only after the first few PRs are merged.
Force-pushed from d84394b to 779b34b
With this POC PR, the impact on binary size is as below.
FP32 perf data on Genoa machines is as below.
Will generate and add BF16 data too, and will also work to reproduce the numbers on the inductor perf dashboard.
- Add ZenDNN as a third-party library and link it to ATen
- Add ZenDNN (caffe2::zendnn) as a caffe2 dependency
- Introduce the USE_ZENDNN CMake/env option: default OFF, user-enableable, valid only on x86_64 hosts
- Provide torch._C.has_zendnn() to query availability from Python
- Define the AT_ZENDNN_ENABLED macro to gate C/C++ code
- Add ZenDNN as a dependency of the ATen library
- Add ZenDNN as a submodule
- Extend the CMake summary to report ZenDNN build status
- Add the USE_ZENDNN setting and its status to the build settings string in __config__.show()
- Add variable substitutions for the Bazel build
- Mark ZenDNN inductor perf builds with the "zen" keyword; USE_ZENDNN is set to 1 for *zen* build environments

Co-authored-by: Dinesh Mareedu <Dinesh.Mareedu@amd.com>
Co-authored-by: Aakar Dwivedi <aakar.dwivedi@amd.com>
Change-Id: I8fdd93e11384d3550557f163faf34a7b8f18a6a9
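The gating described in this commit — an opt-in flag that only takes effect on x86_64 hosts — can be sketched as plain Python logic (hypothetical helper, not the actual CMake/setup.py code; the env-var spellings accepted here are an assumption):

```python
import os
import platform

def zendnn_requested(environ=os.environ, machine=platform.machine):
    """Mirror the build-time gate described above: USE_ZENDNN defaults
    to off, and even when set it is honored only on x86_64 hosts.
    `environ` and `machine` are injectable for testing."""
    enabled = environ.get("USE_ZENDNN", "0") in ("1", "ON", "on")
    return enabled and machine() == "x86_64"

assert zendnn_requested(environ={"USE_ZENDNN": "1"}, machine=lambda: "x86_64")
assert not zendnn_requested(environ={}, machine=lambda: "x86_64")
assert not zendnn_requested(environ={"USE_ZENDNN": "1"}, machine=lambda: "aarch64")
```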
- Implement the linear op

Co-authored-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Change-Id: Ie62ea7c102ec7d48280f8858695a92137ebca04c
Change-Id: Iedd2988244b35d44f596dd427f2a5378154143ee
- Implement the zendnn_weight_prepack_for_linear op to prepack weights into ZenDNN's optimized blocked format
- Update the zendnn_linear op to support prepacked weights
- Add unit tests to validate weight prepacking with zendnn_linear

Change-Id: Ie1e2a2bb3561eb5f8f4dc64431e2c7a9ab2d434d
- Register zendnn_linear and weight prepack support in native_functions.yaml
- Add fake tensor support via meta registrations
- Add a shim file for AOT-Inductor support

Change-Id: I8ecc4685d666dc7ff4364d13ed8dd48d02d98afe
- Add required infra
- Add optimize in joint_graph_passes

Co-authored-by: Dinesh Mareedu <Dinesh.Mareedu@amd.com>
Co-authored-by: Charan Ponnada <charan.ponnada@amd.com>
Change-Id: I371d50ca958e75bb048b60ab7521affa39617ae6
- Add an AMD Zen4 detection function
- Add a Python binding for is_amd_zen4()
- Check ZenDNN availability (torch._C.has_zendnn), user configuration (the USE_ZENDNN env var), and CPU compatibility (torch._C._cpu._is_amd_zen4_or_newer())

Change-Id: Ia57243efdd92a4966b38ae2de715d45e87b9579f
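The runtime dispatch decision in this commit is a conjunction of three independent checks. A minimal sketch of that combination (predicates injected so the example stays framework-free; the function name is invented):

```python
def zendnn_usable(has_zendnn, env_enabled, is_amd_zen4_or_newer):
    """Combine the three conditions listed above: build support
    (torch._C.has_zendnn), user opt-in (the USE_ZENDNN env var), and
    CPU compatibility (_is_amd_zen4_or_newer). All three must hold
    for the ZenDNN path to be taken."""
    return has_zendnn() and env_enabled() and is_amd_zen4_or_newer()

# ZenDNN compiled in and CPU is Zen4+, but the user did not opt in:
assert not zendnn_usable(lambda: True, lambda: False, lambda: True)
# All three conditions hold:
assert zendnn_usable(lambda: True, lambda: True, lambda: True)
```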
- Register graph replacement patterns to insert the weight prepack op into the AOT-Inductor graph
- Enable the weight prepack optimization through ZenDNN's optimize API when the inductor weight_prepack config is True on inductor's freezing path
- Add tests to validate accuracy when the prepack op is inserted, and to validate that the weight prepack op is inserted into the graph

Change-Id: Ib4722e816cf19d6dadddc0a060067eca698d1063
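The pattern-replacement pass described above can be illustrated on a toy graph representation: every linear node is rewritten into a prepack node plus a linear node that consumes the packed weight. Node shapes and op names here are invented for illustration; the real pass operates on FX graphs:

```python
def insert_weight_prepack(graph):
    """Toy graph pass in the spirit of the replacement pattern above:
    ("linear", x, w) becomes ("weight_prepack", w, w_packed) followed
    by ("linear_prepacked", x, w_packed). Other nodes pass through."""
    out = []
    for node in graph:
        if node[0] == "linear":
            _, x, w = node
            out.append(("weight_prepack", w, w + "_packed"))
            out.append(("linear_prepacked", x, w + "_packed"))
        else:
            out.append(node)
    return out

g = [("relu", "a"), ("linear", "a", "w0")]
assert insert_weight_prepack(g) == [
    ("relu", "a"),
    ("weight_prepack", "w0", "w0_packed"),
    ("linear_prepacked", "a", "w0_packed"),
]
```

The accompanying tests then check both numerics (accuracy unchanged) and structure (the prepack op actually appears in the transformed graph), matching the two validations listed in the commit.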
Change-Id: If9cc90f54cb584430fc1d740594bb7292092fc97
- Enable linear unary and binary fusions
- Implement the linear_unary_binary fusion op

Change-Id: I5439323196a71fe0adc5eba22440456f9d688829
- Add shim and meta functions for post-op and unary/binary fusions

Change-Id: I517963d8021890be50e47abdd041bd67f6ccd785
- Add relu, silu, gelu, tanh, and sigmoid fusions with zendnn_linear

Co-authored-by: Priyansh Jain <priyansh.jain2@amd.com>
Change-Id: I49c735b7e848f1c2f04d0dbe85b9a93e29a01c52
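What a unary post-op fusion buys is that the intermediate `y = x @ w.T + b` is never materialized before the activation is applied. A pure-Python sketch of a fused linear+relu (illustrative names, not the ZenDNN kernel):

```python
def linear_relu_fused(x, w, b):
    """One-pass linear + relu: each output element is clamped at zero
    as it is produced, so the pre-activation matrix never exists as a
    separate buffer. Pure-Python sketch of the fusion's semantics."""
    k, n = len(w[0]), len(w)
    return [[max(0.0, sum(xi[t] * w[j][t] for t in range(k)) + b[j])
             for j in range(n)] for xi in x]

out = linear_relu_fused([[1.0, -2.0]], [[1.0, 1.0], [2.0, 0.0]], [0.0, 0.0])
# x.w0 = 1 - 2 = -1 -> 0 after relu; x.w1 = 2 -> 2
assert out == [[0.0, 2.0]]
```

The other unary fusions (silu, gelu, tanh, sigmoid) follow the same shape, with the activation applied in the same pass.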
- Add tests for unary and binary fusions with zendnn_linear
- Add tests for the export path
- Remove unnecessary comments from test_zendnn_linear.py and test_zendnn_linear_fusions.py

Change-Id: Ide4e649a8fc510ea567f07b45823cb21e921fa6d
Force-pushed from 779b34b to 5c1f88b
Hi @malfet, can you please add the ciflow/inductor-perf-test-nightly-x86-zen label to this PR to enable triggering the inductor perf dashboard upload with the changes in this PR?
Updated ZenDNN from the work week 28 commit to the work week 32 commit

Change-Id: Ia91e671a9ba017d03060c493f7f24020d672dde6
🚀 [ZENDNN] Integrate and Optimize zendnn_linear with Fusion and Prepack Support
📌 Summary
This PR introduces the zendnn_linear operator into PyTorch, with full support for fusions and weight prepacking:
🔧 Key Features
- Implemented the zendnn_linear op
  - Registered in PyTorch with meta and shim support
  - Integrated into native_functions.yaml and the AOT-Inductor backend
- Enabled unary and binary post-op fusions
  - Implemented the linear_unary_binary fusion op
  - Added meta and shim functions for fusion support
- Introduced the zendnn_weight_prepack_for_linear op
  - Added a graph pass to insert the prepack op into the AOT-Inductor graph
  - Enabled via the zendnn.optimize() API when weight_prepack=True
  - Validated via unit tests and graph inspection
- Added unit tests for zendnn_linear and its fusions
  - Model-level tests for export and accuracy validation
  - Cleaned up redundant comments in test files
- Added ZenDNN as an optional third-party library
  - Introduced the USE_ZENDNN CMake flag
  - Added the Python API torch._C.has_zendnn() for runtime checks
  - Added Zen4 CPU detection and compatibility validation
👥 Authors and Contributors
Naveen Kumar T — NAVEEN.THANGUDU@amd.com
Ankit Jaiswal — ankit.jaiswal@amd.com
Mrigank Srivastava — Mrigank.Srivastava@amd.com
Priyansh Jain — priyansh.jain2@amd.com
Dinesh Mareedu — dinesh.mareedu@amd.com
Harshal Adhav — harshal.adhav@amd.com
Charan Ponnada — charan.ponnada@amd.com
Chinmay Kulkarni — Chinmay.Kulkarni@amd.com
Aakar Dwivedi — aakar.dwivedi@amd.com
RFC
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela