Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512. #138388
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138388
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 5730c09 with merge base 91e7c79.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "module: arm"
@pytorchbot label "ciflow/linux-aarch64"
Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.
Force-pushed from ae0e3fd to ce24714 (compare)
@pytorchbot label "ciflow/linux-aarch64"
Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.
@malfet Can you please add the ciflow/linux-aarch64 tag and trigger the CI pipelines?
Took a quick look, seems good to me at a high level, left some nit comments.
I will defer to @malfet for approval.
cmake/Modules/FindARM.cmake (outdated)
-  # If SVE256 support is not found, set CXX_SVE_FOUND to FALSE and notify the user
-  if(NOT CXX_SVE256_FOUND)
+  # If SVE128 and SVE256 and SVE512 support is not found, set CXX_SVE_FOUND to FALSE and notify the user
+  if(NOT CXX_SVE128_FOUND AND NOT CXX_SVE256_FOUND AND NOT CXX_SVE512_FOUND)
     set(CXX_SVE_FOUND FALSE CACHE BOOL "SVE not available on host")
     message(STATUS "No SVE processor on this machine.")
The host machine doesn't have to have SVE support, just the toolchain, right?
Yes, that's right. You just need a toolchain (such as GCC or Clang) that has SVE support. Hardware doesn't need to have SVE support.
So do you mind changing the message to "Current toolchain could not be used to generate SVE instructions"?
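For context, a check like the one in FindARM.cmake typically just tries to compile a tiny SVE translation unit with the candidate flags (e.g. -march=armv8-a+sve together with -msve-vector-bits=128/256/512), so it only exercises the toolchain, not the build host. Below is a minimal, illustrative sketch of such a probe program; it is not the exact source FindARM.cmake feeds to the compiler.

```cpp
// Minimal SVE probe: if the toolchain ships arm_sve.h and understands the SVE
// intrinsics, this compiles; no SVE hardware is needed at configure/build time.
// (Illustrative sketch only; the real FindARM.cmake test program may differ.)
#include <arm_sve.h>

int main() {
  svbool_t pg = svptrue_b32();        // all-true predicate for 32-bit lanes
  svfloat32_t v = svdup_n_f32(2.0f);  // broadcast 2.0f across the SVE vector
  float sum = svaddv_f32(pg, v);      // horizontal add over the active lanes
  return sum > 0.0f ? 0 : 1;
}
```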
Force-pushed from ce24714 to 697e057 (compare)
The "[PyTorch] Use 128-bit vectors for ARM64" PR (#137426) got merged a few days back.
Force-pushed from cb02d8e to 93927e2 (compare)
Hello @malfet, can you please look into this PR and approve it? All the pipelines should pass, and all the conflicts have been taken care of.
Force-pushed from 93927e2 to 5aab819 (compare)
Force-pushed from 000faa7 to 93d64cc (compare)
Summary: Importing #138388, as it provides a performance improvement over the NEON implementation
Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier
Reviewed By: r1mikey
Differential Revision: D70788867
Summary: Pull Request resolved: pytorch#158932
Importing pytorch#138388, as it improves SVE support for perfkernels
Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier
Reviewed By: r1mikey
Differential Revision: D70788867
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
This is a follow-up PR that extends the SVE VEC backend support in PyTorch to SVE128 and SVE512:
Main OSS PR: #119571 - Extending the PyTorch VEC backend for SVE (ARM) with SVE256
Features:
Adds support for the SVE ISA at vector lengths of 128 and 512 bits.
Updates the dispatch mechanism to pick the right SVE backend at runtime, based on the hardware's vector length (VL); see the sketch after this list.
Extends Inductor support to SVE128 and SVE512.
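As a rough illustration of the runtime selection described above (this is not PyTorch's actual dispatcher, and the kernel names below are hypothetical), the hardware vector length can be read with the ACLE intrinsic svcntb() and mapped to the backend built for that width:

```cpp
// Sketch of VL-based dispatch between SVE backends. Assumes SVE availability
// has already been established (e.g. via getauxval(AT_HWCAP) & HWCAP_SVE),
// since svcntb() itself is an SVE instruction. Kernels are placeholders.
#include <arm_sve.h>
#include <cstdint>
#include <cstdio>

static void kernel_sve128() { std::puts("using SVE128 kernels"); }
static void kernel_sve256() { std::puts("using SVE256 kernels"); }
static void kernel_sve512() { std::puts("using SVE512 kernels"); }
static void kernel_neon()   { std::puts("falling back to NEON kernels"); }

void dispatch_by_vector_length() {
  const std::uint64_t vl_bytes = svcntb();  // SVE vector length in bytes
  switch (vl_bytes) {
    case 16: kernel_sve128(); break;  // 128-bit VL (e.g. Graviton4, Grace)
    case 32: kernel_sve256(); break;  // 256-bit VL (e.g. Graviton3)
    case 64: kernel_sve512(); break;  // 512-bit VL (e.g. Fugaku A64FX)
    default: kernel_neon();   break;  // any other VL: fall back to NEON
  }
}
```

In PyTorch the selection happens through its CPU-capability dispatch rather than per call; the switch above only illustrates the mapping from VL to backend.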
Verifying builds and tests:
* SVE128 VL is available on hardware such as Graviton4 and Grace CPUs.
* SVE512 VL is available on hardware such as Fugaku (A64FX).
Perf Numbers:
Benchmarking Method: https://github.com/pytorch/benchmark
Metric used: average inference latency per iteration/request.
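For clarity on this metric: "average inference latency per iteration/request" is the wall-clock time of a forward pass averaged over many timed iterations after warm-up. The snippet below is a generic, self-contained sketch of that measurement (run_inference() is a dummy stand-in for the model's forward pass; pytorch/benchmark's own harness implements this differently):

```cpp
// Generic sketch of "average inference latency per iteration" with warm-up.
// run_inference() is a dummy workload standing in for one forward pass.
#include <chrono>
#include <cstdio>
#include <vector>

static void run_inference() {
  static std::vector<float> buf(1 << 16, 1.0f);
  volatile float acc = 0.0f;
  for (float x : buf) acc = acc + x;  // placeholder work, not a real model
}

static double average_latency_ms(int warmup_iters, int timed_iters) {
  for (int i = 0; i < warmup_iters; ++i) run_inference();  // warm-up excluded

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < timed_iters; ++i) run_inference();
  const auto end = std::chrono::steady_clock::now();

  const double total_ms =
      std::chrono::duration<double, std::milli>(end - start).count();
  return total_ms / timed_iters;  // average per-iteration latency in ms
}

int main() {
  std::printf("average latency: %.3f ms\n", average_latency_ms(10, 100));
  return 0;
}
```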
Perf Data Attachments:
1. Neon vs SVE128 - Compile Mode; HW: r8g.2xlarge (8 vCPUs)
pytorch_sve128_vs_neon_compile.xlsx
2. Neon vs SVE128 - Eager Mode; HW: r8g.2xlarge (8 vCPUs)
pytorch_sve128_vs_neon_eager.xlsx
3. Neon vs SVE256 - Compile Mode; HW: c7g.8xlarge (32 vCPUs)
pytorch_sve256_vs_neon_compile.xlsx
4. Neon vs SVE256 - Eager Mode; HW: c7g.8xlarge (32 vCPUs)
pytorch_sve256_vs_neon_eager.xlsx
5. Neon vs SVE512 - Compile Mode; HW: Fugaku A64FX (48 vCPUs)
pytorch_sve512_vs_neon_compile.xlsx
6. Neon vs SVE512 - Eager Mode; HW: Fugaku A64FX (48 vCPUs)
pytorch_sve512_vs_neon_eager.xlsx
Metric used in the sheets: Average Inference Latency (ms); lower is better.
NOTE:
In the Excel sheets, the highlighted rows use three colors:
Green - SVE outperforms Neon
Yellow - SVE is on par with Neon
Red - Neon is slightly better than SVE
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @yf225 @ColinPeppler @desertfire @rec