
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512. #138388

Open
maajidkhann wants to merge 15 commits into main from vec_backend_for_sve_128_512

maajidkhann
Contributor

@maajidkhann maajidkhann commented Oct 19, 2024

This is a follow-up PR that extends SVE VEC backend support to SVE128 and SVE512:

Main OSS PR: #119571 - Extending the PyTorch VEC backend for SVE (ARM) with SVE256

Features:
Adds support for the SVE ISA with vector lengths of 128 and 512 bits only.

  • This leverages the existing SVE code, since SVE is vector-length agnostic (VLA).

Updates the dispatch mechanism to pick the right SVE backend at runtime, depending on the hardware's vector length (VL).

  • SVE512, SVE256, SVE128, or DEFAULT (NEON)

Extends Inductor support to SVE512 and SVE128.
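
The runtime selection described in the feature list can be sketched roughly as follows. This is a minimal illustration in Python, not PyTorch's actual dispatcher (which lives in C++). The names `pick_backend` and `lanes_per_vector`, the string labels, and the choice to fall back to DEFAULT for any other vector length are all assumptions made for this sketch.

```python
# Hypothetical sketch: names and labels are illustrative, not PyTorch APIs.
SUPPORTED_SVE_BITS = (512, 256, 128)


def pick_backend(has_sve: bool, vl_bits: int) -> str:
    """Return a backend label for the given hardware vector length in bits."""
    if has_sve and vl_bits in SUPPORTED_SVE_BITS:
        return f"SVE{vl_bits}"
    # Assumption: no SVE, or an unsupported vector length, falls back to DEFAULT (NEON).
    return "DEFAULT"


def lanes_per_vector(vl_bits: int, element_bits: int = 32) -> int:
    """Number of elements of the given width that fit in one vector register."""
    return vl_bits // element_bits


for vl in (128, 256, 512):
    print(pick_backend(True, vl), "float32 lanes:", lanes_per_vector(vl))
```

Because SVE code is vector-length agnostic, the same kernel source can serve all three lengths; only the number of lanes processed per instruction changes.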

Verifying builds and tests:

  • SVE128 VL is now available on hardware such as Graviton4, Grace CPUs, etc.
  • SVE512 VL is now available on hardware such as Fugaku.

Perf Numbers:
Benchmarking Method: https://github.com/pytorch/benchmark
Metric used: Average Inference latency per iteration/request.
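
For readers unfamiliar with the metric, average inference latency per iteration can be measured along these lines. This is only a sketch under stated assumptions: it is not the harness used by pytorch/benchmark, the helper name `average_latency_ms` is hypothetical, and the workload is a stand-in for a model's forward pass.

```python
import time
from typing import Callable


def average_latency_ms(fn: Callable[[], object], iters: int = 100, warmup: int = 10) -> float:
    """Average wall-clock latency of fn, in milliseconds per iteration."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iters


# Stand-in workload; a real measurement would call the model's forward pass here.
latency = average_latency_ms(lambda: sum(range(1000)), iters=50, warmup=5)
print(f"{latency:.4f} ms per iteration")
```

Lower values are better, which matches the "Less is Better" convention used in the attached sheets.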

| Current Flow (Default/Neon) | SVE Vector Length | PR Link | PR Status | SVE Perf Gains |
|---|---|---|---|---|
| Neon 128 | SVE128 | https://github.com/pytorch/ /pull/138388 | Current PR | Refer to attachments |
| Neon 128 | SVE256 | https://github.com/pytorch/ /pull/119571 | Merged | Refer to attachments |
| Neon 128 | SVE512 | https://github.com/pytorch/ /pull/138388 | Current PR | Refer to attachments |

Perf Data Attachments:
1. Neon vs SVE128 - Compile Mode: HW: r8g.2xlarge (8 vCPUs)
pytorch_sve128_vs_neon_compile.xlsx

2. Neon vs SVE128 - Eager Mode: HW: r8g.2xlarge (8 vCPUs)
pytorch_sve128_vs_neon_eager.xlsx

3. Neon vs SVE256 - Compile Mode: HW: c7g.8xlarge (32 vCPUs)
pytorch_sve256_vs_neon_compile.xlsx

4. Neon vs SVE256 - Eager Mode: HW: c7g.8xlarge (32 vCPUs)
pytorch_sve256_vs_neon_eager.xlsx

5. Neon vs SVE512 - Compile Mode: HW: Fugaku CPU A64FX (48 vCPUs)
pytorch_sve512_vs_neon_compile.xlsx

6. Neon vs SVE512 - Eager Mode: HW: Fugaku CPU A64FX (48 vCPUs)
pytorch_sve512_vs_neon_eager.xlsx

Metric used in the sheets: (Less is Better)
Average Inference Latency (ms)

NOTE:
In the Excel sheets, rows are highlighted in one of three colors:

Green - SVE outperforms Neon
Yellow - SVE is on par with Neon
Red - Neon is slightly better than SVE

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @yf225 @ColinPeppler @desertfire @rec


pytorch-bot bot commented Oct 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138388

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5730c09 with merge base 91e7c79:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: cpu CPU specific problem (e.g., perf, algorithm) module: dynamo module: inductor release notes: sparse release notes category labels Oct 19, 2024
@maajidkhann
Contributor Author

@pytorchbot label "module: arm"

@pytorch-bot pytorch-bot bot added the module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 label Oct 19, 2024
@maajidkhann
Contributor Author

@pytorchbot label "ciflow/linux-aarch64"


pytorch-bot bot commented Oct 19, 2024

Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.

@maajidkhann
Contributor Author

CC @malfet @aditew01 @abhishek-iitmadras

@maajidkhann maajidkhann force-pushed the vec_backend_for_sve_128_512 branch from ae0e3fd to ce24714 Compare October 19, 2024 16:53
@abhishek-iitmadras
Collaborator

@pytorchbot label "ciflow/linux-aarch64"


pytorch-bot bot commented Oct 21, 2024

Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.

@maajidkhann
Contributor Author

@malfet Can you please add the ciflow/linux-aarch64 label and trigger the CI pipelines?

@digantdesai digantdesai added the ciflow/linux-aarch64 linux aarch64 CI workflow label Oct 23, 2024
@colesbury colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 23, 2024
Contributor

@digantdesai digantdesai left a comment


Took a quick look, seems good to me at a high level, left some nit comments.
I will defer to @malfet for approval.

- # If SVE256 support is not found, set CXX_SVE_FOUND to FALSE and notify the user
- if(NOT CXX_SVE256_FOUND)
+ # If SVE128 and SVE256 and SVE512 support is not found, set CXX_SVE_FOUND to FALSE and notify the user
+ if(NOT CXX_SVE128_FOUND AND NOT CXX_SVE256_FOUND AND NOT CXX_SVE512_FOUND)
      set(CXX_SVE_FOUND FALSE CACHE BOOL "SVE not available on host")
      message(STATUS "No SVE processor on this machine.")
Contributor

The host machine doesn't have to have SVE support, just the toolchain, right?

Contributor Author

Yes, that's right. You just need a toolchain (such as GCC or Clang) that has SVE support. Hardware doesn't need to have SVE support.

Contributor

So do you mind changing the message to "Current toolchain could not be used to generate SVE instructions"?
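
The toolchain-versus-hardware distinction raised in this thread can be illustrated with a small probe that only asks whether the compiler can build SVE code and never queries the host CPU. This is a hedged sketch, not the PR's CMake check: the helper names `sve_compile_command` and `toolchain_supports_sve` are hypothetical, and the flags shown (`-march=armv8-a+sve`, `-msve-vector-bits=N`) are the GCC/Clang options assumed here; the actual build scripts may use different ones.

```python
import os
import subprocess
import tempfile

# Tiny program that compiles only if the toolchain understands SVE intrinsics.
SVE_PROBE = """
#include <arm_sve.h>
int main() {
    svfloat32_t v = svdup_n_f32(0.0f);
    (void)v;
    return 0;
}
"""


def sve_compile_command(compiler: str, vl_bits: int, src: str, out: str) -> list:
    """Build a compile-only command targeting SVE with a fixed vector length."""
    return [compiler, "-march=armv8-a+sve", f"-msve-vector-bits={vl_bits}",
            "-c", src, "-o", out]


def toolchain_supports_sve(compiler: str, vl_bits: int) -> bool:
    """True if the compiler accepts the probe; the host CPU is never queried."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        with open(src, "w") as f:
            f.write(SVE_PROBE)
        cmd = sve_compile_command(compiler, vl_bits, src, os.path.join(tmp, "probe.o"))
        try:
            return subprocess.run(cmd, capture_output=True).returncode == 0
        except OSError:
            return False  # compiler not installed or not executable


for bits in (128, 256, 512):
    print(f"SVE{bits} toolchain support:", toolchain_supports_sve("c++", bits))
```

Since the probe is compiled but never executed, a failure here says something about the toolchain rather than the processor, which is the point behind the suggested message wording.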

@maajidkhann maajidkhann force-pushed the vec_backend_for_sve_128_512 branch from ce24714 to 697e057 Compare October 28, 2024 09:28
@maajidkhann
Contributor Author

PR #137426 ([PyTorch] Use 128-bit vectors for ARM64) was merged a few days ago,
so I have rebased this PR onto the latest main and resolved all the conflicts.

@maajidkhann maajidkhann force-pushed the vec_backend_for_sve_128_512 branch 2 times, most recently from cb02d8e to 93927e2 Compare October 30, 2024 07:59
@maajidkhann
Contributor Author

Hello @malfet. Can you please review and approve this PR? All the pipelines should pass, and all the conflicts have been resolved.

@maajidkhann maajidkhann force-pushed the vec_backend_for_sve_128_512 branch from 93927e2 to 5aab819 Compare November 6, 2024 12:18
@maajidkhann
Contributor Author

Issue #137775 is now closed with the fix in #137795.

The branch has been rebased onto the latest main again, and the conflicts are fixed.

@maajidkhann maajidkhann force-pushed the vec_backend_for_sve_128_512 branch 2 times, most recently from 000faa7 to 93d64cc Compare November 12, 2024 09:14
Nicoshev pushed a commit to Nicoshev/pytorch that referenced this pull request Jul 31, 2025
Summary: Importing pytorch#138388, as it provides a performance improvement over the NEON implementation

Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier

Differential Revision: D70788867

Reviewed By: r1mikey
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Jul 31, 2025
Summary:
Pull Request resolved: pytorch#158932

Importing pytorch#138388, as it provides a performance improvement over the NEON implementation

Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier

Reviewed By: r1mikey

Differential Revision: D70788867
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Aug 4, 2025
Summary:

Importing pytorch#138388, as it improves SVE support for perfkernels

Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier

Reviewed By: r1mikey

Differential Revision: D70788867
pytorch-bot bot pushed a commit that referenced this pull request Aug 6, 2025
Summary:
Pull Request resolved: #158932

Importing #138388, as it improves SVE support for perfkernels

Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier

Differential Revision: D70788867

Reviewed By: r1mikey
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Aug 11, 2025
Labels
ciflow/inductor ciflow/linux-aarch64 linux aarch64 CI workflow module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 module: cpu CPU specific problem (e.g., perf, algorithm) module: dynamo module: inductor no-stale open source release notes: sparse release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module