Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512. #138388

maajidkhann · 2024-10-19T14:33:46Z

This is a follow-up PR that extends SVE VEC backend support to SVE128 and SVE512.

Main OSS PR: #119571 - Extending the PyTorch VEC backend for SVE (ARM) with SVE256

Features:
Adds support for the SVE ISA with vector lengths of 128 and 512 bits.

This reuses the existing SVE code, since SVE is vector-length agnostic (VLA).

Updates the dispatch mechanism to pick the right SVE backend at runtime, depending on the hardware's vector length (VL):

SVE512, SVE256, SVE128 or DEFAULT (NEON)

Extends Inductor support to SVE512 and SVE128.

Inductor support for the SVE256 path was already introduced and merged in PR #134672 (Extend vectorization with SVE(ARM) with Torch Compile (Inductor)).
Adds a Python API to detect the maximum SVE vector length.
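The runtime selection described above can be modeled roughly as follows. This is a simplified Python sketch, not the PR's C++ dispatch code: `select_vec_backend`, `lanes_per_vector` and `SVE_BACKENDS` are hypothetical names, and the VL is passed in as a number of bits instead of being queried from the hardware (the Python API this PR adds is not reproduced here).

```python
# Simplified, hypothetical model of picking a VEC backend from the SVE vector
# length (VL). PyTorch's real dispatch is C++; these names are illustrative.

# Backends with a dedicated build, keyed by SVE vector length in bits.
SVE_BACKENDS = {512: "SVE512", 256: "SVE256", 128: "SVE128"}


def select_vec_backend(sve_vl_bits: int) -> str:
    """Return the backend name for a given SVE VL in bits.

    0 models a CPU without SVE. In this sketch any VL without a dedicated
    backend (e.g. 0 or 1024) falls back to DEFAULT, i.e. the NEON path.
    """
    return SVE_BACKENDS.get(sve_vl_bits, "DEFAULT")


def lanes_per_vector(sve_vl_bits: int, element_bits: int) -> int:
    """Number of elements one SVE register holds for a given element width."""
    if element_bits <= 0 or sve_vl_bits % element_bits != 0:
        raise ValueError("VL must be a positive multiple of the element width")
    return sve_vl_bits // element_bits


if __name__ == "__main__":
    for vl in (512, 256, 128, 0):
        print(vl, select_vec_backend(vl))
```

Only the lane count changes with VL: for 32-bit floats, a 512-bit VL gives 16 lanes and a 128-bit VL gives 4. This is why, under the assumptions above, the same SVE source can back every backend.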

Verifying builds and tests:
* SVE128 VL is available on hardware such as Graviton4 and Grace CPUs.
* SVE512 VL is available on hardware such as Fugaku (A64FX).

Perf Numbers:
Benchmarking method: https://github.com/pytorch/benchmark
Metric used: average inference latency per iteration/request.

Current flow (DEFAULT/NEON)	SVE vector length	PR	PR status	SVE perf gains
NEON 128	SVE128	#138388	This PR	Refer to attachments
NEON 128	SVE256	#119571	Merged	Refer to attachments
NEON 128	SVE512	#138388	This PR	Refer to attachments

Perf Data Attachments:
1. NEON vs SVE128 - Compile Mode - HW: r8g.2xlarge (8 vCPUs)
pytorch_sve128_vs_neon_compile.xlsx

2. NEON vs SVE128 - Eager Mode - HW: r8g.2xlarge (8 vCPUs)
pytorch_sve128_vs_neon_eager.xlsx

3. NEON vs SVE256 - Compile Mode - HW: c7g.8xlarge (32 vCPUs)
pytorch_sve256_vs_neon_compile.xlsx

4. NEON vs SVE256 - Eager Mode - HW: c7g.8xlarge (32 vCPUs)
pytorch_sve256_vs_neon_eager.xlsx

5. NEON vs SVE512 - Compile Mode - HW: Fugaku CPU A64FX (48 vCPUs)
pytorch_sve512_vs_neon_compile.xlsx

6. NEON vs SVE512 - Eager Mode - HW: Fugaku CPU A64FX (48 vCPUs)
pytorch_sve512_vs_neon_eager.xlsx

Metric used in the sheets: average inference latency (ms); lower is better.

NOTE:
In the Excel sheets, highlighted rows use three colors:
Green - SVE outperforms NEON
Yellow - SVE is on par with NEON
Red - NEON is slightly better than SVE
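Since the metric is latency (lower is better), each row reduces to a NEON/SVE latency ratio. The Python sketch below shows one way a row could be mapped to the three colors. The ±2% "on par" band is an assumption, since the sheets do not state the threshold actually used, and `speedup` and `classify_row` are hypothetical helper names.

```python
# Illustrative only: the 2% "on par" tolerance is an assumption, not the
# threshold used in the attached sheets.


def speedup(neon_latency_ms: float, sve_latency_ms: float) -> float:
    """Return NEON latency / SVE latency; > 1.0 means SVE has lower latency."""
    if neon_latency_ms <= 0 or sve_latency_ms <= 0:
        raise ValueError("latencies must be positive")
    return neon_latency_ms / sve_latency_ms


def classify_row(neon_latency_ms: float, sve_latency_ms: float,
                 tolerance: float = 0.02) -> str:
    """Map one benchmark row to Green (SVE better), Yellow (on par) or Red (NEON better)."""
    s = speedup(neon_latency_ms, sve_latency_ms)
    if s > 1.0 + tolerance:
        return "Green"
    if s < 1.0 - tolerance:
        return "Red"
    return "Yellow"
```

With hypothetical latencies of 10.0 ms (NEON) and 8.0 ms (SVE), the speedup is 1.25 and the row is Green; these inputs are placeholders, not values from the attached sheets.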

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @yf225 @ColinPeppler @desertfire @rec

pytorch-bot · 2024-10-19T14:33:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138388

✅ No Failures

As of commit 5730c09 with merge base 91e7c79:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

maajidkhann · 2024-10-19T14:34:25Z

@pytorchbot label "module: arm"

maajidkhann · 2024-10-19T14:34:42Z

@pytorchbot label "ciflow/linux-aarch64"

pytorch-bot · 2024-10-19T14:34:50Z

Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.

maajidkhann · 2024-10-19T14:39:15Z

CC @malfet @aditew01 @abhishek-iitmadras

abhishek-iitmadras · 2024-10-21T15:12:16Z

@pytorchbot label "ciflow/linux-aarch64"

pytorch-bot · 2024-10-21T15:12:25Z

Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.

maajidkhann · 2024-10-21T15:18:01Z

@malfet Can you please add the ciflow/linux-aarch64 label and trigger the CI pipelines?

digantdesai

Took a quick look, seems good to me at a high level, left some nit comments.
I will defer to @malfet for approval.

aten/src/ATen/cpu/vec/sve/vec_common_sve.h

aten/src/ATen/test/vec_test_all_types.cpp

digantdesai · 2024-10-23T21:35:21Z

cmake/Modules/FindARM.cmake

-    # If SVE256 support is not found, set CXX_SVE_FOUND to FALSE and notify the user
-    if(NOT CXX_SVE256_FOUND)
+    # If SVE128 and SVE256 and SVE512 support is not found, set CXX_SVE_FOUND to FALSE and notify the user
+    if(NOT CXX_SVE128_FOUND AND NOT CXX_SVE256_FOUND AND NOT CXX_SVE512_FOUND)
      set(CXX_SVE_FOUND FALSE CACHE BOOL "SVE not available on host")
      message(STATUS "No SVE processor on this machine.")


The host machine doesn't need SVE support, just the toolchain, right?

Yes, that's right. You just need a toolchain (such as GCC or Clang) that has SVE support. Hardware doesn't need to have SVE support.

So do you mind changing the message to "Current toolchain could not be used to generate SVE instructions"?
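The point of this exchange is that the check probes the compiler, not the CPU. The following is a rough Python model of the CMake condition in the diff. It assumes a caller-supplied `try_compile` callback (hypothetical) that compiles a small SVE test program with the given flags and reports success. `-march=armv8-a+sve` and `-msve-vector-bits=N` are real GCC/Clang options, but whether FindARM.cmake uses exactly these flags is an assumption.

```python
from typing import Callable, Dict, List, Sequence

SVE_VECTOR_BITS = (128, 256, 512)


def sve_flags(vector_bits: int) -> List[str]:
    # GCC/Clang options that enable SVE and fix its vector length at compile time.
    return ["-march=armv8-a+sve", f"-msve-vector-bits={vector_bits}"]


def detect_sve_toolchain(try_compile: Callable[[Sequence[str]], bool]) -> Dict[str, bool]:
    """Mirror the diff's logic: SVE is reported missing only if all three checks fail.

    `try_compile` is a hypothetical callback; nothing is executed on the host
    CPU, so the host itself does not need SVE, only the toolchain.
    """
    results = {f"CXX_SVE{bits}_FOUND": try_compile(sve_flags(bits))
               for bits in SVE_VECTOR_BITS}
    results["CXX_SVE_FOUND"] = any(results.values())
    return results
```

For example, a stub `try_compile` that accepts only `-msve-vector-bits=256` yields `CXX_SVE256_FOUND` and `CXX_SVE_FOUND` as True. A stub that rejects everything is the case where the suggested message would be printed.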

torch/_inductor/cpu_vec_isa.py

maajidkhann · 2024-10-28T09:34:52Z

PR #137426 ([PyTorch] Use 128-bit vectors for ARM64) was merged a few days ago, so I have rebased this PR onto the latest main and fixed all the conflicts.

maajidkhann · 2024-10-31T05:43:09Z

Hello @malfet, can you please look into this PR and approve it? All the pipelines should pass, and all the conflicts have been resolved.

maajidkhann · 2024-11-06T12:19:53Z

Issue #137775 is now closed with the fix in #137795.

Rebased the branch onto the latest main again and fixed the conflicts.

Summary: Importing pytorch#138388, as it provides a performance improvement over the NEON implementation Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier Differential Revision: D70788867 Reviewed By: r1mikey

Summary: Pull Request resolved: pytorch#158932 Importing pytorch#138388, as it provides a performance improvement over the NEON implementation Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier Reviewed By: r1mikey Differential Revision: D70788867

Summary: Importing pytorch#138388, as it improves SVE support for perfkernels Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier Reviewed By: r1mikey Differential Revision: D70788867

Summary: Pull Request resolved: pytorch#158932 Importing pytorch#138388, as it improves SVE support for perfkernels Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier Differential Revision: D70788867 Reviewed By: r1mikey

github-actions · 2025-08-11T10:40:43Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

maajidkhann requested review from lezcano, nikitaved and IvanYashchuk as code owners October 19, 2024 14:33

pytorch-bot added the module: cpu, module: dynamo, module: inductor and release notes: sparse labels Oct 19, 2024

pytorch-bot added the module: arm label Oct 19, 2024

pytorchbot added the open source label Oct 19, 2024

maajidkhann force-pushed the vec_backend_for_sve_128_512 branch from ae0e3fd to ce24714 October 19, 2024 16:53

digantdesai added the ciflow/linux-aarch64 label Oct 23, 2024

colesbury added the triaged label Oct 23, 2024

digantdesai reviewed Oct 23, 2024

maajidkhann force-pushed the vec_backend_for_sve_128_512 branch from ce24714 to 697e057 October 28, 2024 09:28

maajidkhann force-pushed the vec_backend_for_sve_128_512 branch 2 times, most recently from cb02d8e to 93927e2 October 30, 2024 07:59

maajidkhann force-pushed the vec_backend_for_sve_128_512 branch from 93927e2 to 5aab819 November 6, 2024 12:18

maajidkhann force-pushed the vec_backend_for_sve_128_512 branch 2 times, most recently from 000faa7 to 93d64cc November 12, 2024 09:14

github-actions bot added the Stale label Aug 11, 2025

abhishek-iitmadras added no-stale and removed Stale labels Aug 11, 2025

Review thread: digantdesai (Oct 23, 2024), with replies from maajidkhann (Oct 24, 2024) and malfet (Feb 4, 2025).
