[Caffe2] Build perfkernels targeting SVE128 #159274

Nicoshev · 2025-07-28T18:02:47Z

Summary: We are now building perfkernels using SVE/Neon enhancements

Test Plan:
Sigrid Predictor canary

Rollback Plan:

Differential Revision: D78902495

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot · 2025-07-28T18:02:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159274


❌ 9 New Failures

As of commit d3dd950 with merge base bfc873d:

NEW FAILURES - The following jobs have failed:

inductor / cuda12.8-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
RuntimeError: Eager run failed
Lint / lintrunner-clang / linux-job (gh)
>>> Lint for torch/nativert/kernels/GeneratedStaticDispatchKernels.cpp:
Lint / lintrunner-mypy / linux-job (gh)
>>> Lint for torch/_inductor/cpu_vec_isa.py:
Lint / lintrunner-noclang / linux-job (gh)
>>> Lint for torch/_inductor/codegen/cpp_micro_gemm.py:
pull / linux-jammy-cuda12.8-cudnn9-py3.9-clang12 / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp:1176:71: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
pull / linux-jammy-py3.10-clang18-asan / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp:1176:71: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
pull / linux-jammy-py3.13-clang12 / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp:1176:71: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
pull / linux-jammy-py3.9-clang12 / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp:1176:71: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
pull / linux-jammy-py3.9-clang12-onnx / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp:1176:71: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2025-07-28T18:02:52Z

✅ login: Nicoshev / name: Nicolas De Carli (d3dd950)
❌ The email address for the commit (9d7d73f, 685e673) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view the commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please submit a support request ticket.

facebook-github-bot · 2025-07-28T18:03:18Z

This pull request was exported from Phabricator. Differential Revision: D78902495

Skylion007 · 2025-07-29T16:50:19Z

aten/src/ATen/cpu/vec/sve/vec_common_sve.h

+}
+template <typename T>
+std::ostream& operator<<(std::ostream& stream, const Vectorized<T>& vec) {
+  T buf[Vectorized<T>::size()];


Yikes, a dynamic array... Is size() constexpr? If so, use std::array.
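A minimal sketch of what this suggestion could look like, assuming size() is a static constexpr member (the quoted hunk does not show whether that holds for the SVE128 layer). ToyVectorized, its 128-bit width, and format_toy_vector are hypothetical stand-ins for illustration, not the real at::vec::Vectorized API:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a vector wrapper; a real one would hold a SIMD register.
template <typename T>
struct ToyVectorized {
  // Assumption: a fixed 128-bit vector, so the lane count is a compile-time constant.
  static constexpr std::size_t size() { return 16 / sizeof(T); }

  T lanes[16 / sizeof(T)];

  // Copies every lane into dst, which must have room for size() elements.
  void store(T* dst) const {
    assert(dst != nullptr);
    for (std::size_t i = 0; i < size(); ++i) {
      dst[i] = lanes[i];
    }
  }
};

template <typename T>
std::ostream& operator<<(std::ostream& stream, const ToyVectorized<T>& vec) {
  // size() is constexpr, so it can be a template argument: no variable-length array.
  std::array<T, ToyVectorized<T>::size()> buf{};
  vec.store(buf.data());
  stream << "vec[";
  for (std::size_t i = 0; i < buf.size(); ++i) {
    if (i != 0) {
      stream << ", ";
    }
    stream << buf[i];
  }
  stream << "]";
  return stream;
}

// Small helper that formats a four-lane float vector through operator<<.
std::string format_toy_vector() {
  ToyVectorized<float> v{{1.0f, 2.0f, 3.0f, 4.0f}};
  std::ostringstream out;
  out << v;
  return out.str();
}
```

If size() were not a constant expression, the std::array declaration would fail to compile, which surfaces the problem at build time instead of silently relying on a compiler extension.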

Skylion007 · 2025-07-29T16:55:03Z

aten/src/ATen/cpu/vec/functional_base.h

-    Vec v1 = v1_1;
+    float32x4_t v1_1 = vextq_f32(vReg, vReg, 2);
+
+     __at_align__ float v1[4];


Can you not do this with std::array and data?
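One way to read this suggestion, sketched portably so it also builds without NEON headers. ToyFloat4, toy_ext2, and toy_store are hypothetical stand-ins for float32x4_t, vextq_f32(v, v, 2), and a register store; the 16-byte alignment is an assumption and may differ from what __at_align__ expands to:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;

// Hypothetical stand-in for a 128-bit register holding four floats.
struct ToyFloat4 {
  float lanes[kLanes];
};

// Mirrors the lane order of vextq_f32(v, v, 2): {v[2], v[3], v[0], v[1]}.
inline ToyFloat4 toy_ext2(const ToyFloat4& v) {
  return ToyFloat4{{v.lanes[2], v.lanes[3], v.lanes[0], v.lanes[1]}};
}

// Stand-in for a register store: writes all four lanes to dst.
inline void toy_store(float* dst, const ToyFloat4& reg) {
  assert(dst != nullptr);
  for (std::size_t i = 0; i < kLanes; ++i) {
    dst[i] = reg.lanes[i];
  }
}

// Returns true when p sits on a 16-byte boundary.
inline bool is_16_byte_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Uses an aligned std::array plus data() where the diff uses a raw float[4].
std::array<float, kLanes> rotated_lanes(const ToyFloat4& reg) {
  alignas(16) std::array<float, kLanes> v1{};
  assert(is_16_byte_aligned(v1.data()));
  toy_store(v1.data(), toy_ext2(reg));
  return v1;
}
```

The array keeps its size in the type and still hands a plain float* to the store through data(), so the call site changes very little.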

facebook-github-bot · 2025-08-04T17:56:53Z

This pull request was exported from Phabricator. Differential Revision: D78902495

facebook-github-bot · 2025-08-04T18:17:32Z

This pull request was exported from Phabricator. Differential Revision: D78902495

Summary: Pull Request resolved: pytorch#159274

We are introducing the SVE128 vectorized<> layer. The idea is to differentiate SVE128 perfkernels from the general SVE implementation. Mixing NEON and SVE should maximize performance on SVE128 CPUs.

Test Plan: Sigrid Predictor canary

Rollback Plan:

Differential Revision: D78902495

facebook-github-bot · 2025-08-05T02:43:05Z

This pull request was exported from Phabricator. Differential Revision: D78902495



facebook-github-bot · 2025-08-11T15:35:57Z

This pull request was exported from Phabricator. Differential Revision: D78902495


facebook-github-bot · 2025-08-12T03:10:17Z

This pull request was exported from Phabricator. Differential Revision: D78902495

Nicoshev requested review from lezcano, nikitaved, IvanYashchuk, jerryzh168, salilsdesai, kimishpatel, digantdesai and jianyuh as code owners July 28, 2025 18:02

pytorch-bot bot added the ciflow/inductor, module: cpu, module: inductor, and release notes: quantization labels Jul 28, 2025

facebook-github-bot added the oncall: jit and fb-exported labels Jul 28, 2025

Skylion007 reviewed Jul 29, 2025


Nicoshev force-pushed the export-D78902495 branch from e24bfba to 587299f on August 4, 2025 17:57

Nicoshev force-pushed the export-D78902495 branch from 587299f to 7c03eb8 on August 4, 2025 18:17

Nicoshev force-pushed the export-D78902495 branch from 7c03eb8 to f4df25b on August 5, 2025 02:43

Nicolas De Carli added 2 commits August 9, 2025 07:02

[Caffe2] Import SVE128 PR (pytorch#158932)
9d7d73f

Summary: Pull Request resolved: pytorch#158932. Importing pytorch#138388, as it improves SVE support for perfkernels.
Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier.
Differential Revision: D70788867
Reviewed By: r1mikey

[Caffe2] Enable SVE128
685e673

Summary: Enabling compilation targeting SVE128.
Test Plan: AdRanker/AdFinder ServiceLab
Differential Revision: D78691521

Nicoshev force-pushed the export-D78902495 branch from f4df25b to 1a4f936 on August 11, 2025 15:36

Nicoshev force-pushed the export-D78902495 branch from 1a4f936 to d3dd950 on August 12, 2025 03:10
