[Caffe2] Build perfkernels targeting SVE128 #159274

Open. Wants to merge 3 commits into base: main.

Nicoshev
Contributor

@Nicoshev Nicoshev commented Jul 28, 2025

Summary: We are now building perfkernels using SVE/NEON enhancements.

Test Plan:
Sigrid Predictor canary

Rollback Plan:

Differential Revision: D78902495

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben


pytorch-bot bot commented Jul 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159274

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures

As of commit d3dd950 with merge base bfc873d (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Jul 28, 2025

CLA Missing ID CLA Not Signed

@pytorch-bot pytorch-bot bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor release notes: quantization release notes category labels Jul 28, 2025
@facebook-github-bot facebook-github-bot added oncall: jit Add this issue/PR to JIT oncall triage queue fb-exported labels Jul 28, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D78902495

}
template <typename T>
std::ostream& operator<<(std::ostream& stream, const Vectorized<T>& vec) {
T buf[Vectorized<T>::size()];
Collaborator

Yikes dynamic array... is size constexpr? Use std::array if so
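
For illustration, a minimal sketch of what the `std::array` variant could look like, assuming `size()` is `constexpr`. `ToyVec` is a hypothetical stand-in for `Vectorized<T>`, and the output format is made up for the example; neither is taken from this PR.

```cpp
#include <array>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for Vectorized<T>; only size() and store() matter here.
template <typename T>
struct ToyVec {
  static constexpr std::size_t size() { return 4; }
  T lanes[4];
  // Copy all lanes into caller-provided storage.
  void store(T* dst) const {
    for (std::size_t i = 0; i < size(); ++i) dst[i] = lanes[i];
  }
};

template <typename T>
std::ostream& operator<<(std::ostream& stream, const ToyVec<T>& vec) {
  // Because size() is constexpr, it can be a template argument, so no
  // variable-length array (a compiler extension in C++) is needed.
  std::array<T, ToyVec<T>::size()> buf{};
  vec.store(buf.data());
  stream << "vec[";
  for (std::size_t i = 0; i < buf.size(); ++i) {
    if (i != 0) stream << ", ";
    stream << buf[i];
  }
  stream << "]";
  return stream;
}

// Small helper so the behavior can be checked without a main().
inline std::string to_string_demo() {
  ToyVec<float> v{{1.0f, 2.0f, 3.0f, 4.0f}};
  std::ostringstream os;
  os << v;
  return os.str();
}
```

If `size()` is not `constexpr` for some vector type (for example, a length-agnostic SVE type), this approach would not compile for it and a different buffer strategy would be needed.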

Vec v1 = v1_1;
float32x4_t v1_1 = vextq_f32(vReg, vReg, 2);

__at_align__ float v1[4];
Collaborator

Can you not do this with std::array and data?
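
A rough sketch of that suggestion, written so it compiles without `arm_neon.h`. `Float4`, `ext2`, and `store_f32x4` are hypothetical scalar stand-ins for `float32x4_t`, `vextq_f32(a, b, 2)`, and a NEON store; `alignas(16)` is an assumption standing in for whatever alignment `__at_align__` provides.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar stand-in for float32x4_t.
struct Float4 {
  float lane[4];
};

// Emulates vextq_f32(a, b, 2): lanes 2..3 of a followed by lanes 0..1 of b.
inline Float4 ext2(const Float4& a, const Float4& b) {
  return Float4{{a.lane[2], a.lane[3], b.lane[0], b.lane[1]}};
}

// Emulates a 4-lane store to a raw float pointer.
inline void store_f32x4(float* dst, const Float4& v) {
  for (std::size_t i = 0; i < 4; ++i) dst[i] = v.lane[i];
}

inline std::array<float, 4> swap_halves(const Float4& vReg) {
  // std::array keeps the size in the type; data() yields the float*
  // that a store intrinsic expects, so no raw C array is required.
  alignas(16) std::array<float, 4> v1{};
  store_f32x4(v1.data(), ext2(vReg, vReg));
  return v1;
}
```

With the same register passed as both operands, the result is the input with its two 64-bit halves swapped, e.g. `{1, 2, 3, 4}` becomes `{3, 4, 1, 2}`.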


Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Aug 4, 2025

Summary:
Pull Request resolved: pytorch#159274

This introduces the SVE128 Vectorized<> layer.

The idea is to differentiate SVE128 perfkernels from the general SVE implementation.

Mixing NEON and SVE instructions should maximize performance on SVE128 CPUs.
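
As a hedged illustration of how an SVE128 path might be told apart from a general SVE path at compile time, the sketch below keys off the ACLE macros `__ARM_FEATURE_SVE` and `__ARM_FEATURE_SVE_BITS` (the latter is set when the compiler is given a fixed SVE vector length). The enum and function names are hypothetical and are not the macros or types this PR actually uses.

```cpp
#include <string>

// Hypothetical backend tags; the PR's real capability names may differ.
enum class VecBackend { Scalar, Neon, Sve128, SveGeneric };

// Pick a backend from what the compiler was told about the target.
constexpr VecBackend select_backend() {
#if defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_SVE_BITS) && \
    (__ARM_FEATURE_SVE_BITS == 128)
  return VecBackend::Sve128;      // SVE fixed at 128 bits
#elif defined(__ARM_FEATURE_SVE)
  return VecBackend::SveGeneric;  // SVE at some other or unknown length
#elif defined(__ARM_NEON)
  return VecBackend::Neon;
#else
  return VecBackend::Scalar;
#endif
}

// True when the backend's vectors are exactly as wide as a NEON Q register,
// which is what makes mixing NEON and SVE operations on the same data natural.
constexpr bool shares_neon_width(VecBackend b) {
  return b == VecBackend::Neon || b == VecBackend::Sve128;
}

inline std::string backend_name(VecBackend b) {
  switch (b) {
    case VecBackend::Scalar: return "scalar";
    case VecBackend::Neon: return "neon";
    case VecBackend::Sve128: return "sve128";
    case VecBackend::SveGeneric: return "sve";
  }
  return "unknown";
}
```

On a non-Arm build this selects the scalar tag; on an AArch64 build compiled with a fixed 128-bit SVE length it would select the SVE128 tag.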

Test Plan:
Sigrid Predictor canary

Rollback Plan:

Differential Revision: D78902495

Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Aug 5, 2025
Nicolas De Carli added 2 commits August 9, 2025 07:02
Summary:
Pull Request resolved: pytorch#158932

Importing pytorch#138388, as it improves SVE support for perfkernels

Test Plan: We will test it on AdFinder/AdRetriever/AdRanker offline tier

Differential Revision: D70788867

Reviewed By: r1mikey

Summary: Enabling compilation targeting SVE128

Test Plan: AdRanker/AdFinder ServiceLab

Differential Revision: D78691521

Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Aug 11, 2025
