[Caffe2] Build perfkernels targeting SVE128 #159274
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159274
Note: Links to docs will display an error until the docs builds have been completed.
❌ 9 New Failures as of commit d3dd950 with merge base bfc873d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D78902495
```cpp
}

template <typename T>
std::ostream& operator<<(std::ostream& stream, const Vectorized<T>& vec) {
  T buf[Vectorized<T>::size()];
```
Yikes, a dynamic array... is size() constexpr? Use std::array if so.
```cpp
Vec v1 = v1_1;
float32x4_t v1_1 = vextq_f32(vReg, vReg, 2);

__at_align__ float v1[4];
```
Can you not do this with std::array and data()?
Force-pushed from e24bfba to 587299f
Force-pushed from 587299f to 7c03eb8
Summary: Pull Request resolved: pytorch#159274. We are introducing the SVE128 Vectorized<> layer. The idea is to differentiate SVE128 perfkernels from the general SVE implementation; mixing NEON and SVE should maximize performance on SVE128 CPUs.
Test Plan: Sigrid Predictor canary
Rollback Plan:
Differential Revision: D78902495
Force-pushed from 7c03eb8 to f4df25b
Summary: Pull Request resolved: pytorch#158932. Importing pytorch#138388, as it improves SVE support for perfkernels.
Test Plan: We will test it on the AdFinder/AdRetriever/AdRanker offline tiers.
Differential Revision: D70788867
Reviewed By: r1mikey
Summary: Enabling compilation targeting SVE128.
Test Plan: AdRanker/AdFinder ServiceLab
Differential Revision: D78691521
Force-pushed from f4df25b to 1a4f936
Force-pushed from 1a4f936 to d3dd950
Summary: We are now building perfkernels using SVE/NEON enhancements.
Test Plan:
Sigrid Predictor canary
Rollback Plan:
Differential Revision: D78902495
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben