Skip to content

[PyTorch] Check defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE256) instead of defined(CPU_CAPABILITY_NEON) #137722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

swolchok
Copy link
Contributor

@swolchok swolchok commented Oct 10, 2024

Stack from ghstack (oldest at bottom):

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: D64197046

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137722

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1f40b24 with merge base 0786b37 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor labels Oct 10, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

@swolchok
Copy link
Contributor Author

swolchok commented Oct 10, 2024

looks like this needs to be identical to the gating surrounding vec256_float_neon

@swolchok
Copy link
Contributor Author

swolchok commented Oct 10, 2024

looks like this needs to be identical to the gating surrounding vec256_float_neon

specifically, 1) we just use __aarch64__ to gate that implementation, so we should follow suit here
2) we also need to check !defined(CPU_CAPABILITY_SVE256) because this apparently messes with SVE

…PABILITY_NEON"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

@swolchok swolchok changed the title [PyTorch] Check if __ARM_NEON is defined instead of CPU_CAPABILITY_NEON [PyTorch] Check defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE256) instead of defined(CPU_CAPABILITY_NEON) Oct 10, 2024
@swolchok
Copy link
Contributor Author

hooray for green tests

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 11, 2024
@abhishek-iitmadras
Copy link
Collaborator

Hi @swolchok
Wouldn't it be better to define CPU_CAPABILITY_SVE instead of just CPU_CAPABILITY_SVE256?

@swolchok
Copy link
Contributor Author

Hi @swolchok Wouldn't it be better to define CPU_CAPABILITY_SVE instead of just CPU_CAPABILITY_SVE256?

it is CPU_CAPABILITY_SVE256 because that is the gating surrounding vec256_float_neon.h: https://www.internalfb.com/code/fbsource/[0660504a24502363bb08b1b37504bc3ef6d878eb]/fbcode/caffe2/aten/src/ATen/cpu/vec/vec256/vec256.h?lines=10-15

…BILITY_SVE256)` instead of `defined(CPU_CAPABILITY_NEON)`"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

…BILITY_SVE256)` instead of `defined(CPU_CAPABILITY_NEON)`"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strips the ability of generating non-NEON accelerated code in torch.comple, which feels like a regression...
I.e. please undo changes to cpp_prefix.h/cpu_vec_isa.py

…h64__) && !defined(CPU_CAPABILITY_SVE256)` instead of `defined(CPU_CAPABILITY_NEON)`"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

swolchok added a commit that referenced this pull request Oct 15, 2024
…56)` instead of `defined(CPU_CAPABILITY_NEON)`

Pull Request resolved: #137722

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704
ghstack-source-id: 248161061
@exported-using-ghexport

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you for the update

KnAwnime pushed a commit to KnAwnime/Biblioteka that referenced this pull request Oct 16, 2024
…56)` instead of `defined(CPU_CAPABILITY_NEON)`

Pull Request resolved: pytorch/pytorch#137722

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704
ghstack-source-id: 247383134
@exported-using-ghexport

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)
pytorchmergebot pushed a commit that referenced this pull request Oct 16, 2024
…8014)

This will break once we support 128-bit vectors, and there's no reason to do it.

Differential Revision: [D64421982](https://our.internmc.facebook.com/intern/diff/D64421982/)

Pull Request resolved: #138014
Approved by: https://github.com/malfet, https://github.com/Skylion007
ghstack dependencies: #137722
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

pytorchmergebot pushed a commit that referenced this pull request Oct 17, 2024
The ifdef as written just checks if the macOS 15.0-capable SDK is being used. You also need a runtime gate to make sure macOS 15 is in use.

Differential Revision: [D64429453](https://our.internmc.facebook.com/intern/diff/D64429453/)
Pull Request resolved: #138022
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #137722, #138014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants