Skip to content

[inductor] remove no_x_dim #159810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: gh/davidberard98/396/base
Choose a base branch
from

Conversation

davidberard98
Copy link
Contributor

@davidberard98 davidberard98 commented Aug 4, 2025

Stack from ghstack (oldest at bottom):

no_x_dim is used to indicate that a reduction operates on a single row, and data loaded for the reduction is 1-dimensional.

no_x_dim was introduced in #102444 - in which there was bad perf in some reductions, and using 1D tensors fixed the perf issue.

However, it appears that this perf issue no longer exists in current Triton versions. #118822 checked this, and we can also check this on H100 benchmarks (linked below). And another motivation for removing this behavior is that it enables larger loads, which we observe is necessary for good performance on certain shapes on Blackwell.

H100 inference benchmarks:
https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2004%20Aug%202025%2004%3A13%3A24%20GMT&stopTime=Mon%2C%2011%20Aug%202025%2004%3A13%3A24%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/davidberard98/396/orig&lCommit=a6bcd4692fb39fa2fad260f290bff545d4425829&rBranch=main&rCommit=e96c7c4bb0f6aeae2ab3b6f040f7d67edbec199a

H100 training benchmarks:
https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2004%20Aug%202025%2004%3A13%3A24%20GMT&stopTime=Mon%2C%2011%20Aug%202025%2004%3A13%3A24%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(h100)&lBranch=gh/davidberard98/396/orig&lCommit=a6bcd4692fb39fa2fad260f290bff545d4425829&rBranch=main&rCommit=e96c7c4bb0f6aeae2ab3b6f040f7d67edbec199a

Overall, the benchmarks show minimal change in performance.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

Differential Revision: D79599286

Copy link

pytorch-bot bot commented Aug 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159810

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit aec33bd with merge base 2259dbe (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

davidberard98 added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: 97b6f1f
Pull Request resolved: #159810
@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 4, 2025
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

Differential Revision: [D79599286](https://our.internmc.facebook.com/intern/diff/D79599286)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Aug 9, 2025
ghstack-source-id: b282c17
Pull Request resolved: #159810
@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

Differential Revision: [D79599286](https://our.internmc.facebook.com/intern/diff/D79599286)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Aug 9, 2025
ghstack-source-id: e47f519
Pull Request resolved: #159810
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

Differential Revision: [D79599286](https://our.internmc.facebook.com/intern/diff/D79599286)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Aug 9, 2025
ghstack-source-id: 54c8302
Pull Request resolved: #159810
@davidberard98
Copy link
Contributor Author

@davidberard98 davidberard98 marked this pull request as ready for review August 11, 2025 04:20
@davidberard98 davidberard98 changed the title [WIP][inductor] remove no_x_dim [inductor] remove no_x_dim Aug 11, 2025
@davidberard98
Copy link
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
@pytorchmergebot
Copy link
Collaborator

Successfully rebased gh/davidberard98/396/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/159810)

pytorchmergebot pushed a commit that referenced this pull request Aug 11, 2025
ghstack-source-id: 18269c5
Pull Request resolved: #159810
@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@davidberard98
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team Raised by workflow job

@davidberard98
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@davidberard98
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants