Skip to content

[inductor] respect layout tags for ops with registered lowerings #159134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

xmfan
Copy link
Member

@xmfan xmfan commented Jul 25, 2025

Stack from ghstack (oldest at bottom):

scaled_grouped_mm's kernel only supports column-major on the second operand. I -think- this is just for efficiency reasons. But inductor treats that buffer as flexible and may tweak the strides to be row-major instead, as seen in the issue.

Tagging the op as "needs_fixed_stride_order"/"needs_exact_strides" does not work. Inductor only considers those tags for ops that don't have registered lowering (not sure if this is intended). scaled_grouped_mm does have a lowering, so we never check its tags. From discussion below, the op tags are expected to work.

FIXES #159097

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Jul 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159134

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 50f10e8 with merge base 7ac70ac (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

xmfan added a commit that referenced this pull request Jul 25, 2025
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 25, 2025
@@ -2629,6 +2629,7 @@ def is_aligned(x):
make_fallback(aten._adaptive_avg_pool3d) # @isuruf
make_fallback(aten.adaptive_max_pool3d) # @isuruf
make_fallback(aten._scaled_dot_product_attention_math_for_mps) # @malfet
make_fallback(aten._scaled_grouped_mm, constrain_to_fx_strides)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constrain_to_fx_strides isn't good enough. You want constrain_to_fake_tensors (aka needs_fixed_strides).

constrain_to_fx_strides just makes sure the tensors have the same stride order. The tensor might still not be row major/column major.

Copy link
Contributor

@zou3519 zou3519 Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to merge this as a workaround, feel free, but there is going to need to be work to be done to audit and migrate usages of constrain_fx_strides to constrain_to_fake_tensors. cc @eellison

Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be specified as a tag on the operator.

needs_exact_strides. We rely on the tag being on the operator to appropriately track strides on the inputs through tracing.

@xmfan
Copy link
Member Author

xmfan commented Jul 25, 2025

@ellison that's what i added at first, but it looks like we only check the tags when the ops have no lowering, we should always respect the op tags right?

@eellison
Copy link
Contributor

@xmfan - yea, if there is no lowering, we should respect the tag.

[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 25, 2025
@xmfan xmfan changed the title [inductor] restrict _scaled_grouped_mm strides to match user strides [inductor] respect layout tags for ops with registered lowerings Jul 25, 2025
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 25, 2025
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 25, 2025
@xmfan xmfan marked this pull request as ready for review July 25, 2025 21:16
@xmfan xmfan requested review from zou3519 and eellison July 25, 2025 21:16
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 25, 2025
use_fast_accum=True,
)

fn()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xmfan can you add a test like in https://github.com/pytorch/pytorch/pull/159174/files ?

The idea is to run a pattern matching pass to swizzle an operation so that we are sure that we're testing the right thing.

I have done e2e tests like what is currently in this PR and they don't work out sometimes or inductor changes and they end up not testing the right thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think i understand that test. it still passes even when i comment out the impl2

def impl2(x):
  # return x.clone(memory_format=torch.channels_last)
  return x

Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code look good to me, but if we can we should try to add a test that explicitly runs a pattern matching pass to swizzle input tensor strides. Such a test is more robust to changes in Inductor

@xmfan
Copy link
Member Author

xmfan commented Jul 30, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 30, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 2, linux.rocm.gpu.2)

Details for Dev Infra team Raised by workflow job

[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 30, 2025
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 31, 2025
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Jul 31, 2025
@xmfan
Copy link
Member Author

xmfan commented Jul 31, 2025

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
…9134)

scaled_grouped_mm's kernel only supports column-major on the second operand. I -think- this is just for efficiency reasons. But inductor treats that buffer as flexible and may tweak the strides to be row-major instead, as seen in the issue.

~Tagging the op as "needs_fixed_stride_order"/"needs_exact_strides" does not work. Inductor only considers those tags for ops that don't have registered lowering (not sure if this is intended). scaled_grouped_mm does have a lowering, so we never check its tags.~ From discussion below, the op tags are expected to work.

FIXES #159097

Pull Request resolved: #159134
Approved by: https://github.com/eellison
guangyey added a commit that referenced this pull request Aug 3, 2025
This reverts commit 0df78f0.

ghstack-source-id: 408feda
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
guangyey added a commit that referenced this pull request Aug 4, 2025
This reverts commit 0df78f0.

ghstack-source-id: 65756e0
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
guangyey added a commit that referenced this pull request Aug 4, 2025
This reverts commit 0df78f0.

ghstack-source-id: 503a42b
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
guangyey added a commit that referenced this pull request Aug 4, 2025
This reverts commit 0df78f0.

ghstack-source-id: ef4b5c5
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
guangyey added a commit that referenced this pull request Aug 4, 2025
This reverts commit 0df78f0.

ghstack-source-id: 1825d63
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
guangyey added a commit that referenced this pull request Aug 4, 2025
This reverts commit 0df78f0.

ghstack-source-id: 4f6e21d
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
guangyey added a commit that referenced this pull request Aug 5, 2025
This reverts commit 0df78f0.

ghstack-source-id: 33496dc
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
guangyey added a commit that referenced this pull request Aug 5, 2025
This reverts commit 0df78f0.

ghstack-source-id: 82e8552
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.
pytorchmergebot pushed a commit that referenced this pull request Aug 5, 2025
Useful helper function for stage 1 export -> manual partitioner -> stage 2 compile users

Pull Request resolved: #159705
Approved by: https://github.com/zou3519
ghstack dependencies: #159134
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants