[inductor] respect layout tags for ops with registered lowerings #159134

xmfan · 2025-07-25T06:39:00Z

Stack from ghstack (oldest at bottom):

-> [inductor] respect layout tags for ops with registered lowerings #159134

The scaled_grouped_mm kernel only supports a column-major second operand. I -think- this is just for efficiency reasons. However, Inductor treats that buffer's layout as flexible and may change its strides to row-major instead, as seen in the issue.
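To make the layout terms concrete, here is a minimal pure-Python sketch of what "row-major" and "column-major" mean in terms of strides for the last two dimensions. The helper names (`row_major_strides`, `column_major_strides`, `is_column_major`) and the example shape are illustrative assumptions; they are not part of PyTorch and do not reproduce the kernel's actual checks.

```python
def row_major_strides(shape):
    """Element strides of a dense tensor whose last dim is contiguous."""
    strides, running = [], 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def column_major_strides(shape):
    """Element strides of a dense tensor whose second-to-last dim is contiguous."""
    *batch, rows, cols = shape
    # Lay the data out as if the last two dims were swapped, then swap the strides back.
    inner = row_major_strides((*batch, cols, rows))
    return (*inner[:-2], inner[-1], inner[-2])


def is_column_major(shape, strides):
    """True if the last two dims are stored column by column (padding allowed)."""
    return strides[-2] == 1 and strides[-1] >= max(1, shape[-2])


# Hypothetical second operand: 4 groups, each a 16 x 32 matrix.
shape = (4, 16, 32)
col = column_major_strides(shape)
row = row_major_strides(shape)
print(col, is_column_major(shape, col))
print(row, is_column_major(shape, row))
```

The shape is identical in both cases; only the strides differ, which is why a layout change made by the compiler can break a kernel without changing any visible tensor size.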

~~Tagging the op as "needs_fixed_stride_order"/"needs_exact_strides" does not work. Inductor only considers those tags for ops that don't have a registered lowering (not sure if this is intended). scaled_grouped_mm does have a lowering, so we never check its tags.~~ From discussion below, the op tags are expected to work.

FIXES #159097

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos


pytorch-bot · 2025-07-25T06:39:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159134


✅ You can merge normally! (1 Unrelated Failure)

As of commit 50f10e8 with merge base 7ac70ac:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
sccache: error: couldn't connect to server

This comment was automatically generated by Dr. CI and updates every 15 minutes.


zou3519 · 2025-07-25T13:42:05Z

torch/_inductor/lowering.py

@@ -2629,6 +2629,7 @@ def is_aligned(x):
 make_fallback(aten._adaptive_avg_pool3d)  # @isuruf
 make_fallback(aten.adaptive_max_pool3d)  # @isuruf
 make_fallback(aten._scaled_dot_product_attention_math_for_mps)  # @malfet
+make_fallback(aten._scaled_grouped_mm, constrain_to_fx_strides)


constrain_to_fx_strides isn't good enough. You want constrain_to_fake_tensors (aka needs_exact_strides).

constrain_to_fx_strides just makes sure the tensors have the same stride order. The tensor might still not be row major/column major.

If you want to merge this as a workaround, feel free, but work will still be needed to audit and migrate usages of constrain_to_fx_strides to constrain_to_fake_tensors. cc @eellison
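As a rough illustration of the distinction drawn in this comment, the sketch below compares "same stride order" with "exactly the same strides" on plain tuples. It is a simplified model with hypothetical helper names (`stride_order`, `same_stride_order`, `same_exact_strides`) and made-up stride values; Inductor's real constraint functions operate on IR nodes and may differ in detail.

```python
def stride_order(strides):
    """Rank each dim by stride: rank 0 is the fastest-moving (smallest-stride) dim."""
    dims_by_stride = sorted(range(len(strides)), key=lambda d: strides[d])
    order = [0] * len(strides)
    for rank, dim in enumerate(dims_by_stride):
        order[dim] = rank
    return tuple(order)


def same_stride_order(a, b):
    """Weaker check: only the relative ordering of the dims must match."""
    return stride_order(a) == stride_order(b)


def same_exact_strides(a, b):
    """Stronger check: every stride value must match."""
    return tuple(a) == tuple(b)


# Strides traced for a hypothetical (4, 16, 32) operand, dense and column-major.
traced = (512, 1, 16)
# The same dim ordering, but with each column padded from 16 to 24 elements.
padded = (768, 1, 24)

print(same_stride_order(traced, padded))
print(same_exact_strides(traced, padded))
```

The padded layout passes the order-only check but fails the exact check, so a constraint based on stride order alone still leaves room for the compiled strides to differ from the traced ones.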

eellison

This should be specified as a tag on the operator: needs_exact_strides. We rely on the tag being on the operator to appropriately track strides on the inputs through tracing.

xmfan · 2025-07-25T14:06:14Z

@eellison that's what I added at first, but it looks like we only check the tags when the op has no lowering. We should always respect the op tags, right?

eellison · 2025-07-25T14:09:39Z

@xmfan - yea, if there is no lowering, we should respect the tag.
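The behavior debated in this thread can be sketched in plain Python. Everything here is a hypothetical model: the function names are invented for illustration, the constraints are represented as strings, and the pairing of needs_fixed_stride_order with constrain_to_fx_strides is an assumption based on the names used in this PR rather than a quote of Inductor's code.

```python
# Assumed tag -> constraint pairing, written as strings for illustration only.
TAG_TO_CONSTRAINT = {
    "needs_exact_strides": "constrain_to_fake_tensors",
    "needs_fixed_stride_order": "constrain_to_fx_strides",
}


def constraint_from_tags(op_tags):
    """Return the constraint for the first layout tag found, else None."""
    for tag, constraint in TAG_TO_CONSTRAINT.items():
        if tag in op_tags:
            return constraint
    return None


def constraint_tags_ignored_with_lowering(op_tags, has_lowering):
    """Behavior described as the problem: tags only matter without a lowering."""
    if has_lowering:
        return None
    return constraint_from_tags(op_tags)


def constraint_tags_always_respected(op_tags, has_lowering):
    """Behavior the PR title asks for: tags apply with or without a lowering."""
    return constraint_from_tags(op_tags)


tags = {"needs_exact_strides"}
print(constraint_tags_ignored_with_lowering(tags, has_lowering=True))
print(constraint_tags_always_respected(tags, has_lowering=True))
```

In the first variant, an op with a registered lowering gets no layout constraint even though it is tagged, which matches the symptom reported for scaled_grouped_mm.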


zou3519 · 2025-07-29T18:24:08Z

test/inductor/test_torchinductor.py

+                use_fast_accum=True,
+            )
+
+        fn()


@xmfan can you add a test like in https://github.com/pytorch/pytorch/pull/159174/files ?

The idea is to run a pattern matching pass to swizzle an operation so that we are sure that we're testing the right thing.

I have written e2e tests like the one currently in this PR. Sometimes they don't work out, or Inductor changes and they end up no longer testing the right thing.

xmfan

I don't think I understand that test. It still passes even when I comment out the impl2:

def impl2(x):
    # return x.clone(memory_format=torch.channels_last)
    return x

zou3519

Code looks good to me, but if we can, we should try to add a test that explicitly runs a pattern matching pass to swizzle input tensor strides. Such a test is more robust to changes in Inductor.

xmfan · 2025-07-30T03:47:44Z

@pytorchbot merge

pytorchmergebot · 2025-07-30T03:49:49Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-07-30T04:37:39Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 2, linux.rocm.gpu.2)

Details for Dev Infra team

Raised by workflow job


xmfan · 2025-07-31T21:22:17Z

@pytorchbot merge

pytorchmergebot · 2025-07-31T21:24:04Z

Merge started


…9134)

scaled_grouped_mm's kernel only supports column-major on the second operand. I -think- this is just for efficiency reasons. But inductor treats that buffer as flexible and may tweak the strides to be row-major instead, as seen in the issue.

~~Tagging the op as "needs_fixed_stride_order"/"needs_exact_strides" does not work. Inductor only considers those tags for ops that don't have registered lowering (not sure if this is intended). scaled_grouped_mm does have a lowering, so we never check its tags.~~ From discussion below, the op tags are expected to work.

FIXES #159097

Pull Request resolved: #159134
Approved by: https://github.com/eellison

This reverts commit 0df78f0.

ghstack-source-id: 408feda
Pull Request resolved: #159718

Revert "[inductor] respect layout tags for ops with registered lowerings (#159134)"

This reverts commit 669009b.


Useful helper function for stage 1 export -> manual partitioner -> stage 2 compile users

Pull Request resolved: #159705
Approved by: https://github.com/zou3519
ghstack dependencies: #159134

Update

1032feb

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 25, 2025

[inductor] restrict _scaled_grouped_mm strides to match user strides

8557d0c

ghstack-source-id: 9625c4c Pull Request resolved: #159134

pytorch-bot bot added ciflow/inductor module: inductor labels Jul 25, 2025

Update

0793566

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 25, 2025

[inductor] restrict _scaled_grouped_mm strides to match user strides

7285d90

ghstack-source-id: fd3295b Pull Request resolved: #159134

zou3519 reviewed Jul 25, 2025


eellison reviewed Jul 25, 2025


Update

70a24f2

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 25, 2025

[inductor] restrict scaled mm op strides to exactly match user strides

b8e4930

ghstack-source-id: c8def80 Pull Request resolved: #159134

xmfan changed the title ~~[inductor] restrict _scaled_grouped_mm strides to match user strides~~ [inductor] respect layout tags for ops with registered lowerings Jul 25, 2025

Update

4b04774

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 25, 2025

[inductor] restrict scaled mm op strides to exactly match user strides

1c847fd

ghstack-source-id: 07ccabf Pull Request resolved: #159134

Update

a7eea0e

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 25, 2025

[inductor] restrict scaled mm op strides to exactly match user strides

98262ba

ghstack-source-id: 487497a Pull Request resolved: #159134

xmfan added the release notes: inductor label Jul 25, 2025

xmfan marked this pull request as ready for review July 25, 2025 21:16

xmfan requested review from zou3519 and eellison July 25, 2025 21:16

Update

b3130fc

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 25, 2025

[inductor] restrict scaled mm op strides to exactly match user strides

a854d6a

ghstack-source-id: fedeca6 Pull Request resolved: #159134

eellison approved these changes Jul 29, 2025


zou3519 reviewed Jul 29, 2025


pytorch-bot bot added the ciflow/trunk label Jul 30, 2025

pytorchmergebot added the merging label Jul 30, 2025

pytorchmergebot removed the merging label Jul 30, 2025

Update

c29571b

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 30, 2025

[inductor] restrict scaled mm op strides to exactly match user strides

f39c3b3

ghstack-source-id: 154ee13 Pull Request resolved: #159134

Update

1a0201b

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 31, 2025

[inductor] restrict scaled mm op strides to exactly match user strides

b264192

ghstack-source-id: d9d2b57 Pull Request resolved: #159134

Update

50f10e8

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Jul 31, 2025

[inductor] restrict scaled mm op strides to exactly match user strides

094871d

ghstack-source-id: 3e58cd9 Pull Request resolved: #159134

pytorchmergebot added the merging label Jul 31, 2025

pytorchmergebot closed this in 669009b Jul 31, 2025

pytorchmergebot added Merged and removed merging labels Jul 31, 2025

danielvegamyhre mentioned this pull request Aug 7, 2025

[inductor] tune_scaled_grouped_mm fails with memory layout assertion, despite memory layout assertions prior to op call passing #156325

Closed
