[inductor] skip bmm when converting channel last #159459


Open · wants to merge 1 commit into main from jianyi/fix_channel_last

Conversation

jianyizh
Contributor

@jianyizh jianyizh commented Jul 30, 2025


pytorch-bot bot commented Jul 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159459

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 217eaff with merge base 1e8e9f7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Jul 30, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@jianyizh jianyizh force-pushed the jianyi/fix_channel_last branch from 130e41a to 217eaff Compare July 30, 2025 08:16
@jianyizh
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jul 30, 2025
@albanD albanD requested a review from eellison July 30, 2025 13:36
@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 30, 2025
Contributor

@eellison eellison left a comment


Looks good, but can you compare perf with requiring a dense [-1] or [-2] dimension for bmm?

@@ -800,12 +800,17 @@ def find_nodes_prefer_channels_last(self) -> OrderedSet[Node]:
With rule 2, we make sure all the tensors in the chain use the channels-last layout, so both copies
can be saved.
"""
last_conv = None
nodes_cannot_propagate = [torch.ops.aten.bmm.default]
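For context on why `aten.bmm.default` is excluded from channels-last propagation, the stride arithmetic makes the problem visible. The sketch below (plain Python, with hypothetical helper names; the stride formulas are the standard row-major and NHWC layouts) shows that a channels-last tensor's last dimension is not dense, which is what bmm's kernels want:

```python
# Illustrative sketch (hypothetical helpers): strides of a contiguous NCHW
# tensor vs. the same logical shape stored channels-last (NHWC memory order).
def contiguous_strides(shape):
    # Row-major layout: the innermost dimension has stride 1.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def channels_last_strides(shape):
    # NCHW shape stored in NHWC memory order: C becomes the fastest dim.
    n, c, h, w = shape
    return [h * w * c, 1, w * c, c]

shape = (2, 16, 8, 8)  # N, C, H, W
print(contiguous_strides(shape))     # [1024, 64, 8, 1]  -> dim[-1] is dense
print(channels_last_strides(shape))  # [1024, 1, 128, 16] -> dim[-1] stride is C
```

Since the flattened view of a channels-last tensor has an inner stride of C rather than 1, feeding it to bmm forces an extra transpose/copy, which is why the propagation chain stops at this op.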
Contributor


How does the perf compare if we do a copy inside of aten.bmm.default when the dense dimension is dim 0?

Contributor Author


The current behavior in aten bmm is to add a clone to do the transpose, for both CUDA and XPU. I don't know where I could add something like require_stride_order to make bmm's dim[-1] dense, but that seems to add another transpose.
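The kind of check being debated can be sketched as follows (a minimal illustration with a hypothetical helper, not Inductor's actual API): bmm kernels prefer inputs where one of the last two dimensions is dense, i.e. has stride 1.

```python
def has_dense_inner_dim(strides):
    # bmm kernels prefer a batch of matrices where either the row or the
    # column dimension is contiguous (stride 1); otherwise a clone (which
    # materializes the transpose) is inserted before the matmul.
    return strides[-1] == 1 or strides[-2] == 1

# A contiguous (B, M, K) batch: strides (M*K, K, 1) -> dense dim[-1].
print(has_dense_inner_dim((64, 8, 1)))       # True
# A batch flattened from a channels-last tensor: inner stride is C, not 1.
print(has_dense_inner_dim((1024, 128, 16)))  # False -> needs a clone/transpose
```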

Labels
module: inductor open source topic: not user facing topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
4 participants