[inductor][cpu] Fix double-offset issue in GEMM_TEMPLATE #159233
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159233
Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 2 Unrelated Failures as of commit 674a976 with merge base f636736.

NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
@CaoE could you help take a look at this fix?
hi @Phoslight, thanks for the fix. I'd like to understand more about why the offset causes an out-of-bounds access. AFAIK, the offset should come from the original view node.
Thank you for your reply, Leslie. Without this patch, the cpp template and the example inputs both add an offset, which causes the double-offset issue. Hope the chart above helps clarify the issue.
Gentle ping. Any other reviews? Thanks in advance.
not familiar with this code, but approving workflows to run
# GEMM_TEMPLATE emits code like:
#   W.data_ptr[offset + ...]
# but the data_ptr already includes the offset.
This makes it sound like the correct fix is to remove the offset from the index calculation in the emitted code rather than copying the tensor. I assume that would break something else?
Fixes #158076

Basically, the gemm template generates code of the form `W.data_ptr[offset + ...]` (see the review thread above). However, when the input tensor `W` has a storage offset, this results in a double-offset issue: the resulting pointer ends up `2 * 200704000LL` away from `W.storage().data_ptr()`, which causes an out-of-bounds access.
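To make the failure mode concrete, here is a small, hedged illustration (my own, not from the PR) of why adding the offset on top of the data pointer double-counts it:

```python
import torch

base = torch.randn(8, 4)
w = base[4:]  # a view starting 16 elements (4 rows x 4 cols) into the storage
assert w.storage_offset() == 16

# data_ptr() already points past the storage offset...
assert w.data_ptr() == base.data_ptr() + w.storage_offset() * w.element_size()

# ...so emitted code of the form `W.data_ptr[offset + i]` applies the
# offset twice, reading up to 2 * offset past the storage base.
```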
The storage offset of `W` is introduced by this patch, but I think it's a reasonable fix, so `cpp_gemm_template.py` should handle input matrices with storage offsets properly. I think a good way to fix this issue is to create a new matrix that has no storage offset, as sketched below.
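A minimal sketch of that idea, assuming a plain clone suffices (illustrative only; the actual fix lives in `cpp_gemm_template.py`):

```python
import torch

base = torch.randn(8, 4)
w = base[4:]                    # a view with storage_offset() == 16
assert w.storage_offset() == 16

# Cloning materializes the view into fresh storage, so the result has
# storage offset 0 and can be indexed safely from its data_ptr().
w_clean = w.clone(memory_format=torch.contiguous_format)
assert w_clean.storage_offset() == 0
assert torch.equal(w_clean, w)
```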
When `should_block_weights` is true, `block_weight()` creates a clean new matrix, so that branch is not affected by this issue.

BTW, I've also examined the FX IRs generated by `torch.compile()`, as well as the generated Python module, and they are correct.
The newly added test in `test_cpu_select_algorithm.py` can reproduce the issue. With this patch, the crash is fixed; it also resolves the crash reported in #158076. A hedged sketch of the input pattern appears below.
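For reference, a sketch of the kind of input such a test would exercise (names and shapes are mine, not the actual test, and hitting the GEMM template additionally requires the CPU max-autotune path):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        big = torch.randn(2, 64, 32)
        self.w = big[1]  # weight is a view with storage_offset() == 64 * 32

    def forward(self, x):
        return x @ self.w.t()

m = M().eval()
x = torch.randn(16, 32)
with torch.no_grad():
    compiled = torch.compile(m, mode="max-autotune")
    torch.testing.assert_close(compiled(x), m(x))
```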
I ran the CPU tests in `test_cpu_select_algorithm.py`, but many of them are skipped due to MKL and AMX requirements. I'd appreciate it if someone could help verify the test.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben