[inductor] consolidate common GEMM triton param retrieval #159383
Conversation
# Why
- Make loop iteration simpler
- Have a common spot where to make modifications that affect all the GEMM Triton templates, avoiding missed spots

# What
- Pull out common logic of taking the BaseConfig objects and turning them into kwargs to feed into maybe_append_choice for Triton GEMM templates

[ghstack-poisoned]
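The consolidation described above can be illustrated with a small self-contained sketch. All names here (`BaseConfig`, `gemm_template_kwargs`, `append_choices`) are hypothetical stand-ins, not inductor's actual API; the point is only the shape of the refactor: one function owns the config-to-kwargs conversion, and one common loop feeds every template.

```python
from dataclasses import dataclass

# Hypothetical stand-in for inductor's GEMM config objects; the fields
# mirror common Triton GEMM tuning knobs but are illustrative only.
@dataclass(frozen=True)
class BaseConfig:
    block_m: int
    block_n: int
    block_k: int
    num_stages: int
    num_warps: int

def gemm_template_kwargs(cfg: BaseConfig) -> dict:
    """The single consolidated spot: BaseConfig -> template kwargs.

    A parameter that should affect all Triton GEMM templates is added
    here once, instead of in every template's own retrieval loop.
    """
    return {
        "BLOCK_M": cfg.block_m,
        "BLOCK_N": cfg.block_n,
        "BLOCK_K": cfg.block_k,
        "num_stages": cfg.num_stages,
        "num_warps": cfg.num_warps,
    }

def append_choices(template_fn, configs):
    """Common loop: one maybe_append_choice-style call per config."""
    return [template_fn(**gemm_template_kwargs(c)) for c in configs]

# Toy "template" that just records the kwargs it was called with.
choices = append_choices(lambda **kw: kw,
                         [BaseConfig(64, 64, 32, 2, 4),
                          BaseConfig(128, 128, 64, 3, 8)])
```

With this structure, a change that must reach every GEMM template (say, a new kwarg) touches only `gemm_template_kwargs`, which is the "common spot" the PR description refers to.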
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159383

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.

⏳ 8 Pending, 3 Unrelated Failures as of commit 676e185 with merge base 5d89634.

BROKEN TRUNK: the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Why
- Make loop iteration simpler
- Have a common spot where to make modifications that affect all the GEMM Triton templates, avoiding missed spots

# What
- Pull out common logic of taking the BaseConfig objects and turning them into kwargs to feed into maybe_append_choice for Triton GEMM templates

Differential Revision: [D79186962](https://our.internmc.facebook.com/intern/diff/D79186962)
Pull Request resolved: #159383
Approved by: https://github.com/jansel
https://hud.pytorch.org/pytorch/pytorch/commit/e7cc42df58a86bee05944f6e80c535aa1d099443

This seems to have caused breakages on ROCm. Presumably this change is now making ROCm only check 2 configs instead of 4 for the mm_plus_mm autotune. @huydhn @amdfaa We may need to issue a revert here to recover the ROCm CI. Testing a revert here: #159642
Revert "[inductor] consolidate common GEMM triton param retrieval (pytorch#159383)"

This reverts commit e7cc42d.
https://hud.pytorch.org/pr/159642

@pytorchbot revert -c nosignal -m "sorry but rocm CI is broken due to this PR"

@pytorchbot successfully started a revert job. Check the current status here.
Revert "[inductor] consolidate common GEMM triton param retrieval (#159383)"

This reverts commit e7cc42d. Reverted #159383 on behalf of https://github.com/jataylo due to sorry but rocm CI is broken due to this PR ([comment](#159383 (comment)))

@coconutruben your PR has been successfully reverted.
Thanks for reverting, and sorry to break the ROCm CI. I'll take a look.

Thanks @coconutruben. Let us know if any help is needed at all from our side to take a look.
# Why
- Make loop iteration simpler
- Have a common spot where to make modifications that affect all the GEMM Triton templates, avoiding missed spots

# What
- Pull out common logic of taking the BaseConfig objects and turning them into kwargs to feed into maybe_append_choice for Triton GEMM templates

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D79186962](https://our.internmc.facebook.com/intern/diff/D79186962)

[ghstack-poisoned]
Thank you @jataylo! I think I overlooked two things in the refactor that should be fixed now.

What would be helpful

I'll leave the logic as-is in this PR once the tests are green, as this should be a straight refactor, and we can follow up on (2) later.
@coconutruben I can take a look at this on Monday. The theory makes sense to me: if we filter out configs as duplicates because we use a static num_stages, then it would make sense that the config selection is reduced.
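The duplicate-filtering theory above can be shown with made-up numbers (these are not the actual ROCm configs): if num_stages is pinned to one static value before deduplication, configs that differ only in num_stages collapse into a single candidate, which would turn a 4-config list into 2, matching the observed drop.

```python
# Toy config list: (BLOCK_M, BLOCK_N, BLOCK_K, num_stages). Values are
# illustrative, not the real mm_plus_mm tuning space.
configs = [
    (64, 64, 32, 2),
    (64, 64, 32, 4),
    (128, 64, 32, 2),
    (128, 64, 32, 4),
]

def dedup(cfgs, static_num_stages=None):
    """Deduplicate configs, optionally pinning num_stages first.

    Pinning num_stages (as a ROCm backend might) makes configs that
    differ only in that field collide, shrinking the candidate set.
    """
    seen, out = set(), []
    for bm, bn, bk, ns in cfgs:
        if static_num_stages is not None:
            ns = static_num_stages
        key = (bm, bn, bk, ns)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

print(len(dedup(configs)))                       # 4 distinct configs
print(len(dedup(configs, static_num_stages=2)))  # collapses to 2
```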
@jataylo, do you think the results here are fine to just merge in now? I saw all the ROCm tests timed out; I assume there aren't enough runners available?

Hi @coconutruben, it looks like the MI300 runners are facing some issues at the moment, but the regular ROCm CI still ran. I've just added the keep-going label and rerun the failing shard; it failed a helion test and exited, so I'm trying with keep-going to make sure we don't run into any other failures, and then it should be okay to merge. To get the MI300 CI running you may need to rebase the PR, but the failure happened on both MI300 and MI200, so we should be okay to go forward if the above action shows no suspicious failures.

Thanks @jataylo. I'm seeing the same helion failure again, though this seems to be classified as a trunk failure. I'll merge with -f now to bypass the MI300 CI issues, and let's keep an eye on it in case we see anything else suspicious.
@pytorchbot merge -f "ignoring mi300 pending as per comment above" |
Merge started: your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

This pull request was exported from Phabricator. Differential Revision: D79186962
Summary:
This reverts the part of #159383 for scaled_mm where now, like before, we pass through the normal input_nodes (not the triton_input_nodes) to select_algorithm.

- #159383 refactored how kwargs are retrieved
- it introduced the notion of KernelInputs that wrap input_nodes
- scaled_mm uses unsqueezed input nodes for Triton to retrieve params
- the issue: it uses a squeezed (regular) bias for select_algorithm instead

This fixes that by passing the original input nodes rather than the triton input nodes.

Test Plan:
```
buck test '@fbcode//mode/opt' fbcode//caffe2/test/inductor:fp8 -- --exact 'caffe2/test/inductor:fp8 - test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False (caffe2.test.inductor.test_fp8.TestFP8Lowering)'
buck test '@fbcode//mode/opt' fbcode//caffe2/test/inductor:fp8 -- --exact 'caffe2/test/inductor:fp8 - test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_True (caffe2.test.inductor.test_fp8.TestFP8Lowering)'
```
This set of tests was failing, and is passing now.

Side note: these tests were failing, I believe, because the unsqueezed bias made the ATEN choice no longer eligible, and there is some minor numerical discrepancy between ATEN and Triton for this. I'm not sure the test should be written like that, as we're implicitly relying on ATEN being the choice here.

ghstack-source-id: b7b8caa
Differential Revision: [D79717654](https://our.internmc.facebook.com/intern/diff/D79717654)
Pull Request resolved: #159948
Approved by: https://github.com/izaitsevfb, https://github.com/eellison
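The fix above can be sketched at the shape level. The names (`unsqueeze_bias`, `inputs_for_select_algorithm`) and the shapes are illustrative, not inductor's actual API: the Triton param-retrieval path works with a bias unsqueezed to (1, N), while select_algorithm should see the original 1-D bias that the ATen choice expects, so the fix routes the untouched original nodes to select_algorithm.

```python
# Inputs modeled purely as shape tuples for illustration.

def unsqueeze_bias(shape):
    # Triton-side view of the bias: add a leading broadcast dimension,
    # e.g. (1024,) -> (1, 1024).
    return (1,) + shape

original_inputs = {"a": (1024, 512), "b": (512, 1024), "bias": (1024,)}

# KernelInputs-style wrapper: the Triton param-retrieval path sees the
# unsqueezed bias, while the other operands are unchanged.
triton_inputs = {**original_inputs,
                 "bias": unsqueeze_bias(original_inputs["bias"])}

def inputs_for_select_algorithm(original, triton, fixed=True):
    # Before the fix, the triton-shaped nodes leaked into
    # select_algorithm; the fix passes the original nodes instead.
    return original if fixed else triton
```

With `fixed=True`, select_algorithm sees the 1-D bias again, which is what makes the ATen choice eligible in the fp8 tests mentioned in the test plan.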
Stack from ghstack (oldest at bottom):
# Why
- Make loop iteration simpler
- Have a common spot where to make modifications that affect all the GEMM Triton templates, avoiding missed spots

# What
- Pull out common logic of taking the BaseConfig objects and turning them into kwargs to feed into maybe_append_choice for Triton GEMM templates
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
Differential Revision: D79186962