[Inductor][CPP] Remove redundant Buffers after Grouped GEMM Fusion #143904

leslie-fang-intel · 2024-12-27T10:33:03Z

Stack from ghstack (oldest at bottom):

Summary
This PR removes the extra kernel arguments and the extra buffer allocations that remain when a MultiOutput Buffer is consumed by an out-template epilogue. In that case, the Grouped GEMM Template should skip storing the result in the MultiOutput Buffer and instead write it directly to the corresponding out-template epilogue.

Remove extra kernel arguments
In the case described above, a MultiOutput Buffer that is consumed by an out-template epilogue should not appear in the kernel's arguments. We mark such a MultiOutput Buffer as REMOVED.
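As a rough illustration of the idea (not the actual Inductor code), the sketch below models kernel output arguments as a name-to-argument mapping and uses a sentinel to drop a consumed MultiOutput Buffer from the signature. `ToyKernelArgs`, `mark_removed`, `signature`, the `out_ptrN` naming, and the buffer names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical sentinel standing in for the REMOVED marker described above.
REMOVED = object()


@dataclass
class ToyKernelArgs:
    # Maps a buffer name to its kernel argument name, or to REMOVED.
    output_buffers: dict = field(default_factory=dict)

    def output(self, name):
        # Register the buffer as a kernel output on first use.
        if name not in self.output_buffers:
            self.output_buffers[name] = f"out_ptr{len(self.output_buffers)}"
        return self.output_buffers[name]

    def mark_removed(self, name):
        # Keep the entry, but flag it so it is skipped in the signature.
        self.output_buffers[name] = REMOVED

    def signature(self):
        # Only buffers that are not REMOVED become kernel arguments.
        return [arg for arg in self.output_buffers.values() if arg is not REMOVED]


args = ToyKernelArgs()
multi_outputs = ["buf1", "buf2"]   # MultiOutput Buffers of the grouped GEMM
epilogue_of = {"buf1": "buf3"}     # buf1 is consumed by an epilogue writing buf3

for name in multi_outputs:
    args.output(name)
for consumed, epilogue_out in epilogue_of.items():
    args.output(epilogue_out)      # the epilogue output stays a kernel argument
    args.mark_removed(consumed)    # the consumed MultiOutput Buffer is dropped

print(args.signature())
```

Here `buf1` no longer contributes an argument, while `buf2` (still stored) and `buf3` (the epilogue output) do.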

Remove the extra buffer allocations
In the case described above, the MultiOutput Buffer should not be allocated either. We introduce an outputs_removed attribute on CppTemplateBuffer that tracks the MultiOutput Buffers consumed directly by out-template epilogues. During code generation, if a MultiOutput Buffer is listed in outputs_removed, its allocation line is omitted, which avoids unnecessary memory usage.
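The allocation-skipping step can be sketched in the same toy style. This is only an assumption-laden model of the behavior described above: `ToyTemplateBuffer`, `allocation_lines`, and the `allocate()` placeholder in the emitted text are hypothetical, and only the attribute name `outputs_removed` comes from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemplateBuffer:
    name: str
    outputs: list                                     # names of MultiOutput Buffers
    outputs_removed: set = field(default_factory=set)  # consumed by epilogues


def allocation_lines(template):
    """Emit one allocation line per output, skipping removed outputs."""
    lines = []
    for out in template.outputs:
        if out in template.outputs_removed:
            continue  # no allocation for a buffer that is never written
        lines.append(f"{out} = allocate()")  # placeholder allocation text
    return lines


template = ToyTemplateBuffer("buf0", ["buf1", "buf2"], outputs_removed={"buf1"})
lines = allocation_lines(template)
print(lines)
```

Only `buf2` gets an allocation line, since `buf1` is listed in `outputs_removed`.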

Test Plan

python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_grouped_linear_epilogue

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

[ghstack-poisoned]

pytorch-bot · 2024-12-27T10:33:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143904

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 71d497a with merge base 744a303:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

ghstack-source-id: f3c3ee0 Pull Request resolved: #143904

[ghstack-poisoned]

ghstack-source-id: 311e548 Pull Request resolved: #143904

[ghstack-poisoned]

ghstack-source-id: 317bc2a Pull Request resolved: #143904

ghstack-source-id: f3c3ee0 Pull Request resolved: pytorch#143904

Update

c1e2482

[ghstack-poisoned]

leslie-fang-intel mentioned this pull request Dec 27, 2024

[Inductor][CPP] Enable Grouped GEMM Template #143796

Closed

leslie-fang-intel mentioned this pull request Dec 27, 2024

[Inductor][CPP] Enable Epilogue Fusion for Grouped GEMM Template #143897

Closed

pytorch-bot bot added the ciflow/inductor and module: inductor labels Dec 27, 2024

leslie-fang-intel marked this pull request as draft December 27, 2024 10:33

leslie-fang-intel added the topic: not user facing and ciflow/trunk labels Dec 27, 2024

Update

2da3b9e

[ghstack-poisoned]

leslie-fang-intel added a commit that referenced this pull request Dec 27, 2024

[Inductor][CPP] Remove redundant Buffers after Group GEMM Fusion

de4aad2

ghstack-source-id: f3c3ee0 Pull Request resolved: #143904

pytorchbot added the open source label Dec 27, 2024

Update

0b9d97e

[ghstack-poisoned]

leslie-fang-intel added a commit that referenced this pull request Dec 30, 2024

[Inductor][CPP] Remove redundant Buffers after Group GEMM Fusion

82c8f8c

ghstack-source-id: 311e548 Pull Request resolved: #143904

Update

277257b

[ghstack-poisoned]

Update

40d9b20

[ghstack-poisoned]

Update

ecbbbdd

[ghstack-poisoned]

Update

612aae4

[ghstack-poisoned]

leslie-fang-intel marked this pull request as ready for review December 31, 2024 05:57

leslie-fang-intel requested review from jgong5 and chunyuan-w December 31, 2024 06:12

leslie-fang-intel changed the title ~~[Inductor][CPP] Remove redundant Buffers after Group GEMM Fusion~~ [Inductor][CPP] Remove redundant Buffers after Grouped GEMM Fusion Dec 31, 2024

Update

71d497a

[ghstack-poisoned]

leslie-fang-intel added a commit that referenced this pull request Jan 2, 2025

[Inductor][CPP] Remove redundant Buffers after Group GEMM Fusion

abf9aa8

ghstack-source-id: 317bc2a Pull Request resolved: #143904

leslie-fang-intel closed this Jan 3, 2025

github-actions bot deleted the gh/leslie-fang-intel/176/head branch February 3, 2025 02:03

sunjiweiswift pushed a commit to sunjiweiswift/pytorch that referenced this pull request Jul 4, 2025

[Inductor][CPP] Remove redundant Buffers after Group GEMM Fusion

e97bc71

ghstack-source-id: f3c3ee0 Pull Request resolved: pytorch#143904

sunjiweiswift pushed a commit to sunjiweiswift/pytorch that referenced this pull request Aug 1, 2025

[Inductor][CPP] Remove redundant Buffers after Group GEMM Fusion

59c42a9

ghstack-source-id: f3c3ee0 Pull Request resolved: pytorch#143904
