# Unfuse bias add before pointwise ops #106912
## Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/106912. Note: links to docs will display an error until the docs builds have completed. ✅ 1 unrelated failure as of commit 3f14bcd with merge base ed07821 (UNSTABLE: the job failed but was likely due to flakiness present on trunk and has been marked as unstable).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
For some ops, cuBLASLt can do mm+add+relu by itself, right?
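(For context, if memory serves, PyTorch routes that fused cuBLASLt path through the private op `torch._addmm_activation`. A minimal sketch comparing it with the unfused formulation this PR prefers; illustrative only, since the op is private and its surface may differ across versions:)

```python
import torch

x = torch.randn(16, 64, device="cuda")
w = torch.randn(64, 32, device="cuda")
b = torch.randn(32, device="cuda")

# Fused path: cuBLASLt performs mm + bias + relu in a single call
# (relu is the default epilogue for this private op).
fused = torch._addmm_activation(b, x, w)

# Unfused path: same math, but the bias add and relu are now pointwise
# ops that Inductor can fold into a single Triton epilogue kernel.
unfused = torch.relu(torch.mm(x, w) + b)

torch.testing.assert_close(fused, unfused)
```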
@pytorchbot merge -f "unrelated" |
You need to provide a reason for using force merge, in the format `@pytorchbot merge -f 'Explanation'`.
@pytorchbot merge -f "unrelated failure" |
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#106912 Approved by: https://github.com/jansel ghstack dependencies: pytorch#106911
This can lead to a large speedup when max autotune is set, e.g. resnet 2.1x -> 2.5x, particularly in combination with freezing. Pull Request resolved: #107004 Approved by: https://github.com/jansel, https://github.com/shunting314, https://github.com/int3 ghstack dependencies: #106911, #106912
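For reference, a minimal usage sketch of that combination; enabling freezing via `torch._inductor.config.freezing` is my assumption of the usual spelling, so treat this as illustrative:

```python
import torch

torch._inductor.config.freezing = True  # assumption: enables inference freezing

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU()).cuda().eval()

# max-autotune benchmarks candidate kernels (including Triton matmul
# templates with fused epilogues) instead of relying on heuristics alone.
compiled = torch.compile(model, mode="max-autotune")

with torch.no_grad():
    out = compiled(torch.randn(8, 64, device="cuda"))
```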
Hi @eellison, fyi this merge didn't need the `-f` flag.
This is a proposal to remove no-op force merges masquerading as impatient force merges. IMO, a legit force merge needs to satisfy one of the two conditions below:

1. `skip_mandatory_checks` is true (`-f`) and `failed_checks_count > 0` (with failures) or `pending_checks_count > 0` (impatience). If a force merge (`-f`) is done when there is no failure and all jobs have finished, it's arguably just a regular merge in disguise. This is the main point of this proposal. Here is an example of a no-op force merge: pytorch/pytorch#106912 (comment) (out of many).
2. `ignore_current` is true (`-i`), `is_failed` is false (indicating a successful merge), and `ignored_checks_count > 0` (with failures). As `-i` still waits for all remaining jobs to finish, this shouldn't be counted toward force merges due to impatience.

If neither applies, the merge should be counted as a regular merge regardless of the use of `-f` or `-i`. We could track regular merges masquerading as force merges to understand how devs use (or abuse) these flags, but that should be tracked separately IMO because it's arguably a different case altogether.

Technically, this uses the `pending_checks` field from the `merges` collection, which we haven't made use of to date, to remove a significant portion of no-op force merges masquerading as impatient force merges. IMO, impatient force merges are not simply the complement of force merges with failures, so the way these queries are written at the moment is a bit wrong (a force merge is either due to failures or due to impatience; there is no third option). This PR fixes them according to the definition above; see the classification sketch after this comment.

### Testing

https://torchci-git-fork-huydhn-force-merge-no-pending-fbopensource.vercel.app/metrics. As expected, % force merges with failures remains unchanged, but we get a more accurate view of % impatient force merges. The latter now requires `pending_checks_count > 0`. Same goes for https://torchci-git-fork-huydhn-force-merge-no-pending-fbopensource.vercel.app/kpis.
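A minimal sketch of that classification, using hypothetical record fields named after the proposal:

```python
def classify_merge(m: dict) -> str:
    """Classify a merge record per the proposal above (hypothetical schema)."""
    if m["skip_mandatory_checks"]:  # merged with -f
        if m["failed_checks_count"] > 0:
            return "force merge (failures)"
        if m["pending_checks_count"] > 0:
            return "force merge (impatience)"
        return "regular merge"  # no-op force merge in disguise
    if m["ignore_current"] and not m["is_failed"] and m["ignored_checks_count"] > 0:
        # -i waits for all remaining jobs, so this counts as failures, not impatience
        return "force merge (failures)"
    return "regular merge"
```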
@eellison can you link your perf runs on the PRs? Thanks
Do you expect this to always be a win? It seems to fairly unambiguously regress basic_gnn_edgecnn.
It looks like sam also does worse with this change.
@ezyang on master I benchmarked with and without this commit and could not repro a sam regression, but I could repro a gnn_edgecnn regression. Also, yes, it is expected that this doesn't universally cause speedups. For more guaranteed wins that overcome limitations of local heuristics, use max-autotune.
Stack from ghstack (oldest at bottom):
I get a 2% inference speedup in HF with this PR. I checked to see if there are any models where unfusing was slower than the cublas gelu fusion, and I did not see any, which was surprising to me. Sorry for the cublas-activation api churn 😬
Kicking off another run on cublas 12; it's possible that the results have changed since.
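A rough sketch of the rewrite (illustrative only, not the actual Inductor pattern code): the bias is pulled out of the matmul so the add joins the pointwise epilogue instead of cuBLASLt's.

```python
import torch

def before(bias, x, w):
    # bias add happens inside the cuBLASLt matmul epilogue
    return torch.nn.functional.gelu(torch.addmm(bias, x, w))

def after(bias, x, w):
    # plain mm; the bias add and gelu become pointwise ops that
    # Inductor can fuse into a single Triton kernel
    return torch.nn.functional.gelu(torch.mm(x, w) + bias)
```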
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov