
Unfuse bias add before pointwise ops #106912


Closed
wants to merge 4 commits

Conversation

eellison (Contributor) commented Aug 9, 2023

Stack from ghstack (oldest at bottom):

I get a 2% inference speedup in HF with this PR. I checked to see if there are any models where unfusing was slower than the cublas gelu fusion, and I did not see any, which was surprising to me. Sorry for the cublas-activation API churn 😬

Kicking off another run in cublas 12; it's possible that the results have changed since.
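For context, a minimal sketch of the transformation being discussed (illustrative only; the function names below are made up, and the actual change lives in Inductor's pattern matching, not user code):

```python
import torch
import torch.nn.functional as F

def fused_in_gemm(x, weight, bias):
    # Before: the bias is applied inside the GEMM (e.g. a cuBLAS addmm
    # epilogue), so the following pointwise op runs as a separate kernel.
    return F.gelu(torch.addmm(bias, x, weight.t()))

def unfused(x, weight, bias):
    # After: a plain mm; the bias add is now a pointwise op that the
    # compiler can fuse with the gelu into one elementwise kernel.
    return F.gelu(torch.mm(x, weight.t()) + bias)

x, w, b = torch.randn(4, 8), torch.randn(16, 8), torch.randn(16)
torch.testing.assert_close(fused_in_gemm(x, w, b), unfused(x, w, b))
```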

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov

pytorch-bot bot commented Aug 9, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106912

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit 3f14bcd with merge base ed07821:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vadimkantorov (Contributor)

for some ops, cublasLt can do mm+add+relu by itself, right?
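A quick sketch of what that fused epilogue computes. The private `_addmm_activation` hook below is my assumption of the relevant aten entry point; treat its exact name and signature as an assumption, since this API has churned across releases, as the PR description alludes to:

```python
import torch

x, w, b = torch.randn(8, 16), torch.randn(32, 16), torch.randn(32)

# Reference semantics of a fused mm + bias + relu in one call:
ref = torch.relu(torch.addmm(b, x, w.t()))

# Assumed aten-level hook for the cublasLt RELU_BIAS-style epilogue; on
# CUDA this can dispatch to a single cublasLt matmul with a fused epilogue.
out = torch._addmm_activation(b, x, w.t(), use_gelu=False)
torch.testing.assert_close(out, ref)
```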

eellison added a commit that referenced this pull request Aug 9, 2023

eellison added a commit that referenced this pull request Aug 11, 2023
@eellison eellison added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 16, 2023
eellison (Contributor, Author)

@pytorchbot merge -f "unrelated"

pytorch-bot bot commented Aug 16, 2023

You need to provide a reason for using force merge, in the format @pytorchbot merge -f 'Explanation'.
The explanation needs to be clear on why this is needed. Here are some good examples:

  • Bypass checks due to unrelated upstream failures from ...
  • This is a minor fix to ..., which shouldn't break anything
  • This is pre-tested in a previous CI run
  • Bypass flaky ... check

@eellison eellison added the topic: not user facing topic category label Aug 16, 2023
eellison (Contributor, Author)

@pytorchbot merge -f "unrelated failure"

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

summerdo pushed a commit to summerdo/pytorch that referenced this pull request Aug 17, 2023

Pull Request resolved: pytorch#106912
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#106911
pytorchmergebot pushed a commit that referenced this pull request Aug 17, 2023
This can lead to a large speedup when max autotune is set, e.g. resnet 2.1x -> 2.5x, particularly in combination with freezing.

Pull Request resolved: #107004
Approved by: https://github.com/jansel, https://github.com/shunting314, https://github.com/int3
ghstack dependencies: #106911, #106912
ZainRizvi (Contributor)

@pytorchbot merge -f "unrelated failure"

Hi @eellison, FYI this merge didn't need the -f flag. If the Dr. CI comment up top (#106912 (comment)) shows you the green checkmark, it means it has detected that the failures are not your fault, and pytorchbot will let the regular merge command succeed.

huydhn added a commit to pytorch/test-infra that referenced this pull request Aug 17, 2023
This is a proposal to remove no-op force merges masquerading as
impatient force merges. IMO, a legit force merge needs to satisfy one of
the two conditions below:

1. `skip_mandatory_checks` is true (-f) and `failed_checks_count > 0`
(with failures) or `pending_checks_count > 0` (impatience). Under this
condition, if a force merge (-f) is done when there is no failure and
all jobs have finished, it's arguably just a regular merge in disguise.
This is the main point of this proposal. Here is an example of a no-op
force merge (out of many):
pytorch/pytorch#106912 (comment)
2. `ignore_current` is true (-i), `is_failed` is false (indicating a
successful merge), and `ignored_checks_count > 0` (with failures). As
`-i` still waits for all remaining jobs to finish, this shouldn't be
counted toward force merges due to impatience.

If neither applies, the merge should be counted as a regular merge
regardless of the use of `-f` or `-i`. We could track that (regular
merges masquerading as force merges) to understand how devs use (or
abuse) these flags, but that should be tracked separately IMO because
it's arguably a different case altogether.

Technically, this uses the `pending_checks` field from the `merges`
collection, which we haven't made use of to date, to remove a significant
portion of no-op force merges masquerading as impatient force merges.
IMO, impatient force merges are not the negative case that force merges
with failures are, so the way these queries are written at the moment is
a bit wrong (a force merge is either due to failures or due to
impatience; there is no other kind). This PR fixes them according to the
definition above.
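Transcribed as code, the proposed classification might look like this (a hedged sketch; the field names follow the description above and are assumptions about the actual `merges` schema):

```python
def is_force_merge(m: dict) -> bool:
    # Condition 1: -f was used and there was actually something to bypass,
    # either failing checks or still-pending ones (impatience).
    if m["skip_mandatory_checks"] and (
        m["failed_checks_count"] > 0 or m["pending_checks_count"] > 0
    ):
        return True
    # Condition 2: -i was used, the merge succeeded, and it really did
    # ignore failing checks.
    if m["ignore_current"] and not m["is_failed"] and m["ignored_checks_count"] > 0:
        return True
    # Otherwise it's a regular merge, even if -f/-i was passed.
    return False
```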

### Testing


https://torchci-git-fork-huydhn-force-merge-no-pending-fbopensource.vercel.app/metrics.
As expected, % force merges with failures remains unchanged but we get a
more accurate view of % impatient force merges. The latter now requires
`pending_checks_count > 0`.

Same goes for
https://torchci-git-fork-huydhn-force-merge-no-pending-fbopensource.vercel.app/kpis.
@facebook-github-bot facebook-github-bot deleted the gh/eellison/513/head branch August 20, 2023 14:16
ezyang (Contributor) commented Aug 20, 2023

@eellison can you link your perf runs on the PRs? Thanks

ezyang (Contributor) commented Aug 20, 2023

It looks like sam also does worse with this change.

eellison (Contributor, Author) commented Aug 21, 2023

@ezyang on master I benchmarked with and without this commit and could not repro a sam regression, but I could repro a gnn_edgecnn regression.

Also, yes, it is expected that this doesn't universally cause speedups. For more guaranteed wins that overcome limitations of local heuristics, use MAX_AUTOTUNE.
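To make that concrete, enabling max autotune is just a compile mode (this is the standard torch.compile API; the toy model below is mine):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU())

# "max-autotune" benchmarks candidate kernels/fusions per op instead of
# relying purely on local heuristics like the unfuse-bias-add pattern here.
compiled = torch.compile(model, mode="max-autotune")
out = compiled(torch.randn(8, 512))
```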

eellison added a commit to eellison/pytorch that referenced this pull request Mar 8, 2024