# Unfuse bias add before pointwise ops #106912
## Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/106912. Note: links to docs will display an error until the docs builds have completed. ✅ 1 unrelated failure as of commit 3f14bcd with merge base ed07821 (UNSTABLE: the job failed but was likely due to flakiness present on trunk and has been marked as unstable).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
For some ops, cuBLASLt can do mm+add+relu by itself, right?
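(For context, if memory serves, PyTorch routes that fused cuBLASLt path through the private op `torch._addmm_activation`. A minimal sketch comparing it with the unfused formulation this PR prefers; illustrative only, since the op is private and its surface may differ across versions:)

```python
import torch

x = torch.randn(16, 64, device="cuda")
w = torch.randn(64, 32, device="cuda")
b = torch.randn(32, device="cuda")

# Fused path: cuBLASLt performs mm + bias + relu in a single call
# (relu is the default epilogue for this private op).
fused = torch._addmm_activation(b, x, w)

# Unfused path: same math, but the bias add and relu are now pointwise
# ops that Inductor can fold into a single Triton epilogue kernel.
unfused = torch.relu(torch.mm(x, w) + b)

torch.testing.assert_close(fused, unfused)
```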
@pytorchbot merge -f "unrelated" |
You need to provide a reason for using force merge, in the format `@pytorchbot merge -f 'Explanation'`.
@pytorchbot merge -f "unrelated failure" |
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#106912 Approved by: https://github.com/jansel ghstack dependencies: pytorch#106911
This can lead to a large speedup when max autotune is set, e.g. resnet 2.1x -> 2.5x, particularly in combination with freezing. Pull Request resolved: #107004 Approved by: https://github.com/jansel, https://github.com/shunting314, https://github.com/int3 ghstack dependencies: #106911, #106912
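For reference, a minimal usage sketch of that combination; enabling freezing via `torch._inductor.config.freezing` is my assumption of the usual spelling, so treat this as illustrative:

```python
import torch

torch._inductor.config.freezing = True  # assumption: enables inference freezing

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU()).cuda().eval()

# max-autotune benchmarks candidate kernels (including Triton matmul
# templates with fused epilogues) instead of relying on heuristics alone.
compiled = torch.compile(model, mode="max-autotune")

with torch.no_grad():
    out = compiled(torch.randn(8, 64, device="cuda"))
```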
Hi @eellison, fyi this merge didn't need the `-f` flag.
This is a proposal to remove no-op force merges masquerading as impatient force merges. IMO, a legit force merge needs to satisfy one of the two conditions below:

1. `skip_mandatory_checks` is true (`-f`) and `failed_checks_count > 0` (with failures) or `pending_checks_count > 0` (impatience). If a force merge (`-f`) is done when there is no failure and all jobs have finished, it's arguably just a regular merge in disguise. This is the main point of this proposal. Here is an example of a no-op force merge: pytorch/pytorch#106912 (comment) (out of many).
2. `ignore_current` is true (`-i`), `is_failed` is false (indicating a successful merge), and `ignored_checks_count > 0` (with failures). As `-i` still waits for all remaining jobs to finish, this shouldn't be counted toward force merges due to impatience.

If neither applies, the merge should be counted as a regular merge regardless of the use of `-f` or `-i`. We could track regular merges masquerading as force merges to understand how devs use (or abuse) these flags, but that should be tracked separately IMO because it's arguably a different case altogether.

Technically, this uses the `pending_checks` field from the `merges` collection, which we haven't made use of to date, to remove a significant portion of no-op force merges masquerading as impatient force merges. IMO, impatient force merges are not simply the complement of force merges with failures, so the way these queries are written at the moment is a bit wrong (a force merge is either due to failures or due to impatience; there is no third option). This PR fixes them according to the definition above; see the classification sketch after this comment.

### Testing

https://torchci-git-fork-huydhn-force-merge-no-pending-fbopensource.vercel.app/metrics. As expected, % force merges with failures remains unchanged, but we get a more accurate view of % impatient force merges. The latter now requires `pending_checks_count > 0`. Same goes for https://torchci-git-fork-huydhn-force-merge-no-pending-fbopensource.vercel.app/kpis.
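A minimal sketch of that classification, using hypothetical record fields named after the proposal:

```python
def classify_merge(m: dict) -> str:
    """Classify a merge record per the proposal above (hypothetical schema)."""
    if m["skip_mandatory_checks"]:  # merged with -f
        if m["failed_checks_count"] > 0:
            return "force merge (failures)"
        if m["pending_checks_count"] > 0:
            return "force merge (impatience)"
        return "regular merge"  # no-op force merge in disguise
    if m["ignore_current"] and not m["is_failed"] and m["ignored_checks_count"] > 0:
        # -i waits for all remaining jobs, so this counts as failures, not impatience
        return "force merge (failures)"
    return "regular merge"
```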
@eellison can you link your perf runs on the PRs? Thanks
Do you expect this to always be a win? It seems to fairly unambiguously regress basic_gnn_edgecnn.
It looks like sam also does worse with this change.
@ezyang on master I benchmarked with and without this commit and could not repro a sam regression, but I could repro a gnn_edgecnn regression. Also, yes, it is expected that this doesn't universally cause speedups. For more guaranteed wins that overcome limitations of local heuristics, use max-autotune.
Stack from ghstack (oldest at bottom):
I get a 2% inference speedup in HF with this PR. I checked to see if there are any models where unfusing was slower than the cublas gelu fusion, and I did not see any, which was surprising to me. Sorry for the cublas-activation api churn 😬
Kicking off another run on cublas 12; it's possible that the results have changed since.
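A rough sketch of the rewrite (illustrative only, not the actual Inductor pattern code): the bias is pulled out of the matmul so the add joins the pointwise epilogue instead of cuBLASLt's.

```python
import torch

def before(bias, x, w):
    # bias add happens inside the cuBLASLt matmul epilogue
    return torch.nn.functional.gelu(torch.addmm(bias, x, w))

def after(bias, x, w):
    # plain mm; the bias add and gelu become pointwise ops that
    # Inductor can fuse into a single Triton kernel
    return torch.nn.functional.gelu(torch.mm(x, w) + bias)
```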
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov