
Make Nd tensors hit fused addmm pass #106911


Closed
eellison wants to merge 4 commits

Conversation

eellison
Contributor

@eellison eellison commented Aug 9, 2023

@pytorch-bot

pytorch-bot bot commented Aug 9, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106911

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit 4adfdee with merge base ed07821 (image):

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

```python
mod_c = torch.compile(mod)
out, code = run_and_get_code(mod_c, v, other)
self.assertEqual(out, mod(v, other), rtol=1e-2, atol=1e-2)
# TODO - assert that the fusion appears in the generated code
```
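One way the TODO could be filled in is to scan the generated sources for the fused kernel call. This is only a hedged sketch: the helper name, the `"addmm"` substring, and the stand-in `fake_code` list are illustrative assumptions, not the PyTorch test suite's actual utilities.

```python
# Hypothetical sketch: check that generated inductor code contains a fused
# addmm call. The helper and the matched substring are illustrative
# assumptions, not actual PyTorch test utilities.
def contains_fused_addmm(generated_sources):
    """Return True if any generated source string calls an addmm kernel."""
    return any("addmm" in src for src in generated_sources)

# Stand-in for the `code` list returned by run_and_get_code:
fake_code = ["buf0 = extern_kernels.addmm(bias, inp, weight_t)"]
assert contains_fused_addmm(fake_code)
```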
Contributor Author

cc @jgong5, would you mind taking a look at this ?

@eellison eellison requested review from albanD and ezyang August 9, 2023 22:30
Replace #106433 since I had a bad cla commit.

Speeds up eager convnext bfloat16 inference by 35%, and eager timm bfloat16 inference on average by `.5%`

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305

[ghstack-poisoned]
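The core idea of the change can be sketched in plain Python shape arithmetic (the names here are illustrative; the real implementation operates on `SymInt`s inside ATen): an N-d input to linear is flattened to 2-d so it hits the fused addmm path, and the leading dimensions are restored on the result. The leading dims are multiplied out explicitly rather than passing `-1` to reshape, which is ambiguous when a dimension is 0.

```python
from math import prod

def linear_output_shape(input_sizes, out_features):
    """Sketch of the N-d -> 2-d flattening used to route linear through addmm.

    Leading dims are multiplied out explicitly (rather than using -1 in
    reshape, which errors when a dimension is 0), the 2-d matmul runs as
    addmm, and the leading dims are restored on the result.
    """
    *lead, in_features = input_sizes
    flattened_dim = prod(lead)             # explicit product, safe for 0-size dims
    two_d = (flattened_dim, in_features)   # shape fed to addmm
    return two_d, (*lead, out_features)    # final output restores leading dims

two_d, out = linear_output_shape((8, 4, 16), out_features=32)
# two_d == (32, 16), out == (8, 4, 32)
```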
```diff
@@ -152,7 +152,6 @@ def __init__(self):
        )
        self.bn1 = torch.nn.BatchNorm2d(num_features=16)
        self.relu1 = torch.nn.ReLU()
        self.fc1 = torch.nn.Linear(in_features=1638400, out_features=1)
```
Contributor
tell me more?

```cpp
#include <c10/util/MaybeOwned.h>
#include <ATen/TensorSubclassLikeUtils.h>
#include <iostream>
```
Contributor
dead now, right?

```cpp
return result.view_symint({input_sizes[0], input_sizes[1], result.sym_size(1)});
// can't use -1 in reshape because it errors when a dimension is 0
c10::SymInt flattened_dim = 1;
for (size_t i = 0, ndim = input_sizes.size(); i < ndim - 1; ++i) {
```
Contributor
`ndim - 1` here is hazardous because you are using the unsigned `size_t` type; you will overflow if `ndim == 0`. Just use `int64_t`; the unsigned type here really is not worth it.
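The hazard can be illustrated by simulating C's unsigned wraparound in Python (Python integers do not overflow, so the modulo is the simulation; the 64-bit width of `size_t` is an assumption):

```python
SIZE_T_BITS = 64  # assumption: 64-bit size_t

def size_t_sub(a, b):
    """Simulate C unsigned subtraction: size_t arithmetic wraps mod 2**64."""
    return (a - b) % (1 << SIZE_T_BITS)

ndim = 0
# In C, `for (size_t i = 0; i < ndim - 1; ++i)` with ndim == 0 compares i
# against SIZE_MAX, so the loop runs ~2**64 times instead of zero times.
assert size_t_sub(ndim, 1) == 2**64 - 1

# With a signed int64_t bound, ndim - 1 == -1 and the loop body never runs.
assert ndim - 1 == -1
```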

```cpp
auto inp_reshape = input.reshape_symint({flattened_dim, input_sizes.at(input_sizes.size() - 1)});
const auto result = at::addmm(bias, inp_reshape, weight.t());
auto new_size = input_sizes.slice(0, input_sizes.size() - 1);
std::vector<SymInt> sizes_vec(new_size.begin(), new_size.end());
```
Contributor
Consider using `SymDimVector` to avoid the heap allocation.

Contributor

@ezyang ezyang left a comment

okey dokey

@albanD albanD removed their request for review August 11, 2023 20:12
@eellison eellison added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 16, 2023
@eellison
Contributor Author

@pytorchbot merge -f "unrelated failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort, and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot pushed a commit that referenced this pull request Aug 16, 2023
I get a 2% inference speedup in HF with this PR. I checked to see if there were any models where unfusing was slower than the cuBLAS gelu fusion, and I did not see any, which was surprising to me. Sorry for the cublas-activation API churn 😬

Kicking off another run on cuBLAS 12; it's possible that the results have changed since.

Pull Request resolved: #106912
Approved by: https://github.com/jansel
ghstack dependencies: #106911
summerdo pushed a commit to summerdo/pytorch that referenced this pull request Aug 17, 2023
Pull Request resolved: pytorch#106911
Approved by: https://github.com/ezyang
summerdo pushed a commit to summerdo/pytorch that referenced this pull request Aug 17, 2023

Pull Request resolved: pytorch#106912
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#106911
pytorchmergebot pushed a commit that referenced this pull request Aug 17, 2023
This can lead to a large speedup when max autotune is set, e.g. resnet 2.1x -> 2.5x, particularly in combination with freezing.

Pull Request resolved: #107004
Approved by: https://github.com/jansel, https://github.com/shunting314, https://github.com/int3
ghstack dependencies: #106911, #106912
@facebook-github-bot facebook-github-bot deleted the gh/eellison/512/head branch August 20, 2023 14:16
eellison added a commit to eellison/pytorch that referenced this pull request Mar 8, 2024
ghstack-source-id: 603bdd0
Pull Request resolved: pytorch#106911