
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel #156140


Open: eqy wants to merge 3 commits into main from depthwisecudnncanuse64

Conversation

eqy (Collaborator) commented on Jun 17, 2025:

The native kernel doesn't support batch splitting, so the previous check wasn't aggressive enough about dispatching to cuDNN.

#155225

cc @csarofeen @ptrblck @xwang233 @msaroufim @jerryzh168 @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
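
For readers outside the ATen internals, a rough Python sketch of the dispatch change being described; all names here are illustrative (the real check lives in ATen's C++ convolution dispatch), not PyTorch's actual implementation:

```python
# Hypothetical sketch only: names and structure are illustrative,
# not the ATen API.
INT32_MAX = 2**31 - 1

def needs_64bit_indexing(t) -> bool:
    # 32-bit index arithmetic can only address up to 2**31 - 1 elements.
    return t.numel() > INT32_MAX

def should_use_cudnn_depthwise(inp, weight, out) -> bool:
    # Before this PR (roughly): the check assumed the native kernel could
    # split the batch to keep per-split indexing under 32 bits, so it only
    # escalated to cuDNN in a narrower set of cases.
    # After: the native depthwise kernel cannot split the batch at all, so
    # any operand that needs 64-bit indexing must be routed to cuDNN.
    return any(needs_64bit_indexing(t) for t in (inp, weight, out))
```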

@eqy added labels: module: cudnn, module: cuda, module: convolution, open source, ciflow/trunk, topic: bug fixes (Jun 17, 2025)
@pytorch-bot added the module: cpu label (Jun 17, 2025)
pytorch-bot commented on Jun 17, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156140

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 03c052e with merge base ba37f58:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

  • pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy marked this pull request as ready for review (June 17, 2025 00:18)
@eqy added the release notes: cuda label (Jun 17, 2025)
@eqy changed the title from "[cuDNN][64-bit indexing] update 64bit depthwise indexing dispatch condition to match kernel" to "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel" (Jun 17, 2025)
@colesbury requested a review from ngimel (June 17, 2025 12:28)
@colesbury added the triaged label (Jun 17, 2025)
ngimel (Collaborator) commented on Jun 17, 2025:

Out of curiosity, how did torch.compile work in the original bug? Shouldn't it hit the same error?

eqy (Collaborator, Author) commented on Jun 17, 2025:

Would it show up if the problematic convolution got folded into a Triton kernel fusion?

eqy (Collaborator, Author) commented on Jun 18, 2025:

@pytorchmergebot merge

pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

ngimel (Collaborator) commented on Jun 18, 2025:

Without max-autotune, Triton won't generate convolution kernels. Even with max-autotune, the generated kernels are usually pretty slow, so it's unlikely one would (1) get picked during autotuning and (2) produce correct results with large tensors.
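
(For illustration, a minimal sketch of the two compile modes; the shapes are made up, and a CUDA build with Triton is assumed:)

```python
import torch
import torch.nn.functional as F

def depthwise(x, w):
    # groups == in_channels makes this a depthwise convolution
    return F.conv2d(x, w, groups=x.shape[1])

x = torch.randn(1, 8, 32, 32, device="cuda")
w = torch.randn(8, 1, 3, 3, device="cuda")

# Default mode: inductor leaves the convolution as an ATen/cuDNN call, so
# the compiled path exercises the same dispatch condition this PR changes.
conv_default = torch.compile(depthwise)

# Only with max-autotune does inductor also benchmark Triton convolution
# templates; a slow Triton candidate loses the autotune to the ATen kernel.
conv_tuned = torch.compile(depthwise, mode="max-autotune")

out = conv_default(x, w)
```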

atalman (Contributor) commented on Jun 19, 2025:

@pytorchmergebot revert -c ghfirst -m "breaks internal builds"

pytorchmergebot commented:

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request on Jun 19, 2025:
Revert "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)"

This reverts commit a5f59cc.

Reverted #156140 on behalf of @atalman due to breaking internal builds.
pytorchmergebot commented:

@eqy your PR has been successfully reverted.

pytorchmergebot added the ci-no-td label (Jun 19, 2025)
atalman (Contributor) commented on Jun 19, 2025:

Here is the exception:

Signal 6 (SIGABRT) 
exception stack complete
terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [PG ID 1 PG GUID 1 Rank 0] Process group watchdog thread terminated with exception: std::future_error: Broken promise
Exception raised from run at fbcode/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2042 (most recent call first):

# 2  c10d::ProcessGroupNCCL::Watchdog::run()
# 3  execute_native_thread_routine
# 4  start_thread
# 5  __clone3

eqy (Collaborator, Author) commented on Jul 3, 2025:

@pytorchmergebot rebase

pytorchmergebot commented:

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot commented:

Successfully rebased depthwisecudnncanuse64 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout depthwisecudnncanuse64 && git pull --rebase)

@pytorchmergebot force-pushed the depthwisecudnncanuse64 branch from bbc5dc1 to 58267d4 (July 3, 2025 04:02)
facebook-github-bot (Contributor) commented:

@atalman has imported this pull request. If you are a Meta employee, you can view this in D78355783.
