
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel #156140


Open: eqy wants to merge 3 commits into main from depthwisecudnncanuse64

Conversation

eqy (Collaborator) commented on Jun 17, 2025:

The native kernel doesn't support batch splitting, so the previous check wasn't aggressive enough about dispatching to cuDNN.

#155225

cc @csarofeen @ptrblck @xwang233 @msaroufim @jerryzh168 @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
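
For readers outside the ATen internals, a rough Python sketch of the dispatch change being described; all names here are illustrative (the real check lives in ATen's C++ convolution dispatch), not PyTorch's actual implementation:

```python
# Hypothetical sketch only: names and structure are illustrative,
# not the ATen API.
INT32_MAX = 2**31 - 1

def needs_64bit_indexing(t) -> bool:
    # 32-bit index arithmetic can only address up to 2**31 - 1 elements.
    return t.numel() > INT32_MAX

def should_use_cudnn_depthwise(inp, weight, out) -> bool:
    # Before this PR (roughly): the check assumed the native kernel could
    # split the batch to keep per-split indexing under 32 bits, so it only
    # escalated to cuDNN in a narrower set of cases.
    # After: the native depthwise kernel cannot split the batch at all, so
    # any operand that needs 64-bit indexing must be routed to cuDNN.
    return any(needs_64bit_indexing(t) for t in (inp, weight, out))
```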

@eqy added labels: module: cudnn, module: cuda, module: convolution, open source, ciflow/trunk, topic: bug fixes (Jun 17, 2025)
@pytorch-bot added the module: cpu label (Jun 17, 2025)
pytorch-bot commented on Jun 17, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156140

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 03c052e with merge base ba37f58:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

  • pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy marked this pull request as ready for review (June 17, 2025 00:18)
@eqy added the release notes: cuda label (Jun 17, 2025)
@eqy changed the title from "[cuDNN][64-bit indexing] update 64bit depthwise indexing dispatch condition to match kernel" to "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel" (Jun 17, 2025)
@colesbury requested a review from ngimel (June 17, 2025 12:28)
@colesbury added the triaged label (Jun 17, 2025)
ngimel (Collaborator) commented on Jun 17, 2025:

Out of curiosity, how did torch.compile work in the original bug? Shouldn't it hit the same error?

eqy (Collaborator, Author) commented on Jun 17, 2025:

Would it show up if the problematic convolution got folded into a Triton kernel fusion?

eqy (Collaborator, Author) commented on Jun 18, 2025:

@pytorchmergebot merge

pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

ngimel (Collaborator) commented on Jun 18, 2025:

Without max-autotune, Triton won't generate convolution kernels. Even with max-autotune, the generated kernels are usually pretty slow, so it's unlikely one would (1) get picked during autotuning and (2) produce correct results with large tensors.
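
(For illustration, a minimal sketch of the two compile modes; the shapes are made up, and a CUDA build with Triton is assumed:)

```python
import torch
import torch.nn.functional as F

def depthwise(x, w):
    # groups == in_channels makes this a depthwise convolution
    return F.conv2d(x, w, groups=x.shape[1])

x = torch.randn(1, 8, 32, 32, device="cuda")
w = torch.randn(8, 1, 3, 3, device="cuda")

# Default mode: inductor leaves the convolution as an ATen/cuDNN call, so
# the compiled path exercises the same dispatch condition this PR changes.
conv_default = torch.compile(depthwise)

# Only with max-autotune does inductor also benchmark Triton convolution
# templates; a slow Triton candidate loses the autotune to the ATen kernel.
conv_tuned = torch.compile(depthwise, mode="max-autotune")

out = conv_default(x, w)
```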

atalman (Contributor) commented on Jun 19, 2025:

@pytorchmergebot revert -c ghfirst -m "breaks internal builds"

pytorchmergebot commented:

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request on Jun 19, 2025:
Revert "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)"

This reverts commit a5f59cc.

Reverted #156140 on behalf of @atalman due to breaking internal builds.
pytorchmergebot commented:

@eqy your PR has been successfully reverted.

pytorchmergebot added the ci-no-td label (Jun 19, 2025)
atalman (Contributor) commented on Jun 19, 2025:

Here is the exception:

Signal 6 (SIGABRT) 
exception stack complete
terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [PG ID 1 PG GUID 1 Rank 0] Process group watchdog thread terminated with exception: std::future_error: Broken promise
Exception raised from run at fbcode/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2042 (most recent call first):

# 2  c10d::ProcessGroupNCCL::Watchdog::run()
# 3  execute_native_thread_routine
# 4  start_thread
# 5  __clone3

eqy (Collaborator, Author) commented on Jul 3, 2025:

@pytorchmergebot rebase

pytorchmergebot commented:

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot commented:

Successfully rebased depthwisecudnncanuse64 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout depthwisecudnncanuse64 && git pull --rebase)

@pytorchmergebot force-pushed the depthwisecudnncanuse64 branch from bbc5dc1 to 58267d4 (July 3, 2025 04:02)
facebook-github-bot (Contributor) commented:

@atalman has imported this pull request. If you are a Meta employee, you can view this in D78355783.
