[DTensor] Fix aten.all strategy with min instead of sum as the reduce_op #155420

ooooo-create · 2025-06-08T09:53:21Z

For aten.all, the partial results should be combined with min in the all_reduce rather than sum. With min, if any rank returns torch.Tensor([False]), the reduced result is torch.Tensor([False]); the result is torch.Tensor([True]) only when every rank returns torch.Tensor([True]). This matches the behavior of aten.all.
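To illustrate the reasoning, here is a minimal sketch in plain Python. It does not call torch.distributed or DTensor: each inner list stands in for one rank's local shard, and the helper names (`local_all`, `reduce_min`, `reduce_sum`) are hypothetical. It also assumes that a sum-reduced value is read back as a boolean by treating any nonzero value as True.

```python
from typing import List


def local_all(shard: List[bool]) -> int:
    """Per-rank partial result of `all`, encoded as 0 or 1."""
    return int(all(shard))


def reduce_min(partials: List[int]) -> bool:
    """Simulated all_reduce with min: 1 only if every rank reported 1."""
    return bool(min(partials))


def reduce_sum(partials: List[int]) -> bool:
    """Simulated all_reduce with sum, read back as a boolean (assumption:
    nonzero means True). One rank reporting 1 is enough to give True."""
    return bool(sum(partials))


# Two simulated ranks; the second rank holds a False element.
shards = [[True, True], [True, False]]
partials = [local_all(s) for s in shards]

# Reference answer: `all` over the full, unsharded data.
expected = all(x for shard in shards for x in shard)

min_result = reduce_min(partials)
sum_result = reduce_sum(partials)

print(partials)    # [1, 0]
print(expected)    # False
print(min_result)  # False, agrees with the reference
print(sum_result)  # True, behaves like `any` rather than `all`
```

Under these assumptions, min acts as a logical AND across ranks, while sum acts as a logical OR, which is why it disagrees with aten.all whenever only some ranks see a False.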

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta

pytorch-bot · 2025-06-08T09:53:25Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155420


✅ You can merge normally! (7 Unrelated Failures)

As of commit e456fea with merge base 706bc41:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

inductor / unit-test / cuda12.8-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #126867 but the issue was closed recently and a rebase is needed to make it pass)
inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_mm_plus_mm
inductor / unit-test / cuda12.8-py3.10-gcc9-sm86 / test (inductor_distributed, 1, 1, linux.g5.12xlarge.nvidia.gpu) (gh) (disabled by #147726 but the issue was closed recently and a rebase is needed to make it pass)
distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced
inductor / unit-test / cuda12.8-py3.10-gcc9-sm86 / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #147726 but the issue was closed recently and a rebase is needed to make it pass)
distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced
inductor / unit-test / cuda12.8-py3.12-gcc9-sm86 / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #147726 but the issue was closed recently and a rebase is needed to make it pass)
distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced
inductor / unit-test / cuda12.8-py3.13-gcc9-sm86 / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #147726 but the issue was closed recently and a rebase is needed to make it pass)
distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced
inductor-rocm / rocm-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2) (gh) (disabled by #147726 but the issue was closed recently and a rebase is needed to make it pass)
distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-jammy-py3-clang12-executorch / build (gh) (trunk failure)
Final attempt failed. Child_process exited with error code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ooooo-create · 2025-06-08T10:04:20Z

@pytorchbot label "topic: not user facing"

github-actions · 2025-08-09T14:37:36Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

fix all strategy with min instead of sum as reduce_op

e456fea

pytorch-bot bot added the oncall: distributed label Jun 8, 2025

pytorchbot added the open source label Jun 8, 2025

pytorch-bot bot added the topic: not user facing topic category label Jun 8, 2025

albanD requested a review from wconstab June 10, 2025 13:41

albanD added the triaged label Jun 10, 2025

github-actions bot added the Stale label Aug 9, 2025

pytorch-bot bot added the ciflow/inductor label Aug 9, 2025
