[WIP] Port several test files under test/distributed to Intel GPU #159473

daisyden · 2025-07-30T13:03:09Z

For #114850, we will port the distributed tests to Intel GPU. This PR covers some of the test files under test/distributed. We enable Intel GPU with the following methods while keeping the original code style as much as possible:

Use instantiate_device_type_tests() to instantiate the tests per device type.
Use torch.accelerator.current_accelerator() to determine the accelerator backend.
Use requires_accelerator_dist_backend to allow both nccl and xccl tests.
Enable XPU for some test paths.
Change the hardcoded world_size according to device_count.
Unify some common code under torch/testing/_internal for multiple backends. For example, add xpu to Backend.backend_capability and dist.Backend.register_backend().
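
As a rough illustration of the device-agnostic pattern listed above, here is a minimal sketch. The helper names (detect_device_type, pick_backend, pick_world_size) and the cap of 4 are hypothetical and are not taken from this PR; only torch.accelerator.is_available() and torch.accelerator.current_accelerator() are existing PyTorch calls. Later commits in this PR revert the world_size updates, so the world-size helper is an illustration of the idea only.

```python
# Illustrative sketch only; helper names below are hypothetical, not from this PR.
try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None

# Assumed mapping from device type to collective backend for the tests.
_BACKEND_FOR_DEVICE = {"cuda": "nccl", "xpu": "xccl", "cpu": "gloo"}


def detect_device_type() -> str:
    """Return the active accelerator type (e.g. "cuda" or "xpu"), else "cpu"."""
    accel = getattr(torch, "accelerator", None) if torch is not None else None
    if accel is not None and accel.is_available():
        device = accel.current_accelerator()
        if device is not None:
            return device.type
    return "cpu"


def pick_backend(device_type: str) -> str:
    """Choose a backend name for a device type, falling back to gloo."""
    return _BACKEND_FOR_DEVICE.get(device_type, "gloo")


def pick_world_size(device_count: int, cap: int = 4) -> int:
    """Use the visible device count, clamped to [1, cap], as the world size."""
    return max(1, min(device_count, cap))


if __name__ == "__main__":
    device_type = detect_device_type()
    print(device_type, pick_backend(device_type), pick_world_size(2))
```

A test written this way asks the runtime for the device type instead of hardcoding "cuda", so the same test body could run against nccl on CUDA or xccl on XPU.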

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @gujinghui @EikanWang @fengyuan14 @guangyey

pytorch-bot · 2025-07-30T13:03:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159473

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit daf3833 with merge base 512b473:

NEW FAILURE - The following job has failed:

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 4, 6, linux.idc.xpu) (gh)
inductor/test_snode_runtime.py::TestCommAnalysis::test_reduce_scatter_tensor_coalesced

This comment was automatically generated by Dr. CI and updates every 15 minutes.

daisyden · 2025-08-01T06:44:28Z

@pytorchbot label "ciflow/xpu"

pytorch-bot · 2025-08-01T06:44:30Z

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'ciflow/xpu' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.

daisyden · 2025-08-01T07:03:00Z

@pytorchbot label "module: xpu"
@pytorchbot label "triaged"

daisyden · 2025-08-07T03:02:44Z

@pytorchbot rebase

pytorch-bot · 2025-08-07T03:02:52Z

Didn't find following labels among repository labels: rebase

port several test files under test/distributed to Intel GPU

06f62e3

pytorch-bot bot added the oncall: distributed (Add this issue/PR to distributed oncall triage queue) and release notes: distributed (fsdp) (release notes category) labels Jul 30, 2025

pytorchbot added the open source label Jul 30, 2025

resolve conflict

40c2575

guangyey added the ciflow/xpu label (Run XPU CI tasks) Jul 30, 2025

fix issues exposed in cuda and cpu backend test

1e51121

pytorch-bot bot removed the ciflow/xpu label (Run XPU CI tasks) Jul 31, 2025

etaf added the ciflow/xpu label (Run XPU CI tasks) Jul 31, 2025

fix some issues detected in CI

40df11b

pytorch-bot bot added the ciflow/inductor label Aug 1, 2025

daisyden added 2 commits August 1, 2025 06:58

Merge remote-tracking branch 'origin/daisyden/upstream_rebase' into daisyden/distributed_s2

7e190cd

fix conflict

740b271

pytorch-bot bot added the module: xpu label (Intel XPU related issues) Aug 1, 2025

fix lint issue

94c57bd

guangyey added this to PyTorch Intel Aug 1, 2025

daisyden added 5 commits August 5, 2025 06:48

only add xpu backend when xpu is available in register_backend when device arg is None. Revised the _initialized checking in test_store.py and test_c10d_common.py

bc56799

Do not port unsupported test

66b7b9f

Merge remote-tracking branch 'origin/daisyden/upstream_rebase' into daisyden/distributed_s2

686b49e

recover test_2d_mesh_eager_init_subgroup

fe42807

revert world size of DeviceMeshTestNDim to avoid test_device_mesh_nd issue when world size is 4

41faad2

daisyden added the keep-going label (Don't stop on first failure, keep running tests until the end) Aug 7, 2025

daisyden added 2 commits August 7, 2025 03:15

Merge branch 'daisyden/upstream_rebase' into daisyden/distributed_s2

38162cd

Merge branch 'daisyden/distributed_s2' of https://github.com/daisyden/pytorch into daisyden/distributed_s2

42202b2

daisyden added 4 commits August 7, 2025 05:41

revert the updates for world_size

82daea7

Merge branch 'daisyden/distributed_s2' of https://github.com/daisyden/pytorch into daisyden/distributed_s2

e8eebdf

revert updates for world_size, continued

f3caf7f

fix lint issue

daf3833
