[WIP][2/N] Port 5 _composable distributed test to Intel GPU #159241

Open · zxd1997066 wants to merge 1 commit into main from xiangdong/dist_upstream_p2

Conversation

zxd1997066 (Contributor) commented on Jul 28, 2025

For #114850, we are porting the distributed tests to Intel GPU. This is the second PR for the _composable cases; the first is #159118.
We enable Intel GPU support with the following methods, keeping the original code style as much as possible:

  • Use torch.accelerator.current_accelerator() to determine the accelerator backend
  • Enable XPU for some test paths
  • Skip test cases that Intel GPU does not support
  • Add "cpu:gloo,xpu:xccl" as the distributed backend (see the sketch below)
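A minimal sketch of the resulting setup (an editor's illustration, not the PR's diff; the CUDA fallback string and env:// rendezvous are assumptions):

```python
import torch
import torch.distributed as dist

# Pick whichever accelerator is present; fall back to CPU on CPU-only hosts.
acc = torch.accelerator.current_accelerator()
device_type = acc.type if acc else "cpu"

# Route CPU tensors through gloo and XPU tensors through XCCL; the CUDA
# branch is an assumed analogue shown for comparison.
backend = "cpu:gloo,xpu:xccl" if device_type == "xpu" else "cpu:gloo,cuda:nccl"
dist.init_process_group(backend=backend)  # assumes env:// rendezvous variables are set
```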

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @gujinghui @EikanWang @fengyuan14 @guangyey

pytorch-bot bot commented on Jul 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159241

Note: links to docs will display an error until the doc builds have completed.

✅ No Failures

As of commit 9d14e4b with merge base 2dccff7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the oncall: distributed label (adds this issue/PR to the distributed oncall triage queue) on Jul 28, 2025
zxd1997066 (Contributor, Author) commented:

@pytorchbot label "topic: not user facing"

@@ -58,6 +58,11 @@
  from torch.testing._internal.distributed.checkpoint_utils import with_temp_dir


+ device = torch.accelerator.current_accelerator()
+ device_module = torch.get_device_module(device)
A collaborator commented:

Where is device_module used?

zxd1997066 (Author) replied:

> Where is device_module used?

Will remove it.

@@ -58,6 +58,11 @@
  from torch.testing._internal.distributed.checkpoint_utils import with_temp_dir


+ device = torch.accelerator.current_accelerator()
A collaborator suggested:

Suggested change:
- device = torch.accelerator.current_accelerator()
+ device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
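This form falls back to a plain "cpu" string on hosts without an accelerator, where current_accelerator() returns None. A quick illustration (editor's example, values assumed):

```python
import torch

# "xpu" on an XPU host, "cuda" on a CUDA host; with no accelerator the
# walrus assignment leaves acc as None, so the expression yields "cpu".
device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
print(device_type)
```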

zxd1997066 force-pushed the xiangdong/dist_upstream_p2 branch from 6a5b8de to 4e028bf on July 30, 2025 at 07:44
zxd1997066 changed the title from "[WIP][2/N] Port 4 _composable distributed test to Intel GPU" to "[WIP][2/N] Port 5 _composable distributed test to Intel GPU" on Jul 30, 2025
zxd1997066 force-pushed the xiangdong/dist_upstream_p2 branch from 4e028bf to a055b23 on July 30, 2025 at 08:45
zxd1997066 marked this pull request as ready for review on July 30, 2025 at 09:10
zxd1997066 (Contributor, Author) commented:

@pytorchbot label "module: xpu"

pytorch-bot added the module: xpu label (Intel XPU related issues) on Jul 30, 2025
zxd1997066 (Contributor, Author) commented:

@pytorchbot label "triaged"

pytorch-bot added the triaged label (this issue has been looked at by a team member and triaged into an appropriate module) on Jul 30, 2025
guangyey added the ciflow/xpu label (Run XPU CI tasks) on Jul 30, 2025
@@ -98,6 +98,8 @@ def _test_compile(
  self.create_pg(device)
  torch._dynamo.config.optimize_ddp = "python_reducer"
  torch.manual_seed(123)
+ if device_type == "xpu":
guangyey (Collaborator) commented on Jul 30, 2025:

Without this code change, which test cases would fail due to the non-deterministic accuracy issue?

zxd1997066 (Author) replied:

> Without this code change, which test cases would fail due to the non-deterministic accuracy issue?

test_compile_backward_only, test_compile_bf16, test_compile_fp16, test_compile_gpu, and test_compile_gpu_ac, as listed in intel/torch-xpu-ops#1668.
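The body of the `if device_type == "xpu":` guard is truncated in the hunk above. Purely as a hypothetical illustration of the kind of change under discussion (not the actual diff):

```python
if device_type == "xpu":
    # Hypothetical body, not the PR's code: pin kernels to deterministic
    # variants so compiled-vs-eager comparisons stay within tolerance.
    torch.use_deterministic_algorithms(True)
```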

  for linear in model.linears:
      fully_shard(linear)
  fully_shard(model.linears)
- replicate(model, device_id=torch.cuda.current_device())
+ replicate(model, device_id=device_module.current_device())
A collaborator suggested:

Suggested change:
- replicate(model, device_id=device_module.current_device())
+ replicate(model, device_id=torch.accelerator.current_device_index())

@@ -273,14 +283,14 @@ def forward(self, x: torch.Tensor):
      return y

  self._init_pg()
- torch.cuda.set_device(self.rank)
+ device_module.set_device(self.rank)
A collaborator suggested:

Suggested change:
- device_module.set_device(self.rank)
+ torch.accelerator.set_device_index(self.rank)

  # DDP instance is attached in first pre forward
  model_cuda(torch.randn(2, 2))
  replicate_ddp_weakref = replicate.state(model_cuda)._ddp_weakref()
  self.assertEqual([0], replicate_ddp_weakref.device_ids)
  # Pass in int as device_id
- replicate(model_cuda2, device_id=int(torch.cuda.current_device()))
+ replicate(model_cuda2, device_id=int(device_module.current_device()))
A collaborator commented:

ditto

@@ -233,13 +242,13 @@ def test_replicate_device_id(self):
  # Should be None for CPU training
  self.assertEqual(None, replicate_ddp_weakref.device_ids)

- replicate(model_cuda, device_id=torch.device(torch.cuda.current_device()))
+ replicate(model_cuda, device_id=torch.device(device_module.current_device()))
A collaborator commented:

ditto

  def test_replicate_ignore_module(self):
      self._init_pg()
-     torch.cuda.set_device(self.rank)
+     device_module.set_device(self.rank)
A collaborator commented:

ditto

- torch.cuda.set_device(self.rank)
- model = MyNet().cuda()
- replicate(model, device_id=torch.cuda.current_device())
+ device_module.set_device(self.rank)
A collaborator commented:

ditto

- replicate(model, device_id=torch.cuda.current_device())
+ device_module.set_device(self.rank)
+ model = MyNet().to(device)
+ replicate(model, device_id=device_module.current_device())
A collaborator commented:

ditto

guangyey (Collaborator) left a review:

Please use the torch.accelerator API as much as possible.

zxd1997066 (Author) replied:

> Please use the torch.accelerator API as much as possible.

Thanks, will modify it.
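For reference, a minimal sketch of the torch.accelerator equivalents suggested throughout this review (an editor's illustration, assuming a PyTorch version, 2.6+, where this API is available):

```python
import torch

# Device-generic replacements for the torch.cuda-specific calls above.
acc = torch.accelerator.current_accelerator()
device_type = acc.type if acc else "cpu"

if acc is not None:
    # torch.cuda.set_device(rank)  ->  torch.accelerator.set_device_index(rank)
    torch.accelerator.set_device_index(0)
    # torch.cuda.current_device()  ->  torch.accelerator.current_device_index()
    idx = torch.accelerator.current_device_index()
    print(f"running on {device_type}:{idx}")
```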

zxd1997066 force-pushed the xiangdong/dist_upstream_p2 branch from a055b23 to 9d14e4b on July 31, 2025 at 07:02
pytorch-bot removed the ciflow/xpu label (Run XPU CI tasks) on Jul 31, 2025
Labels
module: xpu · oncall: distributed · open source · topic: not user facing · triaged

3 participants