[WIP][2/N] Port 5 _composable distributed tests to Intel GPU #159241
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159241
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9d14e4b with merge base 2dccff7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing" |
@@ -58,6 +58,11 @@
from torch.testing._internal.distributed.checkpoint_utils import with_temp_dir

device = torch.accelerator.current_accelerator()
device_module = torch.get_device_module(device)
Where is device_module used?

Will remove it.
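For context, a small sketch of what the two lines under discussion provide (illustrative only; whether device_module is actually needed depends on how many backend-specific calls the test makes):

```python
import torch

# torch.accelerator.current_accelerator() returns the available accelerator
# device (e.g. device(type='cuda') or device(type='xpu')), or None on CPU-only builds.
device = torch.accelerator.current_accelerator()

if device is not None:
    # torch.get_device_module(device) returns the matching backend module,
    # e.g. torch.cuda or torch.xpu, so backend-specific helpers can be
    # called without hard-coding the backend name.
    device_module = torch.get_device_module(device)
    print(device, device_module.device_count())
```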
@@ -58,6 +58,11 @@
from torch.testing._internal.distributed.checkpoint_utils import with_temp_dir

device = torch.accelerator.current_accelerator()
Suggested change:
- device = torch.accelerator.current_accelerator()
+ device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
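A minimal sketch of how the suggested expression behaves, assuming the goal is a device_type string that falls back to "cpu" when no accelerator is present (the tensor below is only an illustration, not part of the PR):

```python
import torch

# The walrus expression avoids calling .type on None when no accelerator
# (CUDA, XPU, ...) is available.
device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"

# Illustrative use: allocate on whichever backend was detected.
x = torch.randn(4, 4, device=device_type)
print(device_type, x.device)
```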
Force-pushed from 6a5b8de to 4e028bf
Force-pushed from 4e028bf to a055b23
@pytorchbot label "module: xpu" |
@pytorchbot label "triaged" |
@@ -98,6 +98,8 @@ def _test_compile(
self.create_pg(device)
torch._dynamo.config.optimize_ddp = "python_reducer"
torch.manual_seed(123)
if device_type == "xpu":
Without this code change, which test cases would fail due to the non-deterministic accuracy issue?

test_compile_backward_only, test_compile_bf16, test_compile_fp16, test_compile_gpu, and test_compile_gpu_ac, as listed in intel/torch-xpu-ops#1668.
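The body of the `if device_type == "xpu":` branch is truncated in the excerpt above. Purely as an illustration of how such a guard is commonly used (an assumption, not necessarily this PR's actual change), it could opt the XPU path into deterministic kernels:

```python
import torch

device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"

# Assumption: enabling deterministic algorithms only on XPU to avoid the
# run-to-run accuracy drift seen in the compiled-DDP tests listed above.
if device_type == "xpu":
    torch.use_deterministic_algorithms(True)
```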
for linear in model.linears:
    fully_shard(linear)
fully_shard(model.linears)
- replicate(model, device_id=torch.cuda.current_device())
+ replicate(model, device_id=device_module.current_device())
Suggested change:
- replicate(model, device_id=device_module.current_device())
+ replicate(model, device_id=torch.accelerator.current_device_index())
@@ -273,14 +283,14 @@ def forward(self, x: torch.Tensor):
    return y

self._init_pg()
- torch.cuda.set_device(self.rank)
+ device_module.set_device(self.rank)
Suggested change:
- device_module.set_device(self.rank)
+ torch.accelerator.set_device_index(self.rank)
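A short sketch of the suggested replacement in a per-rank setup; `rank` stands in for the test harness's `self.rank`:

```python
import torch

rank = 0  # stand-in for self.rank in the test harness

# Device-agnostic equivalent of torch.cuda.set_device(rank) /
# torch.xpu.set_device(rank): bind this process to one accelerator device.
if torch.accelerator.is_available():
    torch.accelerator.set_device_index(rank)
```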
# DDP instance is attached in first pre forward
model_cuda(torch.randn(2, 2))
replicate_ddp_weakref = replicate.state(model_cuda)._ddp_weakref()
self.assertEqual([0], replicate_ddp_weakref.device_ids)
# Pass in int as device_id
- replicate(model_cuda2, device_id=int(torch.cuda.current_device()))
+ replicate(model_cuda2, device_id=int(device_module.current_device()))
ditto
@@ -233,13 +242,13 @@ def test_replicate_device_id(self):
# Should be None for CPU training
self.assertEqual(None, replicate_ddp_weakref.device_ids)

- replicate(model_cuda, device_id=torch.device(torch.cuda.current_device()))
+ replicate(model_cuda, device_id=torch.device(device_module.current_device()))
ditto
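The same torch.accelerator suggestion applies to the two device_id call sites above. A sketch, assuming device_id may be given either as a plain int index or as a torch.device (the two forms the test exercises), both derived from the torch.accelerator API:

```python
import torch

if torch.accelerator.is_available():
    # Integer form, e.g. 0
    idx = torch.accelerator.current_device_index()
    # torch.device form, e.g. device(type='xpu', index=0)
    dev = torch.device(torch.accelerator.current_accelerator().type, idx)
    # Either value could then be passed as replicate(model, device_id=...),
    # which requires an initialized process group and is omitted here.
    print(idx, dev)
```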
def test_replicate_ignore_module(self):
    self._init_pg()
-   torch.cuda.set_device(self.rank)
+   device_module.set_device(self.rank)
ditto
- torch.cuda.set_device(self.rank)
- model = MyNet().cuda()
- replicate(model, device_id=torch.cuda.current_device())
+ device_module.set_device(self.rank)
ditto
- replicate(model, device_id=torch.cuda.current_device())
+ device_module.set_device(self.rank)
+ model = MyNet().to(device)
+ replicate(model, device_id=device_module.current_device())
ditto
Please use the torch.accelerator API as much as possible.
Thanks, will modify it.
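For reference, a sketch of the rough correspondence the review is pointing at; the mapping reflects the suggestions made in this thread rather than an exhaustive reference:

```python
import torch

# Backend-specific call           ->  device-agnostic torch.accelerator call
# torch.cuda.current_device()     ->  torch.accelerator.current_device_index()
# torch.cuda.set_device(rank)     ->  torch.accelerator.set_device_index(rank)
# torch.cuda.device_count()       ->  torch.accelerator.device_count()

if torch.accelerator.is_available():
    acc = torch.accelerator.current_accelerator()  # e.g. device(type='cuda') or device(type='xpu')
    print(acc, torch.accelerator.device_count(), torch.accelerator.current_device_index())
```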
Force-pushed from a055b23 to 9d14e4b
For #114850, we will port distributed tests to Intel GPU. This is the second PR for the _composable cases; the first is #159118.
We enable Intel GPU with the following methods and try our best to keep the original code style:
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @gujinghui @EikanWang @fengyuan14 @guangyey