
[1/N] Port 3 distributed/_tools test cases to Intel GPU #159543


Closed
wants to merge 8 commits into from

Conversation

libohao1201
Contributor

@libohao1201 libohao1201 commented Jul 31, 2025

For #114850, we will port distributed tests to Intel GPU.

We enable Intel GPU support with the following methods, doing our best to keep the original code style:

  1. Use "torch.accelerator.current_accelerator()" to determine the accelerator backend.
  2. Enable XPU for some test paths.
  3. Skip the test cases that Intel GPU does not support.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta
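The backend-selection step described above can be sketched without a GPU present. This is a minimal, torch-free illustration of the dispatch logic only; `current_accelerator` below is a hypothetical stand-in for `torch.accelerator.current_accelerator()`, with availability flags injected for demonstration:

```python
from types import SimpleNamespace

def current_accelerator(cuda=False, xpu=False):
    """Stand-in for torch.accelerator.current_accelerator(): returns a
    device-like object for the detected backend, or None on a CPU-only
    machine (availability flags are injected here for illustration)."""
    if cuda:
        return SimpleNamespace(type="cuda")
    if xpu:
        return SimpleNamespace(type="xpu")
    return None

def test_device_type(cuda=False, xpu=False):
    """Pick the device type a ported test should run on."""
    acc = current_accelerator(cuda=cuda, xpu=xpu)
    return acc.type if acc is not None else "cpu"

print(test_device_type(xpu=True))  # on an Intel GPU machine -> "xpu"
```

The point of routing every test through one accessor is that the ported tests never mention "cuda" or "xpu" directly, which is how the PR keeps the original code style.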


pytorch-bot bot commented Jul 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159543

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 4995377 with merge base fc80f68:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Jul 31, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue topic: not user facing topic category labels Jul 31, 2025

 @skipIfTorchDynamo("https://github.com/pytorch/pytorch/issues/115653")
-@unittest.skipIf(not TEST_CUDA, "CUDA not available")
+@unittest.skipIf(not torch.accelerator.is_available(), "Accelerator not available")
Collaborator

@unittest.skipIf(not TEST_CUDA and not TEST_XPU, "Neither CUDA nor XPU is available")
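The suggested combined guard can be exercised with plain `unittest`; the `TEST_CUDA`/`TEST_XPU` flags below are stand-ins for the availability flags the real tests import, pinned to `False` to simulate a machine with neither backend:

```python
import unittest

TEST_CUDA = False  # stand-ins for the availability flags in the test suite
TEST_XPU = False

class PortedTest(unittest.TestCase):
    # One guard covering both backends, per the review suggestion
    @unittest.skipIf(not TEST_CUDA and not TEST_XPU,
                     "Neither CUDA nor XPU is available")
    def test_example(self):
        self.assertTrue(True)

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(PortedTest).run(result)
print(len(result.skipped))  # -> 1 when neither backend is available
```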

@@ -77,17 +77,18 @@ def _test_tracker_multi_group(
         mp_policy: MixedPrecisionPolicy,
     ):
         debug = False
-        dev = torch.device(torch.cuda.current_device())
+        dev = torch.device(torch.accelerator.current_device_index())
Collaborator

torch.accelerator does not apply to CPU.

Contributor Author

But I think the CPU case is already skipped by @skip_if_lt_x_gpu(2).
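For context, the gating the author refers to works roughly as follows. This is a torch-free sketch, not the real decorator; `visible_gpus` is a hypothetical injected count standing in for the actual device query:

```python
import functools
import unittest

def skip_if_lt_x_gpu(x, visible_gpus=0):
    """Torch-free sketch of the distributed test decorator: raise
    SkipTest unless at least x devices are visible. visible_gpus is a
    hypothetical injected value standing in for the real device count."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if visible_gpus < x:
                raise unittest.SkipTest(f"requires at least {x} GPUs")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@skip_if_lt_x_gpu(2, visible_gpus=0)  # CPU-only machine: test is skipped
def test_tracker():
    return "ran"
```

Because the decorator skips before the test body runs, a CPU-only machine never reaches the `torch.accelerator.current_device_index()` call, which is the author's point.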

Collaborator

@guangyey guangyey left a comment

I introduced a few memory-related APIs under torch.accelerator in #152932.
We could use the torch.accelerator APIs instead of get_device_module once #152932 lands.

@guangyey guangyey changed the title [WIP][1/N]Port 3 distributed/_tools test cases to Intel GPU [1/N]Port 3 distributed/_tools test cases to Intel GPU Aug 5, 2025
@guangyey guangyey requested a review from d4l3k August 5, 2025 08:16
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Aug 5, 2025
@guangyey guangyey moved this to Review Required in PyTorch Intel Aug 5, 2025
Member

@d4l3k d4l3k left a comment

LGTM

@guangyey
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased libo/distributed_ut_p1 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout libo/distributed_ut_p1 && git pull --rebase)

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Aug 12, 2025
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Aug 12, 2025
@guangyey guangyey moved this from Review Required to Approved in PyTorch Intel Aug 12, 2025
@guangyey
Collaborator

@libohao1201 please help fix the lint error.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Aug 13, 2025
@daisyden daisyden added the ciflow/xpu Run XPU CI tasks label Aug 13, 2025

pytorch-bot bot commented Aug 13, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Aug 13, 2025
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Aug 13, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Aug 13, 2025
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Aug 13, 2025
@guangyey
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 13, 2025
@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@guangyey
Collaborator

@libohao1201 you need to sign the EasyCLA before landing this PR.

@libohao1201
Contributor Author

@libohao1201 you need to sign the EasyCLA before landing this PR.

Done.

@guangyey
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels
ciflow/trunk, ciflow/xpu, Merged, oncall: distributed, open source, topic: not user facing
Projects
Status: Done
6 participants