
[WIP] Port several test files under test/distributed to Intel GPU #159473


Open
wants to merge 18 commits into
base: main
from

Conversation

daisyden
Collaborator

@daisyden daisyden commented Jul 30, 2025

For #114850, we are porting the distributed tests to Intel GPU. This PR covers several test files under test/distributed. We enable Intel GPU with the following methods while keeping the original code style as far as possible:

  • Use instantiate_device_type_tests().
  • Use "torch.accelerator.current_accelerator()" to determine the accelerator backend.
  • Use requires_accelerator_dist_backend to allow both nccl and xccl tests.
  • Enable XPU for some test paths.
  • Change the hardcoded world_size according to device_count.
  • Unify some common code under torch/testing/_internal for multiple backends, for example:
    add xpu to Backend.backend_capability and dist.Backend.register_backend().
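
As a rough illustration of the approach in the list above (not the code in this PR), the sketch below picks a collective backend from the accelerator type and caps the world size at the device count. The helper names (backend_for, world_size_for, detect_device_type), the device-to-backend table, and the gloo fallback are assumptions made for this example only. The PyTorch calls used are torch.accelerator.is_available() and torch.accelerator.current_accelerator(); the import is guarded so the sketch also runs where PyTorch is missing.

```python
# Hypothetical table for this sketch only: device type -> collective backend.
_BACKEND_FOR_DEVICE = {"cuda": "nccl", "xpu": "xccl"}


def backend_for(device_type: str) -> str:
    """Return the collective backend for a device type, falling back to gloo."""
    return _BACKEND_FOR_DEVICE.get(device_type, "gloo")


def world_size_for(device_count: int, preferred: int = 4) -> int:
    """Cap the preferred world size at the visible device count (minimum 1)."""
    return max(1, min(preferred, device_count))


def detect_device_type() -> str:
    """Ask PyTorch for the current accelerator; report "cpu" if none is usable."""
    try:
        import torch
    except ImportError:
        return "cpu"
    accelerator_api = getattr(torch, "accelerator", None)
    if accelerator_api is None or not accelerator_api.is_available():
        return "cpu"
    acc = accelerator_api.current_accelerator()
    return acc.type if acc is not None else "cpu"


if __name__ == "__main__":
    device_type = detect_device_type()
    print(device_type, backend_for(device_type), world_size_for(2))
```

Keeping the mapping and the cap as plain functions means a test can derive both values from the detected device instead of hardcoding "nccl" or a fixed world_size.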

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @gujinghui @EikanWang @fengyuan14 @guangyey


pytorch-bot bot commented Jul 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159473

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit daf3833 with merge base 512b473 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (fsdp) release notes category labels Jul 30, 2025
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Jul 30, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Jul 31, 2025
@etaf etaf added the ciflow/xpu Run XPU CI tasks label Jul 31, 2025
@daisyden
Collaborator Author

daisyden commented Aug 1, 2025

@pytorchbot label "ciflow/xpu"


pytorch-bot bot commented Aug 1, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'ciflow/xpu' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.

@daisyden
Collaborator Author

daisyden commented Aug 1, 2025

@pytorchbot label "module: xpu"
@pytorchbot label "triaged"

@pytorch-bot pytorch-bot bot added the module: xpu Intel XPU related issues label Aug 1, 2025
@daisyden daisyden added the keep-going Don't stop on first failure, keep running tests until the end label Aug 7, 2025
@daisyden
Collaborator Author

daisyden commented Aug 7, 2025

@pytorchbot rebase


pytorch-bot bot commented Aug 7, 2025

Didn't find following labels among repository labels: rebase

Labels
  • ciflow/inductor
  • ciflow/xpu - Run XPU CI tasks
  • keep-going - Don't stop on first failure, keep running tests until the end
  • module: xpu - Intel XPU related issues
  • oncall: distributed - Add this issue/PR to distributed oncall triage queue
  • open source
  • release notes: distributed (fsdp) - release notes category

4 participants