[WIP][2/N] Port 5 _composable distributed tests to Intel GPU #159241
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159241
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9d14e4b with merge base 2dccff7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing" |
@@ -58,6 +58,11 @@
from torch.testing._internal.distributed.checkpoint_utils import with_temp_dir

device = torch.accelerator.current_accelerator()
device_module = torch.get_device_module(device)
Where is device_module used?

Will remove it.
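For context, a small sketch of what the two lines under discussion provide (illustrative only; whether device_module is actually needed depends on how many backend-specific calls the test makes):

```python
import torch

# torch.accelerator.current_accelerator() returns the available accelerator
# device (e.g. device(type='cuda') or device(type='xpu')), or None on CPU-only builds.
device = torch.accelerator.current_accelerator()

if device is not None:
    # torch.get_device_module(device) returns the matching backend module,
    # e.g. torch.cuda or torch.xpu, so backend-specific helpers can be
    # called without hard-coding the backend name.
    device_module = torch.get_device_module(device)
    print(device, device_module.device_count())
```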
@@ -58,6 +58,11 @@
from torch.testing._internal.distributed.checkpoint_utils import with_temp_dir

device = torch.accelerator.current_accelerator()
Suggested change:
- device = torch.accelerator.current_accelerator()
+ device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
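A minimal sketch of how the suggested expression behaves, assuming the goal is a device_type string that falls back to "cpu" when no accelerator is present (the tensor below is only an illustration, not part of the PR):

```python
import torch

# The walrus expression avoids calling .type on None when no accelerator
# (CUDA, XPU, ...) is available.
device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"

# Illustrative use: allocate on whichever backend was detected.
x = torch.randn(4, 4, device=device_type)
print(device_type, x.device)
```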
Force-pushed from 6a5b8de to 4e028bf
Force-pushed from 4e028bf to a055b23
@pytorchbot label "module: xpu" |
@pytorchbot label "triaged" |
@@ -98,6 +98,8 @@ def _test_compile(
self.create_pg(device)
torch._dynamo.config.optimize_ddp = "python_reducer"
torch.manual_seed(123)
if device_type == "xpu":
Without this code change, which test cases would fail due to the non-deterministic accuracy issue?

test_compile_backward_only, test_compile_bf16, test_compile_fp16, test_compile_gpu, and test_compile_gpu_ac, as listed in intel/torch-xpu-ops#1668.
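The body of the `if device_type == "xpu":` branch is truncated in the excerpt above. Purely as an illustration of how such a guard is commonly used (an assumption, not necessarily this PR's actual change), it could opt the XPU path into deterministic kernels:

```python
import torch

device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"

# Assumption: enabling deterministic algorithms only on XPU to avoid the
# run-to-run accuracy drift seen in the compiled-DDP tests listed above.
if device_type == "xpu":
    torch.use_deterministic_algorithms(True)
```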
for linear in model.linears:
    fully_shard(linear)
fully_shard(model.linears)
- replicate(model, device_id=torch.cuda.current_device())
+ replicate(model, device_id=device_module.current_device())
Suggested change:
- replicate(model, device_id=device_module.current_device())
+ replicate(model, device_id=torch.accelerator.current_device_index())
@@ -273,14 +283,14 @@ def forward(self, x: torch.Tensor):
    return y

self._init_pg()
- torch.cuda.set_device(self.rank)
+ device_module.set_device(self.rank)
Suggested change:
- device_module.set_device(self.rank)
+ torch.accelerator.set_device_index(self.rank)
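A short sketch of the suggested replacement in a per-rank setup; `rank` stands in for the test harness's `self.rank`:

```python
import torch

rank = 0  # stand-in for self.rank in the test harness

# Device-agnostic equivalent of torch.cuda.set_device(rank) /
# torch.xpu.set_device(rank): bind this process to one accelerator device.
if torch.accelerator.is_available():
    torch.accelerator.set_device_index(rank)
```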
# DDP instance is attached in first pre forward
model_cuda(torch.randn(2, 2))
replicate_ddp_weakref = replicate.state(model_cuda)._ddp_weakref()
self.assertEqual([0], replicate_ddp_weakref.device_ids)
# Pass in int as device_id
- replicate(model_cuda2, device_id=int(torch.cuda.current_device()))
+ replicate(model_cuda2, device_id=int(device_module.current_device()))
ditto
@@ -233,13 +242,13 @@ def test_replicate_device_id(self):
# Should be None for CPU training
self.assertEqual(None, replicate_ddp_weakref.device_ids)

- replicate(model_cuda, device_id=torch.device(torch.cuda.current_device()))
+ replicate(model_cuda, device_id=torch.device(device_module.current_device()))
ditto
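The same torch.accelerator suggestion applies to the two device_id call sites above. A sketch, assuming device_id may be given either as a plain int index or as a torch.device (the two forms the test exercises), both derived from the torch.accelerator API:

```python
import torch

if torch.accelerator.is_available():
    # Integer form, e.g. 0
    idx = torch.accelerator.current_device_index()
    # torch.device form, e.g. device(type='xpu', index=0)
    dev = torch.device(torch.accelerator.current_accelerator().type, idx)
    # Either value could then be passed as replicate(model, device_id=...),
    # which requires an initialized process group and is omitted here.
    print(idx, dev)
```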
def test_replicate_ignore_module(self):
    self._init_pg()
-   torch.cuda.set_device(self.rank)
+   device_module.set_device(self.rank)
ditto
- torch.cuda.set_device(self.rank)
- model = MyNet().cuda()
- replicate(model, device_id=torch.cuda.current_device())
+ device_module.set_device(self.rank)
ditto
- replicate(model, device_id=torch.cuda.current_device())
+ device_module.set_device(self.rank)
+ model = MyNet().to(device)
+ replicate(model, device_id=device_module.current_device())
ditto
Please use the torch.accelerator API as much as possible.
Thanks, will modify it.
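For reference, a sketch of the rough correspondence the review is pointing at; the mapping reflects the suggestions made in this thread rather than an exhaustive reference:

```python
import torch

# Backend-specific call           ->  device-agnostic torch.accelerator call
# torch.cuda.current_device()     ->  torch.accelerator.current_device_index()
# torch.cuda.set_device(rank)     ->  torch.accelerator.set_device_index(rank)
# torch.cuda.device_count()       ->  torch.accelerator.device_count()

if torch.accelerator.is_available():
    acc = torch.accelerator.current_accelerator()  # e.g. device(type='cuda') or device(type='xpu')
    print(acc, torch.accelerator.device_count(), torch.accelerator.current_device_index())
```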
Force-pushed from a055b23 to 9d14e4b
For #114850, we will port distributed tests to Intel GPU. This is the second PR for the _composable cases; the first is #159118.
We enable Intel GPU with the following methods and try our best to keep the original code style:
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @gujinghui @EikanWang @fengyuan14 @guangyey