
Cannot configure dist_timeout when using device_mesh #119574

@mvpatel2000

Description

🐛 Describe the bug

When calling `dist.initialize_dist`, I can specify a `dist_timeout` argument.

When training with FSDP and a device_mesh, I want to call `from torch.distributed._tensor import init_device_mesh` and pass the resulting device_mesh into FSDP. However, the process groups created along the way do not appear to respect `dist_timeout`: each mesh dimension's subgroup is constructed with the default timeout via

dim_group = new_group(ranks=subgroup_ranks)
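
For context, a minimal sketch of the setup described above (the mesh shape, the 30-second timeout, the hybrid sharding strategy, and the toy model are illustrative assumptions, not values from this report):

```python
# Hypothetical repro sketch: the timeout passed at init time is not forwarded
# to the per-dimension subgroups that init_device_mesh creates via new_group().
from datetime import timedelta

import torch.distributed as dist
import torch.nn as nn
from torch.distributed._tensor import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Custom timeout set here (30 s is an arbitrary example value).
dist.init_process_group(backend="nccl", timeout=timedelta(seconds=30))

# Example 2-D mesh for hybrid sharding; the subgroups built internally for
# each mesh dimension use the default timeout rather than the one above.
mesh = init_device_mesh(
    "cuda",
    (2, dist.get_world_size() // 2),
    mesh_dim_names=("replicate", "shard"),
)

# Pass the mesh to FSDP; the toy nn.Linear stands in for a real model.
model = FSDP(
    nn.Linear(8, 8).cuda(),
    device_mesh=mesh,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
```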

Versions

Torch nightly 1/10/2024

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @LucasLLC

Labels

module: DeviceMesh, oncall: distributed, triaged
