Update on "Support of dtensor redistribute with device order"

[Prototype; for RFC, not ready for review]

DTensor `redistribute` now honors the device order. If no order information is specified, it uses the default device order [0, 1, 2, ...]. `device_order` can be specified as follows:

```
sharded_dt = distribute_tensor(input_data, mesh, placement, device_order)
```

and

```
out_dt = sharded_dt.redistribute(mesh, placement, device_order)
```

Note that the device order information is added to the DTensorSpec, so `redistribute_local_tensor` doesn't need the `src_device_order` and `dst_device_order` arguments. I am leaving them here as a reference for AutoParallel (cc fmassa). I will remove those order-related args from the redistribute-related APIs in this PR.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta

[ghstack-poisoned]
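As a rough illustration of why the device order matters, here is a minimal pure-Python sketch. It does not use the DTensor API. It assumes that `device_order` lists the mesh dimensions in the order they are applied when several mesh dimensions shard the same tensor dimension, which is one reading of the description above rather than a confirmed definition. The helpers `local_slice` and `layout` are hypothetical names, and even division is assumed.

```python
from itertools import product


def local_slice(data, mesh_shape, coord, device_order):
    """Slice of `data` held at mesh coordinate `coord`.

    Assumption: every mesh dim shards tensor dim 0, and the mesh dims are
    applied in the order given by `device_order` (hypothetical semantics).
    """
    shard = list(data)
    for mesh_dim in device_order:
        num_chunks = mesh_shape[mesh_dim]
        chunk = len(shard) // num_chunks  # assumes even division
        start = coord[mesh_dim] * chunk
        shard = shard[start:start + chunk]
    return shard


def layout(data, mesh_shape, device_order):
    """Map every mesh coordinate to the slice it would hold."""
    coords = product(*(range(n) for n in mesh_shape))
    return {c: local_slice(data, mesh_shape, c, device_order) for c in coords}


data = list(range(8))
default_layout = layout(data, (2, 2), [0, 1])  # default order [0, 1]
swapped_layout = layout(data, (2, 2), [1, 0])  # mesh dim 1 applied first

for coord in sorted(default_layout):
    print(coord, default_layout[coord], swapped_layout[coord])
```

Under this assumption, the diagonal coordinates hold the same data in both layouts, while the off-diagonal coordinates swap slices. That difference is why a redistribute between two orders would have to move data even when the placements are otherwise identical.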
pytorch · zpcore · Aug 10, 2025
commit 12fe67c7ec73dfb1a8ce1f082638af812ef5bdcf
diff --git a/torch/distributed/tensor/_redistribute.py b/torch/distributed/tensor/_redistribute.py
@@ -193,7 +193,7 @@ def _map_tensor_dim_to_mesh_dim(placements, device_order):
                         and dst_device_order_to_mesh_dims[j] == [mesh_dim]
                     ):
                         mesh_dim_size = device_mesh.size(mesh_dim=mesh_dim)
-                        current_placement = sorted_dst_placement[mesh_dim]   # <<<<<<<<<<<<<<<<<<<<<<, error
+                        current_placement = sorted_dst_placement[mesh_dim]
                         assert isinstance(current_placement, Shard)
                         # alltoall from Shard(tensor_dim) to Shard()
                         transform_infos.append(

diff --git a/torch/distributed/tensor/placement_types.py b/torch/distributed/tensor/placement_types.py
@@ -121,7 +121,7 @@ def _local_shard_size_and_offset(
         computes the new local shard size and offset given the desired number of chunks
         (num_chunks is generally equal to the size of the current sharding dim).
 
-        Note: T234040481 new local shard offset is relative to the current sharded tensor, not the global tensor.
+        Note: new local shard offset is relative to the current sharded tensor, not the global tensor.
         See `_utils.compute_local_shape_and_global_offset` for computing global offset.
 
         Returns (new local shard size, offset)