
Commit 016aac8

Merge pull request #985 from pytorch/holly1238-patch-1
Update 2022-3-14-introducing-pytorch-fully-sharded-data-parallel-api.md
2 parents 921db12 + 8f0c67d commit 016aac8

File tree

1 file changed: +1 −1 lines changed


_posts/2022-3-14-introducing-pytorch-fully-sharded-data-parallel-api.md (+1 −1)
@@ -41,7 +41,7 @@ There are two ways to wrap a model with PyTorch FSDP. Auto wrapping is a drop-in
 
 Model layers should be wrapped in FSDP in a nested way to save peak memory and enable communication and computation overlapping. The simplest way to do it is auto wrapping, which can serve as a drop-in replacement for DDP without changing the rest of the code.
 
-The fsdp_auto_wrap_policy argument allows specifying a callable function to recursively wrap layers with FSDP. The default_auto_wrap_policy function provided by PyTorch FSDP recursively wraps layers with more than 100M parameters. You can supply your own wrapping policy as needed; an example of writing a customized wrapping policy is shown in the [FSDP API doc](https://docs-preview.pytorch.org/72084/fsdp.html?highlight=fsdp#module-torch.distributed.fsdp).
+The fsdp_auto_wrap_policy argument allows specifying a callable function to recursively wrap layers with FSDP. The default_auto_wrap_policy function provided by PyTorch FSDP recursively wraps layers with more than 100M parameters. You can supply your own wrapping policy as needed; an example of writing a customized wrapping policy is shown in the [FSDP API doc](https://pytorch.org/docs/stable/fsdp.html).
 
 In addition, cpu_offload could be configured optionally to offload wrapped parameters to CPUs when these parameters are not used in computation. This can further improve memory efficiency at the cost of data transfer overhead between host and device.
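To make the auto-wrapping paragraph touched by this change concrete, here is a minimal sketch assuming the 1.11-era argument names the post uses (fsdp_auto_wrap_policy, default_auto_wrap_policy); the toy model, the functools.partial wrapper, the min_num_params keyword, and the 1M threshold are illustrative assumptions rather than anything taken from this diff.

```python
import functools

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import default_auto_wrap_policy

# Assumes a torch.distributed process group is already initialized,
# e.g. the script was launched with torchrun and init_process_group() was called.


def make_model() -> nn.Module:
    # Toy model standing in for a large network (illustrative only).
    return nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 16), nn.Linear(16, 4))


# Default policy: recursively wrap submodules with more than 100M parameters.
fsdp_model = FSDP(make_model(), fsdp_auto_wrap_policy=default_auto_wrap_policy)

# A customized policy is just a callable; here, a size-based variant with a
# lower, illustrative threshold of 1M parameters.
my_auto_wrap_policy = functools.partial(default_auto_wrap_policy, min_num_params=int(1e6))
fsdp_model = FSDP(make_model(), fsdp_auto_wrap_policy=my_auto_wrap_policy)
```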

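The cpu_offload sentence at the end of the hunk can be sketched the same way; CPUOffload(offload_params=True) is assumed to be the configuration the post has in mind, and the rest of the call mirrors the example above.

```python
import torch.nn as nn
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import default_auto_wrap_policy

# Assumes the process group is already initialized (e.g. via torchrun).
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 16), nn.Linear(16, 4))

# offload_params=True keeps wrapped parameters in host memory while they are not
# participating in computation, trading host/device transfer overhead for lower
# GPU memory usage.
fsdp_model = FSDP(
    model,
    fsdp_auto_wrap_policy=default_auto_wrap_policy,
    cpu_offload=CPUOffload(offload_params=True),
)
```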