1 file changed: +2 -2 lines changed

@@ -31,7 +31,7 @@ won't be possible on a single GPU.
🤗 Transformers integrates [DeepSpeed](https://github.com/microsoft/DeepSpeed) via 2 options:

- 1. Integration of the core DeepSpeed features via [`Trainer`]. This is everything done for your type
+ 1. Integration of the core DeepSpeed features via [`Trainer`]. This is an everything-done-for-you type
of integration - just supply your custom config file or use our template and you have nothing else to do. Most of
this document is focused on this feature.
2. If you don't use [`Trainer`] and want to use your own Trainer where you integrated DeepSpeed
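As a rough sketch of what "supply your custom config file" means for option 1 (not part of this diff): the config file path, the `output` directory, and the `model`/`train_dataset` objects below are placeholders, and only the `deepspeed` argument is the point.

```python
from transformers import Trainer, TrainingArguments

# Placeholder arguments for illustration only -- the `deepspeed` argument is
# what hands your DeepSpeed config file to Trainer.
training_args = TrainingArguments(
    output_dir="output",
    deepspeed="ds_config_zero2.json",  # your custom config or the template
)

trainer = Trainer(
    model=model,                  # placeholder: any pretrained model
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your training dataset
)
trainer.train()
```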
@@ -604,7 +604,7 @@ The following is an example of configuration for ZeRO stage 2:
**Performance tuning:**

- enabling `offload_optimizer` should reduce GPU RAM usage (it requires `"stage": 2`)
- - `"overlap_comm": true` trade offs increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
+ - `"overlap_comm": true` trades off increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
the `allgather_bucket_size` and `reduce_bucket_size` values. So if they are set to 5e8, this requires a 9GB
footprint (`5e8 x 2Bytes x 2 x 4.5`). Therefore, if you have a GPU with 8GB or less RAM, to avoid getting
OOM-errors you will need to reduce those parameters to about `2e8`, which would require 3.6GB. You will want to do
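To make the bucket-size arithmetic above concrete, here is a small sketch of the settings discussed in this hunk, written as a Python dict mirroring the `zero_optimization` section of a ZeRO stage 2 config; the `device`/`pin_memory` and `contiguous_gradients` entries are illustrative additions, not part of this diff.

```python
# ZeRO stage 2 settings discussed above, in the shape of the config file's
# "zero_optimization" section.
zero2_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},  # illustrative
        "overlap_comm": True,
        "allgather_bucket_size": 2e8,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": True,  # illustrative
    }
}

# Footprint formula from the text: bucket_size x 2 bytes x 2 x 4.5
bucket = zero2_config["zero_optimization"]["allgather_bucket_size"]
print(f"~{bucket * 2 * 2 * 4.5 / 1e9:.1f} GB")  # 2e8 -> ~3.6 GB, 5e8 -> 9 GB
```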