inference/README.md (5 additions, 3 deletions)
@@ -21,12 +21,13 @@ sitting waiting for the checkpoint shards to download.

 ### Run Arctic Example

-Due to the model size we recommend using a single 8xH100 instance from your
+Due to the model size we recommend using a single 8xH100-80GB instance from your
 favorite cloud provider such as: AWS [p5.48xlarge](https://aws.amazon.com/ec2/instance-types/p5/),
-Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series), etc.
+Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series), etc.
+We have only tested this setup with 8xH100-80GB, however 8xA100-80GB should also work.

 In this example we are using FP8 quantization provided by DeepSpeed in the backend, we can also use FP6
-quantization by specifying `q_bits=6` in the `ArcticQuantizationConfig` config. The `"150GiB"` setting
+quantization by specifying `q_bits=6` in the `QuantizationConfig` config. The `"150GiB"` setting
 for max_memory is required until we can get DeepSpeed's FP quantization supported natively as a [HFQuantizer](https://huggingface.co/docs/transformers/main/en/hf_quantizer#build-a-new-hfquantizer-class) which we
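For context, here is a minimal sketch of the kind of quantized model load the edited paragraph describes. `QuantizationConfig` with `q_bits` does exist in `deepspeed.linear.config`, but the model ID, the `ds_quantization_config` keyword, and the per-GPU `max_memory` layout shown here are assumptions for illustration, not taken from this diff:

```python
# Hypothetical sketch: loading Arctic with DeepSpeed FP quantization.
# The model ID and the `ds_quantization_config` kwarg (handled by the
# model's trust_remote_code modeling code) are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from deepspeed.linear.config import QuantizationConfig

model_id = "Snowflake/snowflake-arctic-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# q_bits=8 selects FP8; per the README, pass q_bits=6 for FP6 instead.
quant_config = QuantizationConfig(q_bits=8)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="auto",
    ds_quantization_config=quant_config,
    # The "150GiB" per-GPU cap is the workaround the README says is needed
    # until DeepSpeed's FP quantization is supported as a native HFQuantizer.
    max_memory={i: "150GiB" for i in range(8)},
    torch_dtype=torch.bfloat16,
)
```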