Commit 3753536

Merge pull request Snowflake-Labs#18 from Snowflake-Labs/Jeff-patch-2
update wording of HF example and update hero figure to align with blog
2 parents 686f1ba + 06df60f commit 3753536

2 files changed: +5 -3 lines changed

assets/Training Efficiency Figure.png

-149 KB
inference/README.md

Lines changed: 5 additions & 3 deletions
@@ -21,12 +21,13 @@ sitting waiting for the checkpoint shards to download.
 
 ### Run Arctic Example
 
-Due to the model size we recommend using a single 8xH100 instance from your
+Due to the model size we recommend using a single 8xH100-80GB instance from your
 favorite cloud provider such as: AWS [p5.48xlarge](https://aws.amazon.com/ec2/instance-types/p5/),
-Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series), etc.
+Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series), etc.
+We have only tested this setup with 8xH100-80GB, however 8xA100-80GB should also work.
 
 In this example we are using FP8 quantization provided by DeepSpeed in the backend, we can also use FP6
-quantization by specifying `q_bits=6` in the `ArcticQuantizationConfig` config. The `"150GiB"` setting
+quantization by specifying `q_bits=6` in the `QuantizationConfig` config. The `"150GiB"` setting
 for max_memory is required until we can get DeepSpeed's FP quantization supported natively as a [HFQuantizer](https://huggingface.co/docs/transformers/main/en/hf_quantizer#build-a-new-hfquantizer-class) which we
 are actively working on.
 
@@ -46,6 +47,7 @@ tokenizer = AutoTokenizer.from_pretrained(
 
 quant_config = QuantizationConfig(q_bits=8)
 
+# The 150GiB number is a workaround until we have HFQuantizer support, must be ~1.9x of the available GPU memory
 model = AutoModelForCausalLM.from_pretrained(
     "Snowflake/snowflake-arctic-instruct",
     low_cpu_mem_usage=True,
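
The `max_memory` workaround mentioned in the changed README text can be illustrated with a small helper. This is a minimal sketch, not code from the README: `build_max_memory` is a hypothetical name, the dict keyed by GPU index assumes the per-device `max_memory` mapping that `from_pretrained` accepts, and the GPU count of 8 is taken from the recommended 8xH100-80GB instance. The ratio check treats GiB and GB loosely, as the comment in the diff does.

```python
from typing import Dict


def build_max_memory(num_gpus: int = 8, per_gpu_limit: str = "150GiB") -> Dict[int, str]:
    """Hypothetical helper: assign the same max_memory limit to every GPU index."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {gpu_index: per_gpu_limit for gpu_index in range(num_gpus)}


# 8 GPUs, matching the recommended 8xH100-80GB instance (assumption).
max_memory = build_max_memory()

# Rough check of the "~1.9x" note: the 150 limit against an 80GB card.
ratio = 150 / 80
```

The resulting dict would be passed as `max_memory=max_memory` in the `AutoModelForCausalLM.from_pretrained` call; the remaining arguments of that call are not shown in this diff.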
