* [Achieve Low-Latency and High-Throughput Inference with Meta's Llama 3.1 405B using Snowflake’s Optimized AI Stack](https://www.snowflake.com/engineering-blog/optimize-LLMs-with-llama-snowflake-ai-stack/)
+* [Fine-Tune Llama 3.1 405B on a Single Node using Snowflake’s Memory-Optimized AI Stack](https://www.snowflake.com/engineering-blog/fine-tune-llama-single-node-snowflake/)
* [04/24/2024] [Snowflake Arctic: The Best LLM for Enterprise AI — Efficiently Intelligent, Truly Open](https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/)
## Overview
+The Snowflake AI Research team is conducting open, foundational research to advance the field of AI while making enterprise AI easy, efficient, and trusted. This repo contains several artifacts to help efficiently train and run inference on popular LLMs in practice. We released [Arctic](https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/) in April 2024 and are proud to announce the release of our massive LLM inference and fine-tuning stacks specifically tailored to Llama 3.1 405B.
+
+## Llama 3.1 405B
+
+In collaboration with DeepSpeed, Hugging Face, vLLM, and the broader AI community, we are excited to open-source our inference and fine-tuning stacks optimized for Llama 3.1 405B. For inference, we support a massive 128K context window from day one, while enabling real-time inference with up to 3x lower end-to-end latency and 1.4x higher throughput than existing open-source solutions. Please see our blog, [Achieve Low-Latency and High-Throughput Inference with Meta's Llama 3.1 405B using Snowflake’s Optimized AI Stack](https://www.snowflake.com/engineering-blog/optimize-LLMs-with-llama-snowflake-ai-stack/), for a deep dive into these innovations. For fine-tuning, we support both single-node and multi-node training environments using the latest memory-efficient training techniques, such as parameter-efficient fine-tuning, FP8 quantization, ZeRO-3-inspired sharding, and targeted parameter offloading (when necessary). Please see our blog, [Fine-Tune Llama 3.1 405B on a Single Node using Snowflake’s Memory-Optimized AI Stack](https://www.snowflake.com/engineering-blog/fine-tune-llama-single-node-snowflake/), for a deep dive into how we did this.
+
+### Getting started
+
+* [Inference deployment and benchmarks with vLLM](inference/llama3.1)
+* [Fine-Tuning Support for Llama 3.1 405B](training/llama3.1)
+
+## Arctic
+
At Snowflake, we see a consistent pattern in AI needs and use cases from our enterprise customers. Enterprises want to use LLMs to build conversational SQL data copilots, code copilots and RAG chat bots. From a metrics perspective, this translates to LLMs that excel at SQL, code, complex instruction following and the ability to produce grounded answers. We capture these abilities into a single metric we call enterprise intelligence by taking an average of Coding (HumanEval+ and MBPP+), SQL Generation (Spider), and Instruction following (IFEval).
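As a rough illustration of how such a composite score could be computed, here is a minimal sketch. It assumes the two coding benchmarks are first averaged into a single Coding score, which is then averaged with the SQL and instruction-following scores; the text above does not spell out the exact weighting, so treat that as an assumption. The function name `enterprise_intelligence` and the input values are hypothetical placeholders, not reported results.

```python
from statistics import mean


def enterprise_intelligence(humaneval_plus, mbpp_plus, spider, ifeval):
    """Average three category scores into one composite number.

    Assumption: Coding is the mean of HumanEval+ and MBPP+, and the
    composite is the unweighted mean of Coding, SQL, and IFEval.
    """
    coding = mean([humaneval_plus, mbpp_plus])
    return mean([coding, spider, ifeval])


# Placeholder scores chosen only to make the arithmetic easy to follow;
# they are NOT measured benchmark results for any model.
score = enterprise_intelligence(
    humaneval_plus=60.0,
    mbpp_plus=70.0,
    spider=80.0,
    ifeval=50.0,
)
print(score)
```

With these placeholder inputs, Coding works out to 65.0, and the composite is the mean of 65.0, 80.0, and 50.0.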
<p align="center">
@@ -28,38 +45,28 @@ The Snowflake AI Research Team is thrilled to introduce Snowflake Arctic, a top-
* Truly Open: Apache 2.0 license provides ungated access to weights and code. In addition, we are also open sourcing all of our data recipes and research insights.
-## Getting Started
+### Getting Started
-### Inference API Providers 🚀
+**Inference API Providers**
Access Arctic via your model garden or catalog of choice, including AWS, NVIDIA AI Catalog, Replicate, Lamini, Perplexity, and Together AI, over the coming days.
-### Model Weights 🤗
+**Model Weights**
The best way to get yourself running with Arctic is through Hugging Face. We have uploaded both the Base and Instruct model variants to the Hugging Face hub:
We provide two different tutorials on standing up Arctic for inference:
-* [Basic Hugging Face setup](inference/)
-* [vLLM Deployment](inference/vllm/)
+* [Basic Hugging Face setup](inference/arctic)
+* [vLLM Deployment](inference/arctic/vllm/)
-## Cookbooks/Tutorials
+**Cookbooks/Tutorials**
We believe in a thriving research community, and we are committed to sharing our insights as we build the Arctic family of models, to advance research and reduce the cost of LLM training and inference for everyone. Please check out our [ongoing cookbook releases](https://www.snowflake.com/en/data-cloud/arctic/cookbook/), where we will dive deeper into several areas crucial for training models like Arctic.
* [Exploring Mixture of Experts (MoE)](https://medium.com/snowflake/snowflake-arctic-cookbook-series-exploring-mixture-of-experts-moe-c7d6b8f14d16)
* [Building an Efficient Training System for Arctic](https://medium.com/snowflake/snowflake-arctic-cookbook-series-building-an-efficient-training-system-for-arctic-6658b9bdfcae)
* [Arctic’s Approach to Data](https://medium.com/snowflake/snowflake-arctic-cookbook-series-arctics-approach-to-data-b81a8a0958bd)
-* More coming soon..
-
-## Coming Soon
-
-Continue to watch this space we plan to frequently add new things here including:
-
-* Fine-tuning tutorials
-* Further improvements to inference performance
-* HFQuantizer support for DeepSpeed's FP Quantization
-* Upstreaming Arctic support for both transformers and vLLM