@@ -21,19 +21,21 @@ You can download the following table to see the various parameters for your use
<div align="center">
| **Model**      | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :---------------: | :-------------------: | :----------------: | :----------: |
| Ling-lite-base | 16.8B | 2.75B | 64K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-lite-base) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-lite-base) |
| Ling-lite | 16.8B | 2.75B | 64K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-lite) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-lite) |
| Ling-plus-base | 290B | 28.8B | 64K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-plus-base) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-plus-base) |
| Ling-plus | 290B | 28.8B | 64K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-plus) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-plus) |

</div>
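The weights can also be fetched programmatically instead of through the links above. Below is a minimal sketch using `huggingface_hub` (an assumption on our side; the ModelScope SDK provides a similar `snapshot_download` helper), with the target directory chosen purely for illustration:

```python
from huggingface_hub import snapshot_download

# Download any model ID from the table above to a local directory.
local_dir = snapshot_download(
    repo_id="inclusionAI/Ling-lite",
    local_dir="./Ling-lite",  # illustrative path; adjust to your environment
)
print(f"Model files downloaded to: {local_dir}")
```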
## Evaluation
Detailed evaluation results are reported in our [technical report](https://github.com/inclusionAI/Ling/blob/master/Ling_Technical_Report_V1.pdf).
## Quickstart
### 🤗 Hugging Face Transformers
Here is a code snippet to show you how to use the chat model with `transformers`:
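The following is a minimal sketch of such a snippet, assuming the standard `transformers` chat-template API and the `inclusionAI/Ling-lite` checkpoint from the table above; the official example may differ in details such as generation settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ling-lite"

# Load the tokenizer and model (trust_remote_code is assumed here for any custom MoE code).
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat prompt using the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate and decode only the newly produced tokens.
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```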
@@ -80,17 +82,22 @@ If you're in mainland China, we strongly recommend you to use our model from
## Deployment
### vLLM
vLLM supports both offline batched inference and serving an OpenAI-compatible API for online inference.
#### Environment Preparation
Since the Pull Request (PR) has not yet been submitted to the vLLM community, please prepare the environment by following the steps below:
```bash
git clone -b v0.7.3 https://github.com/vllm-project/vllm.git
cd vllm
git apply Ling/inference/vllm/bailing_moe.patch
pip install -e .
```
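As an optional sanity check (not part of the original steps), you can confirm that the patched editable install is the one being imported:

```python
import vllm

# Should report the version you checked out above (v0.7.3 with the Bailing MoE patch applied).
print(vllm.__version__)
```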
#### Offline Inference:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
@@ -115,24 +122,27 @@ outputs = llm.generate([text], sampling_params)
```
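Each element of `outputs` is a vLLM `RequestOutput`; as a small follow-up sketch (not part of the original snippet), the generated text can be read back like this:

```python
# Print the completion produced for the single prompt submitted above.
for output in outputs:
    print(output.outputs[0].text)
```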
#### Online Inference:
```bash
vllm serve inclusionAI/Ling-lite \
--tensor-parallel-size 2 \
--pipeline-parallel-size 1 \
--use-v2-block-manager \
--gpu-memory-utilization 0.90
```
For detailed guidance, please refer to the vLLM [instructions](https://docs.vllm.ai/en/latest/).
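Once the server is up, it exposes an OpenAI-compatible endpoint. The sketch below queries it with the `openai` Python client, assuming the default port 8000 and no API key configured on the server:

```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="inclusionAI/Ling-lite",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)
```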
### MindIE
This section describes the main steps to run a Ling MoE model on Huawei NPU cards with the MindIE inference framework.
#### Hardware Requirements
- The MoE Plus model requires at least 2 Atlas 800I A2 (8\*64G) servers.
- The MoE Lite model requires at least 1 Atlas 800I A2 (8\*64G) server.
#### Configuration preparation
@@ -146,7 +156,8 @@ git clone git@github.com:inclusionAI/Ling.git
```
#### Machine network environment check
```bash
# Check the physical link
for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Check the links
@@ -164,7 +175,8 @@ for i in {0..7}; do hccn_tool -i $i -tls -s enable 0; done
```
#### Pull the image
Go to [Ascend Community/Development Resources](https://www.hiascend.com/developer/ascendhub) and pull the MindIE image.
Image version: 1.0.0-800I-A2-py311-openeuler24.03-lts
@@ -176,10 +188,8 @@ Component | Version |
| PTA | 6.0.0.beta1 |
| HDK | 24.1.0 |
#### Container startup and configuration changes
##### Start the container
Execute the following startup command (reference):
@@ -207,14 +217,17 @@ docker run -itd --privileged --name=container name --net=host \
mindie: 1.0.0-XXX-800I-A2-arm64-py3.11 (modified according to the name of the loaded image) \
bash
```
##### Download the model
In this example, we use ModelScope to download the model. Install ModelScope first:
```bash
pip install modelscope
```
Download the model:
```bash
# The download takes a long time and can be run in the background
nohup modelscope download --model inclusionAI/Ling-plus --local_dir /home/HwHiAiUser/Ascend/Ling_plus > /tmp/ling_plus.log 2>&1 &
@@ -289,11 +302,13 @@ bash /home/HwHiAiUser/Ascend/Ling/inference/mindie/patch_atb_llm.sh
#### Stand-alone Service-based Inference (Ling lite)
Set the underlying environment variables:
```bash
source /usr/local/Ascend/atb-models/set_env.sh
```
Set the MindIE configuration according to the model type:
```bash
# Ling Lite
cp /home/HwHiAiUser/Ascend/Ling/inference/mindie/lite/config.json /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
@@ -303,6 +318,7 @@ cp /home/HwHiAiUser/Ascend/Ling/inference/mindie/lite/config.base.json /usr/loca
```
Start the MindIE service:
```bash
chmod 640 /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
@@ -313,7 +329,8 @@ nohup ./bin/mindieservice_daemon > /tmp/service.log 2>&1 &
Check /tmp/service.log to see whether it contains `Daemon start success!`; if so, MindIE-Service has started successfully.
Test whether the service responds correctly:
```bash
# Chat model
wget -O- --post-data="{\"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"Who are you?\"}], \"stream\": false, \"max_tokens\": 100, \"model\": \"bailing_moe\", \"temperature\": 0}" \
--header='Content-Type:application/json' \
@@ -333,12 +350,14 @@ All of the following commands need to be executed simultaneously on all machines
To enable multi-machine service-based inference, you need to configure a multi-machine ranktable file.
- Get the IP address of each card (on the host)
```bash
for i in {0..7}; do hccn_tool -i $i -ip -g; done
```
- Configure `rank_table.json` in the following format and put it in `/root/models` so that it can be mounted to the container
```json
{
" server_count" : " ..." , # Total number of nodes
# The first server in the server_list is the primary node
@@ -364,7 +383,6 @@ for i in {0..7}; do hccn_tool -i $i -ip -g; done
Enter the container and run the following command:
```bash
# Set the basic environment variables:
source /home/HwHiAiUser/Ascend/Ling/inference/mindie/set_env.sh
@@ -391,6 +409,7 @@ export MIES_CONTAINER_IP=IP address of the container
```
Set the MindIE configuration according to the model type:
```bash
# Ling plus
cp /home/HwHiAiUser/Ascend/Ling/inference/mindie/plus/config.json /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
@@ -411,23 +430,26 @@ vim conf/config.json
```
To set the memory usage ratio:
```bash
export NPU_MEMORY_FRACTION=0.95
```
Start the service:
```bash
cd $MIES_INSTALL_PATH
nohup ./bin/mindieservice_daemon > /tmp/service.log 2>&1 &
```
When the command is executed, all the parameters used for this startup are printed first. Wait until the following output appears:
`Daemon start success!`
The service is considered to have started successfully.
Test whether the service responds correctly:
```bash
# Chat model
wget -O- --post-data="{\"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"Who are you?\"}], \"stream\": false, \"max_tokens\": 100, \"model\": \"bailing_moe\", \"temperature\": 0}" \
@@ -449,9 +471,9 @@ We use [`identity`](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/iden
```json
{
  "instruction": "hi",
  "input": "",
  "output": "Hello! I am Ling, an AI assistant developed by inclusionAI. How can I assist you today?"
}
```
@@ -462,7 +484,9 @@ llamafactory-cli train examples/sft/ling_full_sft.yaml
```
## License
This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling/blob/master/LICENCE).
## Citation
[TBD]