Commit c2bc8c9

docs: improve README.md formatting and readability (inclusionAI#7)
Signed-off-by: Gaius <gaius.qi@gmail.com>
Signed-off-by: mingcheng <mingcheng@apache.org>
1 parent e50ade0 commit c2bc8c9

README.md: 44 additions & 20 deletions

You can download the following table to see the various parameters for your use case.

<div align="center">

| **Model**      | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :---------------: | :-------------------: | :----------------: | :----------: |
| Ling-lite-base | 16.8B             | 2.75B                 | 64K                | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-lite-base) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-lite-base) |
| Ling-lite      | 16.8B             | 2.75B                 | 64K                | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-lite) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-lite) |
| Ling-plus-base | 290B              | 28.8B                 | 64K                | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-plus-base) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-plus-base) |
| Ling-plus      | 290B              | 28.8B                 | 64K                | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-plus) <br>[🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-plus) |

</div>

## Evaluation

Detailed evaluation results are reported in our [technical report](https://github.com/inclusionAI/Ling/blob/master/Ling_Technical_Report_V1.pdf).

## Quickstart

### 🤗 Hugging Face Transformers

Here is a code snippet to show you how to use the chat model with `transformers`:
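
For reference, a minimal sketch of the standard `transformers` chat workflow; the checkpoint name and generation settings here are illustrative, and `trust_remote_code` is assumed to be needed for the custom MoE architecture:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ling-lite"  # any chat model from the table above

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # assumed: the Bailing MoE architecture ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Render the chat template and append the assistant generation prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```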

If you're in mainland China, we strongly recommend using our model from 🤖 ModelScope.

## Deployment

### vLLM

vLLM supports offline batched inference or launching an OpenAI-compatible API service for online inference.

#### Environment Preparation

Since the Pull Request (PR) has not yet been submitted to the vLLM community, please prepare the environment by following the steps below:

```bash
git clone -b v0.7.3 https://github.com/vllm-project/vllm.git
cd vllm
git apply Ling/inference/vllm/bailing_moe.patch
pip install -e .
```

#### Offline Inference:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ling-lite")

# Sampling settings here are illustrative; tune them for your workload.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)

llm = LLM(model="inclusionAI/Ling-lite")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], sampling_params)
```

#### Online Inference:

```bash
vllm serve inclusionAI/Ling-lite \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 1 \
    --use-v2-block-manager \
    --gpu-memory-utilization 0.90
```

For detailed guidance, please refer to the vLLM [instructions](https://docs.vllm.ai/en/latest/).
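
Once the server is up, any OpenAI-compatible client can query it. A minimal sketch with the `openai` Python package, assuming vLLM's default port 8000 and no API key configured:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on http://localhost:8000/v1 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="inclusionAI/Ling-lite",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
)
print(response.choices[0].message.content)
```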

### MindIE

This topic describes the main steps to run a Ling MoE model on Huawei NPU cards with the MindIE inference framework.

#### Hardware Requirements

- The MoE Plus model requires at least 2 Atlas 800I A2 (8\*64G) servers.
- The MoE Lite model requires at least 1 Atlas 800I A2 (8\*64G) server.

#### Configuration preparation

```bash
git clone git@github.com:inclusionAI/Ling.git
```

#### Machine network environment check

```bash
# Check the physical link
for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Check the links
...
# Turn off TLS verification
for i in {0..7}; do hccn_tool -i $i -tls -s enable 0; done
```

#### Pull the image

Go to [Ascend Community/Development Resources](https://www.hiascend.com/developer/ascendhub) and pull the MindIE image.

Image version: 1.0.0-800I-A2-py311-openeuler24.03-lts

| Component | Version     |
| :-------- | :---------- |
| PTA       | 6.0.0.beta1 |
| HDK       | 24.1.0      |

#### Container startup and configuration changes

##### Start the container

Execute the following startup command (reference):

```bash
# Use the tag of the image you loaded for the mindie image below.
docker run -itd --privileged --name=<container name> --net=host \
    ...
    mindie:1.0.0-XXX-800I-A2-arm64-py3.11 \
    bash
```

##### Download the model

In this case, we use ModelScope to download the model. Install ModelScope first:

```bash
pip install modelscope
```

Download the model:

```bash
# The model takes a long time to download and can be fetched in the background
nohup modelscope download --model inclusionAI/Ling-plus --local_dir /home/HwHiAiUser/Ascend/Ling_plus > /tmp/ling_plus.log 2>&1 &
```
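
Equivalently, the download can be scripted through ModelScope's Python API; a minimal sketch, assuming a recent `modelscope` release in which `snapshot_download` accepts `local_dir`:

```python
from modelscope import snapshot_download

# Mirrors the CLI call above; an interrupted download resumes on retry.
model_dir = snapshot_download(
    "inclusionAI/Ling-plus",
    local_dir="/home/HwHiAiUser/Ascend/Ling_plus",
)
print(f"Model downloaded to {model_dir}")
```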

#### Stand-alone service-based inference (Ling lite)

Set the underlying environment variables:

```bash
source /usr/local/Ascend/atb-models/set_env.sh
```

Set different MindIE configurations according to the model type:

```bash
# Ling Lite
cp /home/HwHiAiUser/Ascend/Ling/inference/mindie/lite/config.json /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
# Ling Lite base
cp /home/HwHiAiUser/Ascend/Ling/inference/mindie/lite/config.base.json /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
```

Start the MindIE service:

```bash
chmod 640 /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
# The daemon is started from the MindIE service directory
cd /usr/local/Ascend/mindie/latest/mindie-service
nohup ./bin/mindieservice_daemon > /tmp/service.log 2>&1 &
```

Check /tmp/service.log for the output `Daemon start success!`; if it appears, MindIE-Service has started successfully.

Test if the request is correct:

```bash
# Chat model
wget -O- --post-data="{\"messages\":[{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"Who are you?\"}], \"stream\": false, \"max_tokens\":100, \"model\": \"bailing_moe\", \"temperature\":0}" \
    --header='Content-Type:application/json' \
    ...
```

All of the following commands need to be executed simultaneously on all machines.

To enable multi-machine service-based inference, you need to configure a multi-machine ranktable file.

- Get the IP address of each card (on the host):

```bash
for i in {0..7}; do hccn_tool -i $i -ip -g; done
```

- Configure `rank_table.json` in the following format and put it in `/root/models` so that it can be mounted into the container:

```json
{
    "server_count": "...",  # Total number of nodes
    # The first server in the server_list is the primary node
    ...
}
```

Enter the container and run the following command:

```bash
# Set the basic environment variables:
source /home/HwHiAiUser/Ascend/Ling/inference/mindie/set_env.sh
...
export MIES_CONTAINER_IP=<IP address of the container>
```

Set different MindIE configurations according to the model type:

```bash
# Ling Plus
cp /home/HwHiAiUser/Ascend/Ling/inference/mindie/plus/config.json /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
...
vim conf/config.json
```

To set the memory usage ratio:

```bash
export NPU_MEMORY_FRACTION=0.95
```

Start the service:

```bash
cd $MIES_INSTALL_PATH
nohup ./bin/mindieservice_daemon > /tmp/service.log 2>&1 &
```

When the command is executed, all the parameters used for this startup are printed first. Once the following output appears:

`Daemon start success!`

the service is considered to have started successfully.

Test if the request is correct:

```bash
# Chat model
wget -O- --post-data="{\"messages\":[{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"Who are you?\"}], \"stream\": false, \"max_tokens\":100, \"model\": \"bailing_moe\", \"temperature\":0}" \
    ...
```

We use the [`identity`](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/identity.json) dataset as a demonstration:

```json
{
    "instruction": "hi",
    "input": "",
    "output": "Hello! I am Ling, an AI assistant developed by inclusionAI. How can I assist you today?"
}
```
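
Each record follows the Alpaca-style `instruction`/`input`/`output` schema. As a quick sanity check before training, you can verify that the file parses and that every record carries those keys; a minimal sketch, assuming the dataset sits at LLaMA-Factory's `data/identity.json` (the path is illustrative):

```python
import json

# Illustrative path; point this at wherever your customized dataset lives.
DATASET_PATH = "data/identity.json"

with open(DATASET_PATH, encoding="utf-8") as f:
    records = json.load(f)

# Every Alpaca-style record needs these three keys.
required = {"instruction", "input", "output"}
for i, record in enumerate(records):
    missing = required - record.keys()
    if missing:
        raise ValueError(f"record {i} is missing {missing}")

print(f"{len(records)} records look well-formed")
```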

```bash
llamafactory-cli train examples/sft/ling_full_sft.yaml
```

## License

This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling/blob/master/LICENCE).

## Citation

[TBD]
