Commit f095f4a

add vllm infer
1 parent 176b663 commit f095f4a

2 files changed: +727 -0 lines changed

README.md

Lines changed: 45 additions & 0 deletions
@@ -80,6 +80,51 @@ If you're in mainland China, we strongly recommend you to use our model from
## Deployment

### vLLM

vLLM supports offline batched inference and can also launch an OpenAI-compatible API service for online inference.

#### Environment Preparation

Because the pull request (PR) has not yet been submitted to the vLLM community, prepare the environment with the steps below:
```bash
git clone -b v0.7.3 https://github.com/vllm-project/vllm.git
cd vllm
# The patch path is resolved relative to the current directory; adjust it to where the Ling repository is checked out
git apply Ling/inference/vllm/bailing_moe.patch
pip install -e .
```
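
Once the editable install finishes, a quick check can confirm that Python sees the build. The sketch below is only an illustration and is not part of the repository: the helper name `installed_version` is hypothetical, and it assumes the package is registered under the distribution name `vllm`. The reported string may carry a development suffix rather than a bare `0.7.3`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "vllm" is assumed to be the distribution name registered by `pip install -e .`
found = installed_version("vllm")
if found is None:
    print("vllm is not installed in this environment")
else:
    print(f"vllm {found} is installed")
```

If the first message is printed, rerun `pip install -e .` inside the cloned `vllm` directory, using the same Python environment that runs this check.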

#### Offline Inference:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Load the tokenizer so the model's chat template can be applied to the prompt
tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ling-lite")

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=512)

# Load the model; pass further LLM(...) keyword arguments here if your setup needs them
llm = LLM(model="inclusionAI/Ling-lite")
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ling, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]

# Render the chat messages into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
outputs = llm.generate([text], sampling_params)

# Print the generated completion for each prompt
for output in outputs:
    print(output.outputs[0].text)
```
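
Since `llm.generate` accepts a list of prompts, the same flow extends to several questions in one call. The following is a minimal sketch under that assumption, not the repository's own code: the helper names `build_chat` and `run_batch` are hypothetical, and the sampling values are copied from the offline example. vLLM and transformers are imported inside `run_batch` so that the message-building helper can be used on its own.

```python
def build_chat(prompt, system="You are Ling, an assistant created by inclusionAI"):
    """Build the chat message list for a single user prompt."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]


def run_batch(prompts, model="inclusionAI/Ling-lite"):
    """Generate one completion per prompt with a single llm.generate call."""
    # Imported lazily so build_chat stays usable without vLLM installed
    from transformers import AutoTokenizer
    from vllm import LLM, SamplingParams

    tokenizer = AutoTokenizer.from_pretrained(model)
    llm = LLM(model=model)
    params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=512)

    # Render every conversation into a prompt string with the model's chat template
    texts = [
        tokenizer.apply_chat_template(build_chat(p), tokenize=False, add_generation_prompt=True)
        for p in prompts
    ]
    outputs = llm.generate(texts, params)
    # Each result keeps its candidates in .outputs; take the first candidate's text
    return [out.outputs[0].text for out in outputs]
```

Calling `run_batch(["What is a mixture-of-experts model?", "Explain tensor parallelism briefly."])` would return a list with one reply per question, in the same order as the input.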

#### Online Inference:

```bash
VLLM_USE_V1=1 vllm serve inclusionAI/Ling-lite \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 1 \
    --use-v2-block-manager \
    --gpu-memory-utilization 0.90
```
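
With the server running, a client can query it over HTTP. The sketch below uses only the Python standard library and rests on a few assumptions: the server listens on vLLM's default address `http://localhost:8000`, the model is served under the name passed to `vllm serve`, and the helper names `build_chat_request` and `ask` are hypothetical. The sampling values are illustrative.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000", model="inclusionAI/Ling-lite"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # The reply follows the OpenAI chat completions response layout
    return body["choices"][0]["message"]["content"]
```

For example, `ask("Give me a short introduction to large language models.")` would return the generated text once the server has finished loading the model. Pass `base_url=...` if the server was started on a different host or port.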

For detailed guidance, refer to the vLLM [documentation](https://docs.vllm.ai/en/latest/).

### MindIE