2 parents 35d38cb + 059fedc commit cfe0330
README.md
@@ -125,6 +125,19 @@ outputs = llm.generate([text], sampling_params)
```
+We use YaRN in vLLM to handle long contexts by adding a `rope_scaling` field to the model's `config.json` file. For example:
+
+```json
+{
+    ...,
+    "rope_scaling": {
+        "factor": 4.0,
+        "original_max_position_embeddings": 16384,
+        "type": "yarn"
+    }
+}
+```
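Editing `config.json` by hand works, but the change above can also be scripted. The sketch below is a minimal helper for patching the file in place; the function name `add_yarn_rope_scaling` and the example path are illustrative, not part of vLLM's API:

```python
import json
from pathlib import Path


def add_yarn_rope_scaling(config_path, factor=4.0, original_max=16384):
    """Insert a YaRN `rope_scaling` block into a model's config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Same fields as the JSON snippet shown in the diff above.
    config["rope_scaling"] = {
        "factor": factor,
        "original_max_position_embeddings": original_max,
        "type": "yarn",
    }
    path.write_text(json.dumps(config, indent=2))
    return config


# Usage (path is illustrative):
# add_yarn_rope_scaling("path/to/model/config.json")
```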
#### Online Inference:
```bash