2 parents 35d38cb + 059fedc commit cfe0330
README.md
@@ -125,6 +125,19 @@ outputs = llm.generate([text], sampling_params)
```
+We use YaRN in vLLM to handle long contexts by adding a `rope_scaling` field to the model's `config.json` file. For example:
+
+```json
+{
+    ...,
+    "rope_scaling": {
+        "factor": 4.0,
+        "original_max_position_embeddings": 16384,
+        "type": "yarn"
+    }
+}
+```
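Editing `config.json` by hand works, but the change above can also be scripted. The sketch below is a minimal helper for patching the file in place; the function name `add_yarn_rope_scaling` and the example path are illustrative, not part of vLLM's API:

```python
import json
from pathlib import Path


def add_yarn_rope_scaling(config_path, factor=4.0, original_max=16384):
    """Insert a YaRN `rope_scaling` block into a model's config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Same fields as the JSON snippet shown in the diff above.
    config["rope_scaling"] = {
        "factor": factor,
        "original_max_position_embeddings": original_max,
        "type": "yarn",
    }
    path.write_text(json.dumps(config, indent=2))
    return config


# Usage (path is illustrative):
# add_yarn_rope_scaling("path/to/model/config.json")
```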
#### Online Inference:
```bash