Fix Ollama GPT-OSS streaming with 'thinking' field #13375
base: main
Conversation
- Handle chunks containing 'thinking' field with empty 'response'
- Treat these as intermediate chunks that don't contain user content
- Add comprehensive tests for chunk parsing scenarios
- Resolves APIConnectionError for GPT-OSS model streaming

Fixes BerriAI#13340
Fixed a MyPy type error in the Ollama completion transformation where `finish_reason` was set to `None` instead of the expected string type. Changed `finish_reason=None` to `finish_reason=""` to match the `GenericStreamingChunk` TypedDict requirements, and updated the corresponding test to expect an empty string instead of `None`.
```diff
@@ -459,6 +459,15 @@ def chunk_parser(self, chunk: dict) -> GenericStreamingChunk:
                 finish_reason="stop",
                 usage=None,
             )
+        elif "thinking" in chunk and not chunk["response"]:
```
@colesmcintosh can we return a ModelResponseStream instead?
This way we can include the reasoning content in the response and allow it to be displayed on chat UIs like OpenWebUI - see the openrouter implementation:
`def chunk_parser(self, chunk: dict) -> ModelResponseStream:`
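A rough sketch of what the reviewer's suggestion could look like, assuming `ModelResponseStream`, `StreamingChoices`, and `Delta` are available from `litellm.types.utils` and that `Delta` accepts a `reasoning_content` field (as in the openrouter-style handling referenced above); this is illustrative, not the reviewer's exact code:

```python
from litellm.types.utils import Delta, ModelResponseStream, StreamingChoices  # assumed import path


def chunk_parser(chunk: dict) -> ModelResponseStream:
    """Sketch: surface 'thinking' as reasoning_content instead of dropping it."""
    return ModelResponseStream(
        choices=[
            StreamingChoices(
                index=0,
                delta=Delta(
                    # User-visible text, if any.
                    content=chunk.get("response") or None,
                    # Expose the model's reasoning so chat UIs (e.g. OpenWebUI) can render it.
                    reasoning_content=chunk.get("thinking") or None,
                ),
                finish_reason="stop" if chunk.get("done") else None,
            )
        ],
    )
```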
Title
Fix Ollama GPT-OSS streaming with 'thinking' field
Relevant issues
Fixes #13340
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
- Unit tests pass via make test-unit
Type
🐛 Bug Fix
Changes
Problem
Ollama GPT-OSS models were failing with `APIConnectionError: Unable to parse ollama chunk` when streaming responses contained a `'thinking'` field with an empty `'response'`. The chunk parser didn't handle this specific case, causing streaming to fail.

Solution

- Handle chunks containing a `'thinking'` field with an empty `'response'`
Code Changes
- Modified `OllamaTextCompletionResponseIterator.chunk_parser()` in `litellm/llms/ollama/completion/transformation.py`
- Added a check for `"thinking" in chunk and not chunk["response"]`
- Returns an empty `GenericStreamingChunk` for these intermediate chunks

Tests Added
- `test_chunk_parser_with_thinking_field()`: Tests the exact problematic chunk from the issue
- `test_chunk_parser_normal_response()`: Ensures normal chunks still work
- `test_chunk_parser_done_chunk()`: Verifies done chunks work correctly
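A sketch of what the first of these tests might look like; the test-file location, the iterator's constructor arguments, and the dict-style assertions are assumptions based on the PR description rather than the exact test code:

```python
# Assumed location: tests/litellm/llms/ollama/completion/test_transformation.py
from litellm.llms.ollama.completion.transformation import (
    OllamaTextCompletionResponseIterator,  # class named in this PR
)


def test_chunk_parser_with_thinking_field():
    """The exact problematic chunk from issue #13340 should parse without raising."""
    # Constructor arguments are assumptions; only chunk_parser() is exercised here.
    iterator = OllamaTextCompletionResponseIterator(
        streaming_response=iter([]), sync_stream=True
    )
    chunk = {
        "model": "gpt-oss:20b",
        "created_at": "2025-08-06T14:34:31.5276077Z",
        "response": "",
        "thinking": "User",
        "done": False,
    }
    parsed = iterator.chunk_parser(chunk)
    # 'thinking'-only chunks should come back as empty, non-final text chunks.
    assert parsed["text"] == ""
    assert parsed["is_finished"] is False
```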
Verification

Tested with the exact chunk from the error:
{'model': 'gpt-oss:20b', 'created_at': '2025-08-06T14:34:31.5276077Z', 'response': '', 'thinking': 'User', 'done': False}
Result: Successfully parsed without errors; the parser returns an empty text chunk, allowing the stream to continue.
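For context, a minimal reproduction of the user-facing scenario this fixes; the model name and api_base are illustrative and assume a local Ollama server hosting gpt-oss:20b:

```python
import litellm

# Streaming a GPT-OSS model through Ollama's text completion route; before this
# fix, 'thinking'-only chunks raised "APIConnectionError: Unable to parse ollama chunk".
response = litellm.completion(
    model="ollama/gpt-oss:20b",          # illustrative model name
    api_base="http://localhost:11434",   # assumed local Ollama endpoint
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in response:
    # Intermediate 'thinking' chunks now arrive as empty-text deltas.
    print(chunk.choices[0].delta.content or "", end="")
```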