Fix Ollama GPT-OSS streaming with 'thinking' field #13375

Open

colesmcintosh wants to merge 4 commits into main

Conversation

colesmcintosh (Collaborator)

Title

Fix Ollama GPT-OSS streaming with 'thinking' field

Relevant issues

Fixes #13340

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it solves only 1 specific problem

Type

🐛 Bug Fix

Changes

Problem

Ollama GPT-OSS models were failing with APIConnectionError: Unable to parse ollama chunk whenever a streaming response contained a 'thinking' field alongside an empty 'response'. The chunk parser didn't handle this case, so streaming failed.

Solution

  • Added handling for chunks that contain a 'thinking' field with an empty 'response'
  • These chunks are treated as intermediate chunks that carry no user-facing content
  • Streaming continues until actual response content arrives

Code Changes

  • Modified OllamaTextCompletionResponseIterator.chunk_parser() in litellm/llms/ollama/completion/transformation.py
  • Added a condition for "thinking" in chunk and not chunk["response"] (see the sketch below)
  • Returns an empty GenericStreamingChunk for these intermediate chunks
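
For reviewers, a minimal sketch of the updated parser. Only the elif branch is the actual change; the import path, the GenericStreamingChunk keys (text / is_finished / finish_reason / usage / index / tool_use), and the surrounding branches are assumptions pieced together from the diff excerpt further down:

```python
from litellm.types.utils import GenericStreamingChunk


def chunk_parser(self, chunk: dict) -> GenericStreamingChunk:
    if chunk.get("done") is True:
        # Terminal chunk: close the stream.
        return GenericStreamingChunk(
            text=chunk.get("response", ""),
            is_finished=True,
            finish_reason="stop",
            usage=None,
            index=0,
            tool_use=None,
        )
    elif "thinking" in chunk and not chunk["response"]:
        # New branch: a reasoning-only chunk with no user-facing content yet.
        # Emit an empty text chunk so the stream keeps going instead of
        # raising "Unable to parse ollama chunk".
        return GenericStreamingChunk(
            text="",
            is_finished=False,
            finish_reason="",  # "" rather than None, per the MyPy fix commit below
            usage=None,
            index=0,
            tool_use=None,
        )
    elif "response" in chunk:
        # Ordinary content chunk.
        return GenericStreamingChunk(
            text=chunk["response"],
            is_finished=False,
            finish_reason="",
            usage=None,
            index=0,
            tool_use=None,
        )
    raise ValueError(f"Unable to parse ollama chunk: {chunk}")
```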

Tests Added

  • test_chunk_parser_with_thinking_field(): tests the exact problematic chunk from the issue (sketched after this list)
  • test_chunk_parser_normal_response(): ensures normal chunks still work
  • test_chunk_parser_done_chunk(): verifies done chunks work correctly
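
A sketch of how the first test could look; the iterator's constructor arguments here are assumptions, not taken from the PR:

```python
from litellm.llms.ollama.completion.transformation import (
    OllamaTextCompletionResponseIterator,
)


def test_chunk_parser_with_thinking_field():
    # Constructor arguments are assumed; only chunk_parser() is exercised.
    iterator = OllamaTextCompletionResponseIterator(
        streaming_response=iter([]), sync_stream=True
    )
    # The exact chunk reported in #13340.
    chunk = {
        "model": "gpt-oss:20b",
        "created_at": "2025-08-06T14:34:31.5276077Z",
        "response": "",
        "thinking": "User",
        "done": False,
    }
    parsed = iterator.chunk_parser(chunk)
    assert parsed["text"] == ""
    assert parsed["is_finished"] is False
    assert parsed["finish_reason"] == ""  # empty string per the later MyPy fix
```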

Verification

Tested with the exact chunk from the error:

```python
{'model': 'gpt-oss:20b', 'created_at': '2025-08-06T14:34:31.5276077Z', 'response': '', 'thinking': 'User', 'done': False}
```

Result: the chunk parses successfully and yields an empty text chunk, allowing the stream to continue.

- Handle chunks containing 'thinking' field with empty 'response'
- Treat these as intermediate chunks that don't contain user content
- Add comprehensive tests for chunk parsing scenarios
- Resolves APIConnectionError for GPT-OSS model streaming

Fixes BerriAI#13340

Fixed MyPy type error in the Ollama completion transformation where finish_reason
was set to None instead of the expected string type. Changed finish_reason=None to
finish_reason="" to match the GenericStreamingChunk TypedDict requirements.

Also updated corresponding test to expect empty string instead of None.
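
For context, a hedged sketch of the GenericStreamingChunk shape implied by that fix; the fields other than finish_reason are assumptions:

```python
from typing import Optional

from typing_extensions import Required, TypedDict


class GenericStreamingChunk(TypedDict, total=False):
    text: Required[str]
    is_finished: Required[bool]
    # Declared as str, not Optional[str], which is why finish_reason=None
    # failed MyPy and finish_reason="" is used instead.
    finish_reason: Required[str]
    usage: Optional[dict]
    index: int
```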
```diff
@@ -459,6 +459,15 @@ def chunk_parser(self, chunk: dict) -> GenericStreamingChunk:
                 finish_reason="stop",
                 usage=None,
             )
+        elif "thinking" in chunk and not chunk["response"]:
```
Contributor

@colesmcintosh can we return a ModelResponseStream instead?

This way we can include the reasoning content in the response and allow it to be displayed on chat UIs like OpenWebUI; see the openrouter implementation:

```python
def chunk_parser(self, chunk: dict) -> ModelResponseStream:
```
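
A rough sketch of what that suggestion could look like; ModelResponseStream, StreamingChoices, and Delta here follow litellm.types.utils, and surfacing 'thinking' via reasoning_content is modeled on the openrouter handler rather than copied from it:

```python
from litellm.types.utils import Delta, ModelResponseStream, StreamingChoices


def chunk_parser(self, chunk: dict) -> ModelResponseStream:
    # Map Ollama's 'thinking' onto reasoning_content so chat UIs such as
    # OpenWebUI can render it, keeping 'response' as the visible content.
    return ModelResponseStream(
        choices=[
            StreamingChoices(
                index=0,
                delta=Delta(
                    content=chunk.get("response") or None,
                    reasoning_content=chunk.get("thinking") or None,
                ),
                finish_reason="stop" if chunk.get("done") else None,
            )
        ]
    )
```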

Development

Successfully merging this pull request may close these issues:

[Bug]: ollama gpt-oss not working