Custom-LLM <think> tag handling #703

Open
odrobnik opened this issue May 15, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@odrobnik

When using custom models with Agents, they might return their reasoning between <think> tokens in the ChatCompletion response. Should this text possibly be turned into a reasoning output item? (With the caveat that we still need it as an input message on the next step.)

I think it would make using custom LLMs simpler if the reasoning part were separated from the actual final answer.

The problem is that the API only mentions a reasoning type "summary". Would we maybe want to invent a "detail" type just for local use?
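
For illustration, a minimal sketch of what such a split could look like on the raw ChatCompletion text; the helper name and return shape are hypothetical, not something the SDK provides today:

```python
import re
from typing import NamedTuple

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

class SplitMessage(NamedTuple):
    reasoning: str  # text found inside <think>...</think> blocks
    final: str      # visible answer with the <think> blocks removed

def split_think_tags(content: str) -> SplitMessage:
    """Separate <think> reasoning from the visible answer (hypothetical helper)."""
    reasoning = "\n".join(part.strip() for part in _THINK_RE.findall(content))
    final = _THINK_RE.sub("", content).strip()
    return SplitMessage(reasoning=reasoning, final=final)

# split_think_tags("<think>User wants X, so...</think>The answer is 42.")
# -> SplitMessage(reasoning='User wants X, so...', final='The answer is 42.')
```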

@odrobnik added the enhancement (New feature or request) label on May 15, 2025
@krrishdholakia

Hey @odrobnik, litellm should already be handling this - do you have a specific model called via litellm where you're seeing this behaviour?

@odrobnik (Author)

My use case is when you don't use litellm, like in custom_example_agent.py. In particular, I am referring to tracing, where it looks like this:

[screenshot of the trace]

So my feature suggestion is not to ADD it like litellm does, but rather to remove it from final_output and possibly move it into a separate field (like DeepSeek does) or even into a separate output item, the way Responses uses a reasoning item:

[screenshot]

At the very least, I would expect the reasoning to be logged separately in tracing, even if nothing else is changed. But final_output from the runner shouldn't contain reasoning, IMHO. The philosophy of Runner is to deal with Response objects, even when they are built from Chat Completion results. For regular Response-based running, the reasoning is not part of final_output, so for ChatCompletion-wrapped running it shouldn't be either.

They ARE useful for tracing, so that's why they could be included, but possibly in the same way that reasoning summaries are included for Response-based traces.
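
To make the caveat from the original post concrete, here is a hedged sketch of how the stripped reasoning could still be rebuilt into the assistant message for the next step, so the model keeps seeing its own <think> text while final_output stays clean. This is hypothetical, not current SDK behaviour, and the function name is made up:

```python
def to_next_turn_message(reasoning: str, final: str) -> dict:
    """Rebuild the assistant message for the next model call (hypothetical helper):
    the caller only ever exposes `final`, but the model still gets its reasoning back."""
    content = f"<think>{reasoning}</think>{final}" if reasoning else final
    return {"role": "assistant", "content": content}
```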
