common : add GLM-4.5 tool calling support #15186
base: master
Conversation
- Add `COMMON_CHAT_FORMAT_GLM_4_5` format enum
- Implement GLM-4.5 tool call parser for the `<tool_call><arg_key><arg_value>` format
- Add template detection based on `<arg_key>` and `<arg_value>` tags
- Fix null content handling in message parsing and serialization
- Ensure GLM-4.5 detection runs before Hermes to avoid misidentification

This enables tool calling for GLM-4.5 models when using the `--jinja` flag. The parser handles GLM-4.5's XML-like tool call format with key-value argument pairs.
I tried the PR, and it fixes tool calling on GLM 4.5 Air (unsloth version); tools now get called correctly.
But that's the Qwen tool calling issue, right? I think once the other pending PRs are merged you should not see it.
Yeah, I don't think it's related to this specific PR, but the problem is shared with the Qwen tool calling issue.
Hey, quick thought: I might be misunderstanding this, but it looks like this PR will parse GLM's XML-style tool calls and turn them into JSON. If that's the case, projects like Roo Code (which currently only know how to handle XML tool calls) might suddenly stop recognizing the output from GLM models when running through llama.cpp. Am I right about this?
Does this template parse the thinking tags correctly? I'm getting my responses inline instead of in the reasoning_content field.
Very nice! #15162 aims to achieve the same for Qwen3 Coder, only it seems more mature/higher quality (using minja and letting it handle quoting/escaping of argument strings, storing the Jinja template in ./models/templates, having test cases in ./tests/test-chat.cpp). Maybe @ochafik and @dhandhalyabhavik can sync up/collaborate and bring both PRs forward in a consistent way?
Hello everyone, thanks for the insightful comments. Let me answer all of you. @TNohSam: there are two ways to implement tool calling; I have tested Roo Code just now, and it is working fine. Both types of function/tool calling will work with the current PR. @jfgonsalves: enable @bfroemel: sure. @ochafik, can you please review my added changes and help me merge this PR? I would really appreciate it. Thank you.
You can enable There is parser logic common to all models that will do this job. Check out the code here. This PR has nothing to do with it. Thank you for pointing it out, though.
I am still having trouble getting llama.cpp to identify the GLM-4.5 chat template. Am I missing something in my command?

srv params_from_: Chat format: Hermes 2 Pro

./build/bin/llama-server --model /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/Q5_K_M/GLM-4.5-Q5_K_M-00001-of-00006.gguf --alias glm-4.5 --no-webui --threads 44 --ctx-size 131072 --n-gpu-layers 94 -ot exps=CPU -ub 2048 -b 2048 --temp 0.6 --top-p 1.0 --flash-attn --host 0.0.0.0 --jinja --port 8099 --chat-template-file /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/template.jinja
Hey @trilog-inc, I rebuilt and it's working fine for me. I see GLM 4.5 in my logs, and tool calling also works well.
I hope you have built it correctly using FYI, I have used GLM 4.5 Air for testing. Both should work, as I can see both of them have the same arch & Jinja template. I have used this command (re-copy the Jinja template; there is a modification recommended by a user).
Trying this PR, tools get called and get a response, but the model can't continue with them. I do see the right chat format as expected in the logs and did build the PR correctly.
I'm using https://huggingface.co/unsloth/GLM-4.5-Air-GGUF?chat_template=default as the chat template.
Edit: Using the Jinja template in the OP fixed the issue.
I had a similar problem. Not sure what chat template to use.
I copied the template from the OP's edit and used it, and now tool calling is working for me. |
It worked like a charm, thanks.
For Claude Code / OpenAI, this template would work.
This worked with Claude Code and Claude Code Router. |
static void common_chat_parse_glm_4_5(common_chat_msg_parser & builder) {
    builder.try_parse_reasoning("<think>", "</think>");
Should this be `builder.try_parse_reasoning("\n<think>", "</think>");` ?
Hey @hksdpc255, thank you so much for pointing it out. Currently testing; will get back to you soon.
Hey @hksdpc255
Update 1: You are right, we need to add the \n before it. I have started seeing the reasoning_content variable with thinking tokens, but everything else breaks: Roo Code, Cline and Kilo Code stop working. I am still debugging the issue.
@dhandhalyabhavik I don't think the See: #15186 (comment) I changed
Personally verified as working on the following applications:
Unfortunately it's not working with the OpenAI API SDK, because Jinja requires a dict parser but OpenAI requires JSON. Update: now works with the OpenAI SDK too.
The above issue is now fixed with the corrected Jinja template. The template works great with Cline too; I tested it extensively.
Corrected Jinja template.
@ggerganov @ngxson @slaren Please review and merge the PR. Thank you.