common : add GLM-4.5 tool calling support #15186


Open. Wants to merge 1 commit into base: master.
Conversation

@dhandhalyabhavik dhandhalyabhavik commented Aug 8, 2025

  • Add COMMON_CHAT_FORMAT_GLM_4_5 format enum
  • Implement GLM-4.5 tool call parser for <tool_call><arg_key><arg_value> format
  • Add template detection based on <arg_key> and <arg_value> tags
  • Fix null content handling in message parsing and serialization
  • Ensure GLM-4.5 detection runs before Hermes to avoid misidentification

This enables tool calling functionality for GLM-4.5 models when using the --jinja flag. The parser handles GLM-4.5's XML-like tool call format with key-value argument pairs.
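To illustrate the parsing step, here is a small Python sketch that turns a GLM-4.5 style `<tool_call>` block into an OpenAI-style tool call. This is an illustrative sketch only, not the PR's actual implementation (which is C++ in common/chat.cpp); the function name and regexes here are hypothetical:

```python
import json
import re

def parse_glm_tool_call(text):
    """Parse one GLM-4.5 <tool_call> block into an OpenAI-style tool
    call dict. The function name is the text before the first
    <arg_key>; arguments come from <arg_key>/<arg_value> pairs."""
    m = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if not m:
        return None
    body = m.group(1)
    name = body.split("<arg_key>")[0].strip()
    keys = re.findall(r"<arg_key>(.*?)</arg_key>", body, re.DOTALL)
    vals = re.findall(r"<arg_value>(.*?)</arg_value>", body, re.DOTALL)
    args = dict(zip((k.strip() for k in keys), (v.strip() for v in vals)))
    # OpenAI-style tool calls carry arguments as a JSON string.
    return {"name": name, "arguments": json.dumps(args)}

example = (
    "<tool_call>get_weather\n"
    "<arg_key>city</arg_key>\n"
    "<arg_value>Paris</arg_value>\n"
    "</tool_call>"
)
print(parse_glm_tool_call(example))
```
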

Personally verified working with the following applications:

  • Cline
  • Roo Code
  • Kilo Code
  • Cherry Studio (MCP + Tool calling)

Unfortunately it was initially not working with the OpenAI SDK, because the Jinja template expects tool arguments as a dict while the OpenAI SDK sends them as JSON.

Now it works with the OpenAI SDK too. The issue above is fixed with the corrected Jinja template; the template works great with Cline as well, and I have tested it extensively.

Corrected Jinja template:

[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
    {%- if m.role == 'user' %}
        {%- set user_content = visible_text(m.content) -%}
        {%- if not ("tool_response" in user_content) %}
            {% set ns.last_user_index = loop.index0 -%}
        {%- endif -%}
    {%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ user_content }}
{%- if enable_thinking is defined and not enable_thinking -%}
{%- if not user_content.endswith("/nothink") -%}
{{- '/nothink' -}}
{%- endif -%}
{%- endif -%}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
    {%- set reasoning_content = m.reasoning_content %}
{%- else %}
    {%- if '</think>' in content %}
        {%- set think_parts = content.split('</think>') %}
        {%- if think_parts|length > 1 %}
            {%- set before_end_think = think_parts[0] %}
            {%- set after_end_think = think_parts[1] %}
            {%- set think_start_parts = before_end_think.split('<think>') %}
            {%- if think_start_parts|length > 1 %}
                {%- set reasoning_content = think_start_parts[-1].lstrip('\n') %}
            {%- endif %}
            {%- set content = after_end_think.lstrip('\n') %}
        {%- endif %}
    {%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
    {%- set tc = tc.function %}
{%- endif %}
{{ '\n<tool_call>' + tc.name }}
{% set _args = tc.arguments %}
{% for k, v in _args.items() %}
<arg_key>{{ k }}</arg_key>
<arg_value>{{ v | tojson if v is not string else v }}</arg_value>
{% endfor %}
</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
    {{- '<|observation|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- m.content }}
{{- '\n</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}

<tool_response>
{{ tr.output if tr.output is defined else tr }}
</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}
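For reference, the template's tool-call loop can be mirrored in a few lines of Python to see what text the model is trained to consume. This is an illustrative sketch only; `render_glm_tool_call` is a hypothetical helper, not part of the PR:

```python
import json

def render_glm_tool_call(name, arguments):
    """Mirror the template's tool-call loop: the function name after
    <tool_call>, then one <arg_key>/<arg_value> pair per argument.
    Strings are emitted raw, non-strings as JSON, matching the
    template's `v | tojson if v is not string else v` filter."""
    out = ["<tool_call>" + name]
    for k, v in arguments.items():
        out.append(f"<arg_key>{k}</arg_key>")
        out.append(f"<arg_value>{v if isinstance(v, str) else json.dumps(v)}</arg_value>")
    out.append("</tool_call>")
    return "\n".join(out)

print(render_glm_tool_call("get_weather", {"city": "Paris", "days": 3}))
```

which prints:

<tool_call>get_weather
<arg_key>city</arg_key>
<arg_value>Paris</arg_value>
<arg_key>days</arg_key>
<arg_value>3</arg_value>
</tool_call>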

@ggerganov @ngxson @slaren Please review and merge the PR. Thank you.

@ajunca

ajunca commented Aug 9, 2025

I tried the PR, and it fixes tool calling on GLM-4.5 Air (the unsloth version); tools get called correctly.
Then, though, this other problem (#15046) arises.

@dhandhalyabhavik
Author

I tried the PR, and it fixes tool calling on GLM-4.5 Air (the unsloth version); tools get called correctly. Then, though, this other problem (#15046) arises.

But it's a Qwen tool calling issue, right? I think once the other pending PRs are merged you should not see the issue.

@ajunca

ajunca commented Aug 9, 2025

Yeah, I don't think it's related to this specific PR, but the problem is shared with that Qwen tool calling issue.

@dhandhalyabhavik
Author

dhandhalyabhavik commented Aug 10, 2025

Cline

Works great now with Cline 💪.

Cherry Studio with MCP

Works great with MCP settings too 🔥.

Kilo Code

Works great.

@TNohSam

TNohSam commented Aug 10, 2025

Hey, quick thought — I might be misunderstanding this, but it looks like this PR will parse GLM’s XML-style tool calls and turn them into JSON tool_calls before they reach the client.

If that’s the case, projects like Roo Code (which currently only know how to handle XML tool calls) might suddenly stop recognizing the output from GLM models when running through llama.cpp.

Am I right about this?

@jfgonsalves

Does this template parse the thinking tags correctly? I'm getting my responses inline instead of in the reasoning_content field.

@bfroemel

Very nice!

#15162 aims to achieve the same for Qwen3 Coder; it only seems more mature/higher quality (using minja and letting it handle quoting/escaping of argument strings, storing the Jinja template in ./models/templates, and having test cases in ./tests/test-chat.cpp). Maybe @ochafik and @dhandhalyabhavik can sync up/collaborate and bring both PRs forward in a consistent way?

@dhandhalyabhavik
Author

dhandhalyabhavik commented Aug 11, 2025

Hello everyone, thanks for the insightful comments. Let me answer each of you.

@TNohSam There are two ways to implement tool calling:
(1) use an instruction-following template, write parsing code, and parse manually;
(2) OpenAI-compatible tool calling, where functions or tools are part of the chat request object <--- this is what people refer to when they say a model supports tool calling.

I have tested Roo Code just now; it is working fine. Both types of function/tool calling will work with the current PR.

@jfgonsalves enable reasoning_content via llama-server's flags; check the available flags.

@bfroemel sure. @ochafik, can you please review my added changes and help me merge this PR? I would really appreciate it. Thank you.

@dhandhalyabhavik
Author

@jfgonsalves

You can enable reasoning_content via flag.

There is parser logic common for all models that will do this job. Check out the code here

This PR has nothing to do with it. Thank you for pointing it out though.


@trilog-inc

I am still having trouble getting llama.cpp to identify the GLM-4.5 chat template. Am I missing something in my command?

srv params_from_: Chat format: Hermes 2 Pro

./build/bin/llama-server --model /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/Q5_K_M/GLM-4.5-Q5_K_M-00001-of-00006.gguf --alias glm-4.5 --no-webui --threads 44 --ctx-size 131072 --n-gpu-layers 94 -ot exps=CPU -ub 2048 -b 2048 --temp 0.6 --top-p 1.0 --flash-attn --host 0.0.0.0 --jinja --port 8099 --chat-template-file /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/template.jinja

@dhandhalyabhavik
Author

dhandhalyabhavik commented Aug 14, 2025

I am still having trouble getting llama.cpp to identify the GLM-4.5 chat template. Am I missing something in my command?

srv params_from_: Chat format: Hermes 2 Pro

./build/bin/llama-server --model /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/Q5_K_M/GLM-4.5-Q5_K_M-00001-of-00006.gguf --alias glm-4.5 --no-webui --threads 44 --ctx-size 131072 --n-gpu-layers 94 -ot exps=CPU -ub 2048 -b 2048 --temp 0.6 --top-p 1.0 --flash-attn --host 0.0.0.0 --jinja --port 8099 --chat-template-file /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/template.jinja

Hey @trilog-inc

I rebuilt and it's working fine for me. I see GLM 4.5 in my logs, and tool calling also works well.

srv params_from_: Chat format: GLM 4.5

I hope you have built it correctly: clone the repo -> switch to the common-glm45-tool-calls branch -> build -> test.

FYI, I have used GLM 4.5 Air for testing. Both should work, as I can see both of them have the same architecture & Jinja template.

I have used this command (re-copy the Jinja template; there is a modification recommended by a user):

./llama.cpp/build/bin/llama-server -hf unsloth/GLM-4.5-Air-GGUF:IQ2_M --alias GLM-4.5-Air-GPUs -c 60000 --host 0.0.0.0 -np 1 -ngl 999 -ts 72,28 -b 1024 -ub 256 --jinja --chat-template-file template/chat_template.jinja

@feliscat

feliscat commented Aug 18, 2025

Trying this PR, tools get called and return a response, but the model can't continue with them.

I do see the right chat format as expected in the logs and did build the PR correctly.

srv params_from_: Chat format: GLM 4.5

srv  log_server_r: request: GET /v1/models 192.168.1.56 200

got exception: {"code":500,"message":"Value is not callable: null at row 56, column 70:\n (...) ","type":"server_error"}

srv  log_server_r: request: POST /v1/chat/completions 192.168.1.56 500

I'm using https://huggingface.co/unsloth/GLM-4.5-Air-GGUF?chat_template=default as the chat template

Edit: Using the jinja template in the OP fixed the issue.

@jerrydeng

I had a similar problem. Not sure which chat template to use.

@feliscat

@jerrydeng

I copied the template from the OP's edit and used it, and now tool calling is working for me.

@jerrydeng

It worked like a charm, thanks

@ggerganov ggerganov requested a review from ochafik August 19, 2025 19:29
@jerrydeng

jerrydeng commented Aug 19, 2025

For Claude Code / OpenAI, this template would work:

{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}

{#--- SYSTEM + TOOLS (OpenAI-style JSON) ---#}
{%- if tools -%}
<|system|>
You can call functions ("tools") to complete tasks. Tools are provided as JSON below.
Respond NORMALLY unless a tool is needed.
If you decide to call a tool, output EXACTLY ONE LINE containing ONLY this JSON:

{"tool_call":{"name":"<function name>","arguments":{...}}}

Rules:
- The JSON must be strictly valid (double quotes, no trailing commas).
- Put all arguments under "arguments" as an object.
- Do not add commentary or extra text on that line.

TOOLS (OpenAI schema):
{{ tools | tojson }}
{%- endif -%}

{#--- find last user index for think gating ---#}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
  {%- if m.role == 'user' %}
    {%- set user_content = visible_text(m.content) -%}
    {%- if not ("tool_response" in user_content) %}
      {%- set ns.last_user_index = loop.index0 -%}
    {%- endif -%}
  {%- endif %}
{%- endfor %}

{#--- MESSAGE RENDERING ---#}
{%- for m in messages %}

  {%- if m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}

  {%- elif m.role == 'user' -%}
<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ user_content }}
{%- if enable_thinking is defined and not enable_thinking -%}
  {%- if not user_content.endswith("/nothink") -%}
/nothink
  {%- endif -%}
{%- endif -%}

  {%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set content = visible_text(m.content) %}
{%- set reasoning_content = '' %}

{# pull <think> block out of prior assistant messages if present #}
{%- if m.reasoning_content is string %}
  {%- set reasoning_content = m.reasoning_content %}
{%- else %}
  {%- if '</think>' in content %}
    {%- set think_parts = content.split('</think>') %}
    {%- if think_parts|length > 1 %}
      {%- set before_end_think = think_parts[0] %}
      {%- set after_end_think = think_parts[1] %}
      {%- set think_start_parts = before_end_think.split('<think>') %}
      {%- if think_start_parts|length > 1 %}
        {%- set reasoning_content = think_start_parts[-1].lstrip('\n') %}
      {%- endif %}
      {%- set content = after_end_think.lstrip('\n') %}
    {%- endif %}
  {%- endif %}
{%- endif %}

{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
<think>{{ reasoning_content.strip() }}</think>
{%- else -%}
<think></think>
{%- endif -%}

{# normal assistant text, if any #}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}

{# tool call ECHO SUPPORT (when upstream passed tool_calls back into history) #}
{%- if m.tool_calls %}
{%- for tc in m.tool_calls %}
{%- set f = tc.function if tc.function else tc %}
{%- set args = f.arguments if f.arguments else {} %}
{{ '\n{"tool_call":{"name":"' ~ f.name ~ '","arguments":' ~ (args if args is string else (args | tojson)) ~ '}}' }}
{%- endfor %}
{%- endif %}

  {%- elif m.role == 'tool' -%}
{# Tool outputs are shown as observations for the model to read #}
<|observation|>
{%- if m.content is string -%}
<tool_response>
{{ m.content }}
</tool_response>
{%- else -%}
{%- for tr in m.content %}
<tool_response>
{{ tr.output if tr.output is defined else tr }}
</tool_response>
{%- endfor -%}
{%- endif -%}

  {%- endif -%}
{%- endfor %}

{#--- generation prompt ---#}
{%- if add_generation_prompt -%}
<|assistant|>
{%- if enable_thinking is defined and not enable_thinking -%}<think></think>{%- endif -%}
{%- endif -%}

This worked with Claude Code and Claude Code Router.
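As a rough illustration, a client could recognize the single-line JSON format this template instructs the model to emit with a sketch like the following. This is a hypothetical helper for illustration only, not part of llama.cpp or any client listed here:

```python
import json

def extract_json_tool_call(line):
    """Parse the one-line {"tool_call": ...} format from the template
    above. Returns (name, arguments) for a tool-call line, or None
    for ordinary text or malformed JSON."""
    line = line.strip()
    if not line.startswith('{"tool_call"'):
        return None
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None
    call = obj.get("tool_call", {})
    return call.get("name"), call.get("arguments", {})

print(extract_json_tool_call(
    '{"tool_call":{"name":"get_weather","arguments":{"city":"Paris"}}}'
))
```
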


static void common_chat_parse_glm_4_5(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("<think>", "</think>");


Should this be builder.try_parse_reasoning("\n<think>", "</think>");?

Author


Hey @hksdpc255, thank you so much for pointing it out. Currently testing; will get back to you soon.

Author


Hey @hksdpc255

Update 1: You are right, we need to add \n before <think>. I have started seeing the reasoning_content field populated with thinking tokens. But everything else breaks: Roo Code, Cline and Kilo Code stop working. I am still debugging the issue.

@hksdpc255

hksdpc255 commented Aug 20, 2025

@jfgonsalves

You can enable reasoning_content via flag.

There is parser logic common for all models that will do this job. Check out the code here

This PR has nothing to do with it. Thank you for pointing it out though.


@dhandhalyabhavik I don't think the <think> appearing in content is related to the reasoning_content flag.

See: #15186 (comment)

I changed builder.try_parse_reasoning("<think>", "</think>"); in function common_chat_parse_glm_4_5 to builder.try_parse_reasoning("\n<think>", "</think>");, and then reasoning works with your chat template.
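A minimal Python analogue shows why the leading \n matters: the chat template emits '\n<think>' before the reasoning block, so a parser anchored at "<think>" never matches the start of the model output. This sketch is an assumption-level simplification of builder.try_parse_reasoning(), whose real logic is C++ inside llama.cpp:

```python
def try_parse_reasoning(output, start_tag, end_tag):
    """If the output begins with start_tag and contains end_tag,
    return (reasoning, remaining content); otherwise (None, output).
    Simplified sketch of the C++ helper's prefix-anchored behavior."""
    if output.startswith(start_tag) and end_tag in output:
        reasoning, _, rest = output.partition(end_tag)
        return reasoning[len(start_tag):], rest.lstrip("\n")
    return None, output

out = "\n<think>check the weather API</think>\nCalling the tool now."
# With "<think>" the leading newline prevents a match...
print(try_parse_reasoning(out, "<think>", "</think>"))
# ...while "\n<think>" matches and splits out the reasoning.
print(try_parse_reasoning(out, "\n<think>", "</think>"))
```
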

9 participants