common : add GLM-4.5 tool calling support #15186


Open. Wants to merge 1 commit into base: master.
Conversation

@dhandhalyabhavik dhandhalyabhavik commented Aug 8, 2025

  • Add COMMON_CHAT_FORMAT_GLM_4_5 format enum
  • Implement GLM-4.5 tool call parser for <tool_call><arg_key><arg_value> format
  • Add template detection based on <arg_key> and <arg_value> tags
  • Fix null content handling in message parsing and serialization
  • Ensure GLM-4.5 detection runs before Hermes to avoid misidentification

This enables tool calling functionality for GLM-4.5 models when using the --jinja flag. The parser handles GLM-4.5's XML-like tool call format with key-value argument pairs.
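To illustrate the parsing step, here is a small Python sketch that turns a GLM-4.5 style `<tool_call>` block into an OpenAI-style tool call. This is an illustrative sketch only, not the PR's actual implementation (which is C++ in common/chat.cpp); the function name and regexes here are hypothetical:

```python
import json
import re

def parse_glm_tool_call(text):
    """Parse one GLM-4.5 <tool_call> block into an OpenAI-style tool
    call dict. The function name is the text before the first
    <arg_key>; arguments come from <arg_key>/<arg_value> pairs."""
    m = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if not m:
        return None
    body = m.group(1)
    name = body.split("<arg_key>")[0].strip()
    keys = re.findall(r"<arg_key>(.*?)</arg_key>", body, re.DOTALL)
    vals = re.findall(r"<arg_value>(.*?)</arg_value>", body, re.DOTALL)
    args = dict(zip((k.strip() for k in keys), (v.strip() for v in vals)))
    # OpenAI-style tool calls carry arguments as a JSON string.
    return {"name": name, "arguments": json.dumps(args)}

example = (
    "<tool_call>get_weather\n"
    "<arg_key>city</arg_key>\n"
    "<arg_value>Paris</arg_value>\n"
    "</tool_call>"
)
print(parse_glm_tool_call(example))
```
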

Personally verified working with the following applications:

  • Cline
  • Roo Code
  • Kilo Code
  • Cherry Studio (MCP + Tool calling)

Unfortunately it was initially not working with the OpenAI SDK, because the Jinja template expects tool arguments as a dict while the OpenAI SDK sends them as JSON.

Now it works with the OpenAI SDK too. The issue above is fixed with the corrected Jinja template; the template works great with Cline as well, and I have tested it extensively.

Corrected Jinja template:

[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
    {%- if m.role == 'user' %}
        {%- set user_content = visible_text(m.content) -%}
        {%- if not ("tool_response" in user_content) %}
            {% set ns.last_user_index = loop.index0 -%}
        {%- endif -%}
    {%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ user_content }}
{%- if enable_thinking is defined and not enable_thinking -%}
{%- if not user_content.endswith("/nothink") -%}
{{- '/nothink' -}}
{%- endif -%}
{%- endif -%}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
    {%- set reasoning_content = m.reasoning_content %}
{%- else %}
    {%- if '</think>' in content %}
        {%- set think_parts = content.split('</think>') %}
        {%- if think_parts|length > 1 %}
            {%- set before_end_think = think_parts[0] %}
            {%- set after_end_think = think_parts[1] %}
            {%- set think_start_parts = before_end_think.split('<think>') %}
            {%- if think_start_parts|length > 1 %}
                {%- set reasoning_content = think_start_parts[-1].lstrip('\n') %}
            {%- endif %}
            {%- set content = after_end_think.lstrip('\n') %}
        {%- endif %}
    {%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
    {%- set tc = tc.function %}
{%- endif %}
{{ '\n<tool_call>' + tc.name }}
{% set _args = tc.arguments %}
{% for k, v in _args.items() %}
<arg_key>{{ k }}</arg_key>
<arg_value>{{ v | tojson if v is not string else v }}</arg_value>
{% endfor %}
</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
    {{- '<|observation|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- m.content }}
{{- '\n</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}

<tool_response>
{{ tr.output if tr.output is defined else tr }}
</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}
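For reference, the template's tool-call loop can be mirrored in a few lines of Python to see what text the model is trained to consume. This is an illustrative sketch only; `render_glm_tool_call` is a hypothetical helper, not part of the PR:

```python
import json

def render_glm_tool_call(name, arguments):
    """Mirror the template's tool-call loop: the function name after
    <tool_call>, then one <arg_key>/<arg_value> pair per argument.
    Strings are emitted raw, non-strings as JSON, matching the
    template's `v | tojson if v is not string else v` filter."""
    out = ["<tool_call>" + name]
    for k, v in arguments.items():
        out.append(f"<arg_key>{k}</arg_key>")
        out.append(f"<arg_value>{v if isinstance(v, str) else json.dumps(v)}</arg_value>")
    out.append("</tool_call>")
    return "\n".join(out)

print(render_glm_tool_call("get_weather", {"city": "Paris", "days": 3}))
```

which prints:

<tool_call>get_weather
<arg_key>city</arg_key>
<arg_value>Paris</arg_value>
<arg_key>days</arg_key>
<arg_value>3</arg_value>
</tool_call>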

@ggerganov @ngxson @slaren Please review and merge the PR. Thank you.

@ajunca

ajunca commented Aug 9, 2025

I tried the PR, and it fixes tool calling on GLM-4.5 Air (the unsloth version); tools get called correctly.
Then, though, this other problem (#15046) arises.

@dhandhalyabhavik
Author

I tried the PR, and it fixes tool calling on GLM-4.5 Air (the unsloth version); tools get called correctly. Then, though, this other problem (#15046) arises.

But it's a Qwen tool calling issue, right? I think once the other pending PRs are merged you should not see the issue.

@ajunca

ajunca commented Aug 9, 2025

Yeah, I don't think it's related to this specific PR, but the problem is shared with that Qwen tool calling issue.

@dhandhalyabhavik
Author

dhandhalyabhavik commented Aug 10, 2025

Cline

Works great now with Cline 💪.

Cherry Studio with MCP

Works great with MCP settings too 🔥.

Kilo Code

Works great.

@TNohSam

TNohSam commented Aug 10, 2025

Hey, quick thought — I might be misunderstanding this, but it looks like this PR will parse GLM’s XML-style tool calls and turn them into JSON tool_calls before they reach the client.

If that’s the case, projects like Roo Code (which currently only know how to handle XML tool calls) might suddenly stop recognizing the output from GLM models when running through llama.cpp.

Am I right about this?

@jfgonsalves

Does this template parse the thinking tags correctly? I'm getting my responses inline instead of in the reasoning_content field.

@bfroemel

Very nice!

#15162 aims to achieve the same for Qwen3 Coder; it only seems more mature/higher quality (using minja and letting it handle quoting/escaping of argument strings, storing the Jinja template in ./models/templates, and having test cases in ./tests/test-chat.cpp). Maybe @ochafik and @dhandhalyabhavik can sync up/collaborate and bring both PRs forward in a consistent way?

@dhandhalyabhavik
Author

dhandhalyabhavik commented Aug 11, 2025

Hello everyone, thanks for the insightful comments. Let me answer each of you.

@TNohSam There are two ways to implement tool calling:
(1) use an instruction-following template, write parsing code, and parse manually;
(2) OpenAI-compatible tool calling, where functions or tools are part of the chat request object <--- this is what people refer to when they say a model supports tool calling.

I have tested Roo Code just now; it is working fine. Both types of function/tool calling will work with the current PR.

@jfgonsalves enable reasoning_content via llama-server's flags; check the available flags.

@bfroemel sure. @ochafik, can you please review my added changes and help me merge this PR? I would really appreciate it. Thank you.

@dhandhalyabhavik
Author

@jfgonsalves

You can enable reasoning_content via flag.

There is parser logic common for all models that will do this job. Check out the code here

This PR has nothing to do with it. Thank you for pointing it out though.


@trilog-inc

I am still having trouble getting llama.cpp to identify the GLM-4.5 chat template. Am I missing something in my command?

srv params_from_: Chat format: Hermes 2 Pro

./build/bin/llama-server --model /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/Q5_K_M/GLM-4.5-Q5_K_M-00001-of-00006.gguf --alias glm-4.5 --no-webui --threads 44 --ctx-size 131072 --n-gpu-layers 94 -ot exps=CPU -ub 2048 -b 2048 --temp 0.6 --top-p 1.0 --flash-attn --host 0.0.0.0 --jinja --port 8099 --chat-template-file /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/template.jinja

@dhandhalyabhavik
Author

dhandhalyabhavik commented Aug 14, 2025

I am still having trouble getting llama.cpp to identify the GLM-4.5 chat template. Am I missing something in my command?

srv params_from_: Chat format: Hermes 2 Pro

./build/bin/llama-server --model /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/Q5_K_M/GLM-4.5-Q5_K_M-00001-of-00006.gguf --alias glm-4.5 --no-webui --threads 44 --ctx-size 131072 --n-gpu-layers 94 -ot exps=CPU -ub 2048 -b 2048 --temp 0.6 --top-p 1.0 --flash-attn --host 0.0.0.0 --jinja --port 8099 --chat-template-file /mnt/home_extend/models/unsloth_GLM-4.5-GGUF/template.jinja

Hey @trilog-inc

I rebuilt and it's working fine for me. I see GLM 4.5 in my logs, and tool calling also works well.

srv params_from_: Chat format: GLM 4.5

I hope you have built it correctly: clone the repo -> switch to the common-glm45-tool-calls branch -> build -> test.

FYI, I have used GLM 4.5 Air for testing. Both should work, as I can see both of them have the same architecture & Jinja template.

I have used this command (re-copy the Jinja template; there is a modification recommended by a user):

./llama.cpp/build/bin/llama-server -hf unsloth/GLM-4.5-Air-GGUF:IQ2_M --alias GLM-4.5-Air-GPUs -c 60000 --host 0.0.0.0 -np 1 -ngl 999 -ts 72,28 -b 1024 -ub 256 --jinja --chat-template-file template/chat_template.jinja

@feliscat

feliscat commented Aug 18, 2025

Trying this PR, tools get called and return a response, but the model can't continue with them.

I do see the right chat format as expected in the logs and did build the PR correctly.

srv params_from_: Chat format: GLM 4.5

srv  log_server_r: request: GET /v1/models 192.168.1.56 200

got exception: {"code":500,"message":"Value is not callable: null at row 56, column 70:\n (...) ","type":"server_error"}

srv  log_server_r: request: POST /v1/chat/completions 192.168.1.56 500

I'm using https://huggingface.co/unsloth/GLM-4.5-Air-GGUF?chat_template=default as the chat template

Edit: Using the jinja template in the OP fixed the issue.

@jerrydeng

I had a similar problem. Not sure which chat template to use.

@feliscat

@jerrydeng

I copied the template from the OP's edit and used it, and now tool calling is working for me.

@jerrydeng

It worked like a charm, thanks

@ggerganov ggerganov requested a review from ochafik August 19, 2025 19:29
@jerrydeng

jerrydeng commented Aug 19, 2025

For Claude Code / OpenAI, this template would work:

{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}

{#--- SYSTEM + TOOLS (OpenAI-style JSON) ---#}
{%- if tools -%}
<|system|>
You can call functions ("tools") to complete tasks. Tools are provided as JSON below.
Respond NORMALLY unless a tool is needed.
If you decide to call a tool, output EXACTLY ONE LINE containing ONLY this JSON:

{"tool_call":{"name":"<function name>","arguments":{...}}}

Rules:
- The JSON must be strictly valid (double quotes, no trailing commas).
- Put all arguments under "arguments" as an object.
- Do not add commentary or extra text on that line.

TOOLS (OpenAI schema):
{{ tools | tojson }}
{%- endif -%}

{#--- find last user index for think gating ---#}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
  {%- if m.role == 'user' %}
    {%- set user_content = visible_text(m.content) -%}
    {%- if not ("tool_response" in user_content) %}
      {%- set ns.last_user_index = loop.index0 -%}
    {%- endif -%}
  {%- endif %}
{%- endfor %}

{#--- MESSAGE RENDERING ---#}
{%- for m in messages %}

  {%- if m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}

  {%- elif m.role == 'user' -%}
<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ user_content }}
{%- if enable_thinking is defined and not enable_thinking -%}
  {%- if not user_content.endswith("/nothink") -%}
/nothink
  {%- endif -%}
{%- endif -%}

  {%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set content = visible_text(m.content) %}
{%- set reasoning_content = '' %}

{# pull <think> block out of prior assistant messages if present #}
{%- if m.reasoning_content is string %}
  {%- set reasoning_content = m.reasoning_content %}
{%- else %}
  {%- if '</think>' in content %}
    {%- set think_parts = content.split('</think>') %}
    {%- if think_parts|length > 1 %}
      {%- set before_end_think = think_parts[0] %}
      {%- set after_end_think = think_parts[1] %}
      {%- set think_start_parts = before_end_think.split('<think>') %}
      {%- if think_start_parts|length > 1 %}
        {%- set reasoning_content = think_start_parts[-1].lstrip('\n') %}
      {%- endif %}
      {%- set content = after_end_think.lstrip('\n') %}
    {%- endif %}
  {%- endif %}
{%- endif %}

{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
<think>{{ reasoning_content.strip() }}</think>
{%- else -%}
<think></think>
{%- endif -%}

{# normal assistant text, if any #}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}

{# tool call ECHO SUPPORT (when upstream passed tool_calls back into history) #}
{%- if m.tool_calls %}
{%- for tc in m.tool_calls %}
{%- set f = tc.function if tc.function else tc %}
{%- set args = f.arguments if f.arguments else {} %}
{{ '\n{"tool_call":{"name":"' ~ f.name ~ '","arguments":' ~ (args if args is string else (args | tojson)) ~ '}}' }}
{%- endfor %}
{%- endif %}

  {%- elif m.role == 'tool' -%}
{# Tool outputs are shown as observations for the model to read #}
<|observation|>
{%- if m.content is string -%}
<tool_response>
{{ m.content }}
</tool_response>
{%- else -%}
{%- for tr in m.content %}
<tool_response>
{{ tr.output if tr.output is defined else tr }}
</tool_response>
{%- endfor -%}
{%- endif -%}

  {%- endif -%}
{%- endfor %}

{#--- generation prompt ---#}
{%- if add_generation_prompt -%}
<|assistant|>
{%- if enable_thinking is defined and not enable_thinking -%}<think></think>{%- endif -%}
{%- endif -%}

This worked with Claude Code and Claude Code Router.
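As a rough illustration, a client could recognize the single-line JSON format this template instructs the model to emit with a sketch like the following. This is a hypothetical helper for illustration only, not part of llama.cpp or any client listed here:

```python
import json

def extract_json_tool_call(line):
    """Parse the one-line {"tool_call": ...} format from the template
    above. Returns (name, arguments) for a tool-call line, or None
    for ordinary text or malformed JSON."""
    line = line.strip()
    if not line.startswith('{"tool_call"'):
        return None
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None
    call = obj.get("tool_call", {})
    return call.get("name"), call.get("arguments", {})

print(extract_json_tool_call(
    '{"tool_call":{"name":"get_weather","arguments":{"city":"Paris"}}}'
))
```
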


static void common_chat_parse_glm_4_5(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("<think>", "</think>");


Should this be builder.try_parse_reasoning("\n<think>", "</think>");?

Author


Hey @hksdpc255, thank you so much for pointing it out. Currently testing; will get back to you soon.

Author


Hey @hksdpc255

Update 1: You are right, we need to add \n before <think>. I have started seeing the reasoning_content field populated with thinking tokens. But everything else breaks: Roo Code, Cline and Kilo Code stop working. I am still debugging the issue.

@hksdpc255

hksdpc255 commented Aug 20, 2025

@jfgonsalves

You can enable reasoning_content via flag.

There is parser logic common for all models that will do this job. Check out the code here

This PR has nothing to do with it. Thank you for pointing it out though.


@dhandhalyabhavik I don't think the <think> appearing in content is related to the reasoning_content flag.

See: #15186 (comment)

I changed builder.try_parse_reasoning("<think>", "</think>"); in function common_chat_parse_glm_4_5 to builder.try_parse_reasoning("\n<think>", "</think>");, and then reasoning works with your chat template.
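A minimal Python analogue shows why the leading \n matters: the chat template emits '\n<think>' before the reasoning block, so a parser anchored at "<think>" never matches the start of the model output. This sketch is an assumption-level simplification of builder.try_parse_reasoning(), whose real logic is C++ inside llama.cpp:

```python
def try_parse_reasoning(output, start_tag, end_tag):
    """If the output begins with start_tag and contains end_tag,
    return (reasoning, remaining content); otherwise (None, output).
    Simplified sketch of the C++ helper's prefix-anchored behavior."""
    if output.startswith(start_tag) and end_tag in output:
        reasoning, _, rest = output.partition(end_tag)
        return reasoning[len(start_tag):], rest.lstrip("\n")
    return None, output

out = "\n<think>check the weather API</think>\nCalling the tool now."
# With "<think>" the leading newline prevents a match...
print(try_parse_reasoning(out, "<think>", "</think>"))
# ...while "\n<think>" matches and splits out the reasoning.
print(try_parse_reasoning(out, "\n<think>", "</think>"))
```
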

9 participants