Eval bug: llama-mtmd-cli : option --image failed to load image #13959

Open
@standby24x7

Name and Version

OS: Ubuntu 22.04 (AMD64)

$ ./build/bin/llama-mtmd-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
version: 5572 (7675c55)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

CUDA 12.8

Operating systems

Linux

GGML backends

CUDA

Hardware

AMD Ryzen 7 8845HS + RTX 4070 (8GB)

Models

unsloth_gemma-3-4b-it-GGUF_gemma-3-4b-it-UD-Q4_K_XL.gguf

The model was downloaded with the -hf option.

Problem description & steps to reproduce

Symptom:
When llama-mtmd-cli is run with the --image option, the image does not appear to be loaded.
No error message is printed at startup, but when I send the first query to the LLM,
it returns
"Please provide me with the photo! I need to see the image to tell you what's in it."
Judging from this reply, the image specified on the command line was apparently not loaded.

Steps to reproduce
(1) $ ./build/bin/llama-mtmd-cli --image /home/username/Repo/llama.cpp/dog01.png -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL

(2) After the LLM has initialized, I enter a query.
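
Since no error is printed when --image is given, it can help to first rule out a bad path or a damaged file. The following is a minimal sketch and not part of llama.cpp: the helper name looks_like_png is hypothetical, and it only checks that the file is readable and starts with the standard 8-byte PNG signature.

```python
import sys

# The first 8 bytes of every valid PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path):
    """Return True if `path` can be opened and starts with the PNG signature.

    Hypothetical helper for illustration only; it does not decode the image.
    """
    try:
        with open(path, "rb") as f:
            return f.read(8) == PNG_SIGNATURE
    except OSError:
        # Missing file, directory, or permission problem.
        return False


if __name__ == "__main__":
    # Check every path given on the command line.
    for p in sys.argv[1:]:
        status = "readable PNG" if looks_like_png(p) else "missing or not a PNG"
        print(f"{p}: {status}")
```

If the script is saved as, for example, check_png.py and run on the path from step (1), a "readable PNG" result suggests the file itself is not the cause.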

Actual result

what is in the photo

Please provide me with the photo! I need to see the image to tell you what's in it. 😊 Once you upload or describe the photo, I'll do my best to identify the objects, people, or scenes within it.

Expected result

What is in the photo.

Here's what's in the photo:
It shows four dogs running in a grassy field. They appear to be having fun and enjoying themselves.

Additional information
If I instead load the image file with /image after the LLM has initialized,
it works as expected.
For example:

/image /home/username/Repo/llama.cpp/dog01.png
/home/username/Repo/llama.cpp/dog01.png image loaded

What is in the photo.
encoding image slice...
image slice encoded in 778 ms
decoding image batch 1/1, n_tokens_batch = 256
image decoded (batch 1/1) in 297 ms

/image /home/iida/Repo/llama.cpp/dog01.png
/home/iida/Repo/llama.cpp/dog01.png image loaded

Here's what's in the photo:

It shows four dogs running in a grassy field. They appear to be having fun and enjoying themselves.

First Bad Commit

It has not worked since llama-mtmd-cli was introduced.

Relevant log output

See problem description.
