Eval bug: llama-mtmd-cli: --image option fails to load the image (#13959)

### Name and Version

OS: Ubuntu 22.04 (AMD64)

$ ./build/bin/llama-mtmd-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
version: 5572 (7675c555)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

CUDA 12.8



### Operating systems

Linux

### GGML backends

CUDA

### Hardware

AMD Ryzen 7 8845HS + NVIDIA GeForce RTX 4070 Laptop GPU (8 GB)


### Models

unsloth_gemma-3-4b-it-GGUF_gemma-3-4b-it-UD-Q4_K_XL.gguf

The model was downloaded with the -hf option.


### Problem description & steps to reproduce

Symptom:  
When llama-mtmd-cli is run with the --image option, the image does not appear to be loaded.
No error message is printed at startup, but the first query sent to the LLM returns:
"Please provide me with the photo! I need to see the image to tell you what's in it."
Judging from this reply, the image given on the command line was not loaded.

Steps to reproduce
(1) $ ./build/bin/llama-mtmd-cli --image /home/username/Repo/llama.cpp/dog01.png -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL

(2) After the LLM has initialized, enter a query.

Actual result
> what is in the photo

Please provide me with the photo! I need to see the image to tell you what's in it. 😊 Once you upload or describe the photo, I'll do my best to identify the objects, people, or scenes within it.

Expected result
> What is in the photo.

Here's what's in the photo:
It shows four dogs running in a grassy field. They appear to be having fun and enjoying themselves. 


Additional information
If I load the image with the /image command after the LLM has initialized, it works as expected.
For example:

> /image /home/username/Repo/llama.cpp/dog01.png
/home/username/Repo/llama.cpp/dog01.png image loaded

> What is in the photo.
encoding image slice...
image slice encoded in 778 ms
decoding image batch 1/1, n_tokens_batch = 256
image decoded (batch 1/1) in 297 ms

> /image /home/iida/Repo/llama.cpp/dog01.png
/home/iida/Repo/llama.cpp/dog01.png image loaded

Here's what's in the photo:

It shows four dogs running in a grassy field. They appear to be having fun and enjoying themselves. 
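
As a possible further data point, the same image could be supplied together with a prompt on the command line. This is a minimal sketch, assuming that -p (the common llama.cpp prompt flag), when combined with --image, makes llama-mtmd-cli answer in a single turn and use the image. That behaviour is an assumption and was not tested in this report. The paths and model name are the ones used above.

```shell
# Hypothetical comparison run; BIN, IMG and MODEL are copied from the report above.
BIN=./build/bin/llama-mtmd-cli
IMG=/home/username/Repo/llama.cpp/dog01.png
MODEL=unsloth/gemma-3-4b-it-GGUF:Q4_K_XL

# Assumption: -p together with --image triggers a single-turn answer about the image.
set -- "$BIN" --image "$IMG" -hf "$MODEL" -p "What is in the photo?"

if [ -x "$BIN" ] && [ -f "$IMG" ]; then
    # Binary and image are present: run the command.
    "$@"
else
    # Binary or image not found: print the command instead of running it.
    printf '%s ' "$@"
    printf '\n'
fi
```

If this variant describes the dogs while the interactive run does not, that would suggest --image is only ignored in interactive chat mode.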



### First Bad Commit

It has not worked since llama-mtmd-cli was introduced.


### Relevant log output

```shell
see problem description.
```
