Description
Name and Version
OS: Ubuntu 22.04 (AMD64)
$ ./build/bin/llama-mtmd-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
version: 5572 (7675c55)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
CUDA 12.8
Operating systems
Linux
GGML backends
CUDA
Hardware
AMD Ryzen 7 8845HS + RTX4070(8GB)
Models
unsloth_gemma-3-4b-it-GGUF_gemma-3-4b-it-UD-Q4_K_XL.gguf
The model was downloaded with the -hf option.
Problem description & steps to reproduce
Symptom:
When llama-mtmd-cli is run with the --image option, the image does not appear to be loaded.
No error message is shown while the image is loading, but when I send the first query to the LLM,
it returns
"Please provide me with the photo! I need to see the image to tell you what's in it."
Judging from this response, the image specified on the command line was not loaded.
Steps to reproduce
(1) $ ./build/bin/llama-mtmd-cli --image /home/username/Repo/llama.cpp/dog01.png -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
(2) After the LLM is initialized, I enter a query.
Actual result
what is in the photo
Please provide me with the photo! I need to see the image to tell you what's in it. 😊 Once you upload or describe the photo, I'll do my best to identify the objects, people, or scenes within it.
Expected result
What is in the photo.
Here's what's in the photo:
It shows four dogs running in a grassy field. They appear to be having fun and enjoying themselves.
Additional information
If I instead load the image file with the /image command after the LLM is initialized,
it works as expected.
For example,
/image /home/username/Repo/llama.cpp/dog01.png
/home/username/Repo/llama.cpp/dog01.png image loaded
What is in the photo.
encoding image slice...
image slice encoded in 778 ms
decoding image batch 1/1, n_tokens_batch = 256
image decoded (batch 1/1) in 297 ms
/image /home/iida/Repo/llama.cpp/dog01.png
/home/iida/Repo/llama.cpp/dog01.png image loaded
Here's what's in the photo:
It shows four dogs running in a grassy field. They appear to be having fun and enjoying themselves.
First Bad Commit
It has not worked since llama-mtmd-cli was introduced.
Relevant log output
See problem description.