Description
I have fine-tuned Qwen2.5-VL-7B with Unsloth and merged the LoRA adapter back into the base model. Now I want to use llama.cpp to quantize it to Q4. As a first step, I converted the merged model to GGUF with convert_hf_to_gguf.py (the rough conversion command is shown below, after the server command). Before quantizing, I wanted to test the unquantized model, so I deployed it with the following command:
./llama-server -m /root/autodl-tmp/qwen2.5-vl/qwen-gguf/qwen2.5.gguf -c 2048
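For reference, the earlier GGUF conversion step was along these lines (the merged-model directory path is a placeholder for my actual merge output, and the exact flags are from memory):

# the merged HF model directory below is a placeholder; --outfile matches the GGUF used above
python convert_hf_to_gguf.py /root/autodl-tmp/qwen2.5-vl/merged \
  --outfile /root/autodl-tmp/qwen2.5-vl/qwen-gguf/qwen2.5.gguf \
  --outtype f16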
The server started without any errors. However, when I tested it with the following request:
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/root/autodl-tmp/qwen2.5-vl/qwen-gguf/qwen2.5.gguf",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://oss-pai-emcfh1jjcesunsrf7g-cn-guangzhou.oss-cn-guangzhou.aliyuncs.com/031920645691.jpg?Expires=1740968587&OSSAccessKeyId=TMP.3KoFNaN1sZAKuMb8zSRv5Ct65nWvYgsQfACyR9DRFXPzTVTVh4Ym6uQUp8nXcoANAP7MatHJB5Gux1iz2iwRgQEfPM4zpc&Signature=2%2FkCE6f5QkjQhY7t9zsCYSacmiA%3D"}},
        {"type": "text", "text": "This is a picture of an electricity meter. Extract the meter reading: it has 6 digits in total, and the last digit is a decimal place that does not need to be extracted. Return only the final meter reading, nothing else."}
      ]}
    ]
  }'
I encountered an error stating that the model does not support image input.
After some research, I found that deploying a multimodal model with llama.cpp generally uses a command like this:
build/bin/llama-server -m ../models/BroadBit/Qwen2.5-VL-7B-Instruct-Q8_0.gguf --mmproj ../models/BroadBit/mmproj-Qwen2.5-VL-7B-Instruct-f16.gguf -c 32768 -ngl 50 --temp 0.01 -np 1 --host 0.0.0.0 --port 18080 --mlock --no-warmup -t 4
Here, the --mmproj option is used. I would like to know how to generate the corresponding mmproj file when converting a multimodal model to GGUF with llama.cpp. I am not very familiar with llama.cpp and would appreciate guidance from someone more experienced.
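The closest lead I have found so far is that newer versions of convert_hf_to_gguf.py appear to accept a --mmproj flag that exports the vision projector as a separate GGUF, which I imagine would be invoked roughly like the sketch below (the flag usage and output filename are my guess from what I read, not something I have verified):

# unverified: --mmproj usage and the output filename here are my guess, not confirmed
python convert_hf_to_gguf.py /root/autodl-tmp/qwen2.5-vl/merged --mmproj \
  --outfile /root/autodl-tmp/qwen2.5-vl/qwen-gguf/mmproj-qwen2.5-f16.gguf

If this is not the right approach for Qwen2.5-VL, a pointer to the correct script or flag would be very helpful.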