Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
The documentation for `/completion` has a description for this obsolete field:

> `image_data`: An array of objects to hold base64-encoded image `data` and its `id`s to be referenced in `prompt`. You can determine the place of the image in the prompt as in the following: `USER:[img-12]Describe the image in detail.\nASSISTANT:`. In this case, `[img-12]` will be replaced by the embeddings of the image with id `12` in the following `image_data` array: `{..., "image_data": [{"data": "<BASE64_STRING>", "id": 12}]}`. Use `image_data` only with multimodal models, e.g., LLaVA.
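For reference, a request following that documented schema would look like the sketch below. The payload shape is taken from the quoted documentation; the server address (llama-server's default port), the image file name, and the response handling are placeholder assumptions:

```python
import base64

import requests

SERVER = "http://localhost:8080"  # assumed default llama-server address

# Base64-encode the image, as the documentation's "data" field expects.
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# The documented shape: [img-12] in the prompt is supposed to be
# replaced by the embeddings of the image with id 12.
payload = {
    "prompt": "USER:[img-12]Describe the image in detail.\nASSISTANT:",
    "image_data": [{"data": image_b64, "id": 12}],
}

resp = requests.post(f"{SERVER}/completion", json=payload)
print(resp.json()["content"])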
However, when passing a prompt containing `[img-1]` to a multimodal model loaded along with its corresponding mmproj, the model does not understand the image. The same image works fine through the `/chat/completions` endpoint, though (see the sketch below).
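For comparison, here is a minimal sketch of the request shape that does work for me through `/chat/completions`, assuming the server accepts the OpenAI-compatible message format with a base64 data URI for the image; the server address, file name, and prompt text are placeholders:

```python
import base64

import requests

SERVER = "http://localhost:8080"  # assumed default llama-server address

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# OpenAI-compatible message parts: the image is passed as an
# image_url content part holding a base64 data URI.
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
}

resp = requests.post(f"{SERVER}/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])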
Motivation
My project does its own prompt formatting and communicates with llama.cpp through `/completion`. I would like to integrate llama.cpp's multimodal support but am unable to because of the limitation above.
Possible Implementation
No response