Description of the bug:
Calling `generate_content` on a Gemini Pro Vision model with a PNG image raises `KeyError: 'RGBA'`, which in turn causes another exception: `OSError: cannot write mode RGBA as JPEG`. This seems to indicate that PNG is not supported, but according to the Gemini API docs, PNG is a supported MIME type. The PNG example from that docs page doesn't seem to work either: it passes a `contents` kwarg to `generate_content`, but that argument doesn't exist. Modifying the code to use the correct arguments gives the error `google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.`
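The `OSError` suggests that the SDK re-encodes PIL images as JPEG before sending them, and JPEG has no alpha channel, so an RGBA screenshot cannot be written as-is. A likely workaround (untested here) is to pass `screenshot.convert("RGB")` instead of `screenshot`. The stdlib-only sketch below shows what that conversion does to a raw RGBA buffer; `rgba_to_rgb` is a hypothetical helper for illustration, not part of the SDK or Pillow.

```python
# Hypothetical helper, not part of the google-generativeai SDK or Pillow.
def rgba_to_rgb(raw: bytes) -> bytes:
    """Drop the alpha byte from a raw 8-bit RGBA buffer (e.g. PIL's Image.tobytes())."""
    if len(raw) % 4 != 0:
        raise ValueError("RGBA buffer length must be a multiple of 4")
    # Keep bytes 0-2 (R, G, B) of every 4-byte pixel; byte 3 (alpha) is discarded.
    return b"".join(raw[i:i + 3] for i in range(0, len(raw), 4))


# A 2x1 RGBA buffer: opaque red, then half-transparent blue.
raw_pixels = bytes([255, 0, 0, 255, 0, 0, 255, 128])
rgb_pixels = rgba_to_rgb(raw_pixels)
print(list(rgb_pixels))  # [255, 0, 0, 0, 0, 255]
```

Pillow's RGBA-to-RGB conversion likewise discards alpha rather than blending it onto a background, so transparent regions keep whatever color values they already had.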
Actual vs expected behavior:
The expected behavior is for this code:
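```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-pro-vision")

screenshot = get_screen_data()  # reporter's helper (not shown); returns a PIL image in RGBA mode
prompt = "What are your thoughts on this screenshot? I think"
response = model.generate_content(
    [prompt, screenshot], stream=True
)
response.resolve()  # wait for the streamed response to finish
print(response.text)
```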
to run successfully. This code was adapted from the image-and-text input example in the quickstart. Instead, it raises the `KeyError` and `OSError` above. Changing the code to:
```python
screenshot = get_screen_data()
# Note: PIL's Image.tobytes() returns raw pixel data, not an encoded PNG file
screenshot_data = {
    'mime_type': 'image/png',
    'data': screenshot.tobytes()
}
prompt = "What are your thoughts on this screenshot? I think"
response = model.generate_content(
    [prompt, screenshot_data], stream=True
)
response.resolve()
print(response.text)
```
raises the 400 error described above. This code was adapted from the Gemini API overview.
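One possible cause of the 400 (an assumption, not confirmed): `Image.tobytes()` returns raw, uncompressed pixels, so the `data` field does not contain a PNG file even though `mime_type` says `image/png`. With Pillow, encoded bytes would normally come from `screenshot.save(buf, format="PNG")` into an `io.BytesIO` followed by `buf.getvalue()`. The stdlib-only sketch below makes the difference concrete: `rgba_to_png` is a hypothetical helper that wraps a raw 8-bit RGBA buffer in a minimal PNG container (signature, `IHDR`, one zlib-compressed `IDAT`, `IEND`) before building the same `mime_type`/`data` dict.

```python
import struct
import zlib


def _png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    # Each chunk: 4-byte big-endian length, 4-byte type, data, CRC-32 of type + data.
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


# Hypothetical helper, not part of the google-generativeai SDK or Pillow.
def rgba_to_png(raw: bytes, width: int, height: int) -> bytes:
    """Wrap raw 8-bit RGBA pixels (row-major, like PIL's tobytes()) in a PNG file."""
    stride = width * 4
    if len(raw) != stride * height:
        raise ValueError("raw buffer size does not match width * height * 4")
    # Prefix every scanline with filter type 0 (None), then deflate the result.
    scanlines = b"".join(
        b"\x00" + raw[y * stride:(y + 1) * stride] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 6 (RGBA), compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(scanlines))
        + _png_chunk(b"IEND", b"")
    )


# A 2x1 RGBA buffer: opaque red, then half-transparent blue.
raw_pixels = bytes([255, 0, 0, 255, 0, 0, 255, 128])
png_bytes = rgba_to_png(raw_pixels, width=2, height=1)

screenshot_data = {
    "mime_type": "image/png",
    "data": png_bytes,  # an actual PNG file, unlike the raw tobytes() output
}
print(png_bytes[:8])  # b'\x89PNG\r\n\x1a\n'
```

Raw pixel buffers have no signature at all, so a server that sniffs or decodes the payload as PNG would reject them; whether that is what triggers this particular 400 is not verified here.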
Any other information you'd like to share?
#112 is related: it covers my second attempt at solving this problem (the 400 error). This issue is about `generate_content` not handling PNG images by default, even though PNG is documented as a supported MIME type.