Skip to content

feat(img): implement rudimentary image API #31399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

chipsenkbeil
Copy link

@chipsenkbeil chipsenkbeil commented Nov 29, 2024

Status

This PR serves as a nice feasibility study but has grown too large to consider merging. So it is temporarily parked while a suitable API is being discussed.


Feature & Implementation Breakdown

The following is both a tracker and detailing of the features supported by this PR, what they look like, and a visual of each implementation (provider) in order to support as many terminals as possible, including on Windows. Monitor this top-level comment for the current API structure and visual examples of the PR in action.

Core API Structure

The API follows a simplistic, object-oriented design similar to vim.system(). You specify an image, optionally eagerly loading its data, and then display it within neovim. The information about where and how it is displayed is provided through a single table.

Each image, once shown, returns a placement, which represents the specific instance of the image on screen. This is to support the separation of an image and its data from the details and management of displaying it.

The image API makes use of an internal Promise<T> class in order to support both synchronous and asynchronous operation, meaning that images can be loaded, shown, hidden, and updated asynchronously in order to be efficient, but also can manually be waited upon at any stage.

  • A static image you can place within neovim
    -- Supports loading PNG images into memory
    local img = assert(vim.ui.img.load("/path/to/img.png"):wait())
    
    -- Supports lazy-loading image, deferring to a provider
    local img = vim.ui.img.new("/path/to/img.png")
    
    -- Supports specifying an image and explicitly providing the data
    local img = vim.ui.img.new({ bytes = "...", filename = "/path/to/img.png" })
    
    -- Once created, can be shown, returning an object representing the instance
    local placement = img:show():wait() -- Places in top-left of editor with default size
    local placement = img:show({ pos = { x = 4, y = 8, unit = 'cell' }):wait()
    local placement = img:show({ relative = "cursor" }):wait()
  • Support deleting the image placed within neovim
    local img = vim.ui.img.new({ filename = "/path/to/img.png" })
    local placement = assert(img:show():wait())
    
    -- Supports hiding the specific instance of the image
    placement:hide()
  • Dynamically resize/move an image
    local img = vim.ui.img.new({ filename = "/path/to/img.png" })
    local placement = assert(img:show({ pos = { x = 1, y = 2, unit = "cell" } }):wait())
    
    -- Supports updating a displayed image with a new position
    placement:update({ pos = { x = 5, y = 6, unit = "cell" } }):wait()
    
    -- Supports resizing a displayed image
    placement:update({ size = { width = 10, height = 5, unit = "cell" } }):wait()
  • Asynchronous support for image operations
    img:show({ ... })
        :on_ok(function(placement)
            -- Use the placement once it has been confirmed as shown
        end)
        :on_fail(function(err)
            -- Do something with the error that occurred
        end)
        :on_done(function(err, placement)
            -- When either ok or fail happens
        end)
    
    -- The same applies for all APIs:
    --
    --    img.load() -> returns Promise<Image>
    --    img:show() -> returns Promise<Placement>
    --    placement:show() -> returns Promise<true>
    --    placement:hide() -> returns Promise<true>
    --    placement:update() -> returns Promise<true>
    --
    -- Each Promise<T> supports chaining callbacks for individual
    -- conditions of success or failure as well as combining the two
    -- together.
    --
    -- The on_* methods can be called multiple times and each
    -- callback will be invoked when finished.
    --
    -- You can also still choose to wait in a synchronous fashion
    -- using `:wait()` which supports supplying a specific timeout
    -- in milliseconds.

Image Manipulation & Inspection

Leveraging ImageMagick for the functionality, this section describes features needed to manipulate and inspect images. This is particularly needed for sixel, which would normally require advanced image decoding to convert to an RGB format and then packaged in sixel's format. ImageMagick supports features like cropping and resizing, which we expose through the convert method.

Additionally, provides a retrieval API called identify (modeled after magick identify) to inspect images without needing advanced header parsing per image format in order to learn details like the true image format (not just reading the file extension) and pixel dimensions of the image.

  • Manipulation an image
    -- Making use of ImageMagick, you can convert an image
    -- to different formats, resize it, crop it, and more
    --
    -- And like all other APIs, this returns a promise
    img:convert({ 
        format = 'jpeg', -- Supports a variety of formats, including sixel!
        size = { width = 50, height = 30, unit = 'pixel' },
    }):on_done(function(err, data)
        -- By default, this returns the data of the image
        -- instead of updating the instance or saving it
        -- to disk
        --
        -- Use img:convert({ out = '/path/to/img.jpeg' })
        -- to write to disk instead
    end)
  • Retrieve information about an image
    -- Retrieving information also makes use of ImageMagick
    -- to learn things like image dimensions and format
    img:identify({ format = true, size = true }):on_done(function(err, info)
        -- Info is a table containing the information requested by each
        -- field marked true
        --
        -- In this case, info.format == "PNG" and
        -- info.size == vim.ui.img.utils.Size { width = 50, height = 30, unit = 'pixel' }
    end)

Kitty Provider

The 1st protocol supported is kitty graphics protocol. Aside from the obvious kitty, other graphics-accelerated terminals like Ghostty make use of this protocol to display images. This protocol only requires magick when attempting to load an image other than PNG, which will be converted to PNG and placed in ~/.cache/nvim/imgs/{sha256(filename)}.png.

  • Integration of kitty as a provider

    vim.o.imgprovider = 'sixel'
    neovim-img-kitty.mp4

Sixel Provider

The 2nd protocol supported is sixel image protocol. Windows Terminal, in particular, makes use of this protocol to display images. This protocol requires magick in order to convert images into the sixel format, as well as all operations like resizing and cropping.

  • Integration of sixel as a provider

    vim.o.imgprovider = 'sixel'
    neovim-img-sixel-windows.mp4

iterm2 Provider

The 3rd protocol supported is iterm2 image protocol. WezTerm provides a GPU-accelerated implementation of this protocol. This protocol only requires magick in order to crop images. Resizing images does not require magick, and iterm2 supports multiple image formats.

  • Integration of iterm2 as a provider

    vim.o.imgprovider = 'iterm2'
    neovim-img-iterm2.mp4

3rd Party Providers

To enable GUIs and other image protocols to easily slide into neovim's API abstraction, a singular API is exposed through vim.ui.img.providers.new, which implementers will use to specify show and hide functions at a minimum. Each provider supports operations when loading, unloading, checking supported to see if the provider can operate in the current environment, and (if possible) specifying a more efficient update method.

The methods supported, show, hide, and update take a callback to support asynchronous operation, and must use it to communicate when the operation is finished. The methods load and unload are considered synchronous and should be fast to operate.

  • Abstraction for 3rd parties to support images in neovim
    -- Providers implement a small API to support showing and hiding images
    vim.ui.img.providers['neovide'] = vim.ui.img.providers.new({
        ---(Optional) Called to initialize the provider.
        ---@param ... any arguments for the specific provider upon loading
        load = function(...)
          -- Implement here
        end,
    
        ---(Optional) Called to cleanup the provider.
        unload = function()
          -- Implement here
        end,
    
        ---(Optional) Reports whether this provider is supported in the current environment.
        ---@param on_supported? fun(supported:boolean) callback when finished checking
        supported = function(on_supported)
            -- Implement here
        end,
    
        ---Displays an image, returning (through callback) an id tied to the instance.
        ---@param img vim.ui.Image image data container to display
        ---@param opts vim.ui.img.Opts specification of how to display the image
        ---@param on_shown? fun(err:string|nil, id:integer|nil) callback when finished showing
        ---@return integer id unique identifier connected to the displayed image (not vim.ui.Image)
        show = function(img, opts, on_shown)
            -- Implement here
        end,
    
        ---Hides one or more displayed images.
        ---@param ids integer[] list of displayed image ids to hide
        ---@param on_hidden fun(err:string|nil, ids:integer[]|nil) callback when finished hiding
        hide = function(ids, on_hidden)
            -- Implement here
        end,
    
        ---(Optional) Updates an image, returning (through callback) a refreshed id tied to the instance.
        ---If not specified, neovim will invoke `hide(id)` followed by `show(img, opts, on_updated)`.
        ---@param id integer id of the displayed image to update
        ---@param opts vim.ui.img.Opts specification of how to display the image
        ---@param on_updated? fun(err:string|nil, id:integer|nil) callback when finished updating
        ---@return integer id unique identifier connected to the displayed image (not vim.ui.Image)
        update = function(id, opts, on_updated)
            -- Implement here
        end,
    })
  • Example of using the neovide 3rd party once specified above
    -- Load an image to display, nothing different here
    local img = assert(vim.ui.img.load("/path/to/img.png"):wait())
    
    -- Set the global provider, which will be used for all image operations by default
    vim.o.imgprovider = 'neovide'
    
    -- Display the image, which will use the neovim provider
    local placement = assert(img:show():wait())
    
    -- Alternatively, you can manually create a placement and specify the provider
    -- This will not be shown yet until specifically invoked
    local placement = img:new_placement({ provider = 'neovide' })
    placement:show({ size = { ... } }):wait()

Maybe in scope for this PR

  • Injection of an image into a buffer within neovim (tracking its movement properly, possibly through the use of extmarks)
    1. Support raw placement of an image on top of text in a buffer, but have the image scroll with the buffer
    2. Support injecting an image into the buffer and reflowing text around it

Out of Scope for this PR

  • Video/gif support (there are reasons why this would be neat, but not a dealbreaker if we want to exclude this from neovim core)

@chipsenkbeil

This comment was marked as outdated.

@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 3 times, most recently from 9964ad6 to 630f852 Compare November 30, 2024 03:00
@chipsenkbeil chipsenkbeil changed the title Implement image API for #30889 feat(img): add vim.img.protocol() to detect preferred graphics protocol Nov 30, 2024
@chipsenkbeil chipsenkbeil changed the title feat(img): add vim.img.protocol() to detect preferred graphics protocol Implement image API for #30889 Nov 30, 2024
@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 2 times, most recently from fe0d5a8 to 33ee581 Compare November 30, 2024 04:27
@chipsenkbeil chipsenkbeil changed the title Implement image API for #30889 feat(img): implement image API Nov 30, 2024
@chipsenkbeil chipsenkbeil changed the title feat(img): implement image API feat(img): implement image API for absolute positions Nov 30, 2024
@chipsenkbeil chipsenkbeil mentioned this pull request Nov 30, 2024
@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 4 times, most recently from babd349 to dc51bf3 Compare November 30, 2024 20:56
@chipsenkbeil

This comment was marked as outdated.

@kovidgoyal
Copy link

kovidgoyal commented Dec 1, 2024 via email

vim.validate('opts', opts, { 'string', 'table' })
if type(opts) == 'table' then
vim.validate('opts.bytes', opts.bytes, 'string', true)
vim.validate('opts.filename', opts.filename, 'string')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"file" is common in vim.fs. Eventually we will have URIs, and "file" is more compatible with that.

Suggested change
vim.validate('opts.filename', opts.filename, 'string')
vim.validate('opts.file', opts.file, 'string')

Comment on lines 138 to 156
---Will throw an error if unable to load the bytes of the file.
---Works without ImageMagick.
---@return boolean
function M:is_png()
Copy link
Member

@justinmk justinmk May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange for a boolean getter to throw. can this be format():'png'|nil instead, where nil means "unknown"? or maybe format():'png'|'error'|nil

or perhaps is_png():boolean|nil

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, can do that instead. Do you expect to support other formats other than PNG down the line? If so, one of the format functions makes sense. If not, the is_png makes more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's assume PNG.

Comment on lines 246 to 250
---Returns information about the image by invoking ImageMagick's identify.
---This will be some uppercase string like 'PNG'.
---@param opts? vim.ui.img.IdentifyOpts
---@return vim.ui.img.utils.Promise<vim.ui.img.Metadata>
function M:identify(opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely out of scope for this PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reason it existed was to easily find any image format's dimensions (width & height). If we just stick with PNG, I know how to read that from the file header. So, no problem.

Comment on lines 435 to 444
---Converts an image using ImageMagick, returning the bytes of the new image.
---
---If `background` is specified, will convert alpha pixels to the background color (e.g. #ABCDEF).
---If `crop` is specified, will crop the image to the specified pixel dimensions.
---If `format` is specified, will convert to the image format, defaulting to png.
---If `out` is specified, will treat as output file and write to it, and promise returns empty str.
---If `size` is specified, will resize the image to the desired size.
---@param opts? vim.ui.img.ConvertOpts
---@return vim.ui.img.utils.Promise<string>
function M:convert(opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of scope

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem. If we did want sixel or need cropping for iterm2 (if we brought it in), then we would need this. Or we'd have to provide or write a PNG decoder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiring PNG seems reasonable. converting to PNG is a different topic, which could be covered in some other api in the future. but the caller of this api is expected to provide PNG.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think supporting png only is reasonable. It also makes storing the images in memory more efficient, which we will need to do, since Kitty has a limited buffer size and images can be evicted.

What makes the evection even worse, is that on tmux with pass through, we don't have a way to know when that happens or how much quota is used. So, I think the only safe way to do it, is to re-upload the image to kitty each time a new image is brought to view.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fredizzimo in snacks they use a least recently used queue to know which images to re-upload. I was going to implement that, but held off when discussion started back.

Or we transmit again every time, but that definitely has noticeable performance considerations from what I saw with iterm2 and sixel.

LRU out of scope for this PR? Retransmitting out of scope for this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need to retransmit when a new image appears, not for every movement, but yes it could still be annoying when scrolling up and down.

LRU, with some size limit could work. The only thing we really have to be careful with is when the image is displayed in tmux or similar, in the other cases, neovim has full control. And the same thing applies to unique id collisions, not much we can do about that in a tmux context.

Since this does not affect the interface, I would vote for out of scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For LRU we have vim.ringbuf. But yes, let's save performance improvements for a followup, not here.

Comment on lines 550 to 554
---Creates a placement of this image that is not yet visible.
---@param opts? {provider?:string}
---@return vim.ui.img.Placement
function M:new_placement(opts)
return require('vim.ui.img.placement').new(self, opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems overwrought to represent "placements" as objects which need to be created. I would expect something like M:set_pos.

Positions are called "pos". Avoid a new "placement" concept.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the naming is wrong on this - was following kitty's naming - but the concept is that the image is the file path and (optionally) data of the image itself and a placement is the reference to the image managed by the provider in the terminal (or UI) of which there may be multiple. Maybe we just don't make this a public method or we change the name, but M:set_pos makes me think that I'm moving the vim.ui.Image instead of one of the displayed versions on the terminal.

neovim-img-placement-example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we just don't make this a public method or we change the name,

If it can be private for now, let's do that.

but M:set_pos makes me think that I'm moving the vim.ui.Image instead of one of the displayed versions on the terminal.

Is the intention to avoid creating a new image for each placement? Can we instead just say "a new placement requires new image". So an image object has one position, and cannot be placed in multiple places. That might be "inefficient", but so what?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use cases for showing the same image multiple times, is probably quite limited. I can only really think of demo type effects, games and repeated UI elements. So, I think limiting the interface to only support images would be fine.

Furthermore, I'm pretty sure this can be optimized on the provider side. It can calculate the hash of the image data and properties and internally share it and keep track of the ids.

Copy link
Author

@chipsenkbeil chipsenkbeil May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use cases for showing the same image multiple times, is probably quite limited. I can only really think of demo type effects, games and repeated UI elements. So, I think limiting the interface to only support images would be fine.

Totally fair! I don't have a use case for same image repeated, and should be able to merge placement and image without much trouble.

Furthermore, I'm pretty sure this can be optimized on the provider side. It can calculate the hash of the image data and properties and internally share it and keep track of the ids.

Yep! I already do that within the sixel and iterm2 providers, and it works well. So agreed the optimization can be on the provider side. 😄

Will make the change along with the other requests.

@neovim neovim unlocked this conversation May 21, 2025
@chipsenkbeil
Copy link
Author

chipsenkbeil commented May 22, 2025

Thanks @justinmk and @fredizzimo for locking down discussion and changes. I'll get to these soon.

So it's clear what I'll do:

  1. Remove ImageMagick
  2. Remove iterm2 and sixel providers
  3. Rename filename param to file and bytes param to data within Image
  4. Remove crop option when showing an image
  5. Remove Region as it was only needed for cropping
  6. Remove Size and Position and flatten those parameters in the Opts table as we no longer need to carry around unit type
  7. Merge Image and Placement
  8. Remove supported check from providers as they don't function in neovim (can't capture CSI or kitty responses)
  9. Remove screen size code (was temporary)
  10. Move promise to _promise internally
  11. Update PNG function to return boolean or nil
  12. Wrap img import within ui.lua via the defer @justinmk mentioned
  13. Remove imgprg and imgprovider and hard code kitty.
  14. Keep the printio-test bin per @justinmk feedback

What I won't do unless you tell me you want it now:

  1. Convert provider to be synchronous
  2. Remove promise code in favor of libuv/luv style (ironically, what I started with)
  3. Change to use the nvim_* discussed by @fredizzimo
  4. Add LRU to kitty provider to track which images to retransmit
  5. Update kitty provider to transmit every time you show the image

Anything I missed? Typing from my phone, so going from memory.

@justinmk
Copy link
Member

Remove Size and Position and flatten those parameters in the Opts table as we no longer need to carry around unit type

Yes. It can be done in a way that allows us to re-add those in the future, right? If I'm wrong, can you show a small example of the function signature before/after.

What I won't do unless you tell me you want it now:

Convert provider to be synchronous
Remove promise code in favor of libuv/luv style (ironically, what I started with)

Not required here, but hopefully in a followup. Async should ideally be opt-in via an optional callback param.

@lewis6991
Copy link
Member

Remove promise code in favor of libuv/luv style (ironically, what I started with)

I would prefer this. I'm working on adding vim.async and porting luv style will be easier.

@fredizzimo
Copy link
Contributor

We still need to discuss the exact API semantics, especially regarding the size parameter. That was the main reason I opened my API proposal, to have a specification. The removal of the absolute position support wasn't even included in the initial version.

As far as I can see the exact behaviour is unspecified here, but from based on the actual implementation and interface we have the following. Size is either optional or present, and if present both width and height are required. And when present, it's passed directly along to Kitty.

This means.

  1. Not present - The image is displayed with the size of the source image
  2. Present - The image is stretched/shrunk so that both the width and height fit the columns and rows exactly
  3. There's no way to control how big the image, while at the same time keeping the aspect ratio.

This seemed OK to me when I though Kitty worked differently, and we would have more control being able to specify pixel sizes, preserving the aspect ratio. I was counting on the helper functions, which were unspecified to take care of the details. But that would still only have worked for a single TUI/GUI case.

So, this is what I specified instead here #34093
For buffer/inline positions both dimensions are always required, otherwise only one. Furthermore, there's a keep_aspect field that defines the behaviour when both dimensions are specified, with the idea that one of the dimensions is always filled. I thought that this was doable without knowing the cell aspect ratio, but it's actually not so I'm revising my suggestion:

Regardless of the placement mode, you can specify non, one or both of the dimensions. If both are specified, the keep_aspect ratio field determines if both dimensions should be used fully or only the one with the best fit. If none are specified, the number of columns and rows is automatically determined.

Bonus: minimum maximum support like snacks.

As mentioned, we will need to know the cell aspect ratio and/or size according to the following:

  1. Non specified and absolute positioning - No need unless we want to be able to query how many cells the image actually is
  2. Non specified and inline positioning - We need the exact cell size so that we can allocate room for the image in the buffer
  3. One specified and absolute positioning - No need, kitty supports scaling accordingly unless we want to query
  4. One specified and inline positioning - We only need the aspect ratio
  5. Both specified - We need the aspect ratio to determine which size to use
  6. Bonus - We need aspect ratio

Note that the cell aspect ratio can be virtual, for example 1:2, and the images would still be displayed with the correct proportions. It's just that the reserved area might be bigger or smaller than intended. So, in that case no terminal query support is needed, and no decision needs to be taken about which terminal is the main one. This approach also works well when multiple different UIs are connected.

Therefore, I suggest that we always require at least one dimension for now in this initial version, because terminal query support and designating a primary UI, which need a lot of design thought by itself, is still not implemented.

The z-index behaviour also needs specification

  1. Do we copy the Kitty one?

Finally, you can specify the image z-index, i.e. the vertical stacking order. Images placed in the same location with different z-index values will be blended if they are semi-transparent. You can specify z-index values using the z key. Negative z-index values mean that the images will be drawn under the text. This allows rendering of text on top of images. Negative z-index values below INT32_MIN/2 (-1,073,741,824) will be drawn under cells with non-default background colors. If two images with the same z-index overlap then the image with the lower id is considered to have the lower z-index. If the images have the same z-index and the same id, then the behavior is undefined.

That is the only thing we can implement with the Kitty protocol I think. But it's hard to do in Neovide, since it fights against the natural order of floating windows in Neovim
2. Do we allow only positive indices? Meaning that the images will always be on top of all other Neovim elements. The z-index has no relation to the window z-index at all. At least this interpretation is much more doable
3. Or do we leave it undefined? I don't like this at all, even if Neovide itself would benefit from an interpretation where the z-index follows the window z-index, or even more fine grained. But the compatibility issues are worrying me.

I suggest that we implement 2.

@fredizzimo
Copy link
Contributor

I also suggest that we battle test this against snacks.nvim. It has an absolute positioning mode to support wezterm.

@fredizzimo
Copy link
Contributor

Remove promise code in favor of libuv/luv style (ironically, what I started with)

I would prefer this. I'm working on adding vim.async and porting luv style will be easier.

Couldn't we avoid async support altogether by externalizing the file loading, and pass a data buffer containing the PNG file data to the API? I see zero reasons why anything in here should ever block or require calculation.


-- Capture old cursor position, hide the cursor, and move to the
-- position where the image should be displayed
local pos = opts:position():to_cells()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the position is not meant to be dynamically updated when the mouse or window moves, I think this calculation should be moved out of the provider. And the provider gets the precalculated position.

I had previously assumed that it was up to the provider to dynamically watch and update this, partially because of the mouse movement demo. But looking at the code, that does not appear to be the case.

If could be interesting though if it is updated dynamically by the provider, for example Neovide could move the image pixel by pixel with the mouse, and the image would follow smoothly with an animated window.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... Maybe it has to be on the provider side after all, in the case of multigrid with custom window placement, only the provider will know the location it should be displayed.

In any case, it needs to be specified if it's supposed to dynamically update or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets even more tricky than that. Since the provider processes this asynchronously, the cursor or mouse might have moved since the request, which results in another image location than intended. So, I think it has to be either continuously on the provider side, or immediately when the request is done outside the provider.

That means that multigrid UIs that make their own placement breaks, but that's the case with a lot of things, so I don't think it's a huge problem. These kinds of UIs probably need their own specialized plugins anyway. Also with inline placement, none of this is a problem, other than that the image will never follow the mouse or cursor automatically.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is batched together and not sent immediately. That's what the writer does when it calls flush(). So it aggregates:

  1. Saving the cursor
  2. Hiding the cursor
  3. Moving to a new location
  4. Sending the display request
  5. Restoring the cursor position
  6. Showing the cursor

All of that is sent at once, so the jump of the cursor and return to its original position (wherever that may be) is really quick.

Copy link
Author

@chipsenkbeil chipsenkbeil May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the position is not meant to be dynamically updated when the mouse or window moves, I think this calculation should be moved out of the provider. And the provider gets the precalculated position.

This is a static location sent to the provider. The call to position() just figures out the row and column based on the relative field. So, not a dynamic moving position. Every time you call position() it will return the same value unless you change the opts.

It's a helper function attached to the opts so each provider doesn't need to implement that calculation itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about this sequence and the positions specified by the placement

  1. The user clicks somewhere
  2. It takes x amount of time until the request is processed by the provider, which might be on a remote machine. With UI updates being batched.
  3. At this point the Neovim cursor or mouse pointer might have moved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Yeah, to avoid that, we'd need to calculate it on the spot before sending to the provider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to fix this, would be to remove the support for opts.Relative, and only support editor for now, I'm not sure how much convenance functionality the protocol should provide, those are all easy to do on the caller side.

And as I mentioned, another option is to make it dynamic, meaning it updates in real-time when the cursor or mouse is moved. In that case it would also do something that can't be done otherwise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fredizzimo I just submitted a bunch of cleanup, and a distinct commit to try to solve this. I just updated the data structure that is fed to the provider to just have a row, col, optional width/height, and optional z.

The user-provided opts get mapped to the internal opts used by the provider. This gets calculated at the point in time when the user wants to show/update an image, and the provider can assume it always has the precalculated opts. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this part looks good now. We can possibly add real-time update support with an option in the future if needed.

@justinmk
Copy link
Member

justinmk commented May 23, 2025

We still need to discuss the exact API semantics, especially regarding the size parameter.

The details that you pointed out are useful to track in a (concise) issue. If we find things that need to be changed, we can quickly adjust.

The point is to have something that works on master branch NOW instead of endlessly debating it and not having real-world feedback from master branch users.

I suggest that we always require at least one dimension for now in this initial version,

ok.

Do we allow only positive indices? Meaning that the images will always be on top of all other Neovim elements. The z-index has no relation to the window z-index at all.

sure. start with the most restricive thing. constraints can always be relaxed later, but the opposite is much harder.


---Calculates and returns the position dictated by `relative` and `pos`.
---@return vim.ui.img.utils.Position
function M:position()
Copy link
Contributor

@fredizzimo fredizzimo May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indexing of this function is inconsistent and unclear. Personally, I think that all of them should be 0-indexed, since we are specifying offsets.

Currently, you need to specify 1-indexed positions, even for the ones that are commented as 0-indexed, since the terminal cursor position is 1-indexed

Well, technically you can specify both 0 and 1 to mean the edge of the editor, but to leave one space you need to pass 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note regarding this. Snacks win could use relative = win, but in practice has to use relative = editor, since it needs to take the border into account.

And now that we have vim.ui.img.InternalOpts for the provider, the indexing need to be specified for that as well. Right now, in my implementation, I'm assuming a 1-based index, since the Kitty provider use it directly.

@fredizzimo
Copy link
Contributor

I can't find a function that deletes an image to free up memory, I can only see hide for placements.

Even if we combine images and placements, I think there needs to be a function to free up the memory. I also think that hiding an image without deleting the data should still be possible.

@chipsenkbeil
Copy link
Author

@justinmk @fredizzimo @gpanders @clason I've updated the PR to follow through with the items I listed to make it just kitty and clean up/remove a lot of the code we talked about. I hope this makes it easier to review going forward!

@fredizzimo I haven't updated docs or added checks to assert that one or both width & height are specified yet.

  1. Do we allow only positive indices? Meaning that the images will always be on top of all other Neovim elements. The z-index has no relation to the window z-index at all. At least this interpretation is much more doable

When I implemented sixel and iterm2, this is essentially what I did because it would be near impossible to get them to behave the same way as kitty when it comes to negative indices. I didn't stop anyone from creating them with negative indices, but it would just behave the same way as positive.

I can't find a function that deletes an image to free up memory, I can only see hide for placements.

Even if we combine images and placements, I think there needs to be a function to free up the memory. I also think that hiding an image without deleting the data should still be possible.

Are you referring to the data loaded into the image class? If so, it's a public field, why can't the user just do img.data = nil to remove it if they want to keep the image instance itself around?

@fredizzimo
Copy link
Contributor

I can't find a function that deletes an image to free up memory, I can only see hide for placements.
Even if we combine images and placements, I think there needs to be a function to free up the memory. I also think that hiding an image without deleting the data should still be possible.

Are you referring to the data loaded into the image class? If so, it's a public field, why can't the user just do img.data = nil to remove it if they want to keep the image instance itself around?

I'm thinking more about the provider side, without deletion on the user side we are forcing all providers to implement some kind of eviction system. Even with the automatic eviction on the Kitty side, it's not a good idea to keep the buffer full, since other applications that shows images will suffer.

But maybe the planned fifo system could do it two ways, to also send deletion requests and keep a fixed size allocation on the provider side at all times? That has the advantage of the user not having to do any management themselves.

I'm not sure if it's possible to use the lua __gc hooks with luajit, but even if it is, it's probably not too well suitable, due to its non-deterministic behaviour.

If it's going to be done on the FIFO side, no actions are required for this PR.

@fredizzimo
Copy link
Contributor

fredizzimo commented May 24, 2025

I also suggest that we battle test this against snacks.nvim. It has an absolute positioning mode to support wezterm.

I made a working implementation of that yesterday, but I need to redo it with the new API changes, probably done later today. After that I will share it. It works both with the kitty protocol and a custom branch of Neovide.

It works, but is not really practical due to the zindex limitations. As predicted, images meant to be displayed in the main buffer, are always displayed on top of floating windows. Maybe some clever tricks with dynamically hiding and showing images could be done, but my focus will be on inline image support.

Edit: Snacks already does the dynamic hiding and showing part, I just didn't call the API hide properly

Problem: many terminals support some form of image protocol to display
         graphics; however, nvim provides no integration with any
         protocol available today. This has resulted in multiple
         plugins (e.g. image.nvim, snacks.nvim) providing solutions
         with even more plugins adopting (e.g. orgmode.nvim, neorg).

Solution: provide a rudimentary API that integrates with common
          graphics protocols including kitty, sixel, and iterm2. This
          API should be extensible enough to cover all three protocols
          while also being easy to adapt for external GUIs like
          neovide. In addition, the API should be developed with
          performance and efficiency in mind, meaning taking advantage
          of asynchronous design and caching where possible.

Overview: the following outlines the code points of interest that
          make up the image API implementation.

          1. `vim.ui.img` serves as the entrypoint into the API, where
             images can be loaded and manipulated. Manipulation of
             images includes converting between formats, resizing, and
             cropping of images leveraging the `magick` binary, which
             is defined through the new option `imgprg`.
          2. `vim.ui.img.opts` serves to extract the specifications of
             how and where an image should be displayed within a UI.
          3. `vim.ui.img.placement` houses the logic for individual
             instances of an image displayed within a UI. It manages
             scheduling and batching of requests to show, hide, and
             update on screen.
          4. `vim.ui.img.providers` both serves as the location where
             new providers are created and also where they are stored.
             Each of the builtin providers - kitty, iterm2, and sixel -
             can be found within `vim.ui.img.providers.*`.
          5. `vim.ui.img.utils` contains a mixture of data structures
             and functions core to the needs of the image API.
             a. `vim.ui.img.utils.position` is an abstraction of an
                x & y coordinate (aka column & row) that also contains
                a unit definition - pixel or cell - in order to support
                converting between the two coordinate systems.
             b. `vim.ui.img.utils.promise` provides a simplistic data
                structure to collect a value or an error at a later
                date. It is heavily used throughout the image API
                internals and also is seen publicly as a return value
                in top-level APIs like `vim.ui.img.load()`.
             c. `vim.ui.img.utils.region` is an abstraction of a
                rectangular dimension of x, y, width, and height. It
                also contains a unit definition - pixel or cell - in
                order to support converting between the two coordinate
                systems.
             d. `vim.ui.img.utils.screen` exposes the ability to
                determine the pixel width and height of the editor,
                which is critical towards converting between pixel and
                cell units.
             e. `vim.ui.img.utils.size` is an abstraction of a width
                and height. It also contains a unit definition - pixel
                or cell - in order to support converting between the
                two coordinate systems.
1. Remove ImageMagick
2. Remove iterm2 and sixel providers
3. Rename filename param to file and bytes param to data within Image
4. Remove crop option when showing an image
5. Remove Region as it was only needed for cropping
6. Remove Size and Position and flatten those parameters in the Opts
   table as we no longer need to carry around unit type
7. Merge Image and Placement
8. Remove supported check from providers as they don't function in
   neovim (can't capture CSI or kitty responses)
9. Remove screen size code (was temporary)
10. Move promise to _promise internally
11. Update PNG function to return boolean or nil
12. Wrap img import within ui.lua via the defer @justinmk mentioned
13. Remove imgprg and imgprovider and hard code kitty.
14. Keep the printio-test bin per @justinmk feedback
---end)
---```
---@return vim.ui.img._Promise<true>
function M:hide()
Copy link
Contributor

@fredizzimo fredizzimo May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function could internally check if the image is visible or not, so that hide can be called multiple times, it makes the API more pleasant to use.

---```
---@param opts? vim.ui.img.Opts
---@return vim.ui.img._Promise<true>
function M:update(opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update and show work almost the opposite way of what I would expect.

On one hand, show is used for initially showing an image, but on the other hand it can also be thought of a pair to hide. And in that case, I would like to be able to just re-use the existing options without specifying anything.

I think we could use only one function, but support three modes

  1. nil opts - keep the existing configuration
  2. partial opts - merge the configuration
  3. full opts - replace the configuration

The difference between 2. and 3. could for example be expressed with a special field update=true to select mode 3, otherwise mode 2. is used.

@fredizzimo
Copy link
Contributor

fredizzimo commented May 24, 2025

There does not seem to be a way to override the provider in the latest version, so I'm unable to test it against Neovide. I also think that the image itself shouldn't store any references to the provider as ideally the same image should display on all connected UIs simultaneously.

But I do have my hacks to snacks.nvim here https://github.com/fredizzimo/snacks.nvim/tree/nvim-api, which allows it to work with this API in terminals with Kitty protocol support.

The images are stretched, since the automatic aspect ratio calculation when both with and height is specified described here #31399 (comment) isn't implemented. And that can't be done on the snacks side, since the decisions of using width or height depends on the terminal aspect ratio.

The lack of querying basic image information like width and height is missing, so that's done on the snacks side for now.

All the API usage there is synchronous with wait, as that's still how I think that's how it should work. The provider could deal with all the asynchronous support internally. At most the loading could be asynchronous, since that needs to load the header to be able to provide the image information.

Edit: Only the picker and raw buffer display works, the doc for markdown documents needs some fixes.
Edit2: The doc display works now. but the image is displayed in the wrong position. I don't have time to check if what's the cause is right now, but likely somewhere in my snacks changes. Update: That bug exists in the main version of snacks as well, if a hover window is moved, then the image does not follow. It's just worse in my version since the windows are always big due to lack of terminal query support.

@fredizzimo
Copy link
Contributor

I'm sorry, but I think that my previous suggestion of allowing either the width or the height to be unspecified wasn't a wise one.

Allowing that is still not enough for many practical use cases, as I found out when changing snacks.nvim. Many will need access to the actual cell geometry to determine how big the images should be displayed.

Therefore, I suggest that the size decision as a whole is externalized to the plugin, using whatever comes out of #32408. So, for now I suggest that both the width and height are required.

I'm afraid that specifying what aspect ratio is used for automatically calculating the unspecified dimension would be a bit premature, when we don't fully understand the requirements yet. It's easier to add support for that later, than to have to modify the specification and maybe break plugins that were made against the previous spec. I'm mostly worried about how multiple simultaneously connected UIs will affect it.

If #32408, is kept general so that plugins can query the cell information of all connected UIs, then the plugins have all the information needed to experiment and come up with the best solution, that can be adopted in the core later.

This specification should not affect this PR very much, other than specifying that both dimensions are required. There's no query support, and the whole existing implementation relies on the Kitty protocol doing its magic. So other than that, everything works as before.

I would still rather see that keeping the aspect ratio is the default when both dimensions are specified, than stretching/shrinking like currently implemented. But that requires determining the cell geometry on the provider side, which is an easier problem, since hopefully there will be separate providers for every UI connected. For now, that could be documented as an implementation bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs:discussion issue needs attention from an expert, or PR proposes significant changes to architecture or API
Projects
None yet
Development

Successfully merging this pull request may close these issues.