-
-
Notifications
You must be signed in to change notification settings - Fork 6.2k
feat(img): implement rudimentary image API #31399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
9964ad6
to
630f852
Compare
vim.img.protocol()
to detect preferred graphics protocol
vim.img.protocol()
to detect preferred graphics protocolfe0d5a8
to
33ee581
Compare
babd349
to
dc51bf3
Compare
This comment was marked as outdated.
This comment was marked as outdated.
On Sat, Nov 30, 2024 at 01:29:25PM -0800, Chip Senkbeil wrote:
@justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data.
@kovidgoyal I'm assuming my understanding of pixel formats is correct in that if we fed in any other image format that was not PNG, using `f=100` would not work, and we'd need to instead decode the base64 image data, figure out the format (i.e. bmp, jpeg, etc) and then extract a 24-bit RGB or a 32-bit RGBA set of data to feed in order for your protocol to work.
Yes, correct. You can use either imagemagick or the statically compiled
kitten binary that comes as part of kitty to do this.
I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.
sixel supports nothing, you need to convert every image format to the
sixel format and transmit that.
|
dc51bf3
to
ce818a9
Compare
runtime/lua/vim/ui/img.lua
Outdated
vim.validate('opts', opts, { 'string', 'table' }) | ||
if type(opts) == 'table' then | ||
vim.validate('opts.bytes', opts.bytes, 'string', true) | ||
vim.validate('opts.filename', opts.filename, 'string') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"file" is common in vim.fs
. Eventually we will have URIs, and "file" is more compatible with that.
vim.validate('opts.filename', opts.filename, 'string') | |
vim.validate('opts.file', opts.file, 'string') |
runtime/lua/vim/ui/img.lua
Outdated
---Will throw an error if unable to load the bytes of the file. | ||
---Works without ImageMagick. | ||
---@return boolean | ||
function M:is_png() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strange for a boolean getter to throw. can this be format():'png'|nil
instead, where nil
means "unknown"? or maybe format():'png'|'error'|nil
or perhaps is_png():boolean|nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, can do that instead. Do you expect to support other formats other than PNG down the line? If so, one of the format
functions makes sense. If not, the is_png
makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's assume PNG.
runtime/lua/vim/ui/img.lua
Outdated
---Returns information about the image by invoking ImageMagick's identify. | ||
---This will be some uppercase string like 'PNG'. | ||
---@param opts? vim.ui.img.IdentifyOpts | ||
---@return vim.ui.img.utils.Promise<vim.ui.img.Metadata> | ||
function M:identify(opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is definitely out of scope for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only reason it existed was to easily find any image format's dimensions (width & height). If we just stick with PNG, I know how to read that from the file header. So, no problem.
runtime/lua/vim/ui/img.lua
Outdated
---Converts an image using ImageMagick, returning the bytes of the new image. | ||
--- | ||
---If `background` is specified, will convert alpha pixels to the background color (e.g. #ABCDEF). | ||
---If `crop` is specified, will crop the image to the specified pixel dimensions. | ||
---If `format` is specified, will convert to the image format, defaulting to png. | ||
---If `out` is specified, will treat as output file and write to it, and promise returns empty str. | ||
---If `size` is specified, will resize the image to the desired size. | ||
---@param opts? vim.ui.img.ConvertOpts | ||
---@return vim.ui.img.utils.Promise<string> | ||
function M:convert(opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem. If we did want sixel
or need cropping for iterm2
(if we brought it in), then we would need this. Or we'd have to provide or write a PNG decoder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requiring PNG seems reasonable. converting to PNG is a different topic, which could be covered in some other api in the future. but the caller of this api is expected to provide PNG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think supporting png only is reasonable. It also makes storing the images in memory more efficient, which we will need to do, since Kitty has a limited buffer size and images can be evicted.
What makes the evection even worse, is that on tmux with pass through, we don't have a way to know when that happens or how much quota is used. So, I think the only safe way to do it, is to re-upload the image to kitty each time a new image is brought to view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fredizzimo in snacks they use a least recently used queue to know which images to re-upload. I was going to implement that, but held off when discussion started back.
Or we transmit again every time, but that definitely has noticeable performance considerations from what I saw with iterm2 and sixel.
LRU out of scope for this PR? Retransmitting out of scope for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only need to retransmit when a new image appears, not for every movement, but yes it could still be annoying when scrolling up and down.
LRU, with some size limit could work. The only thing we really have to be careful with is when the image is displayed in tmux or similar, in the other cases, neovim has full control. And the same thing applies to unique id collisions, not much we can do about that in a tmux context.
Since this does not affect the interface, I would vote for out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For LRU we have vim.ringbuf
. But yes, let's save performance improvements for a followup, not here.
runtime/lua/vim/ui/img.lua
Outdated
---Creates a placement of this image that is not yet visible. | ||
---@param opts? {provider?:string} | ||
---@return vim.ui.img.Placement | ||
function M:new_placement(opts) | ||
return require('vim.ui.img.placement').new(self, opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems overwrought to represent "placements" as objects which need to be created. I would expect something like M:set_pos
.
Positions are called "pos". Avoid a new "placement" concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the naming is wrong on this - was following kitty's naming - but the concept is that the image is the file path and (optionally) data of the image itself and a placement is the reference to the image managed by the provider in the terminal (or UI) of which there may be multiple. Maybe we just don't make this a public method or we change the name, but M:set_pos
makes me think that I'm moving the vim.ui.Image
instead of one of the displayed versions on the terminal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we just don't make this a public method or we change the name,
If it can be private for now, let's do that.
but
M:set_pos
makes me think that I'm moving thevim.ui.Image
instead of one of the displayed versions on the terminal.
Is the intention to avoid creating a new image for each placement? Can we instead just say "a new placement requires new image". So an image object has one position, and cannot be placed in multiple places. That might be "inefficient", but so what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use cases for showing the same image multiple times, is probably quite limited. I can only really think of demo type effects, games and repeated UI elements. So, I think limiting the interface to only support images would be fine.
Furthermore, I'm pretty sure this can be optimized on the provider side. It can calculate the hash of the image data and properties and internally share it and keep track of the ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use cases for showing the same image multiple times, is probably quite limited. I can only really think of demo type effects, games and repeated UI elements. So, I think limiting the interface to only support images would be fine.
Totally fair! I don't have a use case for same image repeated, and should be able to merge placement and image without much trouble.
Furthermore, I'm pretty sure this can be optimized on the provider side. It can calculate the hash of the image data and properties and internally share it and keep track of the ids.
Yep! I already do that within the sixel and iterm2 providers, and it works well. So agreed the optimization can be on the provider side. 😄
Will make the change along with the other requests.
Thanks @justinmk and @fredizzimo for locking down discussion and changes. I'll get to these soon. So it's clear what I'll do:
What I won't do unless you tell me you want it now:
Anything I missed? Typing from my phone, so going from memory. |
Yes. It can be done in a way that allows us to re-add those in the future, right? If I'm wrong, can you show a small example of the function signature before/after.
Not required here, but hopefully in a followup. Async should ideally be opt-in via an optional callback param. |
I would prefer this. I'm working on adding |
We still need to discuss the exact API semantics, especially regarding the size parameter. That was the main reason I opened my API proposal, to have a specification. The removal of the absolute position support wasn't even included in the initial version. As far as I can see the exact behaviour is unspecified here, but from based on the actual implementation and interface we have the following. Size is either optional or present, and if present both width and height are required. And when present, it's passed directly along to Kitty. This means.
This seemed OK to me when I though Kitty worked differently, and we would have more control being able to specify pixel sizes, preserving the aspect ratio. I was counting on the helper functions, which were unspecified to take care of the details. But that would still only have worked for a single TUI/GUI case. So, this is what I specified instead here #34093 Regardless of the placement mode, you can specify non, one or both of the dimensions. If both are specified, the Bonus: minimum maximum support like snacks. As mentioned, we will need to know the cell aspect ratio and/or size according to the following:
Note that the cell aspect ratio can be virtual, for example 1:2, and the images would still be displayed with the correct proportions. It's just that the reserved area might be bigger or smaller than intended. So, in that case no terminal query support is needed, and no decision needs to be taken about which terminal is the main one. This approach also works well when multiple different UIs are connected. Therefore, I suggest that we always require at least one dimension for now in this initial version, because terminal query support and designating a primary UI, which need a lot of design thought by itself, is still not implemented. The z-index behaviour also needs specification
That is the only thing we can implement with the Kitty protocol I think. But it's hard to do in Neovide, since it fights against the natural order of floating windows in Neovim I suggest that we implement 2. |
I also suggest that we battle test this against snacks.nvim. It has an absolute positioning mode to support wezterm. |
Couldn't we avoid async support altogether by externalizing the file loading, and pass a data buffer containing the PNG file data to the API? I see zero reasons why anything in here should ever block or require calculation. |
|
||
-- Capture old cursor position, hide the cursor, and move to the | ||
-- position where the image should be displayed | ||
local pos = opts:position():to_cells() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the position is not meant to be dynamically updated when the mouse or window moves, I think this calculation should be moved out of the provider. And the provider gets the precalculated position.
I had previously assumed that it was up to the provider to dynamically watch and update this, partially because of the mouse movement demo. But looking at the code, that does not appear to be the case.
If could be interesting though if it is updated dynamically by the provider, for example Neovide could move the image pixel by pixel with the mouse, and the image would follow smoothly with an animated window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... Maybe it has to be on the provider side after all, in the case of multigrid with custom window placement, only the provider will know the location it should be displayed.
In any case, it needs to be specified if it's supposed to dynamically update or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets even more tricky than that. Since the provider processes this asynchronously, the cursor or mouse might have moved since the request, which results in another image location than intended. So, I think it has to be either continuously on the provider side, or immediately when the request is done outside the provider.
That means that multigrid UIs that make their own placement breaks, but that's the case with a lot of things, so I don't think it's a huge problem. These kinds of UIs probably need their own specialized plugins anyway. Also with inline placement, none of this is a problem, other than that the image will never follow the mouse or cursor automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is batched together and not sent immediately. That's what the writer does when it calls flush()
. So it aggregates:
- Saving the cursor
- Hiding the cursor
- Moving to a new location
- Sending the display request
- Restoring the cursor position
- Showing the cursor
All of that is sent at once, so the jump of the cursor and return to its original position (wherever that may be) is really quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the position is not meant to be dynamically updated when the mouse or window moves, I think this calculation should be moved out of the provider. And the provider gets the precalculated position.
This is a static location sent to the provider. The call to position()
just figures out the row and column based on the relative
field. So, not a dynamic moving position. Every time you call position()
it will return the same value unless you change the opts.
It's a helper function attached to the opts so each provider doesn't need to implement that calculation itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking about this sequence and the positions specified by the placement
- The user clicks somewhere
- It takes x amount of time until the request is processed by the provider, which might be on a remote machine. With UI updates being batched.
- At this point the Neovim cursor or mouse pointer might have moved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Yeah, to avoid that, we'd need to calculate it on the spot before sending to the provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to fix this, would be to remove the support for opts.Relative
, and only support editor
for now, I'm not sure how much convenance functionality the protocol should provide, those are all easy to do on the caller side.
And as I mentioned, another option is to make it dynamic, meaning it updates in real-time when the cursor or mouse is moved. In that case it would also do something that can't be done otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fredizzimo I just submitted a bunch of cleanup, and a distinct commit to try to solve this. I just updated the data structure that is fed to the provider to just have a row, col, optional width/height, and optional z.
The user-provided opts get mapped to the internal opts used by the provider. This gets calculated at the point in time when the user wants to show/update an image, and the provider can assume it always has the precalculated opts. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this part looks good now. We can possibly add real-time update support with an option in the future if needed.
The details that you pointed out are useful to track in a (concise) issue. If we find things that need to be changed, we can quickly adjust. The point is to have something that works on master branch NOW instead of endlessly debating it and not having real-world feedback from master branch users.
ok.
sure. start with the most restricive thing. constraints can always be relaxed later, but the opposite is much harder. |
runtime/lua/vim/ui/img/opts.lua
Outdated
|
||
---Calculates and returns the position dictated by `relative` and `pos`. | ||
---@return vim.ui.img.utils.Position | ||
function M:position() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indexing of this function is inconsistent and unclear. Personally, I think that all of them should be 0-indexed, since we are specifying offsets.
Currently, you need to specify 1-indexed positions, even for the ones that are commented as 0-indexed, since the terminal cursor position is 1-indexed
Well, technically you can specify both 0 and 1 to mean the edge of the editor, but to leave one space you need to pass 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note regarding this. Snacks win could use relative = win
, but in practice has to use relative = editor
, since it needs to take the border into account.
And now that we have vim.ui.img.InternalOpts
for the provider, the indexing need to be specified for that as well. Right now, in my implementation, I'm assuming a 1-based index, since the Kitty provider use it directly.
I can't find a function that deletes an image to free up memory, I can only see hide for placements. Even if we combine images and placements, I think there needs to be a function to free up the memory. I also think that hiding an image without deleting the data should still be possible. |
@justinmk @fredizzimo @gpanders @clason I've updated the PR to follow through with the items I listed to make it just kitty and clean up/remove a lot of the code we talked about. I hope this makes it easier to review going forward! @fredizzimo I haven't updated docs or added checks to assert that one or both width & height are specified yet.
When I implemented sixel and iterm2, this is essentially what I did because it would be near impossible to get them to behave the same way as kitty when it comes to negative indices. I didn't stop anyone from creating them with negative indices, but it would just behave the same way as positive.
Are you referring to the data loaded into the image class? If so, it's a public field, why can't the user just do |
I'm thinking more about the provider side, without deletion on the user side we are forcing all providers to implement some kind of eviction system. Even with the automatic eviction on the Kitty side, it's not a good idea to keep the buffer full, since other applications that shows images will suffer. But maybe the planned fifo system could do it two ways, to also send deletion requests and keep a fixed size allocation on the provider side at all times? That has the advantage of the user not having to do any management themselves. I'm not sure if it's possible to use the lua If it's going to be done on the FIFO side, no actions are required for this PR. |
I made a working implementation of that yesterday, but I need to redo it with the new API changes, probably done later today. After that I will share it. It works both with the kitty protocol and a custom branch of Neovide. It works, but is not really practical due to the zindex limitations. As predicted, images meant to be displayed in the main buffer, are always displayed on top of floating windows. Maybe some clever tricks with dynamically hiding and showing images could be done, but my focus will be on inline image support. Edit: Snacks already does the dynamic hiding and showing part, I just didn't call the API hide properly |
Problem: many terminals support some form of image protocol to display graphics; however, nvim provides no integration with any protocol available today. This has resulted in multiple plugins (e.g. image.nvim, snacks.nvim) providing solutions with even more plugins adopting (e.g. orgmode.nvim, neorg). Solution: provide a rudimentary API that integrates with common graphics protocols including kitty, sixel, and iterm2. This API should be extensible enough to cover all three protocols while also being easy to adapt for external GUIs like neovide. In addition, the API should be developed with performance and efficiency in mind, meaning taking advantage of asynchronous design and caching where possible. Overview: the following outlines the code points of interest that make up the image API implementation. 1. `vim.ui.img` serves as the entrypoint into the API, where images can be loaded and manipulated. Manipulation of images includes converting between formats, resizing, and cropping of images leveraging the `magick` binary, which is defined through the new option `imgprg`. 2. `vim.ui.img.opts` serves to extract the specifications of how and where an image should be displayed within a UI. 3. `vim.ui.img.placement` houses the logic for individual instances of an image displayed within a UI. It manages scheduling and batching of requests to show, hide, and update on screen. 4. `vim.ui.img.providers` both serves as the location where new providers are created and also where they are stored. Each of the builtin providers - kitty, iterm2, and sixel - can be found within `vim.ui.img.providers.*`. 5. `vim.ui.img.utils` contains a mixture of data structures and functions core to the needs of the image API. a. `vim.ui.img.utils.position` is an abstraction of an x & y coordinate (aka column & row) that also contains a unit definition - pixel or cell - in order to support converting between the two coordinate systems. b. `vim.ui.img.utils.promise` provides a simplistic data structure to collect a value or an error at a later date. It is heavily used throughout the image API internals and also is seen publicly as a return value in top-level APIs like `vim.ui.img.load()`. c. `vim.ui.img.utils.region` is an abstraction of a rectangular dimension of x, y, width, and height. It also contains a unit definition - pixel or cell - in order to support converting between the two coordinate systems. d. `vim.ui.img.utils.screen` exposes the ability to determine the pixel width and height of the editor, which is critical towards converting between pixel and cell units. e. `vim.ui.img.utils.size` is an abstraction of a width and height. It also contains a unit definition - pixel or cell - in order to support converting between the two coordinate systems.
1. Remove ImageMagick 2. Remove iterm2 and sixel providers 3. Rename filename param to file and bytes param to data within Image 4. Remove crop option when showing an image 5. Remove Region as it was only needed for cropping 6. Remove Size and Position and flatten those parameters in the Opts table as we no longer need to carry around unit type 7. Merge Image and Placement 8. Remove supported check from providers as they don't function in neovim (can't capture CSI or kitty responses) 9. Remove screen size code (was temporary) 10. Move promise to _promise internally 11. Update PNG function to return boolean or nil 12. Wrap img import within ui.lua via the defer @justinmk mentioned 13. Remove imgprg and imgprovider and hard code kitty. 14. Keep the printio-test bin per @justinmk feedback
…ge scheduling of update
61e047a
to
a719c67
Compare
---end) | ||
---``` | ||
---@return vim.ui.img._Promise<true> | ||
function M:hide() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function could internally check if the image is visible or not, so that hide can be called multiple times, it makes the API more pleasant to use.
---``` | ||
---@param opts? vim.ui.img.Opts | ||
---@return vim.ui.img._Promise<true> | ||
function M:update(opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update and show work almost the opposite way of what I would expect.
On one hand, show is used for initially showing an image, but on the other hand it can also be thought of a pair to hide. And in that case, I would like to be able to just re-use the existing options without specifying anything.
I think we could use only one function, but support three modes
- nil opts - keep the existing configuration
- partial opts - merge the configuration
- full opts - replace the configuration
The difference between 2. and 3. could for example be expressed with a special field update=true
to select mode 3, otherwise mode 2. is used.
There does not seem to be a way to override the provider in the latest version, so I'm unable to test it against Neovide. I also think that the image itself shouldn't store any references to the provider as ideally the same image should display on all connected UIs simultaneously. But I do have my hacks to snacks.nvim here https://github.com/fredizzimo/snacks.nvim/tree/nvim-api, which allows it to work with this API in terminals with Kitty protocol support. The images are stretched, since the automatic aspect ratio calculation when both with and height is specified described here #31399 (comment) isn't implemented. And that can't be done on the snacks side, since the decisions of using width or height depends on the terminal aspect ratio. The lack of querying basic image information like width and height is missing, so that's done on the snacks side for now. All the API usage there is synchronous with wait, as that's still how I think that's how it should work. The provider could deal with all the asynchronous support internally. At most the loading could be asynchronous, since that needs to load the header to be able to provide the image information. Edit: Only the picker and raw buffer display works, the doc for markdown documents needs some fixes. |
I'm sorry, but I think that my previous suggestion of allowing either the width or the height to be unspecified wasn't a wise one. Allowing that is still not enough for many practical use cases, as I found out when changing snacks.nvim. Many will need access to the actual cell geometry to determine how big the images should be displayed. Therefore, I suggest that the size decision as a whole is externalized to the plugin, using whatever comes out of #32408. So, for now I suggest that both the width and height are required. I'm afraid that specifying what aspect ratio is used for automatically calculating the unspecified dimension would be a bit premature, when we don't fully understand the requirements yet. It's easier to add support for that later, than to have to modify the specification and maybe break plugins that were made against the previous spec. I'm mostly worried about how multiple simultaneously connected UIs will affect it. If #32408, is kept general so that plugins can query the cell information of all connected UIs, then the plugins have all the information needed to experiment and come up with the best solution, that can be adopted in the core later. This specification should not affect this PR very much, other than specifying that both dimensions are required. There's no query support, and the whole existing implementation relies on the Kitty protocol doing its magic. So other than that, everything works as before. I would still rather see that keeping the aspect ratio is the default when both dimensions are specified, than stretching/shrinking like currently implemented. But that requires determining the cell geometry on the provider side, which is an easier problem, since hopefully there will be separate providers for every UI connected. For now, that could be documented as an implementation bug. |
Status
This PR serves as a nice feasibility study but has grown too large to consider merging. So it is temporarily parked while a suitable API is being discussed.
Feature & Implementation Breakdown
The following is both a tracker and detailing of the features supported by this PR, what they look like, and a visual of each implementation (provider) in order to support as many terminals as possible, including on Windows. Monitor this top-level comment for the current API structure and visual examples of the PR in action.
Core API Structure
The API follows a simplistic, object-oriented design similar to
vim.system()
. You specify an image, optionally eagerly loading its data, and then display it within neovim. The information about where and how it is displayed is provided through a single table.Each image, once shown, returns a placement, which represents the specific instance of the image on screen. This is to support the separation of an image and its data from the details and management of displaying it.
The image API makes use of an internal
Promise<T>
class in order to support both synchronous and asynchronous operation, meaning that images can be loaded, shown, hidden, and updated asynchronously in order to be efficient, but also can manually be waited upon at any stage.Image Manipulation & Inspection
Leveraging ImageMagick for the functionality, this section describes features needed to manipulate and inspect images. This is particularly needed for sixel, which would normally require advanced image decoding to convert to an RGB format and then packaged in sixel's format. ImageMagick supports features like cropping and resizing, which we expose through the
convert
method.Additionally, provides a retrieval API called
identify
(modeled aftermagick identify
) to inspect images without needing advanced header parsing per image format in order to learn details like the true image format (not just reading the file extension) and pixel dimensions of the image.Kitty Provider
The 1st protocol supported is kitty graphics protocol. Aside from the obvious kitty, other graphics-accelerated terminals like Ghostty make use of this protocol to display images. This protocol only requires
magick
when attempting to load an image other than PNG, which will be converted to PNG and placed in~/.cache/nvim/imgs/{sha256(filename)}.png
.Integration of kitty as a provider
neovim-img-kitty.mp4
Sixel Provider
The 2nd protocol supported is sixel image protocol. Windows Terminal, in particular, makes use of this protocol to display images. This protocol requires
magick
in order to convert images into the sixel format, as well as all operations like resizing and cropping.Integration of sixel as a provider
neovim-img-sixel-windows.mp4
iterm2 Provider
The 3rd protocol supported is iterm2 image protocol. WezTerm provides a GPU-accelerated implementation of this protocol. This protocol only requires
magick
in order to crop images. Resizing images does not requiremagick
, and iterm2 supports multiple image formats.Integration of iterm2 as a provider
neovim-img-iterm2.mp4
3rd Party Providers
To enable GUIs and other image protocols to easily slide into neovim's API abstraction, a singular API is exposed through
vim.ui.img.providers.new
, which implementers will use to specifyshow
andhide
functions at a minimum. Each provider supports operations whenload
ing,unload
ing, checkingsupported
to see if the provider can operate in the current environment, and (if possible) specifying a more efficientupdate
method.The methods
supported
,show
,hide
, andupdate
take a callback to support asynchronous operation, and must use it to communicate when the operation is finished. The methodsload
andunload
are considered synchronous and should be fast to operate.Maybe in scope for this PR
Out of Scope for this PR