feat(img): implement image API for absolute positions #31399

chipsenkbeil · 2024-11-29T22:01:21Z

Alright, let's try this again without the massive amount of pull requests. 😄 Each commit here should be a standalone change, and I'll document the processes here.

This is geared towards tackling #30889, specifically on supporting

Ability to load an image into memory
Display an image with absolute coordinates
Support different backends for rendering images such as iterm2 and kitty
Smartly detect the type(s) of backend graphics supported

Things for later PRs would include

Inline image support (attach to a buffer, reflow text around it)
Alternative image formats (I think PNG is what is supported right now?)
Video feeds (more complex, more limited backend support)

Breakdown of commits

1. Loading an image into memory

Implements vim.img.load() to load from a file or wrap base64 encoded bytes as a vim.img.Image instance.

2. Implement skeleton of vim.img.show() without backends

Implements the skeleton of vim.img.show() with any backend implemented.

3. Implement vim.img._terminal to support basic functionality needed for backends

Implements a vim.img._terminal module that supports writing to the tty tied to neovim as well as basic operations to manipulate the cursor, needed for backend implementations.

4. Implement `vim.img.Image` method `for_each_chunk` to streamline backend processing

Implements a method image:for_each_chunk for instances of vim.img.Image. This method streamlines chunked iteration of image bytes, which is important when working with ssh or tmux and a protocol that supports chunked image rendering such as iterm2 or kitty.

5. Implement iterm2 backend

Implements the iterm2 backend, supporting both iTerm 3.5+ support for multipart images, and falling back to older protocol that sends the entire image at once, which is needed for support on other terminals such as WezTerm.

6. Implement kitty backend

Implements the kitty graphics protocol as a backend, using kitty's chunked image rendering, which should work within tmux and ssh if we keep the chunks small enough.

7. Implement `vim.img.protocol()` to detect preferred graphics protocol

Implements vim.img.protocol() that can be used to detect the preferred graphics protocol.

This is a reverse-engineered copy of how timg implements graphics protocol support, and relies on a couple of terminal queries, hence we implement vim.img._terminal.query() and vim.img._terminal.graphics.detect() to support figuring out if the terminal supports iterm2, kitty, or sixel protocols and mirrors the logic from timg.

chipsenkbeil · 2024-11-30T00:42:39Z

Heads up, I know there is formatting of commit messages needed and linting for preferences in Lua style guides.

The current code is me migrating over my working code from a private repo - not a fork of neovim - to be a pull request here. I'll work on updating the PR to be compliant, but wanted the code to be visible for comments.

In particular, I could use help in rewriting that parts of the PR that make use of Lua's io library - assuming we want to use a neovim equivalent - and to refactor parts of the code that could be improved. So looking for stronger critique, challenges, and suggestions 😄 This was an example-turned-PR, so not all of the code is high quality!

An example of doing this with the current PR:

local file = vim.img.load({
    filename = "/Users/senkwich/Pictures/org-roam-logo.png",
})

vim.img.show(file, {
    pos = { row = 8, col = 8 }, 
    backend = "iterm2",
})

runtime/lua/vim/img/_image.lua

runtime/lua/vim/img/_terminal.lua

chipsenkbeil · 2024-11-30T21:29:04Z

@justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data.

@kovidgoyal I'm assuming my understanding of pixel formats is correct in that if we fed in any other image format that was not PNG, using f=100 would not work, and we'd need to instead decode the base64 image data, figure out the format (i.e. bmp, jpeg, etc) and then extract a 24-bit RGB or a 32-bit RGBA set of data to feed in order for your protocol to work.

I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.

kovidgoyal · 2024-12-01T03:15:39Z

On Sat, Nov 30, 2024 at 01:29:25PM -0800, Chip Senkbeil wrote: @justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data. @kovidgoyal I'm assuming my understanding of pixel formats is correct in that if we fed in any other image format that was not PNG, using `f=100` would not work, and we'd need to instead decode the base64 image data, figure out the format (i.e. bmp, jpeg, etc) and then extract a 24-bit RGB or a 32-bit RGBA set of data to feed in order for your protocol to work.

Yes, correct. You can use either imagemagick or the statically compiled kitten binary that comes as part of kitty to do this.

I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.

sixel supports nothing, you need to convert every image format to the sixel format and transmit that.

runtime/lua/vim/_editor.lua

runtime/lua/vim/img.lua

runtime/lua/vim/img/_backend.lua

justinmk · 2024-12-02T12:56:29Z

runtime/lua/vim/img/_terminal.lua

should this live in runtime/lua/vim/termcap.lua ? cc @gpanders

Happy for it to live wherever you'd like. It felt odd to have this live under img, and I was unaware of termcap.lua. Let me know what you'd like to do.

Also happy to break out this logic to a separate PR if we want to review and merge this logic separately. It'd need to be handled prior to the image work, though. Or it can live in this PR.

termcap probably isn't the right name since that is specifically for terminal capabilities. But we can (and probably should) rename termcap to term or tty or something, and then we can move all of this there. The existing vim.termcap.query would become vim.tty.query, which still makes sense (we will need to keep the old termcap file around as a BWC stub for a bit).

runtime/lua/vim/img/_image.lua

runtime/lua/vim/img/_backend.lua

justinmk · 2024-12-02T15:32:12Z

runtime/lua/vim/img/_image.lua

+---Iterates over the chunks of the image, invoking `f` per chunk.
+---@param f fun(chunk:string, pos:integer, has_more:boolean)
+---@param opts? {size?:integer}
+function M:for_each_chunk(f, opts)


:help dev-patterns

Suggested change

function M:for_each_chunk(f, opts)

function M:for_each_chunk(opts, f)

If you want to support optional opts, make it an "overload".

Can we return an iterator instead (:help iterator)? That is more idiomatic, and composes with vim.iter, ipairs(), etc.

@justinmk yeah, I can do that. Was unaware of neovim having an iterator, so it'll be nice to get a feel for how that works.

iterators are a Lua concept, and can be as simple as returning a function (closure) which, when called, returns the next item.

:h for-in

runtime/lua/vim/img/_image.lua

chipsenkbeil · 2024-12-13T17:44:40Z

Thanks for the feedback, folks. Been traveling for work last few weeks, and I may have some free time this weekend to start applying some of the recommendations.

brianhuster · 2025-01-14T06:33:48Z

Just a drive by comment on my concern over the inclusion of the iterm2 protocol

I share this concern. The iTerm2 protocol is supported by only 2 terminals: iTerm2 and WezTerm. The latter also supports the kitty graphics protocol, meaning there is only one terminal for which supporting the iTerm image protocol would be useful.

Considering the additional complexity for supporting it, as well as the apparent feature gap mentioned here, I'd propose implementing support only for the kitty graphics protocol first.

Konsole supports iTerm2 protocol too

schrmh · 2025-03-04T20:07:34Z

Just a link to this comment pre-dating this pull request cause once again it was implied here more terminals support kitty than iTerm2 which does not seem to be the case:
#30889 (comment)
I count at least eight for iTerm2 there and wezterm supporting it would be news for me, but maybe I overlooked it.
Now, the relevancy of them is probably a different story.
In the meantime another new terminal emulator got a bit of a spike: ghostty. I think this supports iTerm2 and kitty.

chipsenkbeil · 2025-03-04T21:05:36Z

Hey folks, heads up that I'll finally get around to reviewing and implementing some of the suggestions in the next couple of weeks. @justinmk where did we align on keeping/removing iTerm protocol support? And sixel support?

My perspective is that we figure out a good interface for new backends (like these) to be injected, but having them directly included here may be too much. I have iterm and kitty today, although seems like kitty has a lot of different nuances on how it's implemented.

And from the feedback @gpanders has given, we can reuse parts of the existing codebase for terminal queries and other logic, which is nice. Will have to retest everything once those changes are made.

fredizzimo · 2025-03-05T15:21:28Z

I made a quick and dirty integration into Neovide here neovide/neovide#3039 to get some feel about the work done so far.

Here are some observations and comments:

If we have a better way of adding new custom backends, then a GUI implementation in Neovim might not be necessary. Although GUI events might have the advantage of being totally in sync with the UI protocol.
Since Neovim supports connecting multiple UIs to the same instance, I think the backend should be connected to the UI connections instead of globally, with the possibility of connecting different backends to different UIs. It probably won't work perfectly, since UIs have different grid sizes with different aspect ratios, but I think it would still be decent enough.
It should be possible to hide/destroy images
I echo the request for supporting different units. Pixel units will most likely be needed to align multiple images perfectly
You should be able to specify either with or height, otherwise it's impossible to render a resized image with the correct aspect ratio, especially when using grid-based sizes
I think a z-index is needed, and just like the Kitty protocol, making it possible to render both in front and behind the text
I think the image data and rendered instances should be separated, otherwise you will pay both transfer costs and memory for displaying the same image multiple times
I think ids for both the data and the instances are needed. I had to hack some in my implementation. The data id could perhaps be some kind of hash, to untie it from lua objects, so that the user never has to care.

Finally, given that this code, at least at the moment doesn't need to be in Neovim, I wonder if it would make sense to pull it out to a plugin for the time being? That way things can be iterated more quickly, it's easier for multiple people to contribute. It would also allow the API to be stabilized, and maybe get some real-world testing before it's integrated into Neovim (if it actually ever needs to be).

justinmk · 2025-03-09T13:30:49Z

@justinmk where did we align on keeping/removing iTerm protocol support? And sixel support?

My perspective is that we figure out a good interface for new backends (like these) to be injected, but having them directly included here may be too much

yeah, other backends/providers can be added later. definitely more important to get the interface right and focus on simplicity here, first.

justinmk · 2025-03-09T13:35:55Z

@fredizzimo those are great notes, would you mind creating a new issue to track that as "phase 2"?

7. I think the image data and rendered instances should be separated, otherwise you will pay both transfer costs and memory for displaying the same image multiple times

8. I think ids for both the data and the instances are needed. I had to hack some in my implementation. The data id could perhaps be some kind of hash, to untie it from lua objects, so that the user never has to care.

All of your notes sound like non-blockers except these two ⬆️ .

Finally, given that this code, at least at the moment doesn't need to be in Neovim, I wonder if it would make sense to pull it out to a plugin for the time being?

No, plugins already exist, and now we're taking the next step. We can mark this as experimental (and prefix the interface with underscore, vim.ui._img), if we think there are one-way decisions that haven't been resolved.

fredizzimo · 2025-03-10T15:38:19Z

@justinmk, thank you!

I was thinking that the upcoming 0.11 release was the reason, but a simple short response saying that would have cleared up a lot of confusion. I have been doing a lot of work with those, and was pushing for 0.11 myself, especially since I started working on them in the autumn already, but that's probably out of question now.

I will create an issue after this PR has been iterated a bit more, since I don't know how big changes @chipsenkbeil is planning to make, and what remains after that.

chipsenkbeil · 2025-03-10T15:49:50Z

@fredizzimo happy to collaborate with you on this. 😄 I wanted to do this in steps (each its own PR?)

A static image you can place within neovim (supporting just PNG or whatever works by default in terminals)
Support deleting the image placed within neovim
Dynamically resize/move an image
Injection of an image into a buffer within neovim (tracking its movement properly, possibly through the use of extmarks); this involves reflowing text around the image (in my mind) versus just covering it up or placing behind it. Think to the examples of images within markdown/org documents but with images as first-class citizens
Multiple image type support (bmp, jpg at least) within neovim (seems like the common way is to farm out to imagemagick, which I'm not a fan of, but seems like what we'd have to do first)
Video/gif support (there are reasons why this would be neat, but not a dealbreaker if we want to exclude this from neovim core)

Right now, this PR represents step 1.

Not saying all of these will make it in even as experiments, just thoughts on the priorities. This also doesn't discuss which terminals to support with which protocols underneath, so could imagine sliding those in at some point, too. So plenty that could be done inbetween or in place of the above.

I've gotten very busy with work, which is why this was delayed. Even having to work today while I'm on vacation, so haven't been able to come back to this PR yet. Any thoughts on how you want to tackle this in a collaborative setting? Probably starting by opening an issue or something to track additional items per @justinmk suggestion would be good.

justinmk · 2025-03-10T16:22:41Z

@chipsenkbeil in item 1 of your list, should items 7/8 from fredizzimo's list be part of that (this PR)?

chipsenkbeil · 2025-03-10T16:30:30Z

@chipsenkbeil in item 1 of your list, should items 7/8 from fredizzimo's list be part of that (this PR)?

sure, I can take a stab at that as part of this PR.

Implement `vim.img.load()` to load from a file or wrap base64 encoded bytes as a `vim.img.Image` instance.

Implement the skeleton of `vim.img.show()` without any backend implemented.

Implement `vim.img._terminal` module that supports writing to the tty tied to neovim as well as basic operations to manipulate the cursor, needed for backend implementations.

Add `for_each_chunk()` for instances of `vim.img.Image`. This method streamlines chunked iteration of image bytes, which is important when working with ssh or tmux and a protocol that supports chunked image rendering such as `iterm2` or `kitty`.

Implement the iterm2 backend, supporting both iTerm 3.5+ support for multipart images, and falling back to older protocol that sends the entire image at once, which is needed for support on other terminals such as WezTerm.

Implement the kitty graphics protocol as a backend, using kitty's chunked image rendering, which should work within tmux and ssh if we keep the chunks small enough.

…ocol Implement `vim.img.protocol()` that can be used to detect the preferred graphics protocol. This is a reverse-engineered copy of how `timg` implements graphics protocol support, and relies on a couple of terminal queries, hence we implement `vim.img._terminal.query()` and `vim.img._terminal.graphics.detect()` to support figuring out if the terminal supports iterm2, kitty, or sixel protocols and mirrors the logic from `timg`.

…) -> vim.ui.img.new()

…, new_position, new_region using cell size calculations to support cell/pixel unit

chipsenkbeil · 2025-04-29T20:16:52Z

Hey, look at that! Months later, I finally am getting around to applying the feedback from @justinmk and @gpanders while at the airport 😄

Still reading through all of the comments, but first cleaning things up. Going to see if I can entirely remove the terminal code in favor of termcaps, although I suspect I still need a helper function to move the cursor to properly place the image at a specific location.

@fredizzimo @schrmh @brianhuster @rockorager I haven't forgotten all of your comments, either! Once the code is cleaned up more and we get as simplistic as I can make it, we can revisit if we just want kitty (seems like a push for yes for now?).

I know that the iterm2 protocol is really limited, so I don't think we'd be able to get it to support the same functionality as kitty. It's also inconsistent in behavior from kitty on absolute image positioning as kitty renders an image over everything else whereas iterm2 scrolls the image out of view (and clears it) as you move lines around.

For sixel, this seems a bit more difficult for me to tackle in this first PR, so I'd propose that we instead keep things simple by me deleting iterm2 support, just focusing on kitty, and then new providers can be added in separate PRs as this evolves. And this SHOULD evolve quite a lot, btw.

Thoughts?

chipsenkbeil force-pushed the feat/ImageApi branch 3 times, most recently from 9964ad6 to 630f852 Compare November 30, 2024 03:00

chipsenkbeil changed the title ~~Implement image API for #30889~~ feat(img): add vim.img.protocol() to detect preferred graphics protocol Nov 30, 2024

chipsenkbeil changed the title ~~feat(img): add vim.img.protocol() to detect preferred graphics protocol~~ Implement image API for #30889 Nov 30, 2024

chipsenkbeil force-pushed the feat/ImageApi branch 2 times, most recently from fe0d5a8 to 33ee581 Compare November 30, 2024 04:27

chipsenkbeil changed the title ~~Implement image API for #30889~~ feat(img): implement image API Nov 30, 2024

chipsenkbeil changed the title ~~feat(img): implement image API~~ feat(img): implement image API for absolute positions Nov 30, 2024

ribru17 reviewed Nov 30, 2024

View reviewed changes

runtime/lua/vim/img/_image.lua Outdated Show resolved Hide resolved

chipsenkbeil mentioned this pull request Nov 30, 2024

image API #30889

Open

lewis6991 reviewed Nov 30, 2024

View reviewed changes

runtime/lua/vim/img/_terminal.lua Outdated Show resolved Hide resolved

chipsenkbeil force-pushed the feat/ImageApi branch 4 times, most recently from babd349 to dc51bf3 Compare November 30, 2024 20:56

chipsenkbeil force-pushed the feat/ImageApi branch from dc51bf3 to ce818a9 Compare December 1, 2024 20:24