Skip to content

feat(img): implement image API for absolute positions #31399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

chipsenkbeil
Copy link

@chipsenkbeil chipsenkbeil commented Nov 29, 2024

Alright, let's try this again without the massive amount of pull requests. 😄 Each commit here should be a standalone change, and I'll document the processes here.

This is geared towards tackling #30889, specifically on supporting

  1. Ability to load an image into memory
  2. Display an image with absolute coordinates
  3. Support different backends for rendering images such as iterm2 and kitty
  4. Smartly detect the type(s) of backend graphics supported

Things for later PRs would include

  1. Inline image support (attach to a buffer, reflow text around it)
  2. Alternative image formats (I think PNG is what is supported right now?)
  3. Video feeds (more complex, more limited backend support)

Breakdown of commits

1. Loading an image into memory

Implements vim.img.load() to load from a file or wrap base64 encoded bytes as a vim.img.Image instance.

2. Implement skeleton of vim.img.show() without backends

Implements the skeleton of vim.img.show() with any backend implemented.

3. Implement vim.img._terminal to support basic functionality needed for backends

Implements a vim.img._terminal module that supports writing to the tty tied to neovim as well as basic operations to manipulate the cursor, needed for backend implementations.

4. Implement vim.img.Image method for_each_chunk to streamline backend processing

Implements a method image:for_each_chunk for instances of vim.img.Image. This method streamlines chunked iteration of image bytes, which is important when working with ssh or tmux and a protocol that supports chunked image rendering such as iterm2 or kitty.

5. Implement iterm2 backend

Implements the iterm2 backend, supporting both iTerm 3.5+ support for multipart images, and falling back to older protocol that sends the entire image at once, which is needed for support on other terminals such as WezTerm.

6. Implement kitty backend

Implements the kitty graphics protocol as a backend, using kitty's chunked image rendering, which should work within tmux and ssh if we keep the chunks small enough.

7. Implement vim.img.protocol() to detect preferred graphics protocol

Implements vim.img.protocol() that can be used to detect the preferred graphics protocol.

This is a reverse-engineered copy of how timg implements graphics protocol support, and relies on a couple of terminal queries, hence we implement vim.img._terminal.query() and vim.img._terminal.graphics.detect() to support figuring out if the terminal supports iterm2, kitty, or sixel protocols and mirrors the logic from timg.

@chipsenkbeil
Copy link
Author

chipsenkbeil commented Nov 30, 2024

Heads up, I know there is formatting of commit messages needed and linting for preferences in Lua style guides.

The current code is me migrating over my working code from a private repo - not a fork of neovim - to be a pull request here. I'll work on updating the PR to be compliant, but wanted the code to be visible for comments.

In particular, I could use help in rewriting that parts of the PR that make use of Lua's io library - assuming we want to use a neovim equivalent - and to refactor parts of the code that could be improved. So looking for stronger critique, challenges, and suggestions 😄 This was an example-turned-PR, so not all of the code is high quality!


An example of doing this with the current PR:

image-example-pr

local file = vim.img.load({
    filename = "/Users/senkwich/Pictures/org-roam-logo.png",
})

vim.img.show(file, {
    pos = { row = 8, col = 8 }, 
    backend = "iterm2",
})

@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 3 times, most recently from 9964ad6 to 630f852 Compare November 30, 2024 03:00
@chipsenkbeil chipsenkbeil changed the title Implement image API for #30889 feat(img): add vim.img.protocol() to detect preferred graphics protocol Nov 30, 2024
@chipsenkbeil chipsenkbeil changed the title feat(img): add vim.img.protocol() to detect preferred graphics protocol Implement image API for #30889 Nov 30, 2024
@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 2 times, most recently from fe0d5a8 to 33ee581 Compare November 30, 2024 04:27
@chipsenkbeil chipsenkbeil changed the title Implement image API for #30889 feat(img): implement image API Nov 30, 2024
@chipsenkbeil chipsenkbeil changed the title feat(img): implement image API feat(img): implement image API for absolute positions Nov 30, 2024
@chipsenkbeil chipsenkbeil mentioned this pull request Nov 30, 2024
@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 4 times, most recently from babd349 to dc51bf3 Compare November 30, 2024 20:56
@chipsenkbeil
Copy link
Author

@justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data.

@kovidgoyal I'm assuming my understanding of pixel formats is correct in that if we fed in any other image format that was not PNG, using f=100 would not work, and we'd need to instead decode the base64 image data, figure out the format (i.e. bmp, jpeg, etc) and then extract a 24-bit RGB or a 32-bit RGBA set of data to feed in order for your protocol to work.

I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.

@kovidgoyal
Copy link

kovidgoyal commented Dec 1, 2024 via email

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this live in runtime/lua/vim/termcap.lua ? cc @gpanders

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for it to live wherever you'd like. It felt odd to have this live under img, and I was unaware of termcap.lua. Let me know what you'd like to do.

Also happy to break out this logic to a separate PR if we want to review and merge this logic separately. It'd need to be handled prior to the image work, though. Or it can live in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

termcap probably isn't the right name since that is specifically for terminal capabilities. But we can (and probably should) rename termcap to term or tty or something, and then we can move all of this there. The existing vim.termcap.query would become vim.tty.query, which still makes sense (we will need to keep the old termcap file around as a BWC stub for a bit).

---Iterates over the chunks of the image, invoking `f` per chunk.
---@param f fun(chunk:string, pos:integer, has_more:boolean)
---@param opts? {size?:integer}
function M:for_each_chunk(f, opts)
Copy link
Member

@justinmk justinmk Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:help dev-patterns

Suggested change
function M:for_each_chunk(f, opts)
function M:for_each_chunk(opts, f)

If you want to support optional opts, make it an "overload".

Copy link
Member

@justinmk justinmk Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return an iterator instead (:help iterator)? That is more idiomatic, and composes with vim.iter, ipairs(), etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinmk yeah, I can do that. Was unaware of neovim having an iterator, so it'll be nice to get a feel for how that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterators are a Lua concept, and can be as simple as returning a function (closure) which, when called, returns the next item.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:h for-in

@chipsenkbeil
Copy link
Author

Thanks for the feedback, folks. Been traveling for work last few weeks, and I may have some free time this weekend to start applying some of the recommendations.

@brianhuster
Copy link
Contributor

Just a drive by comment on my concern over the inclusion of the iterm2 protocol

I share this concern. The iTerm2 protocol is supported by only 2 terminals: iTerm2 and WezTerm. The latter also supports the kitty graphics protocol, meaning there is only one terminal for which supporting the iTerm image protocol would be useful.

Considering the additional complexity for supporting it, as well as the apparent feature gap mentioned here, I'd propose implementing support only for the kitty graphics protocol first.

Konsole supports iTerm2 protocol too

@schrmh
Copy link

schrmh commented Mar 4, 2025

Just a link to this comment pre-dating this pull request cause once again it was implied here more terminals support kitty than iTerm2 which does not seem to be the case:
#30889 (comment)
I count at least eight for iTerm2 there and wezterm supporting it would be news for me, but maybe I overlooked it.
Now, the relevancy of them is probably a different story.
In the meantime another new terminal emulator got a bit of a spike: ghostty. I think this supports iTerm2 and kitty.

@chipsenkbeil
Copy link
Author

Hey folks, heads up that I'll finally get around to reviewing and implementing some of the suggestions in the next couple of weeks. @justinmk where did we align on keeping/removing iTerm protocol support? And sixel support?

My perspective is that we figure out a good interface for new backends (like these) to be injected, but having them directly included here may be too much. I have iterm and kitty today, although seems like kitty has a lot of different nuances on how it's implemented.

And from the feedback @gpanders has given, we can reuse parts of the existing codebase for terminal queries and other logic, which is nice. Will have to retest everything once those changes are made.

@fredizzimo
Copy link
Contributor

I made a quick and dirty integration into Neovide here neovide/neovide#3039 to get some feel about the work done so far.

Here are some observations and comments:

  1. If we have a better way of adding new custom backends, then a GUI implementation in Neovim might not be necessary. Although GUI events might have the advantage of being totally in sync with the UI protocol.
  2. Since Neovim supports connecting multiple UIs to the same instance, I think the backend should be connected to the UI connections instead of globally, with the possibility of connecting different backends to different UIs. It probably won't work perfectly, since UIs have different grid sizes with different aspect ratios, but I think it would still be decent enough.
  3. It should be possible to hide/destroy images
  4. I echo the request for supporting different units. Pixel units will most likely be needed to align multiple images perfectly
  5. You should be able to specify either with or height, otherwise it's impossible to render a resized image with the correct aspect ratio, especially when using grid-based sizes
  6. I think a z-index is needed, and just like the Kitty protocol, making it possible to render both in front and behind the text
  7. I think the image data and rendered instances should be separated, otherwise you will pay both transfer costs and memory for displaying the same image multiple times
  8. I think ids for both the data and the instances are needed. I had to hack some in my implementation. The data id could perhaps be some kind of hash, to untie it from lua objects, so that the user never has to care.

Finally, given that this code, at least at the moment doesn't need to be in Neovim, I wonder if it would make sense to pull it out to a plugin for the time being? That way things can be iterated more quickly, it's easier for multiple people to contribute. It would also allow the API to be stabilized, and maybe get some real-world testing before it's integrated into Neovim (if it actually ever needs to be).

@justinmk
Copy link
Member

justinmk commented Mar 9, 2025

@justinmk where did we align on keeping/removing iTerm protocol support? And sixel support?

My perspective is that we figure out a good interface for new backends (like these) to be injected, but having them directly included here may be too much

yeah, other backends/providers can be added later. definitely more important to get the interface right and focus on simplicity here, first.

@justinmk
Copy link
Member

justinmk commented Mar 9, 2025

@fredizzimo those are great notes, would you mind creating a new issue to track that as "phase 2"?

7. I think the image data and rendered instances should be separated, otherwise you will pay both transfer costs and memory for displaying the same image multiple times

8. I think ids for both the data and the instances are needed. I had to hack some in my implementation. The data id could perhaps be some kind of hash, to untie it from lua objects, so that the user never has to care.

All of your notes sound like non-blockers except these two ⬆️ .

Finally, given that this code, at least at the moment doesn't need to be in Neovim, I wonder if it would make sense to pull it out to a plugin for the time being?

No, plugins already exist, and now we're taking the next step. We can mark this as experimental (and prefix the interface with underscore, vim.ui._img), if we think there are one-way decisions that haven't been resolved.

@fredizzimo

This comment was marked as resolved.

@justinmk

This comment was marked as resolved.

@fredizzimo
Copy link
Contributor

@justinmk, thank you!

I was thinking that the upcoming 0.11 release was the reason, but a simple short response saying that would have cleared up a lot of confusion. I have been doing a lot of work with those, and was pushing for 0.11 myself, especially since I started working on them in the autumn already, but that's probably out of question now.

I will create an issue after this PR has been iterated a bit more, since I don't know how big changes @chipsenkbeil is planning to make, and what remains after that.

@chipsenkbeil
Copy link
Author

@fredizzimo happy to collaborate with you on this. 😄 I wanted to do this in steps (each its own PR?)

  1. A static image you can place within neovim (supporting just PNG or whatever works by default in terminals)
  2. Support deleting the image placed within neovim
  3. Dynamically resize/move an image
  4. Injection of an image into a buffer within neovim (tracking its movement properly, possibly through the use of extmarks); this involves reflowing text around the image (in my mind) versus just covering it up or placing behind it. Think to the examples of images within markdown/org documents but with images as first-class citizens
  5. Multiple image type support (bmp, jpg at least) within neovim (seems like the common way is to farm out to imagemagick, which I'm not a fan of, but seems like what we'd have to do first)
  6. Video/gif support (there are reasons why this would be neat, but not a dealbreaker if we want to exclude this from neovim core)

Right now, this PR represents step 1.

Not saying all of these will make it in even as experiments, just thoughts on the priorities. This also doesn't discuss which terminals to support with which protocols underneath, so could imagine sliding those in at some point, too. So plenty that could be done inbetween or in place of the above.

I've gotten very busy with work, which is why this was delayed. Even having to work today while I'm on vacation, so haven't been able to come back to this PR yet. Any thoughts on how you want to tackle this in a collaborative setting? Probably starting by opening an issue or something to track additional items per @justinmk suggestion would be good.

@justinmk
Copy link
Member

justinmk commented Mar 10, 2025

@chipsenkbeil in item 1 of your list, should items 7/8 from fredizzimo's list be part of that (this PR)?

@chipsenkbeil
Copy link
Author

@chipsenkbeil in item 1 of your list, should items 7/8 from fredizzimo's list be part of that (this PR)?

sure, I can take a stab at that as part of this PR.

Implement `vim.img.load()` to load from a file or wrap base64 encoded bytes as a `vim.img.Image` instance.
Implement the skeleton of `vim.img.show()` without any backend implemented.
Implement `vim.img._terminal` module that supports writing to the tty tied to neovim as well as basic operations to manipulate the cursor, needed for backend implementations.
Add `for_each_chunk()` for instances of `vim.img.Image`. This method streamlines chunked iteration of image bytes, which is important when working with ssh or tmux and a protocol that supports chunked image rendering such as `iterm2` or `kitty`.
Implement the iterm2 backend, supporting both iTerm 3.5+ support for multipart images, and falling back to older protocol that sends the entire image at once, which is needed for support on other terminals such as WezTerm.
Implement the kitty graphics protocol as a backend, using kitty's chunked image rendering, which should work within tmux and ssh if we keep the chunks small enough.
…ocol

Implement `vim.img.protocol()` that can be used to detect the preferred graphics protocol.

This is a reverse-engineered copy of how `timg` implements graphics protocol support, and relies on a couple of terminal queries, hence we implement `vim.img._terminal.query()` and `vim.img._terminal.graphics.detect()` to support figuring out if the terminal supports iterm2, kitty, or sixel protocols and mirrors the logic from `timg`.
@chipsenkbeil
Copy link
Author

Hey, look at that! Months later, I finally am getting around to applying the feedback from @justinmk and @gpanders while at the airport 😄

Still reading through all of the comments, but first cleaning things up. Going to see if I can entirely remove the terminal code in favor of termcaps, although I suspect I still need a helper function to move the cursor to properly place the image at a specific location.

@fredizzimo @schrmh @brianhuster @rockorager I haven't forgotten all of your comments, either! Once the code is cleaned up more and we get as simplistic as I can make it, we can revisit if we just want kitty (seems like a push for yes for now?).

I know that the iterm2 protocol is really limited, so I don't think we'd be able to get it to support the same functionality as kitty. It's also inconsistent in behavior from kitty on absolute image positioning as kitty renders an image over everything else whereas iterm2 scrolls the image out of view (and clears it) as you move lines around.

For sixel, this seems a bit more difficult for me to tackle in this first PR, so I'd propose that we instead keep things simple by me deleting iterm2 support, just focusing on kitty, and then new providers can be added in separate PRs as this evolves. And this SHOULD evolve quite a lot, btw.

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants