-
-
Notifications
You must be signed in to change notification settings - Fork 6.1k
feat(img): implement rudimentary image API #31399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Heads up, I know there is formatting of commit messages needed and linting for preferences in Lua style guides. The current code is me migrating over my working code from a private repo - not a fork of neovim - to be a pull request here. I'll work on updating the PR to be compliant, but wanted the code to be visible for comments. In particular, I could use help in rewriting that parts of the PR that make use of Lua's io library - assuming we want to use a neovim equivalent - and to refactor parts of the code that could be improved. So looking for stronger critique, challenges, and suggestions 😄 This was an example-turned-PR, so not all of the code is high quality! An example of doing this with the current PR: local file = vim.img.load({
filename = "/Users/senkwich/Pictures/org-roam-logo.png",
})
vim.img.show(file, {
pos = { row = 8, col = 8 },
backend = "iterm2",
}) |
9964ad6
to
630f852
Compare
vim.img.protocol()
to detect preferred graphics protocol
vim.img.protocol()
to detect preferred graphics protocolfe0d5a8
to
33ee581
Compare
babd349
to
dc51bf3
Compare
@justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data. @kovidgoyal I'm assuming my understanding of pixel formats is correct in that if we fed in any other image format that was not PNG, using I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data. |
On Sat, Nov 30, 2024 at 01:29:25PM -0800, Chip Senkbeil wrote:
@justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data.
@kovidgoyal I'm assuming my understanding of pixel formats is correct in that if we fed in any other image format that was not PNG, using `f=100` would not work, and we'd need to instead decode the base64 image data, figure out the format (i.e. bmp, jpeg, etc) and then extract a 24-bit RGB or a 32-bit RGBA set of data to feed in order for your protocol to work.
Yes, correct. You can use either imagemagick or the statically compiled
kitten binary that comes as part of kitty to do this.
I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.
sixel supports nothing, you need to convert every image format to the
sixel format and transmit that.
|
dc51bf3
to
ce818a9
Compare
…k vim.o.imgprovider change unloads old provider
Hm, it might not be possible to perfectly pixel align the images even with the kitty protocol without manually adding transparent borders using ImageMagic or similar You can set the offset with 'X', but as far as I can see the right border is always aligned to the column. I wouldn't want to limit Neovim support to just that though. And indeed, your current protocol, does not seem to forbid it. I guess that even the kitty protocol needs to rely on Neovim/ImageMagick doing some resizing, Neovide can implement it using the GPU though. |
On Tue, May 13, 2025 at 05:54:57AM -0700, fredizzimo wrote:
fredizzimo left a comment (neovim/neovim#31399)
Hm, it might not be possible to perfectly pixel align the images even with the kitty protocol without manually adding transparent borders using ImageMagic or similar
You can set the offset with 'X', but as far as I can see the right border is always aligned to the column. I wouldn't want to limit Neovim support to just that though. And indeed, your current protocol, does not seem to forbid it.
No the right border is not aligned to anything, you specify the top left
cell and the image is simply rendered from there, unless you use the c
and r control codes to render it in a specific area.
|
I do specify 'c' to give it the width I want, but it only takes an integer and seem to right align in my tests. So, the option is to resize the image before displaying, or add a border, to let Kitty resize. A more concrete example is that I want to display an image starting from column 2.5 and end at column 37.75 |
On Tue, May 13, 2025 at 06:10:17AM -0700, fredizzimo wrote:
fredizzimo left a comment (neovim/neovim#31399)
> No the right border is not aligned to anything, you specify the top left cell and the image is simply rendered from there, unless you use the c and r control codes to render it in a specific area.
I do specify 'c' to give it the width I want, but it only takes an integer and seem to right align in my tests. So, the option is to resize the image before displaying, or add a border, to let Kitty resize.
A more concrete example is that I want to display an image starting from column 2.5 and end at column 37.75
Yes, if you want to display images *at a specific pixel size different from
the images actual size* then you need to resize the image yourself.
That's by design. Resizing images is not a trivial task, the algorithm
you select for it can significantly alter the end result, whether you do
it on the GPU or the CPU. Thus the protocol has only limited options for
resizing.
|
In that case it's probably the easiest to unify at least all of the terminal implementations to always add a transparent border and resize the image to fit to an exact cell width and height multipliers. As mentioned, Neovide could do that by itself for simple filtering, but the protocol could also be simplified to only deal with cell coordinates on the provider side. It also has the advantage that more advanced algorithms could be used. Linear upsampling and lancos-3 downsampling, are good compromises for default resizing strategies https://entropymine.com/resamplescope/notes/browsers/. And I don't think Lancos-3 can be done in real-time on the GPU anyway, so we are not losing too much by delegating it to some library or another process. What do you think? Edit: Probably not to be done in this PR |
So, if I get this right
(*) - Limited pixel-based sizing, affected by the position So, Sixel will need custom resizing and conversion anyway. And the rest would have to be limited to a common factor of only supporting grid based positioning, and grid based resizing, where one dimension can be left out to have it scale automatically. That is an option for the core protocol though, loading custom formats, resizing and adding transparent borders for pixel based positioning could be delegated to plugins as long as Neovim provides accurate grid dimensions, and helper functions to do the calculations. And I think that's what's essentially implemented currently, if the pixel support and crop is removed from the opts table. |
|
||
if self.unit == 'cell' then | ||
local px_x, px_y = screen.cells_to_pixels(self.x, self.y) | ||
local px_width, px_height = screen.cells_to_pixels(self.width, self.height) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about terminal emulators, but in order to round to perfect cell positions without gaps when displaying in Neovide, the formula for calculating the width needs to be the following. Where
cells_to_pixels
rounds instead of flooring
x1, y1 = screen.cells_to_pixels(self.x, self.y)
x2, y2 = screen.cells_to_pixels(self.x + self.width, self.y + self.height)
width = x2 - x1
height = y2 - y1
This also means that the image can be one pixel smaller or larger depending on where it's displayed on the screen.
Neovide can't easily be changed to do something else, since the coordinates has to match with how the GPU draws anti-aliased lines.
The other option is to use fractional coordinates all the way and draw the image at fractional coordinates with a fractional width. But that won't align nicely with our box-drawing characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no proof, but a quick experimentation showed that
Alacritty, Wezterm, Konsole, Foot and Kitty all use integer grid sizes, and render FiraCode 10.5 with a slightly wider spacing than Neovide, which in turn renders it with the same spacing as editors like Kate and LibreOffice.
So those terminals should all be unaffected by the change. But I wonder if there are any terminals that renders fractional widths in a different way than Neovide. In that case the calculation code might have to be moved to the provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fredizzimo acknowledging I've seen your messages. Heading back on a flight stateside, so I'll be out for a bit from a mix of jetlag and travel. Rounding vs flooring is fine with me for the helper tools.
@justinmk had made the comment that it was confusing when something needed pixels vs cells, and the table you showed earlier also exemplified that different protocols want them in different units. To me, this means that the neovim interface should provide an abstraction where the user can provide units in either form (when it makes sense) and does the calculations. If that means that each provider needs to implement their own calculations, I think it's an okay tradeoff. Neovim can provide a stock implementation that its built-in providers use, unless the logic ends up being different for each one. Any concerns there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was thinking that Neovim at the core would only support cells units, since none of the terminals can perfectly support pixel units. They all need actual image manipulating, like resizing and adding borders.
But Neovim can still provide an API to query the pixel size, and even do calculations based on it. An advanced plugin that takes use of this would first manipulate the image itself before calling the API.
It should also be noted, that for example snacks.nvim only uses grid coordinates, so it's not super limiting. The limit comes when you want to build advanced UIs using multiple images for example, but even then, it might not be a huge problem if you restrict yourself to the grid.
Note that the user of the API still needs some pixel information, to be able to determine for example how many rows that needs to be reserved if displaying the image taking exactly x columns, for example. But when displaying, they can just set the columns to x, and the low level protocol (Kitty, iterm2, Neovide) will make sure that the correct number of pixels are used for displaying, and the last row will likely not be fully covered by the image.
For example, if the cell size is 9x13 and the image is 100x200, and we want to display it over 10 columns, then our API tells that 14 rows are needed, and the image will be displayed as 90x180, leaving two pixels unused in the last row. An advanced plugin might add a 1 pixel border at the top and bottom of the image itself to make it 90x182 and display it centred.
Edit: I wish I knew about the Kitty protocol limit before going this far. I thought that it supported exact pixel positions and sizes, and that we could get more complete support by limiting ourselves to that.
…ger delay and ignore while drawing
@justinmk @gpanders @j4james okay, I think I'm hitting a wall and need your help on this piece. I'm illustrating the problems I'm facing in two videos: Flicker on sixel and iterm2Below is a video of iTerm2, WezTerm, Ghostty, and Windows Terminal displaying images.
example-of-images-neovim.mp4The sixel and iterm2 are a bit rough because - unlike kitty - I need to manually clear the screen and redraw all images whenever something changes. This could be scrolling, could be typing over the image, could be swapping buffers within a window where an image is present, etc. I have to add a delay from when I trigger a screen redraw (via
Broken Escape Codesgarbage-image-output-neovim.mp4@gpanders @j4james I'm still getting parts of my sixel and iterm2 escape codes showing up (with images broken) every so often, and I think this is because I'm fighting neovim to write to the same tty, but I'm not fully sure. I haven't noticed this problem on Windows with the Windows terminal. When I trigger a full refresh via Is there any way to control this more effectively? I'm currently batching all of my escape codes together to send out at once with the above Neovim eats terminal responses@gpanders you may have missed my previous callout on this. Seems like Without being able to get a response for one or both of those, Windows pixel sizes for individual cells can't be calculated. At least, based on what @j4james has mentioned in a recent comment. |
If those were to be supported, the integration probably needs to be at a deeper level, in tui.c, with the data transferred there over the UI protocol. That part of the code knows exactly what has changed and needs to be redrawn. That's actually probably the case even for Kitty if you connect to a remote instance. But it also makes the whole feature quite a bit more complex. It might also be that it's easier to forget the absolute positioning support, and only support something similar to unicode placeholders. That way Neovim will automatically resolve all overlaps and partially visible images. And then you "just" need to refresh the cells as they are transferred to the tui.c code. But you still also need a way to transfer the actual image data there somehow. |
@chipsenkbeil I don't know anything about nvim scripting, so I'm just guessing here, but I noticed that And assuming that does work, you might then be able to wrap both the call to
All of that is feasible in theory, but I know at least on Windows Terminal it isn't currently supported in the alt buffer (although I'd be happy to see if we could extend that support if there was a demand for it). There are a few other terminals that support paging, and which can potentially be used from within the alt buffer, but there are some idiosyncrasies in their implementation which makes me doubt whether they could support this use case. The bottom line is this is probably something best left for later, if there is need for further optimization (particularly for clipping). But I think synchronized output is still the best short term solution for the flickering issue, and it's likely to be far more widely supported.
Note that for Windows Terminal in particular, you could just fallback to a hardcoded cell size of 10x20 if you can't determine the cell size through any other means. That won't work on most other terminals though. |
… to hide cursor during image display; refactor parts of providers
Specifically, redraw in more cases, while introducing the concept of only clearing when at least one placement is marked as needing the screen to be cleared
…hen neovim exits to clear images
I got the synchronized output working. Reading through neovim's source, turns out it uses that mode already, but it can be disabled with an option, Flicker is gone, and I tested as well on nightly windows terminal and it works great for sixel! @j4james is there any alternative I can pursue outside of CSI for windows pixel dimensions, cell or text area? If not, that'll need to be added to neovim core, I think. |
@chipsenkbeil Not sure if you missed this comment, but I mentioned above you can just hardcode a cell size of 10x20 and it'll work on Windows Terminal. I'm assuming you already know the number of columns and rows, so if you need the window pixel dimensions you would just multiple the columns by 10 and the rows by 20 (it's a kind of virtual size, but the images are automatically scaled, so that's all you need to know). That won't work for most other terminals, though, but it's at least a temporary solution until neovim core can support the |
Oh, I misread that and thought that was a fallback in case I couldn't get the size for Windows Terminal. So Windows Terminal is always 10x20? I'm checking for the presence of Anyway, thanks for all of your help on the sixel, escape codes, and Windows. Really sped up the development on this side :) |
@chipsenkbeil I wouldn't recommend |
…or Windows, and round instead of floor when converting between pixels and cells
…erm2/sixel not clearing screen when removing last image, update iterm2 to support cropping using ImageMagick
-- Assume that for Windows the 10x20 used by Windows Terminal | ||
-- is a better choice for a default | ||
if vim.fn.has('win32') == 1 then | ||
cell_width = 10 | ||
cell_height = 20 | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is just testing whether vim itself is running on Windows, which isn't particularly meaningful. If you're using a Windows terminal you're just as likely (if not more likely) to be running vim in WSL. or when connected to a Linux server. And you'd assumedly also get a false positive with a Linux terminal remoting onto a Windows server.
I thought this would be an easy win, but if it's a problem to make 10x20 the universal default, it's probably best to just drop it for now, and wait for queries to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can set the whole thing to 10x20. When I was browsing snacks.nvim
, one of a handful of plugins that wrote earlier image logic for neovim, they defaulted to 9x18, hence why I picked that first. But I have no preference either way on what the default is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why anyone would choose 9x18 as a default (there may well be a good reason). I just know that 10x20 is the cell size of the original VT240 and VT330/340 terminals, so it's what real terminal emulators need to emulate if they want to be able to run software that targetted those devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is windows terminal hard coded to that? Do you maybe have a link to resources about how windows terminal does things? I am obviously also interested in that because of #32408 , but don't have access to a windows machine at the moment to test things. I should really just spin it up in a VM though anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, at least for now Windows Terminal is hardcoded to 10x20, but it's possible there might come a time where they make that a configurable option. I personally don't think that's a good idea, and if it were up to me I wouldn't allow it, but it's not my decision to make.
So again I would recommend the use of size query sequences if at all possible. If you want to support third-party terminals like WezTerm, Contour, or Mintty, then that's essential. But falling back to 10x20 is better than nothing, because that should at least work on some terminals.
…l pixel size to 10x20
…nterface for provider implementers
Task List
Out of Scope for this PR
OLDER INFORMATION
Alright, let's try this again without the massive amount of pull requests. 😄 Each commit here should be a standalone change, and I'll document the processes here.
This is geared towards tackling #30889, specifically on supporting
Things for later PRs would include
Breakdown of commits
1. Loading an image into memory
Implements
vim.img.load()
to load from a file or wrap base64 encoded bytes as avim.img.Image
instance.2. Implement skeleton of vim.img.show() without backends
Implements the skeleton of
vim.img.show()
with any backend implemented.3. Implement vim.img._terminal to support basic functionality needed for backends
Implements a
vim.img._terminal
module that supports writing to the tty tied to neovim as well as basic operations to manipulate the cursor, needed for backend implementations.4. Implement
vim.img.Image
methodfor_each_chunk
to streamline backend processingImplements a method
image:for_each_chunk
for instances ofvim.img.Image
. This method streamlines chunked iteration of image bytes, which is important when working with ssh or tmux and a protocol that supports chunked image rendering such asiterm2
orkitty
.5. Implement iterm2 backend
Implements the iterm2 backend, supporting both iTerm 3.5+ support for multipart images, and falling back to older protocol that sends the entire image at once, which is needed for support on other terminals such as WezTerm.
6. Implement kitty backend
Implements the kitty graphics protocol as a backend, using kitty's chunked image rendering, which should work within tmux and ssh if we keep the chunks small enough.
7. Implement
vim.img.protocol()
to detect preferred graphics protocolImplements
vim.img.protocol()
that can be used to detect the preferred graphics protocol.This is a reverse-engineered copy of how
timg
implements graphics protocol support, and relies on a couple of terminal queries, hence we implementvim.img._terminal.query()
andvim.img._terminal.graphics.detect()
to support figuring out if the terminal supports iterm2, kitty, or sixel protocols and mirrors the logic fromtimg
.