feat(img): implement image API for absolute positions #31399

chipsenkbeil · 2024-11-29T22:01:21Z

Task List

A static image you can place within neovim (supporting just PNG or whatever works by default in terminals)

-- Supports loading PNG images into memory
local img = vim.ui.img.load("/path/to/img.png")

-- Supports lazy-loading image, deferring to a provider
local img = vim.ui.img.new({ filename = "/path/to/img.png" })

-- Supports specifying an image and explicitly providing the data
local img = vim.ui.img.new({ data = "...", filename = "/path/to/img.png" })

-- Once created, can be shown, returning an id
-- tied to the displayed image
local id = img:show() -- Places in top-left of editor with default size
local id = img:show({ pos = { x = 4, y = 8 } })
local id = img:show({ relative = "cursor" })

Support deleting the image placed within neovim

local img = vim.ui.img.new({ filename = "/path/to/img.png" })
local id = img:show()

-- Supports hiding image by the id returned from displaying it
img:hide(id)

-- Supports hiding all places where the image was displayed
img:hide()

Dynamically resize/move an image

local img = vim.ui.img.new({ filename = "/path/to/img.png" })
local id = img:show({ pos = { x = 1, y = 2, unit = "cell" } })

-- Supports updating a displayed image with a new position
img:update(id, { pos = { x = 5, y = 6, unit = "cell" } })

-- Supports resizing a displayed image
img:update(id, { size = { width = 10, height = 5, unit = "cell" } })

Abstraction for 3rd parties to support images in neovim

-- Providers implement a small API to support showing and hiding images
vim.ui.img.providers['neovide'] = vim.ui.img.providers.new({
    ---@param img vim.ui.Image image data container to display
    ---@param opts? vim.ui.img.Opts specification of how to display the image
    ---@return integer id unique identifier connected to the displayed image (not vim.ui.Image)
    show = function(img, opts)
        -- Implement here
    end,

    ---@param ids integer[] list of displayed image ids to hide
    hide = function(ids)
        -- Implement here
    end,
})

-- Load an image to display, nothing different here
local img = vim.ui.img.load("/path/to/img.png")

-- Use the custom provider either by passing it by name
-- or directly passing in the provider instance itself
local id = img:show({ provider = 'neovide' })
img:hide(id, { provider = vim.ui.img.providers['neovide'] })

Injection of an image into a buffer within neovim, tracking its movement properly (possibly through the use of extmarks). In my mind, this involves reflowing text around the image rather than just covering the text up or placing the image behind it. Think of images within markdown/org documents, but with images as first-class citizens.

Out of Scope for this PR

Multiple image type support (at least bmp and jpg) within neovim (the common approach seems to be farming out to imagemagick, which I'm not a fan of, but that seems like what we'd have to do first)
Video/gif support (there are reasons why this would be neat, but not a dealbreaker if we want to exclude this from neovim core)

OLDER INFORMATION

Alright, let's try this again without the massive amount of pull requests. 😄 Each commit here should be a standalone change, and I'll document the processes here.

This is geared towards tackling #30889, specifically supporting:

Ability to load an image into memory
Display an image with absolute coordinates
Support different backends for rendering images such as iterm2 and kitty
Smartly detect the type(s) of backend graphics supported

Things for later PRs would include

Inline image support (attach to a buffer, reflow text around it)
Alternative image formats (I think PNG is what is supported right now?)
Video feeds (more complex, more limited backend support)

Breakdown of commits

1. Loading an image into memory

Implements vim.img.load() to load from a file or wrap base64-encoded bytes as a vim.img.Image instance.
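Since only PNG is in scope, a loader could cheaply validate the data and learn the image's pixel dimensions (useful later for cell/pixel conversions) from the fixed-position IHDR chunk. A minimal sketch in plain Lua, assuming the decoded file bytes are already in a string; `png_info` is a hypothetical helper, not part of this PR.

```lua
-- Hypothetical helper: validate a PNG signature and read its dimensions.
-- PNG layout: 8-byte signature, then the IHDR chunk
-- (4-byte length, "IHDR", 4-byte width, 4-byte height, ...), all big-endian.
local PNG_SIGNATURE = "\137PNG\r\n\026\n"

--- Decode a 4-byte big-endian unsigned integer starting at `pos`.
local function u32be(s, pos)
  local a, b, c, d = s:byte(pos, pos + 3)
  return ((a * 256 + b) * 256 + c) * 256 + d
end

--- @param bytes string raw (not base64) PNG file contents
--- @return integer? width, integer? height, string? err
local function png_info(bytes)
  if #bytes < 24 or bytes:sub(1, 8) ~= PNG_SIGNATURE then
    return nil, nil, "not a PNG file"
  end
  if bytes:sub(13, 16) ~= "IHDR" then
    return nil, nil, "missing IHDR chunk"
  end
  return u32be(bytes, 17), u32be(bytes, 21)
end
```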

2. Implement skeleton of vim.img.show() without backends

Implements the skeleton of vim.img.show() without any backend implemented.

3. Implement vim.img._terminal to support basic functionality needed for backends

Implements a vim.img._terminal module that supports writing to the tty tied to neovim as well as basic operations to manipulate the cursor, needed for backend implementations.

4. Implement `vim.img.Image` method `for_each_chunk` to streamline backend processing

Implements a method image:for_each_chunk for instances of vim.img.Image. This method streamlines chunked iteration of image bytes, which is important when working with ssh or tmux and a protocol that supports chunked image rendering such as iterm2 or kitty.
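A minimal sketch of what chunked iteration could look like in plain Lua. The name `for_each_chunk` comes from the PR, but this standalone signature (data, chunk size, and a callback receiving the chunk, its 1-based index, and whether it is the last one) is an assumption for illustration.

```lua
--- Sketch of chunked iteration over (base64) image data.
--- Assumed signature: calls `fn(chunk, index, is_last)` for each slice of at
--- most `size` bytes; `index` starts at 1. Empty data yields one empty chunk.
--- @param data string
--- @param size integer
--- @param fn fun(chunk: string, index: integer, is_last: boolean)
local function for_each_chunk(data, size, fn)
  assert(size > 0, "chunk size must be positive")
  local count = math.max(1, math.ceil(#data / size))
  for i = 1, count do
    local first = (i - 1) * size + 1
    fn(data:sub(first, first + size - 1), i, i == count)
  end
end
```

Passing `is_last` lets a backend set its "more data follows" flag without looking ahead itself.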

5. Implement iterm2 backend

Implements the iterm2 backend, supporting both the multipart image protocol of iTerm2 3.5+ and a fallback to the older protocol that sends the entire image at once, which is needed for other terminals such as WezTerm.
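For reference, the two iterm2 forms look roughly like this. A sketch in plain Lua based on my reading of iTerm2's inline image documentation (`File=` for the single-shot form; `MultipartFile=`, `FilePart=`, and `FileEnd` for the 3.5+ multipart form). The helper names are hypothetical and only the `inline` and `size` arguments are shown; `size` is the decoded byte count, derived here from the base64 length and its padding.

```lua
local ESC, BEL = "\027", "\007"

--- Decoded byte length of a base64 string (accounts for '=' padding).
local function decoded_size(b64)
  local pad = select(2, b64:sub(-2):gsub("=", ""))
  return math.floor(#b64 / 4) * 3 - pad
end

--- Single-shot form: the whole image in one OSC 1337 sequence.
local function iterm2_single(b64)
  return string.format("%s]1337;File=inline=1;size=%d:%s%s", ESC, decoded_size(b64), b64, BEL)
end

--- Multipart form (iTerm2 3.5+): header, one FilePart per chunk, then FileEnd.
local function iterm2_multipart(b64, chunk_size)
  local out = { string.format("%s]1337;MultipartFile=inline=1;size=%d%s", ESC, decoded_size(b64), BEL) }
  for i = 1, #b64, chunk_size do
    out[#out + 1] = string.format("%s]1337;FilePart=%s%s", ESC, b64:sub(i, i + chunk_size - 1), BEL)
  end
  out[#out + 1] = ESC .. "]1337;FileEnd" .. BEL
  return table.concat(out)
end
```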

6. Implement kitty backend

Implements the kitty graphics protocol as a backend, using kitty's chunked image rendering, which should work within tmux and ssh if we keep the chunks small enough.
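A sketch of kitty's chunked transmission in plain Lua. The control keys used (`a=T` transmit and display, `f=100` PNG, `q=2` suppress replies, `m=1`/`m=0` for "more chunks follow"/"last chunk") and the rule that every chunk except the last is a multiple of 4 bytes and at most 4096 bytes come from the kitty graphics protocol documentation; the helper itself is hypothetical.

```lua
local ESC = "\027"
local ST = ESC .. "\\"

--- Build kitty graphics escape sequences for a base64-encoded PNG.
--- The first chunk carries the control keys; later chunks carry only `m`.
--- @param b64 string base64-encoded PNG data
--- @param chunk_size integer multiple of 4, at most 4096
--- @return string[] escape sequences to write to the terminal in order
local function kitty_transmit(b64, chunk_size)
  assert(chunk_size % 4 == 0 and chunk_size <= 4096, "invalid chunk size")
  local seqs = {}
  local i = 1
  repeat
    local chunk = b64:sub(i, i + chunk_size - 1)
    i = i + chunk_size
    local more = i <= #b64 and 1 or 0
    local keys = (#seqs == 0) and ("a=T,f=100,q=2,m=" .. more) or ("m=" .. more)
    seqs[#seqs + 1] = ESC .. "_G" .. keys .. ";" .. chunk .. ST
  until more == 0
  return seqs
end
```

Smaller chunks mean more sequences, but each one is short enough to pass through tmux and ssh without being split.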

7. Implement `vim.img.protocol()` to detect preferred graphics protocol

Implements vim.img.protocol() that can be used to detect the preferred graphics protocol.

This is a reverse-engineered copy of how timg implements graphics protocol support. Because that relies on a couple of terminal queries, this commit also implements vim.img._terminal.query() and vim.img._terminal.graphics.detect(), which mirror timg's logic to figure out whether the terminal supports the iterm2, kitty, or sixel protocols.
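The decision step of such detection can be illustrated with a hypothetical classifier in plain Lua. This is not a transcription of timg or of vim.img._terminal.graphics.detect(): which queries are sent, which replies are matched, and the preference order (kitty, then iterm2, then sixel) are all assumptions. It looks for the kitty graphics query reply (`ESC _G i=<id>;OK ESC \`), an XTVERSION reply (`ESC P >| name version ESC \`) naming iTerm2, and a primary device attributes reply in which parameter 4 advertises sixel.

```lua
--- Hypothetical classifier over raw terminal replies collected after sending
--- a kitty graphics query, XTVERSION (CSI > q), and primary DA (CSI c).
--- @param replies string concatenated raw replies
--- @return "kitty"|"iterm2"|"sixel"|nil
local function detect_protocol(replies)
  -- kitty answers its graphics query with "ESC _G i=<id>;OK ESC \"
  if replies:find("\027_Gi=%d+;OK") then
    return "kitty"
  end
  -- XTVERSION reply: "ESC P >| <name version> ESC \"
  local version = replies:match("\027P>|([^\027]*)\027\\")
  if version and version:find("iTerm2", 1, true) then
    return "iterm2"
  end
  -- Primary DA reply: "ESC [ ? p1 ; p2 ; ... c"; parameter 4 means sixel
  local da = replies:match("\027%[%?([%d;]*)c")
  if da then
    for param in da:gmatch("%d+") do
      if param == "4" then
        return "sixel"
      end
    end
  end
  return nil
end
```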

chipsenkbeil · 2024-11-30T00:42:39Z

Heads up: I know the commit messages still need formatting and the Lua code still needs linting against the style guide's preferences.

The current code is me migrating over my working code from a private repo - not a fork of neovim - to be a pull request here. I'll work on updating the PR to be compliant, but wanted the code to be visible for comments.

In particular, I could use help rewriting the parts of the PR that use Lua's io library (assuming we want a neovim equivalent) and refactoring parts of the code that could be improved. So I'm looking for stronger critique, challenges, and suggestions 😄 This was an example-turned-PR, so not all of the code is high quality!

An example of doing this with the current PR:

local file = vim.img.load({
    filename = "/Users/senkwich/Pictures/org-roam-logo.png",
})

vim.img.show(file, {
    pos = { row = 8, col = 8 },
    backend = "iterm2",
})


chipsenkbeil · 2024-11-30T21:29:04Z

@justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data.

@kovidgoyal I'm assuming my understanding of pixel formats is correct: if we fed in any image format other than PNG, using f=100 would not work, and we'd instead need to decode the base64 image data, figure out the format (e.g. bmp or jpeg), and then extract 24-bit RGB or 32-bit RGBA data to feed in for your protocol to work.

I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.

kovidgoyal · 2024-12-01T03:15:39Z

> On Sat, Nov 30, 2024 at 01:29:25PM -0800, Chip Senkbeil wrote: ... if we fed in any other image format that was not PNG, using `f=100` would not work, and we'd need to instead decode the base64 image data, figure out the format (i.e. bmp, jpeg, etc) and then extract a 24-bit RGB or a 32-bit RGBA set of data to feed in order for your protocol to work.

Yes, correct. You can use either imagemagick or the statically compiled kitten binary that comes as part of kitty to do this.

> I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.

sixel supports nothing, you need to convert every image format to the sixel format and transmit that.
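Kovid confirms above that non-PNG images must be decoded to raw RGB or RGBA before kitty can display them. Once some decoder (imagemagick, the kitten binary, or anything else; out of scope for this PR) has produced raw pixels, the kitty side mostly reduces to different control keys. A hypothetical helper in plain Lua; per the kitty graphics protocol documentation, `f=24` is RGB, `f=32` is RGBA, and raw formats need the pixel width (`s`) and height (`v`), which PNG (`f=100`) does not.

```lua
--- Hypothetical helper: control keys for transmitting raw pixels to kitty.
--- f=24 is RGB (3 bytes/pixel), f=32 is RGBA (4 bytes/pixel).
--- @param pixels string decoded (not base64) pixel bytes
--- @param width integer
--- @param height integer
--- @param has_alpha boolean
--- @return string control keys for the first chunk
local function kitty_raw_keys(pixels, width, height, has_alpha)
  local fmt, bpp = 24, 3
  if has_alpha then
    fmt, bpp = 32, 4
  end
  local expected = width * height * bpp
  assert(#pixels == expected, string.format("expected %d bytes, got %d", expected, #pixels))
  return string.format("a=T,f=%d,s=%d,v=%d", fmt, width, height)
end
```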


… region

…er()

chipsenkbeil · 2025-04-30T21:02:12Z

> Just a drive-by comment on my concern over the inclusion of the iterm2 protocol. Disclaimer: I don't have any experience implementing it, but I have been through the protocol.
>
> So...the kitty graphics protocol lets you transmit an image and very finely control its placement, including a clip region. By specifying a clip region, it is trivial to "scroll" an image partially off the screen: you can specify the horizontal or vertical offset (in pixels) to clip the image.
>
> Without this clip capability, it seems that neovim would need to have an image processing library as well to internally clip images for display? What would the plan be for an image which gets partially scrolled?

Commenting here that I'm fine and most likely moving forward with removing iterm2 and just using kitty. I was already aware of limitations in iterm2 with cropping. My first thought is to farm externally to a process like image magick to crop, which I "think" can be done without creating a temporary image. So if we ever revisit supporting iterm2, that would be the approach I'd take.
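To make the clip-region point concrete, this is roughly what a clipped placement looks like with kitty. A hypothetical helper in plain Lua: it assumes the image was already transmitted with id `image_id` and that the cursor has already been moved to the cell where the visible part should start. The keys (`a=p` to put an existing image, and `x`, `y`, `w`, `h` selecting the source rectangle in pixels) come from the kitty graphics protocol documentation.

```lua
--- Hypothetical helper: build a kitty "put" (a=p) command that shows only the
--- part of an already transmitted image still on screen after `hidden_rows`
--- terminal rows have scrolled off the top.
--- @param image_id integer
--- @param img_w integer image width in pixels
--- @param img_h integer image height in pixels
--- @param hidden_rows integer rows scrolled past the top edge
--- @param cell_h integer cell height in pixels
--- @return string? escape sequence, or nil if the image is fully hidden
local function kitty_put_clipped(image_id, img_w, img_h, hidden_rows, cell_h)
  local y = hidden_rows * cell_h -- pixels cut off the top of the source image
  if y >= img_h then
    return nil
  end
  return string.format("\027_Ga=p,i=%d,x=0,y=%d,w=%d,h=%d,q=2\027\\", image_id, y, img_w, img_h - y)
end
```

No image processing happens on the neovim side; the terminal does the cropping from the already transmitted data.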

j4james · 2025-05-01T01:18:59Z

> My first thought is to farm externally to a process like image magick to crop, which I "think" can be done without creating a temporary image.

Just FYI, on terminals with level 4 capabilities, you can crop an image by rendering it to an offscreen page, and then copying the relevant segments back to the main page. This can also serve as a way to cache images to a certain extent. I'm not sure about the iterm image protocol, but I do know this works with Sixel. The only catch is that DEC pages may not interoperate very well with the Xterm alt buffer mode, assuming that's a requirement.
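The page-copy approach described above would use DECCRA (Copy Rectangular Area), `CSI Pts ; Pls ; Pbs ; Prs ; Pps ; Ptd ; Pld ; Ppd $ v`. A minimal sketch of building that sequence in plain Lua; the helper name is hypothetical, and drawing the sixel image onto an off-screen page in the first place (page-navigation sequences) is not shown.

```lua
--- Sketch: DECCRA (Copy Rectangular Area).
--- Copies the cell rectangle top..bottom x left..right from page `src_page`
--- to (dst_top, dst_left) on page `dst_page`. Coordinates are 1-based cells.
local function deccra(top, left, bottom, right, src_page, dst_top, dst_left, dst_page)
  return string.format("\027[%d;%d;%d;%d;%d;%d;%d;%d$v",
    top, left, bottom, right, src_page, dst_top, dst_left, dst_page)
end
```

For example, if an image was drawn on rows 1 to 10 (columns 1 to 20) of page 2 and its top 3 rows have scrolled away, `deccra(4, 1, 10, 20, 2, 1, 1, 1)` copies the remaining rows to the top of page 1.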

…query logic for now, default to kitty provider

chipsenkbeil · 2025-05-03T23:33:49Z

@gpanders I figured out why io.stdout:write() would not work and I needed to access the tty device directly. Using kitty's direct transfer - you send all of the image bytes directly via escape codes - seems to not work with io.stdout:write() but does work if you access and use the tty device directly.

If you switch to local filesystem access via a file transfer (not escape codes), then it works fine to use io.stdout:write(). I'll document this in the PR.

…ursor logic to kitty provider, and remove unneeded terminal helpers by switching to io.stdout:write()

…ider

chipsenkbeil · 2025-05-04T00:18:29Z

@justinmk @fredizzimo I've rewritten the provider interface and implemented basic kitty graphics logic to hide an image. This provides a bit of an abstraction between the image (the data) and the placement by having two separate ids. Whenever you show an image, the provider is expected to generate some id that can be passed back to it later to hide/remove the image. Thoughts?

@gpanders I've been able to fully remove the terminal helper code and just use io.stdout:write(). I did keep the cursor move, but removed the restore logic and instead - for the kitty provider - use an option to prevent the cursor moving like you alluded to.

Still got some open questions in this code at this point, but ready for another skim to get thoughts on this one.

-- Load the image from disk. We assume all images are loaded from disk right now, and are PNGs
local img = vim.ui.img.load("/Users/senkwich/projects/neovim-img-test/org-roam-logo.png")

-- Calls the underlying provider (kitty) to show the image, returning an id that can hide it later
local id = img:show({
    pos = { x = 8, y = 8 },
    provider = "kitty",
})

-- For the test, as soon as any key is pressed, the image is hidden
vim.on_key(function()
    img:hide(id)
end)

example-of-deleting-image.mp4
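The two-id design (image data vs. placement) can be sketched as a provider that transmits each image once and then hands out one placement id per show() call. This is a hypothetical, terminal-free illustration, not the PR's kitty provider: output goes through an injected `write` function, the whole base64 payload is sent as a single chunk for brevity, and the kitty keys used (`a=t` transmit, `a=p` put with placement id `p`, `C=1` to keep the cursor still, `a=d,d=i` to delete a placement) reflect my reading of the kitty graphics protocol documentation.

```lua
--- Hypothetical provider sketch following the show(img, opts) -> id and
--- hide(ids) shape from the task list.
local function new_provider(write)
  local next_image, next_placement = 0, 0
  local image_ids = {} -- image object -> kitty image id
  local placements = {} -- placement id -> { image = id, placement = id }
  local provider = {}

  function provider.show(img, opts)
    opts = opts or {}
    local iid = image_ids[img]
    if not iid then
      -- transmit the image data once (single chunk for brevity)
      next_image = next_image + 1
      iid = next_image
      image_ids[img] = iid
      write(string.format("\027_Ga=t,f=100,i=%d,q=2;%s\027\\", iid, img.data))
    end
    next_placement = next_placement + 1
    placements[next_placement] = { image = iid, placement = next_placement }
    local pos = opts.pos or { x = 1, y = 1 }
    -- move the cursor (CSI row;col H), then put the image; C=1 keeps the cursor still
    write(string.format("\027[%d;%dH\027_Ga=p,i=%d,p=%d,C=1,q=2\027\\", pos.y, pos.x, iid, next_placement))
    return next_placement
  end

  function provider.hide(ids)
    for _, id in ipairs(ids) do
      local p = placements[id]
      if p then
        -- d=i with a p key deletes only that placement and keeps the image data
        write(string.format("\027_Ga=d,d=i,i=%d,p=%d,q=2\027\\", p.image, p.placement))
        placements[id] = nil
      end
    end
  end

  return provider
end
```

Showing the same image twice costs one transmission plus two cheap placements, and hiding one placement leaves the other (and the cached data) intact.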

…support registering new providers

…e mapping

chipsenkbeil · 2025-05-04T20:02:40Z

I added some additional options as an experiment, mirroring what the floating window API can do for relative positioning. An image can now be placed relative to the editor (what you've seen thus far), a specific window (win), a specific window's cursor (cursor), or the mouse (mouse, using the last click unless mousemoveevent is enabled).

Here's a silly preview of an image being displayed where the mouse currently is; on each move, it hides the only image (in kitty, by deleting the placement) and then shows a new image where the cursor is. It seems fairly quick, which is nice.

The reason I did this was to potentially set up what the config might look like to set relative to a buffer, which would then rely on something like the kitty implementation using the unicode placement functionality.

neovim-image-mouse-move.mp4
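To make the `relative` option concrete, here is a hypothetical resolver in plain Lua. The `ctx` table stands in for values Neovim would provide (the window's screen origin, the cursor's screen position, the last mouse position); its shape, and treating `pos` as a 0-based offset from the anchor, are assumptions rather than what the PR does.

```lua
--- Hypothetical resolver: turn a position relative to "editor", "win",
--- "cursor", or "mouse" into absolute 1-based editor cells.
--- @param opts { relative?: string, pos?: { x: integer, y: integer } }
--- @param ctx table assumed shape: { win_origin, cursor, mouse } as { row, col }
--- @return { row: integer, col: integer }
local function resolve_pos(opts, ctx)
  local rel = opts.relative or "editor"
  local base
  if rel == "editor" then
    base = { row = 1, col = 1 }
  elseif rel == "win" then
    base = ctx.win_origin
  elseif rel == "cursor" then
    base = ctx.cursor
  elseif rel == "mouse" then
    base = ctx.mouse
  else
    error("unknown relative: " .. tostring(rel))
  end
  local pos = opts.pos or { x = 0, y = 0 }
  return { row = base.row + pos.y, col = base.col + pos.x }
end
```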

…ting an image; fix placement not being cleared from cache in kitty provider

…methods

…tly transmitting

…we need ffi, ioctl, and TIOCGWINSZ

chipsenkbeil · 2025-05-04T22:58:53Z

@gpanders does TermResponse work with CSI escape sequences? Seems like it's documented for just OSC and DCS?

Reason I ask is that one issue popped up with trying to support converting between pixel and cell units, and that's getting the screen size in pixels. I was trying to do this via \027[14t to request the screen size in the form \027[4;888;999t where 888 is height and 999 is width, but I'm not getting TermResponse to trigger nor do I see anything being printed out. Works fine with a lua shell printing it out via io.stdout:write().

I'm assuming it's filtered out as TERMKEY_RES_NONE from termkey_interpret_string based on

neovim/src/nvim/tui/input.c

Line 577 in 0862c10

if (termkey_interpret_string(input->tk, key, &str) == TERMKEY_RES_KEY) {
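However the reply ends up being delivered, decoding it is simple. A sketch in plain Lua, assuming the raw reply string is available; the helper names are hypothetical. xterm documents `CSI 14 t` as reporting the text area size in pixels with a reply of the form `CSI 4 ; height ; width t`, and dividing by the grid size (e.g. `&lines` and `&columns`) gives the cell size.

```lua
--- Parse the reply to "CSI 14 t": "ESC [ 4 ; height ; width t".
--- @param reply string
--- @return integer? height_px, integer? width_px
local function parse_pixel_size(reply)
  local h, w = reply:match("\027%[4;(%d+);(%d+)t")
  if not h then
    return nil
  end
  return tonumber(h), tonumber(w)
end

--- Derive the cell size in pixels from the text area size and grid size.
local function cell_size(height_px, width_px, rows, cols)
  return width_px / cols, height_px / rows
end
```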

FFI alternative

@justinmk the alternative way I've seen this done is using ioctl and TIOCGWINSZ via ffi calls. And you'd have to do something completely different to support windows. Example of ioctl usage from snacks:

https://github.com/folke/snacks.nvim/blob/bc0630e43be5699bb94dadc302c0d21615421d93/lua/snacks/image/terminal.lua#L67-L120

function M.size()
  if size then
    return size
  end
  local ffi = require("ffi")
  ffi.cdef([[
    typedef struct {
      unsigned short row;
      unsigned short col;
      unsigned short xpixel;
      unsigned short ypixel;
    } winsize;
    int ioctl(int, int, ...);
  ]])

  local TIOCGWINSZ = nil
  if vim.fn.has("linux") == 1 then
    TIOCGWINSZ = 0x5413
  elseif vim.fn.has("mac") == 1 or vim.fn.has("bsd") == 1 then
    TIOCGWINSZ = 0x40087468
  end

  local dw, dh = 9, 18
  ---@class snacks.image.terminal.Dim
  size = {
    width = vim.o.columns * dw,
    height = vim.o.lines * dh,
    columns = vim.o.columns,
    rows = vim.o.lines,
    cell_width = dw,
    cell_height = dh,
    scale = dw / 8,
  }

  pcall(function()
    ---@type { row: number, col: number, xpixel: number, ypixel: number }
    local sz = ffi.new("winsize")
    if ffi.C.ioctl(1, TIOCGWINSZ, sz) ~= 0 or sz.col == 0 or sz.row == 0 then
      return
    end
    size = {
      width = sz.xpixel,
      height = sz.ypixel,
      columns = sz.col,
      rows = sz.row,
      cell_width = sz.xpixel / sz.col,
      cell_height = sz.ypixel / sz.row,
      -- try to guess dpi scale
      scale = math.max(1, sz.xpixel / sz.col / 8),
    }
  end)

  return size
end
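With a dimension table like the one returned above, converting between the `"cell"` unit from the task list and pixels is plain arithmetic. A hypothetical sketch (the names are mine, not the PR's or snacks'); rounding up from pixels to cells is a design choice so that the reserved cell area always covers the whole image.

```lua
--- Hypothetical unit conversion using a table with cell_width/cell_height
--- in pixels, shaped like the snacks result above.
local function px_to_cells(w_px, h_px, dim)
  -- round up so the cell area fully covers the image
  return math.ceil(w_px / dim.cell_width), math.ceil(h_px / dim.cell_height)
end

local function cells_to_px(cols, rows, dim)
  return math.floor(cols * dim.cell_width), math.floor(rows * dim.cell_height)
end
```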

chipsenkbeil force-pushed the feat/ImageApi branch 3 times, most recently from 9964ad6 to 630f852 November 30, 2024 03:00

chipsenkbeil changed the title ~~Implement image API for #30889~~ feat(img): add vim.img.protocol() to detect preferred graphics protocol Nov 30, 2024

chipsenkbeil changed the title ~~feat(img): add vim.img.protocol() to detect preferred graphics protocol~~ Implement image API for #30889 Nov 30, 2024

chipsenkbeil force-pushed the feat/ImageApi branch 2 times, most recently from fe0d5a8 to 33ee581 November 30, 2024 04:27

chipsenkbeil changed the title ~~Implement image API for #30889~~ feat(img): implement image API Nov 30, 2024

chipsenkbeil changed the title ~~feat(img): implement image API~~ feat(img): implement image API for absolute positions Nov 30, 2024

ribru17 reviewed Nov 30, 2024

runtime/lua/vim/img/_image.lua (outdated, resolved)

chipsenkbeil mentioned this pull request Nov 30, 2024 in image API #30889 (open)

lewis6991 reviewed Nov 30, 2024

runtime/lua/vim/img/_terminal.lua (outdated, resolved)

chipsenkbeil force-pushed the feat/ImageApi branch 4 times, most recently from babd349 to dc51bf3 November 30, 2024 20:56

chipsenkbeil force-pushed the feat/ImageApi branch from dc51bf3 to ce818a9 December 1, 2024 20:24