feat(img): implement image API for absolute positions #31399


Open
wants to merge 36 commits into
base: master


@chipsenkbeil chipsenkbeil commented Nov 29, 2024

Task List

  • A static image you can place within neovim (supporting just PNG or whatever works by default in terminals)
    -- Supports loading PNG images into memory
    local img = vim.ui.img.load("/path/to/img.png")
    
    -- Supports lazy-loading image, deferring to a provider
    local img = vim.ui.img.new({ filename = "/path/to/img.png" })
    
    -- Supports specifying an image and explicitly providing the data
    local img = vim.ui.img.new({ data = "...", filename = "/path/to/img.png" })
    
    -- Once created, can be shown, returning an id
    -- tied to the displayed image
    local id = img:show() -- Places in top-left of editor with default size
    local id = img:show({ pos = { x = 4, y = 8 } })
    local id = img:show({ relative = "cursor" })
  • Support deleting the image placed within neovim
    local img = vim.ui.img.new({ filename = "/path/to/img.png" })
    local id = img:show()
    
    -- Supports hiding image by the id returned from displaying it
    img:hide(id)
    
    -- Supports hiding all places where the image was displayed
    img:hide()
  • Dynamically resize/move an image
    local img = vim.ui.img.new({ filename = "/path/to/img.png" })
    local id = img:show({ pos = { x = 1, y = 2, unit = "cell" } })
    
    -- Supports updating a displayed image with a new position
    img:update(id, { pos = { x = 5, y = 6, unit = "cell" } })
    
    -- Supports resizing a displayed image
    img:update(id, { size = { width = 10, height = 5, unit = "cell" } })
  • Abstraction for 3rd parties to support images in neovim
    -- Providers implement a small API to support showing and hiding images
    vim.ui.img.providers['neovide'] = vim.ui.img.providers.new({
        ---@param img vim.ui.Image image data container to display
        ---@param opts? vim.ui.img.Opts specification of how to display the image
        ---@return integer id unique identifier connected to the displayed image (not vim.ui.Image)
        show = function(img, opts)
            -- Implement here
        end,
    
        ---@param ids integer[] list of displayed image ids to hide
        hide = function(ids)
            -- Implement here
        end,
    })
    
    -- Load an image to display, nothing different here
    local img = vim.ui.img.load("/path/to/img.png")
    
    -- Use the custom provider either by passing it by name
    -- or directly passing in the provider instance itself
    local id = img:show({ provider = 'neovide' })
    img:hide(id, { provider = vim.ui.img.providers['neovide'] })
  • Injection of an image into a buffer within neovim (tracking its movement properly, possibly through the use of extmarks); this involves reflowing text around the image (in my mind) versus just covering it up or placing behind it. Think to the examples of images within markdown/org documents but with images as first-class citizens

Out of Scope for this PR

  1. Multiple image type support (bmp, jpg at least) within neovim (seems like the common way is to farm out to imagemagick, which I'm not a fan of, but seems like what we'd have to do first)
  2. Video/gif support (there are reasons why this would be neat, but not a dealbreaker if we want to exclude this from neovim core)

OLDER INFORMATION

Alright, let's try this again without the massive amount of pull requests. 😄 Each commit here should be a standalone change, and I'll document the processes here.

This is geared towards tackling #30889, specifically on supporting

  1. Ability to load an image into memory
  2. Display an image with absolute coordinates
  3. Support different backends for rendering images such as iterm2 and kitty
  4. Smartly detect the type(s) of backend graphics supported

Things for later PRs would include

  1. Inline image support (attach to a buffer, reflow text around it)
  2. Alternative image formats (I think PNG is what is supported right now?)
  3. Video feeds (more complex, more limited backend support)

Breakdown of commits

1. Loading an image into memory

Implements vim.img.load() to load from a file or wrap base64 encoded bytes as a vim.img.Image instance.

2. Implement skeleton of vim.img.show() without backends

Implements the skeleton of vim.img.show() without any backend implemented.

3. Implement vim.img._terminal to support basic functionality needed for backends

Implements a vim.img._terminal module that supports writing to the tty tied to neovim as well as basic operations to manipulate the cursor, needed for backend implementations.

4. Implement vim.img.Image method for_each_chunk to streamline backend processing

Implements a method image:for_each_chunk for instances of vim.img.Image. This method streamlines chunked iteration of image bytes, which is important when working with ssh or tmux and a protocol that supports chunked image rendering such as iterm2 or kitty.
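A minimal sketch of the chunked iteration described above, assuming the image bytes are held as one Lua string. The standalone `for_each_chunk` function and its callback arguments are hypothetical; the actual method signature in the PR is not shown in this thread and may differ.

```lua
-- Hypothetical standalone version of chunked iteration; the real
-- image:for_each_chunk signature in the PR may differ.
---@param data string bytes (or base64 text) to split
---@param size integer maximum chunk length in bytes
---@param cb fun(chunk: string, pos: integer, has_more: boolean)
local function for_each_chunk(data, size, cb)
  assert(size > 0, "chunk size must be positive")
  local len = #data
  local pos = 1
  while pos <= len do
    local stop = math.min(pos + size - 1, len)
    -- has_more tells the caller whether another chunk follows this one
    cb(data:sub(pos, stop), pos, stop < len)
    pos = stop + 1
  end
end

-- Usage: collect chunks of at most 4 bytes.
local chunks = {}
for_each_chunk("abcdefghij", 4, function(chunk, _, has_more)
  chunks[#chunks + 1] = { chunk, has_more }
end)
```

Passing a "more data follows" flag to the callback is a design choice that suits protocols which mark the final chunk explicitly.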

5. Implement iterm2 backend

Implements the iterm2 backend, supporting both the multipart image transfer available in iTerm2 3.5+ and a fallback to the older protocol that sends the entire image at once, which is needed for other terminals such as WezTerm.
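For reference, the older single-shot form mentioned above can be sketched as follows. This only builds the escape sequence string; the function name `iterm2_inline` and its option table are hypothetical and are not taken from the PR. The multipart variant is left out here.

```lua
-- Builds the single-shot iTerm2 inline image sequence:
--   ESC ] 1337 ; File = <args> : <base64 data> BEL
-- Hypothetical helper, not the PR's implementation.
---@param b64 string base64-encoded contents of the image file
---@param opts? { size?: integer, width?: string, height?: string }
---@return string
local function iterm2_inline(b64, opts)
  opts = opts or {}
  -- inline=1 asks the terminal to display the file instead of downloading it
  local args = { "inline=1" }
  if opts.size then args[#args + 1] = "size=" .. opts.size end
  if opts.width then args[#args + 1] = "width=" .. opts.width end
  if opts.height then args[#args + 1] = "height=" .. opts.height end
  return "\027]1337;File=" .. table.concat(args, ";") .. ":" .. b64 .. "\007"
end

-- Usage: "QUJD" is base64 for the three bytes "ABC" (placeholder data).
local seq = iterm2_inline("QUJD", { size = 3, width = "10" })
```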

6. Implement kitty backend

Implements the kitty graphics protocol as a backend, using kitty's chunked image rendering, which should work within tmux and ssh if we keep the chunks small enough.
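A rough sketch of kitty's chunked direct transmission, assuming the PNG has already been base64-encoded. The helper name `kitty_transmit` and its parameters are hypothetical. The protocol facts it relies on: payload chunks are at most 4096 bytes, every chunk except the last has a length that is a multiple of 4 and sets m=1, and only the first chunk carries the full control data.

```lua
-- Kitty graphics escape: ESC _ G <control keys> ; <payload> ESC \
-- Hypothetical helper that returns one escape sequence per chunk.
---@param b64 string base64-encoded image data
---@param control string control keys for the first chunk, e.g. "a=T,f=100"
---@param chunk_size? integer defaults to 4096; must be a multiple of 4
---@return string[]
local function kitty_transmit(b64, control, chunk_size)
  chunk_size = chunk_size or 4096
  assert(chunk_size % 4 == 0, "chunk size must be a multiple of 4")
  local out = {}
  local len = #b64
  local pos = 1
  while pos <= len do
    local stop = math.min(pos + chunk_size - 1, len)
    local more = (stop < len) and 1 or 0
    -- Only the first chunk carries the full control data.
    local keys = (pos == 1) and (control .. ",m=" .. more) or ("m=" .. more)
    out[#out + 1] = "\027_G" .. keys .. ";" .. b64:sub(pos, stop) .. "\027\\"
    pos = stop + 1
  end
  return out
end

-- Usage with a tiny chunk size so the splitting is visible.
local parts = kitty_transmit("AAAABBBBCC==", "a=T,f=100", 4)
```

Here a=T means "transmit and display" and f=100 marks the payload as PNG.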

7. Implement vim.img.protocol() to detect preferred graphics protocol

Implements vim.img.protocol() that can be used to detect the preferred graphics protocol.

This is a reverse-engineered copy of how timg implements graphics protocol support, and it relies on a couple of terminal queries. To support this, we implement vim.img._terminal.query() and vim.img._terminal.graphics.detect(), which figure out whether the terminal supports the iterm2, kitty, or sixel protocols, mirroring the logic from timg.
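As an illustration of query-based detection, here is a sketch that interprets the terminal's replies. It does not reproduce timg's logic or the PR's code. It assumes two well-known behaviors: a kitty graphics query (a=q) sent ahead of a primary device attributes request is answered with `_Gi=<id>;OK`, and a DA1 reply that lists attribute 4 indicates sixel support. iterm2 detection is deliberately omitted. The name `parse_graphics_reply` is hypothetical.

```lua
-- Queries a caller would write to the terminal (not sent in this sketch).
local KITTY_QUERY = "\027_Gi=31,s=1,v=1,a=q,t=d,f=24;AAAA\027\\"
local DA1_QUERY = "\027[c"

-- Hypothetical parser for the raw bytes the terminal sends back.
---@param reply string
---@return { kitty: boolean, sixel: boolean }
local function parse_graphics_reply(reply)
  local found = { kitty = false, sixel = false }
  -- Kitty answers the graphics query with ESC _ G i=31 ; OK ESC \
  if reply:find("\027_Gi=31;OK", 1, true) then
    found.kitty = true
  end
  -- DA1 reply looks like ESC [ ? 62 ; 4 ; 22 c; attribute 4 means sixel.
  local params = reply:match("\027%[%?([%d;]*)c")
  if params then
    for p in params:gmatch("%d+") do
      if p == "4" then
        found.sixel = true
      end
    end
  end
  return found
end

-- Usage with canned replies.
local both = parse_graphics_reply("\027_Gi=31;OK\027\\\027[?62;4;22c")
local neither = parse_graphics_reply("\027[?1;2c")
```

Sending the DA1 request right after the kitty query means a reply always arrives, even from terminals that ignore the graphics query.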


@chipsenkbeil
Author

chipsenkbeil commented Nov 30, 2024

Heads up, I know there is formatting of commit messages needed and linting for preferences in Lua style guides.

The current code is me migrating over my working code from a private repo - not a fork of neovim - to be a pull request here. I'll work on updating the PR to be compliant, but wanted the code to be visible for comments.

In particular, I could use help in rewriting the parts of the PR that make use of Lua's io library - assuming we want to use a neovim equivalent - and in refactoring parts of the code that could be improved. So I'm looking for stronger critique, challenges, and suggestions 😄 This was an example-turned-PR, so not all of the code is high quality!


An example of doing this with the current PR:

image-example-pr

local file = vim.img.load({
    filename = "/Users/senkwich/Pictures/org-roam-logo.png",
})

vim.img.show(file, {
    pos = { row = 8, col = 8 }, 
    backend = "iterm2",
})

@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 3 times, most recently from 9964ad6 to 630f852 Compare November 30, 2024 03:00
@chipsenkbeil chipsenkbeil changed the title Implement image API for #30889 feat(img): add vim.img.protocol() to detect preferred graphics protocol Nov 30, 2024
@chipsenkbeil chipsenkbeil changed the title feat(img): add vim.img.protocol() to detect preferred graphics protocol Implement image API for #30889 Nov 30, 2024
@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 2 times, most recently from fe0d5a8 to 33ee581 Compare November 30, 2024 04:27
@chipsenkbeil chipsenkbeil changed the title Implement image API for #30889 feat(img): implement image API Nov 30, 2024
@chipsenkbeil chipsenkbeil changed the title feat(img): implement image API feat(img): implement image API for absolute positions Nov 30, 2024
@chipsenkbeil chipsenkbeil mentioned this pull request Nov 30, 2024
@chipsenkbeil chipsenkbeil force-pushed the feat/ImageApi branch 4 times, most recently from babd349 to dc51bf3 Compare November 30, 2024 20:56
@chipsenkbeil
Author

@justinmk heads up, one complexity that we'll punt for now is supporting non-PNG images. I think we can write a pretty straightforward decoder for BMP & GIF, but JPEG is very complex and would /probably/ need a specialized C function to do it with the assistance of a JPEG-oriented library. This is in order to get RGB or RGBA data.

@kovidgoyal I'm assuming my understanding of pixel formats is correct: if we fed in any image format other than PNG, using f=100 would not work, and we'd instead need to decode the base64 image data, figure out the format (e.g. bmp, jpeg), and then extract 24-bit RGB or 32-bit RGBA data to feed in for your protocol to work.
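To make the distinction concrete, here is a small sketch that checks for the PNG signature before choosing f=100, and shows the control keys raw pixel data would need instead. The helper names are hypothetical; actual decoding of BMP, GIF, or JPEG is not attempted.

```lua
-- The fixed 8-byte signature that starts every PNG file.
local PNG_SIGNATURE = "\137PNG\r\n\026\n"

---@param bytes string raw (not base64) file contents
local function is_png(bytes)
  return bytes:sub(1, #PNG_SIGNATURE) == PNG_SIGNATURE
end

-- Returns "f=100" for PNG data, or nil when the bytes would first have to
-- be decoded into raw pixels (hypothetical helper).
local function kitty_format_keys(bytes)
  if is_png(bytes) then
    return "f=100"
  end
  return nil
end

-- Raw pixel data needs an explicit format plus width (s) and height (v):
-- f=24 for RGB, f=32 for RGBA.
local function kitty_raw_keys(width, height, has_alpha)
  return string.format("f=%d,s=%d,v=%d", has_alpha and 32 or 24, width, height)
end

-- Usage with placeholder data.
local png_keys = kitty_format_keys(PNG_SIGNATURE .. "rest-of-file")
local bmp_keys = kitty_format_keys("BMxx")
```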

I don't know what iterm2's graphics protocol supports as I've only tested with png and I don't see anything mentioned on their doc page. I also don't know what sixel supports or how it works since I haven't read the documentation yet, but I imagine given the age of sixel that we'd need to support image decoding of some kind to break out rgb/rgba data.

@kovidgoyal

kovidgoyal commented Dec 1, 2024 via email

@chipsenkbeil
Author

Just a drive by comment on my concern over the inclusion of the iterm2 protocol. Disclaimer: I don't have any experience implementing it, but I have been through the protocol.

So...the kitty graphics protocol lets you transmit an image and very finely control its placement, including a clip region. By specifying a clip region, it is trivial to "scroll" an image partially off the screen - you can specify the horizontal or vertical offset (in pixels) to clip the image.

Without this clip capability, it seems that neovim would need to have an image processing library as well to internally clip images for display? What would the plan be for an image which gets partially scrolled?

Commenting here that I'm fine and most likely moving forward with removing iterm2 and just using kitty. I was already aware of limitations in iterm2 with cropping. My first thought is to farm externally to a process like image magick to crop, which I "think" can be done without creating a temporary image. So if we ever revisit supporting iterm2, that would be the approach I'd take.

@j4james

j4james commented May 1, 2025

My first thought is to farm externally to a process like image magick to crop, which I "think" can be done without creating a temporary image.

Just FYI, on terminals with level 4 capabilities, you can crop an image by rendering it to an offscreen page, and then copying the relevant segments back to the main page. This can also serve as a way to cache images to a certain extent. I'm not sure about the iterm image protocol, but I do know this works with Sixel. The only catch is that DEC pages may not interoperate very well with the Xterm alt buffer mode, assuming that's a requirement.

@chipsenkbeil
Author

@gpanders I figured out why io.stdout:write() would not work and I needed to access the tty device directly. Using kitty's direct transfer - you send all of the image bytes directly via escape codes - seems to not work with io.stdout:write() but does work if you access and use the tty device directly.

If you switch to local filesystem access via a file transfer (not escape codes), then it works fine to use io.stdout:write(). I'll document this in the PR.
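For comparison with direct transfer, a sketch of what the file-based form looks like on the wire: with t=f the payload is the base64-encoded file path rather than the image bytes. The helper names are hypothetical, and the pure-Lua base64 encoder is included only to keep the example self-contained; inside Neovim, vim.base64.encode could be used instead.

```lua
local B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- Minimal base64 encoder (standard alphabet, "=" padding).
local function b64encode(s)
  local out = {}
  for i = 1, #s, 3 do
    local a, b, c = s:byte(i, i + 2)
    local n = a * 65536 + (b or 0) * 256 + (c or 0)
    local i1 = math.floor(n / 262144) % 64
    local i2 = math.floor(n / 4096) % 64
    local i3 = math.floor(n / 64) % 64
    local i4 = n % 64
    out[#out + 1] = B64:sub(i1 + 1, i1 + 1)
      .. B64:sub(i2 + 1, i2 + 1)
      .. (b and B64:sub(i3 + 1, i3 + 1) or "=")
      .. (c and B64:sub(i4 + 1, i4 + 1) or "=")
  end
  return table.concat(out)
end

-- Hypothetical helper: ask kitty to read a PNG from a local file (t=f).
-- The payload is the encoded path, so the sequence stays short.
---@param path string path readable by the terminal process
local function kitty_file_transmit(path)
  return "\027_Ga=T,f=100,t=f;" .. b64encode(path) .. "\027\\"
end

-- Usage with a placeholder path.
local seq = kitty_file_transmit("abc")
```

This form only helps when the terminal can read the same filesystem, so it would not apply over ssh.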

…ursor logic to kitty provider, and remove unneeded terminal helpers by switching to io.stdout:write()
@chipsenkbeil
Author

chipsenkbeil commented May 4, 2025

@justinmk @fredizzimo I've rewritten the provider interface and implemented basic kitty graphics logic to hide an image. This provides a bit of an abstraction between the image (the data) and the placement by having two separate ids. Whenever you show an image, the provider is expected to generate some id that can be passed back to it later to hide/remove the image. Thoughts?
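A toy sketch of the split between image data and placement ids, using an in-memory provider that draws nothing. It follows the show(img, opts) / hide(ids) shape from the task list above; the constructor name `new_recording_provider` and the `count` helper are hypothetical and exist only for illustration.

```lua
-- Hypothetical provider that records placements instead of drawing them.
local function new_recording_provider()
  local next_id = 0
  local placements = {} -- placement id -> { img = ..., opts = ... }
  local provider = {}

  -- Each call creates a new placement, even for the same image.
  function provider.show(img, opts)
    next_id = next_id + 1
    placements[next_id] = { img = img, opts = opts or {} }
    return next_id
  end

  -- Removes only the placements named by the given ids.
  function provider.hide(ids)
    for _, id in ipairs(ids) do
      placements[id] = nil
    end
  end

  -- Test helper: number of placements currently shown.
  function provider.count()
    local n = 0
    for _ in pairs(placements) do
      n = n + 1
    end
    return n
  end

  return provider
end

-- Usage: one image, two placements, then hide the first.
local provider = new_recording_provider()
local img = { filename = "/path/to/img.png" }
local first = provider.show(img, { pos = { x = 1, y = 1 } })
local second = provider.show(img, { pos = { x = 5, y = 5 } })
provider.hide({ first })
```

Keeping the placement id separate from the image means one loaded image can be shown in several places and removed from each independently.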

@gpanders I've been able to fully remove the terminal helper code and just use io.stdout:write(). I did keep the cursor move, but removed the restore logic and instead - for the kitty provider - use an option to prevent the cursor moving like you alluded to.

Still got some open questions in this code at this point, but ready for another skim to get thoughts on this one.

-- Load the image from disk. We assume all images are loaded from disk right now, and are PNGs
local img = vim.ui.img.load("/Users/senkwich/projects/neovim-img-test/org-roam-logo.png")

-- Calls the underlying provider (kitty) to show the image, returning an id that can hide it later
local id = img:show({
    pos = { x = 8, y = 8 },
    provider = "kitty",
})

-- For the test, as soon as any key is pressed, the image is hidden
vim.on_key(function()
    img:hide(id)
end)
example-of-deleting-image.mp4

@chipsenkbeil
Author

I added some additional options as an experiment to mirror what the floating window API can do for the relative position of the image. It now supports editor (what you've seen thus far), win to display relative to a specific window, cursor to display relative to a specific window's cursor, and mouse to display relative to the mouse (from the last click, unless mousemoveevent is enabled).
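One way to think about these options is as an origin plus an offset. The sketch below resolves a position against a table of origins supplied by the caller; the function name `resolve_position` and the `origins` table are hypothetical, and how the PR actually computes each origin is not shown in this thread. Inside Neovim the origins could plausibly come from functions such as vim.fn.win_screenpos() and vim.fn.getmousepos().

```lua
-- Hypothetical resolver: turns { relative, pos } into editor coordinates.
---@param opts { relative?: string, pos?: { x: integer, y: integer } }
---@param origins table<string, { x: integer, y: integer }>
---@return { x: integer, y: integer }
local function resolve_position(opts, origins)
  local relative = opts.relative or "editor"
  local origin = origins[relative]
  if not origin then
    error("unsupported relative value: " .. tostring(relative))
  end
  local pos = opts.pos or { x = 0, y = 0 }
  return { x = origin.x + pos.x, y = origin.y + pos.y }
end

-- Usage with made-up origins for each supported value.
local origins = {
  editor = { x = 0, y = 0 },
  win = { x = 40, y = 2 },
  cursor = { x = 45, y = 10 },
  mouse = { x = 12, y = 7 },
}
local at_cursor = resolve_position({ relative = "cursor", pos = { x = 1, y = 1 } }, origins)
local at_editor = resolve_position({ pos = { x = 4, y = 8 } }, origins)
```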

Here's a silly preview of an image being displayed where the mouse currently is. On move, it hides the only image (in kitty, by deleting the placement) and then shows a new image where the cursor is. Seems fairly quick, which is nice.

The reason I did this was to potentially set up what the config might look like to set relative to a buffer, which would then rely on something like the kitty implementation using the unicode placement functionality.

neovim-image-mouse-move.mp4

@chipsenkbeil
Author

chipsenkbeil commented May 4, 2025

@gpanders does TermResponse work with CSI escape sequences? Seems like it's documented for just OSC and DCS?

Reason I ask is that one issue popped up with trying to support converting between pixel and cell units, and that's getting the screen size in pixels. I was trying to do this via \027[14t to request the screen size in the form \027[4;888;999t where 888 is height and 999 is width, but I'm not getting TermResponse to trigger nor do I see anything being printed out. Works fine with a lua shell printing it out via io.stdout:write().
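For illustration, parsing that reply is straightforward once the bytes are available. This sketch only covers the parsing step and assumes the reply string has been captured by some other means; the function name `parse_window_pixels` is hypothetical.

```lua
-- Parses the reply to ESC [ 14 t, which has the form
-- ESC [ 4 ; <height> ; <width> t (height first, then width).
---@param reply string
---@return { height: integer, width: integer }|nil
local function parse_window_pixels(reply)
  local h, w = reply:match("\027%[4;(%d+);(%d+)t")
  if not h then
    return nil
  end
  return { height = tonumber(h), width = tonumber(w) }
end

-- Usage with the example reply from the comment above.
local px = parse_window_pixels("\027[4;888;999t")
```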

I'm assuming it's filtered out as TERMKEY_RES_NONE from termkey_interpret_string based on

if (termkey_interpret_string(input->tk, key, &str) == TERMKEY_RES_KEY) {

FFI alternative

@justinmk the alternative way I've seen this done is using ioctl and TIOCGWINSZ via ffi calls. And you'd have to do something completely different to support Windows. Example of ioctl usage from snacks:

https://github.com/folke/snacks.nvim/blob/bc0630e43be5699bb94dadc302c0d21615421d93/lua/snacks/image/terminal.lua#L67-L120

function M.size()
  if size then
    return size
  end
  local ffi = require("ffi")
  ffi.cdef([[
    typedef struct {
      unsigned short row;
      unsigned short col;
      unsigned short xpixel;
      unsigned short ypixel;
    } winsize;
    int ioctl(int, int, ...);
  ]])

  local TIOCGWINSZ = nil
  if vim.fn.has("linux") == 1 then
    TIOCGWINSZ = 0x5413
  elseif vim.fn.has("mac") == 1 or vim.fn.has("bsd") == 1 then
    TIOCGWINSZ = 0x40087468
  end

  local dw, dh = 9, 18
  ---@class snacks.image.terminal.Dim
  size = {
    width = vim.o.columns * dw,
    height = vim.o.lines * dh,
    columns = vim.o.columns,
    rows = vim.o.lines,
    cell_width = dw,
    cell_height = dh,
    scale = dw / 8,
  }

  pcall(function()
    ---@type { row: number, col: number, xpixel: number, ypixel: number }
    local sz = ffi.new("winsize")
    if ffi.C.ioctl(1, TIOCGWINSZ, sz) ~= 0 or sz.col == 0 or sz.row == 0 then
      return
    end
    size = {
      width = sz.xpixel,
      height = sz.ypixel,
      columns = sz.col,
      rows = sz.row,
      cell_width = sz.xpixel / sz.col,
      cell_height = sz.ypixel / sz.row,
      -- try to guess dpi scale
      scale = math.max(1, sz.xpixel / sz.col / 8),
    }
  end)

  return size
end
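Once cell dimensions are known, converting between pixel and cell units is simple arithmetic. A minimal sketch, assuming a table with cell_width and cell_height fields like the one built above; the function names are hypothetical, and rounding up when going from pixels to cells is a choice made here so the image always fits in the reserved cells.

```lua
-- Hypothetical unit conversions based on known cell dimensions.
---@param size { width: number, height: number } size in cells
---@param dims { cell_width: number, cell_height: number }
local function cells_to_pixels(size, dims)
  return {
    width = size.width * dims.cell_width,
    height = size.height * dims.cell_height,
    unit = "pixel",
  }
end

---@param size { width: number, height: number } size in pixels
---@param dims { cell_width: number, cell_height: number }
local function pixels_to_cells(size, dims)
  return {
    -- Round up so a partially covered cell still counts as occupied.
    width = math.ceil(size.width / dims.cell_width),
    height = math.ceil(size.height / dims.cell_height),
    unit = "cell",
  }
end

-- Usage with the 9x18 fallback cell size from the snippet above.
local dims = { cell_width = 9, cell_height = 18 }
local px = cells_to_pixels({ width = 10, height = 5 }, dims)
local cells = pixels_to_cells({ width = 100, height = 100 }, dims)
```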
