Skip to content

Junk is being inserted into the web terminal #3666

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
johnstcn opened this issue Aug 24, 2022 · 16 comments · Fixed by #8640
Closed

Junk is being inserted into the web terminal #3666

johnstcn opened this issue Aug 24, 2022 · 16 comments · Fixed by #8640
Assignees
Labels
s1 Bugs that break core workflows. Only humans may set this. site Area: frontend dashboard
Milestone

Comments

@johnstcn
Copy link
Member

johnstcn commented Aug 24, 2022

Problem

Right around the time of a Coderd restart, @Emyrk was connected to the web terminal, lost connection, reconnected, and got a bunch of junk inserted into their terminal:

image

image

We poked around in the Web inspector and found that the frontend was sending the content. It seems like there's some kind of replay buffer that's re-sending some terminal escape codes, but whatever had been listening to them is now gone and has no way of acking.

Solution

@deansheather thinks we could just block on the websocket until we're finished rendering.

@kylecarbs kylecarbs changed the title bug: web terminal: junk inserted into console Junk is being inserted into console Aug 24, 2022
@kylecarbs kylecarbs added this to the EE milestone Aug 24, 2022
@kylecarbs kylecarbs changed the title Junk is being inserted into console Junk is being inserted into the web terminal Aug 24, 2022
@kylecarbs kylecarbs added bug site Area: frontend dashboard labels Aug 24, 2022
@code-asher code-asher removed their assignment Sep 28, 2022
@code-asher
Copy link
Member

For whoever works on this the solution we decided on v1 is to use screen: coder/wsep#25

@matifali
Copy link
Member

matifali commented May 24, 2023

Faced this today on f72d8e9
dev.coder.com
image
cc: @BrunoQuaresma

@matifali matifali reopened this May 24, 2023
@matifali matifali removed this from the EE milestone May 24, 2023
@mafredri
Copy link
Member

mafredri commented May 24, 2023

I think this could be accounted to reconnecting-pty buffer size and not taking terminal escape codes into account. I can easily reproduce this problem (junk in the terminal) by making the text size really small (huge terminal) and opening a second window with the same URL, it will type junk into the session.

One potential problem is the escape sequence getting cut off and not turning into an escape, another is that an escape sequence may try to put the terminal into a certain kind of mode that is dependent on another escape sequence that was lost.

Thoughts @kylecarbs? The easiest solution I see is to increase buffer size, but that won't guarantee the problem is fixed. Stripping escape sequences also doesn't seem like a good idea, since we won't get identical output. So we'd be left with trying to "render" the terminal in memory and output the result, this way we can calculate the buffer needed. But that seems like a bad idea from a security perspective, and a lot of work.

EDIT: I may have been off on cause/how to repro, suddenly I was no longer able to do it, but I found one way that works each time:

  1. Open the terminal: https://dev.coder.com/@mafredri/w.dev/terminal?reconnect=[id]
  2. Run tmux
  3. Exit tmux
  4. Reload terminal or open a new window with URL from 1.

Didn't even need to make the font smaller. This may be unique to my setup/dotfiles but interestingly this doesn't happen without the reload/new window.

@matifali matifali added the s1 Bugs that break core workflows. Only humans may set this. label May 24, 2023
@mafredri
Copy link
Member

mafredri commented May 24, 2023

I made an asciinema recording with a minimal shell setup: https://asciinema.org/a/fwLHGVx1zbIRtpYaB0isQz2mw

Uses /bin/sh and no tmux config, yet the issue persist on reloading the page. There's no way we filled the reconnecting pty buffer at this point so this may be an xterm.js bug.

When we reload the page, we get the initial payload:

00000000: 2420 746d 7578 0d0a 1b5b 3f31 3034 3968  $ tmux...[?1049h
00000001: 1b5b 3232 3b30 3b30 741b 5b3f 3168 1b3d  .[22;0;0t.[?1h.=
00000002: 1b5b 481b 5b32 4a1b 5b3f 3132 6c1b 5b3f  .[H.[2J.[?12l.[?
00000003: 3235 681b 5b3f 3130 3030 6c1b 5b3f 3130  25h.[?1000l.[?10
00000004: 3032 6c1b 5b3f 3130 3033 6c1b 5b3f 3130  02l.[?1003l.[?10
00000005: 3036 6c1b 5b3f 3130 3035 6c1b 2842 1b5b  06l.[?1005l.(B.[
00000006: 6d1b 5b3f 3132 6c1b 5b3f 3235 681b 5b3f  m.[?12l.[?25h.[?
00000007: 3130 3036 6c1b 5b3f 3130 3030 6c1b 5b3f  1006l.[?1000l.[?
00000008: 3130 3032 6c1b 5b3f 3130 3033 6c1b 5b3f  1002l.[?1003l.[?
00000009: 3230 3034 6c1b 5b31 3b31 481b 5b31 3b33  2004l.[1;1H.[1;3
0000000a: 3472 1b5b 3e63 1b5b 3e71 1b5b 313b 3148  4r.[>c.[>q.[1;1H
0000000b: 1b5b 3f32 356c 1b5b 4b0d 0a1b 5b4b 0d0a  .[?25l.[K...[K..
0000000c: 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b  .[K...[K...[K...
0000000d: 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b  [K...[K...[K...[
0000000e: 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b  K...[K...[K...[K
0000000f: 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d  ...[K...[K...[K.
00000010: 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a  ..[K...[K...[K..
00000011: 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b  .[K...[K...[K...
00000012: 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b  [K...[K...[K...[
00000013: 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b  K...[K...[K...[K
00000014: 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d  ...[K...[K...[K.
00000015: 0a1b 5b4b 0d0a 1b5b 4b1b 5b33 306d 1b5b  ..[K...[K.[30m.[
00000016: 3432 6d0d 0a5b 305d 2030 3a73 682a 2020  42m..[0] 0:sh*  
00000017: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000018: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000019: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000001a: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000001b: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000001c: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000001d: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000001e: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000001f: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000020: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000021: 2020 2020 2020 2020 2020 2277 2220 3136            "w" 16
00000022: 3a30 3420 3234 2d4d 6179 2d32 331b 2842  :04 24-May-23.(B
00000023: 1b5b 6d1b 5b3f 3132 6c1b 5b3f 3235 681b  .[m.[?12l.[?25h.
00000024: 5b31 3b31 481b 2842 1b5b 6d1b 5b3f 3132  [1;1H.(B.[m.[?12
00000025: 6c1b 5b3f 3235 681b 5b3f 3130 3036 6c1b  l.[?25h.[?1006l.
00000026: 5b3f 3130 3030 6c1b 5b3f 3130 3032 6c1b  [?1000l.[?1002l.
00000027: 5b3f 3130 3033 6c1b 5b3f 3230 3034 6c1b  [?1003l.[?2004l.
00000028: 5b31 3b31 481b 5b31 3b33 3472 1b5b 313b  [1;1H.[1;34r.[1;
00000029: 3148 1b5b 3f32 356c 1b5b 4b0d 0a1b 5b4b  1H.[?25l.[K...[K
0000002a: 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d  ...[K...[K...[K.
0000002b: 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a  ..[K...[K...[K..
0000002c: 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b  .[K...[K...[K...
0000002d: 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b  [K...[K...[K...[
0000002e: 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b  K...[K...[K...[K
0000002f: 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d  ...[K...[K...[K.
00000030: 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a  ..[K...[K...[K..
00000031: 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b 4b0d 0a1b  .[K...[K...[K...
00000032: 5b4b 0d0a 1b5b 4b0d 0a1b 5b4b 0d0a 1b5b  [K...[K...[K...[
00000033: 4b0d 0a1b 5b4b 0d0a 1b5b 4b1b 5b33 306d  K...[K...[K.[30m
00000034: 1b5b 3432 6d0d 0a5b 305d 2030 3a73 682a  .[42m..[0] 0:sh*
00000035: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000036: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000037: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000038: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000039: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000003a: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000003b: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000003c: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000003d: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000003e: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000003f: 2020 2020 2020 2020 2020 2020 2277 2220              "w" 
00000040: 3136 3a30 3420 3234 2d4d 6179 2d32 331b  16:04 24-May-23.
00000041: 2842 1b5b 6d1b 5b3f 3132 6c1b 5b3f 3235  (B.[m.[?12l.[?25
00000042: 681b 5b31 3b31 4824 201b 5b3f 3737 3237  h.[1;1H$ .[?7727
00000043: 680d 0a1b 5b31 3b33 3472 1b28 421b 2842  h...[1;34r.(B.(B
00000044: 1b5b 6d1b 5b3f 316c 1b3e 1b5b 481b 5b32  .[m.[?1l.>.[H.[2
00000045: 4a1b 5b3f 3132 6c1b 5b3f 3235 681b 5b3f  J.[?12l.[?25h.[?
00000046: 3130 3030 6c1b 5b3f 3130 3032 6c1b 5b3f  1000l.[?1002l.[?
00000047: 3130 3033 6c1b 5b3f 3130 3036 6c1b 5b3f  1003l.[?1006l.[?
00000048: 3130 3035 6c1b 5b3f 3737 3237 6c1b 5b3f  1005l.[?7727l.[?
00000049: 3130 3034 6c1b 5b3f 3130 3439 6c1b 5b32  1004l.[?1049l.[2
0000004a: 333b 303b 3074 5b65 7869 7465 645d 0d0a  3;0;0t[exited]..
0000004b: 2420                                     $ 

But after we receive it, we send out the following response:

00000000: 7b22 6461 7461 223a 225c 7530 3031 625b  {"data":"\u001b[
00000001: 3e30 3b32 3736 3b30 6322 7d              >0;276;0c"}

So something in the output triggered a response.

I do believe control characters at the edge of the buffer constitutes a problem as well, so we may be dealing with two separate but kind of related issues here.

@code-asher
Copy link
Member

code-asher commented May 24, 2023

Yup this matches with my findings when we fixed this on v1, the client is responding to control codes in the buffer asking for its capabilities, cursor positions, things like that. But those responses are no longer expected for one reason or another (maybe the application that asked for them has exited for example) and get interpreted as regular input.

I think the only reliable solutions are to actually render the terminal by using a full terminal emulator on the backend (ideally one that produces identical output to xterm.js; xterm-headless itself would obviously be ideal especially since it is capable of serializing its state for precisely this purpose but we would need to run Node), or use something like screen.

https://github.com/coder/v1/issues/12909#issuecomment-1226202016

@mafredri
Copy link
Member

Thanks for sharing that @code-asher, makes perfect sense! Your reply gives me the idea that we could just strip out the problematic escape sequences. There are not many that I know of that would be problematic.

Furthermore, I believe we can eliminate a slew of problems by making the ring buffer newline aware and evicting whole lines at a time (or let the reader do it, doesn’t matter).

@code-asher
Copy link
Member

code-asher commented May 24, 2023

That was actually the approach I first took but I was not sure a limited buffer would work in general; take for example a control character saying to "move down one", this will put the cursor in a different location than the first go around if there was a "move to the center of the screen" control code that has since been evicted from the buffer.

To recreate the current cursor state (as just one example) from a history of movement commands you would need the full history which means a buffer of unbounded size.

Otherwise I think it would work but I also found I was reading through the spec to make sure I got all the possible codes that could cause an issue and I suddenly realized I was starting down a rabbit hole of implementing a full-blown terminal emulator because I was not confident it could be reliably done by just a regex replacement or something like that and figured a proper parser was the way to go. 😆

https://invisible-island.net/xterm/ctlseqs/ctlseqs.html < I am not sure if this is the entire spec since it does not even mention capabilities.

@mafredri
Copy link
Member

mafredri commented May 25, 2023

I do see where your coming from @code-asher and those are valid points. It would be quite elegant to render it server side, I agree. One problem with going the screen or tmux, though, is that in v2 we can't rely on either of these programs being present. We also currently have no facility for shipping such binaries to a workspace (not to mention in a platform independent way). So far we've managed to implement all features in Go which lets us have the single agent binary.

My next question would be, how do we perform terminal updates in a performant manner with SSR? Do we perform some sort of diffing for the currently rendered screen and attempt to deliver intelligent line updates or do we serialize the entire screen all the time?

I'm not entirely convinced that the loss of control characters past the circular buffer will be a huge problem for v2 considering the buffer is currently 64 KB in size (which is pretty big). That's because, if we discard based on newlines, we remove many problematic cases of control characters and (hopefully) only a little bit at the start of the scroll-back buffer may be corrupted (if at all).

See the following example:
image

The newline resets the cursor position to start-of-row and is thus a pretty good indicator of "we're safe now".

Some further mitigations we could do:

  1. Discard the first terminal $ROWS of rendered output, if we do this client side we lose a bit of scroll-back but ensure that the output is much more likely to be clean
  2. Set a maximum rows/cols for the web terminal, this is to give us an estimate that a single page of output never exceeds N% of the ring buffer

We could optionally also enable infinite scroll-back by storing it on the filesystem, but I guess this should be a user or terraform setting.

@code-asher
Copy link
Member

code-asher commented May 25, 2023

In v1 we technically have no guarantee either, basically if you install screen in your image you get reconnections and if you do not then it degrades to no reconnection.

Yeah my concern with lost control characters is mostly just academic at this point; I have not actually experimented with reproducing. I think maybe you could see it if you launch an editor like Vim and leave it running for a while with heavy use, maybe? I think it will not be such a big problem with the regular command prompt, if at all. Trying to replay the history of everything Vim/Emacs/etc has been doing seems fraught with risk to me.

@code-asher
Copy link
Member

code-asher commented May 25, 2023

Another thing I tried was not buffering anything that printed in the altscreen (which would cover Vim, Emacs, etc) and then trying to send a SIGWINCH to every child process on reconnect to force it to redraw but unfortunately that did not work as some applications appear to ignore the signal. I also tried sending a resize but some applications will not redraw if the size is the same so I tried resizing smaller once and then back up to the expected size to force a redraw but that also did not work because some applications seem to debounce the resize. I am also not sure all apps even enter altscreen to begin with.

@code-asher
Copy link
Member

code-asher commented May 25, 2023

I can reproduce some weirdness if I launch Emacs, split the window, open a file in one and eshell in the other, then run printf 'A%.0s' {1..128000} in eshell and reload.

Actually on a related note any application that enters the altscreen starts by sending 1049h right? That could be pretty easily lost if you use the editor for a bit and then on reload xterm would never enter the altscreen? Ohhhhhh that probably explains why sometimes scrolling stops working! It must be scrolling xterm rather than sending to the app because it never entered altscreen mode.

Really any control character that sets some kind of mode will have this problem, even outside the altscreen.

The buffer also explains why when I reconnect sometimes I see flashes of old states (for example I see flashes of previous runs of Emacs as xterm "catches up" writing out the rest of the buffer).

@matifali
Copy link
Member

If we are talking about terminal then as a sperate issue. Press any key to reconnect actually doesn't work on terminal. It does reconnect on pressing Tab, Ctrl etc.

@code-asher
Copy link
Member

code-asher commented May 25, 2023

If we are adverse to relying on screen maybe something like https://github.com/hinshun/vt10x could get us most of the way there. There could be implementation differences but probably nothing severe?

@mafredri
Copy link
Member

If we are talking about terminal then as a sperate issue. Press any key to reconnect actually doesn't work on terminal. It does reconnect on pressing Tab, Ctrl etc.

@matifali #6732

@code-asher You're likely on to something there. I'm unable to reproduce any altscreen issues with vim or tmux, but that's most likely due those applications re-rendering on screen resize (which happens on reload).

Exploring vt10x seems like a solid option, although I am a bit nervous about embedding it in the agent process. The more bloat we add to the agent, the more likely it will become a point of failure (unforeseen panics, increased memory usage resulting in OOM killing, etc) which would cripple the entire workspace. I'm not too worried about it in the immediate future, but something we may have to give some thought to down the line.

If we do go down the SSR path, it'd be nice to see a plan for protocol updates/bandwidth utilization.

Another option would be for us to have special handling of altscreen (separate buffer?), but obviously the more exceptions we add, the closer we get to implementing a terminal emulator ourselves. 😄

@code-asher
Copy link
Member

I recorded a reproduction of the scroll issue with vim if you are curious what it looks like. Here I scroll a bit to show the cursor moving up/down, run a command to generate some output, and finally cut/paste it a bunch to make sure I fill up the buffer. On reload you see some flashing from old buffer contents being rendered and then finally scrolling is permanently broken (well at least until vim is restarted).

vim.mp4

I bet if we dig into the codes there are other reproductions that can occur due to evictions especially with modes of which there appear to be many.

the more exceptions we add, the closer we get to implementing a terminal emulator ourselves.

Hahaha that was my thinking exactly as I was going down the rabbit hole. Every mode we check and every code we strip out gets us a little further.

I think screen is still a good option although it means users have to make sure it exists in their image if they want reconnectable terminals. We could surface this information to the user, maybe even in the terminal itself? For example we could print a message saying "install screen if you want to be able to reconnect" or something.

If we do go down the SSR path, it'd be nice to see a plan for protocol updates/bandwidth utilization.

I am not sure how screen handles screen updates although I would (perhaps naively) assume they are optimal since it was probably used back in the day when bandwidth was an even bigger deal than it is today.

With vt10x the only way I can think of to make that work with xterm.js is to tee the input into both the tty and vt10x and then on reconnect we ask vt10x for the scrollback and send it all to the client. We would want to limit the scrollback size which I think we would not need to do with screen. On the flip side scrolling with screen would be slower since it would need to talk to the server to do that.

It is a bit weird since both vt10x and xterm.js are doing the same/similar emulating and they are only tangentially connected; hopefully they end up rendering the same way. Maybe there is something we could replace xterm.js with and hook it directly into vt10x somehow.

@matifali
Copy link
Member

matifali commented Jun 2, 2023

#7765 may be related here.

@matifali matifali added the bug label Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
s1 Bugs that break core workflows. Only humans may set this. site Area: frontend dashboard
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants