Skip to content

fix: use screen for reconnecting terminal sessions if available #8640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Aug 14, 2023

Conversation

code-asher
Copy link
Member

@code-asher code-asher commented Jul 21, 2023

This fixes #3666 and similar issues.

I copied much of this from v1 (coder/wsep#25) but one major difference is that I was asked to keep the existing behavior as a fallback so I broke out parts into an interface then moved the existing code into said interface.

I think having a fallback makes more sense than it did for v1 since we have macOS and Windows that are not using screen yet (I have not looked into whether screen can run on Windows yet and on macOS it works for me locally but in CI screen will sometimes refuse to find a session even if you screen -list to verify that exact name does in fact exist).

@code-asher code-asher force-pushed the asher/reconnection-with-screen branch 5 times, most recently from a7468cd to bdd4a6a Compare July 21, 2023 02:49
@code-asher code-asher changed the title fix: use screen for reconnecting terminal sessions fix: use screen for reconnecting terminal sessions if available Jul 21, 2023
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch 23 times, most recently from 9ae2251 to 59aceb9 Compare July 26, 2023 21:16
In some cases it might get closed but it is not a guarantee like the
comment claims it is.

While we could implement this behavior properly we already close the
connection in the caller so it seems unnecessary.
Still need to figure out a way to test both, but we would rather not add
the ability to select the backend through the API.
Since we might add to the map after it closes, tweak the delete to also
run when the pty closes.

I think this also fixes that issue where if you canceled the context you
pass into Attach it would not do anything since readConnLoop does not
use the context for anything but logging.
We want to send the error from failing to wait for the version command,
not the error from closing/killing the pty or process.
It is possible for the process to immediately exit (for example if you
run `echo hello`), then we would wait for the `version` command to
succeed but it never will because the session is already gone.

If we immediately read to the process then we can tell when it has gone
away and we can abort.
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch from b90c051 to 7a8ec2e Compare August 10, 2023 01:43
Since we used the same var all I did with my other change was use the
same logger anyway.

Now it will not have the connection ID, which is correct since the
pty is not tied to a single connection.
Copy link
Contributor

@spikecurtis spikecurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I do think it would be an improvement to consolidate the bufferedReconnectingPTY down to a single lock, as I mentioned in a comment thread, but what you have now is fine.

I moved the conn closes back to the lifecycle, too.
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch from 5f905f5 to 9b88a68 Compare August 11, 2023 00:23
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch from 83ac118 to 9fb4230 Compare August 11, 2023 01:54
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch from 9fb4230 to d4170ca Compare August 11, 2023 01:54
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch 2 times, most recently from a83cb61 to b30bc20 Compare August 11, 2023 18:43
We do not really do anything with it other than just return it, and if
we do need to do something with it we can just do it on the return from
`waitForStateOrContext`, we probably would not need to use it behind the
mutex.

Also we should check the state outside since there is technically no
guarantee an error is set on a state change (although in practice for
close/done there always is, maybe this should be enforced in some way).
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch from b30bc20 to 88a6b96 Compare August 11, 2023 18:44
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch from 802b2cc to 748cb33 Compare August 14, 2023 17:51
@code-asher code-asher force-pushed the asher/reconnection-with-screen branch from 748cb33 to 968526d Compare August 14, 2023 17:53
@code-asher code-asher merged commit b993cab into main Aug 14, 2023
@code-asher code-asher deleted the asher/reconnection-with-screen branch August 14, 2023 19:19
@github-actions github-actions bot locked and limited conversation to collaborators Aug 14, 2023
@code-asher
Copy link
Member Author

For posterity's sake, I made a mistake because I forgot waiting on the cond releases the lock; hotfix can be found here: #9094

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Junk is being inserted into the web terminal
2 participants