Skip to content

refactor: change template archive extraction to be on provisioner #9264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 25, 2023

Conversation

spikecurtis
Copy link
Contributor

fixes #9165

Refactors provisionersdk protocol to transmit template archive and has the provisioner extract the archive, not the daemon.

The new protocol introduces a "Session" concept so that we only transmit the archive once, and then use it for Parse/Plan (template import) and Plan/Apply (workspace build) operations, which are the mainline use of the provisioner.

Easiest to review commit by commit, as the protocol refactor touches a lot of files.

Note that the provisionersdk protocol is entirely within the same process, so there are no back compatibility concerns about protocol changes.

Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a comment on the PR size

ProvisionApply: []*proto.Provision_Response{{
Type: &proto.Provision_Response_Complete{
Complete: &proto.Provision_Complete{
ProvisionApply: []*proto.Response{{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that this PR does a lot of renaming. Is it possible to extract them to a separate PR? Maybe it could shrink this PR a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really possible to put in a separate PR, since the protocol changes are fundamental, and without renames nothing builds.

I've split it out into different commits to reduce the noise

Copy link
Member

@mtojek mtojek Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit that I focus mainly on changes in the CLI package like:

  • echo.ProvisionComplete -> echo.ApplyComplete
  • proto.Provision_Response -> proto.Response
  • completeWithAgent()

but if these are strongly coupled with the protocol, then please ignore the comment.

@spikecurtis spikecurtis changed the title refactor: changes template archive extraction to be on provisioner refactor: change template archive extraction to be on provisioner Aug 23, 2023
if err != nil {
return xerrors.Errorf("mkdir %q: %w", headerPath, err)
}
s.Logger.Debug(context.Background(), "extracted directory", slog.F("path", headerPath))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: include mode in msg

@@ -1304,7 +1303,10 @@ func newProvisionerDaemon(
defer wg.Done()
defer cancel()

err := echo.Serve(ctx, afero.NewOsFs(), &provisionersdk.ServeOptions{Listener: echoServer})
err := echo.Serve(ctx, &provisionersdk.ServeOptions{
Listener: echoServer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing Logger here?

Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first review round is done.

stream: stream,
server: p.server,
}
sessDir := fmt.Sprintf("Session%p", s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why %p? isn't it better to link it with (PID+something/id)? if the process dies, it would be easier to find it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need something unique to the Session, for the lifetime of the Session. The pointer is a simple and easy choice.

Each provisioner needs to be run with a unique work directory so that they don't trample on each others caches, so there is no need to qualify the pointer with a PID.

The PID might make things easier to find... if you know the PID, which is rarely the case after you have a crash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point in favour of PID: the PID will at least show up in dmesg in case of an OOMKill.

}
}

service Provisioner {
rpc Parse(Parse.Request) returns (stream Parse.Response);
rpc Provision(stream Provision.Request) returns (stream Provision.Response);
// Session represents provisioning a single template import or workspace. The daemon always sends Config followed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double-check: do we need to update our Coder docs? If not, then it might be nice to visualize these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we need to update any docs --- this interface is entirely within the provisioner daemon for now.

One thing it does affect is the format of messages on the Echo provisioner, which technically you could install in a real system. But, its only for testing and we don't publicly document it today (nor do I think we should!).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was thinking about updating this page accordingly, but it can be performed in a followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this PR doesn't change the architecture, but it is true that a subsequent PR will allow the provisioner to be separate from the provisioner daemon, which would be an architecture change.

}
sessDir := fmt.Sprintf("Session%p", s)
s.WorkDirectory = filepath.Join(p.opts.WorkDirectory, sessDir)
err := os.MkdirAll(s.WorkDirectory, 0o700)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking loud: what are the chances that s.WorkDirectory exists, and contains a malicious/old/buggy .tf file, which will be included in the terraform provisioning process?

I'm asking as os.MkdirAll doesn't complain that the directory exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is some chance of this from a restarted process, or failed cleanup. I'll add a call to os.RemoveAll() before.

Signed-off-by: Spike Curtis <spike@coder.com>
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

I will leave the decision about session ID (UUID vs. timestamp vs. pointer) to you as this seems to be rather a lower risk.

Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
@matifali
Copy link
Member

Could this also solve #9246 @mafredri?

@spikecurtis
Copy link
Contributor Author

Could this also solve #9246 @mafredri?

I don't think so. That's an issue with the cache / downloading terraform providers. (This PR is about Coder provisioners).

Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@spikecurtis spikecurtis enabled auto-merge (squash) August 25, 2023 06:03
@spikecurtis spikecurtis merged commit 60d5002 into main Aug 25, 2023
@spikecurtis spikecurtis deleted the spike/9165-provisioner-filesystem branch August 25, 2023 06:10
@github-actions github-actions bot locked and limited conversation to collaborators Aug 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor: Provisioner handles unpacking template to file system, not daemon
4 participants