Skip to content

feat: add provisioning timings to understand slow build times #14274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Aug 21, 2024

Conversation

dannykopping
Copy link
Contributor

@dannykopping dannykopping commented Aug 15, 2024

This PR productionizes a spike I conducted during coder/internal#22.

Terraform exposes discrete timing information for each resource when planning and applying, and we're now using this to build up a view of which resources or stages take the most time. I'm hooking into our existing parsing of the Terraform log lines and expanding it a bit; there were some necessary refactors but I've tried to keep it as low-touch as possible.

This PR does not contain any HTTP APIs or UIs. Those will have to be added in subsequent PRs; this one is already fairly large.

There are tests, and they cover the majority of the new code added, but they're incomplete. I had to do some fairly nasty business in provisioner/terraform/timings_test.go since we don't have a seam in front of calls to os/exec which makes it difficult to mock. There's some precedent for this though in provisioner/terraform/provision_test.go from which I took heavy inspiration.

We also don't have any active e2e tests which use terraform, which I think is something worth addressing.

I also included txtar after watching Russ Cox's excellent talk on Go testing recently, and it made the tests quite readable - at least I think so. If you disagree, let me know.


Using this information, we'll be able to add cool graphs like this to workspace detail pages and probably also expose some aggregated data in Prometheus metrics.

image

Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
…me resource are captured

Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Copy link

alwaysmeticulous bot commented Aug 15, 2024

✅ Meticulous spotted zero visual differences across 1408 screens tested: view results.

Meticulous tested none of the executable lines edited in this PR. If this PR does not modify your frontend this is expected. If this is a new feature or Meticulous was recently introduced then Meticulous will generate tests to cover your code in the coming days1.

1. If you wish to increase coverage immediately you can do so by interacting with your feature on localhost.

Expected differences? Click here. Last updated for commit eb5ec5c. This comment will update as new commits are pushed.

Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
@dannykopping dannykopping force-pushed the dk/provision-detailed-apply branch from 24477d6 to 28fa2f7 Compare August 15, 2024 13:36
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
@@ -2787,6 +2787,14 @@ func (q *querier) InsertProvisionerJobLogs(ctx context.Context, arg database.Ins
return q.db.InsertProvisionerJobLogs(ctx, arg)
}

// TODO: We need to create a ProvisionerJob resource type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See InsertProvisionerJobLogs and InsertProvisionerJob above

@@ -2463,6 +2463,13 @@ func (s *MethodTestSuite) TestSystemFunctions() {
JobID: j.ID,
}).Asserts( /*rbac.ResourceSystem, policy.ActionCreate*/ )
}))
s.Run("InsertProvisionerJobTimings", s.Subtest(func(db database.Store, check *expects) {
// TODO: we need to create a ProvisionerJob resource
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Comment on lines +23 to +25
pj.worker_id,
pj.error,
pj.error_code,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are extraneous right now, but when we slap an API on top of this we'll likely need this info and can save ourselves another database call.

Comment on lines +30 to +41
GREATEST(EXTRACT(EPOCH FROM (
MAX(CASE WHEN pjt.stage = 'init'::provisioner_job_timing_stage THEN pjt.ended_at END) -
MIN(CASE WHEN pjt.stage = 'init'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS init_secs,
GREATEST(EXTRACT(EPOCH FROM (
MAX(CASE WHEN pjt.stage = 'plan'::provisioner_job_timing_stage THEN pjt.ended_at END) -
MIN(CASE WHEN pjt.stage = 'plan'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS plan_secs,
GREATEST(EXTRACT(EPOCH FROM (
MAX(CASE WHEN pjt.stage = 'graph'::provisioner_job_timing_stage THEN pjt.ended_at END) -
MIN(CASE WHEN pjt.stage = 'graph'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS graph_secs,
GREATEST(EXTRACT(EPOCH FROM (
MAX(CASE WHEN pjt.stage = 'apply'::provisioner_job_timing_stage THEN pjt.ended_at END) -
MIN(CASE WHEN pjt.stage = 'apply'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS apply_secs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the max(ended_at)-min(started_at) of each stage to determine stage timings.

@@ -539,46 +556,42 @@ func readAndLog(sink logSink, r io.Reader, done chan<- any, level proto.LogLevel
// provisionLogWriter creates a WriteCloser that will log each JSON formatted terraform log. The WriteCloser must be
// closed by the caller to end logging, after which the returned channel will be closed to indicate that logging of the
// written data has finished. Failure to close the WriteCloser will leak a goroutine.
func provisionLogWriter(sink logSink) (io.WriteCloser, <-chan any) {
func (e *executor) provisionLogWriter(sink logSink) (io.WriteCloser, <-chan any) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these under the executor receiver to get access to the logger.

@@ -609,12 +676,25 @@ func convertTerraformLogLevel(logLevel string, sink logSink) proto.LogLevel {
}

type terraformProvisionLog struct {
Level string `json:"@level"`
Message string `json:"@message"`
Level string `json:"@level"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded this type to unmarshal additional fields from the existing JSON logs.

Signed-off-by: Danny Kopping <danny@coder.com>
@dannykopping dannykopping changed the title Dk/provision detailed apply feat: add provisioning timings to understand slow build times Aug 15, 2024
Signed-off-by: Danny Kopping <danny@coder.com>
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, some minor nits and comments but looks good in general 👍🏻

Comment on lines 3 to 4
// protoc-gen-go v1.30.0
// protoc v4.23.3
// protoc-gen-go v1.34.2
// protoc v4.25.3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! Good catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we don't lock these versions.
In my dogfood workspace running nix, i've got:

$ protoc-gen-go --version
protoc-gen-go v1.34.2
$ protoc --version
libprotoc 25.3

$ which protoc protoc-gen-go
/nix/var/nix/profiles/default/bin/protoc
/nix/var/nix/profiles/default/bin/protoc-gen-go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CI uses the 'regular' dogfood image as opposed to 'nix'. This probably isn't a huge issue tbh but just wanted to double-check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would beg to differ.

If our dev environments' and our CI environment's dependencies are not in lock-step, this could lead not only to CI frustrations like the one above but to something more pernicious. We're either getting false-positives or false-negatives: any alignment is temporary/incidental, and we have no source of truth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #14343 as a follow-up.

@dannykopping dannykopping marked this pull request as ready for review August 15, 2024 18:22
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
@dannykopping dannykopping force-pushed the dk/provision-detailed-apply branch from b5218be to 597ec85 Compare August 19, 2024 10:24
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any blocking changes here, but it would be good to verify the situation with external provisioners on upgrade.

@dannykopping dannykopping merged commit 6960d19 into main Aug 21, 2024
34 checks passed
@dannykopping dannykopping deleted the dk/provision-detailed-apply branch August 21, 2024 12:19
@github-actions github-actions bot locked and limited conversation to collaborators Aug 21, 2024
@matifali matifali added the roadmap https://coder.com/roadmap. Only humans may set this. label Oct 22, 2024
@coder coder unlocked this conversation Nov 14, 2024
@coder coder locked as resolved and limited conversation to collaborators Nov 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
roadmap https://coder.com/roadmap. Only humans may set this.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants