feat: add provisioning timings to understand slow build times #14274
Conversation
Signed-off-by: Danny Kopping <danny@coder.com>
Force-pushed from 24477d6 to 28fa2f7
@@ -2787,6 +2787,14 @@ func (q *querier) InsertProvisionerJobLogs(ctx context.Context, arg database.Ins
 	return q.db.InsertProvisionerJobLogs(ctx, arg)
 }

+// TODO: We need to create a ProvisionerJob resource type
See InsertProvisionerJobLogs and InsertProvisionerJob above.
@@ -2463,6 +2463,13 @@ func (s *MethodTestSuite) TestSystemFunctions() {
 		JobID: j.ID,
 	}).Asserts( /*rbac.ResourceSystem, policy.ActionCreate*/ )
 }))
+s.Run("InsertProvisionerJobTimings", s.Subtest(func(db database.Store, check *expects) {
+	// TODO: we need to create a ProvisionerJob resource
Ditto.
pj.worker_id,
pj.error,
pj.error_code,
These are extraneous right now, but when we slap an API on top of this we'll likely need this info and can save ourselves another database call.
GREATEST(EXTRACT(EPOCH FROM (
    MAX(CASE WHEN pjt.stage = 'init'::provisioner_job_timing_stage THEN pjt.ended_at END) -
    MIN(CASE WHEN pjt.stage = 'init'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS init_secs,
GREATEST(EXTRACT(EPOCH FROM (
    MAX(CASE WHEN pjt.stage = 'plan'::provisioner_job_timing_stage THEN pjt.ended_at END) -
    MIN(CASE WHEN pjt.stage = 'plan'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS plan_secs,
GREATEST(EXTRACT(EPOCH FROM (
    MAX(CASE WHEN pjt.stage = 'graph'::provisioner_job_timing_stage THEN pjt.ended_at END) -
    MIN(CASE WHEN pjt.stage = 'graph'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS graph_secs,
GREATEST(EXTRACT(EPOCH FROM (
    MAX(CASE WHEN pjt.stage = 'apply'::provisioner_job_timing_stage THEN pjt.ended_at END) -
    MIN(CASE WHEN pjt.stage = 'apply'::provisioner_job_timing_stage THEN pjt.started_at END))), 0) AS apply_secs
Use the max(ended_at)-min(started_at) of each stage to determine stage timings.
@@ -539,46 +556,42 @@ func readAndLog(sink logSink, r io.Reader, done chan<- any, level proto.LogLevel
 // provisionLogWriter creates a WriteCloser that will log each JSON formatted terraform log. The WriteCloser must be
 // closed by the caller to end logging, after which the returned channel will be closed to indicate that logging of the
 // written data has finished. Failure to close the WriteCloser will leak a goroutine.
-func provisionLogWriter(sink logSink) (io.WriteCloser, <-chan any) {
+func (e *executor) provisionLogWriter(sink logSink) (io.WriteCloser, <-chan any) {
Moved these under the executor receiver to get access to the logger.
@@ -609,12 +676,25 @@ func convertTerraformLogLevel(logLevel string, sink logSink) proto.LogLevel {
 }

 type terraformProvisionLog struct {
-	Level   string `json:"@level"`
-	Message string `json:"@message"`
+	Level     string `json:"@level"`
I expanded this type to unmarshal additional fields from the existing JSON logs.
Nice work, some minor nits and comments but looks good in general 👍🏻
-// protoc-gen-go v1.30.0
-// protoc v4.23.3
+// protoc-gen-go v1.34.2
+// protoc v4.25.3
was this intentional?
Nope! Good catch.
Actually, we don't lock these versions.
In my dogfood workspace running Nix, I've got:
$ protoc-gen-go --version
protoc-gen-go v1.34.2
$ protoc --version
libprotoc 25.3
$ which protoc protoc-gen-go
/nix/var/nix/profiles/default/bin/protoc
/nix/var/nix/profiles/default/bin/protoc-gen-go
I think CI uses the 'regular' dogfood image as opposed to 'nix'. This probably isn't a huge issue tbh but just wanted to double-check.
I would beg to differ.
If our dev environments' and our CI environment's dependencies are not in lock-step, this could lead not only to CI frustrations like the one above but to something more pernicious. We're either getting false-positives or false-negatives: any alignment is temporary/incidental, and we have no source of truth.
Added #14343 as a follow-up.
Force-pushed from b5218be to 597ec85
I don't have any blocking changes here, but it would be good to verify the situation with external provisioners on upgrade.
This PR productionizes a spike I conducted during coder/internal#22.
Terraform exposes discrete timing information for each resource when planning and applying, and we're now using this to build up a view of which resources or stages take the most time. I'm hooking into our existing parsing of the Terraform log lines and expanding it a bit; there were some necessary refactors but I've tried to keep it as low-touch as possible.
This PR does not contain any HTTP APIs or UIs. Those will have to be added in subsequent PRs; this one is already fairly large.
There are tests, and they cover the majority of the new code added, but they're incomplete. I had to do some fairly nasty business in provisioner/terraform/timings_test.go, since we don't have a seam in front of calls to os/exec, which makes them difficult to mock. There's some precedent for this in provisioner/terraform/provision_test.go, from which I took heavy inspiration.

We also don't have any active e2e tests which use Terraform, which I think is something worth addressing.
I also included txtar after recently watching Russ Cox's excellent talk on Go testing, and it made the tests quite readable - at least I think so. If you disagree, let me know.

Using this information, we'll be able to add cool graphs like this to workspace detail pages, and probably also expose some aggregated data in Prometheus metrics.