Skip to content

feat: add external provisioner daemon helm chart #8939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 10, 2023

Conversation

spikecurtis
Copy link
Contributor

@spikecurtis spikecurtis commented Aug 7, 2023

Adds a new helm chart for external provisioner daemons.

Refactors existing helm chart to pull common templates into a libcoder library chart. This is in the first commit, so you might find it easier to review commit-by-commit.

This PR doesn't ship the new helm chart to our repository. I'll follow up in another PR since this one is big already.

Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
@spikecurtis spikecurtis force-pushed the spike/8243-helm-ext-provisioner branch from 306fdd8 to bec7eb9 Compare August 8, 2023 12:37
@spikecurtis spikecurtis marked this pull request as ready for review August 8, 2023 12:45
@spikecurtis spikecurtis requested review from mafredri and mtojek August 8, 2023 12:45
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some QA to verify the PR.

  1. Maybe you have planned to introduce it as a follow-up, but it might be worth flushing out updated docs aligned with these changes. Alternatively, you can write down some CONTRIBUTING for development purposes.
  2. Is there an option to add some prechecks to block starting the deployment if env/properties are missing?

For instance:

kubectl create secret generic coder-provisioner-psk -n coder --from-literal=psk=aaaaaa
helm install -f values.yaml -n coder --set coder.image.tag=latest coder-provisioner .

kubectl get pods -n coder
NAME                                READY   STATUS             RESTARTS       AGE
coder-db-postgresql-0               1/1     Running            0              22m
coder-provisioner-67c786b98-8jkts   0/1     CrashLoopBackOff   7 (3m1s ago)   14m

kubectl logs coder-provisioner-67c786b98-8jkts -n coder
running command "coder provisionerd start": You are not logged in. Try logging in using 'coder login <url>'.:
    github.com/coder/coder/cli.init
        /home/runner/actions-runner/_work/coder/coder/cli/root.go:74

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity: why do we need to store .tgz in the repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't then we need to do a helm dependency build before using the chart. The prior chart was usable from a clean git checkout, so I thought we'd like to keep that property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. SGTM.

@@ -0,0 +1,13 @@
apiVersion: v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: helm/coder/ has dedicated README.md file. Is it worth adding it to libcoder and provisioner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add one to provisioner anyway when I write the docs PR. The README.md is pretty basic and mainly points to our docs.

Copy link
Member

@mtojek mtojek Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this order as long as it does not invalidate any instructions in Coder docs.

Comment on lines 24 to 26
var UpdateGoldenFiles = flag.Bool("update", false, "Update golden files")

var TestCases = []TestCase{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are these intended to be exposed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need to be; I copied from the existing coder chart.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanity check: will this docs page be still valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. The existing coder chart is refactored to pull out the libcoder stuff so I could reuse it for the provisionerd chart, but functionally, it should be the same. The only change is to add a new optional field to values.yaml to set the PSK secret.

@spikecurtis
Copy link
Contributor Author

Maybe you have planned to introduce it as a follow-up, but it might be worth flushing out updated docs aligned with these changes. Alternatively, you can write down some CONTRIBUTING for development purposes.

Yeah, good point. I know I need to write docs for the new chart, but didn't mention it in what was missing. I'll address in a follow on PR.

@spikecurtis
Copy link
Contributor Author

spikecurtis commented Aug 9, 2023

Is there an option to add some prechecks to block starting the deployment if env/properties are missing?
For instance:

kubectl create secret generic coder-provisioner-psk -n coder --from-literal=psk=aaaaaa
helm install -f values.yaml -n coder --set coder.image.tag=latest coder-provisioner .

kubectl get pods -n coder
NAME READY STATUS RESTARTS AGE
coder-db-postgresql-0 1/1 Running 0 22m
coder-provisioner-67c786b98-8jkts 0/1 CrashLoopBackOff 7 (3m1s ago) 14m

kubectl logs coder-provisioner-67c786b98-8jkts -n coder
running command "coder provisionerd start": You are not logged in. Try logging in using 'coder login '.:
github.com/coder/coder/cli.init
/home/runner/actions-runner/_work/coder/coder/cli/root.go:74

I think what's happening in your example isn't that any fields are missing --- the only required field is the image tag, which you've set. The problem is that latest is v2.0.0, which is prior to #8877 that adds the PSK. So coder provisionerd start is looking for a session token and sees that you don't have one. It doesn't understand the PSK environment variable we set.

So, that shouldn't happen once we actually ship a new version.

@matifali
Copy link
Member

matifali commented Aug 9, 2023

@spikecurtis: For testing, you can use ghcr.io/coder/coder-preview:main as we build an image for each commit to the main.

@mtojek
Copy link
Member

mtojek commented Aug 9, 2023

Thanks for the hint, @matifali!

I managed to start the provisioner:

kubectl get pods -n coder
NAME                                READY   STATUS    RESTARTS   AGE
coder-589cdbd87c-cw9fs              1/1     Running   0          21m
coder-db-postgresql-0               1/1     Running   0          3h29m
coder-provisioner-7f66b7b5c-dv2mk   1/1     Running   0          9m55s
➜  provisioner git:(spike/8243-helm-ext-provisioner) ✗ kubectl logs coder-provisioner-7f66b7b5c-dv2mk -n coder -f
2023-08-09 12:09:32.815 [info]  starting provisioner daemon  tags={}

Few suggestions:

  • I started with zero-knowledge about necessary configuration and saw a couple of error logs:
2023-08-09 11:57:43.619 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: You aren't allowed to create provisioner daemons"
2023-08-09 11:57:53.628 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: You aren't allowed to create provisioner daemons"
2023-08-09 11:58:03.634 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: You aren't allowed to create provisioner daemons"
2023-08-09 11:58:13.643 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: You aren't allowed to create provisioner daemons"
2023-08-09 11:58:23.666 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"
2023-08-09 11:58:33.678 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"
2023-08-09 11:58:43.689 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"
2023-08-09 11:58:53.697 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"
2023-08-09 11:59:03.709 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"
2023-08-09 11:59:13.713 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"
2023-08-09 11:59:23.719 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"
2023-08-09 11:59:33.731 [warn]  coderd client failed to dial  error="GET http://coder.coder.svc.cluster.local/api/v2/organizations/00000000-0000-0000-0000-000000000000/provisionerdaemons/serve?provisioner=terraform: unexpected status code 403: External provisioner daemons is an Enterprise feature. Contact sales!"

After I set the license, the log stream stopped. It might be useful for ops to leave an extra info message confirming that provisioner works correctly, and it is up and running (healthcheck in logs?).

  • Should I be able to connect the provisoner with coderd, so that new provisioner can pick up tasks? So far, I can see that "coder" deployment uses its embedded provisioner, and I'm wondering if this is in scope of this PR. I modified pskSecretName to match coder-provisioner-psk, not sure if this is required?

@spikecurtis
Copy link
Contributor Author

Should I be able to connect the provisoner with coderd, so that new provisioner can pick up tasks? So far, I can see that "coder" deployment uses its embedded provisioner, and I'm wondering if this is in scope of this PR. I modified pskSecretName to match coder-provisioner-psk, not sure if this is required?

Yes, the new provisioner should be able to pick up tasks, but it will race with the provisionerds that are in-process with Coderd to acquire jobs. You can create a lot of jobs in parallel, or you could start Coderd without in-process provisionerd by setting

    - name: CODER_PROVISIONER_DAEMONS
      value: "0"

in the coder.env value.

@spikecurtis
Copy link
Contributor Author

spikecurtis commented Aug 10, 2023

After I set the license, the log stream stopped. It might be useful for ops to leave an extra info message confirming that provisioner works correctly, and it is up and running (healthcheck in logs?).

Yeah, I agree it's useful to drop an INFO log when we successfully connect. I'll add one.

Edit: turns out there is one, but it's at DEBUG. I'll make it INFO.

Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
@spikecurtis spikecurtis requested a review from mtojek August 10, 2023 04:46
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helm chart works correctly. Nice job 👍

This could be potentially an issue:

[
   {
      "id":"6221b964-d2ed-4faa-9864-ad077d35a4de",
      "created_at":"2023-08-10T07:39:08.633353Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"priceless_shaw4",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"48bdb066-bc75-4365-b655-4d7e3bc71c7e",
      "created_at":"2023-08-10T07:49:20.504444Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"reverent_johnson2",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"3465c1ac-c6d5-4ba4-8442-834d902a8a1e",
      "created_at":"2023-08-10T07:49:30.330417Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"jolly_feistel6",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"820684dd-35bf-4bc7-bfd4-c6c565eefb15",
      "created_at":"2023-08-10T07:49:35.860361Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"beautiful_sutherland5",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"dc127705-726d-4bfb-8923-3ce40fae05fb",
      "created_at":"2023-08-10T07:51:45.435515Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"upbeat_heisenberg8",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"cdbb9d36-4e63-4e21-a6a6-9d97d582244d",
      "created_at":"2023-08-10T07:51:45.687759Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"zealous_antonelli3",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"820e9694-e512-45f5-98c5-b78a729b89f4",
      "created_at":"2023-08-10T07:51:45.74739Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"amazing_williamson9",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"7982f865-f8e3-4337-969b-789926fd3d13",
      "created_at":"2023-08-10T07:51:46.056332Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"distracted_robinson5",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"12ec2e26-ee1c-4b8a-a104-09b60ba58059",
      "created_at":"2023-08-10T07:51:46.930575Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"focused_kirch8",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"0f9b8fad-e013-4cde-8653-174f15c57bda",
      "created_at":"2023-08-10T07:53:34.58902Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"eager_goldstine7",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"5e904073-a3b3-4ed2-89e2-33cbf0cca1f4",
      "created_at":"2023-08-10T07:53:34.754871Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"cranky_bardeen3",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"b1053a9d-6b50-4631-8353-bae5d668fd01",
      "created_at":"2023-08-10T07:53:34.846211Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"vibrant_ptolemy7",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"d540e06a-8813-46db-90e8-af28afff1a49",
      "created_at":"2023-08-10T07:53:35.265534Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"vigorous_mclean2",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   },
   {
      "id":"d5f83605-203a-4fa3-ad11-a1e1154674a3",
      "created_at":"2023-08-10T07:53:35.694644Z",
      "updated_at":{
         "Time":"0001-01-01T00:00:00Z",
         "Valid":false
      },
      "name":"friendly_shirley6",
      "provisioners":[
         "terraform"
      ],
      "tags":{
         
      }
   }
]

I uninstalled & installed provisioner instances a couple of times, and they are all exposed over API. Considering the daily pattern of scale up & down, we may blow up the table really quickly. I'm wondering if we need to clean up old daemon entries.

@spikecurtis
Copy link
Contributor Author

I uninstalled & installed provisioner instances a couple of times, and they are all exposed over API. Considering the daily pattern of scale up & down, we may blow up the table really quickly. I'm wondering if we need to clean up old daemon entries.

Yeah, the provisioner daemons table in the DB is a roach motel. We only ever add to it. It's also not clear to me what the purpose of the table is. Tracking in #9015

@spikecurtis spikecurtis changed the title feat: adds external provisioner daemon helm chart feat: add external provisioner daemon helm chart Aug 10, 2023
@spikecurtis spikecurtis merged commit 21af020 into main Aug 10, 2023
@spikecurtis spikecurtis deleted the spike/8243-helm-ext-provisioner branch August 10, 2023 09:59
@github-actions github-actions bot locked and limited conversation to collaborators Aug 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants