feat: Add support for update checks and notifications #4810

mafredri · 2022-10-31T15:46:56Z

This PR adds support for update checking to coder server.

Fixes #2701

TODO for site:

Only show message to owners
~~Refresh update check status at X interval?~~
Links / HTML styling in ~~GlobalSnackbar~~ AlertBanner or redirect on click
- ~~Or alternative display method?~~
Require manual dismissal of notification
Add tests

Description

The server implementation:

Periodically checks GitHub API for new release (24h)
Data is stored in DB (site_config) so that we don't spam GitHub API on every startup (this could otherwise cause our customers IP to be rate-limited in a crash-loop scenario)
This data is available via /api/v2/updatecheck
Logs a message when a new version is available, or on startup on outdated installations

Server log example:

2022-10-31 15:35:09.015 [INFO]	(coderd)	<./cli/server.go:374>	Server.func1.3	new version of coder available	{"new_version": "v0.11.0", "url": "https://github.com/coder/coder/releases/tag/v0.11.0", "upgrade_instructions": "https://coder.com/docs/coder-oss/latest/admin/upgrade"}
2022-10-31 15:35:09.018 [INFO]	(coderd.update_checker)	<./coderd/updatecheck/updatecheck.go:157>	(*Checker).start	time until next update check	{"duration": "22h24m56.543305984s"}

The notification in the Web UI looks like this:

~~An alternative display method may be required, or extending GlobalSnackbar features (no timeout, HTML content, etc).~~

mafredri · 2022-10-31T15:48:34Z

Adding @coder/backend as reviewer since server portion of this PR is complete (albeit this being a draft PR).

Frontend requires more work.

bpmct · 2022-10-31T16:03:33Z

This is sweet! Some questions:

~~What user role(s) does this alert show up for in the dashboard?~~ I see the note about owners
I noticed "this could otherwise cause our customers IP to be rate-limited in a crash-loop scenario." If the user is running Coder offline or has blocked network access to GitHub, will this cause the server to crash?
Does this show up for every release (major & minor)?
~~Is there a way to disable this functionality?~~ I see the config flag

bpmct · 2022-10-31T16:09:17Z

What do you think about putting this on the Deployment page as well? I know we display the version in the footer, but perhaps we could display it larger as a row in the deployment page + a callout to update?

mafredri · 2022-10-31T16:15:23Z

I noticed "this could otherwise cause our customers IP to be rate-limited in a crash-loop scenario." If the user is running Coder offline or has blocked network access to GitHub, will this cause the server to crash?

If the feature is enabled whilst offline, the user will simply see an error message in the server logs once every 24h. There will be no notification in the Web UI.

Does this show up for every release (major & minor)?

Currently yes, I'm open to tweaking it but I think it should be fine for both.

mafredri · 2022-10-31T16:16:31Z

What do you think about putting this on the Deployment page as well? I know we display the version in the footer, but perhaps we could display it larger as a row in the deployment page + a callout to update?

That's a pretty good idea IMO. 👍🏻

site/src/xServices/StateContext.tsx

site/src/xServices/updateCheck/updateCheckXService.ts

Kira-Pilot · 2022-10-31T19:04:25Z

A couple of thoughts:

TODO for site: Only show message to owners

We don't currently expose the concept of roles on the FE. What we probably want to do is add another permission to see this banner and then check that permission in our updateCheckXService. Then that service could expose a boolean like shouldDisplayUpdateBanner or whatever that toggles the visibility update banner component view.

TODO for site: Refresh update check status at X interval?

Regarding polling this endpoint: are we sure we want to poll? What if the user dismisses the notification - do we want it showing up some interval later? Would it suffice to have a notification show up on page refresh?
If we do want to poll, I wonder if down the road, we want to add an event listener to accomplish this.
In the meantime, if we want polling but not an event listener, we can use XState to accomplish this.

site/src/components/UpdateCheckBanner/UpdateCheckBanner.tsx

mafredri · 2022-11-30T20:36:17Z

site/src/components/AuthAndFrame/AuthAndFrame.tsx

+    } else {
+      updateCheckSend("CLEAR")
+    }
+  }, [authState, updateCheckSend])


review(site): I'm not super happy with how this turned out. I kind of wanted to represent this in the updateCheckXService, but I couldn't figure out how to do inter-machine dependencies.

This snippet here enables e.g. the following scenario:

User is loggedin as non-owner

User logs out

User logs in as owner

Update notification is shown

Yeah, inter-machine dependencies are tricky; I'm not sure if there's a better way. @presleyp do you have any workarounds? I think you looked at something like this recently with pagination.

mafredri · 2022-11-30T20:37:07Z

site/src/components/AuthAndFrame/AuthAndFrame.tsx

+              <UpdateCheckBanner
+                updateCheck={updateCheckState.context.updateCheck}
+                error={updateCheckState.context.error}
+                onDismiss={() => updateCheckSend("DISMISS")}


review(site): Dismissal as any user (practically only an owner) will prevent notification from showing up until browser window is reloaded, even if logged in/out in-between. This seems like acceptable behavior (managed via XState service).

mafredri · 2022-11-30T20:44:50Z

coderd/coderdtest/authorize.go

@@ -48,6 +48,7 @@ func AGPLRoutes(a *AuthTester) (map[string]string, map[string]RouteCheck) {
 		"GET:/healthz":                  {NoAuthorize: true},
 		"GET:/api/v2":                   {NoAuthorize: true},
 		"GET:/api/v2/buildinfo":         {NoAuthorize: true},
+		"GET:/api/v2/updatecheck":       {NoAuthorize: true},


review(api): Technically, we restrict access in the WebUI so that only owners can view this information. It's not sensitive information by any means, but it's somewhat pointless to show to users. That's why we do not require authorization here (but I can change it if this behavior seems too weird).

Unless the "updatecheck" can trigger a real update and potentially crash the deployment, I guess that it's safe to expose it to everyone. On the other hand, if there is a known security issue, all users would be informed that there is an update to install, so the system is currently vulnerable.

The updatecheck is read-only, so it won't do anything scandalous. If we ever add a POST endpoint, that should obviously be protected. You raised a good point about vulnerabilities. Since Coder is an open source product, this information would be available to anyone checking the GitHub releases too. More than protecting this endpoint, we should consider how we convey this information through our releases.

deansheather

backend ✅

coderd/coderd.go

coderd/updatecheck/updatecheck.go

deansheather · 2022-11-30T20:45:18Z

coderd/updatecheck/updatecheck.go

+)
+
+// Checker is responsible for periodically checking for updates.
+type Checker struct {


Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?

Do you have a suggestion for the how to do the lock or do we have a previous implementation of it? I'm not sure how to implement it with sqlc.

Essentially I'm imagining, start transaction => lock table in row exclusive mode => select row => (if it's time to update, fetch => update) => commit. But alas, sqlc.. 😄

Postgres advisory locks are very easy to use, you just select the lock and it'll automatically get released when either the transaction ends or the connection ends.

SELECT pg_advisory_lock("some_name"); -- returns an error if already locked (I think) -- or SELECT pg_try_advisory_lock("some_name"); -- returns true if successfully locked SELECT pg_advisory_unlock("some_name");

Here's an example from v1: https://github.com/coder/v1/blob/7259332777bae82e5bef3fa2afacb63f5606306d/product/coder/pkg/dlock/pglock/pglock.go

use pg_advisory_xact_lock for transactions for it to be auto-released if the transaction ends***

I haven't used those before. Good suggestion, it looks promising but I don't see us having any precedent for using these. I guess there's always a first time 😄.

They work well for our use in v1.

Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?

I might be blind here, but what exactly are the penalties? It doesn't seem to be frequent db access.

Seems like a waste to do one update check for each replica if only one result will ever be stored and it's very easy to wrap in a transaction with a lock

I'll defer this to a follow-up PR, this is a convenience fix more than anything. Ultimately I'd like for us to have a plan for how to deal with replicas, we can obviously have it be the wild west where the quickest draw wins, but assigning/limiting responsibilities via primary/secondary seems logical IMO.

deansheather · 2022-11-30T20:47:02Z

coderd/updatecheck/updatecheck.go

+	if r.Checked.IsZero() || time.Since(r.Checked) > c.opts.Interval {
+		r, err = c.update()
+		if err != nil {
+			return Result{}, xerrors.Errorf("update check failed: %w", err)
+		}
+	}


I think we should avoid doing an update check immediately on startup, this could be delayed by 10 minutes or something to avoid:

every replica doing database queries immediately on startup to try to do an update check,

any potential panics in this code causing coder to instantly crash on start

We only do the update check if it's been >24h since the last one. And unless all replicas start up at the same time, this shouldn't be a problem, especially if we add the table locking.

Regarding the panics, ultimately we mustn't have any, but if we did, I would say it's better the sooner it happens. Wouldn't want it to happen while Coder is in the middle of provisioning workspaces, etc.

every replica doing database queries immediately on startup to try to do an update check,

And unless all replicas start up at the same time, this shouldn't be a problem

It's a common pattern to define a startup/access delay for workers that hit the same resource, usually 0-60 sec. Maybe it can help also here?

I'll think about this for a follow up PR, I don't imagine a single database query will be problematic at startup, though. If we want to spread out service starts, it would be good to have a more structured plan than doing it individually and uniquely for each service.

site/src/components/AuthAndFrame/AuthAndFrame.tsx

mtojek · 2022-11-30T21:52:22Z

coderd/updatecheck/updatecheck.go

+	if r.Checked.IsZero() || time.Since(r.Checked) > c.opts.Interval {
+		r, err = c.update()
+		if err != nil {
+			return Result{}, xerrors.Errorf("update check failed: %w", err)
+		}
+	}


every replica doing database queries immediately on startup to try to do an update check,

And unless all replicas start up at the same time, this shouldn't be a problem

It's a common pattern to define a startup/access delay for workers that hit the same resource, usually 0-60 sec. Maybe it can help also here?

mtojek · 2022-11-30T21:53:35Z

coderd/updatecheck/updatecheck.go

+)
+
+// Checker is responsible for periodically checking for updates.
+type Checker struct {


Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?

I might be blind here, but what exactly are the penalties? It doesn't seem to be frequent db access.

mtojek · 2022-11-30T21:55:07Z

coderd/coderdtest/authorize.go

@@ -48,6 +48,7 @@ func AGPLRoutes(a *AuthTester) (map[string]string, map[string]RouteCheck) {
 		"GET:/healthz":                  {NoAuthorize: true},
 		"GET:/api/v2":                   {NoAuthorize: true},
 		"GET:/api/v2/buildinfo":         {NoAuthorize: true},
+		"GET:/api/v2/updatecheck":       {NoAuthorize: true},


Unless the "updatecheck" can trigger a real update and potentially crash the deployment, I guess that it's safe to expose it to everyone. On the other hand, if there is a known security issue, all users would be informed that there is an update to install, so the system is currently vulnerable.

cli/testdata/coder_server_--help.golden

site/src/components/UpdateCheckBanner/UpdateCheckBanner.tsx

Kira-Pilot · 2022-11-30T22:03:14Z

site/src/components/UpdateCheckBanner/UpdateCheckBanner.tsx

+          dismissible
+        />
+      )}
+    </>


Just curious why we throw an error if the check fails. Perhaps it would be better if this failed silently (we could throw something in the terminal if we wanted)? Maybe not, just a thought after seeing the error on local:

I thought about it and I worried we wouldn't notice if it stops working for whatever reason. I can make it silent or add a prefix (Update check failed: Route not found.). Ideally we would keep it silent and log it to Sentry or a similar bug tracker.

Yeah, maybe we could just add a console.error since we don't have Sentry.

I refactored this now, it looks like this: https://www.chromatic.com/test?appId=624de63c6aacee003aa84340&id=6388bfa22e0a55f2ff0c1f1e

I'm still game to remove it if that's preferred. 👍🏻

@mafredri nope looks good!

Kira-Pilot · 2022-11-30T22:09:08Z

site/src/testHelpers/entities.ts

+  current: true,
+  url: "file:///mock-url",
+  version: "v99.999.9999+c9cdf14",
+}


Kira-Pilot · 2022-11-30T22:26:38Z

site/src/xServices/updateCheck/updateCheckXService.ts

+    },
+    action: "read",
+  },
+} as const


I'm not sure why we need to export these constants or why we need to cast as const although I see this pattern in our other machines. @presley (or @mafredri) is there something dumb I'm missing?

Same here, I thought it was weird but I kept them just in case. I'll remove them and see if something breaks 👍🏻

Kira-Pilot

Nicely done! Code looks great so approving. I did have some trouble running locally, even after blowing away the db and running make gen. What am I missing?

Co-authored-by: Kira Pilot <kira@coder.com>

mafredri · 2022-12-01T13:06:15Z

Nicely done! Code looks great so approving. I did have some trouble running locally, even after blowing away the db and running make gen. What am I missing?

Thanks! Could you share what issue/error you ran into? I'm not seeing any errors in either the API or site and I've tried clearing my whole repo with git clean -xdf (WARNING: this command will truly delete everything unrelevant from the project dir, e.g. .envrc files, node_modules, generated XService files, etc).

The only thing I did notice is that scripts/develop.sh has started printing this:

> Start a 30-day trial of Enterprise? (yes/no)

So you kinda need to type (blindly) yes[Enter] into the terminal (I'll take a look if I can fix that in another PR.) (Done: #5231)

Kira-Pilot · 2022-12-01T15:15:34Z

Thanks! Could you share what issue/error you ran into?

@mafredri I took another look this morning and everything loaded fine! I struggled last night before I lost power so maybe my connection was unstable or something.

Everything looks great!

mafredri requested review from a team and Kira-Pilot October 31, 2022 15:48

mafredri self-assigned this Oct 31, 2022

mafredri force-pushed the mafredri/outdated-coder-installation-notice branch from 94accc8 to e327dd6 Compare October 31, 2022 16:37

Kira-Pilot reviewed Oct 31, 2022

View reviewed changes

site/src/xServices/StateContext.tsx Show resolved Hide resolved

Kira-Pilot reviewed Oct 31, 2022

View reviewed changes

site/src/xServices/updateCheck/updateCheckXService.ts Outdated Show resolved Hide resolved

Kira-Pilot reviewed Oct 31, 2022

View reviewed changes

site/src/xServices/updateCheck/updateCheckXService.ts Show resolved Hide resolved

Kira-Pilot reviewed Oct 31, 2022

View reviewed changes

site/src/xServices/updateCheck/updateCheckXService.ts Outdated Show resolved Hide resolved

github-actions bot added the stale This issue is like stale bread. label Nov 8, 2022

github-actions bot closed this Nov 12, 2022

mafredri reopened this Nov 12, 2022

mafredri removed the stale This issue is like stale bread. label Nov 12, 2022

github-actions bot added the stale This issue is like stale bread. label Nov 20, 2022

mafredri removed the stale This issue is like stale bread. label Nov 21, 2022

mafredri force-pushed the mafredri/outdated-coder-installation-notice branch 2 times, most recently from e1373c7 to ee967ae Compare November 29, 2022 11:44

Kira-Pilot reviewed Nov 29, 2022

View reviewed changes

site/src/components/UpdateCheckBanner/UpdateCheckBanner.tsx Outdated Show resolved Hide resolved

mafredri added 7 commits November 29, 2022 17:33

feat: Add support for checking for updates

cc28c1b

wip(site): Add support for informing of updates

c163af8

Add /api/v2/updatecheck to noauthorize list

e9ed0b7

Fix edge case where update check has never succeeded

64ad0f5

Fix logging

0238d5a

Use AlertBanner for coder version notice

9e1ea9e

Update golden files

979ff80

Add authorization, dismissal and fetch after login

81d11dd

mafredri force-pushed the mafredri/outdated-coder-installation-notice branch from 4e4d9b0 to 81d11dd Compare November 30, 2022 20:19

mafredri commented Nov 30, 2022

View reviewed changes

mafredri marked this pull request as ready for review November 30, 2022 20:37

mafredri requested a review from a team as a code owner November 30, 2022 20:37

mafredri requested review from Kira-Pilot and a team November 30, 2022 20:38

mafredri commented Nov 30, 2022

View reviewed changes

coder deleted a comment from github-actions bot Nov 30, 2022

deansheather approved these changes Nov 30, 2022

View reviewed changes