Skip to content

feat: Add support for update checks and notifications #4810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Dec 1, 2022

Conversation

mafredri
Copy link
Member

@mafredri mafredri commented Oct 31, 2022

This PR adds support for update checking to coder server.

Fixes #2701

TODO for site:

  • Only show message to owners
  • Refresh update check status at X interval?
  • Links / HTML styling in GlobalSnackbar AlertBanner or redirect on click
    • Or alternative display method?
  • Require manual dismissal of notification
  • Add tests

Description

The server implementation:

  • Periodically checks GitHub API for new release (24h)
  • Data is stored in DB (site_config) so that we don't spam GitHub API on every startup (this could otherwise cause our customers IP to be rate-limited in a crash-loop scenario)
  • This data is available via /api/v2/updatecheck
  • Logs a message when a new version is available, or on startup on outdated installations

Server log example:

2022-10-31 15:35:09.015 [INFO]	(coderd)	<./cli/server.go:374>	Server.func1.3	new version of coder available	{"new_version": "v0.11.0", "url": "https://github.com/coder/coder/releases/tag/v0.11.0", "upgrade_instructions": "https://coder.com/docs/coder-oss/latest/admin/upgrade"}
2022-10-31 15:35:09.018 [INFO]	(coderd.update_checker)	<./coderd/updatecheck/updatecheck.go:157>	(*Checker).start	time until next update check	{"duration": "22h24m56.543305984s"}

The notification in the Web UI looks like this:

An alternative display method may be required, or extending GlobalSnackbar features (no timeout, HTML content, etc).

@mafredri
Copy link
Member Author

Adding @coder/backend as reviewer since server portion of this PR is complete (albeit this being a draft PR).

Frontend requires more work.

@mafredri mafredri requested review from a team and Kira-Pilot October 31, 2022 15:48
@mafredri mafredri self-assigned this Oct 31, 2022
@bpmct
Copy link
Member

bpmct commented Oct 31, 2022

This is sweet! Some questions:

  • What user role(s) does this alert show up for in the dashboard? I see the note about owners
  • I noticed "this could otherwise cause our customers IP to be rate-limited in a crash-loop scenario." If the user is running Coder offline or has blocked network access to GitHub, will this cause the server to crash?
  • Does this show up for every release (major & minor)?
  • Is there a way to disable this functionality? I see the config flag

@bpmct
Copy link
Member

bpmct commented Oct 31, 2022

What do you think about putting this on the Deployment page as well? I know we display the version in the footer, but perhaps we could display it larger as a row in the deployment page + a callout to update?

@mafredri
Copy link
Member Author

  • I noticed "this could otherwise cause our customers IP to be rate-limited in a crash-loop scenario." If the user is running Coder offline or has blocked network access to GitHub, will this cause the server to crash?

If the feature is enabled whilst offline, the user will simply see an error message in the server logs once every 24h. There will be no notification in the Web UI.

  • Does this show up for every release (major & minor)?

Currently yes, I'm open to tweaking it but I think it should be fine for both.

@mafredri
Copy link
Member Author

What do you think about putting this on the Deployment page as well? I know we display the version in the footer, but perhaps we could display it larger as a row in the deployment page + a callout to update?

That's a pretty good idea IMO. 👍🏻

@mafredri mafredri force-pushed the mafredri/outdated-coder-installation-notice branch from 94accc8 to e327dd6 Compare October 31, 2022 16:37
@Kira-Pilot
Copy link
Member

A couple of thoughts:

TODO for site: Only show message to owners

We don't currently expose the concept of roles on the FE. What we probably want to do is add another permission to see this banner and then check that permission in our updateCheckXService. Then that service could expose a boolean like shouldDisplayUpdateBanner or whatever that toggles the visibility update banner component view.

TODO for site: Refresh update check status at X interval?

Regarding polling this endpoint: are we sure we want to poll? What if the user dismisses the notification - do we want it showing up some interval later? Would it suffice to have a notification show up on page refresh?
If we do want to poll, I wonder if down the road, we want to add an event listener to accomplish this.
In the meantime, if we want polling but not an event listener, we can use XState to accomplish this.

@github-actions github-actions bot added the stale This issue is like stale bread. label Nov 8, 2022
@github-actions github-actions bot closed this Nov 12, 2022
@mafredri mafredri reopened this Nov 12, 2022
@mafredri mafredri removed the stale This issue is like stale bread. label Nov 12, 2022
@github-actions github-actions bot added the stale This issue is like stale bread. label Nov 20, 2022
@mafredri mafredri removed the stale This issue is like stale bread. label Nov 21, 2022
@mafredri mafredri force-pushed the mafredri/outdated-coder-installation-notice branch 2 times, most recently from e1373c7 to ee967ae Compare November 29, 2022 11:44
@mafredri mafredri force-pushed the mafredri/outdated-coder-installation-notice branch from 4e4d9b0 to 81d11dd Compare November 30, 2022 20:19
} else {
updateCheckSend("CLEAR")
}
}, [authState, updateCheckSend])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review(site): I'm not super happy with how this turned out. I kind of wanted to represent this in the updateCheckXService, but I couldn't figure out how to do inter-machine dependencies.

This snippet here enables e.g. the following scenario:

  1. User is loggedin as non-owner
  2. User logs out
  3. User logs in as owner
  4. Update notification is shown

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, inter-machine dependencies are tricky; I'm not sure if there's a better way. @presleyp do you have any workarounds? I think you looked at something like this recently with pagination.

<UpdateCheckBanner
updateCheck={updateCheckState.context.updateCheck}
error={updateCheckState.context.error}
onDismiss={() => updateCheckSend("DISMISS")}
Copy link
Member Author

@mafredri mafredri Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review(site): Dismissal as any user (practically only an owner) will prevent notification from showing up until browser window is reloaded, even if logged in/out in-between. This seems like acceptable behavior (managed via XState service).

@mafredri mafredri marked this pull request as ready for review November 30, 2022 20:37
@mafredri mafredri requested a review from a team as a code owner November 30, 2022 20:37
@mafredri mafredri requested review from Kira-Pilot and a team November 30, 2022 20:38
@@ -48,6 +48,7 @@ func AGPLRoutes(a *AuthTester) (map[string]string, map[string]RouteCheck) {
"GET:/healthz": {NoAuthorize: true},
"GET:/api/v2": {NoAuthorize: true},
"GET:/api/v2/buildinfo": {NoAuthorize: true},
"GET:/api/v2/updatecheck": {NoAuthorize: true},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review(api): Technically, we restrict access in the WebUI so that only owners can view this information. It's not sensitive information by any means, but it's somewhat pointless to show to users. That's why we do not require authorization here (but I can change it if this behavior seems too weird).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the "updatecheck" can trigger a real update and potentially crash the deployment, I guess that it's safe to expose it to everyone. On the other hand, if there is a known security issue, all users would be informed that there is an update to install, so the system is currently vulnerable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updatecheck is read-only, so it won't do anything scandalous. If we ever add a POST endpoint, that should obviously be protected. You raised a good point about vulnerabilities. Since Coder is an open source product, this information would be available to anyone checking the GitHub releases too. More than protecting this endpoint, we should consider how we convey this information through our releases.

@coder coder deleted a comment from github-actions bot Nov 30, 2022
@coder coder deleted a comment from github-actions bot Nov 30, 2022
Copy link
Member

@deansheather deansheather left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backend ✅

)

// Checker is responsible for periodically checking for updates.
type Checker struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?

Copy link
Member Author

@mafredri mafredri Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suggestion for the how to do the lock or do we have a previous implementation of it? I'm not sure how to implement it with sqlc.

Essentially I'm imagining, start transaction => lock table in row exclusive mode => select row => (if it's time to update, fetch => update) => commit. But alas, sqlc.. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres advisory locks are very easy to use, you just select the lock and it'll automatically get released when either the transaction ends or the connection ends.

SELECT pg_advisory_lock("some_name"); -- returns an error if already locked (I think)
-- or
SELECT pg_try_advisory_lock("some_name"); -- returns true if successfully locked

SELECT pg_advisory_unlock("some_name");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use pg_advisory_xact_lock for transactions for it to be auto-released if the transaction ends***

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used those before. Good suggestion, it looks promising but I don't see us having any precedent for using these. I guess there's always a first time 😄.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They work well for our use in v1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?

I might be blind here, but what exactly are the penalties? It doesn't seem to be frequent db access.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a waste to do one update check for each replica if only one result will ever be stored and it's very easy to wrap in a transaction with a lock

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer this to a follow-up PR, this is a convenience fix more than anything. Ultimately I'd like for us to have a plan for how to deal with replicas, we can obviously have it be the wild west where the quickest draw wins, but assigning/limiting responsibilities via primary/secondary seems logical IMO.

Comment on lines +113 to +118
if r.Checked.IsZero() || time.Since(r.Checked) > c.opts.Interval {
r, err = c.update()
if err != nil {
return Result{}, xerrors.Errorf("update check failed: %w", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid doing an update check immediately on startup, this could be delayed by 10 minutes or something to avoid:

  • every replica doing database queries immediately on startup to try to do an update check,
  • any potential panics in this code causing coder to instantly crash on start

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only do the update check if it's been >24h since the last one. And unless all replicas start up at the same time, this shouldn't be a problem, especially if we add the table locking.

Regarding the panics, ultimately we mustn't have any, but if we did, I would say it's better the sooner it happens. Wouldn't want it to happen while Coder is in the middle of provisioning workspaces, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every replica doing database queries immediately on startup to try to do an update check,

And unless all replicas start up at the same time, this shouldn't be a problem

It's a common pattern to define a startup/access delay for workers that hit the same resource, usually 0-60 sec. Maybe it can help also here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about this for a follow up PR, I don't imagine a single database query will be problematic at startup, though. If we want to spread out service starts, it would be good to have a more structured plan than doing it individually and uniquely for each service.

Comment on lines +113 to +118
if r.Checked.IsZero() || time.Since(r.Checked) > c.opts.Interval {
r, err = c.update()
if err != nil {
return Result{}, xerrors.Errorf("update check failed: %w", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every replica doing database queries immediately on startup to try to do an update check,

And unless all replicas start up at the same time, this shouldn't be a problem

It's a common pattern to define a startup/access delay for workers that hit the same resource, usually 0-60 sec. Maybe it can help also here?

)

// Checker is responsible for periodically checking for updates.
type Checker struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?

I might be blind here, but what exactly are the penalties? It doesn't seem to be frequent db access.

@@ -48,6 +48,7 @@ func AGPLRoutes(a *AuthTester) (map[string]string, map[string]RouteCheck) {
"GET:/healthz": {NoAuthorize: true},
"GET:/api/v2": {NoAuthorize: true},
"GET:/api/v2/buildinfo": {NoAuthorize: true},
"GET:/api/v2/updatecheck": {NoAuthorize: true},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the "updatecheck" can trigger a real update and potentially crash the deployment, I guess that it's safe to expose it to everyone. On the other hand, if there is a known security issue, all users would be informed that there is an update to install, so the system is currently vulnerable.

dismissible
/>
)}
</>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why we throw an error if the check fails. Perhaps it would be better if this failed silently (we could throw something in the terminal if we wanted)? Maybe not, just a thought after seeing the error on local:
Screen Shot 2022-11-30 at 5 02 28 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and I worried we wouldn't notice if it stops working for whatever reason. I can make it silent or add a prefix (Update check failed: Route not found.). Ideally we would keep it silent and log it to Sentry or a similar bug tracker.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe we could just add a console.error since we don't have Sentry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this now, it looks like this: https://www.chromatic.com/test?appId=624de63c6aacee003aa84340&id=6388bfa22e0a55f2ff0c1f1e

I'm still game to remove it if that's preferred. 👍🏻

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mafredri nope looks good!

current: true,
url: "file:///mock-url",
version: "v99.999.9999+c9cdf14",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

},
action: "read",
},
} as const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need to export these constants or why we need to cast as const although I see this pattern in our other machines. @presley (or @mafredri) is there something dumb I'm missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I thought it was weird but I kept them just in case. I'll remove them and see if something breaks 👍🏻

Copy link
Member

@Kira-Pilot Kira-Pilot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Code looks great so approving. I did have some trouble running locally, even after blowing away the db and running make gen. What am I missing?

@mafredri
Copy link
Member Author

mafredri commented Dec 1, 2022

Nicely done! Code looks great so approving. I did have some trouble running locally, even after blowing away the db and running make gen. What am I missing?

Thanks! Could you share what issue/error you ran into? I'm not seeing any errors in either the API or site and I've tried clearing my whole repo with git clean -xdf (WARNING: this command will truly delete everything unrelevant from the project dir, e.g. .envrc files, node_modules, generated XService files, etc).

The only thing I did notice is that scripts/develop.sh has started printing this:

> Start a 30-day trial of Enterprise? (yes/no)

So you kinda need to type (blindly) yes[Enter] into the terminal (I'll take a look if I can fix that in another PR.) (Done: #5231)

@Kira-Pilot
Copy link
Member

Thanks! Could you share what issue/error you ran into?

@mafredri I took another look this morning and everything loaded fine! I struggled last night before I lost power so maybe my connection was unstable or something.

Everything looks great!

@mafredri mafredri changed the title feat: Add support for checking for updates feat: Add support for update checks and notifications Dec 1, 2022
@mafredri mafredri merged commit d9f2aaf into main Dec 1, 2022
@mafredri mafredri deleted the mafredri/outdated-coder-installation-notice branch December 1, 2022 17:43
@github-actions github-actions bot locked and limited conversation to collaborators Dec 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for detecting and informing users of outdated coder installations
6 participants