-
Notifications
You must be signed in to change notification settings - Fork 875
feat: Add support for update checks and notifications #4810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adding @coder/backend as reviewer since server portion of this PR is complete (albeit this being a draft PR). Frontend requires more work. |
This is sweet! Some questions:
|
What do you think about putting this on the |
If the feature is enabled whilst offline, the user will simply see an error message in the server logs once every 24h. There will be no notification in the Web UI.
Currently yes, I'm open to tweaking it but I think it should be fine for both. |
That's a pretty good idea IMO. 👍🏻 |
94accc8
to
e327dd6
Compare
A couple of thoughts:
We don't currently expose the concept of roles on the FE. What we probably want to do is add another permission to see this banner and then check that permission in our
Regarding polling this endpoint: are we sure we want to poll? What if the user dismisses the notification - do we want it showing up some interval later? Would it suffice to have a notification show up on page refresh? |
e1373c7
to
ee967ae
Compare
4e4d9b0
to
81d11dd
Compare
} else { | ||
updateCheckSend("CLEAR") | ||
} | ||
}, [authState, updateCheckSend]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review(site): I'm not super happy with how this turned out. I kind of wanted to represent this in the updateCheckXService, but I couldn't figure out how to do inter-machine dependencies.
This snippet here enables e.g. the following scenario:
- User is loggedin as non-owner
- User logs out
- User logs in as owner
- Update notification is shown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, inter-machine dependencies are tricky; I'm not sure if there's a better way. @presleyp do you have any workarounds? I think you looked at something like this recently with pagination.
<UpdateCheckBanner | ||
updateCheck={updateCheckState.context.updateCheck} | ||
error={updateCheckState.context.error} | ||
onDismiss={() => updateCheckSend("DISMISS")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review(site): Dismissal as any user (practically only an owner) will prevent notification from showing up until browser window is reloaded, even if logged in/out in-between. This seems like acceptable behavior (managed via XState service).
@@ -48,6 +48,7 @@ func AGPLRoutes(a *AuthTester) (map[string]string, map[string]RouteCheck) { | |||
"GET:/healthz": {NoAuthorize: true}, | |||
"GET:/api/v2": {NoAuthorize: true}, | |||
"GET:/api/v2/buildinfo": {NoAuthorize: true}, | |||
"GET:/api/v2/updatecheck": {NoAuthorize: true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review(api): Technically, we restrict access in the WebUI so that only owners can view this information. It's not sensitive information by any means, but it's somewhat pointless to show to users. That's why we do not require authorization here (but I can change it if this behavior seems too weird).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the "updatecheck" can trigger a real update and potentially crash the deployment, I guess that it's safe to expose it to everyone. On the other hand, if there is a known security issue, all users would be informed that there is an update to install, so the system is currently vulnerable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updatecheck is read-only, so it won't do anything scandalous. If we ever add a POST endpoint, that should obviously be protected. You raised a good point about vulnerabilities. Since Coder is an open source product, this information would be available to anyone checking the GitHub releases too. More than protecting this endpoint, we should consider how we convey this information through our releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backend ✅
) | ||
|
||
// Checker is responsible for periodically checking for updates. | ||
type Checker struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a suggestion for the how to do the lock or do we have a previous implementation of it? I'm not sure how to implement it with sqlc
.
Essentially I'm imagining, start transaction => lock table in row exclusive mode => select row => (if it's time to update, fetch => update) => commit. But alas, sqlc.. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres advisory locks are very easy to use, you just select the lock and it'll automatically get released when either the transaction ends or the connection ends.
SELECT pg_advisory_lock("some_name"); -- returns an error if already locked (I think)
-- or
SELECT pg_try_advisory_lock("some_name"); -- returns true if successfully locked
SELECT pg_advisory_unlock("some_name");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use pg_advisory_xact_lock
for transactions for it to be auto-released if the transaction ends***
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used those before. Good suggestion, it looks promising but I don't see us having any precedent for using these. I guess there's always a first time 😄.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They work well for our use in v1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?
I might be blind here, but what exactly are the penalties? It doesn't seem to be frequent db access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a waste to do one update check for each replica if only one result will ever be stored and it's very easy to wrap in a transaction with a lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer this to a follow-up PR, this is a convenience fix more than anything. Ultimately I'd like for us to have a plan for how to deal with replicas, we can obviously have it be the wild west where the quickest draw wins, but assigning/limiting responsibilities via primary/secondary seems logical IMO.
if r.Checked.IsZero() || time.Since(r.Checked) > c.opts.Interval { | ||
r, err = c.update() | ||
if err != nil { | ||
return Result{}, xerrors.Errorf("update check failed: %w", err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid doing an update check immediately on startup, this could be delayed by 10 minutes or something to avoid:
- every replica doing database queries immediately on startup to try to do an update check,
- any potential panics in this code causing coder to instantly crash on start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only do the update check if it's been >24h since the last one. And unless all replicas start up at the same time, this shouldn't be a problem, especially if we add the table locking.
Regarding the panics, ultimately we mustn't have any, but if we did, I would say it's better the sooner it happens. Wouldn't want it to happen while Coder is in the middle of provisioning workspaces, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every replica doing database queries immediately on startup to try to do an update check,
And unless all replicas start up at the same time, this shouldn't be a problem
It's a common pattern to define a startup/access delay for workers that hit the same resource, usually 0-60 sec. Maybe it can help also here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll think about this for a follow up PR, I don't imagine a single database query will be problematic at startup, though. If we want to spread out service starts, it would be good to have a more structured plan than doing it individually and uniquely for each service.
if r.Checked.IsZero() || time.Since(r.Checked) > c.opts.Interval { | ||
r, err = c.update() | ||
if err != nil { | ||
return Result{}, xerrors.Errorf("update check failed: %w", err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every replica doing database queries immediately on startup to try to do an update check,
And unless all replicas start up at the same time, this shouldn't be a problem
It's a common pattern to define a startup/access delay for workers that hit the same resource, usually 0-60 sec. Maybe it can help also here?
) | ||
|
||
// Checker is responsible for periodically checking for updates. | ||
type Checker struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of all replicas doing the update check, could you lock the config table row so only one replica will attempt to do an update check at a period of ~24 hours?
I might be blind here, but what exactly are the penalties? It doesn't seem to be frequent db access.
@@ -48,6 +48,7 @@ func AGPLRoutes(a *AuthTester) (map[string]string, map[string]RouteCheck) { | |||
"GET:/healthz": {NoAuthorize: true}, | |||
"GET:/api/v2": {NoAuthorize: true}, | |||
"GET:/api/v2/buildinfo": {NoAuthorize: true}, | |||
"GET:/api/v2/updatecheck": {NoAuthorize: true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the "updatecheck" can trigger a real update and potentially crash the deployment, I guess that it's safe to expose it to everyone. On the other hand, if there is a known security issue, all users would be informed that there is an update to install, so the system is currently vulnerable.
dismissible | ||
/> | ||
)} | ||
</> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it and I worried we wouldn't notice if it stops working for whatever reason. I can make it silent or add a prefix (Update check failed: Route not found.
). Ideally we would keep it silent and log it to Sentry or a similar bug tracker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe we could just add a console.error since we don't have Sentry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this now, it looks like this: https://www.chromatic.com/test?appId=624de63c6aacee003aa84340&id=6388bfa22e0a55f2ff0c1f1e
I'm still game to remove it if that's preferred. 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mafredri nope looks good!
current: true, | ||
url: "file:///mock-url", | ||
version: "v99.999.9999+c9cdf14", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
}, | ||
action: "read", | ||
}, | ||
} as const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I thought it was weird but I kept them just in case. I'll remove them and see if something breaks 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done! Code looks great so approving. I did have some trouble running locally, even after blowing away the db and running make gen
. What am I missing?
Co-authored-by: Kira Pilot <kira@coder.com>
Thanks! Could you share what issue/error you ran into? I'm not seeing any errors in either the API or site and I've tried clearing my whole repo with The only thing I did notice is that
So you kinda need to type (blindly) |
@mafredri I took another look this morning and everything loaded fine! I struggled last night before I lost power so maybe my connection was unstable or something. Everything looks great! |
This PR adds support for update checking to coder server.
Fixes #2701
TODO for site:
Refresh update check status at X interval?GlobalSnackbarAlertBanner or redirect on clickOr alternative display method?Description
The server implementation:
site_config
) so that we don't spam GitHub API on every startup (this could otherwise cause our customers IP to be rate-limited in a crash-loop scenario)/api/v2/updatecheck
Server log example:
The notification in the Web UI looks like this:
An alternative display method may be required, or extending GlobalSnackbar features (no timeout, HTML content, etc).