GitHub Rate Limit #14982
Hey, I see you're 6 versions behind at this point. Is updating a practical option? cc @Emyrk, who is probably the best on our side to investigate.
We will keep the current version for a while. We now monitor this issue with Prometheus, and since Friday we have stopped seeing it. If we have new findings I will report them here. We are also considering upgrading the server version :)
What may have resolved your issue is that all the tokens hit the expiration time stored in the database, so we can invalidate the tokens before reaching out to GitHub. What is strange is the volume itself; I will take a longer look into this section of code for infinite-loop-style behavior. For your specific situation though, I'd ask what the
@Emyrk is there a safe way to delete all expired tokens directly from the DB? I found the oauth_expirary column; we got 56 hits. Should I just delete them? Might this action break any correlation with another table?
@francisco-mata do you also have refresh tokens? If you delete all rows with an expired OAuth token, those users will have to relink. An OAuth token can be expired but still have a refresh key, and the token will be refreshed on its next use. If the refresh key is also expired, or invalid because it was already used, then the token is essentially useless. But you cannot tell whether the refresh key is expired, as that is all kept on the IDP end of things.
Hi @Emyrk, I will try to delete all the old "expired" rows to refresh that table and see if it's possible to un-trigger the event. We are considering upgrading Coder Server, but when we tried upgrading our Coder staging deployment from 2.10 directly to 2.15, the DB crashed; there are migration issues on the DB. We could only jump to 2.11; after 2.12, some of the migrations start to crash. Example: 2.11 to 2.12
2.12 to 2.13
So we are now considering doing a fresh 2.15 deployment and then restoring the DB to recover the data on the new instance, because altering the DB structure without really knowing it seems like a flaky fix.
@francisco-mata I see your status codes are 200. Is there any way we can get insight into the workspace activity? I ask because
@Emyrk I don't follow you. We have two Coder + GitHub interactions. The one that blocks us is the login using OAuth. Can you explain how this token might be used? We expected that every user logging in via GitHub, on clicking the auth button, goes to GitHub and creates a token that verifies the user's email belongs to a specific organization, granting access to the Coder WebUI. That would be just 1 call per user. Is there another process that creates a new OAuth token calling this app inside the Coder workflow? We know the other token, created by the GitHub app, should be the one used by the WS for git pull, git fetch, and so on, but it should also generate just 1 per WS and be valid for X time.
I am assuming the large amount of requests is from the External Auth used for WS operations. There should be a
You are correct that
Awesome, I didn't see the label name... really useful. Yeah, you are correct; it seems the issue is on the external GitHub auth. We need to check with the template owners to see which WS could be triggering this loop of calls. Is there a way to get visibility into which workspace might be triggering these OAuth requests? Or maybe network monitoring on the pod? Could it be a possibility for Coder to alert on or block a WS if it creates 500 requests/min or X requests/hr?
@francisco-mata There is no way today to know which workspace it is. Unfortunately, I decided against adding labels for the workspaces, as workspace IDs are an unbounded set, and Prometheus does not like unbounded label cardinality.
This was an idea posted here: #11843. We had no reports of this issue for a while, so we closed it. Might be worth bringing back.
@Emyrk this seems like a good feature to have in place, so you can protect enterprise infrastructure from a Coder misconfiguration or a misbehaving user. We moved ahead with our plan to deploy 2.15; we had some migration issues we couldn't find a solution for, so we only migrated the templates and started WSs from scratch.
I am unable to identify any sort of loop in Coder that blows up the request count. A solution I can (and will today) implement is to not try to refresh a token we know will not work. Currently, if a refresh token is present, we will retry the refresh on every invocation; this change should limit the retry count for any single token. I do not think the spec has a standard "bad refresh" token response, so I'm unsure whether this will work universally, as I need to know the error is with the refresh token and not some unrelated error that a retry might fix. I have this idea in my head; I will look into the implementation details and see how bulletproof I can make it. From what I can tell, it should prevent these excessive calls.
Hi @Emyrk, we finally upgraded to the latest version and restored our templates; this issue blocked all Coder operations, so we needed to do something. We couldn't restore the whole DB due to migration issues, but we figured it out: a full wipe of all users/WSs, restoring only the templates on the latest version. Even so, as you say, it's better to be bulletproof here, because hitting this issue blocks all auth to Coder via GitHub. Also, exploring the idea of allowing multiple auth IDs for the same Coder user ID, so they can link up to the same WS inside Coder, might help. We usually use LDAP in some of our internal applications, and you can correlate the same email address between LDAP and GitHub as the same Coder user. Regards
@francisco-mata there is likely still value in creating an internal rate limit. My hesitancy is that it requires even more configuration to make it work across all the different types of IDPs. In the short term, I have PR #15608 that might fix the issue. Essentially, if a problematic token is spamming because of a bad refresh token, this reduces its spam to 1 refresh call per token, rather than 1 refresh call per API request. This will only fix it if the problem is related to refresh tokens, which I suspect it might be.
The same way I solved this for invalid refresh tokens, we should solve it for revoked tokens: #15890
FWIW, GitHub has some guidelines around hitting their rate limits, which include a few headers with information about when the rate limits expire.
@davo-canva yes, we should take this into account. Technically the external apps are generic OAuth apps, so I'm not sure how consistent these headers are across services. Considering these tokens are validated via some user action, like
Moving this as a candidate for our next sprint to respect GitHub rate limits: #16040 |
We are hitting this issue: GitHub Rate Limit #10853
We are hitting this issue on Coder v2.10.3.
We are not sure if users are triggering this amount of token requests (status codes 429 and 200). We started to see this issue after we accidentally corrupted our DB and had to recover it from a daily backup. Right after the restore, the following day, we started to see the OAuth too-many-requests issue.