GitHub Rate Limit #14982
Hey, I see you're 6 versions behind at this point. Is updating a practical option? cc @Emyrk, who is probably the best on our side to investigate.
We will keep the current version for a while. We now monitor this issue with Prometheus, and since Friday we have stopped seeing it. If we have new findings I will report them here. We are also considering upgrading the server version :)
What may have resolved your issue is that all the tokens hit the expiration time stored in the database, so we can invalidate the tokens before reaching out to GitHub. What is strange is the volume itself; I will take a longer look into this section of code for infinite-loop-style behavior. For your specific situation though, I'd ask what the
@Emyrk is there a safe way to delete all expired tokens directly from the DB? I found the oauth_expirary column; we got 56 hits. Should I just delete them? Might this action break any correlation with another table?
@francisco-mata do you also have refresh tokens? If you delete all rows with an expired OAuth token, those users will have to relink. An OAuth token can be expired but still have a refresh key, and the token will be refreshed on its next use. If the refresh key is also expired, or invalid because it was already used, then the token is essentially useless. But you cannot tell whether the refresh key is expired, as that is all kept on the IDP end of things.
Hi @Emyrk, I will try to delete all the old "expired" rows to refresh that table and see if it's possible to un-trigger the event. We are considering upgrading Coder Server, but when we tried upgrading our Coder staging deployment from 2.10 directly to 2.15, the DB crashed; there are migration issues on the DB. We could only jump to 2.11; after 2.12, some of the migrations start to crash. Example: 2.11 to 2.12
2.12 to 2.13
So we are now considering doing a fresh 2.15 deployment and then restoring the DB to recover the data on the new instance, because altering the DB structure without really knowing it seems like a flaky fix.
@francisco-mata I see your status codes are 200. Is there any way we can get insight into the workspace activity? I ask because
@Emyrk I don't follow you. We have two Coder + GitHub interactions. The one that blocks us is the login using OAuth. Can you explain how this token might be used? We expected that every user logging in via GitHub, on clicking the auth button, goes to GitHub and creates a token that verifies the user's email belongs to a specific organization, granting access to the Coder WebUI. That would be just 1 call per user. Is there another process that creates a new OAuth token calling this app inside the Coder workflow? We know the other token, created by the GitHub app, should be the one used by the WS for git pull, git fetch, and so on, but it should also generate just 1 per WS and be valid for X time.
I am assuming the large amount of requests is from the External Auth used for WS operations. There should be a
You are correct that
Awesome, I didn't see the label name... really useful. Yeah, you are correct; it seems the issue is on the external GitHub auth. We need to check with the template owners to see which WS could be triggering this loop of calls. Is there a way to get visibility into which workspace might be triggering these OAuth requests? Or maybe network monitoring on the pod? Could it be a possibility for Coder to alert on or block a WS if it creates 500 requests/min or X requests/hr?
@francisco-mata There is no way today to know which workspace it is. Unfortunately, I decided against adding labels for the workspaces, as workspace IDs are an unbounded set, and Prometheus does not like unbounded label cardinality.
This was an idea posted here: #11843. We had no reports of this issue for a while, so we closed it. Might be worth bringing back.
@Emyrk this seems like a good feature to have in place, so you can protect enterprise infrastructure from a Coder misconfiguration or a misbehaving user. We moved ahead with our plan to deploy 2.15; we had some migration issues we couldn't find a solution for, so we only migrated the templates and started WSs from scratch.
I am unable to identify any sort of loop in Coder that blows up the request count. A solution I can (and will today) implement is to not try to refresh a token we know will not work. Currently, if a refresh token is present, we will retry the refresh on every invocation; this change should limit the retry count for any single token. I do not think the spec has a standard "bad refresh" token response, so I'm unsure whether this will work universally, as I need to know the error is with the refresh token and not some unrelated error that a retry might fix. I have this idea in my head; I will look into the implementation details and see how bulletproof I can make it. From what I can tell, it should prevent these excessive calls.
Hi @Emyrk, we finally upgraded to the latest version and restored our templates; this issue blocked all Coder operations, so we needed to do something. We couldn't restore the whole DB due to migration issues, but we figured it out: a full wipe of all users/WSs, restoring only the templates on the latest version. Even so, as you say, it's better to be bulletproof here, because hitting this issue blocks all auth to Coder via GitHub. Also, exploring the idea of allowing multiple auth IDs for the same Coder user ID, so they can link up to the same WS inside Coder, might help. We usually use LDAP in some of our internal applications, and you can correlate the same email address between LDAP and GitHub as the same Coder user. Regards
@francisco-mata there is likely still value in creating an internal rate limit. My hesitancy is that it requires even more configuration to make it work across all the different types of IDPs. In the short term, I have PR #15608 that might fix the issue. Essentially, if a problematic token is spamming because of a bad refresh token, this reduces its spam to 1 refresh call per token, rather than 1 refresh call per API request. This will only fix it if the problem is related to refresh tokens, which I suspect it might be.
The same way I solved this for invalid refresh tokens, we should solve it for revoked tokens: #15890
FWIW, GitHub has some guidelines around hitting their rate limits, which include a few headers with information about when the rate limits expire.
@davo-canva yes, we should take this into account. Technically the external apps are generic OAuth apps, so I'm not sure how consistent these headers are across services. Considering these tokens are validated via some user action, like
Moving this as a candidate for our next sprint to respect GitHub rate limits: #16040 |
We are hitting this issue: GitHub Rate Limit #10853
We are hitting this issue on Coder v2.10.3.
We are not sure if users are triggering this amount of token requests (status codes 429 and 200). We started to see this issue after we accidentally corrupted our DB and had to recover it from a daily backup. Right after the restore, the following day, we started to see the OAuth too-many-requests issue.