
GitHub Rate Limit #10853

Closed
beagleknight opened this issue Nov 24, 2023 · 29 comments · Fixed by #12286
Assignees: Emyrk
Labels: s1 (Bugs that break core workflows. Only humans may set this.)

Comments

@beagleknight
Contributor

Hey folks,

We have a GitHub App that we use to authenticate our Coder users via OAuth. Since a few days ago we have been receiving many complaints about hitting rate limits. Whenever we try to sign in with GitHub we receive the following error:

{
  "message": "Internal error exchanging Oauth code.",
  "detail": "oauth2: cannot fetch token: 429 Too Many Requests .....
}

It seems we are hitting the "secondary rate limit", but we don't have a way to check its status, nor a way to know what is causing it. Do you know what could be the issue?

@cdr-bot cdr-bot bot added the bug label Nov 24, 2023
@fcsonline
Contributor

fcsonline commented Nov 24, 2023

After contacting GitHub support, we were able to see a spike in the number of OAuth requests Coder performed against the GitHub API. After talking to the whole team, we confirmed we didn't do anything special during this period of time.

(screenshot attached)

Could this be some edge case in Coder's code that performs many OAuth negotiations?

If this scenario happens again, we will share Coder logs.

@kylecarbs
Member

I don't think there's a scenario where we'd be spamming the API... but maybe we have a bug that could cause a redirect loop or something, and a user just left their browser up?

Has this only happened once? Or is there a way we can reliably trigger it?

@fcsonline
Contributor

We have experienced it twice in the last two weeks, but we have not seen any pattern yet. If we hit this issue again, we will check the Coder logs to retrieve more information and context so we can understand what is going on.

@Emyrk
Member

Emyrk commented Jan 8, 2024

@kylecarbs we have experienced this on our deployment today

@Emyrk Emyrk self-assigned this Jan 8, 2024
@Emyrk Emyrk added the s1 Bugs that break core workflows. Only humans may set this. label Jan 8, 2024
@Emyrk
Member

Emyrk commented Jan 9, 2024

I am unable to reproduce this locally. We might want to add some debug/logging/instrumentation and then try to figure it out when we hit it in dogfood.

@Emyrk
Member

Emyrk commented Jan 9, 2024

I created a PR that will at the very least allow some alerting when this happens from metrics.

I think we still need to add some more debugging code to solve this, as there are a few parts of the code where we deal with OAuth: login, new/update workspace, provisioners, the browser user settings page, GIT_ASK_PASS (

coder/cli/gitaskpass.go

Lines 67 to 77 in 475be73

// Retry ExternalAuth until it succeeds, then report that Git auth completed.
for r := retry.New(250*time.Millisecond, 10*time.Second); r.Wait(ctx); {
	token, err = client.ExternalAuth(ctx, agentsdk.ExternalAuthRequest{
		Match:  host,
		Listen: true,
	})
	if err != nil {
		continue
	}
	cliui.Infof(inv.Stderr, "You've been authenticated with Git!")
	break
}
), and maybe some more.

This will mainly determine whether it is external auth or login.
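
For illustration only, here is a minimal sketch of what that kind of instrumentation could look like with prometheus/client_golang. This is not the actual PR; the metric name and labels are modeled on the coderd_oauth2_external_requests_total series quoted in the notes below.

package externalauthmetrics

import (
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical sketch: not Coder's actual code, but the namespace/subsystem/name
// combination produces the coderd_oauth2_external_requests_total series.
var requestsTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
	Namespace: "coderd",
	Subsystem: "oauth2",
	Name:      "external_requests_total",
	Help:      "Total outbound requests made to the external OAuth provider.",
}, []string{"name", "source", "status_code"})

func init() {
	prometheus.MustRegister(requestsTotal)
}

// RecordRequest increments the counter for a single outbound call, e.g.
// RecordRequest("primary-github", "ValidateToken", 200).
func RecordRequest(provider, source string, statusCode int) {
	requestsTotal.WithLabelValues(provider, source, strconv.Itoa(statusCode)).Inc()
}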

@Emyrk
Member

Emyrk commented Jan 9, 2024

Some notes

Validate Token?

GIT_ASK_PASS has the potential to call ValidateToken repeatedly. Both the API side and the CLI side have for loops that constantly try to auth.

Checking GitHub's response headers yields the following:

ValidateToken counts against the core resource. If the token is invalid, the rate limit response looks like this:

[API] map[limit:60 remain:0 reset:1704830225 resource:core used:418]
[API] map[limit:60 remain:0 reset:1704830225 resource:core used:419]

If it is valid, it looks like this:

[API] map[limit:5000 remain:4921 reset:1704831855 resource:core used:79]
[API] map[limit:5000 remain:4918 reset:1704831855 resource:core used:82]

We do make quite a few calls.

Just navigating around the dashboard triggers some calls:

coderd_oauth2_external_requests_total{name="primary-github",source="AppInstallations",status_code="200"} 5
coderd_oauth2_external_requests_total{name="primary-github",source="ValidateToken",status_code="200"} 5
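
For context, the limit/remain/reset/resource/used values in the [API] log lines above come straight from GitHub's documented X-RateLimit-* response headers. A minimal sketch of reading them off an *http.Response (an illustration, not Coder's actual instrumentation):

package githubratelimit

import (
	"net/http"
	"strconv"
	"time"
)

// RateLimit mirrors the values GitHub reports in its X-RateLimit-* headers.
type RateLimit struct {
	Limit     int
	Remaining int
	Used      int
	Resource  string // e.g. "core"
	Reset     time.Time
}

// Parse reads GitHub's rate limit headers from a response. Missing or
// malformed headers simply come back as zero values.
func Parse(resp *http.Response) RateLimit {
	atoi := func(key string) int {
		v, _ := strconv.Atoi(resp.Header.Get(key))
		return v
	}
	resetUnix, _ := strconv.ParseInt(resp.Header.Get("X-RateLimit-Reset"), 10, 64)
	return RateLimit{
		Limit:     atoi("X-RateLimit-Limit"),
		Remaining: atoi("X-RateLimit-Remaining"),
		Used:      atoi("X-RateLimit-Used"),
		Resource:  resp.Header.Get("X-RateLimit-Resource"),
		Reset:     time.Unix(resetUnix, 0),
	}
}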

@matifali
Member

Hitting this again now.

{"message":"Internal error exchanging Oauth code.","detail":"oauth2: cannot fetch token: 429 Too Many Requests"}

@Emyrk
Member

Emyrk commented Jan 10, 2024

My hunch is that we are genuinely hitting the rate limits. I think our webpage and some of our CLI functions are just a bit demanding.

5k requests an hour is not that many.

@matifali

This comment was marked as duplicate.

@johnstcn
Member

johnstcn commented Jan 11, 2024

Curiously enough, with the new instrumentation it appears that we're hitting the unauthenticated rate limit:

# HELP coderd_oauth2_external_requests_rate_limit_next_reset_unix Unix timestamp for when the next interval starts
# TYPE coderd_oauth2_external_requests_rate_limit_next_reset_unix gauge
coderd_oauth2_external_requests_rate_limit_next_reset_unix{name="github",resource="core-unauthorized"} 1.704974981e+09
# HELP coderd_oauth2_external_requests_rate_limit_remaining The remaining number of allowed requests in this interval.
# TYPE coderd_oauth2_external_requests_rate_limit_remaining gauge
coderd_oauth2_external_requests_rate_limit_remaining{name="github",resource="core-unauthorized"} 58
# HELP coderd_oauth2_external_requests_rate_limit_reset_in_seconds Seconds until the next interval
# TYPE coderd_oauth2_external_requests_rate_limit_reset_in_seconds gauge
coderd_oauth2_external_requests_rate_limit_reset_in_seconds{name="github",resource="core-unauthorized"} 3599.544220509
# HELP coderd_oauth2_external_requests_rate_limit_total The total number of allowed requests per interval.
# TYPE coderd_oauth2_external_requests_rate_limit_total gauge
coderd_oauth2_external_requests_rate_limit_total{name="github",resource="core-unauthorized"} 60

@Emyrk
Member

Emyrk commented Jan 12, 2024

@johnstcn This is hit when we try to do a ValidateToken with a bad token. I have a quick fix idea that might help that...

@Emyrk
Member

Emyrk commented Jan 12, 2024

Evidence suggests it is from refreshing tokens.

Screenshot from 2024-01-12 14-02-05

@Emyrk
Member

Emyrk commented Jan 12, 2024

Another case of for-loop load:

If you click "Click to Login" and the next page doesn't come up right away (for example, you need to click through some steps first), the UI repeatedly polls the backend to check whether authentication happened, and each poll hits the ValidateURL.

Screenshot from 2024-01-12 14-56-18

@johnstcn
Member

Also noticed that "Test Validate" fires off two HTTP requests:

Screen.Recording.2024-01-12.at.21.01.07.mov

@matifali
Member

I guess most of the tokens also have an expiry. Can't we check expiry before trying to refresh?

@Emyrk
Member

Emyrk commented Jan 15, 2024

@matifali We should already be doing that, but we should also check this in the ValidateToken method to prevent using tokens we already know are invalid.

I am not sure if this will 100% solve it though 🤔
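
One possible shape for that check, sketched with a hypothetical stored-token struct rather than Coder's real types: build an oauth2.Token from the stored expiry and let its Valid() method short-circuit the remote call.

package externalauth

import (
	"fmt"
	"time"

	"golang.org/x/oauth2"
)

// storedToken is a hypothetical stand-in for the persisted external auth link.
type storedToken struct {
	AccessToken string
	Expiry      time.Time
}

// checkBeforeValidate returns an error without calling the provider when the
// stored token is already known to be expired, so we don't burn a GitHub API
// request (which counts against the unauthenticated limit when the token is
// bad) on something we can reject locally.
func checkBeforeValidate(tok storedToken) error {
	t := &oauth2.Token{AccessToken: tok.AccessToken, Expiry: tok.Expiry}
	if !t.Valid() {
		return fmt.Errorf("token expired at %s, refresh before validating", tok.Expiry.Format(time.RFC3339))
	}
	// Only now is it worth making the remote ValidateToken call (omitted).
	return nil
}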

@Emyrk
Member

Emyrk commented Jan 25, 2024

Just dropping some more info. We are seeing some spikes in "TokenSource" calls. It's very infrequent (once in the past week).

Screenshot from 2024-01-25 11-38-14

Zooming out to the month is interesting:

Screenshot from 2024-01-25 11-39-26

It looks like it used to be a lot worse. Or was someone running something in a workspace that used to hit the API more frequently?

@matifali
Member

matifali commented Jan 25, 2024

@Emyrk it could be due to running the coder docs workspace. It runs a yarn dev server that calls the GitHub API extensively; I hit the rate limit and got blocked multiple times. I can run it again and you will probably see the spikes.

See this for context.

@Emyrk
Member

Emyrk commented Jan 25, 2024

Let's sync, run it, and see what happens.

@Emyrk
Member

Emyrk commented Jan 26, 2024

If that is what is going on (a workspace consuming the rate limit), I wonder how we would solve it. Rate limit the usage per workspace?
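
Purely as a sketch of that idea (not something Coder implements), one could keep a golang.org/x/time/rate limiter per workspace ID and reject external auth calls once a single workspace gets too chatty; the keying and limits here are assumptions.

package externalauthlimit

import (
	"sync"

	"golang.org/x/time/rate"
)

// perWorkspaceLimiter hands out one token-bucket limiter per workspace ID.
type perWorkspaceLimiter struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
	limit    rate.Limit
	burst    int
}

func newPerWorkspaceLimiter(perSecond rate.Limit, burst int) *perWorkspaceLimiter {
	return &perWorkspaceLimiter{
		limiters: make(map[string]*rate.Limiter),
		limit:    perSecond,
		burst:    burst,
	}
}

// Allow reports whether the given workspace may make another external auth
// request right now; callers could return 429 to the workspace otherwise.
func (l *perWorkspaceLimiter) Allow(workspaceID string) bool {
	l.mu.Lock()
	lim, ok := l.limiters[workspaceID]
	if !ok {
		lim = rate.NewLimiter(l.limit, l.burst)
		l.limiters[workspaceID] = lim
	}
	l.mu.Unlock()
	return lim.Allow()
}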

@matifali
Member

I was able to reproduce this. See the screenshot.

(screenshot attached)

@bpmct
Member

bpmct commented Feb 15, 2024

Is this still ongoing @Emyrk?

@Emyrk
Member

Emyrk commented Feb 15, 2024

@bpmct Unfortunately.

A lot of positive changes made it into the codebase, but we discovered that at least one cause of this was a single workspace that was spamming the external auth API.

This is still possible to do, so the issue remains that a single workspace can consume the external auth rate limit. In practice, because Coder is deployed within known groups, this isn't the biggest security threat IMO, since it's just a DoS attack from the inside. It is more likely to be a mistake in some script (that is what it was on our end).

@Emyrk
Member

Emyrk commented Feb 20, 2024

I will confirm if a single workspace can use the entire rate limit of an app across multiple users.

@Emyrk
Member

Emyrk commented Feb 23, 2024

I just confirmed that authenticated rate limits are per user:

https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/rate-limits-for-oauth-apps

All of these requests count towards your personal rate limit of 5,000 requests per hour.

So the rate limit tracking I implemented is a bit moot unless I also label each metric with the user_id, which feels a bit excessive.

@johnstcn
Member

Agreed, we definitely don't want to label metrics with UUIDs.

@Emyrk
Member

Emyrk commented Feb 23, 2024

I think we were really hitting the unauthenticated rate limit before. I drastically reduced that, but it is still possible to abuse our API, although you'd have to be more deliberate now.

The PR has the info.

@francisco-mata

We are hitting this issue on Coder v2.10.3.

(screenshot attached)

We are not sure whether users are really triggering this volume of TokenSource 429 and 200 responses. We started seeing this issue after we accidentally corrupted our DB and had to recover it from a daily backup. The day after the restore, we started to see the OAuth "too many requests" errors.
