
GitHub API rate limit exceeded during change log generation in large repos (> 3000 issues) #71

Closed
skywinder opened this issue Mar 20, 2015 · 18 comments

Comments

@skywinder
Member

Since I need to fetch details for each issue separately, and the GitHub API has a limit of 5000 requests per hour, I need to figure something out.
Generate a second token as a temporary workaround?

@skywinder skywinder added the bug label Mar 20, 2015
@phene

phene commented Mar 20, 2015

Just ran into this myself on a large repo. Are there intermediary files created during the run so it can be resumed?

@skywinder
Member Author

@phene Unfortunately, not yet.
I will try to figure something out soon. At the very least, working with 2 tokens together should be enough even for the biggest GitHub repos.

@sneal
Contributor

sneal commented Mar 20, 2015

What about limiting the number of issues pulled back via a command line arg? I don't really need everything from 2 years ago; just the last few months would be great.

github_changelog_generator --max-issues 500

@skywinder
Member Author

@sneal Yep, it's a good idea! I will consider this for the next release.

skywinder added a commit that referenced this issue Mar 21, 2015
skywinder added a commit that referenced this issue Mar 21, 2015
@skywinder
Member Author

OK, I added a fallback for this error, so version 1.3.11 no longer crashes; it just prints a warning message:

Warning: GitHub API rate limit exceed (5000 per hour), change log may not contain some issues.
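
For illustration, a fallback of this kind could look roughly like the sketch below. It is not the code from the referenced commits: `RateLimitExceeded` is a hypothetical stand-in for whatever error the API client raises, and `fetch_details` is an invented helper name. The idea is to rescue the error, print the warning, and continue with the issues fetched so far.

```ruby
# Hypothetical stand-in for the error an API client raises when the
# rate limit is hit (an assumption, not the generator's real class).
class RateLimitExceeded < StandardError; end

RATE_LIMIT_WARNING = 'Warning: GitHub API rate limit exceed (5000 per hour), ' \
                     'change log may not contain some issues.'

# Calls the given block once per issue number and collects the results.
# If the block raises RateLimitExceeded, a warning is printed and the
# details collected up to that point are returned instead of crashing.
def fetch_details(issue_numbers, out: $stderr)
  details = []
  issue_numbers.each do |number|
    details << yield(number)
  end
  details
rescue RateLimitExceeded
  out.puts RATE_LIMIT_WARNING
  details
end
```

The trade-off is the one the warning states: the run completes, but issues after the point of failure are missing from the change log.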

skywinder added a commit that referenced this issue Mar 21, 2015
Add  fallback with warning message to prevent crash in case of exceed API Rate Limit (temporary workaround for #71)
@skywinder skywinder added this to the Advanced change log generation milestone Mar 25, 2015
@estahn
Contributor

estahn commented Mar 30, 2015

There are currently two options in the pipeline to fix this:

Example:

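require 'tmpdir'
require 'octokit'
require 'faraday-http-cache'
require 'active_support'
require 'active_support/cache'

project = 'skywinder/github-changelog-generator'
# Placeholder for the generator's parsed command-line options
# (an assumption; the original snippet did not define `options`)
options = { debug: false, cache: true }

# Enable cache: HTTP responses are stored on disk so repeated runs can reuse them
stack = Faraday::RackBuilder.new do |builder|
  builder.response :logger if options[:debug]
  builder.use :http_cache, store: ActiveSupport::Cache::FileStore.new(File.join(Dir.tmpdir, 'ghclgen')) if options[:cache]
  builder.use Octokit::Response::RaiseError
  builder.adapter Faraday.default_adapter
end
Octokit.middleware = stack
# Octokit.auto_paginate = true

@client = Octokit::Client.new
# Fetch two items of each kind to exercise the cached middleware stack
releases = Octokit.releases project, :per_page => 2
releases.each do |release|
  puts release.inspect
end
tags = Octokit.tags project, :per_page => 2
tags.each do |tag|
  puts tag.inspect
end
issues = Octokit.issues project, :per_page => 2, :filter => :all, :state => :closed
issues.each do |issue|
  puts issue.inspect
end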

@daviddannunzio

Is there an ETA for when the --max-issues argument fix will be released?

@skywinder
Member Author

@daviddannunzio next week for sure!

@daviddannunzio

Thanks @skywinder !

@daviddannunzio

Perhaps I'm not building the gem correctly, but I tried running the current master branch using a very low --max-issues number, and I still get rate limit errors. It looks like the --max-issues option does not limit the actual GitHub API calls; it just limits the issues we return after we've queried the API. Is this correct?

@sneal
Contributor

sneal commented Apr 3, 2015

@daviddannunzio Did you previously hit your rate limit without waiting for it to timeout/reset when you tried it with --max-issues?

Something to keep in mind is that max-issues is not exact because issues are pulled back in pages (30 issues per page). It stops pulling back more issues if it goes over max_issues.
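
The page-based cutoff described above can be sketched as follows. This is a minimal illustration, not the generator's actual implementation: `fetch_issues` is an invented name, and `fetch_page` is assumed to be any callable that takes a 1-based page number and returns an array of issues (empty once the pages run out).

```ruby
PER_PAGE = 30 # page size mentioned above

# Pulls issues page by page and stops requesting further pages once the
# running total reaches max_issues. Whole pages are kept, so the result
# can contain more than max_issues items. Pass nil to fetch everything.
def fetch_issues(max_issues, fetch_page)
  issues = []
  page = 1
  loop do
    batch = fetch_page.call(page)
    break if batch.empty?

    issues.concat(batch)
    break if max_issues && issues.size >= max_issues

    page += 1
  end
  issues
end
```

With 100 issues available and a limit of 50, this sketch makes two page requests and returns 60 issues. Note that it only bounds the list requests; any per-issue detail requests made afterwards are a separate cost.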

@daviddannunzio

@sneal I believe I waited a full hour before trying to run the command again. I still got the error. Even with --max-issues set to 1.

@skywinder
Member Author

@daviddannunzio let's move discussion about --max-issues param to #76.
If you are sure that it's a bug, please open an issue and include the script's output logs.

@gravitystorm

This is a major problem for the project that I'm working on (https://github.com/gravitystorm/openstreetmap-carto) due to the moderately large number of issues and PRs (1322 and 773 so far). It is taking me a while just to implement our initial changelog, since I need to change issue tags, change config values, run github_changelog_generator again to see the results - and then I rapidly run out of API requests. In the future I will run out of API requests before the changelog can be generated just once!

Also, it takes a long time to download all the issues, so iterating our config values is painful.

I investigated the GitHub API briefly. The list of issues (e.g. https://api.github.com/repos/gravitystorm/openstreetmap-carto/issues ) contains last-updated timestamps, so it seems possible to cache the details and only update when the timestamp changes. Do you think this would work? This way, repeated runs of github_changelog_generator would be faster and involve fewer API requests.
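
A rough sketch of the timestamp-based caching idea, under stated assumptions: `refresh_details` is an invented name, the cache is a plain in-memory Hash (persisting it to disk between runs is left out), `listing` is an array of hashes with `:number` and `:updated_at` taken from the issue list, and the block stands for the one-request-per-issue detail fetch.

```ruby
# cache maps issue number => { updated_at: ..., details: ... }.
# The block is called only for issues that are new to the cache or whose
# updated_at differs from the cached value, so unchanged issues cost no
# extra requests on repeated runs.
def refresh_details(listing, cache)
  listing.each do |issue|
    entry = cache[issue[:number]]
    next if entry && entry[:updated_at] == issue[:updated_at]

    cache[issue[:number]] = {
      updated_at: issue[:updated_at],
      details: yield(issue[:number])
    }
  end
  cache
end
```

On a second run where only one issue has a newer timestamp, the block runs once instead of once per issue.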

@olleolleolle
Collaborator

@gravitystorm Could you open a separate issue for the caching? (I dig it!)

@aelavender

Hi, sorry to dredge up an old thread. Ran into the rate-limiting issue when trying to process a very large repo (~12k PRs merged over the history of the project).

It appears that the part that makes the vast majority of requests is fetching issue events, which needs to be done once per issue/PR. Would it be reasonable, for large repos, to skip this step and instead use the merged_at or closed_at timestamp to determine where in the release history an item should live?
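
To make the proposal concrete, here is a minimal sketch of placing items by timestamp alone. Everything in it is an assumption for illustration: `group_by_release` is an invented name, tags are hashes with `:name` and `:date`, items are hashes with `:closed_at` and an optional `:merged_at`, and the 'Unreleased' bucket label is hypothetical.

```ruby
# Assigns each item to the earliest tag dated at or after the moment the
# item was merged (or closed, if it was never merged). Items newer than
# every tag go into an 'Unreleased' bucket. No per-issue event requests
# are needed, only the timestamps already present in the issue list.
def group_by_release(items, tags)
  sorted = tags.sort_by { |tag| tag[:date] }
  items.group_by do |item|
    stamp = item[:merged_at] || item[:closed_at]
    tag = sorted.find { |t| t[:date] >= stamp }
    tag ? tag[:name] : 'Unreleased'
  end
end
```

This trades accuracy for request count: it cannot tell, for example, whether a PR was merged into a branch that the tag did not include.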

I'm not particularly fluent in Ruby, otherwise I'd offer to help.

@ferrarimarco ferrarimarco modified the milestones: Advanced change log generation, 2.0.0 Jun 22, 2019
@skywinder
Member Author

@aelavender can you give us an example repo for the tests?

@skywinder skywinder removed the bug label Dec 30, 2019
@skywinder
Member Author

Closing this in favor of #361.
