Skip to content

Use git history to find PRs in a tag instead of time #619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

hunner
Copy link
Contributor

@hunner hunner commented Feb 10, 2018

Fixes #617

@hunner
Copy link
Contributor Author

hunner commented Feb 10, 2018

Currently the two extra round-trip calls to the github api per PR are agonizingly slow. This should be revised to use the existing git history instead of github's api.

What should GCG do for PRs that are "rebase and squash" PR merge strategies similar to how https://github.com/syl20bnr/spacemacs/ does it? I suspect that even using merged_at didn't work as no PRs are "merged" just closed and referenced in commit messages, so we can probably ignore that.

@hunner
Copy link
Contributor Author

hunner commented Feb 10, 2018

Issues have no "merge" event so would have to continue to be time-based. I wonder what happens for issues that are opened and closed at points in the future?

@zyphlar
Copy link

zyphlar commented Feb 13, 2018

FYI I tried copy-pasting these file revisions into my local copy to try and fix things, and got two errors in succession:

/var/lib/gems/2.3.0/gems/github_changelog_generator-1.14.3/lib/github_changelog_generator/generator/generator_fetcher.rb:49:in sha_in_tag?': undefined local variable or method tag' for #<GitHubChangelogGenerator::Generator:0x000000014edf58> (NameError)

and after fixing that one (just changing "tag" to "tag_sha")

/var/lib/gems/2.3.0/gems/github_changelog_generator-1.14.3/lib/github_changelog_generator/generator/generator_fetcher.rb:49:in sha_in_tag?': Unknown status 'diverged' for comparing [my commit from march 2016]...[my commit for one tag version ago] (StandardError)`

@hunner
Copy link
Contributor Author

hunner commented Feb 13, 2018

@zyphlar Thanks! I am testing on a repo without diverged history, so I didn't notice the misnamed variable and haven't yet tested that error condition.

For easier testing you can add my github fork to your git repo's remotes and check out the branch directly; no need to hand-copy changes. Also you can comment directly on the line of code in the files view instead of just in the comment thread making it easier to find the related line :).

@zyphlar
Copy link

zyphlar commented Feb 13, 2018 via email

@hunner
Copy link
Contributor Author

hunner commented Feb 13, 2018

@zyphlar Perhaps the latest commit will fix the "merge commit is tagged" issue. And has the tag_sha bugfix, and uses caching to be faster now.

I still haven't handled divergent histories, which breaks how it used to work with time-based sorting, so that's coming.

@olleolleolle
Copy link
Collaborator

(Appreciating this line of research a lot, @hunner!)

if issue["events"] && (event = issue["events"].find { |e| e["event"] == "merged" })
sha = event["commit_id"]

sha_between_tags?(sha, newer_tag, older_tag)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should now use the issue["first_occurring_tag"] as it has already been fetched.

# @param newer_tag The tag to determine if sha is a member.
# @param older_tag The tag prior to the newer tag.
# @return [Boolean] Whether the sha exists in the newer/older tag comparison.
def sha_between_tags?(sha, newer_tag, older_tag)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed; info is available via add_first_occurring_tag_to_prs()

@ekohl
Copy link
Contributor

ekohl commented Feb 15, 2018

When running this on theforeman/puppet-pulp

github_changelog_generator -u theforeman -p puppet-pulp -o new-changelog.md --future-release 5.3.1

I get warnings about being unable to fetch commits:

Warning: Can't fetch commit ccda7a9c5899a2f73cd9736f33c0055dc7e30e5d. It is probably referenced from another repo.
Warning: Can't fetch commit f92213bec90879d940c656fe23aa4e3dfab1c829. It is probably referenced from another repo.
Warning: Can't fetch commit d1f04e13084600390a806585915fd1c9e3479f91. It is probably referenced from another repo.Warning: Can't fetch commit e86ef951db00ce4ee478155a6b670c5a95d107d7. It is probably referenced from another repo.
Warning: Can't fetch commit d6f1cb4c71e2416c639be3762ae1568786a4cd29. It is probably referenced from another repo.
Warning: Can't fetch commit 0d8f4823550555b9f2cc0bc25a8ba681ce2c384f. It is probably referenced from another repo.

Warning: Can't fetch commit d9538cf5242cffc295880120ec06cbe8db324a0d. It is probably referenced from another repo.Warning: Can't fetch commit e6debe8c630582c1c1bdf88385b1adc22a67d762. It is probably referenced from another repo.

Warning: Can't fetch commit 8d0ae9b281515f4d123496f312164ddca6da4f61. It is probably referenced from another repo.
Warning: Can't fetch commit eed8b28e2b870cee637fabb5a4449726fb51eb0a. It is probably referenced from another repo.

Oddly enough git show can find all commits.

@hunner
Copy link
Contributor Author

hunner commented Feb 15, 2018

Cache invalidation allows them to be fetched, but I need to add ordering those commits by PR references. The actual "merge" commit doesn't show up for any cherry-picked commits by github's api: https://api.github.com/repos/theforeman/puppet-pulp/commits/c27e1eb vs https://api.github.com/repos/theforeman/puppet-pulp/commits/fb96de5 for https://api.github.com/repos/theforeman/puppet-pulp/pulls/282 and https://api.github.com/repos/theforeman/puppet-pulp/issues/282/events

@hunner
Copy link
Contributor Author

hunner commented Feb 16, 2018

This should also fix #497

@hunner
Copy link
Contributor Author

hunner commented Feb 16, 2018

And should fix #615

@hunner hunner mentioned this pull request Feb 16, 2018
@hunner
Copy link
Contributor Author

hunner commented Feb 16, 2018

And should fix #560

@zyphlar
Copy link

zyphlar commented Feb 16, 2018 via email

@ekohl
Copy link
Contributor

ekohl commented Feb 20, 2018

A quick test on a reasonably complex repository tells me this would be much better than the current code. I'd still have some issues since we've cherry picked manually so there are now github PRs to refer to. This a fundamental problem and might need metadata in a config file since there is no API that could tell us that.

@hunner
Copy link
Contributor Author

hunner commented Feb 20, 2018

This is ready-to-go.

@hunner
Copy link
Contributor Author

hunner commented Feb 23, 2018

tl;dr: Me rubberducking to figure out what to do about PRs that have been rebased outside of github's api.

Conclusion: I should add an option that checks comment events of a PR that matches --rebase-comment followed by a git SHA and consider those commits additional "merge" events for the PR.


The problem of rebased (traditional, or cherry-picked) merge commits not matching the API means that historically-included changes for existing changelogs would be removed from the changelog with no recourse other than to:

  1. mv the current CHANGELOG.md to HISTORY.md
  2. not rebase ever again

The first point isn't that bad but is annoying. The second point may be untenable to some projects. It would be nice to come up with some way to recognize rebased commits.

How could we tell?

  • If there is no merge event or merge_commit_sha this does not always indicate the lack of a rebase merge. Eg: https://api.github.com/repos/syl20bnr/spacemacs/commits/d526121202565d1d85a8bfaf0dcbb043f38b8884 on the develop branch is the rebase merge of https://api.github.com/repos/syl20bnr/spacemacs/pulls/10361 . These kinds of PRs may be too ambitious to try. And these may have not ever been included anyway since they are not considered merged by github's api.
  • If the merge_commit_sha of a given PR exists in the api but is not found in the git history, that would be the best indicator that the commit was rebased. What can we match on?
  • A given rebased commit has the same commit/author and status properties... maybe.
  • The commit titles may be the same, or may not.
  • The commits in a merged PR may or may not have been squashed into fewer commits or had some excluded during rebase.
  • The commits may reference the PR, but this would have to be done manually so there isn't any events to rely on.
  • There are single-commit two-parent merges, single-commit squash merges, and single-commit non-squashed merges. Handling each of these would be hard.
  • The merge commit may have two parents. Is the 2nd parent always the PR's commit?

There are too many ambiguous criteria around detecting merged commits, so what other options do we have?

  • We can fallback to the time-based method like always.
  • We can search the PR events for a special comment of the form "rebased commit: " that can indicate the inclusion of the PR in a tag based on a rebased sha.

The first point would mean any rebased commits would be just as wrong as they were before without a way to fix it. Not great.

The second option would solve rebased merged PRs, rebased "closed" PRs (do we want to?), cherry-picked PRs in multiple tags. Comments can be deleted or updated after merge/close to help adopting GCG for older PRs, and there can be multiple comments of this form indicating several tags in which the PR could be rebased for.

The second option is susceptible to outside influences of malicious users comment-spamming closed PRs with random SHAs unrelated to the PR. Threads could be locked by admins, but this is unconventional and non-default. But the submitter of a PR can affect the title already, and only admins can apply labels. Authors of PRs may spam garbage PRs, close them, and then leave comments so maybe just admins should be recognized in the events. What about if admins are removed in the future? Comments are no longer by admins. I think we can live with that.

@hunner hunner changed the title Use git history to find PRs in a tag instead of time (WIP) Use git history to find PRs in a tag instead of time Feb 23, 2018
@hunner hunner force-pushed the merged_ordering branch 2 times, most recently from 0a59813 to 0ee5d45 Compare February 28, 2018 01:38
@ekohl
Copy link
Contributor

ekohl commented Feb 28, 2018

When running with --unreleased-only it generates a 4.0.2...5.0.1 diff but I'd rather have the 5.0.0...5.0.1 diff. The repository in question is https://github.com/theforeman/puppet-candlepin which released 4.0.2 after 5.0.0 because we have some maintenance branches for released versions where we only want to cherry pick bugfixes.

@hunner
Copy link
Contributor Author

hunner commented Mar 5, 2018

it generates a 4.0.2...5.0.1 diff but I'd rather have the 5.0.0...5.0.1

This is because the tag comparison generation is done purely by date ordering (as most of the GCG comparisons are before this change) and the fact that the tags are on divergent branches is not currently considered.

A hash map of (branches => tags they contain) in addition to the array of time-ordered tags would be needed to resolve changelog diff comparison URLs in a realistic fashion. There isn't currently any place that needs this functionality (though it might have helped in my shas_in_tag comparison) so I'm probably not going to add it yet.

@hunner hunner force-pushed the merged_ordering branch from 26ef333 to 1617ee7 Compare March 5, 2018 23:23
@hunner
Copy link
Contributor Author

hunner commented Mar 5, 2018

Rebased onto #628 for lint.

Where would docs for the rebase comment functionality go? I see https://github.com/skywinder/github-changelog-generator/wiki/Advanced-change-log-generation-examples but I'd rather have them in the codebase than on github's wiki if possible.

@olleolleolle
Copy link
Collaborator

The man page could have a section about that, with any amount of detail. That’s a Markdown document which we use ronn to create man pages etc from.

@hunner hunner force-pushed the merged_ordering branch from 1617ee7 to 0b1eb3b Compare March 6, 2018 16:26
@@ -203,6 +203,9 @@ Automatically generate changelog from your tags, issues, labels and pull request

Displays Help

## REBASED COMMITS

Github pull requests that have been merged whose merge commit SHA has been modified through rebasing, cherry picking, or some other method may be tracked via a special comment on Github. Git commit SHAs found in comments on pull requests matching the regular expression `/rebased commit: ([0-9a-f]{40})/i` will be used in place of the original merge SHA when being added to the changelog.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally tiny thing: Github => GitHub is the on-brand spelling.

@hunner hunner force-pushed the merged_ordering branch from 38999c7 to e23378e Compare March 7, 2018 18:45
@hunner hunner changed the title (WIP) Use git history to find PRs in a tag instead of time Use git history to find PRs in a tag instead of time Mar 7, 2018
@hunner
Copy link
Contributor Author

hunner commented Mar 7, 2018

Ready for review again. @ekohl hopefully except for the tag comparison ordering this makes your changelogs generate more correctly.

@hunner
Copy link
Contributor Author

hunner commented Mar 7, 2018

Fixes #580

@@ -164,6 +164,7 @@ def fetch_closed_pull_requests
pull_requests = []
options = { state: "closed" }

# XXX Why does this override the base (HISTORY.md) setting?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the options hash to pass to the @client not the @options hash for gcg.

@hunner
Copy link
Contributor Author

hunner commented Mar 14, 2018

@olleolleolle Anything you'd like me to check into for the impacts of this change? Any other feedback before merging?

print("Associating PRs with tags: #{i}/#{total}\r") if @options[:verbose]
end
else
raise StandardError, "No merge sha found for PR #{pr['number']}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a super-weird PR with regards to the github api: puppetlabs/puppetlabs-concat#2

As mentioned there, it has no events and no merge commit. This raises an error state in the current code that I didn't expect to ever trigger. Thanks github.

hunner added 6 commits April 3, 2018 15:53
Fixes github-changelog-generator#617

Before this change PRs were associated with tags by the merge date of
the PR, regardless of git branching history. In this change the oldest
tag in which the merge commit of each PR occurs in is determined by
using the Github compare API. That association is then used to sort PRs
into their respective release.

This change uses caching of commit history fetching and comparisons for
speedup.
Unreleased PRs will be associated with the next release based on either
the --release-branch or the GitHub default branch if a release branch is
not specified

PRs not found in tags or in the release branch (due to the SHAs changing
during a rebase) may be associated with SHAs specified in GitHub
comments.
These can be confused for methods or local variables without the @
@hunner hunner force-pushed the merged_ordering branch from 7de51bd to 2f2a2fa Compare April 3, 2018 23:00
@hunner
Copy link
Contributor Author

hunner commented Apr 4, 2018

@olleolleolle Rebased again.

@olleolleolle
Copy link
Collaborator

I’m leaning towards merging this and seeing any issues we can shake out on that side of the merge.

The solution offered in this PR changes the unreleased version’s way of working - but it was wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants