-
Notifications
You must be signed in to change notification settings - Fork 533
Fix Git::Lib#commit_data
for GPG-signed commits
#610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Git::Lib#commit_data
for GPG-signed commits
#610
Conversation
Git::Lib#commit_data currently produces a malformed data hash for a commit containing a gitsig header field, with the majority of the PGP signature being considered part of the message. I'd like to fix this, so this introduces a new example repo with a single signed commit in it. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
Git::Lib#commit_data currently misparses commits containing a multiline gpgsig header, extracting only the first line as the gpgsig entry, and considering the rest of the key part of the commit message. Per the git docs[1] the gpgsig header is used to give the GPG signature of a signed commit, with continuation lines denoted by a leading space: tree eebfed94e75e7760540d1485c740902590a00332 parent 04b871796dc0420f8e7561a895b52484b701d51a author A U Thor <author@example.com> 1465981137 +0000 committer C O Mitter <committer@example.com> 1465981137 +0000 gpgsig -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 $ iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/ HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7 DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4 HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1 EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I= =jKHM -----END PGP SIGNATURE----- signed commit signed commit message body This commit adds a test and a special parsing case for the gpgsig header to accommodate this. [1] https://git-scm.com/docs/gitformat-signature#_commit_signatures Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
In the previous commit I introduced a new case for git commit header parsing to cover the gpgsig header, which supports multi-line values. I think the #process_commit_data method is getting a bit unwieldy, so this commit extracts the generic header parsing, separating it from the special-case handling of parent (which is not multi-line, but can have multiple entries and thus multiple values). By switching to a regex key/value extraction approach we're also able to avoid splitting the entire string before re-joining the value. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
The general format of `git cat-file tag` output is identical to that of `git cat-file commit`, so we can use the generic parsing helper. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
These methods always explicitly set the 'message' key of the output hash based on the `git cat-file` output, so we don't need a default in the starting hash. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
This was introduced in [1] to handle variance in indentation when parsing commit messages; at the time the #process_commit_data method had two callers, one which used it to process log lines, and the other which used it to process the output of `git cat-file commit <SHA>`. The former sees a 4-space indent, the latter, zero. Since [2], however, the log processing has been handled by the #process_commit_log_data method, so #process_commit_data exclusively handles un-indented inputs, and we can remove the indent parameter. [1] ruby-git@05117d4 [2] ruby-git@97e1fff Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
As with the previous commit, this default value was relevant when this method was also used to process git log output. It no longer is, and its only caller passes the sha value, so we can remove the default. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
As with the previous commits, this indent parameter appears to be a relic of old usage; the #process_tag_data method has a single caller that explicitly sets the indent to zero, so the indent parameter and associated handling can just be removed. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
2767684
to
392cdf5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look great!
Great job and thank you for your contribution. I think the one test case is sufficient but I don't understand the use case as well as you do. If you think the codebase would benefit from additional test cases, they would be gladly accepted as a separate PR. I am not a big fan of committing these test repos vs. creating them on the fly because it can be hard to understand the condition that is being tested. If you wanted to redo the test case and add creating the test repository on the fly, that would also be gladly accepted as a separate PR. #602 and #513 should different ways of doing that. As far as the tidying, seems great to me. Unused / untested code is technical debt. Thanks for removing it. |
Thanks for the merge!
Ah cool - I hadn't spotted that there were dynamically-created repos for some tests, so I followed the static pattern. Agreed that on-the-fly creation seems much clearer if it can be made to work with the PGP signing, I'll have a go. |
This replaces the test repo introduced in ruby-git#610 with one created on the fly in the test. I've switched from GPG to SSH signing to minimise the likelihood that we'll need extra dependencies on CI or other contributors' machines. I've also removed some of the commit metadata assertions that didn't feel particularly necessary, and this allowed me to reduce the setup steps for clarity.
This replaces the test repo introduced in ruby-git#610 with one created on the fly in the test. I've switched from GPG to SSH signing to minimise the likelihood that we'll need extra dependencies on CI or other contributors' machines. I've also removed some of the commit metadata assertions that didn't feel particularly necessary, and this allowed me to reduce the setup steps for clarity. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
This replaces the test repo introduced in #610 with one created on the fly in the test. I've switched from GPG to SSH signing to minimise the likelihood that we'll need extra dependencies on CI or other contributors' machines. I've also removed some of the commit metadata assertions that didn't feel particularly necessary, and this allowed me to reduce the setup steps for clarity. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
Your checklist for this pull request
🚨Please review the guidelines for contributing to this repository.
Description
Git::Lib#commit_data
currently misparses commits containing a multilinegpgsig
header, extracting only the first line as thegpgsig
entry, and considering the rest of the key part of the commit message. For example, in this repo:Per the git docs the
gpgsig
header is used to give the GPG signature of a signed commit, with continuation lines denoted by a leading space:This PR updates
Git::Lib#commit_data
to support multiline header values, includinggpgsig
.Feedback
All feedback is most welcome, but in particular:
#process_{x}_data
methods to remove unused parameters or default values that confused me when reading the code. If you'd prefer I kept that sort of thing to a separate PR, I'd be happy to do so.