Skip to content

Commit b6e031d

Browse files
Fix Git::Lib#commit_data for GPG-signed commits (#610)
* Introduce example repo with a signed commit Git::Lib#commit_data currently produces a malformed data hash for a commit containing a gitsig header field, with the majority of the PGP signature being considered part of the message. I'd like to fix this, so this introduces a new example repo with a single signed commit in it. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> * Fix parsing of multiline gpgsig commit header field Git::Lib#commit_data currently misparses commits containing a multiline gpgsig header, extracting only the first line as the gpgsig entry, and considering the rest of the key part of the commit message. Per the git docs[1] the gpgsig header is used to give the GPG signature of a signed commit, with continuation lines denoted by a leading space: tree eebfed94e75e7760540d1485c740902590a00332 parent 04b871796dc0420f8e7561a895b52484b701d51a author A U Thor <author@example.com> 1465981137 +0000 committer C O Mitter <committer@example.com> 1465981137 +0000 gpgsig -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 $ iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/ HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7 DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4 HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1 EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I= =jKHM -----END PGP SIGNATURE----- signed commit signed commit message body This commit adds a test and a special parsing case for the gpgsig header to accommodate this. [1] https://git-scm.com/docs/gitformat-signature#_commit_signatures Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> * Extract commit header parsing In the previous commit I introduced a new case for git commit header parsing to cover the gpgsig header, which supports multi-line values. I think the #process_commit_data method is getting a bit unwieldy, so this commit extracts the generic header parsing, separating it from the special-case handling of parent (which is not multi-line, but can have multiple entries and thus multiple values). By switching to a regex key/value extraction approach we're also able to avoid splitting the entire string before re-joining the value. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> * Use cat-file header parsing for Git::Lib#tag_data The general format of `git cat-file tag` output is identical to that of `git cat-file commit`, so we can use the generic parsing helper. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> * Remove unnecessary default tag/commit message values These methods always explicitly set the 'message' key of the output hash based on the `git cat-file` output, so we don't need a default in the starting hash. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> * Remove #process_commmit_data indent parameter This was introduced in [1] to handle variance in indentation when parsing commit messages; at the time the #process_commit_data method had two callers, one which used it to process log lines, and the other which used it to process the output of `git cat-file commit <SHA>`. The former sees a 4-space indent, the latter, zero. Since [2], however, the log processing has been handled by the #process_commit_log_data method, so #process_commit_data exclusively handles un-indented inputs, and we can remove the indent parameter. [1] 05117d4 [2] 97e1fff Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> * Remove #process_commit_data default sha value As with the previous commit, this default value was relevant when this method was also used to process git log output. It no longer is, and its only caller passes the sha value, so we can remove the default. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> * Remove #process_tag_data indent parameter As with the previous commits, this indent parameter appears to be a relic of old usage; the #process_tag_data method has a single caller that explicitly sets the indent to zero, so the indent parameter and associated handling can just be removed. Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> --------- Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com> Co-authored-by: James Couball <jcouball@yahoo.com>
1 parent b12b820 commit b6e031d

File tree

12 files changed

+99
-26
lines changed

12 files changed

+99
-26
lines changed

lib/git/lib.rb

Lines changed: 29 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -221,54 +221,57 @@ def object_size(sha)
221221
def commit_data(sha)
222222
sha = sha.to_s
223223
cdata = command_lines('cat-file', 'commit', sha)
224-
process_commit_data(cdata, sha, 0)
224+
process_commit_data(cdata, sha)
225225
end
226226

227-
def process_commit_data(data, sha = nil, indent = 4)
227+
def process_commit_data(data, sha)
228228
hsh = {
229-
'sha' => sha,
230-
'message' => '',
231-
'parent' => []
229+
'sha' => sha,
230+
'parent' => []
232231
}
233232

234-
loop do
235-
key, *value = data.shift.split
236-
237-
break if key.nil?
238-
233+
each_cat_file_header(data) do |key, value|
239234
if key == 'parent'
240-
hsh['parent'] << value.join(' ')
235+
hsh['parent'] << value
241236
else
242-
hsh[key] = value.join(' ')
237+
hsh[key] = value
243238
end
244239
end
245240

246-
hsh['message'] = data.collect {|line| line[indent..-1]}.join("\n") + "\n"
241+
hsh['message'] = data.join("\n") + "\n"
247242

248243
return hsh
249244
end
250245

246+
CAT_FILE_HEADER_LINE = /\A(?<key>\w+) (?<value>.*)\z/
247+
248+
def each_cat_file_header(data)
249+
while (match = CAT_FILE_HEADER_LINE.match(data.shift))
250+
key = match[:key]
251+
value_lines = [match[:value]]
252+
253+
while data.first.start_with?(' ')
254+
value_lines << data.shift.lstrip
255+
end
256+
257+
yield key, value_lines.join("\n")
258+
end
259+
end
260+
251261
def tag_data(name)
252262
sha = sha.to_s
253263
tdata = command_lines('cat-file', 'tag', name)
254-
process_tag_data(tdata, name, 0)
264+
process_tag_data(tdata, name)
255265
end
256266

257-
def process_tag_data(data, name, indent=4)
258-
hsh = {
259-
'name' => name,
260-
'message' => ''
261-
}
262-
263-
loop do
264-
key, *value = data.shift.split
265-
266-
break if key.nil?
267+
def process_tag_data(data, name)
268+
hsh = { 'name' => name }
267269

268-
hsh[key] = value.join(' ')
270+
each_cat_file_header(data) do |key, value|
271+
hsh[key] = value
269272
end
270273

271-
hsh['message'] = data.collect {|line| line[indent..-1]}.join("\n") + "\n"
274+
hsh['message'] = data.join("\n") + "\n"
272275

273276
return hsh
274277
end
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
ref: refs/heads/main
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
[core]
2+
repositoryformatversion = 0
3+
filemode = true
4+
bare = false
5+
logallrefupdates = true
6+
ignorecase = true
7+
precomposeunicode = true
137 Bytes
Binary file not shown.
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
0000000000000000000000000000000000000000 a043c677c93d9f2b285771a29742cc73715e41ea Simon Coffey <simon.coffey@futurelearn.com> 1673868871 +0000 commit (initial): Signed commit
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
0000000000000000000000000000000000000000 a043c677c93d9f2b285771a29742cc73715e41ea Simon Coffey <simon.coffey@futurelearn.com> 1673868871 +0000 commit (initial): Signed commit
Binary file not shown.
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
a043c677c93d9f2b285771a29742cc73715e41ea

tests/files/signed_commits/notes.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
it is very important that changes to this file are verified

tests/units/test_signed_commits.rb

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
#!/usr/bin/env ruby
2+
3+
require File.dirname(__FILE__) + '/../test_helper'
4+
require "fileutils"
5+
6+
class TestSignedCommits < Test::Unit::TestCase
7+
def git_working_dir
8+
cwd = FileUtils.pwd
9+
if File.directory?(File.join(cwd, 'files'))
10+
test_dir = File.join(cwd, 'files')
11+
elsif File.directory?(File.join(cwd, '..', 'files'))
12+
test_dir = File.join(cwd, '..', 'files')
13+
elsif File.directory?(File.join(cwd, 'tests', 'files'))
14+
test_dir = File.join(cwd, 'tests', 'files')
15+
end
16+
17+
create_temp_repo(File.expand_path(File.join(test_dir, 'signed_commits')))
18+
end
19+
20+
def create_temp_repo(clone_path)
21+
filename = 'git_test' + Time.now.to_i.to_s + rand(300).to_s.rjust(3, '0')
22+
@tmp_path = File.join("/tmp/", filename)
23+
FileUtils.mkdir_p(@tmp_path)
24+
FileUtils.cp_r(clone_path, @tmp_path)
25+
tmp_path = File.join(@tmp_path, File.basename(clone_path))
26+
Dir.chdir(tmp_path) do
27+
FileUtils.mv('dot_git', '.git')
28+
end
29+
tmp_path
30+
end
31+
32+
def setup
33+
@lib = Git.open(git_working_dir).lib
34+
end
35+
36+
def test_commit_data
37+
data = @lib.commit_data('a043c677c93d9f2b')
38+
39+
assert_equal('Simon Coffey <simon.coffey@futurelearn.com> 1673868871 +0000', data['author'])
40+
assert_equal('92fd5b7c1aeb6a4e2602799f01608b3deebbad2d', data['tree'])
41+
assert_equal(<<~EOS.chomp, data['gpgsig'])
42+
-----BEGIN PGP SIGNATURE-----
43+
44+
iHUEABYKAB0WIQRmiEtd91BkbBpcgV2yCJ+VnJz/iQUCY8U2cgAKCRCyCJ+VnJz/
45+
ibjyAP48dGdoFgWL2BjV3CnmebdVjEjTCQtF2QGUybJsyJhhcwEAwbzAAGt3YHfS
46+
uuLNH9ki9Sqd+/CH+L8Q2dPM5F4l3gg=
47+
=3ATn
48+
-----END PGP SIGNATURE-----
49+
EOS
50+
assert_equal(<<~EOS, data['message'])
51+
Signed commit
52+
53+
This will allow me to test commit data extraction for signed commits.
54+
I'm making the message multiline to make sure that message extraction is
55+
preserved.
56+
EOS
57+
end
58+
end

0 commit comments

Comments
 (0)