-
Notifications
You must be signed in to change notification settings - Fork 2.5k
DRY commit parsing #4445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRY commit parsing #4445
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the code deduplication here, thanks for that! Despite my comments, there seems to be a reproducible segfault on Win64 platforms. At least both W64 builds have segfaulted at the exact same point.
src/commit_list.c
Outdated
#include "odb.h" | ||
#include "commit.h" | ||
#include "signature.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't really see why repository.h and signature.h are required now. At least that's not obvious from the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from a previous attempt, where I tried to make that parser use already existing functions, before I noticed the "main" parse code. Will remove if that's unwarranted (I don't think it is).
@@ -19,6 +19,7 @@ | |||
#include "message.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the commits are out of order. You first use the new generic mechanism and introduce it only in the next commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I caused some historic wreckage when doing some rebases, so when date-sorting, the commits are reverted. I already tried to fix it but failed, so I think I'll end up extracting patches and reapplying manually.
src/commit.c
Outdated
buffer += strlen("tree ") + GIT_OID_HEXSZ + 1; | ||
} | ||
|
||
if (flags & (GIT_COMMIT_PARSE_ALL|GIT_COMMIT_PARSE_QUICK)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this condition just be removed? You're doing it for all flags anyway, so no real need to check for those flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that mostly for completeness — that makes it easier to spot the differences between ALL
and QUICK
, and also (somewhat) future-proof. I can remove though, if that's preferred.
src/commit.c
Outdated
@@ -428,11 +441,16 @@ int git_commit__parse(void *_commit, git_odb_object *odb_obj) | |||
} | |||
|
|||
/* Always parse the committer; we need the commit time */ | |||
commit->committer = git__malloc(sizeof(git_signature)); | |||
GITERR_CHECK_ALLOC(commit->committer); | |||
if (flags & (GIT_COMMIT_PARSE_ALL|GIT_COMMIT_PARSE_QUICK)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here -- no need for the conditional if you do so for all available flags anyway.
src/commit.h
Outdated
@@ -36,4 +36,11 @@ struct git_commit { | |||
void git_commit__free(void *commit); | |||
int git_commit__parse(void *commit, git_odb_object *obj); | |||
|
|||
typedef enum { | |||
GIT_COMMIT_PARSE_QUICK = (1 << 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add some documentation here stating what those two flags mean and what information exactly is not going to get parsed for QUICK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, will do.
a77f9b3
to
80d0eca
Compare
|
80d0eca
to
22aafe3
Compare
I've debugged and fixed the issue. You were underallocating the commit object due to using |
@pks-t Many thanks for the fix ! |
ad936af
to
2a88f8c
Compare
There you go ! |
5ea2118
to
a6aac31
Compare
a6aac31
to
2c70a29
Compare
2c70a29
to
3f6ce50
Compare
ce657ea
to
cd6f417
Compare
cd6f417
to
2bb3111
Compare
2bb3111
to
e675aef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this was lingering a lot longer than intended. Hope the additional remark isn't too frustrating after such a long time.
src/commit.h
Outdated
@@ -37,4 +37,11 @@ void git_commit__free(void *commit); | |||
int git_commit__parse(void *commit, git_odb_object *obj); | |||
int git_commit__parse_raw(void *commit, const char *data, size_t size); | |||
|
|||
typedef enum { | |||
GIT_COMMIT_PARSE_QUICK = (1 << 0), /**< Only parse parents and committer info */ | |||
GIT_COMMIT_PARSE_ALL = (1 << 1), /**< Parse all commit data */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... do these two flags really make sense? After all, you're not using PARSE_QUICK
anywhere in git_commit__parse_internal
and in fact PARSE_QUICK
and PARSE_ALL
do not make any sense together, right? I guess having only one of them would be more sensible, and I lean towards retaining PARSE_QUICK
. In that case, default behaviour would be to parse everything and by passing the optional PARSE_QUICK
flag we would skip certain information
src/commit.c
Outdated
@@ -383,24 +384,32 @@ int git_commit_amend( | |||
return error; | |||
} | |||
|
|||
int git_commit__parse_raw(void *_commit, const char *data, size_t size) | |||
static int git_commit__parse_internal(git_commit *commit, const char *data, size_t size, git_commit__parse_flags flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I now realize that this really accepts an enum, only. I first thought that this was a bitfield due to its naming (flags instead of flag). So another route to make this a tad easier to grasp might be to just rename it to git_commit__parse_mode mode
, as it's not really about flags at all
This allows us to pick which data from a commit we're interested in. This will be used by the revwalk code, which is only interested in parents' and committer data.
e153232
to
93288fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the flags
parameter to be an unsigned int
to fix problems with or'ing multiple enums with C++ compilers, removed the GIT_COMMIT_PARSE_ALL
enum which was redundant and fixed two memory allocation issues. This is good to go now, will merge as soon as CI is green
93288fb
to
5988cf3
Compare
Meh, there's some fallout from recent changes (e.g. changing to Werror on Win32, deprecation of |
@ethomson: want to give this another round of reviews or may I just proceed and merge? |
src/commit_list.c
Outdated
|
||
buffer += parent_len; | ||
if ((error = git_commit__parse_ext(commit, obj, GIT_COMMIT_PARSE_QUICK)) < 0 || | ||
!git__is_uint16(git_array_size(commit->parent_ids))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set an error message if git__is_uint16
is false.
I’d recommend restructuring this a bit to preserve the parse error in case we want to start using more exotic error codes, but that’s rather optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, thanks for catching! I now set an error message and also restructured to retain the error code.
Thanks for the ping. One minor request but otherwise 👍 |
commit->parents[i] = git_revwalk__commit_lookup(walk, &oid); | ||
if (commit->parents[i] == NULL) | ||
return -1; | ||
commit = git__calloc(1, sizeof(*commit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few "I don't remember" commits related to that, as fixups in the actual shallow implementation, here and here. AFAIR, I was getting asserts/crashes in cache.c, until I made the initialization like this. But it's fuzzy now, and I think I wasn't able to extract a testcase for it anyways. Now that I'm thinking about it, maybe that's because the ODB memory semantics being weird 😆 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are! That's why we've deprecated the git_object__size
thingy, because nobody was able to remember what it actually does. And in most cases, it did not do what was intended.
The commit list's in- and out-degrees are currently stored as `unsigned short`. When assigning it the value of `git_array_size`, which returns an `size_t`, this generates a warning on some Win32 platforms due to loosing precision. We could just cast the returned value of `git_array_size`, which would work fine for 99.99% of all cases as commits typically have less than 2^16 parents. For crafted commits though we might end up with a wrong value, and thus we should definitely check whether the array size actually fits into the field. To ease the check, let's convert the fields to store the degrees as `uint16_t`. We shouldn't rely on such unspecific types anyway, as it may lead to different behaviour across platforms. Furthermore, this commit introduces a new `git__is_uint16` function to check whether it actually fits -- if not, we return an error.
a46ad20
to
5cf17e0
Compare
@ethomson: amended to fix your remarks, thanks! |
:squirrel: |
Thanks for last-mile touch ups @pks-t ! Glad to see duplicated code go away 😉. |
Splitted from #4331. See this PR for context.
This adds a "quick" mode to our commit parsing machinery, and makes the revwalk code instead of going with its own. The outcome is a nice, single place where grafts can be handled.