Skip to content

DRY commit parsing #4445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 3, 2019
Merged

Conversation

tiennou
Copy link
Contributor

@tiennou tiennou commented Dec 13, 2017

Splitted from #4331. See this PR for context.

This adds a "quick" mode to our commit parsing machinery, and makes the revwalk code instead of going with its own. The outcome is a nice, single place where grafts can be handled.

This was referenced Dec 13, 2017
Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the code deduplication here, thanks for that! Despite my comments, there seems to be a reproducible segfault on Win64 platforms. At least both W64 builds have segfaulted at the exact same point.

#include "odb.h"
#include "commit.h"
#include "signature.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't really see why repository.h and signature.h are required now. At least that's not obvious from the diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from a previous attempt, where I tried to make that parser use already existing functions, before I noticed the "main" parse code. Will remove if that's unwarranted (I don't think it is).

@@ -19,6 +19,7 @@
#include "message.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the commits are out of order. You first use the new generic mechanism and introduce it only in the next commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I caused some historic wreckage when doing some rebases, so when date-sorting, the commits are reverted. I already tried to fix it but failed, so I think I'll end up extracting patches and reapplying manually.

src/commit.c Outdated
buffer += strlen("tree ") + GIT_OID_HEXSZ + 1;
}

if (flags & (GIT_COMMIT_PARSE_ALL|GIT_COMMIT_PARSE_QUICK)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this condition just be removed? You're doing it for all flags anyway, so no real need to check for those flags.

Copy link
Contributor Author

@tiennou tiennou Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that mostly for completeness — that makes it easier to spot the differences between ALL and QUICK, and also (somewhat) future-proof. I can remove though, if that's preferred.

src/commit.c Outdated
@@ -428,11 +441,16 @@ int git_commit__parse(void *_commit, git_odb_object *odb_obj)
}

/* Always parse the committer; we need the commit time */
commit->committer = git__malloc(sizeof(git_signature));
GITERR_CHECK_ALLOC(commit->committer);
if (flags & (GIT_COMMIT_PARSE_ALL|GIT_COMMIT_PARSE_QUICK)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here -- no need for the conditional if you do so for all available flags anyway.

src/commit.h Outdated
@@ -36,4 +36,11 @@ struct git_commit {
void git_commit__free(void *commit);
int git_commit__parse(void *commit, git_odb_object *obj);

typedef enum {
GIT_COMMIT_PARSE_QUICK = (1 << 0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add some documentation here stating what those two flags mean and what information exactly is not going to get parsed for QUICK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, will do.

@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from a77f9b3 to 80d0eca Compare December 15, 2017 17:13
@tiennou
Copy link
Contributor Author

tiennou commented Dec 15, 2017

git rebase master --ignore-date did the trick w.r.t ordering. I'll investigate the W64 crash, but I'm a slight disadvantage here…

@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from 80d0eca to 22aafe3 Compare December 15, 2017 17:27
@pks-t
Copy link
Member

pks-t commented Jan 12, 2018

I've debugged and fixed the issue. You were underallocating the commit object due to using git_odb_object_size instead of git_object__size.

@tiennou
Copy link
Contributor Author

tiennou commented Jan 17, 2018

@pks-t Many thanks for the fix ! :shipit: 🌟

@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from ad936af to 2a88f8c Compare February 2, 2018 23:50
@tiennou
Copy link
Contributor Author

tiennou commented Feb 12, 2018

There you go !

@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from 5ea2118 to a6aac31 Compare April 22, 2018 13:30
@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from a6aac31 to 2c70a29 Compare July 1, 2018 10:28
@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from 2c70a29 to 3f6ce50 Compare August 2, 2018 17:14
@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch 4 times, most recently from ce657ea to cd6f417 Compare October 25, 2018 21:44
@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from cd6f417 to 2bb3111 Compare February 3, 2019 09:22
@tiennou tiennou force-pushed the shallow/dry-commit-parsing branch from 2bb3111 to e675aef Compare May 21, 2019 23:05
Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this was lingering a lot longer than intended. Hope the additional remark isn't too frustrating after such a long time.

src/commit.h Outdated
@@ -37,4 +37,11 @@ void git_commit__free(void *commit);
int git_commit__parse(void *commit, git_odb_object *obj);
int git_commit__parse_raw(void *commit, const char *data, size_t size);

typedef enum {
GIT_COMMIT_PARSE_QUICK = (1 << 0), /**< Only parse parents and committer info */
GIT_COMMIT_PARSE_ALL = (1 << 1), /**< Parse all commit data */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... do these two flags really make sense? After all, you're not using PARSE_QUICK anywhere in git_commit__parse_internal and in fact PARSE_QUICK and PARSE_ALL do not make any sense together, right? I guess having only one of them would be more sensible, and I lean towards retaining PARSE_QUICK. In that case, default behaviour would be to parse everything and by passing the optional PARSE_QUICK flag we would skip certain information

src/commit.c Outdated
@@ -383,24 +384,32 @@ int git_commit_amend(
return error;
}

int git_commit__parse_raw(void *_commit, const char *data, size_t size)
static int git_commit__parse_internal(git_commit *commit, const char *data, size_t size, git_commit__parse_flags flags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I now realize that this really accepts an enum, only. I first thought that this was a bitfield due to its naming (flags instead of flag). So another route to make this a tad easier to grasp might be to just rename it to git_commit__parse_mode mode, as it's not really about flags at all

This allows us to pick which data from a commit we're interested in.
This will be used by the revwalk code, which is only interested in
parents' and committer data.
@pks-t pks-t force-pushed the shallow/dry-commit-parsing branch 2 times, most recently from e153232 to 93288fb Compare October 3, 2019 06:45
Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the flags parameter to be an unsigned int to fix problems with or'ing multiple enums with C++ compilers, removed the GIT_COMMIT_PARSE_ALL enum which was redundant and fixed two memory allocation issues. This is good to go now, will merge as soon as CI is green

@pks-t pks-t force-pushed the shallow/dry-commit-parsing branch from 93288fb to 5988cf3 Compare October 3, 2019 06:59
@pks-t
Copy link
Member

pks-t commented Oct 3, 2019

Meh, there's some fallout from recent changes (e.g. changing to Werror on Win32, deprecation of git_object__size). I'll follow up shortly to fix these issues.

@pks-t
Copy link
Member

pks-t commented Oct 3, 2019

@ethomson: want to give this another round of reviews or may I just proceed and merge?


buffer += parent_len;
if ((error = git_commit__parse_ext(commit, obj, GIT_COMMIT_PARSE_QUICK)) < 0 ||
!git__is_uint16(git_array_size(commit->parent_ids))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set an error message if git__is_uint16 is false.

I’d recommend restructuring this a bit to preserve the parse error in case we want to start using more exotic error codes, but that’s rather optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, thanks for catching! I now set an error message and also restructured to retain the error code.

@ethomson
Copy link
Member

ethomson commented Oct 3, 2019

Thanks for the ping. One minor request but otherwise 👍

commit->parents[i] = git_revwalk__commit_lookup(walk, &oid);
if (commit->parents[i] == NULL)
return -1;
commit = git__calloc(1, sizeof(*commit));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few "I don't remember" commits related to that, as fixups in the actual shallow implementation, here and here. AFAIR, I was getting asserts/crashes in cache.c, until I made the initialization like this. But it's fuzzy now, and I think I wasn't able to extract a testcase for it anyways. Now that I'm thinking about it, maybe that's because the ODB memory semantics being weird 😆 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are! That's why we've deprecated the git_object__size thingy, because nobody was able to remember what it actually does. And in most cases, it did not do what was intended.

The commit list's in- and out-degrees are currently stored as `unsigned
short`. When assigning it the value of `git_array_size`, which returns
an `size_t`, this generates a warning on some Win32 platforms due to
loosing precision.

We could just cast the returned value of `git_array_size`, which would
work fine for 99.99% of all cases as commits typically have less than
2^16 parents. For crafted commits though we might end up with a wrong
value, and thus we should definitely check whether the array size
actually fits into the field.

To ease the check, let's convert the fields to store the degrees as
`uint16_t`. We shouldn't rely on such unspecific types anyway, as it may
lead to different behaviour across platforms. Furthermore, this commit
introduces a new `git__is_uint16` function to check whether it actually
fits -- if not, we return an error.
@pks-t pks-t force-pushed the shallow/dry-commit-parsing branch from a46ad20 to 5cf17e0 Compare October 3, 2019 10:24
@pks-t
Copy link
Member

pks-t commented Oct 3, 2019

@ethomson: amended to fix your remarks, thanks!

@ethomson
Copy link
Member

ethomson commented Oct 3, 2019

:squirrel:

@pks-t pks-t merged commit f04a58b into libgit2:master Oct 3, 2019
@pks-t
Copy link
Member

pks-t commented Oct 3, 2019

Thanks a lot, both to you @tiennou for working on this and @ethomson for the review!

@tiennou
Copy link
Contributor Author

tiennou commented Oct 3, 2019

Thanks for last-mile touch ups @pks-t ! Glad to see duplicated code go away 😉.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants