Skip to content

smart: use push_glob instead of manual filtering #5195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 13, 2019

Conversation

tiennou
Copy link
Contributor

@tiennou tiennou commented Jul 31, 2019

The code worked under the assumption that anything under refs/tags are tag objects, and all the rest would be peelable to a commit. As it is completely valid to have tags to blobs under a non refs/tags ref, this would cause failures when trying to peel a tag to a commit (#3595).

Fix the broken filtering by switching to git_revwalk_push_glob, which already handles this case.

Fixes #3595, supersedes #4383, #5193.

@@ -358,7 +314,10 @@ int git_smart__negotiate_fetch(git_transport *transport, git_repository *repo, c
if ((error = git_pkt_buffer_wants(wants, count, &t->caps, &data)) < 0)
return error;

if ((error = fetch_setup_walk(&walk, repo)) < 0)
if ((error = git_revwalk_new(&walk, repo)) < 0)
return error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goto on_error to clean up data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you felt like it, making data the first param to git_pkt_buffer_wants might help in understanding that it's an out param...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

This is in the vicinity of #5105, hence I didn't want more extensive changes here. If that's fine with you I'll tackle that out there.

/* 12 heads (including one packed head) + 1 note + 2 remotes + 7 tags */
assert_retrieval("*", 22);
/* 12 heads (including one packed head) + 1 note + 2 remotes + 7 tags + 1 blob */
assert_retrieval("*", 23);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please excuse my ignorance, but I really don't get why there's any changes to either refs::foreachglob or refs::iterator. As far as I can see, both these test suites act on a repository that has been opened via git_repository_open. No cloning took place and thus no negotiation via smart protocol. But the code changes you do is only in the smart protocol, so I'm wondering why values in either of those test suites should have changed at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - I agree. I think it was a cherry-pick from that other PR, and that introduced a new ref? I think that this doesn't fit here anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I got a green build locally, and was surprised to have it come back red on CI (looking into it). AFAICS this test only adds one of those weird tags-to-non-commits, thus ensuring we correctly ignore it in whatever place we do not (😉) ; hence I'd vote to keep it. I can try adding a "clone this repo" in another testcase to exercise the actual fix to the smart transport.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there is no file added to any test repo, so the reference counts couldn't have changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Since git looks for the presence of file named HEAD to determine whether a directory is a repository or not, our test resources always appear as actual repositories. So you've probably still got a dirty test directory (with those refs) but you don't know it until you actually look in there.

This has bothered me for some time.

Maybe we should treat HEADs like .git files/folders and rename them during test resource copying, to prevent this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was it, sorry about that. GitX sometimes eats add entries when building amend indexes, and it slipped out as an untracked file. This should be good now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to be sorry about! :)

@tiennou
Copy link
Contributor Author

tiennou commented Aug 1, 2019

/rebuild

@libgit2-azure-pipelines
Copy link

Okay, @tiennou, I started to rebuild this pull request as build #2483.

@tiennou tiennou force-pushed the fix/commitish-smart-push branch from 13dabf3 to e0f3c95 Compare August 1, 2019 11:13
if ((error = git_revwalk_new(&walk, repo)) < 0)
return error;

git_revwalk_sorting(walk, GIT_SORT_TIME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, one more question: previously, we've been sorting by time, now we aren't. While obvious that this will greatly improve performance as we now don't have to go through all commits, it's not immediately obvious to me why this is fine and didn't find anything in the commit message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I took @carlosmn's comment about blanket-replacing at face-value 😉. Checking around, pack-protocol seems to imply there's no expected order so this seems fine — though a second opinion would be welcome.

Copy link
Member

@pks-t pks-t Aug 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diving into the code a bit more. Previously, we've sorted all commits by date, which would cause us to compare the newest 256 commits present in our own repos with the HAVEs of the remote. Now with push_glob, we're pushing the commits in an unspecified order that is completely independent of the date. If one has a repo with more than 256 refs, then it may happen that we now compare 256 old commits with the HAVEs. Heuristically, it's obvious that the first variant which adds commits by date may be a lot more efficient in the common case.

This analysis neglects that previously we had to load all commits of the repo to properly sort them by time, which was quite inefficient, too. I think we can do better than that, though: if push_glob were to use git_commit_list_insert_by_date, then we'd avoid inserting commits in a suboptimal order but we still wouldn't suck in all the repo's commit objects.

So I'd propose to modify push_glob to accept another parameter that determines how to sort pushed commits. push_glob will pass than on to push_commit, which will either use git_commit_list_insert or git_commit_list_insert_by_date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the in-depth analysis @pks-t. I've amended the PR by API-fying a bit more (DRYing all the functions), so that the revwalk could be tweaked accordingly. Not sure if it's worth making this API public instead of the current one — no idea how useful it would be to "outsiders" — so it's a little rough around the edges (a single flags might have been sufficient).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(though re-reading I'm now unsure if the insertion was to happen from an "initially sorted revwalk", as previously, or if it's okay with just the insertion being done by commit date. Right now the latter is done).

@biodranik
Copy link

Tested this change, it works fine. LGTM.

The code worked under the assumption that anything under `refs/tags` are
tag objects, and all the rest would be peelable to a commit. As it is
completely valid to have tags to blobs under a non `refs/tags` ref, this
would cause failures when trying to peel a tag to a commit.

Fix the broken filtering by switching to `git_revwalk_push_glob`, which
already handles this case.
@tiennou tiennou force-pushed the fix/commitish-smart-push branch from e0f3c95 to 74512a1 Compare August 21, 2019 17:49
@tiennou tiennou force-pushed the fix/commitish-smart-push branch from 74512a1 to a71000d Compare August 23, 2019 13:01
Before we can tweak the revwalk to be more efficent when negotiating,
we need to add an "insertion mode" option. Since there's already an implicit
set of those, make it visible, at least privately.
@tiennou tiennou force-pushed the fix/commitish-smart-push branch from a71000d to 6542eac Compare August 23, 2019 13:17
@tiennou tiennou force-pushed the fix/commitish-smart-push branch from 6542eac to 53f51c6 Compare August 23, 2019 13:22
Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, @tiennou! One more thing about default options, otherwise I'm happy!

@@ -38,7 +38,7 @@ git_commit_list_node *git_revwalk__commit_lookup(
return commit;
}

static int push_commit(git_revwalk *walk, const git_oid *oid, int uninteresting, int from_glob)
int git_revwalk__push_commit(git_revwalk *walk, const git_oid *oid, const git_revwalk__push_options *opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably fall back to default options if opts == NULL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted the argument to be mandatory, mostly because the "mode" kinda is (OID vs glob, else it's ambiguous what should happen w.r.t to peeling). But I forgot to assert that though 😉.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, it's internal anyway so all callers can be checked trivially. Let's just go ahead

@pks-t pks-t merged commit 5d8a465 into libgit2:master Sep 13, 2019
@pks-t
Copy link
Member

pks-t commented Sep 13, 2019

Thanks a lot, @tiennou!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libgit2 assumes everything not under refs/tags are commits
4 participants