-
Notifications
You must be signed in to change notification settings - Fork 2.5k
smart: use push_glob instead of manual filtering #5195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/transports/smart_protocol.c
Outdated
@@ -358,7 +314,10 @@ int git_smart__negotiate_fetch(git_transport *transport, git_repository *repo, c | |||
if ((error = git_pkt_buffer_wants(wants, count, &t->caps, &data)) < 0) | |||
return error; | |||
|
|||
if ((error = fetch_setup_walk(&walk, repo)) < 0) | |||
if ((error = git_revwalk_new(&walk, repo)) < 0) | |||
return error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
goto on_error
to clean up data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you felt like it, making data
the first param to git_pkt_buffer_wants
might help in understanding that it's an out param...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
This is in the vicinity of #5105, hence I didn't want more extensive changes here. If that's fine with you I'll tackle that out there.
045b7b6
to
13dabf3
Compare
/* 12 heads (including one packed head) + 1 note + 2 remotes + 7 tags */ | ||
assert_retrieval("*", 22); | ||
/* 12 heads (including one packed head) + 1 note + 2 remotes + 7 tags + 1 blob */ | ||
assert_retrieval("*", 23); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please excuse my ignorance, but I really don't get why there's any changes to either refs::foreachglob or refs::iterator. As far as I can see, both these test suites act on a repository that has been opened via git_repository_open
. No cloning took place and thus no negotiation via smart protocol. But the code changes you do is only in the smart protocol, so I'm wondering why values in either of those test suites should have changed at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question - I agree. I think it was a cherry-pick from that other PR, and that introduced a new ref? I think that this doesn't fit here anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I got a green build locally, and was surprised to have it come back red on CI (looking into it). AFAICS this test only adds one of those weird tags-to-non-commits, thus ensuring we correctly ignore it in whatever place we do not (😉) ; hence I'd vote to keep it. I can try adding a "clone this repo" in another testcase to exercise the actual fix to the smart transport.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But there is no file added to any test repo, so the reference counts couldn't have changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Since git
looks for the presence of file named HEAD
to determine whether a directory is a repository or not, our test resources always appear as actual repositories. So you've probably still got a dirty test directory (with those refs) but you don't know it until you actually look in there.
This has bothered me for some time.
Maybe we should treat HEAD
s like .git
files/folders and rename them during test resource copying, to prevent this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was it, sorry about that. GitX sometimes eats add
entries when building amend indexes, and it slipped out as an untracked file. This should be good now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing to be sorry about! :)
/rebuild |
Okay, @tiennou, I started to rebuild this pull request as build #2483. |
13dabf3
to
e0f3c95
Compare
if ((error = git_revwalk_new(&walk, repo)) < 0) | ||
return error; | ||
|
||
git_revwalk_sorting(walk, GIT_SORT_TIME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, one more question: previously, we've been sorting by time, now we aren't. While obvious that this will greatly improve performance as we now don't have to go through all commits, it's not immediately obvious to me why this is fine and didn't find anything in the commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I took @carlosmn's comment about blanket-replacing at face-value 😉. Checking around, pack-protocol seems to imply there's no expected order so this seems fine — though a second opinion would be welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diving into the code a bit more. Previously, we've sorted all commits by date, which would cause us to compare the newest 256 commits present in our own repos with the HAVEs of the remote. Now with push_glob
, we're pushing the commits in an unspecified order that is completely independent of the date. If one has a repo with more than 256 refs, then it may happen that we now compare 256 old commits with the HAVEs. Heuristically, it's obvious that the first variant which adds commits by date may be a lot more efficient in the common case.
This analysis neglects that previously we had to load all commits of the repo to properly sort them by time, which was quite inefficient, too. I think we can do better than that, though: if push_glob
were to use git_commit_list_insert_by_date
, then we'd avoid inserting commits in a suboptimal order but we still wouldn't suck in all the repo's commit objects.
So I'd propose to modify push_glob
to accept another parameter that determines how to sort pushed commits. push_glob
will pass than on to push_commit
, which will either use git_commit_list_insert
or git_commit_list_insert_by_date
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the in-depth analysis @pks-t. I've amended the PR by API-fying a bit more (DRYing all the functions), so that the revwalk could be tweaked accordingly. Not sure if it's worth making this API public instead of the current one — no idea how useful it would be to "outsiders" — so it's a little rough around the edges (a single flags
might have been sufficient).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(though re-reading I'm now unsure if the insertion was to happen from an "initially sorted revwalk", as previously, or if it's okay with just the insertion being done by commit date. Right now the latter is done).
Tested this change, it works fine. LGTM. |
The code worked under the assumption that anything under `refs/tags` are tag objects, and all the rest would be peelable to a commit. As it is completely valid to have tags to blobs under a non `refs/tags` ref, this would cause failures when trying to peel a tag to a commit. Fix the broken filtering by switching to `git_revwalk_push_glob`, which already handles this case.
e0f3c95
to
74512a1
Compare
74512a1
to
a71000d
Compare
Before we can tweak the revwalk to be more efficent when negotiating, we need to add an "insertion mode" option. Since there's already an implicit set of those, make it visible, at least privately.
a71000d
to
6542eac
Compare
6542eac
to
53f51c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, @tiennou! One more thing about default options, otherwise I'm happy!
@@ -38,7 +38,7 @@ git_commit_list_node *git_revwalk__commit_lookup( | |||
return commit; | |||
} | |||
|
|||
static int push_commit(git_revwalk *walk, const git_oid *oid, int uninteresting, int from_glob) | |||
int git_revwalk__push_commit(git_revwalk *walk, const git_oid *oid, const git_revwalk__push_options *opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably fall back to default options if opts == NULL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted the argument to be mandatory, mostly because the "mode" kinda is (OID vs glob, else it's ambiguous what should happen w.r.t to peeling). But I forgot to assert that though 😉.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, it's internal anyway so all callers can be checked trivially. Let's just go ahead
Thanks a lot, @tiennou! |
The code worked under the assumption that anything under
refs/tags
are tag objects, and all the rest would be peelable to a commit. As it is completely valid to have tags to blobs under a nonrefs/tags
ref, this would cause failures when trying to peel a tag to a commit (#3595).Fix the broken filtering by switching to
git_revwalk_push_glob
, which already handles this case.Fixes #3595, supersedes #4383, #5193.