-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Shallow support v2 #5254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shallow support v2 #5254
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few little bits and bobs, and comments/insights about it, especially in the grand scheme of protocol support changes 😉, but I'm glad to see this finally shape up !
|
||
assert(grafts); | ||
|
||
git_oidmap_foreach_value(grafts->commits, graft, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cooool 😉.
|
||
error = git_futils_readbuffer_updated(&contents, grafts->path, | ||
&grafts->path_checksum, &updated); | ||
if (error < 0 || error == GIT_ENOTFOUND || !updated) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the kind of thing where I'm wondering "how would it work after I add the write-side of it" (ie. the actual part where the grafts are updated from the server's view). Swallowing ENOTFOUND
would make an externally-triggered delete of the graft file (eg. an out-of-process unshallow takes place which just unlink()
the file, stuff like that 😉). I haven't checked the semantics of readbuffer_updated
in this case…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's removed then it would return GIT_ENOTFOUND
. But in fact, we should clear the grafts' contents in that case, which I missed out for now, so that definitely needs to be added
tests/repo/grafts.c
Outdated
cl_git_sandbox_cleanup(); | ||
} | ||
|
||
void test_repo_grafts__graft_register(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/register/add
tests/repo/shallow.c
Outdated
git_oid_equal(&tmp_oid, &oids.ids[1])) || | ||
(git_oid_equal(&g_shallow_oid, &oids.ids[1]) && | ||
git_oid_equal(&tmp_oid, &oids.ids[0]))); | ||
git_oidarray_free(&oids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… so ideally, this testcase should just verify that unlinks are also picked up by dropping all shallow roots. I forgot 😉.
const git_oid *oid; | ||
size_t i = 0; | ||
int error; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(out && grafts); | |
{ | ||
git_buf contents = GIT_BUF_INIT; | ||
int error, updated = 0; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(grafts); | |
|
||
if ((error = git_grafts_new(&grafts)) < 0) | ||
goto error; | ||
grafts->path = git__strdup(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grafts->path = git__strdup(path); | |
grafts->path = git__strdup(path); | |
GIT_ERROR_CHECK_ALLOC(grafts->path); |
a6ceab9
to
f65aa53
Compare
Thanks a lot for your review! Ain't got time to do another round of updates just now, but will do so later |
f65aa53
to
c32cdcc
Compare
Addressed all comments by @tiennou and fixed a memory leak introduced with the refresh code |
So, just for kicks, I've rebased the rest of the shallow code on top of that, and pushed it here, and it broke the only testcase I had 😉. It seems the syntax between grafts and shallow files is different, so the parsing can't be as shared as you have it… For the rest, this is still a very WIP rebase — since your grafting API is so much better than mine, I can use it as part of the external API more easily, and now there are quite a few git_oidarray/git_array_oid_t/git_grafts mismatches/cruft. Hopefully having some was-known-good code will help 😉. You'll see parts of should-now-be-updated-#5105 as well. |
Bummer. I'll have a look at it. Is it because of the parsing simplification or of yet unknown cause?
Yeah. Honestly, I don't really like this |
With regards to the format. It should in fact be the same format as "info/grafts", except for that every such line may not have any parents. Stated differently, each line shall hold a single commit ID and nothing else. And that's a perfectly valid grafts file, right? Could very well be though that I screwed up the parsing part while refactoring to use our own parser. Do you have a simple test grafts file that the new code fails to parse correctly? |
Ah. One thing I noticed in your additions: you have a Edit: yup, that's very likely to be the issue here. You loop around |
AFAIR, it does not matter — the server will skip whatever we tell it as part of its revwalk. It's only so the server is aware of where to start the shallow pack. |
Yes, the code expects a separator after the OID, and thus for the shallow syntax the parser errors.
I'm fine with how your API looks, and looks sufficiently generic as to be used for the negotiation step. It's just about which part is made public, since any transport implementation will need to manipulate them (ie. it floats around "git2/sys/transport.h", again). |
c32cdcc
to
e79bd3c
Compare
I honestly don't get it, sorry. For the shallow clones I performed, git(1) created a ".git/shallow" file that simply contained a single OID followed by newline. Nothing inbetween, no separator. I also tried to run your feature branch, but it crashed in seemingly unrelated locations (for instance dereferencing Anyway -- I've written another test suite grafts::parse that ties down the grafts parsing logic, and as far as I can see it currently behaves exactly like I'd expect it to. Please feel free to create another test case for it that shows what I'm doing wrong. |
bdfd791
to
9e03ca3
Compare
That's strange, I was getting one failure, on the mkfile-based testcase, because of an expected space after the graft OID. I have no idea if an empty graft line works (a.k.a is it a syntax error, or synonymous with a "unparented" commit). IIRC I'll definitely investigate though. AFAIK the only major think missing is "commit the shallow file" once we're sure all objects have been unpacked, a.k.a locking and transactions 😉. |
It doesn't, it just complains loudly and visibly on every git command that gets executed. I've tested for several different cases how we behave and how git behaves. Stuff like creating cyclic commit graphs by grafting A to B and B to A and such stuff. I was surprised that we actually handled this just fine and behaved exactly like git does. The only exception I found was grafting a commit to a tree, where we just bailed out but git just displayed the commit, but without any parents. But I guess it's perfectly fine for us to say "Nope, that's wrong, don't do it."
Cool. Just let me know when you think this PR may progress so that I can tackle the next set of features towards shallow support. |
9e03ca3
to
7b6f4f7
Compare
I've rebased this branch on the latest master. I've also decided to bail out of the public API and thus removed the function |
I share the concerns, and I agree we don't want to push more design work here until the protocol part is more fleshed out (as you say, refresh & lifecycle issues). It's just that we do allow custom transports, and those transports that want to handle shallow fetches will want access as well, and thus it's doomed to become public anyways… Also, I feel like there should be a way to know you're not really done revwalking — ideally, a GUI client might want to inform the user that there are more commits available, to propose a shallow fetch of more of them. Right now you can't say, apart from a blanket "this repo is shallow", and you can't know if the last OID you've revwalked really was the last… |
On Tue, Nov 05, 2019 at 09:19:35AM -0800, Etienne Samson wrote:
I share the concerns, and I agree we don't want to push more
design work here until the protocol part is more fleshed out
(as you say, refresh & lifecycle issues). It's just that we do
allow custom transports, and those transports that want to
handle shallow fetches will want access as well, and thus it's
doomed to become public anyways…
Also, I feel like there should be a way to know you're *not
really done* revwalking — ideally, a GUI client might want to
inform the user that there are more commits available, to
propose a shallow fetch of more of them. Right now you can't
say, apart from a blanket "this repo is shallow", and you can't
know if the last OID you've revwalked really was the last…
Yeah, definitely. I'd just like to postpone it for a bit, as I
hope that we'll get some more experience with this new stuff e.g.
when the shallow protocol support lands. This PR won't make it in
for v1.0 anyway, so I'd say we should get it into the next branch
as soon as we've started stabilizing for v1.0 to let it cook,
then implement the shallow protocol on top of that and finally
come up with an external interface before v1.1 is released.
|
Is this still planned for 1.1? :) |
I'd very much like to aim for it. Let me re-roll and fix conflicts to get it going again. |
7b6f4f7
to
1e1ea85
Compare
/rebuild |
Sorry @pks-t, an error occurred while trying to requeue the build. |
In order to increase maintainability in the future, we should try to make structures as self-contained and opaque to its users as possible. Thus it is probably not a good idea to just typedef `git_graftmap` to `git_oidmap`, as that will make it a lot harder in the future to extend the API in the future, if this need ever arises. Refactor the code to instead declare a real structure `git_grafts`, which is completely opaque to its callers.
Instead of using the newly introduced `git_buf_foreach_line`, which modifies the buffer itself, we should try to use our existing parsing infrastructure in "parse.h". Convert the grafts parsing code to make use of `git_parse_ctx`. Remove the `git_buf_foreach_line` macro, as grafts have been its sole user.
Parsing of grafts files is currently contained in the repository code. To make grafts-related logic more self-contained, move it into "grafts.c" instead.
The shallow roots are in fact another user of the grafting mechanism, and in essence they do use the same file format for grafted commits. Thus, instead of hand-coding the parsing logic a second time, we can just reuse the `git_grafts` structure for shallow commits, as well.
When loading shallow grafts, we add each of the grafting commits to the grafts backed by ".git/info/grafts". Keeping track of both grafts separately, but partially storing them in a common grafts structure is likely to lead to inconsistencies. In fact, there already are inconsistencies if refreshing shallow grafts at a later point, as we only refresh the shallows, but not the normal grafts in that case. Disentangle both grafting stores and instead check both separately when parsing commits.
The refresh logic for both "normal" and shallow grafts are currently part of the repository code and implemented twice. Unify them into the grafts code by introducing two new functions to create grafts from a file and to refresh a grafts structure.
If replacing an already existing graft in the grafts map, then we need to free the previous `git_commit_graft` structure.
Currently, we expose the function `git_repository_shallow_roots` to get all grafted roots of the repository. This already paints us into a corner, though, as we certainly need to experiment with some functionality of the grafting mechanism before we can happily expose some of its functionality. Most importantly, we need to get right when to refresh grafts and when not. Thus, this commit removes the public function with no public replacement. We should first try and see what usecases people come up with to e.g. expose the `git_grafts` mechanism directly in the future or do something different altogether. Instead, we provide an internal interface to get weak pointers to the grafting structs part of the repository itself.
bfab05a
to
79af067
Compare
Rebased again to spur interest and check whether it still passes CI. |
It does! 😁 Sorry, I'm slammed right now, I'll try to take an 👀 early next week. |
@ethomson No worries, take your time! |
@pks-t @ethomson thank you so much for all the awesome work! 😊😊👍 Is there anything that can be assisted with in regards to this PR? Libgit2/git2go is used in the Flux project (https://github.com/fluxcd/flux2) and used to clone repositories regularly into Kubernetes clusters to provide a GitOps workflow. Libgit2 and go-git are the two libraries supported but Libgit2 is the one that supports git wire protocol v2 which is needed for some Git providers and will most likely be more and more important in the future. The challenge right now is that go-git supports shallow cloning which takes a lot less space, time and bandwidth and Libgit2 doesn't which means those using git wire protocol v2 will have a a slower experience and taking up more space as well as bandwidth. With this PR, when it's available with git2go, Libgit2 can be used by everyone. I hope you have a great weekend! 🚀🎉 |
@@ -1237,6 +1263,20 @@ int git_repository_set_index(git_repository *repo, git_index *index) | |||
return 0; | |||
} | |||
|
|||
int git_repository_grafts__weakptr(git_grafts **out, git_repository *repo) | |||
{ | |||
assert(out && repo && repo->shallow_grafts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cut & paste error here?
Presumably this should be
assert(out && repo && repo->grafts);
Any progress? This would be really useful for me. |
Kindly bump the question about any progress on this :) |
This kind-of succeeds the pull request #4331 from @tiennou, as he indicated that he's busy with the libssh backend for now. I've done quite a lot of changes to the interface itself:
git_grafts
to help refactoring and extension in the future, should the need ever arise.git_grafts_refresh
. This will also make it easier in the future to implement functions likegit_grafts_write
, for example when creating shallow clones or if users need to amend the grafts files on their own.@tiennou pointed out already that grafts are in fact kind of deprecated, so some of the listed benefits may never apply. But I still think that the resulting code is a lot easier to read by totally encapsulating it into its own code module.