Skip to content

Shallow support v2 #5254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
May 9, 2023
Merged

Shallow support v2 #5254

merged 14 commits into from
May 9, 2023

Conversation

pks-t
Copy link
Member

@pks-t pks-t commented Oct 3, 2019

This kind-of succeeds the pull request #4331 from @tiennou, as he indicated that he's busy with the libssh backend for now. I've done quite a lot of changes to the interface itself:

  • Grafts are now a "real" standalone structure git_grafts to help refactoring and extension in the future, should the need ever arise.
  • Grafts can now be directly tied to a file so that one can just call git_grafts_refresh. This will also make it easier in the future to implement functions like git_grafts_write, for example when creating shallow clones or if users need to amend the grafts files on their own.
  • The grafts code is a lot more encapsulated in "grafts.c". The structure itself is now completely opaque to its users, so that all logic with respects to graft commit storage and parsing of files is contained in "grafts.c".

@tiennou pointed out already that grafts are in fact kind of deprecated, so some of the listed benefits may never apply. But I still think that the resulting code is a lot easier to read by totally encapsulating it into its own code module.

tiennou
tiennou previously approved these changes Oct 3, 2019
Copy link
Contributor

@tiennou tiennou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few little bits and bobs, and comments/insights about it, especially in the grand scheme of protocol support changes 😉, but I'm glad to see this finally shape up !


assert(grafts);

git_oidmap_foreach_value(grafts->commits, graft, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cooool 😉.


error = git_futils_readbuffer_updated(&contents, grafts->path,
&grafts->path_checksum, &updated);
if (error < 0 || error == GIT_ENOTFOUND || !updated) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the kind of thing where I'm wondering "how would it work after I add the write-side of it" (ie. the actual part where the grafts are updated from the server's view). Swallowing ENOTFOUND would make an externally-triggered delete of the graft file (eg. an out-of-process unshallow takes place which just unlink() the file, stuff like that 😉). I haven't checked the semantics of readbuffer_updated in this case…

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's removed then it would return GIT_ENOTFOUND. But in fact, we should clear the grafts' contents in that case, which I missed out for now, so that definitely needs to be added

cl_git_sandbox_cleanup();
}

void test_repo_grafts__graft_register(void)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/register/add

git_oid_equal(&tmp_oid, &oids.ids[1])) ||
(git_oid_equal(&g_shallow_oid, &oids.ids[1]) &&
git_oid_equal(&tmp_oid, &oids.ids[0])));
git_oidarray_free(&oids);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… so ideally, this testcase should just verify that unlinks are also picked up by dropping all shallow roots. I forgot 😉.

const git_oid *oid;
size_t i = 0;
int error;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(out && grafts);

{
git_buf contents = GIT_BUF_INIT;
int error, updated = 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(grafts);


if ((error = git_grafts_new(&grafts)) < 0)
goto error;
grafts->path = git__strdup(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
grafts->path = git__strdup(path);
grafts->path = git__strdup(path);
GIT_ERROR_CHECK_ALLOC(grafts->path);

@pks-t pks-t force-pushed the pks/shallow-support branch from a6ceab9 to f65aa53 Compare October 3, 2019 12:43
@pks-t
Copy link
Member Author

pks-t commented Oct 3, 2019

Thanks a lot for your review! Ain't got time to do another round of updates just now, but will do so later

@pks-t pks-t force-pushed the pks/shallow-support branch from f65aa53 to c32cdcc Compare October 3, 2019 15:06
@pks-t
Copy link
Member Author

pks-t commented Oct 3, 2019

Addressed all comments by @tiennou and fixed a memory leak introduced with the refresh code

@tiennou
Copy link
Contributor

tiennou commented Oct 3, 2019

So, just for kicks, I've rebased the rest of the shallow code on top of that, and pushed it here, and it broke the only testcase I had 😉.

It seems the syntax between grafts and shallow files is different, so the parsing can't be as shared as you have it… For the rest, this is still a very WIP rebase — since your grafting API is so much better than mine, I can use it as part of the external API more easily, and now there are quite a few git_oidarray/git_array_oid_t/git_grafts mismatches/cruft. Hopefully having some was-known-good code will help 😉. You'll see parts of should-now-be-updated-#5105 as well.

@pks-t
Copy link
Member Author

pks-t commented Oct 3, 2019

So, just for kicks, I've rebased the rest of the shallow code on top of that, and pushed it here, and it broke the only testcase I had .

Bummer. I'll have a look at it. Is it because of the parsing simplification or of yet unknown cause?

It seems the syntax between grafts and shallow files is different, so the parsing can't be as shared as you have it… For the rest, this is still a very WIP rebase — since your grafting API is so much better than mine, I can use it as part of the external API more easily, and now there are quite a few git_oidarray/git_array_oid_t/git_grafts mismatches/cruft. Hopefully having some was-known-good code will help . You'll see parts of should-now-be-updated-#5105 as well.

Yeah. Honestly, I don't really like this git_oidarray and git_array_oid_t stuff as I think it's weird to use. I thought about just going all in and exposing the git_grafts object directly. So we expose a git_repository_grafts and git_repository_shallow_grafts function that allows one to retrieve the respective grafts, and then we simply expose parts of its API. What do you think about that?

@pks-t
Copy link
Member Author

pks-t commented Oct 3, 2019

With regards to the format. It should in fact be the same format as "info/grafts", except for that every such line may not have any parents. Stated differently, each line shall hold a single commit ID and nothing else. And that's a perfectly valid grafts file, right?

Could very well be though that I screwed up the parsing part while refactoring to use our own parser. Do you have a simple test grafts file that the new code fails to parse correctly?

@pks-t
Copy link
Member Author

pks-t commented Oct 3, 2019

Ah. One thing I noticed in your additions: you have a git_grafts_geti function, but indeed the grafting commits are not guaranteed to be in any particular order. Is the ordering of them important to anything?

Edit: yup, that's very likely to be the issue here. You loop around geti, but this will depend on the oidmap behaviour. Some indices in between may not be populated depending on the map's allocated size, so you won't loop through all OIDs.

@tiennou
Copy link
Contributor

tiennou commented Oct 3, 2019

Is the ordering of them important to anything?

AFAIR, it does not matter — the server will skip whatever we tell it as part of its revwalk. It's only so the server is aware of where to start the shallow pack.

@tiennou
Copy link
Contributor

tiennou commented Oct 3, 2019

Could very well be though that I screwed up the parsing part while refactoring to use our own parser. Do you have a simple test grafts file that the new code fails to parse correctly?

Yes, the code expects a separator after the OID, and thus for the shallow syntax the parser errors.
The format outputted by the write code as part of the test should work — I ran Git against the shallow clone while debugging, but it's not part of clar.

(git_oidarray/git_array_oid_t/git_grafts mismatches/cruft discussion). What do you think about that?

I'm fine with how your API looks, and looks sufficiently generic as to be used for the negotiation step. It's just about which part is made public, since any transport implementation will need to manipulate them (ie. it floats around "git2/sys/transport.h", again).

@pks-t pks-t force-pushed the pks/shallow-support branch from c32cdcc to e79bd3c Compare October 10, 2019 10:11
@pks-t
Copy link
Member Author

pks-t commented Oct 10, 2019

Yes, the code expects a separator after the OID, and thus for the shallow syntax the parser errors.

I honestly don't get it, sorry. For the shallow clones I performed, git(1) created a ".git/shallow" file that simply contained a single OID followed by newline. Nothing inbetween, no separator. I also tried to run your feature branch, but it crashed in seemingly unrelated locations (for instance dereferencing opts->depth when opts is NULL, and one crash in submodules code).

Anyway -- I've written another test suite grafts::parse that ties down the grafts parsing logic, and as far as I can see it currently behaves exactly like I'd expect it to. Please feel free to create another test case for it that shows what I'm doing wrong.

@pks-t pks-t force-pushed the pks/shallow-support branch 2 times, most recently from bdfd791 to 9e03ca3 Compare October 10, 2019 12:20
@tiennou
Copy link
Contributor

tiennou commented Oct 10, 2019

I honestly don't get it, sorry.

That's strange, I was getting one failure, on the mkfile-based testcase, because of an expected space after the graft OID. I have no idea if an empty graft line works (a.k.a is it a syntax error, or synonymous with a "unparented" commit). IIRC git hard-errors on a broken .git/info/grafts, hence the separate parser I had.

I'll definitely investigate though. AFAIK the only major think missing is "commit the shallow file" once we're sure all objects have been unpacked, a.k.a locking and transactions 😉.

@pks-t
Copy link
Member Author

pks-t commented Oct 11, 2019

IIRC git hard-errors on a broken .git/info/grafts, hence the separate parser I had.

It doesn't, it just complains loudly and visibly on every git command that gets executed. I've tested for several different cases how we behave and how git behaves. Stuff like creating cyclic commit graphs by grafting A to B and B to A and such stuff. I was surprised that we actually handled this just fine and behaved exactly like git does.

The only exception I found was grafting a commit to a tree, where we just bailed out but git just displayed the commit, but without any parents. But I guess it's perfectly fine for us to say "Nope, that's wrong, don't do it."

I'll definitely investigate though. AFAIK the only major think missing is "commit the shallow file" once we're sure all objects have been unpacked, a.k.a locking and transactions .

Cool. Just let me know when you think this PR may progress so that I can tackle the next set of features towards shallow support.

@pks-t
Copy link
Member Author

pks-t commented Oct 24, 2019

I've rebased this branch on the latest master. I've also decided to bail out of the public API and thus removed the function git_repository_shallow_roots for now. I for myself am much too uncertain yet how to design the public API and don't want to paint us into a corner just yet. I think we should get a bit more comfortable with things like both lifecycle and when to refresh and only then start exposing some certain sets of functionality, ideally based on definitive usecases.

@tiennou
Copy link
Contributor

tiennou commented Nov 5, 2019

I share the concerns, and I agree we don't want to push more design work here until the protocol part is more fleshed out (as you say, refresh & lifecycle issues). It's just that we do allow custom transports, and those transports that want to handle shallow fetches will want access as well, and thus it's doomed to become public anyways…

Also, I feel like there should be a way to know you're not really done revwalking — ideally, a GUI client might want to inform the user that there are more commits available, to propose a shallow fetch of more of them. Right now you can't say, apart from a blanket "this repo is shallow", and you can't know if the last OID you've revwalked really was the last…

@pks-t
Copy link
Member Author

pks-t commented Nov 6, 2019 via email

@Nivl
Copy link

Nivl commented Apr 27, 2020

Is this still planned for 1.1? :)

@pks-t
Copy link
Member Author

pks-t commented May 12, 2020

Is this still planned for 1.1? :)

I'd very much like to aim for it. Let me re-roll and fix conflicts to get it going again.

@pks-t pks-t force-pushed the pks/shallow-support branch from 7b6f4f7 to 1e1ea85 Compare May 12, 2020 10:57
@pks-t
Copy link
Member Author

pks-t commented May 12, 2020

/rebuild

@libgit2-azure-pipelines
Copy link

Sorry @pks-t, an error occurred while trying to requeue the build.

tiennou and others added 11 commits June 27, 2020 14:33
In order to increase maintainability in the future, we should try to
make structures as self-contained and opaque to its users as possible.
Thus it is probably not a good idea to just typedef `git_graftmap` to
`git_oidmap`, as that will make it a lot harder in the future to extend
the API in the future, if this need ever arises.

Refactor the code to instead declare a real structure `git_grafts`,
which is completely opaque to its callers.
Instead of using the newly introduced `git_buf_foreach_line`, which
modifies the buffer itself, we should try to use our existing parsing
infrastructure in "parse.h".

Convert the grafts parsing code to make use of `git_parse_ctx`. Remove
the `git_buf_foreach_line` macro, as grafts have been its sole user.
Parsing of grafts files is currently contained in the repository code.
To make grafts-related logic more self-contained, move it into
"grafts.c" instead.
The shallow roots are in fact another user of the grafting mechanism,
and in essence they do use the same file format for grafted commits.
Thus, instead of hand-coding the parsing logic a second time, we can
just reuse the `git_grafts` structure for shallow commits, as well.
When loading shallow grafts, we add each of the grafting commits to the
grafts backed by ".git/info/grafts". Keeping track of both grafts
separately, but partially storing them in a common grafts structure is
likely to lead to inconsistencies. In fact, there already are
inconsistencies if refreshing shallow grafts at a later point, as we
only refresh the shallows, but not the normal grafts in that case.

Disentangle both grafting stores and instead check both separately when
parsing commits.
The refresh logic for both "normal" and shallow grafts are currently
part of the repository code and implemented twice. Unify them into the
grafts code by introducing two new functions to create grafts from a
file and to refresh a grafts structure.
If replacing an already existing graft in the grafts map, then we need
to free the previous `git_commit_graft` structure.
Currently, we expose the function `git_repository_shallow_roots` to get
all grafted roots of the repository. This already paints us into a
corner, though, as we certainly need to experiment with some
functionality of the grafting mechanism before we can happily expose
some of its functionality. Most importantly, we need to get right when
to refresh grafts and when not.

Thus, this commit removes the public function with no public
replacement. We should first try and see what usecases people come up
with to e.g. expose the `git_grafts` mechanism directly in the future
or do something different altogether. Instead, we provide an internal
interface to get weak pointers to the grafting structs part of the
repository itself.
@pks-t pks-t force-pushed the pks/shallow-support branch from bfab05a to 79af067 Compare June 27, 2020 12:34
@pks-t
Copy link
Member Author

pks-t commented Jun 27, 2020

Rebased again to spur interest and check whether it still passes CI.

@ethomson
Copy link
Member

Rebased again to spur interest and check whether it still passes CI.

It does! 😁

Sorry, I'm slammed right now, I'll try to take an 👀 early next week.

@pks-t
Copy link
Member Author

pks-t commented Jun 28, 2020

@ethomson No worries, take your time!

@simongottschlag
Copy link

simongottschlag commented Apr 16, 2021

@pks-t @ethomson thank you so much for all the awesome work! 😊😊👍

Is there anything that can be assisted with in regards to this PR?

Libgit2/git2go is used in the Flux project (https://github.com/fluxcd/flux2) and used to clone repositories regularly into Kubernetes clusters to provide a GitOps workflow. Libgit2 and go-git are the two libraries supported but Libgit2 is the one that supports git wire protocol v2 which is needed for some Git providers and will most likely be more and more important in the future.

The challenge right now is that go-git supports shallow cloning which takes a lot less space, time and bandwidth and Libgit2 doesn't which means those using git wire protocol v2 will have a a slower experience and taking up more space as well as bandwidth.

With this PR, when it's available with git2go, Libgit2 can be used by everyone.

I hope you have a great weekend! 🚀🎉

@libgit2 libgit2 deleted a comment from RiadGahlouz Apr 16, 2021
@libgit2 libgit2 deleted a comment from phillebaba Apr 16, 2021
@libgit2 libgit2 deleted a comment from litetex Apr 16, 2021
@@ -1237,6 +1263,20 @@ int git_repository_set_index(git_repository *repo, git_index *index)
return 0;
}

int git_repository_grafts__weakptr(git_grafts **out, git_repository *repo)
{
assert(out && repo && repo->shallow_grafts);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cut & paste error here?
Presumably this should be
assert(out && repo && repo->grafts);

@jwpjrdev
Copy link

Any progress? This would be really useful for me.

@birunts
Copy link

birunts commented Jun 6, 2022

Kindly bump the question about any progress on this :)

This was referenced Aug 30, 2022
@ethomson ethomson merged commit e07aa9c into libgit2:main May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants