Skip to content

Handle repository format v1 #5388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 2, 2020
Merged

Handle repository format v1 #5388

merged 1 commit into from
Apr 2, 2020

Conversation

bk2204
Copy link
Contributor

@bk2204 bk2204 commented Feb 6, 2020

Git has supported repository format version 1 for some time. This format is just like version 0, but it supports extensions. Implementations must reject extensions that they don't support.

Add support for this format version and reject any extensions but extensions.noop, which is the only extension we currently support.

This is a prerequisite to adding SHA-256 support.

@ethomson
Copy link
Member

ethomson commented Feb 7, 2020

Implementations must reject extensions that they don't support.

We need to have a plan for this, since it's hard to really know what extensions we support. Extensions are a mixed bag of plumbing and application layer configuration.

Concretely: libgit2 can support worktreeConfig just fine, but preciousObjects is something that an application author may wish to support. We need to allow somebody to build a repacker on top of libgit2 that can honor it while obeying the "if a version-1 repository specifies any extensions.* keys that the running git has not implemented, the operation MUST NOT proceed" notion.

I presume that this means that applications will need a mechanism by which they can register that they understand a particular extension, and then we can just claim understanding on their behalf when opening a config.

One gotcha here is that git has specified identical language in the unknown extension in the configuration case as the unknown value to an extension configuration case:

If a version-1 repository specifies any extensions.* keys that the running git has not implemented, the operation MUST NOT proceed. Similarly, if the value of any known key is not understood by the implementation, the operation MUST NOT proceed.

One is to assume that "the operation MUST NOT proceed" in the first sentence must refer to literally any git operation, because you have no way of knowing whether the extension applies to the operation that you were asked to do. That is to say, if an implementation doesn't understand the preciousObjects extension, then it can't do anything at all and all operations must fail.

However, if one is to apply that same thinking to the next sentence, then does git clone need to validate that the value of preciousObjects is "understood by the implementation"? This would mean that we would need implementations to give us some list of values that are legitimate, which is terrible.

🤔

@ethomson
Copy link
Member

ethomson commented Feb 7, 2020

For more information, see the comments in #3253

@ethomson
Copy link
Member

ethomson commented Feb 7, 2020

Also - to clarify my comments above - I'm not looking for you to solve this problem here, I'm just talking it over. Obviously there are a great many extensions that we can implement ourselves without worrying about application support above us and certainly noop is one of them. ;)

@tiennou
Copy link
Contributor

tiennou commented Feb 7, 2020

There's also some points here, about one way to implement those "extensions points", at the protocol level. I planned to share that with the repository layer, but never went through with it… It's been a while, as well 😉.

@bk2204
Copy link
Contributor Author

bk2204 commented Feb 7, 2020

Implementations must reject extensions that they don't support.

We need to have a plan for this, since it's hard to really know what extensions we support. Extensions are a mixed bag of plumbing and application layer configuration.

Concretely: libgit2 can support worktreeConfig just fine, but preciousObjects is something that an application author may wish to support. We need to allow somebody to build a repacker on top of libgit2 that can honor it while obeying the "if a version-1 repository specifies any extensions.* keys that the running git has not implemented, the operation MUST NOT proceed" notion.

I presume that this means that applications will need a mechanism by which they can register that they understand a particular extension, and then we can just claim understanding on their behalf when opening a config.

I think there are several options that we can support out of the box just natively. You've mentioned worktreeConfig as one; noop is a trivial example of another. We may want to allow people to indicate they support certain, specified other ones, such as preciousObjects. However, that should be a whitelist of options we know people can support. If someone lied to us and said they supported extensions.objectFormat, we should not allow them to do that, because doing so can and will corrupt the repository.

All of that, however, is orthogonal to this series, which implements only the barest, most essential v1 repository format, and nothing more. My goal is to get SHA-256 support working, so at the moment, I'm not planning to offer folks a way to register extensions. That would be good and valuable work for someone else to pick up, though.

One gotcha here is that git has specified identical language in the unknown extension in the configuration case as the unknown value to an extension configuration case:

If a version-1 repository specifies any extensions.* keys that the running git has not implemented, the operation MUST NOT proceed. Similarly, if the value of any known key is not understood by the implementation, the operation MUST NOT proceed.

One is to assume that "the operation MUST NOT proceed" in the first sentence must refer to literally any git operation, because you have no way of knowing whether the extension applies to the operation that you were asked to do. That is to say, if an implementation doesn't understand the preciousObjects extension, then it can't do anything at all and all operations must fail.

Yes, that's my understanding. If the config is set and not understood, no operation by that implementation may proceed on the repository.

However, if one is to apply that same thinking to the next sentence, then does git clone need to validate that the value of preciousObjects is "understood by the implementation"? This would mean that we would need implementations to give us some list of values that are legitimate, which is terrible.

Yes, git clone needs to hard fail if the config is set in the global or system configuration and it's not understood. That's the way Git works, and it's important for data integrity if people are relying on their Git server not destroying objects (e.g., for compliance or backup reasons).

I think we can have a list of extensions which we are willing to support, whether internally or with help from the program calling libgit2. If we don't know about it, then users can't specify that they handle it. That's the safe and robust approach.

@ethomson
Copy link
Member

ethomson commented Feb 8, 2020

I think we can have a list of extensions which we are willing to support, whether internally or with help from the program calling libgit2. If we don't know about it, then users can't specify that they handle it. That's the safe and robust approach.

I agree. My concern is that the interface for users to specify this is severely disappointing, since they have to specify all the values that they support as well.

If the extension model in git was to define the supported value range and keep it immutable, it would be significantly easier for us. That's not a thing we can solve here, of course, but I'll bring this up on the git mailing list.

@tiennou
Copy link
Contributor

tiennou commented Feb 10, 2020

My own "design plan" had been to get #4856 merged first, mostly because the current repository open path is somewhat convoluted (there's a few discrepancies because of custom/in-memory repositories, worktrees and envvars collusion 😉). This would enable passing the supported list of options could be passed as part of repository opening, as a git_strarray, or maybe some "validation" callback ; which could also benefit the custom/in-memory users…

How to check for options would then be either handled internally as "the default list", a.k.a the only extensions libgit2 has support for/knows about, or delegated to the user's code.

Git has supported repository format version 1 for some time.  This
format is just like version 0, but it supports extensions.
Implementations must reject extensions that they don't support.

Add support for this format version and reject any extensions but
extensions.noop, which is the only extension we currently support.

While we're at it, also clean up an error message.
@bk2204
Copy link
Contributor Author

bk2204 commented Feb 11, 2020

I think I'd prefer to merge this series first, since #4856 appears to not have seen any changes since October and also isn't passing tests. This series should, with my latest changes, be ready to merge as it is.

I realize it doesn't have a lot of additional support for user extensibility, but it is a framework to build upon, and it is also completely API-compatible with older versions.

@ethomson
Copy link
Member

I realize it doesn't have a lot of additional support for user extensibility, but it is a framework to build upon, and it is also completely API-compatible with older versions.

Seems reasonable. I think that there are open questions around how we support user-extensibility.
In the meantime, there's none here so this seems like we should 👍 . Thanks, @bk2204.

We're basically locked down for the imminent 0.99, we'll :shipit: this asap once we open up the tree again.

@ethomson
Copy link
Member

ethomson commented Apr 1, 2020

@pks-t as well for review

@pks-t
Copy link
Member

pks-t commented Apr 2, 2020 via email

@pks-t
Copy link
Member

pks-t commented Apr 2, 2020

This PR looks fine as is to me, thanks a lot for working on it @bk2204!

Seems reasonable. I think that there are open questions around how we support user-extensibility.

I guess we need a mechanism similar to the transport registration, where an application may provide a callback it's being passed to all extensions that a repository requires. But, this is going to open a can of worms I've already been stumbling over when implementing reftables. Want to handle an extension that says the reference database is not the fs-backend one anymore? Well, good luck doing that without implementing this support in libgit2 directly.

We'll need to grow more extensible in some places, refdbs is only one of them. And I'm not sure we can grow generic enough to allow applications to handle arbitrary extensions in the long run. Which doesn't mean we shouldn't try our best.

Anyway, this PR is a good step in the right direction to support extensions at all and doesn't force us down any specific route, so I'm happy to merge it. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants