Introduction of the "refFormat" extension #7102

Open
wants to merge 4 commits into
base: main
from

pks-gitlab

This pull request introduces support for the "refFormat" extension. On the one hand, it adds support for reading repositories that have this extension, even though we naturally only understand the "files" format right now. On the other hand, it moves the logic to initialize the refdb into the refdb backends so that the logic can be customized for every ref format.

I've decided to pull out these changes into a separate pull request in preparation for the reftable format. It's already somewhat non-trivial, so I think that reviewing it separately might be easier.

pks-t requested a review from ethomson July 11, 2025 15:40
pks-t added 4 commits July 11, 2025 17:42
To support multiple different reference backend implementations,
Git introduced a "refStorage" extension that stores the reference
storage format a Git client should try to use.

Wire up the logic to read this new extension when we open a repository
from disk. For now, only the "files" backend is supported by us. When
trying to open a repository that has a refstorage format that we don't
understand, we now error out.
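
The check described above could look roughly like the following sketch. It is not taken from the patch: the `ref_format_t` enum and the `ref_format_from_config()` helper are hypothetical names, and treating a missing extension as "files" and comparing the value case-sensitively are both assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum; the identifiers used in the actual patch may differ. */
typedef enum {
    REF_FORMAT_FILES = 1
} ref_format_t;

/*
 * Map the configured extension value to a known reference format.
 * Assumption: a NULL value means the extension is absent, which is
 * treated as "files". Returns 0 on success and -1 for a format that
 * is not understood.
 */
static int ref_format_from_config(ref_format_t *out, const char *value)
{
    assert(out != NULL);

    if (value == NULL || strcmp(value, "files") == 0) {
        *out = REF_FORMAT_FILES;
        return 0;
    }

    /* Anything else, for example "reftable", is rejected for now. */
    return -1;
}
```

A caller opening a repository would propagate the `-1` as an error instead of silently falling back to the "files" backend.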

There are two functions that create a new repository that doesn't really
have references. While those are mostly non-functional when it comes to
references, we do expect that you can access the refdb, even if it's not
yielding any refs. For now we mark those to use the "files" backend, so
that the status quo is retained. Eventually though it might not be the
worst idea to introduce an explicit "in-memory" reference database. But
that is outside the scope of this patch series.

While we only support initializing repositories with the "files"
reference backend right now, we are in the process of implementing a
second backend with the "reftable" format. And while we already have the
infrastructure to decide which format a repository should use when we
open it, we do not have infrastructure yet to create new repositories
with a different reference format.

Introduce a new field `git_repository_init_options::refdb_type`. If
unset, we'll default to the "files" backend. Otherwise though, if set to
a valid `git_refdb_t`, we will use that new format to initialize the
repository.

Note that for now the only thing we do is to write the "refStorage"
extension accordingly. What we explicitly don't yet do is to also handle
the backend-specific logic to initialize the refdb on disk. This will be
implemented in subsequent commits.
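
As a rough illustration of "default to files, otherwise write the chosen format", here is a minimal self-contained sketch. The `refdb_type_t` enum, its values, and both helpers are hypothetical stand-ins rather than the types from the patch. Whether the extension is written at all for the "files" backend is not stated above; this sketch always writes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the type of the new `refdb_type` field. */
typedef enum {
    REFDB_UNSET = 0,
    REFDB_FILES,
    REFDB_REFTABLE
} refdb_type_t;

/* An unset type falls back to "files"; unknown values yield NULL. */
static const char *refdb_type_name(refdb_type_t type)
{
    switch (type) {
    case REFDB_UNSET:
    case REFDB_FILES:
        return "files";
    case REFDB_REFTABLE:
        return "reftable";
    }
    return NULL;
}

/*
 * Format the config snippet that records the reference format.
 * Returns the number of characters written, or -1 if the type is
 * invalid or the buffer is too small.
 */
static int format_ref_storage_config(char *buf, size_t len, refdb_type_t type)
{
    const char *name = refdb_type_name(type);
    int n;

    assert(buf != NULL);

    if (name == NULL)
        return -1;

    n = snprintf(buf, len, "[extensions]\n\trefStorage = %s\n", name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Rejecting invalid values up front keeps a bogus `refdb_type` from ever reaching the repository's config file.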

In our tests for "onbranch" config conditionals we set HEAD to point to
various different branches via `git_repository_create_head()`. This
function circumvents the refdb though and directly writes to the "HEAD"
file. While this works now, it will create problems once we have
multiple refdb backends.

Furthermore, the function is about to go away in the next commit. So
let's prepare for that and use `git_reference_symbolic_create()`
instead.

The initialization of the on-disk state of refdbs is currently not
handled by the actual refdb backend, but it's implemented ad-hoc where
needed. This is problematic once we have multiple different refdbs as
the filesystem structure is of course not the same.

Introduce a new callback function `git_refdb_backend::init()`. If set,
this callback can be invoked via `git_refdb_init()` to initialize the
on-disk state of a refdb. This way, each backend can decide for itself
how exactly to do this.
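
The optional-callback pattern described here can be sketched in isolation as follows. The struct layout, the callback signature, and the `refdb_init()` wrapper are hypothetical simplifications and do not mirror the real `git_refdb_backend` definition. Treating an unset callback as a successful no-op is also an assumption; the text above only describes the case where the callback is set.

```c
#include <assert.h>
#include <stddef.h>

typedef struct refdb_backend refdb_backend;

/* Hypothetical, heavily simplified backend vtable. */
struct refdb_backend {
    /* Optional: create the backend's on-disk state. May be NULL. */
    int (*init)(refdb_backend *backend);
    /* Demo-only flag so the effect of init() can be observed. */
    int initialized;
};

/*
 * Dispatch to the backend-specific initializer.
 * Assumption: a backend without an init callback needs no setup,
 * so this returns 0 without doing anything.
 */
static int refdb_init(refdb_backend *backend)
{
    assert(backend != NULL);

    if (backend->init == NULL)
        return 0;

    return backend->init(backend);
}

/* Example backend initializer; a real one would create files on disk. */
static int files_init(refdb_backend *backend)
{
    backend->initialized = 1;
    return 0;
}
```

With this shape, the repository initialization code only calls `refdb_init()` and never needs to know which directory layout a given backend expects.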

Note that the initialization of the refdb is a bit intricate. A
repository is only recognized as such when it has a "HEAD" file as well
as a "refs/" directory. Consequently, regardless of which refdb format
we use, those files must always be present. This also proves to be
problematic for us, as we cannot open the repository, and thus cannot
access the refdb, if those files don't exist.

To work around the issue we thus handle the creation of those files
outside of the refdb-specific logic. We actually use the same strategy
as Git does, and write the invalid reference "ref: refs/heads/.invalid"
into "HEAD". This looks almost like a ref, but the name of that ref
is not valid and should thus trip up Git clients that try to read that
ref in a repository that really uses a different format.

So while that invalid "HEAD" reference will of course get rewritten by
the "files" backend, other backends should just retain it as-is.
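
To make the placeholder concrete, the sketch below shows two small checks: one that recognizes the placeholder "HEAD" contents, and one that illustrates why the target is not a valid ref name (Git rejects ref names in which a slash-separated component begins with a dot). Both helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <string.h>

#define PLACEHOLDER_HEAD "ref: refs/heads/.invalid"

/* Return 1 if the HEAD contents are the placeholder, ignoring line endings. */
static int head_is_placeholder(const char *contents)
{
    size_t len;

    assert(contents != NULL);

    len = strlen(contents);
    while (len > 0 && (contents[len - 1] == '\n' || contents[len - 1] == '\r'))
        len--;

    return len == strlen(PLACEHOLDER_HEAD) &&
           memcmp(contents, PLACEHOLDER_HEAD, len) == 0;
}

/* Return 1 if any slash-separated component of refname starts with '.'. */
static int has_dot_component(const char *refname)
{
    const char *p;

    assert(refname != NULL);

    for (p = refname; *p != '\0'; p++) {
        if (*p == '.' && (p == refname || p[-1] == '/'))
            return 1;
    }
    return 0;
}
```

A backend other than "files" could use a check like `head_is_placeholder()` to confirm that the placeholder is still in place and has not been overwritten.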
pks-gitlab force-pushed the pks-refformat-extension branch from ab5f957 to 8d0ff81 July 11, 2025 15:50
@pks-gitlab
Author

The Windows failures are all unrelated to my changes, as far as I can see.

2 participants