Skip to content

Conversation

tiennou
Copy link
Contributor

@tiennou tiennou commented Jul 21, 2017

Note that this is on top of the gather-reflog-message PR, because it's reflog-message-related.

This fixes #4292 (test in 228a8be) and a bug where we would lose reflogs (test in 07e1234).

As this is my first foray into the refdb code, I hope I didn't do toooo badly 😉. The first few commits are prerequisite work on massaging the code to be more amenable to the task at hand. Highlights are :

  • e4f878e, where some flags are added so the later code can be more granular on what it needs to be done.
  • d78629d, where the various *_delete calls gain the standard who/signature pair, because the reflog needs it.
  • eaf15ff - 7c2f505, where the ref-renaming code is gradually updated/bugfixed until it behaves correctly.

@tiennou tiennou force-pushed the fix/ref-rename-reflog branch 2 times, most recently from 62287b8 to a86de58 Compare July 21, 2017 22:10
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from a86de58 to 90a35d5 Compare September 16, 2017 22:05
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from 90a35d5 to f516449 Compare November 14, 2017 23:43
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch 2 times, most recently from c23b3b2 to bca24eb Compare November 26, 2017 18:02
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from bca24eb to 74c64c4 Compare February 2, 2018 23:40
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from 74c64c4 to 80d7f20 Compare April 11, 2018 20:04
@tiennou
Copy link
Contributor Author

tiennou commented Apr 18, 2018

Rebased, which makes the commits ID above slightly wrong 🤷‍♂️ .

Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit at loss with these changes, probably because I'm not too familiar with our reference code and because there was no reasoning layed out why these changes are necessary. I do get the overall goal, but what I'm still missing is why the current code does not suffice (or what I am misunderstanding).

git_reference *tmp = NULL, *head = NULL, *peeled = NULL;
const char *name;

if (ref->type == GIT_REF_SYMBOLIC)
return 0;

/* if we can't resolve, we use {0}*40 as old id */
if (git_reference_name_to_id(&old_id, backend->repo, ref->name) < 0)
memset(&old_id, 0, sizeof(old_id));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we've always passed a valid old ID into reflog_append, where we now always pass NULL to it (as there seems to be only one caller, which passes NULL). This is no problem, though, as reflog_append does the exact same dance that is getting deleted here in case old is NULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to make of that, so thanks, I guess 😉? My rational argument would be that it moves the duties of formatting NULL to reflog_append.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but 1) IMHO the code was already doing magic to grab the oids 2) there's no test case covering what actually ends up in the reflog (AFAICS passing or not passing the oids here results in no reaction from the testsuite).

If that's okay with you, I'd like to revisit that in another PR (or maybe I manage to build a test case that cares, in which case I'll tack it on).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always in favor of more tests :)

src/refdb_fs.c Outdated
@@ -1399,16 +1399,14 @@ static int refdb_fs_backend__delete_tail(
goto cleanup;
}

if ((flags & GIT_REFDB__SKIP_REFLOG) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have an explanation here why we'd always want to write that log in the commit message. At least I'm a bit clueless why that should be the case, when not writing it has been requested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this is so the HEAD reflog gets its 0124adbc 00000000 sig Branch: renamed foo to bar, which marks the deletion (the first part of the "renaming" pair). But I'm not quite sure anymore 😞.

}
if ((error = loose_commit(&head_lock, head)) < 0)
goto cleanup;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code feels a bit like a layering violation to me. What happens in case where we rename a reference that is not part of the FS refdb, but part of a different one? Or in case where HEAD itself is not part of the FS refdb? Sure, especially the second scenario is unlikely, but it just feels a bit weird to handle HEAD at this level.

What confuses me is that in a later commit, you delete code that is supposed to update HEAD. Why is it necessary to move the code into refdb_fs and why didn't the other code suffice? Furthermore, shouldn't these two commits belong together then, ideally with your line of thought why the move is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the layering, though I'm not sure I understand your cases : the first one should not happen, because the current refdb is being called on behalf of it being the ref's backend. The second is already true because the refdb already relookups HEAD when writing.

The strange-two step process is there to make the reflog actually correct (before the last step, the git_repository_set_head call would insert a third, duplicate message). Looking back, they can be merged, so I don't know why I kept them separate. As to the why, it's because the reference renaming process is much more finicky with locks than what can be done at this level, so it's either "dump everything on the refdb so it can handle it" (this approach) or "make it more transaction-y" so we can work at an higher level.

@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from 80d7f20 to e5cd238 Compare May 8, 2018 11:20
@tiennou
Copy link
Contributor Author

tiennou commented May 8, 2018

Sorry for the lack of reasoning. There's additional context in #4294, but the gist of the issue is that 1) we don't reflog things like git does when deleting and renaming branches (we must pass more context when deleting, since we might need to update the HEAD reflog, which happens anytime you can see a who, msg signature pair), 2) we will break things when deleting refs if anything goes wrong (usually by eagerly deleting reflogs).

I remember wondering about layering here too: right now the reflog handling seems brittle IMHO, because our refdb_fs will take care of some parts (it maybe_appends_head when writing), but it's not sufficient for the harder cases (renames & deletes).

An alternative implementation would be to build on the git_transaction API so that we can have append_reflog/rename_reflog be handled atomically within the actual git__reference_delete/git__reference_rename functions, and hopefully leave the low-level methods cleaner. That would be a much more invasive change to the refdb API though, because all those who, msg could be removed in favor of a git_transaction_append_reflog/rename_reflog API.

@tiennou tiennou force-pushed the fix/ref-rename-reflog branch 2 times, most recently from 3536516 to e3c2859 Compare October 15, 2018 22:57
Copy link
Contributor Author

@tiennou tiennou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased, and ready for review. I'm not sure I can have correct reflogs and not change the deletion API, so here's a quick demonstration script :

git init unborn-delete-ref && cd unborn-delete-ref
echo foo > file && git add file && git commit -m init
git update-ref -d refs/heads/master
cat .git/logs/HEAD

You'll end up with an unborn HEAD, and a partial HEAD reflog with no message (I didn't -m anything to update-ref) and a signature.

I might be able to limit the API changes to the refdb (if it's supposed to be our update-ref), given that git complains when you try to delete the branch you're on.

git_reference *tmp = NULL, *head = NULL, *peeled = NULL;
const char *name;

if (ref->type == GIT_REF_SYMBOLIC)
return 0;

/* if we can't resolve, we use {0}*40 as old id */
if (git_reference_name_to_id(&old_id, backend->repo, ref->name) < 0)
memset(&old_id, 0, sizeof(old_id));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but 1) IMHO the code was already doing magic to grab the oids 2) there's no test case covering what actually ends up in the reflog (AFAICS passing or not passing the oids here results in no reaction from the testsuite).

If that's okay with you, I'd like to revisit that in another PR (or maybe I manage to build a test case that cares, in which case I'll tack it on).

@pks-t
Copy link
Member

pks-t commented Nov 2, 2018

Hm. As far as I see, the only API changes are with regards to introducing the author and message arguments so that we are able to write correct reflogs, right? What about just adding two new function calls instead which do take these parameters. In case where a backend has these functions we use them, otherwise we'll fall back to the old ones. In that case, we have to accept the fact that some backends simply can't handle correct reflogs.

As one example, doing the API change would break the redis refdb backend that lives in libgit2/libgit2-backends.

@tiennou
Copy link
Contributor Author

tiennou commented Nov 2, 2018

Yeah, looking more closely at it, the API changes is caused because I'm fixing HEAD-reflog-isn't-correct-on-unborning-it (a.k.a the deletion part), submit that as another PR, and then I can consider that "a missing feature" and not get bothered about backward-compatibility 😉.

I've tried to do a clean API break, and it's messy. I think I have to update the version (which we're not enforcing anywhere BTW), else I'd end up checking some trailing memory when using an old-style backend. I also can't add the implementation next to del without also breaking old backends. So I'm left with a del_v2 function far from it's deprecated friend, which I'm not fond of, and an update to the version. Which means I could just copy the whole struct as git_refdb_backend_v2, fix the del function there, and autoupdate v1 backends to v2 via a wrapper for the del style-change.

I'm likely to split out of this PR all that deletion stuff to another one, so at least the reported issue (and bugs) can be fixed. The deletion is just something I found out while experimenting on reflogs.

@pks-t
Copy link
Member

pks-t commented Nov 7, 2018

I've tried to do a clean API break, and it's messy. I think I have to update the version (which we're not enforcing anywhere BTW), else I'd end up checking some trailing memory when using an old-style backend. I also can't add the implementation next to del without also breaking old backends. So I'm left with a del_v2 function far from it's deprecated friend, which I'm not fond of, and an update to the version. Which means I could just copy the whole struct as git_refdb_backend_v2, fix the del function there, and autoupdate v1 backends to v2 via a wrapper for the del style-change.

Yeah, it'll get messy. I think disambiguating between two different structs based on whether the version field indicates old or new layout would be the easiest way to get it right. It has its own problems, though, as we'd first have to cast the old version struct to the new one, which is not allowed in C.

Next-best might be something like this:

struct opts {
    int version;
    char *opt;
};

struct optsv2 {
    struct opts v1;
    char *new_opt;
}

struct optsv3 {
    union {
        struct opts v1;
        struct optsv2 v2;
    }
    char *newest_opt;
}

Like this, we can guarantee that the old options struct will never change while being able to add new options. Code might look as following:

int fn(struct opts *_o)
{
    struct optsv3 o = INIT_DEFAULTS;

    if (_o && o->version == 1) {
        memcpy(&o.v1, _o, sizeof(o.v1));
    } else if (_o && o->version == 2) {
        memcpy(&o.v2, _o, sizeof(o.v2));
    } else if (_o && o->version == 3) {
        memcpy(&o, _o, sizeof(o.v3));
    }

    o.v1.opt;
    o.v2.new_opt;
    o.newest_opt;
}

Might be messy, but I think it's the easiest to get right because we don't need to care about keeping the memory layout of old options. Alternatively, one could just make sure to only ever append new fields to the options struct. But it makes it harder than necessary to keep track of how many bytes we're allowed to memcpy.

I don't know. I guess we need to learn how to do this properly now.

@ethomson
Copy link
Member

ethomson commented Nov 7, 2018

Hmm. This is a little tricky, because there is no o->version in your example. It's only accessible as o->v1->version or o->v2->v1->version...

I think I see where you're going though, and it's nice in the sense that we don't have to duplicate options in structs (v2 can inherit from v1 and then only worry about the new bits, if I'm thinking about this correctly) but I think it's going to be incredibly confusing for end-users. In your example, if I have struct optsv3 o = INIT_DEFAULTS; then obviously I should set o.newest_opt, but what if I don't? What if I want to keep setting the v2 option because that's the one that I was using before? So what if I instead set o.v2.new_opt? Or o.v1.opt? (Or worse yet, o.v2.v1.opt?

If I wanted to opt in to a newer set of options, now I have to understand exactly what is deprecated and needs to move to the new-style. (I think. Or we have to figure out how to understand this options structure.)

I would propose that we introduce a dummy structure that only has a version component:

struct git_opt_t {
    int version;
};

Then we can:

struct opts {
    int version;
    char *opt;
    char *unchanged;
};

struct optsv2 {
    int version;
    char *new_opt;
    char *unchanged;
}

struct optsv3 {
    int version;
    char *newest_opt;
    char *unchanged;
}

Now, as an end-user, I get broken hard and clearly if I try to upgrade to optsv3 and I know that I have to set opt instead of newest_opt. But many options get to stay unchanged and my code stays as-is.

Now in our bits that take opts from a user (eg, checkout_data_init and the like) then we would need to do a bit more work, I concede. Something like:

struct optsv3 opts = INIT;

switch (((git_opt_t *)given_opts)->version) {
case 1:
    /* call a bespoke function that knows how to convert v1 to v3 */
    break;
case 2:
    /* call a bespoke function that knows how to convert v2 to v3 */
    break;
case 3:
    memcpy(&opts, given_opts, sizeof(struct optsv3));
    break;    
default:
    system("shutdown -r now"); /* UB */
}

@tiennou tiennou force-pushed the fix/ref-rename-reflog branch 4 times, most recently from bb6d734 to 0859f4d Compare February 2, 2019 18:03
@tiennou
Copy link
Contributor Author

tiennou commented Feb 3, 2019

This has been rebased, and should be green. The API breakage is isolated to the last commits (there's some kind of layering violation going on beforehand, see git_reference__log_signature).

I've tried to do my best to have the correct behavior at runtime, though I have to work with the constraint of having no way to attach more data to a git_refdb_backend, because it's just a bunch of callbacks (in that case, I can't use a downgrading trampoline, because I can't pass the previous backend's implementation to it, so I settled on a runtime version-check. An alternative would be to rewrap with a complete git_refdb_backend, which seemed cumbersome).

I've relegated the source-compatibility issues to a bunch of typedefs for the v1 structs 😒.

So, as a refresher, this PR's ultimate goal is :

  • make the refdb backend handle all HEAD refreshes (either when renaming its "pointing to" ref, or when deleting it), since it has all the inner knowledge (ie. locks) to do so.
  • correct a critical bug: we loose reflogs when renaming a ref, as we eagerly delete its backing file.
  • correct a few reflog bugs: we'd miss some entries when renaming/deleting HEAD.

Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review up to (excluding) f104277 due to lack of time

* itself). A refdb backend implementation must provide this function.
* itself).
*
* A refdb backend implementation must provide this function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"backend" is inconsistent with all the other comments

* databases, this may pack all loose references.
*
* A refdb implementation may provide this function; if it is not
* provided, nothing will be done.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Nothing will be done" sounds a bit sloppy -- I certainly would expect the refdb to still perform its usual operations ;) Maybe "if it is not provided, the backend will not be asked to compress its references"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A refdb implementation may provide this function, if appropriate; a missing function will result in the compression request being silently ignored ?

* Lock a reference. The opaque parameter will be passed to the unlock function
* Lock a reference.
*
* The opaque parameter will be passed to the unlock function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault, but I find this pre-existing documentation a bit weird. Proposal: "If the implementation sets the payload_out parameter, then its value will be passed to a later unlock of that same reference."?

*
* Only one of target or symbolic_target will be set.
* `success` will be true if the reference should be update, false if
* the lock must be discarded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also mention the payload parameter here. "The payload parameter will point to the value set up by the previous call to the lock function."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm intentionally vague here because that API is not clean (as per the success == 2 case), and I'm not really after fixing that issue (yet, since the how-to-backward-compat issue is still undecided). I might tackle it here though, for that same reason.

src/refdb.c Outdated
!backend->reflog_read || !backend->reflog_write ||
!backend->reflog_rename || !backend->reflog_delete ||
(backend->lock && !backend->unlock)) {
git_error_set(GIT_ERROR_REFERENCE, "incomplete refdb backend implementation");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we'd specify what exactly is missing. But I doubt that it'd be worth the additional required complexity to handle all the different cases, especially given that there aren't a lot of backends anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was just about facilitating the implementer's work (which I hope has a debugger handy) by complaining as soon as possible instead of segfaulting later on first use.

git_refdb_backend *_backend,
const git_reference *ref,
git_filebuf *file,
int update_reflog,
git__refdb_flags flags,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know it's not allowed to use enums as bitmasks. You'd instead have to use e.g. an unsigned int here.

git_reference *tmp = NULL, *head = NULL, *peeled = NULL;
const char *name;

if (ref->type == GIT_REF_SYMBOLIC)
return 0;

/* if we can't resolve, we use {0}*40 as old id */
if (git_reference_name_to_id(&old_id, backend->repo, ref->name) < 0)
memset(&old_id, 0, sizeof(old_id));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always in favor of more tests :)

src/refdb_fs.c Outdated
if ((error = git_reference__log_signature(&who, backend->repo)) < 0)
return error;

error = refdb_fs_backend__delete_impl(_backend, ref_name, who, NULL, old_id, old_target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so with this backwards-compatible change we'd be simply passing no message to the refdb deletion. Seems reasonable to me.

if ((error = maybe_append_head(backend, ref, old_id, NULL, who, message)) < 0)
goto cleanup;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hunk looks unrelated to what this commit is doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is the part that actually writes to the reflog when deleting. That's taken from write_tail, minus the "write to our own reflog part" because we already deleted it.

I'll rewrite the commit message though, as it's not clear (must explain why, not what).

/* XXX: warn ? */
git_error_clear();
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ((error = git_reference_lookup(&head, backend->repo, GIT_HEAD_FILE)) < 0) {
    /* Our repo is somehow missing its HEAD. Ignore the reflog in that case */
    if (error == GIT_ENOTFOUND)
         error = 0;
    return error;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also don't need to git_error_clear(). We state that errors are only valid immediately after a call to libgit2 that's returned an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround for the test case that explicitely removes HEAD and expect the refdb to keep working. This makes a missing HEAD be silently ignored (and the warn comment is there to remind me that I need to warn about that when warning support will have landed).

I'll use your version though.

Copy link
Contributor Author

@tiennou tiennou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the half-review @pks-t, much appreciated, as always. I've added a few notes where applicable, to help comprehension (hopefully). I'll wait to have a complete review before changing anything though.

* databases, this may pack all loose references.
*
* A refdb implementation may provide this function; if it is not
* provided, nothing will be done.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A refdb implementation may provide this function, if appropriate; a missing function will result in the compression request being silently ignored ?

*
* Only one of target or symbolic_target will be set.
* `success` will be true if the reference should be update, false if
* the lock must be discarded.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm intentionally vague here because that API is not clean (as per the success == 2 case), and I'm not really after fixing that issue (yet, since the how-to-backward-compat issue is still undecided). I might tackle it here though, for that same reason.

src/refdb.c Outdated
!backend->reflog_read || !backend->reflog_write ||
!backend->reflog_rename || !backend->reflog_delete ||
(backend->lock && !backend->unlock)) {
git_error_set(GIT_ERROR_REFERENCE, "incomplete refdb backend implementation");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was just about facilitating the implementer's work (which I hope has a debugger handy) by complaining as soon as possible instead of segfaulting later on first use.

@@ -43,6 +43,14 @@ enum {
PEELING_FULL
};

typedef enum {
GIT_REFDB__FORCE_WRITE = (1 << 0),
GIT_REFDB__SKIP_REFLOG = (1 << 1),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Force write is used to skip reference naming conflicts checks when writing.

} else if (success) {
error = refdb_fs_backend__write_tail(lock, backend, ref, flags, NULL, NULL, sig, message);
if (error == 0) {
error = loose_commit(lock, ref);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the previous versions of the _tail functions unconditionally released lock before returning, which could cause problems when renaming, as you need to keep the locks in place until you're really done messing with files (or restoring failures) 😉.

I somewhat pondered prefixing all those low-level helpers with ref_, so it gives more context.

if (error == 0) {
error = loose_commit(lock, ref);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic values ✨. I noticed, and it's used internally by the transaction code to signal that this is an unlock of a deleted reference (hence the deletion can proceed).

I'm not quite sure how to fix though… Options are :

  • make success be a enum of what is asked of the backend.
  • wrap the lock in a struct so it can keep track of why it was locked in the first place.

My preference goes to 2, as 1 looks like an abstraction leak.

return error;
if ((error = refdb_fs_backend__write_tail(&file, _backend, ref, flags, old_id, old_target, who, message)) < 0) {
goto cleanup;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syntax is hard 😉. It's usually leftovers because of debugging code being present at some time.

if ((error = maybe_append_head(backend, ref, old_id, NULL, who, message)) < 0)
goto cleanup;
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is the part that actually writes to the reflog when deleting. That's taken from write_tail, minus the "write to our own reflog part" because we already deleted it.

I'll rewrite the commit message though, as it's not clear (must explain why, not what).

/* XXX: warn ? */
git_error_clear();
return 0;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround for the test case that explicitely removes HEAD and expect the refdb to keep working. This makes a missing HEAD be silently ignored (and the warn comment is there to remind me that I need to warn about that when warning support will have landed).

I'll use your version though.

src/refdb_fs.c Outdated
int error = 0;

git__refdb_flags flags = (GIT_REFDB__SKIP_LOCK | GIT_REFDB__FORCE_WRITE);
if (!update_reflog) {
flags |= GIT_REFDB__SKIP_REFLOG;
}
if (success == 2) {
error = refdb_fs_backend__delete_tail(lock, backend, ref, flags, NULL, NULL);
if (!sig && (error = git_reference__log_signature(&who, _backend->repo)) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, now we're aborting whenever we are not able to get a signature. This may be seen as a regression, but due to the implementation of git_reference__log_signature it can only happen when we're out-of-memory. So this is indeed fine.

src/refdb_fs.c Outdated
error = refdb_fs_backend__delete_tail(lock, backend, ref, flags, NULL, NULL, sig ? sig : who, message);

if (!sig)
git_signature_free(who);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd favor to instead NULL-initialize who and unconditionally free it. Makes it easier to maintain in the future

git_filebuf file = GIT_FILEBUF_INIT;
git_reference *old, *new = NULL;
git_filebuf old_lock = GIT_FILEBUF_INIT;
git_filebuf new_lock = GIT_FILEBUF_INIT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for two lock structs, as you clear the first one before doing anything with the other one. Please keep it, though, if you feel like it helps understanding the code.

/* We delete the old ref first so we don't cause a failure later if the new
* name is the suffix or prefix of the old. Also, we skip the reflog, because
* we want to rename it */
if ((error = refdb_fs_backend__delete_tail(&old_lock, _backend, old, GIT_REFDB__SKIP_REFLOG, NULL, NULL, who, message)) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to pass in who and message when you're skipping the reflog anyway?

git_reference_free(old);
return -1;
}
/* postpone empty hierarchy cleanup after we relinquish the lock */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment sounds like you're queueing the hierarchy cleanup for later, but in fact you are performing it now. Maybe "Now that we have released the lock, we can try to clean up empty directories"?

@@ -1274,7 +1274,7 @@ static int maybe_append_head(
if (strcmp(ref->name, GIT_HEAD_FILE) == 0)
goto cleanup;

if ((error = git_reference_lookup(&tmp, backend->repo, GIT_HEAD_FILE)) < 0)
if ((error = git_reference_dup(&tmp, head)) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need to dup? We can simply use head instead, can't we? At least I don't see us passing its ownership to anybody else.


nentries_after = reflog_entrycount(g_repo, GIT_HEAD_FILE);

git_buf_printf(&msg, "branch: renamed %s to %s", git_reference_name(current_ref), git_reference_name(renamed_ref));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cl_git_pass(git_buf_printf(...));


nentries = reflog_entrycount(g_repo, GIT_HEAD_FILE);

cl_git_pass(git_branch_move(&renamed_ref, current_ref, "renamed", 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd reorder a bit to make the intention clearer:

		nentries = reflog_entrycount(g_repo, GIT_HEAD_FILE);

		cl_git_pass(git_repository_head(&head_ref, g_repo));
		cl_git_pass(git_reference_resolve(&current_ref, head_ref));
		cl_git_pass(git_branch_move(&renamed_ref, current_ref, "renamed", 1));

		cl_assert_equal_i(nentries + 2, reflog_entrycount(g_repo, GIT_HEAD_FILE));

} else {
rename_cb_data payload;
payload.old_name = ref->name;
memcpy(&payload.new_name, &normalized, sizeof(normalized));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to get it straight, cause I'm not quite sure I understand from the commit message: previously, we have been updating HEAD in the refs code, so it was completely agnostic from the refdb backend. Now, we require every refdb backend to actually implement this by itself. Wouldn't this mean that we effectively break refdb backends that exist outside of our tree?

const git_oid *old_id, const char *old_target);

/* Old v1 struct version */
struct git_refdb_backend_v1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still haven't decided how we actually want to proceed with such changes. Thing is, by renaming the old struct we're effectively also breaking old programs, as they cannot be compiled anymore (or worse, the compiler may just print an incompatible function pointer warning that goes unnoticed, and we run into strange behavior). This scenario also isn't helped by GIT_REFDB_BACKEND_VERSION, as a recompile would also get the newer version.

So maybe we need to instead introduce a new git_refdb_backend_v2?

@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from 6e5ab14 to 2658b13 Compare May 21, 2019 22:59
@tiennou tiennou mentioned this pull request Jun 10, 2019
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from 2658b13 to d5a2212 Compare June 14, 2019 06:45
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from d5a2212 to e29f214 Compare September 5, 2019 09:15
@tiennou tiennou mentioned this pull request Nov 5, 2019
2 tasks
…ions

Renaming references need more precise control of when locks should be
taken/relinquished, because we might hit a couple of edge-cases when
reflogs are involved (eg. refs being moved "below" their current
namespace).
Deleting the currently checked out reference has to update the HEAD reflog, so we need a signature and message
This makes the check for orphaned references happen while deciding if we
should write a reflog (in `write_tail`) instead of special-casing the
actual writing.
Also, don't loose the old reflog when renaming
@tiennou tiennou force-pushed the fix/ref-rename-reflog branch from e29f214 to fcdae78 Compare December 6, 2019 21:43
Base automatically changed from master to main January 7, 2021 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Renaming the current branch doesn't cause a message to be written to the HEAD reflog
3 participants