Forced checkout #4129
Conversation
Thanks, @pks-t for the failing test. I started banging on some of checkout's deficiencies in #3839, though I did not discover this case. :/ I mention that PR only in case this is something you're interested in pursuing - it would be good to know that there are definitely more problems that we should be thinking about. (Though you should not feel compelled to take any of that code if you do pursue this.) Otherwise, I will try to take a look at this problem this weekend. I'm pretty familiar with the checkout code by this point, so I'm happy to do it.
I'd really love something like
(Branch force-pushed from b524172 to ba98327.)
I finally accumulated enough bravery to dive into that part of the code again and fixed the issue. We now first determine whether the directory contains any unversioned files and, if it does, advance into it instead of immediately deleting it. I guess the code could be prettier, but I'm not too familiar with the checkout/index logic.
I've spent a bit of time studying this - I'm going to need to dig into the checkout code to get familiar with it again.
Yeah. I've been cursing quite a bit while investigating the issue and the surrounding code. It's quite hard to grasp.
So, after looking at this a bit, I think that it would be nice to keep the processing within a single loop and not do a second processing loop in `contains_unversioned`. Does it seem like it would be easier to simply always `advance_into` the directory? The following patch does that, and all our unit tests pass, but I do worry a bit that there are corner cases that I'm missing.

diff --git a/src/checkout.c b/src/checkout.c
index 06cf65c43..25018d291 100644
--- a/src/checkout.c
+++ b/src/checkout.c
@@ -331,44 +331,6 @@ static int checkout_queue_remove(checkout_data *data, const char *path)
return git_vector_insert(&data->removes, copy);
}
-static int contains_unversioned(int *unversioned, git_index *index, const char *dir)
-{
- git_vector entries = GIT_VECTOR_INIT;
- git_buf path = GIT_BUF_INIT;
- const char *workdir, *entry;
- int error = 0;
- size_t i;
-
- if ((workdir = git_repository_workdir(index->rc.owner)) == NULL) {
- error = -1;
- goto out;
- }
-
- if ((error = git_buf_joinpath(&path, workdir, dir)) < 0)
- goto out;
-
- if ((error = git_path_dirload(&entries, path.ptr, strlen(workdir), 0)) < 0)
- goto out;
-
- *unversioned = 0;
-
- git_vector_foreach(&entries, i, entry) {
- const git_index_entry *ientry =
- git_index_get_bypath(index, entry, 0);
-
- if (ientry == NULL) {
- *unversioned = 1;
- break;
- }
- }
-
-out:
- git_vector_free_deep(&entries);
- git_buf_free(&path);
-
- return error;
-}
-
/* note that this advances the iterator over the wd item */
static int checkout_action_wd_only(
checkout_data *data,
@@ -408,18 +370,8 @@ static int checkout_action_wd_only(
*/
const git_index_entry *e = git_index_get_byindex(data->index, pos);
- if (e != NULL && data->diff->pfxcomp(e->path, wd->path) == 0) {
- int unversioned;
- if ((error = !contains_unversioned(
- &unversioned, data->index, wd->path)) < 0)
- return error;
-
- if (unversioned)
- return git_iterator_advance_into(wditem, workdir);
-
- notify = GIT_CHECKOUT_NOTIFY_DIRTY;
- remove = ((data->strategy & GIT_CHECKOUT_FORCE) != 0);
- }
+ if (e != NULL && data->diff->pfxcomp(e->path, wd->path) == 0)
+ return git_iterator_advance_into(wditem, workdir);
}
}
(Branch force-pushed from ba98327 to f2d5ded.)
If the `GIT_CHECKOUT_FORCE` flag is given to any of the `git_checkout` invocations, we remove files which were previously staged. But while doing so, we unfortunately also remove unstaged files in a directory which contains at least one staged file, resulting in potential data loss. The issue resides in our logic which tries to determine which action to execute for a tree entry. We currently assume that a directory is safe to be deleted as soon as it contains at least one versioned entry - which is obviously wrong. Instead, we first have to determine whether the directory contains any unversioned files and, if so, advance into the directory itself. As such, the directory will not be deleted, but all versioned entries inside of the directory will be reset correctly. This commit also adds two tests to verify behavior.
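For reference, here is a minimal sketch of how a caller reaches this code path through the public API: `GIT_CHECKOUT_FORCE` is the checkout strategy that behaves like `git reset --hard`. The repository path is an assumption for illustration, not taken from the test suite.

```c
#include <git2.h>
#include <stdio.h>

int main(void)
{
    git_repository *repo = NULL;
    git_checkout_options opts = GIT_CHECKOUT_OPTIONS_INIT;
    int error = 0;

    git_libgit2_init();

    /* "/tmp/example-repo" is an illustrative path, not from the test suite */
    if ((error = git_repository_open(&repo, "/tmp/example-repo")) < 0)
        goto done;

    /* GIT_CHECKOUT_FORCE discards staged changes; before the fix this could
     * also delete untracked files in a directory containing a staged file */
    opts.checkout_strategy = GIT_CHECKOUT_FORCE;
    error = git_checkout_head(repo, &opts);

done:
    if (error < 0)
        fprintf(stderr, "checkout failed: %s\n", giterr_last()->message);
    git_repository_free(repo);
    git_libgit2_shutdown();
    return error < 0 ? 1 : 0;
}
```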
This is certainly a much saner fix than my additional loop here. Playing around with it a bit I didn't find any unexpected edge cases, i.e. it seems to do what it should and also feels correct. But given that this part of the code is quite complicated, I'm not as sure as I'd like to be. I'm a bit on the fence. While my approach is certainly a dirty hack in comparison to your fix, it should at least do much less harm if there are any additional edge cases we forgot about. Dunno, I just feel unsure about including it in the upcoming release without much exposure to testing. But given that it fixes a bug leading to data loss, I guess we don't really have any choice here, do we?
So, I beat my head against this some more and I'm pretty convinced that the simpler case is okay. So the case is that there's an item in the working directory but not in the checkout target. (Hence we're in the `checkout_action_wd_only` path.) The old code (that we've been shipping for years) - the code with the faulty assumption - looks to see if there's an index entry beneath that directory. Eg, it looks to see if we have a workdir entry `dir/` and an index entry like `dir/tracked` beneath it - and if so, it removes the entire directory. That's not cool. Instead, by simply pushing the directory itself onto the stack and iterating each entry, we will deal with the files one by one - whether they're in the index (and can be force removed) or not (and are precious). In particular, the edge case that I was concerned about was leaving empty directories around, but when we queue a delete, we'll remove empty directories that are left behind. So this was a bad optimization, assuming that we didn't need to descend into the directory at all.
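To make the "one by one" handling concrete, here is a small self-contained sketch (plain C, explicitly not libgit2's iterator code) of the per-entry decision the fixed checkout makes once it descends into such a directory:

```c
#include <stdbool.h>
#include <stdio.h>

/* Conceptual sketch only - not libgit2's iterator code. It mimics the
 * per-entry decision made once checkout descends into a directory that
 * exists in the working directory but not in the checkout target:
 * entries known to the index may be force-removed, unknown ones are
 * precious and must be kept. */
struct wd_entry {
    const char *path;
    bool in_index;
};

static void handle_wd_only_dir(const struct wd_entry *entries, size_t n, bool force)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].in_index && force)
            printf("remove %s (versioned, forced checkout)\n", entries[i].path);
        else
            printf("keep   %s (untracked or not forced)\n", entries[i].path);
    }
}

int main(void)
{
    /* The directory layout from the example in this thread */
    const struct wd_entry dir[] = {
        { "dir/tracked",   true  },  /* staged, may be removed by force */
        { "dir/untracked", false },  /* precious, must survive */
    };

    handle_wd_only_dir(dir, sizeof(dir) / sizeof(dir[0]), true);
    return 0;
}
```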
Okay, so let's bite the bullet and use your fix then. Shall I simply merge your change into my commit and credit you in the commit message, or do you want to create a commit yourself?
Cool - I opened #4260 with this change.
Closing in favor of #4260. |
A failing test for #4125.
The issue is that we delete untracked files in directories which contain newly created tracked files. E.g. a directory `dir/` contains a newly added and staged `dir/tracked` and an untracked `dir/untracked`. A `git reset --hard` (when doing a `git reset --hard HEAD -- dir`) will delete `dir/tracked` only and not touch `dir/untracked`.

The issue resides in `checkout_action_wd_only`. The function is responsible for handling items which are present only in the working directory and not in the current commit. The interesting call path is the following: `checkout_action_wd_only` is invoked on `dir/` and checks whether the index contains an entry underneath that directory, e.g. `dir/tracked`. If this criterion matches, we decide that we can simply delete the complete directory (if `GIT_CHECKOUT_FORCE` is given). We obviously miss the step to see if there are any untracked files inside of the directory - in which case we cannot just delete it. This check is missing.

So far I couldn't think of an efficient way to fix the issue, so for now I'm just leaving the test and my observations here.
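For completeness, a rough sketch of how the scenario can be reproduced against the public API; this is not the clar test added by this PR, the repository path is illustrative, and error checking is omitted for brevity:

```c
#include <git2.h>
#include <stdio.h>
#include <unistd.h>

/* Setup assumed on disk: dir/tracked and dir/untracked both exist in the
 * working tree, and neither is in HEAD. */
int main(void)
{
    git_repository *repo = NULL;
    git_index *index = NULL;
    git_checkout_options opts = GIT_CHECKOUT_OPTIONS_INIT;

    git_libgit2_init();
    git_repository_open(&repo, "/tmp/example-repo");  /* illustrative path */

    /* Stage dir/tracked so the directory contains one versioned entry */
    git_repository_index(&index, repo);
    git_index_add_bypath(index, "dir/tracked");
    git_index_write(index);

    /* Forced checkout of HEAD, the equivalent of `git reset --hard` */
    opts.checkout_strategy = GIT_CHECKOUT_FORCE;
    git_checkout_head(repo, &opts);

    /* dir/tracked is expected to be gone; dir/untracked must survive */
    printf("dir/untracked %s\n",
           access("/tmp/example-repo/dir/untracked", F_OK) == 0
               ? "survived" : "was deleted (bug)");

    git_index_free(index);
    git_repository_free(repo);
    git_libgit2_shutdown();
    return 0;
}
```

With the fix from #4260, the untracked file should survive while only the versioned entry under `dir/` is removed.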