Skip to content

Conversation

pks-t
Copy link
Member

@pks-t pks-t commented Feb 17, 2017

A failing test for #4125.

The issue is that we delete untracked files in directories which contain newly created tracked files. E.g. a directory dir/ contains a newly added and staged dir/tracked and an untracked dir/untracked. git reset --hard will delete dir/tracked only when doing a git reset --hard HEAD -- dir and not touch dir/untracked.

The issue resised in checkout_action_wd_only. The function is responsible for handling items which are present only in the working directory and not in the current commit. The interesting call path is the following:

  1. call checkout_action_wd_only on dir/
  2. is it tracked? -> yes
  3. is it a tree? -> yes
  4. is the index entry following the tree tracked as well? -> yes, we have dir/tracked

If this criteria matches, we decide that we can simply delete the complete directory (if GIT_CHECKOUT_FORCE is given). We obviously miss the step to see if there are any untracked files inside of the directory - in which case we cannot just delete it. This check is missing here.

So I actually couldn't think of an efficient thing yet to fix the issue, so for now I'm just leaving the test and my observations here.

@ethomson
Copy link
Member

Thanks, @pks-t for the failing test.

I started banging on some of checkout's deficiencies in #3839, though I did not discover this case. :/

I mention that PR only in case this is something you're interested in pursuing - it would be good to know that there are definitely more problems that we should be thinking about. (Though you should not feel compelled to take any of that code if you do pursue this.)

Otherwise, I will try to take a look at this problem this weekend. I'm pretty familiar with the checkout code by this point, so I'm happy to do it.

@pks-t
Copy link
Member Author

pks-t commented Mar 14, 2017

I'd really love something like cl_expect_failure that actually says "Okay, we know this test currently fails, but we want to record that and be notified as soon as it stops failing". Guess I'll whip something up for clar

@pks-t pks-t force-pushed the pks/forced-checkout branch from b524172 to ba98327 Compare April 10, 2017 11:41
@pks-t
Copy link
Member Author

pks-t commented Apr 10, 2017

I finally accumulated enough bravery to dive into that part of code again and fixed the issue. We now first determine whether the directory contains any unversioned files and, if it does, advance into it instead of immediately deleting it. I guess code code be prettier, but I'm not too familiar with checkout/index logic.

@pks-t pks-t mentioned this pull request May 2, 2017
8 tasks
@ethomson
Copy link
Member

ethomson commented May 5, 2017

I've spent a bit of time studying this - I'm going to need to dig into the checkout code to get familiar with it again.

@pks-t
Copy link
Member Author

pks-t commented May 5, 2017

Yeah. I've been cursing quite a bit investigating the issue and the surrounding code. It's quite hard to grasp.

@ethomson
Copy link
Member

ethomson commented Jun 4, 2017

So, after looking at this a bit, I think that it would be nice to keep the processing within a single loop and not do a second processing loop in contains_unversioned. When there's an unversioned item in the directory, we'll end up doing the advance_into.

Does it seem like it would be easier to simply always advance_into?

The following patch does that, and all our unit tests pass, but I do worry a bit that there are corner cases that I'm missing.

diff --git a/src/checkout.c b/src/checkout.c
index 06cf65c43..25018d291 100644
--- a/src/checkout.c
+++ b/src/checkout.c
@@ -331,44 +331,6 @@ static int checkout_queue_remove(checkout_data *data, const char *path)
    return git_vector_insert(&data->removes, copy);
 }
 
-static int contains_unversioned(int *unversioned, git_index *index, const char *dir)
-{
-   git_vector entries = GIT_VECTOR_INIT;
-   git_buf path = GIT_BUF_INIT;
-   const char *workdir, *entry;
-   int error = 0;
-   size_t i;
-
-   if ((workdir = git_repository_workdir(index->rc.owner)) == NULL) {
-       error = -1;
-       goto out;
-   }
-
-   if ((error = git_buf_joinpath(&path, workdir, dir)) < 0)
-       goto out;
-
-   if ((error = git_path_dirload(&entries, path.ptr, strlen(workdir), 0)) < 0)
-       goto out;
-
-   *unversioned = 0;
-
-   git_vector_foreach(&entries, i, entry) {
-       const git_index_entry *ientry =
-           git_index_get_bypath(index, entry, 0);
-
-       if (ientry == NULL) {
-           *unversioned = 1;
-           break;
-       }
-   }
-
-out:
-   git_vector_free_deep(&entries);
-   git_buf_free(&path);
-
-   return error;
-}
-
 /* note that this advances the iterator over the wd item */
 static int checkout_action_wd_only(
    checkout_data *data,
@@ -408,18 +370,8 @@ static int checkout_action_wd_only(
             */
            const git_index_entry *e = git_index_get_byindex(data->index, pos);
 
-           if (e != NULL && data->diff->pfxcomp(e->path, wd->path) == 0) {
-               int unversioned;
-               if ((error = !contains_unversioned(
-                       &unversioned, data->index, wd->path)) < 0)
-                   return error;
-
-               if (unversioned)
-                   return git_iterator_advance_into(wditem, workdir);
-
-               notify = GIT_CHECKOUT_NOTIFY_DIRTY;
-               remove = ((data->strategy & GIT_CHECKOUT_FORCE) != 0);
-           }
+           if (e != NULL && data->diff->pfxcomp(e->path, wd->path) == 0)
+               return git_iterator_advance_into(wditem, workdir);
        }
    }

@pks-t pks-t force-pushed the pks/forced-checkout branch from ba98327 to f2d5ded Compare June 6, 2017 08:10
If the `GIT_CHECKOUT_FORCE` flag is given to any of the `git_checkout`
invocations, we remove files which were previously staged. But while
doing so, we unfortunately also remove unstaged files in a directory
which contains at least one staged file, resulting in potential data
loss.

The issue resides in our logic which tries to determine which action to
excute for a tree entry. We currently assume that a directory is safe to
be deleted as soon as it contains at least one versioned entry - which
is ovbiously wrong. Instead, we first have to determine whether the
directory contains any unversioned files and, if so, advance into the
directory itself. As such, the directory will not be deleted, but all
versioned entries inside of the directory will be reset correctly.

This commit also adds two tests to verify behavior.
@pks-t
Copy link
Member Author

pks-t commented Jun 6, 2017

This is certainly a much saner fix than my additional loop here. Playing around with it a bit I didn't find any unexpected edge cases, i.e. it seems to do what it should and also feels correct. But given this code part is quite complicated I'm not as sure as I'd like to be.

I'm a bit on the edge. While my approach is certainly a dirty hack in comparison to your fix, it should at least do much less harm if there were any additional edge cases we forgot about. Dunno, I just feel unsure about including it in the upcoming release without much exposure to testing. But given that it fixes a bug leading to data loss I guess we don't really have any choice here, do we?

@ethomson
Copy link
Member

ethomson commented Jun 8, 2017

So, I beat my head against this some more and I'm pretty convinced that the simpler case is okay.

So the case is that there's an item in the working directory but not in the checkout target. (Hence we're in the checkout_action_wd_only.) And the workdir item is a directory, not a file (wd->mode == GIT_FILEMODE_TREE).

The old code (that we've been shipping for years) - the code with the faulty assumption - looks to see if there's an index entry beneath that directory. Eg, it looks to see if we have a workdir entry foo and an index entry foo/bar.txt. If this is not the case, then the working directory must have precious files in that directory. This part is okay. The part that's not okay is if there is an index entry foo/bar.txt. It just blows away the whole damned directory.

That's not cool.

Instead, by simply pushing the directory itself onto the stack and iterating each entry, we will deal with the files one by one - whether they're in the index (and can be force removed) or not (and are precious).

In particular, the edge case that I was concerned about was leaving empty directories around, but when we queue a delete, we'll remove empty directories that are left behind.

So this was a bad optimization, assuming that we didn't need to git_iterator_advance_into if there was any index entry in the folder. That's wrong - we could have optimized this iff all folder entries are in the index. But that's not what this test does, and AFAIK, there's no better or easier way to deal with this than just walking them.

@pks-t
Copy link
Member Author

pks-t commented Jun 8, 2017

Okay, so let's bite the dust and use your fix then. Shall I simply merge in your change into my commit and credit you in the commit message or do you want to create a commit for yourself?

@ethomson
Copy link
Member

ethomson commented Jun 8, 2017

Cool - I opened #4260 with this change.

@pks-t
Copy link
Member Author

pks-t commented Jun 9, 2017

Closing in favor of #4260.

@pks-t pks-t closed this Jun 9, 2017
@pks-t pks-t deleted the pks/forced-checkout branch June 12, 2017 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants