Diff parsing fails when filenames contain unicode characters #418
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@cyclotron3k, if you are still interested, can you check whether this issue is fixed in version 1.6.0.pre1? I think it might have been fixed by #405.
Unfortunately it doesn't seem to have helped:
@cyclotron3k, over the weekend I was able to take a look at this and found the problem. It doesn't look like a problem with encoding, but a bug in how the diff output is parsed. I believe this patch will make it work:
However, there are a couple of things to figure out before a PR can be submitted:
Take this with a pinch of salt (because I may not have applied the patch correctly), but I'm not sure that regex change is going to work. I tried in a … I tried with … This may work a bit better:
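For illustration only, here is a minimal sketch of a header pattern that accepts, on each side, either a bare path or a double-quoted path containing backslash escapes. The names DIFF_HEADER and parse_diff_header are hypothetical; this is not the gem's code, and it is not claimed to be the regex that was eventually merged.

```ruby
# Hypothetical sketch, not the ruby-git implementation.
# Each side is either a C-style quoted path ("a/...", with backslash escapes)
# or a bare token (a/...). Bare paths containing spaces are ambiguous in this
# header line and are deliberately not handled: such lines return nil.
DIFF_HEADER = %r{\Adiff --git (?<a>"a/(?:[^"\\]|\\.)*"|a/\S+) (?<b>"b/(?:[^"\\]|\\.)*"|b/\S+)\z}

# Returns [a_path, b_path] exactly as git printed them (still quoted if git
# quoted them), or nil when the line is not a header this pattern recognises.
def parse_diff_header(line)
  match = DIFF_HEADER.match(line.chomp)
  return nil if match.nil?

  [match[:a], match[:b]]
end
```

Returning nil instead of indexing into a failed match lets the caller decide what to do with an unrecognised line, rather than crashing with NoMethodError.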
Taking a step back, I have a problem with the reproduction recipe that you gave in the original post. Maybe this is a clue to why we are seeing different output. I am on a Mac running macOS 10.15.2, using zsh (also tried in bash). The …
but the touch does not:
This works:
On my computer, I get different output depending on the value of …
All that said, I think your proposed regex works for both cases (see the irb transcript below). I'll prepare a patch based on this regex.
Interesting. Your testing has revealed a few things. My test case was a bit faulty. That … But it didn't matter, because the problem is not git escaping Unicode characters; it's git automatically escaping filenames (and paths) that include pretty much any non-alphanumeric character, including those backslashes. I tried the …
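Because git may wrap a path in double quotes and escape it, a parser also has to undo that quoting before the path is usable. Below is a minimal sketch, assuming git's C-style quoting: the path is wrapped in double quotes; backslash, double quote and control characters such as tab use backslash escapes; and other bytes git chooses to escape (for example the UTF-8 bytes of a non-ASCII name) appear as three-digit octal escapes such as \342\230\240. GIT_C_ESCAPES and unquote_git_path are hypothetical names, not part of the gem.

```ruby
# Hypothetical helper, not part of the ruby-git API.
# Single-character escapes that git's C-style quoting may emit.
GIT_C_ESCAPES = {
  'a' => "\a", 'b' => "\b", 't' => "\t", 'n' => "\n",
  'v' => "\v", 'f' => "\f", 'r' => "\r", '"' => '"', '\\' => '\\'
}.freeze

# Decodes a quoted path such as "a/my_other_file_\342\230\240" (quotes
# included) and tags the result as UTF-8. The result may still be invalid
# UTF-8 if the original filename was not UTF-8. Unquoted paths are returned
# unchanged.
def unquote_git_path(path)
  raw = path.b # work on raw bytes so octal escapes can rebuild multibyte chars
  return path unless raw.length >= 2 && raw.start_with?('"') && raw.end_with?('"')

  body = raw[1...-1]
  decoded = body.gsub(/\\([0-7]{3}|.)/m) do
    escape = Regexp.last_match(1)
    if escape.length == 3
      escape.to_i(8).chr # three octal digits -> one raw byte
    else
      GIT_C_ESCAPES.fetch(escape, escape)
    end
  end
  decoded.force_encoding(Encoding::UTF_8)
end
```

For example, unquote_git_path('"a/my_other_file_\342\230\240"') yields "a/my_other_file_☠".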
My git version is: … One of the things I have come to realize is that this gem is at the mercy of each user's git configuration. It would be nice to normalize the configuration used each time this gem calls the …
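One way to normalize the configuration is to pin the settings the parser depends on by passing git's -c name=value option on every call. A minimal sketch follows. PINNED_GIT_CONFIG, git_command and run_git are hypothetical names, and the settings shown are examples, not a complete list. Note that core.quotePath=false only stops git from octal-escaping bytes above 0x80: paths containing a double quote, a backslash or a control character are still quoted, so the parser still needs to handle quoted paths.

```ruby
require 'open3'

# Hypothetical settings pinned on every call so that a user's global or
# repository config cannot change the output format the parser expects.
PINNED_GIT_CONFIG = {
  'core.quotePath' => 'false',     # emit non-ASCII path bytes verbatim
  'diff.noprefix' => 'false',      # keep the a/ and b/ prefixes
  'diff.mnemonicPrefix' => 'false' # don't swap a/ b/ for i/ w/ c/ etc.
}.freeze

# Builds an argv array: ["git", "-c", "core.quotePath=false", ..., *args].
def git_command(*args)
  overrides = PINNED_GIT_CONFIG.flat_map { |key, value| ['-c', "#{key}=#{value}"] }
  ['git', *overrides, *args]
end

# Runs git without a shell and returns stdout; raises if git exits non-zero.
def run_git(*args, chdir: Dir.pwd)
  stdout, stderr, status = Open3.capture3(*git_command(*args), chdir: chdir)
  raise "git #{args.join(' ')} failed: #{stderr}" unless status.success?

  stdout
end
```

Passing an argv array rather than a single command string avoids shell quoting problems with unusual filenames.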
@cyclotron3k, what are your values for the LANG and LC_* environment variables when git is run? See https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html. Here is what I have:
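LANG and the LC_* variables matter on the Ruby side as well. Ruby derives Encoding.default_external from the locale, so under a non-UTF-8 locale the same bytes from git can come back tagged with a different encoding. A regex match on a string whose bytes are invalid in its tagged encoding can then raise ArgumentError. The sketch below shows one way to remove that dependency. It assumes git's output is UTF-8, and normalize_git_output is a hypothetical name.

```ruby
# Hypothetical helper: re-tag raw git output as UTF-8 regardless of
# Encoding.default_external, replacing any invalid byte sequences with
# U+FFFD so later regex matches cannot fail on broken bytes.
def normalize_git_output(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub("\uFFFD")
end

# Simulates output that was read under a non-UTF-8 locale (tagged US-ASCII).
ascii_tagged = "diff --git a/\u2620 b/\u2620".b.force_encoding(Encoding::US_ASCII)
normalized = normalize_git_output(ascii_tagged)
```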
Edit: I'm doing all my testing using the official …
I suspect that our testing of the effect of … Do you see a difference when you run these commands:
vs.
No difference 🤷♂️
@cyclotron3k, I created a PR to fix this issue in #440. Can you please check out this code and run … I suspect that, although the test passes on my machine and on the Travis CI build, it might not pass in your environment. If it does, then we are golden.
When the LC_* and LANG environment variables are set to UTF-8, all tests pass (albeit with a bunch of 2.7-related warnings); when UTF-8 is not set, I get one failure:
I also tested with 2.6 and got a failure on the first run (I think because user.name and user.email were not set), but there were no failures on subsequent runs. I noticed the following had been set globally:
A friendly reminder that this issue had no activity for 60 days.
I believe this issue was addressed with #504 |
Subject of the issue
Unicode characters in filenames break diffing
Your environment
Steps to reproduce
Expected behaviour
The diff to be produced as usual
Actual behaviour
NoMethodError: undefined method `[]' for nil:NilClass
The first line of a diff is usually something like
diff --git a/somefile b/somefile
but when the filename contains a Unicode character, the filenames are quoted and escaped (e.g. diff --git "a/my_other_file_\\xE2\\x98\\xA0" "b/my_other_file_\\xE2\\x98\\xA0") and this regex fails to detect the line.
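The reported NoMethodError is what you get when a header regex that expects bare a/ and b/ paths is applied to a quoted header line. In that case match returns nil, and the code then calls [] on nil. Below is a minimal reproduction. NAIVE_HEADER and naive_paths are hypothetical stand-ins; the gem's actual regex is not reproduced here.

```ruby
# Hypothetical pattern that assumes both paths are printed without quotes.
NAIVE_HEADER = %r{\Adiff --git a/(.+) b/(.+)\z}

# Mirrors the failure mode: indexes into the match result without a nil check.
def naive_paths(line)
  match = line.match(NAIVE_HEADER)
  [match[1], match[2]]
end

plain  = 'diff --git a/somefile b/somefile'
quoted = 'diff --git "a/my_other_file_\342\230\240" "b/my_other_file_\342\230\240"'

naive_paths(plain) # => ["somefile", "somefile"]
begin
  naive_paths(quoted)
rescue NoMethodError => e
  warn "quoted header not matched: #{e.message}"
end
```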