-
Notifications
You must be signed in to change notification settings - Fork 533
Do not let unexpected command output encoding raise an exception #405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Any chance of seeing this merged? Just ran into this, and we've resorted to forking the repo and keeping this branch up-to-date |
Getting the same issue too. It's affecting Danger from being able to get the list of changed files when one of the changed files (correctly) contains invalid UTF-8. |
I would also like to see this fixed, as we are running into the same problem. |
👍 for seeing this merged sooner rather than later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue: I don't like the fact of adding LGPL-licensed libraries as mandatory dependencies of ruby-git
, which is in itself MIT licensed. It's not a huge problem since LGPL doesn't affect the whole dependency graph (as GPL would do), but I'm still not overly fond of it.
I guess there is no other (easy) way to handle the charset detection?
So what can we do about it? Also, |
Hello @perlun and @Kaspik, sorry to take so long to respond. I haven't had much luck finding a suitable replacement for Also, I haven't found a vulnerability reported for Is there somewhere else I should check? Assuming that there is no vulnerability, would you accept this PR as is? I could look into making the What do you think? |
Sorry @jcouball, this time I'm the one who's being a bit slow here... 🙈
Maybe it's not the end of the world with this added dependency after all. If you rebase the MR, I'll approve it. |
@jcouball Hey! Are you planning to rebase this PR? :) |
…ing does not raise an exception Signed-off-by: James Couball <jcouball@yahoo.com>
Thanks! Looks good to me, now it's up to @perlun :) |
@perlun, can you please merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the effort. 👍
@jcouball Approved, sorry for the unnecessary delay from my part. Please merge yourself so the commit history gets more correct. |
@perlun, what is the plan / process to create a new release? |
…ing does not raise an exception (ruby-git#405) Signed-off-by: James Couball <jcouball@yahoo.com> Signed-off-by: Al Snow <jasnow@hotmail.com>
There is not much of a "plan" at the moment unfortunately (I'm not actually using the gem myself any more), but the process is roughly this, using the $ rake -T
rake build # Build git-1.5.0.gem into the pkg directory
rake clean # Remove any temporary products
rake clobber # Remove any generated files
rake install # Build and install git-1.5.0.gem into system gems
rake install:local # Build and install git-1.5.0.gem into system gems without network access
rake release[remote] # Create tag v1.5.0 and build and push git-1.5.0.gem to rubygems.org
rake test # Run Unit Tests Inspired by this guide: https://bundler.io/v1.12/guides/creating_gem.html#releasing-the-gem, I'd describe the steps like this:
@jcouball, if we can settle on a version number, I can probably help out with the actual release in this case. For bonus points, you could document the steps above in a file in the repo so we remember it until next time. (which makes things simpler for everybody) |
Can't validate for now, let's release 1.6.0 then it will have seen there |
…ing does not raise an exception (ruby-git#405) Signed-off-by: James Couball <jcouball@yahoo.com>Signed-off-by: Agora Security <github@agora-security.com>
According to ruby-git/ruby-git#405, it seems to close the issue reported at ruby-git/ruby-git#326. This have been merged then released in 1.7.0. It finally removes the deprecation warning: lib/monkey_patches.rb:13: warning: Using the last argument as keyword parameters is deprecated
Signed-off-by: James Couball jcouball@yahoo.com
Your checklist for this pull request
🚨Please review the guidelines for contributing to this repository.
Description
ruby-git relies on shelling out to the
git
command to interact with agit
repository. When thegit
command has output that is not in the default Ruby encoding (usually UTF-8),ruby-git
may raise an exception when the string containing the command output is used.This change makes
ruby-git
check if the command output string contains any invalid code points and if it does, attempts to guess at the actual encoding of the command output on a line-by-line basis. Each line is converted from the actual encoding to Ruby's default encoding (usually UTF-8). Unknown code points are replaced by a special character.This solution contrasts with PR #404 which simply tries to replace all invalid code points with a special character using
String#scrub
.Background
git diff
(and other git commands) can have output that has mixed encodings. Not only doesgit
output its own text in the default system/shell encoding, but when text from a repository file is output, it is not in any particular encoding:git
just passes the bytes from the file to the output. These files are stored in whatever encoding the user wrote them in.In Ruby,
popen
andbacktick
collect the output from the command they run as bytes. They shove these bytes into the string returned. They make no attempt to guess the actual encoding of the string returned.In later versions of Ruby (maybe 1.9+), the encoding of the string returned from
popen
orbacktick
is ALWAYS set toEncoding.default_external
. This is usually ‘UTF-8’. Ruby uses the value of the commandlocale charmap
for the initial value ofEncoding.default_external
.popen
andbacktick
use this encoding even though the actual encoding of the command output is not known.The bottom line is that the actual encoding of the string returned from
popen
orbacktick
is unknowable. Sometimes it is a happy coincidence that the actual encoding matches what is retuned byString#encoding
. WhenEncoding.default_external
is ‘UTF-8’ this coincidence happens more often than not.I think the best that can be done is to sanitize the command output as it is read in
Git::Lib#run_command
in an attempt to “guard the gates”. That way the rest of the code in this gem can assume the command output has a valid encoding and avoid unexpected exceptions being raised.What has changed?
git.gemspec
: added a runtime dependency on rchardet to guess the encoding of command output in the case where the command output contains invalid code points.lib/git/lib.rb
: changeGit::Lib#run_command
to ensure that the string that collects command line output is converted to the String encoding used by Ruby and only has valid code points for that encoding.lib/git/diff.rb
: remove encoding handling since this is now handled inGit::Lib#run_command
tests/units/test_diff_non_default_encoding.rb
: Added two tests: (1) diff changes to a file that has iso-8859-7-greek encoding and (2) diff changes to a file where the encoding was changed from euc-jp-japanese to euc-kr-koreantests/files/encoding/**/*
: a repository used in the tests above