Skip to content

Do not let unexpected command output encoding raise an exception #405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 11, 2019
Merged

Do not let unexpected command output encoding raise an exception #405

merged 1 commit into from
Dec 11, 2019

Conversation

jcouball
Copy link
Member

@jcouball jcouball commented Feb 28, 2019

Signed-off-by: James Couball jcouball@yahoo.com

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

  • Ensure all commits include DCO sign-off.
  • Ensure that your contributions pass unit testing.
  • Ensure that your contributions contain documentation if applicable.

Description

ruby-git relies on shelling out to the git command to interact with a git repository. When the git command has output that is not in the default Ruby encoding (usually UTF-8), ruby-git may raise an exception when the string containing the command output is used.

This change makes ruby-git check if the command output string contains any invalid code points and if it does, attempts to guess at the actual encoding of the command output on a line-by-line basis. Each line is converted from the actual encoding to Ruby's default encoding (usually UTF-8). Unknown code points are replaced by a special character.

This solution contrasts with PR #404 which simply tries to replace all invalid code points with a special character using String#scrub.

Background

git diff (and other git commands) can have output that has mixed encodings. Not only does git output its own text in the default system/shell encoding, but when text from a repository file is output, it is not in any particular encoding: git just passes the bytes from the file to the output. These files are stored in whatever encoding the user wrote them in.

In Ruby, popen and backtick collect the output from the command they run as bytes. They shove these bytes into the string returned. They make no attempt to guess the actual encoding of the string returned.

In later versions of Ruby (maybe 1.9+), the encoding of the string returned from popen or backtick is ALWAYS set to Encoding.default_external. This is usually ‘UTF-8’. Ruby uses the value of the command locale charmap for the initial value of Encoding.default_external. popen and backtick use this encoding even though the actual encoding of the command output is not known.

The bottom line is that the actual encoding of the string returned from popen or backtick is unknowable. Sometimes it is a happy coincidence that the actual encoding matches what is retuned by String#encoding. When Encoding.default_external is ‘UTF-8’ this coincidence happens more often than not.

I think the best that can be done is to sanitize the command output as it is read in Git::Lib#run_command in an attempt to “guard the gates”. That way the rest of the code in this gem can assume the command output has a valid encoding and avoid unexpected exceptions being raised.

What has changed?

  • git.gemspec: added a runtime dependency on rchardet to guess the encoding of command output in the case where the command output contains invalid code points.
  • lib/git/lib.rb: change Git::Lib#run_command to ensure that the string that collects command line output is converted to the String encoding used by Ruby and only has valid code points for that encoding.
  • lib/git/diff.rb: remove encoding handling since this is now handled in Git::Lib#run_command
  • tests/units/test_diff_non_default_encoding.rb: Added two tests: (1) diff changes to a file that has iso-8859-7-greek encoding and (2) diff changes to a file where the encoding was changed from euc-jp-japanese to euc-kr-korean
  • tests/files/encoding/**/*: a repository used in the tests above

@stale
Copy link

stale bot commented May 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label May 8, 2019
@benasher44
Copy link

Any chance of seeing this merged? Just ran into this, and we've resorted to forking the repo and keeping this branch up-to-date

@stale stale bot removed the wontfix label May 14, 2019
@agentk
Copy link

agentk commented May 22, 2019

Getting the same issue too. It's affecting Danger from being able to get the list of changed files when one of the changed files (correctly) contains invalid UTF-8.

@genepaul
Copy link

genepaul commented Jun 6, 2019

I would also like to see this fixed, as we are running into the same problem.

@kielgillard
Copy link

👍 for seeing this merged sooner rather than later.

@Kaspik
Copy link

Kaspik commented Aug 28, 2019

@tarcinil / @perlun ? Pretty please!

Copy link
Contributor

@perlun perlun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue: I don't like the fact of adding LGPL-licensed libraries as mandatory dependencies of ruby-git, which is in itself MIT licensed. It's not a huge problem since LGPL doesn't affect the whole dependency graph (as GPL would do), but I'm still not overly fond of it.

I guess there is no other (easy) way to handle the charset detection?

@Kaspik
Copy link

Kaspik commented Sep 11, 2019

So what can we do about it? Also, rchardet has some Snyk Vulnerability reported.

@perlun
Copy link
Contributor

perlun commented Sep 20, 2019

@Kaspik

So what can we do about it?

We can kindly ask @jcouball to see if there are other ways to achieve (reasonably) the same goal without the added rchardet dependency.

@jcouball
Copy link
Member Author

Hello @perlun and @Kaspik, sorry to take so long to respond.

I haven't had much luck finding a suitable replacement for rchardet. The choices are very few and not very well maintained.

Also, I haven't found a vulnerability reported for rchardet here:
https://snyk.io/vuln/search?q=rchardet&type=rubygems

Is there somewhere else I should check?

Assuming that there is no vulnerability, would you accept this PR as is? I could look into making the rchardet dependency optional... this might be good intermediate step to making this functionality default.

What do you think?

@perlun
Copy link
Contributor

perlun commented Nov 13, 2019

Sorry @jcouball, this time I'm the one who's being a bit slow here... 🙈

Assuming that there is no vulnerability, would you accept this PR as is? I could look into making the rchardet dependency optional... this might be good intermediate step to making this functionality default.

Maybe it's not the end of the world with this added dependency after all. If you rebase the MR, I'll approve it.

@Kaspik
Copy link

Kaspik commented Nov 20, 2019

@jcouball Hey! Are you planning to rebase this PR? :)

…ing does not raise an exception

Signed-off-by: James Couball <jcouball@yahoo.com>
@jcouball
Copy link
Member Author

Hello @perlun and @Kaspik, I have rebased this PR.

@Kaspik
Copy link

Kaspik commented Nov 25, 2019

Thanks! Looks good to me, now it's up to @perlun :)

@jcouball jcouball changed the title Don't let unexpected command output encoding raise an exception Do not let unexpected command output encoding raise an exception Nov 25, 2019
@jcouball
Copy link
Member Author

jcouball commented Dec 2, 2019

@perlun, can you please merge this?

Copy link
Contributor

@perlun perlun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort. 👍

@perlun
Copy link
Contributor

perlun commented Dec 7, 2019

@jcouball Approved, sorry for the unnecessary delay from my part. Please merge yourself so the commit history gets more correct.

@jcouball jcouball merged commit f5cd6a6 into ruby-git:master Dec 11, 2019
@jcouball
Copy link
Member Author

@perlun, what is the plan / process to create a new release?

jasnow pushed a commit to jasnow/ruby-git that referenced this pull request Dec 22, 2019
…ing does not raise an exception (ruby-git#405)

Signed-off-by: James Couball <jcouball@yahoo.com>
Signed-off-by: Al Snow <jasnow@hotmail.com>
@perlun
Copy link
Contributor

perlun commented Dec 25, 2019

@perlun, what is the plan / process to create a new release?

There is not much of a "plan" at the moment unfortunately (I'm not actually using the gem myself any more), but the process is roughly this, using the release[remote] Rake task below:

$ rake -T
rake build            # Build git-1.5.0.gem into the pkg directory
rake clean            # Remove any temporary products
rake clobber          # Remove any generated files
rake install          # Build and install git-1.5.0.gem into system gems
rake install:local    # Build and install git-1.5.0.gem into system gems without network access
rake release[remote]  # Create tag v1.5.0 and build and push git-1.5.0.gem to rubygems.org
rake test             # Run Unit Tests

Inspired by this guide: https://bundler.io/v1.12/guides/creating_gem.html#releasing-the-gem, I'd describe the steps like this:

  • Bump the version number in lib/git/version.rb - definitely use Semantic Versioning here. Are we aiming for a 1.6 or 2.0 release here, WDYT?
  • Commit this change. If you have git-extras installed, you can do git release v1.6.0 or similar.
  • Run the rake release[remote] task - this presumes you are set up as a Rubygems maintainer for this gem.
  • Create a GitHub release for the new tag and add the changelog. This is typically created using changelog-rs which makes the process reasonably simple, as long as the commit messages are good enough and all branches are squash-merged into master.

@jcouball, if we can settle on a version number, I can probably help out with the actual release in this case. For bonus points, you could document the steps above in a file in the repo so we remember it until next time. (which makes things simpler for everybody)

@majioa
Copy link

majioa commented Jan 21, 2020

Can't validate for now, let's release 1.6.0 then it will have seen there

AgoraSecurity pushed a commit to AgoraSecurity/ruby-git that referenced this pull request Feb 21, 2020
…ing does not raise an exception (ruby-git#405)

Signed-off-by: James Couball <jcouball@yahoo.com>Signed-off-by: Agora Security <github@agora-security.com>
neomilium added a commit to opus-codium/modulesync that referenced this pull request Dec 20, 2020
According to ruby-git/ruby-git#405, it seems to close the issue reported at ruby-git/ruby-git#326.

This have been merged then released in 1.7.0.

It finally removes the deprecation warning:
lib/monkey_patches.rb:13: warning: Using the last argument as keyword parameters is deprecated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants