
Commit 5856b70

Assume strings are UTF-8 by default and fall back to re-encoding if necessary
Before this patch, we always assumed the string was ASCII-8BIT, which does not work correctly with real UTF-8 strings (French accented characters or Chinese characters). This patch tries to use the string as UTF-8 and falls back to the ASCII-8BIT assumption only if necessary. This should help with #295 and might replace the need for #301.
Change-Id: Idac63fa10e5aefafa1eb99a6be4138cac5f90ea0
1 parent a223fcf commit 5856b70
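To illustrate the problem the message describes, here is a minimal sketch that replays the pre-patch conversion (the lines removed from `command_lines`) on a UTF-8 string. `old_command_lines` and the sample output are hypothetical names and data chosen for illustration; the sketch assumes a UTF-8 source file (the Ruby default since 2.0). Because the bytes are declared to be "binary", every byte at or above 0x80 has no UTF-8 mapping and is replaced with U+FFFD.

```ruby
# Hypothetical replay of the pre-patch behavior: declare the bytes "binary"
# (ASCII-8BIT) and transcode to UTF-8. Each byte >= 0x80 is undefined in
# that conversion, so :undef => :replace turns it into U+FFFD.
def old_command_lines(output)
  output.encode("UTF-8", "binary", invalid: :replace, undef: :replace).split("\n")
end

# Hypothetical git output containing French and Chinese file names.
git_output = "café\n文件.txt\n"

old_lines  = old_command_lines(git_output)
utf8_lines = git_output.split("\n") # reading the string as UTF-8 instead

# "é" is 2 bytes and each CJK character is 3 bytes, so 2 and 6 replacements.
puts old_lines == ["caf\uFFFD\uFFFD", "\uFFFD" * 6 + ".txt"]
puts utf8_lines == ["café", "文件.txt"]
```

Each `puts` prints `true`: the old path loses every non-ASCII character, while splitting the string as UTF-8 preserves them.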


1 file changed (+20 -5 lines)


lib/git/lib.rb

Lines changed: 20 additions & 5 deletions
@@ -859,13 +859,28 @@ def meets_required_version?
 
     def command_lines(cmd, opts = [], chdir = true, redirect = '')
       cmd_op = command(cmd, opts, chdir)
-      op = cmd_op.encode("UTF-8", "binary", {
-        :invalid => :replace,
-        :undef => :replace
-      })
-      op.split("\n")
+      split_utf8(cmd_op)
     end
 
+    def split_utf8(s)
+      # ruby can think the string is utf8 but any action on the strings chars
+      # will trigger errors. Here we try to split utf8 strings.
+      # It it isn't utf8 or if it fails, we fallback to assume the input is
+      # "binary" encoding
+      result = begin
+        s.encoding == Encoding::UTF_8 && s.split("\n")
+      rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
+        nil
+      end
+
+      result ||= s.encode("UTF-8", "binary", {
+        :invalid => :replace,
+        :undef => :replace,
+      }).split("\n")
+      result
+    end
+
+
     # Takes the current git's system ENV variables and store them.
     def store_git_system_env_variables
       @git_system_env_variables = {}
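`split_utf8` detects a bad UTF-8 string by rescuing an exception. A minimal sketch of an alternative that decides up front with `String#valid_encoding?` and keeps the same binary fallback is shown here. `split_utf8_checked` is a hypothetical name; this is not the code in the commit.

```ruby
# Hypothetical variant of split_utf8: check validity first instead of
# relying on an exception from split.
def split_utf8_checked(s)
  if s.encoding == Encoding::UTF_8 && s.valid_encoding?
    # Well-formed UTF-8: split as-is, keeping accented and CJK characters.
    s.split("\n")
  else
    # Same fallback as the commit: treat the bytes as binary and replace
    # any byte that has no UTF-8 mapping with U+FFFD.
    s.encode("UTF-8", "binary", invalid: :replace, undef: :replace).split("\n")
  end
end

p split_utf8_checked("café\n文件.txt")   # valid UTF-8, kept intact
p split_utf8_checked("ok\n\xFF")         # tagged UTF-8 but not valid UTF-8
p split_utf8_checked("caf\xC3\xA9".b)    # tagged ASCII-8BIT
```

The last call shows a limit that this sketch shares with the commit. A string tagged ASCII-8BIT still goes through the byte-by-byte fallback even when its bytes are valid UTF-8. Only strings that Ruby already tags as UTF-8 take the new path.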
