Diff parsing fails when filenames contain unicode characters #418

Closed
@cyclotron3k

Subject of the issue

Unicode characters in filenames break diffing.

Your environment

  • git version 2.20.1
  • git gem (1.5.0)
  • ruby 2.6.4p104 (2019-08-28 revision 67798) [x86_64-linux]

Steps to reproduce

$ mkdir -p /tmp/test && cd /tmp/test
$ git init
$ printf '\xE2\x98\xA0' # print a skull and crossbones symbol
$ touch 'some_file'
$ git add .
$ git commit -m"init"
$ touch 'my_other_file_\xE2\x98\xA0'
$ git add .
$ git commit -m"commit"

Then, in Ruby:

require 'git'
git = Git.init '/tmp/test'
git.diff('@^').each { |x| ... }

Expected behaviour

The diff is produced as usual.

Actual behaviour

NoMethodError: undefined method `[]' for nil:NilClass

The first line of a diff is usually something like diff --git a/somefile b/somefile, but when the filename contains a unicode character, the filenames are quoted and escaped (e.g. diff --git "a/my_other_file_\\xE2\\x98\\xA0" "b/my_other_file_\\xE2\\x98\\xA0"), and the regex that detects this line fails to match it.
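
To illustrate the failure mode described above, here is a minimal sketch of a header pattern that accepts both the plain and the double-quoted form of the diff --git line. It is not the library's actual code: the names DIFF_HEADER and parse_diff_header are hypothetical, and the sketch returns quoted paths still escaped rather than decoding them.

```ruby
# Hypothetical sketch, not the library's real parser.
# Matches either   diff --git a/path b/path
# or               diff --git "a/escaped path" "b/escaped path"
DIFF_HEADER = /\Adiff --git (?:"a\/(?<qa>(?:\\.|[^"\\])*)"|a\/(?<a>.*?)) (?:"b\/(?<qb>(?:\\.|[^"\\])*)"|b\/(?<b>.*))\z/

# Returns [old_path, new_path] for a diff header line, or nil for any
# other line. Paths taken from the quoted form are returned as git wrote
# them (escape sequences are NOT decoded here).
def parse_diff_header(line)
  m = DIFF_HEADER.match(line.chomp)
  return nil unless m

  [m[:qa] || m[:a], m[:qb] || m[:b]]
end

p parse_diff_header("diff --git a/somefile b/somefile")
# prints ["somefile", "somefile"]
```

Returning nil for non-header lines lets a caller skip them explicitly instead of indexing into a failed match, which is what produces the NoMethodError shown above. Decoding the escape sequences back into the original filename would be a separate step.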
