
Unicode path woes (on Windows?) #147

Closed
SonOfLilit opened this issue Mar 30, 2014 · 5 comments

Comments

@SonOfLilit

@with_rw_repo('0.1.6')
def test_add_unicode(self, rw_repo):
    filename = u"שלום.txt"

    file_path = os.path.join(rw_repo.working_dir, filename)
    with open(file_path, "wb") as f:
        f.write(b'something')

    rw_repo.index.add([filename])
    self.assert_entries([filename])

I get a UnicodeEncodeError in IndexFileSHA1Writer.write, and if I pass filename.encode('utf8') instead, it simply cannot find the file. I guess I could figure out the current locale encoding and pass the correctly encoded name, but that's a terrible solution: it only works when the locale encoding can represent the filename, and I'm sure it would create other problems.
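To illustrate why relying on the locale encoding is fragile, here is a minimal sketch that checks whether the filename can be represented in a few encodings. The helper name `can_encode` and the choice of code pages are assumptions made for illustration only; the thread does not say which code page the reporter's machine used.

```python
def can_encode(name, encoding):
    """Return True if the text `name` can be represented in `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


filename = u"שלום.txt"

# cp1252 is a Western European code page, cp1255 is the Hebrew one.
results = {enc: can_encode(filename, enc) for enc in ("cp1252", "cp1255", "utf-8")}
```

Only a code page that happens to contain Hebrew letters (or a Unicode encoding) can carry this name, which is why "encode with the locale encoding" cannot be a general fix.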

@with_rw_repo('0.1.6')
def test_add_unicode(self, rw_repo):
    filename = u"שלום.txt"

    file_path = os.path.join(rw_repo.working_dir, filename)
    with open(file_path, "wb") as f:
        f.write(b'something')

    rw_repo.git.add(rw_repo.working_dir)
    rw_repo.index.commit('message')

Gives something like:

  File "e:\projects\galago\gitimporter\tests.py", line 50, in add_files_to_repo
    repo.index.commit(message)
  File "c:\Users\User.virtualenvs\galago\lib\site-packages\gitpython-0.3.2.rc1-py2.7.egg\git\index\base.py", line 887, in commit
    return Commit.create_from_tree(self.repo, tree, message, parent_commits, head)
  File "c:\Users\User.virtualenvs\galago\lib\site-packages\gitpython-0.3.2.rc1-py2.7.egg\git\objects\commit.py", line 358, in create_from_tree
    master = git.refs.Head.create(repo, repo.head.ref, new_commit, logmsg="commit (initial): %s" % message)
  File "c:\Users\User.virtualenvs\galago\lib\site-packages\gitpython-0.3.2.rc1-py2.7.egg\git\refs\symbolic.py", line 505, in create
    return cls._create(repo, path, cls._resolve_ref_on_create, reference, force, logmsg)
  File "c:\Users\User.virtualenvs\galago\lib\site-packages\gitpython-0.3.2.rc1-py2.7.egg\git\refs\symbolic.py", line 472, in _create
    ref.set_reference(target, logmsg)
  File "c:\Users\User.virtualenvs\galago\lib\site-packages\gitpython-0.3.2.rc1-py2.7.egg\git\refs\symbolic.py", line 307, in set_reference
    self.log_append(oldbinsha, logmsg)
  File "c:\Users\User.virtualenvs\galago\lib\site-packages\gitpython-0.3.2.rc1-py2.7.egg\git\refs\symbolic.py", line 360, in log_append
    message)
  File "c:\Users\User.virtualenvs\galago\lib\site-packages\gitpython-0.3.2.rc1-py2.7.egg\git\refs\log.py", line 257, in append_entry
    fd.write(repr(entry))

Windows 7, Python 2.7, GitPython 0.3.2.rc1

Byron added this to the v0.3.4 - python 3 support milestone Nov 19, 2014
@Byron
Member

Byron commented Nov 19, 2014

I will have a look at this one when porting to py3, as it will require proper string/bytes handling.

@Byron
Member

Byron commented Jan 5, 2015

On OSX, I couldn't reproduce the issue.
At least the test is now part of the official test suite. I will recheck on Windows another time.
One thing that comes to mind is that Windows might be using yet another encoding while git-python tries to interpret it as UTF-8. Maybe sys.getfilesystemencoding() should be used to do the right thing.
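A minimal sketch of that idea, assuming a hypothetical helper `encode_for_filesystem` (it is not part of git-python): it encodes a text filename with whatever `sys.getfilesystemencoding()` reports and signals failure instead of raising when the name cannot be represented.

```python
import sys


def encode_for_filesystem(name):
    """Encode a text filename using the filesystem encoding.

    Returns the encoded bytes, or None if the name cannot be
    represented in that encoding.
    """
    # Fall back to UTF-8 if the interpreter reports no encoding.
    encoding = sys.getfilesystemencoding() or "utf-8"
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None
```

A caller could use the `None` result to skip a test or fall back to another code path in environments where the name is not representable.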

Byron added a commit that referenced this issue Jan 6, 2015
Applied a few more fixes to commit implementation, possibly not the last
@Byron
Member

Byron commented Jan 6, 2015

I am able to reproduce this issue on Windows. Here is what I get:

Traceback (most recent call last):
  File "Q:\bdep-oss\lib\git-python\0.3\noarch\git\test\lib\helper.py", line 114, in repo_creator
    return func(self, rw_repo)
  File "Q:\bdep-oss\lib\git-python\0.3\noarch\git\test\test_base.py", line 121, in test_add_unicode
    rw_repo.git.add(rw_repo.working_dir)
  File "Q:\bdep-oss\lib\git-python\0.3\noarch\git\cmd.py", line 251, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "Q:\bdep-oss\lib\git-python\0.3\noarch\git\cmd.py", line 534, in _call_process
    return self.execute(make_call(), **_kwargs)
  File "Q:\bdep-oss\lib\git-python\0.3\noarch\git\cmd.py", line 414, in execute
    raise GitCommandError(command, status, stderr_value)
GitCommandError: 'git add c:\users\byron\appdata\local\temp\tmpybsohnnon_bare_test_add_unicode' returned with exit code 128
stderr: 'fatal: unable to stat '????.txt': No such file or directory'

Apparently the encoding gets mangled on the way to git. Let's see what the fix looks like.
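The `????.txt` in the error message is what a lossy conversion to a code page without Hebrew letters would produce. The sketch below only illustrates that effect; that this is the exact mechanism on the Windows command line, and that cp1252 is the code page involved, are assumptions and not something the traceback confirms.

```python
filename = u"שלום.txt"

# Encode to a code page that has no Hebrew letters, replacing each
# unrepresentable character with '?', then decode back to text.
mangled = filename.encode("cp1252", errors="replace").decode("cp1252")
```

Once the name has been reduced to question marks, git has no way to find the original file on disk, which matches the "unable to stat" failure.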

@Byron
Member

Byron commented Jan 6, 2015

[screen shot 2015-01-06 at 16 41 01]

This is what it looks like when I view the file through a git bash. The file, however, looks fine in the Explorer (see the bottom-most file in the following image).
[screen shot 2015-01-06 at 16 42 18]

Byron added a commit that referenced this issue Jan 6, 2015
Also added code to show how to deal with #147
@Byron
Member

Byron commented Jan 6, 2015

Please have a look at this altered test case for how to work around the issue.

Interestingly, Python starts to deal properly with unicode in filenames in Python 3. Please have a look at the image showing how 2.7 fails and 3.4 succeeds at the same task.
[screen shot 2015-01-06 at 17 07 35]

This concludes this issue, as I believe Python 3 should be used if unicode is a concern, and git-python's own facilities should be used to add unicode paths, as the git command-line tool seems to have issues with them.
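As a rough sketch of that recommendation, the helper below writes a file with a unicode name and stages it through the repository's index object rather than through the git command line. The helper name `add_unicode_file` is hypothetical, and the code only assumes that `repo` exposes `working_dir` and `index.add(...)`, as the repository object in the tests above does. It is not the altered test case itself.

```python
import os


def add_unicode_file(repo, filename, content=b"something"):
    """Create `filename` in the working tree and stage it via repo.index.

    `repo` is assumed to provide `working_dir` and `index.add(paths)`.
    Returns the full path of the file that was written.
    """
    file_path = os.path.join(repo.working_dir, filename)
    # Close the file before staging so the content is fully on disk.
    with open(file_path, "wb") as f:
        f.write(content)
    # Stage through the Python implementation instead of `git add`.
    repo.index.add([file_path])
    return file_path
```

With GitPython installed, `repo` would typically be a `git.Repo` instance, and a call such as `add_unicode_file(repo, u"שלום.txt")` could be followed by `repo.index.commit('message')`.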

Byron closed this as completed Jan 6, 2015