bpo-40049: Check if symlink exists when extracting from tarfile #19187

Closed
wants to merge 4 commits into from

Conversation

@jonnyhsu jonnyhsu commented Mar 27, 2020

When extracting a tarfile, os.symlink() will raise an exception if the symlink already exists. This will cause the entire tarfile to be scanned for the destination files, thinking that the platform does not support symlinks. On a normal file this goes unnoticed, but when processing stream data it will raise a StreamError because it needs to seek backwards to resume extraction where it left off.

https://bugs.python.org/issue40049
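
The failure described above can be reproduced without tarfile at all. The following is a minimal sketch, assuming a platform and user account that are allowed to create symlinks; the helper name `symlink_twice` and the file names are hypothetical and not part of this PR. It shows that a second `os.symlink()` call on an existing path raises an `OSError` subclass, which is the exception the description says gets mistaken for missing symlink support.

```python
import os
import tempfile


def symlink_twice(directory):
    """Create the same symlink twice and return the second call's exception.

    Hypothetical helper for illustration only; returns None if the second
    call unexpectedly succeeds.
    """
    link = os.path.join(directory, "link")
    # The link target does not need to exist; a dangling symlink is enough.
    os.symlink("target.txt", link)
    try:
        os.symlink("target.txt", link)
    except OSError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    err = symlink_twice(tmp)
    print(type(err).__name__, err)
```

Because `FileExistsError` is a subclass of `OSError`, a broad handler around `os.symlink()` cannot tell "the link already exists" apart from "this platform has no symlinks" unless it checks for the former explicitly.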

@jonnyhsu jonnyhsu requested a review from ethanfurman as a code owner March 27, 2020 01:35
@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@jonnyhsu

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@jonnyhsu
Author

jonnyhsu commented Jun 4, 2020

Is there anyone available to review this?

Contributor

@ZackerySpytz ZackerySpytz left a comment

I'm pretty sure this should have a unit test.

@jonnyhsu
Author

jonnyhsu commented Jun 6, 2020

@ZackerySpytz thanks for taking a look. I've added a unit test and confirmed that it fails on master and passes on this branch with both Windows 10 and Ubuntu.

@taleinat
Contributor

Closing and re-opening to restart the Travis-CI check.

@taleinat taleinat closed this Sep 18, 2020
@taleinat taleinat reopened this Sep 18, 2020
@taleinat
Contributor

taleinat commented Sep 18, 2020

GNU tar (v1.30, Ubuntu 20.04) does indeed overwrite files with symlinks upon extracting, while both ln -s and os.symlink do not. Assuming this is common behavior for tar, I agree that the appropriate behavior would be to overwrite as suggested here.

Contributor

@taleinat taleinat left a comment

This looks good, but it seems to me that the root issue here is not the backwards seek, but the simply incorrect behavior of not overwriting existing files with symlinks.

If you agree, @jonnyhsu, please change the reasoning in the test comment and the NEWS entry accordingly, and I'd be happy to merge this.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@@ -2232,6 +2232,8 @@ def makelink(self, tarinfo, targetpath):
         try:
             # For systems that support symbolic and hard links.
             if tarinfo.issym():
+                if os.path.lexists(targetpath):
Contributor

Perhaps include a short comment here explaining why this is needed? Something along the lines of "tar should overwrite existing files with symlinks, but os.symlink raises an exception rather than overwriting."
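
The check under review can be sketched outside of tarfile as follows. This is an illustration only, not the code from this PR or from GH-21409: the helper name `force_symlink` and the file names are hypothetical, and it assumes the existing path is a file or a symlink, since `os.unlink()` does not remove directories. It uses `os.path.lexists()` rather than `os.path.exists()` so that an existing dangling symlink is also detected.

```python
import os
import tempfile


def force_symlink(linkname, targetpath):
    """Create a symlink at targetpath, replacing an existing file or link.

    Hypothetical helper mirroring the idea in the diff above: tar should
    overwrite existing files with symlinks, but os.symlink() raises
    FileExistsError rather than overwriting.
    """
    if os.path.lexists(targetpath):
        # lexists() is also true for a dangling symlink, unlike exists().
        os.unlink(targetpath)
    os.symlink(linkname, targetpath)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "link")
    # Start with a regular file where the symlink should go.
    with open(path, "w") as f:
        f.write("regular file")
    force_symlink("target.txt", path)
    replaced = os.path.islink(path)
    link_target = os.readlink(path)
    # A second call replaces the existing symlink instead of raising.
    force_symlink("other.txt", path)
    second_target = os.readlink(path)
```

Note that the check and the unlink are two separate steps, so another process could recreate the path in between; the sketch does not attempt to handle that race.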

@taleinat
Contributor

Thanks for the PR, @jonnyhsu, but I'm closing this in favor of PR GH-21409, which restores the original fix for this issue which was lost in a bad merge.

@taleinat taleinat closed this Sep 19, 2020
@jonnyhsu
Author

> Thanks for the PR, @jonnyhsu, but I'm closing this in favor of PR GH-21409, which restores the original fix for this issue which was lost in a bad merge.

Got it, thanks for sorting it out!
