-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-117779: Fix reading duplicated entries in zipfile by name #129254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-117779: Fix reading duplicated entries in zipfile by name #129254
Conversation
serhiy-storchaka
commented
Jan 24, 2025
•
edited by bedevere-app
bot
Loading
edited by bedevere-app
bot
- Issue: inconsistent handling of duplicate ZipFile entries #117779
I wonder how far should we backport this change. #110016 was a security fix and was backported to 3.8. |
c683cf7
to
b8a9cea
Compare
I didn't expect it either. And I suspect the creation of those duplicate entries to be a bug, but they do clearly exist in the wild.
I'm slightly unsure about this fix though. For one, it breaks the workaround we added to IMO it would be admittedly more complex but also more consistent -- and avoid any errors processing these ZIP files -- to explicitly handle entries with duplicate offsets: letting all of them have the same (edited for clarity) |
Sorry for the delay in responding. I experimented with different solutions and wrote new tests. To me, this is a ZIP bomb. Even if it does not allow to fill the disk space by decompressing, it still spend CPU time for uncompressing the same data. It may consume a lot of memory if the decompressed entries are store in a list instead of a dict. But since ZIP archives with duplicated entries encountered in practice, created not by malicious, but just bad software, perhaps we could lower the severity level from error to warning. Warnings can be converted to exceptions, if you want to make your program safer. @gpshead, what is your opinion on this? |
I like the warning idea. |
A warning seems reasonable, yeah. But I dislike the "except the last one" edge case where it still breaks, surely that can be avoided? |
IMO a warning indeed makes perfect sense but I think entries with the same |
Warnings or exceptions are emitted for some abnormal ZIP files from ancient times, but it is not possible to detect all anomalies, and even if it is possible, it is not always practical. For example, in a "quote-overlap" zip bomb, the most nested file can be read without error. It is not a large issue if one of entries with the same offset can be read without error or warning. For consistency with how non-overlapped files with the same name are handled, it should be the last entry. |
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
…ythonGH-129254) (cherry picked from commit 0f04f24) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…ythonGH-129254) (cherry picked from commit 0f04f24) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-132263 is a backport of this pull request to the 3.13 branch. |
GH-132264 is a backport of this pull request to the 3.12 branch. |