-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-51067: Add remove() in ZipInfo #19358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
import threading | ||
import time | ||
import contextlib | ||
from operator import attrgetter | ||
|
||
try: | ||
import zlib # We may need its compression method | ||
|
@@ -1632,6 +1633,29 @@ def extractall(self, path=None, members=None, pwd=None): | |
|
||
for zipinfo in members: | ||
self._extract_member(zipinfo, path, pwd) | ||
|
||
def remove(self, member): | ||
"""Remove a file from the archive. The archive must be open with mode 'a'""" | ||
|
||
if self.mode != 'a': | ||
raise RuntimeError("remove() requires mode 'a'") | ||
if not self.fp: | ||
raise ValueError( | ||
"Attempt to write to ZIP archive that was already closed") | ||
if self._writing: | ||
raise ValueError( | ||
"Can't write to ZIP archive while an open writing handle exists." | ||
) | ||
|
||
# Make sure we have an info object | ||
if isinstance(member, ZipInfo): | ||
# 'member' is already an info object | ||
zinfo = member | ||
else: | ||
# get the info object | ||
zinfo = self.getinfo(member) | ||
|
||
return self._remove_member(zinfo) | ||
|
||
@classmethod | ||
def _sanitize_windows_name(cls, arcname, pathsep): | ||
|
@@ -1689,6 +1713,52 @@ def _extract_member(self, member, targetpath, pwd): | |
shutil.copyfileobj(source, target) | ||
|
||
return targetpath | ||
|
||
def _remove_member(self, member): | ||
# get a sorted filelist by header offset, in case the dir order | ||
# doesn't match the actual entry order | ||
fp = self.fp | ||
entry_offset = 0 | ||
filelist = sorted(self.filelist, key=attrgetter('header_offset')) | ||
for i in range(len(filelist)): | ||
info = filelist[i] | ||
# find the target member | ||
if info.header_offset < member.header_offset: | ||
continue | ||
|
||
# get the total size of the entry | ||
entry_size = None | ||
if i == len(filelist) - 1: | ||
entry_size = self.start_dir - info.header_offset | ||
else: | ||
entry_size = filelist[i + 1].header_offset - info.header_offset | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there anything in zipfile spec which says the header offsets have to be contiguous? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It doesn't matter anyway. The implementation moves all bytes between the start of a section and the start of the next section. If there were to be random bytes between the end of a section and the start of the next section, they will be moved too. |
||
|
||
# found the member, set the entry offset | ||
if member == info: | ||
entry_offset = entry_size | ||
continue | ||
|
||
# Move entry | ||
# read the actual entry data | ||
fp.seek(info.header_offset) | ||
entry_data = fp.read(entry_size) | ||
|
||
# update the header | ||
info.header_offset -= entry_offset | ||
|
||
# write the entry to the new position | ||
fp.seek(info.header_offset) | ||
fp.write(entry_data) | ||
fp.flush() | ||
|
||
# update state | ||
self.start_dir -= entry_offset | ||
self.filelist.remove(member) | ||
del self.NameToInfo[member.filename] | ||
self._didModify = True | ||
|
||
# seek to the start of the central dir | ||
fp.seek(self.start_dir) | ||
|
||
def _writecheck(self, zinfo): | ||
"""Check for errors before writing a file to the archive.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe. The
ZipInfo
instances inself.filelist
will not compare equal with this zinfo even if the filename is matching - they compare by identity.Additionally, if
remove
is able to accept aZipInfo
instance, then there needs to be handling of the possibility that the instance is not present in the zipfile at all.I recommend to document that member is a string, and then use
self.getinfo(member)
, avoiding these issues.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing by identity when passing a zinfo is a feature and is consistent to the behavior of
writestr
etc. I don't think this need to change.It actually is a problem if the zinfo/filename does not exist in the zipfile at all, my revision has added a check to prevent that.