Skip to content

gh-112346: Bugfix: Remove faster codepath from gzip.compress as it introduces behavioral inconsistencies #114116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 3 additions & 5 deletions Doc/library/gzip.rst
Original file line number Diff line number Diff line change
Expand Up @@ -188,17 +188,15 @@ The module defines the following items:

Compress the *data*, returning a :class:`bytes` object containing
the compressed data. *compresslevel* and *mtime* have the same meaning as in
the :class:`GzipFile` constructor above. When *mtime* is set to ``0``, this
function is equivalent to :func:`zlib.compress` with *wbits* set to ``31``.
The zlib function is faster.
the :class:`GzipFile` constructor above.
:func:`zlib.compress` with *wbits* set to ``31`` is faster.

.. versionadded:: 3.2
.. versionchanged:: 3.8
Added the *mtime* parameter for reproducible output.
.. versionchanged:: 3.11
Speed is improved by compressing all data at once instead of in a
streamed fashion. Calls with *mtime* set to ``0`` are delegated to
:func:`zlib.compress` for better speed.
streamed fashion.

.. function:: decompress(data)

Expand Down
4 changes: 0 additions & 4 deletions Lib/gzip.py
Original file line number Diff line number Diff line change
Expand Up @@ -608,10 +608,6 @@ def compress(data, compresslevel=_COMPRESS_LEVEL_BEST, *, mtime=None):
mtime can be used to set the modification time. The modification time is
set to the current time by default.
"""
if mtime == 0:
# Use zlib as it creates the header with 0 mtime by default.
# This is faster and with less overhead.
return zlib.compress(data, level=compresslevel, wbits=31)
header = _create_simple_gzip_header(compresslevel, mtime)
trailer = struct.pack("<LL", zlib.crc32(data), (len(data) & 0xffffffff))
# Wbits=-15 creates a raw deflate block.
Expand Down
15 changes: 13 additions & 2 deletions Lib/test/test_gzip.py
Original file line number Diff line number Diff line change
Expand Up @@ -714,14 +714,25 @@ def test_compress_mtime(self):
self.assertEqual(f.mtime, mtime)

def test_compress_correct_level(self):
# gzip.compress calls with mtime == 0 take a different code path.
for mtime in (0, 42):
with self.subTest(mtime=mtime):
nocompress = gzip.compress(data1, compresslevel=0, mtime=mtime)
yescompress = gzip.compress(data1, compresslevel=1, mtime=mtime)
yescompress = gzip.compress(data1, compresslevel=1,
mtime=mtime)
self.assertIn(data1, nocompress)
self.assertNotIn(data1, yescompress)

def test_issue112346(self):
# The OS byte should be 255, this should not change between Python versions.
for mtime in (0, 42):
with self.subTest(mtime=mtime):
compress = gzip.compress(data1, compresslevel=1, mtime=mtime)
self.assertEqual(
struct.unpack("<IxB", compress[4:10]),
(mtime, 255),
"Gzip header does not properly set either mtime or OS byte."
)

def test_decompress(self):
for data in (data1, data2):
buf = io.BytesIO()
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
The OS byte in gzip headers is now always set to 255 when using
gzip.compress.