gh-135252: Document Zstandard integration across zipfile, shutil, and tarfile #135311

Open: 1 commit to be merged into main

10 changes: 8 additions & 2 deletions Doc/library/compression.zstd.rst
@@ -523,8 +523,14 @@ Advanced parameter control
.. attribute:: compression_level

A high-level means of setting other compression parameters that affect
the speed and ratio of compressing data. Setting the level to zero uses
:attr:`COMPRESSION_LEVEL_DEFAULT`.
the speed and ratio of compressing data.

Regular compression levels are greater than ``0``. Values greater than
``20`` are considered "ultra" compression and require more memory than
other levels. Negative values trade a worse compression ratio for faster
compression.

Setting the level to zero uses :attr:`COMPRESSION_LEVEL_DEFAULT`.
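
The effect of the level can be checked empirically. The following is a minimal sketch, assuming Python 3.14 or later where :mod:`compression.zstd` is importable; the helper name ``compare_levels`` and the sample levels are illustrative and not part of the module. On interpreters without Zstandard support the helper returns an empty mapping.

```python
try:
    from compression import zstd  # available on Python 3.14+
except ImportError:
    zstd = None  # Zstandard support is unavailable on this interpreter


def compare_levels(data, levels=(-5, 0, 3, 19)):
    """Return {level: compressed size} for each accepted level.

    Hypothetical helper for illustration; returns {} without compression.zstd.
    """
    if zstd is None:
        return {}
    # bounds() reports the (lower, upper) range accepted by this build.
    low, high = zstd.CompressionParameter.compression_level.bounds()
    sizes = {}
    for level in levels:
        if not low <= level <= high:
            continue  # skip levels this build does not accept
        blob = zstd.compress(data, level=level)
        assert zstd.decompress(blob) == data  # round trip is lossless
        sizes[level] = len(blob)
    return sizes


sample = b"Zstandard compresses repetitive data well. " * 2000
for level, size in compare_levels(sample).items():
    print(f"level {level:>3}: {size} bytes")
```

Level ``0`` is included to show that it is accepted and mapped to the default level rather than meaning "no compression".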

.. attribute:: window_log

11 changes: 8 additions & 3 deletions Doc/library/shutil.rst
@@ -607,7 +607,8 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
*format* is the archive format: one of
"zip" (if the :mod:`zlib` module is available), "tar", "gztar" (if the
:mod:`zlib` module is available), "bztar" (if the :mod:`bz2` module is
available), or "xztar" (if the :mod:`lzma` module is available).
available), "xztar" (if the :mod:`lzma` module is available), or "zstdtar"
(if the :mod:`compression.zstd` module is available).

*root_dir* is a directory that will be the root directory of the
archive, all paths in the archive will be relative to it; for example,
@@ -662,6 +663,8 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
- *gztar*: gzip'ed tar-file (if the :mod:`zlib` module is available).
- *bztar*: bzip2'ed tar-file (if the :mod:`bz2` module is available).
- *xztar*: xz'ed tar-file (if the :mod:`lzma` module is available).
- *zstdtar*: Zstandard compressed tar-file (if the :mod:`compression.zstd`
module is available).
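
Because the set of registered formats depends on which compression modules are available, code that wants Zstandard archives can check the list first. A minimal sketch, assuming a fallback to plain ``"tar"`` is acceptable; the helper ``pick_tar_format`` and the file names are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path


def pick_tar_format():
    """Prefer "zstdtar" when it is registered, otherwise use plain "tar"."""
    available = {name for name, _description in shutil.get_archive_formats()}
    return "zstdtar" if "zstdtar" in available else "tar"


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "project"
    source.mkdir()
    (source / "notes.txt").write_text("hello zstd\n" * 100)

    fmt = pick_tar_format()
    # make_archive() appends the format-specific extension to the base name.
    archive_path = shutil.make_archive(
        str(Path(tmp) / "backup"), fmt, root_dir=str(source)
    )
    archive_name = Path(archive_path).name
    print(fmt, archive_name)
```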

You can register new formats or provide your own archiver for any existing
formats, by using :func:`register_archive_format`.
@@ -705,8 +708,8 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
*extract_dir* is the name of the target directory where the archive is
unpacked. If not provided, the current working directory is used.

*format* is the archive format: one of "zip", "tar", "gztar", "bztar", or
"xztar". Or any other format registered with
*format* is the archive format: one of "zip", "tar", "gztar", "bztar",
"xztar", "zstdtar", or any other format registered with
:func:`register_unpack_format`. If not provided, :func:`unpack_archive`
will use the archive file name extension and see if an unpacker was
registered for that extension. In case none is found,
@@ -778,6 +781,8 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
- *gztar*: gzip'ed tar-file (if the :mod:`zlib` module is available).
- *bztar*: bzip2'ed tar-file (if the :mod:`bz2` module is available).
- *xztar*: xz'ed tar-file (if the :mod:`lzma` module is available).
- *zstdtar*: Zstandard compressed tar-file (if the :mod:`compression.zstd`
module is available).
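
Unpacking mirrors archiving: when *format* is omitted, the unpacker is chosen from the file extension. The round trip below is a sketch under the assumption that ``"zstdtar"`` is registered for both packing and unpacking whenever it appears in :func:`get_unpack_formats`; it falls back to ``"tar"`` otherwise. Directory and file names are illustrative.

```python
import shutil
import tempfile
from pathlib import Path

# Each entry is (name, extensions, description).
unpack_names = {name for name, _exts, _desc in shutil.get_unpack_formats()}
fmt = "zstdtar" if "zstdtar" in unpack_names else "tar"

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "src"
    source.mkdir()
    (source / "data.txt").write_text("payload\n")

    archive = shutil.make_archive(
        str(Path(tmp) / "bundle"), fmt, root_dir=str(source)
    )

    target = Path(tmp) / "out"
    # No *format* argument: the unpacker is selected from the extension.
    shutil.unpack_archive(archive, str(target))
    restored = (target / "data.txt").read_text()
    print(fmt, restored.strip())
```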

You can register new formats or provide your own unpacker for any existing
formats, by using :func:`register_unpack_format`.
31 changes: 29 additions & 2 deletions Doc/library/tarfile.rst
@@ -18,8 +18,8 @@ higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
if the respective modules are available.
* reads and writes :mod:`gzip`, :mod:`bz2`, :mod:`compression.zstd`, and
:mod:`lzma` compressed archives if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

@@ -47,6 +47,10 @@ Some facts and figures:
or paths outside of the destination. Previously, the filter strategy
was equivalent to :func:`fully_trusted <fully_trusted_filter>`.

.. versionchanged:: 3.14

Added support for Zstandard compression using :mod:`compression.zstd`.

.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

Return a :class:`TarFile` object for the pathname *name*. For detailed
@@ -71,6 +75,8 @@ Some facts and figures:
+------------------+---------------------------------------------+
| ``'r:xz'`` | Open for reading with lzma compression. |
+------------------+---------------------------------------------+
| ``'r:zst'`` | Open for reading with Zstandard compression.|
+------------------+---------------------------------------------+
| ``'x'`` or | Create a tarfile exclusively without |
| ``'x:'`` | compression. |
| | Raise a :exc:`FileExistsError` exception |
@@ -88,6 +94,10 @@ Some facts and figures:
| | Raise a :exc:`FileExistsError` exception |
| | if it already exists. |
+------------------+---------------------------------------------+
| ``'x:zst'`` | Create a tarfile with Zstandard compression.|
| | Raise a :exc:`FileExistsError` exception |
| | if it already exists. |
+------------------+---------------------------------------------+
| ``'a' or 'a:'`` | Open for appending with no compression. The |
| | file is created if it does not exist. |
+------------------+---------------------------------------------+
@@ -99,6 +109,8 @@ Some facts and figures:
+------------------+---------------------------------------------+
| ``'w:xz'`` | Open for lzma compressed writing. |
+------------------+---------------------------------------------+
| ``'w:zst'`` | Open for Zstandard compressed writing. |
+------------------+---------------------------------------------+

Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
is not suitable to open a certain (compressed) file for reading,
@@ -115,6 +127,15 @@ Some facts and figures:
For modes ``'w:xz'``, ``'x:xz'`` and ``'w|xz'``, :func:`tarfile.open` accepts the
keyword argument *preset* to specify the compression level of the file.

For modes ``'w:zst'``, ``'x:zst'`` and ``'w|zst'``, :func:`tarfile.open`
accepts the keyword argument *level* to specify the compression level of
the file. The keyword argument *options* may also be passed, providing
advanced Zstandard compression parameters described by
:class:`~compression.zstd.CompressionParameter`. The keyword argument
*zstd_dict* can be passed to provide a :class:`~compression.zstd.ZstdDict`,
a Zstandard dictionary used to improve compression of smaller amounts of
data.
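
As an illustration of the *level* keyword, the sketch below writes a single member to an in-memory archive and reads it back. It assumes Python 3.14 or later for the ``'w:zst'`` branch and falls back to an uncompressed archive elsewhere; the helper names ``build_archive`` and ``read_member`` and the level ``10`` are arbitrary choices for the example.

```python
import io
import tarfile

try:
    from compression import zstd  # noqa: F401  (Python 3.14+)
    HAVE_ZSTD = True
except ImportError:
    HAVE_ZSTD = False


def build_archive(payload: bytes) -> bytes:
    """Write one member into an in-memory tar archive and return its bytes."""
    buffer = io.BytesIO()
    if HAVE_ZSTD:
        # *level* is the keyword accepted by the Zstandard write modes.
        tar = tarfile.open(fileobj=buffer, mode="w:zst", level=10)
    else:
        # Fallback so the sketch still runs where Zstandard is missing.
        tar = tarfile.open(fileobj=buffer, mode="w")
    with tar:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    # Mode "r" detects the compression transparently.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.extractfile(name).read()


blob = build_archive(b"hello, zstd\n" * 50)
print(read_member(blob, "hello.txt")[:12])
```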

For special purposes, there is a second format for *mode*:
``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
object that processes its data as a stream of blocks. No random seeking will
@@ -146,6 +167,9 @@ Some facts and figures:
| ``'r|xz'`` | Open an lzma compressed *stream* for |
| | reading. |
+-------------+--------------------------------------------+
| ``'r|zst'`` | Open a Zstandard compressed *stream* for |
| | reading. |
+-------------+--------------------------------------------+
| ``'w|'`` | Open an uncompressed *stream* for writing. |
+-------------+--------------------------------------------+
| ``'w|gz'`` | Open a gzip compressed *stream* for |
@@ -157,6 +181,9 @@ Some facts and figures:
| ``'w|xz'`` | Open an lzma compressed *stream* for |
| | writing. |
+-------------+--------------------------------------------+
| ``'w|zst'`` | Open a Zstandard compressed *stream* for |
| | writing. |
+-------------+--------------------------------------------+
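
The stream modes are meant for file objects that cannot seek, such as pipes or sockets. The following sketch uses an :class:`io.BytesIO` as a stand-in for such an object and reads members strictly in order. It assumes ``'w|zst'`` and ``'r|zst'`` are available (Python 3.14 or later) and otherwise falls back to the uncompressed ``'w|'`` and ``'r|'`` modes; ``stream_roundtrip`` is a hypothetical helper.

```python
import io
import tarfile

try:
    from compression import zstd  # noqa: F401  (Python 3.14+)
    SUFFIX = "zst"
except ImportError:
    SUFFIX = ""  # fall back to an uncompressed stream


def stream_roundtrip(files):
    """Write *files* (name -> bytes) as a tar stream, then read them back."""
    sink = io.BytesIO()  # stands in for a pipe or socket
    with tarfile.open(fileobj=sink, mode=f"w|{SUFFIX}") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

    restored = {}
    source = io.BytesIO(sink.getvalue())
    with tarfile.open(fileobj=source, mode=f"r|{SUFFIX}") as tar:
        # A stream cannot seek backwards, so read each member as it arrives.
        for member in tar:
            restored[member.name] = tar.extractfile(member).read()
    return restored


print(stream_roundtrip({"a.txt": b"alpha", "b.txt": b"beta"}))
```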

.. versionchanged:: 3.5
The ``'x'`` (exclusive creation) mode was added.
44 changes: 32 additions & 12 deletions Doc/library/zipfile.rst
@@ -129,14 +129,28 @@ The module defines the following items:

.. versionadded:: 3.3

.. data:: ZIP_ZSTANDARD

The numeric constant for Zstandard compression. This requires the
:mod:`compression.zstd` module.

.. note::

The ZIP file format specification has included support for bzip2 compression
since 2001, and for LZMA compression since 2006. However, some tools
(including older Python releases) do not support these compression
methods, and may either refuse to process the ZIP file altogether,
or fail to extract individual files.
In APPNOTE 6.3.7, the method ID ``20`` was assigned to Zstandard
compression. This was changed in APPNOTE 6.3.8 to method ID ``93`` to
avoid conflicts, with method ID ``20`` being deprecated. For
compatibility, the :mod:`!zipfile` module reads both method IDs but will
only write data with method ID ``93``.

.. versionadded:: 3.14

.. note::

The ZIP file format specification has included support for bzip2 compression
since 2001, for LZMA compression since 2006, and for Zstandard compression
since 2020. However, some tools (including older Python releases) do not support
these compression methods, and may either refuse to process the ZIP file
altogether, or fail to extract individual files.

.. seealso::

@@ -176,10 +190,11 @@ ZipFile Objects

*compression* is the ZIP compression method to use when writing the archive,
and should be :const:`ZIP_STORED`, :const:`ZIP_DEFLATED`,
:const:`ZIP_BZIP2` or :const:`ZIP_LZMA`; unrecognized
values will cause :exc:`NotImplementedError` to be raised. If
:const:`ZIP_DEFLATED`, :const:`ZIP_BZIP2` or :const:`ZIP_LZMA` is specified
but the corresponding module (:mod:`zlib`, :mod:`bz2` or :mod:`lzma`) is not
:const:`ZIP_BZIP2`, :const:`ZIP_LZMA`, or :const:`ZIP_ZSTANDARD`;
unrecognized values will cause :exc:`NotImplementedError` to be raised. If
:const:`ZIP_DEFLATED`, :const:`ZIP_BZIP2`, :const:`ZIP_LZMA`, or
:const:`ZIP_ZSTANDARD` is specified but the corresponding module
(:mod:`zlib`, :mod:`bz2`, :mod:`lzma`, or :mod:`compression.zstd`) is not
available, :exc:`RuntimeError` is raised. The default is :const:`ZIP_STORED`.

If *allowZip64* is ``True`` (the default) zipfile will create ZIP files that
@@ -194,6 +209,10 @@ ZipFile Objects
(see :class:`zlib <zlib.compressobj>` for more information).
When using :const:`ZIP_BZIP2` integers ``1`` through ``9`` are accepted
(see :class:`bz2 <bz2.BZ2File>` for more information).
When using :const:`ZIP_ZSTANDARD` integers ``-131072`` through ``22`` are
commonly accepted (see
:attr:`CompressionParameter.compression_level <compression.zstd.CompressionParameter.compression_level>`
for more on retrieving valid values and their meaning).
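
A short sketch of selecting the method and level together, assuming Python 3.14 or later for the Zstandard branch. On older interpreters it falls back to :const:`ZIP_DEFLATED` so that it still runs; the member name and the levels ``10`` and ``6`` are arbitrary.

```python
import io
import zipfile

try:
    from compression import zstd  # noqa: F401  (Python 3.14+)
    method, level = zipfile.ZIP_ZSTANDARD, 10
except ImportError:
    # Fallback for interpreters without Zstandard support.
    method, level = zipfile.ZIP_DEFLATED, 6

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=method, compresslevel=level) as archive:
    archive.writestr("report.txt", "line\n" * 1000)

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("report.txt")
    text = archive.read("report.txt").decode()
    # compress_type records the method ID stored for this member.
    print(info.compress_type == method, info.compress_size, info.file_size)
```

Archives written with Zstandard may not be readable by older tools, so a fallback like this one also changes which readers can open the result.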

The *strict_timestamps* argument, when set to ``False``, allows
zipping files older than 1980-01-01 at the cost of setting the
@@ -415,9 +434,10 @@ ZipFile Objects
read or append. *pwd* is the password used for encrypted files as a :class:`bytes`
object and, if specified, overrides the default password set with :meth:`setpassword`.
Calling :meth:`read` on a ZipFile that uses a compression method other than
:const:`ZIP_STORED`, :const:`ZIP_DEFLATED`, :const:`ZIP_BZIP2` or
:const:`ZIP_LZMA` will raise a :exc:`NotImplementedError`. An error will also
be raised if the corresponding compression module is not available.
:const:`ZIP_STORED`, :const:`ZIP_DEFLATED`, :const:`ZIP_BZIP2`,
:const:`ZIP_LZMA`, or :const:`ZIP_ZSTANDARD` will raise a
:exc:`NotImplementedError`. An error will also be raised if the
corresponding compression module is not available.

.. versionchanged:: 3.6
Calling :meth:`read` on a closed ZipFile will raise a :exc:`ValueError`.