Skip to content

[3.10] gh-135034: Normalize link targets in tarfile, add os.path.realpath(strict='allow_missing') (GH-135037) #135070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 3, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 29 additions & 4 deletions Doc/library/os.path.rst
Original file line number Diff line number Diff line change
Expand Up @@ -352,10 +352,26 @@ the :mod:`glob` module.)
links encountered in the path (if they are supported by the operating
system).

If a path doesn't exist or a symlink loop is encountered, and *strict* is
``True``, :exc:`OSError` is raised. If *strict* is ``False``, the path is
resolved as far as possible and any remainder is appended without checking
whether it exists.
By default, the path is evaluated up to the first component that does not
exist, is a symlink loop, or whose evaluation raises :exc:`OSError`.
All such components are appended unchanged to the existing part of the path.

Some errors that are handled this way include "access denied", "not a
directory", or "bad argument to internal function". Thus, the
resulting path may be missing or inaccessible, may still contain
links or loops, and may traverse non-directories.

This behavior can be modified by keyword arguments:

If *strict* is ``True``, the first error encountered when evaluating the path is
re-raised.
In particular, :exc:`FileNotFoundError` is raised if *path* does not exist,
or another :exc:`OSError` if it is otherwise inaccessible.

If *strict* is :py:data:`os.path.ALLOW_MISSING`, errors other than
:exc:`FileNotFoundError` are re-raised (as with ``strict=True``).
Thus, the returned path will not contain any symbolic links, but the named
file and some of its parent directories may be missing.

.. note::
This function emulates the operating system's procedure for making a path
Expand All @@ -374,6 +390,15 @@ the :mod:`glob` module.)
.. versionchanged:: 3.10
The *strict* parameter was added.

.. versionchanged:: next
The :py:data:`~os.path.ALLOW_MISSING` value for the *strict* parameter
was added.

.. data:: ALLOW_MISSING

Special value used for the *strict* argument in :func:`realpath`.

.. versionadded:: next

.. function:: relpath(path, start=os.curdir)

Expand Down
20 changes: 20 additions & 0 deletions Doc/library/tarfile.rst
Original file line number Diff line number Diff line change
Expand Up @@ -237,6 +237,15 @@ The :mod:`tarfile` module defines the following exceptions:
Raised to refuse extracting a symbolic link pointing outside the destination
directory.

.. exception:: LinkFallbackError

Raised to refuse emulating a link (hard or symbolic) by extracting another
archive member, when that member would be rejected by the filter location.
The exception that was raised to reject the replacement member is available
as :attr:`!BaseException.__context__`.

.. versionadded:: next


The following constants are available at the module level:

Expand Down Expand Up @@ -954,6 +963,12 @@ reused in custom filters:
Implements the ``'data'`` filter.
In addition to what ``tar_filter`` does:

- Normalize link targets (:attr:`TarInfo.linkname`) using
:func:`os.path.normpath`.
Note that this removes internal ``..`` components, which may change the
meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
symbolic links.

- :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
that link to absolute paths, or ones that link outside the destination.

Expand Down Expand Up @@ -982,6 +997,10 @@ reused in custom filters:

Return the modified ``TarInfo`` member.

.. versionchanged:: next

Link targets are now normalized.


.. _tarfile-extraction-refuse:

Expand All @@ -1008,6 +1027,7 @@ Here is an incomplete list of things to consider:
* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
to prevent e.g. exploiting pre-existing links, and to make it easier to
clean up after a failed extraction.
* Disallow symbolic links if you do not need the functionality.
* When working with untrusted data, use external (e.g. OS-level) limits on
disk, memory and CPU usage.
* Check filenames against an allow-list of characters
Expand Down
34 changes: 34 additions & 0 deletions Doc/whatsnew/3.10.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2394,3 +2394,37 @@ email
check if the *strict* paramater is available.
(Contributed by Thomas Dwyer and Victor Stinner for :gh:`102988` to improve
the CVE-2023-27043 fix.)


Notable changes in 3.10.18
==========================

os.path
-------

* The *strict* parameter to :func:`os.path.realpath` accepts a new value,
:data:`os.path.ALLOW_MISSING`.
If used, errors other than :exc:`FileNotFoundError` will be re-raised;
the resulting path can be missing but it will be free of symlinks.
(Contributed by Petr Viktorin for CVE 2025-4517.)

tarfile
-------

* :func:`~tarfile.data_filter` now normalizes symbolic link targets in order to
avoid path traversal attacks.
(Contributed by Petr Viktorin in :gh:`127987` and CVE 2025-4138.)
* :func:`~tarfile.TarFile.extractall` now skips fixing up directory attributes
when a directory was removed or replaced by another kind of file.
(Contributed by Petr Viktorin in :gh:`127987` and CVE 2024-12718.)
* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
now (re-)apply the extraction filter when substituting a link (hard or
symbolic) with a copy of another archive member, and when fixing up
directory attributes.
The former raises a new exception, :exc:`~tarfile.LinkFallbackError`.
(Contributed by Petr Viktorin for CVE 2025-4330 and CVE 2024-12718.)
* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
no longer extract rejected members when
:func:`~tarfile.TarFile.errorlevel` is zero.
(Contributed by Matt Prodani and Petr Viktorin in :gh:`112887`
and CVE 2025-4435.)
11 changes: 10 additions & 1 deletion Lib/genericpath.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

__all__ = ['commonprefix', 'exists', 'getatime', 'getctime', 'getmtime',
'getsize', 'isdir', 'isfile', 'samefile', 'sameopenfile',
'samestat']
'samestat', 'ALLOW_MISSING']


# Does a path exist?
Expand Down Expand Up @@ -153,3 +153,12 @@ def _check_arg_types(funcname, *args):
f'os.PathLike object, not {s.__class__.__name__!r}') from None
if hasstr and hasbytes:
raise TypeError("Can't mix strings and bytes in path components") from None

# A singleton with a true boolean value.
@object.__new__
class ALLOW_MISSING:
"""Special value for use in realpath()."""
def __repr__(self):
return 'os.path.ALLOW_MISSING'
def __reduce__(self):
return self.__class__.__name__
35 changes: 23 additions & 12 deletions Lib/ntpath.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,8 @@
"ismount", "expanduser","expandvars","normpath","abspath",
"curdir","pardir","sep","pathsep","defpath","altsep",
"extsep","devnull","realpath","supports_unicode_filenames","relpath",
"samefile", "sameopenfile", "samestat", "commonpath"]
"samefile", "sameopenfile", "samestat", "commonpath",
"ALLOW_MISSING"]

def _get_bothseps(path):
if isinstance(path, bytes):
Expand Down Expand Up @@ -571,9 +572,10 @@ def abspath(path):
from nt import _getfinalpathname, readlink as _nt_readlink
except ImportError:
# realpath is a no-op on systems without _getfinalpathname support.
realpath = abspath
def realpath(path, *, strict=False):
return abspath(path)
else:
def _readlink_deep(path):
def _readlink_deep(path, ignored_error=OSError):
# These error codes indicate that we should stop reading links and
# return the path we currently have.
# 1: ERROR_INVALID_FUNCTION
Expand Down Expand Up @@ -606,7 +608,7 @@ def _readlink_deep(path):
path = old_path
break
path = normpath(join(dirname(old_path), path))
except OSError as ex:
except ignored_error as ex:
if ex.winerror in allowed_winerror:
break
raise
Expand All @@ -615,7 +617,7 @@ def _readlink_deep(path):
break
return path

def _getfinalpathname_nonstrict(path):
def _getfinalpathname_nonstrict(path, ignored_error=OSError):
# These error codes indicate that we should stop resolving the path
# and return the value we currently have.
# 1: ERROR_INVALID_FUNCTION
Expand All @@ -642,17 +644,18 @@ def _getfinalpathname_nonstrict(path):
try:
path = _getfinalpathname(path)
return join(path, tail) if tail else path
except OSError as ex:
except ignored_error as ex:
if ex.winerror not in allowed_winerror:
raise
try:
# The OS could not resolve this path fully, so we attempt
# to follow the link ourselves. If we succeed, join the tail
# and return.
new_path = _readlink_deep(path)
new_path = _readlink_deep(path,
ignored_error=ignored_error)
if new_path != path:
return join(new_path, tail) if tail else new_path
except OSError:
except ignored_error:
# If we fail to readlink(), let's keep traversing
pass
path, name = split(path)
Expand Down Expand Up @@ -683,16 +686,24 @@ def realpath(path, *, strict=False):
if normcase(path) == normcase(devnull):
return '\\\\.\\NUL'
had_prefix = path.startswith(prefix)

if strict is ALLOW_MISSING:
ignored_error = FileNotFoundError
strict = True
elif strict:
ignored_error = ()
else:
ignored_error = OSError

if not had_prefix and not isabs(path):
path = join(cwd, path)
try:
path = _getfinalpathname(path)
initial_winerror = 0
except OSError as ex:
if strict:
raise
except ignored_error as ex:
initial_winerror = ex.winerror
path = _getfinalpathname_nonstrict(path)
path = _getfinalpathname_nonstrict(path,
ignored_error=ignored_error)
# The path returned by _getfinalpathname will always start with \\?\ -
# strip off that prefix unless it was already provided on the original
# path.
Expand Down
15 changes: 11 additions & 4 deletions Lib/posixpath.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
"samefile","sameopenfile","samestat",
"curdir","pardir","sep","pathsep","defpath","altsep","extsep",
"devnull","realpath","supports_unicode_filenames","relpath",
"commonpath"]
"commonpath", "ALLOW_MISSING"]


def _get_sep(path):
Expand Down Expand Up @@ -407,6 +407,15 @@ def _joinrealpath(path, rest, strict, seen):
sep = '/'
curdir = '.'
pardir = '..'
getcwd = os.getcwd
if strict is ALLOW_MISSING:
ignored_error = FileNotFoundError
elif strict:
ignored_error = ()
else:
ignored_error = OSError

maxlinks = None

if isabs(rest):
rest = rest[1:]
Expand All @@ -429,9 +438,7 @@ def _joinrealpath(path, rest, strict, seen):
newpath = join(path, name)
try:
st = os.lstat(newpath)
except OSError:
if strict:
raise
except ignored_error:
is_link = False
else:
is_link = stat.S_ISLNK(st.st_mode)
Expand Down
Loading
Loading