Skip to content

Unbounded reads by zipfile may cause a MemoryError. #113977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
insistxc opened this issue Jan 12, 2024 · 6 comments
Closed

Unbounded reads by zipfile may cause a MemoryError. #113977

insistxc opened this issue Jan 12, 2024 · 6 comments
Assignees
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@insistxc
Copy link

insistxc commented Jan 12, 2024

Bug report

Bug description:

def _EndRecData(fpin):
    """Return data from the "End of Central Directory" record, or None.

    The data is a list of the nine items in the ZIP "End of central dir"
    record followed by a tenth item, the file seek offset of this record."""

    # Determine file size
    fpin.seek(0, 2)
    filesize = fpin.tell()

    # Check to see if this is ZIP file with no archive comment (the
    # "end of central directory" structure should be the last item in the
    # file if this is the case).
    try:
        fpin.seek(-sizeEndCentDir, 2)
    except OSError:
        return None
    data = fpin.read()
    if (len(data) == sizeEndCentDir and
        data[0:4] == stringEndArchive and
        data[-2:] == b"\000\000"):

image

When checking whether a file is a zip file, MemoryError was triggered, followed by OOM. After investigation, it was found that it was a read() read exception.

Through PDB debugging, it was found that a link file was read, which points to /proc/kcore, why does the existing zip file check not determine whether it is a zip file by reading the header byte (504B0304) of the file .

I think the existing judgment ZIP method does not limit the read reading. When reading a non -normal file, it may cause the system to collapse .

Hope to be resolved.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

@insistxc insistxc added the type-bug An unexpected behavior, bug, or error label Jan 12, 2024
@ronaldoussoren
Copy link
Contributor

That code already tries to check if the file is a zipfile by reading the header at the end of the file. The MemoryError seems to indicate that seeking to the end of the file doesn't work as expected for /proc/kcore.

This function could be written a bit more defensively though, for example by using fpin.read(sizeEndCentDir+1) instead of fpin.read().

@ronaldoussoren ronaldoussoren added the stdlib Python modules in the Lib dir label Jan 12, 2024
@danifus
Copy link
Contributor

danifus commented Aug 5, 2024

This implements the suggested change: #122101

@cmaloney
Copy link
Contributor

cmaloney commented Aug 7, 2024

There's a secondary thing here, that unbounded read defaults to the size at open with #120755. In this bug that's more than the amount of RAM remaining on the machine, hence the MemoryError / OOM. That stashed size is currently invalidated on .truncate() but not on .seek() which zipfile uses. I'd like to make seek() clear the estimated size, but that change conflicts a lot with #121593, so have been holding off.

#122101 I think resolves this case for zipfile (and makes it more predictable, safer behavior generally) but I suspect the .seek() case will come up more in library code, so before python 3.14 would like to get that to invalidate the cached size generally.

@picnixz picnixz changed the title When checking whether a file is a zip file, a MemoryError was triggered. After investigation, it was found that it was a read() read exception. Unbounded reads by zipefile may cause a MemoryError. Nov 2, 2024
@picnixz picnixz added 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes labels Nov 2, 2024
@picnixz
Copy link
Member

picnixz commented Nov 2, 2024

Are (potential) DDoS like that considered as security issues or not? @gpshead

@gpshead gpshead changed the title Unbounded reads by zipefile may cause a MemoryError. Unbounded reads by zipfile may cause a MemoryError. Nov 3, 2024
@gpshead gpshead self-assigned this Nov 3, 2024
gpshead pushed a commit that referenced this issue Nov 3, 2024
GH-113977, GH-120754: Remove unbounded reads from zipfile

Read without a size may read an unbounded amount of data + allocate
unbounded size buffers. Move to capped size reads to prevent potential
issues.

Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 3, 2024
…pythonGH-122101)

pythonGH-113977, pythonGH-120754: Remove unbounded reads from zipfile

Read without a size may read an unbounded amount of data + allocate
unbounded size buffers. Move to capped size reads to prevent potential
issues.

(cherry picked from commit 556dc9b)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 3, 2024
…pythonGH-122101)

pythonGH-113977, pythonGH-120754: Remove unbounded reads from zipfile

Read without a size may read an unbounded amount of data + allocate
unbounded size buffers. Move to capped size reads to prevent potential
issues.

(cherry picked from commit 556dc9b)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
@gpshead
Copy link
Member

gpshead commented Nov 3, 2024

Are (potential) DDoS like that considered as security issues or not? @gpshead

They can be, but not particularly severe. I don't think this one is worth considering a security problem because the circumstances in which it can occur are extremely limited: Someone has to have opened a file that either isn't seekable or provides an egregious amount of data upon read after seeking to the end.

Just constructing that scenario in the first place is a sign of greater problems in the system.

@sethmlarson as FYI

@gpshead
Copy link
Member

gpshead commented Nov 3, 2024

Thanks for the nice bug report and PRs. The 3.12 and 3.13 back ports are also set to auto merge. Indeed, avoiding unbounded read assumptions is the right coding practice.

@gpshead gpshead closed this as completed Nov 3, 2024
gpshead pushed a commit that referenced this issue Nov 3, 2024
…122101) (#126347)

gh-113977, gh-120754: Remove unbounded reads from zipfile (GH-122101)

GH-113977, GH-120754: Remove unbounded reads from zipfile

Read without a size may read an unbounded amount of data + allocate
unbounded size buffers. Move to capped size reads to prevent potential
issues.

(cherry picked from commit 556dc9b)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
gpshead pushed a commit that referenced this issue Nov 3, 2024
…122101) (#126348)

gh-113977, gh-120754: Remove unbounded reads from zipfile (GH-122101)

GH-113977, GH-120754: Remove unbounded reads from zipfile

Read without a size may read an unbounded amount of data + allocate
unbounded size buffers. Move to capped size reads to prevent potential
issues.

(cherry picked from commit 556dc9b)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 8, 2024
…pythonGH-122101)

pythonGH-113977, pythonGH-120754: Remove unbounded reads from zipfile

Read without a size may read an unbounded amount of data + allocate
unbounded size buffers. Move to capped size reads to prevent potential
issues.

Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…pythonGH-122101)

pythonGH-113977, pythonGH-120754: Remove unbounded reads from zipfile

Read without a size may read an unbounded amount of data + allocate
unbounded size buffers. Move to capped size reads to prevent potential
issues.

Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

6 participants