Skip to content

gh-137894: Fix segmentation fault in deeply nested filter() iterator chains #137896

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

tangyuan0821
Copy link
Contributor

@tangyuan0821 tangyuan0821 commented Aug 18, 2025

Summary

Fixed a critical segmentation fault bug in Python 3.13 where creating deeply nested chains of filter() iterators would crash the interpreter instead of raising a proper Python exception. The fix adds a tp_clear method to the filter object to prevent stack overflow during garbage collection of cyclic references.

Problem

When creating deeply nested filter() iterator chains (e.g., applying filter() thousands of times in a loop), Python 3.13 would crash with a segmentation fault instead of raising a RecursionError. This violates Python's fundamental principle that user code should never be able to crash the interpreter.

Example problematic code:

i = filter(bool, range(1000000))
for _ in range(100000):
    i = filter(bool, i)
print(list(i))  # Segmentation fault in Python 3.13

Expected behavior: Should raise RecursionError or similar Python exception
Actual behavior: Segmentation fault (interpreter crash)

Root Cause

The filter object type in Python/bltinmodule.c was missing a tp_clear method implementation. During garbage collection of deeply nested filter iterator chains, the garbage collector would attempt to deallocate objects through recursive calls to filter_dealloc(), eventually exceeding the C stack limit and causing a segmentation fault.

Solution

Added a tp_clear method to the filter object type:

  1. Implemented filter_clear() function:

    static int
    filter_clear(filterobject *lz)
    {
        Py_CLEAR(lz->it);
        Py_CLEAR(lz->func);
        return 0;
    }
  2. Updated PyFilter_Type definition:

    // Changed from:
    0,                                  /* tp_clear */
    // To:
    (inquiry)filter_clear,              /* tp_clear */

The tp_clear method allows the garbage collector to break reference cycles safely without relying on deep recursive deallocation, preventing stack overflow crashes.

@bedevere-app

This comment was marked as resolved.

@StanFromIreland StanFromIreland changed the title [3.13]gh137894:Fix segmentation fault in deeply nested filter() iterator chains gh-137894: Fix segmentation fault in deeply nested filter() iterator chains Aug 18, 2025
@tangyuan0821 tangyuan0821 changed the title gh-137894: Fix segmentation fault in deeply nested filter() iterator chains [3.13]gh-137894: Fix segmentation fault in deeply nested filter() iterator chains Aug 18, 2025
@tangyuan0821 tangyuan0821 changed the title [3.13]gh-137894: Fix segmentation fault in deeply nested filter() iterator chains gh-137894: Fix segmentation fault in deeply nested filter() iterator chains Aug 18, 2025
@StanFromIreland
Copy link
Member

Please do not add [3.13] to the title, that is for PR's against the branch (which are backports).

@sergey-miryanov
Copy link
Contributor

Please add tests :)

@tangyuan0821
Copy link
Contributor Author

Please add tests :)

Done!

@ZeroIntensity ZeroIntensity added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Aug 18, 2025
@@ -1132,6 +1132,20 @@ def test_filter_dealloc(self):
del i
gc.collect()

@support.requires_resource('cpu')
Copy link
Member

@picnixz picnixz Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really need a CPU resource? (or maybe it's because of i = filter(bool, i)? how long does the test take? (seconds or minutes?)

@tangyuan0821 tangyuan0821 requested a review from picnixz August 18, 2025 10:17
@@ -1132,6 +1132,18 @@ def test_filter_dealloc(self):
del i
gc.collect()


Copy link
Member

@picnixz picnixz Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't simply remove things without answering my question. How long does the test take? does it take minutes, seconds or milliseconds? it it's more than 10 seconds or if the CPU is really strained, then we should keep the CPU resource. Otherwise we shouldn't.

Copy link
Contributor Author

@tangyuan0821 tangyuan0821 Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my performance test, the setup phase of this test (creating 100,000 nested filters) is very fast (approximately 0.02-0.03 seconds), but the execution phase (triggering recursion by calling list()) consumes a lot of time and CPU resources, and it may take several minutes to complete or trigger a RecursionError. Therefore, I think the@support.requires_resource('cpu')decorator should be retained, because although the setup of this test is fast, its actual execution will become a resource-intensive stress test.

@bedevere-app
Copy link

bedevere-app bot commented Aug 18, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't confirm that this fixes the bug. With this PR applied, the test case takes several minutes and is not interruptible through CTRL^C, and it seemingly doesn't raise a RecursionError either. All of the CI jobs are crashing on your test.

@skirpichev
Copy link
Contributor

@ZeroIntensity, see issue thread

@picnixz
Copy link
Member

picnixz commented Aug 18, 2025

Apparently, the issue was considered a non-issue as C stack exhaustion is hard to predict. Just adding a clear doesn't help as it's shown. As such, I think we can just close the issue and this PR altogether. For instance map still seems to suffer from this.

@picnixz
Copy link
Member

picnixz commented Aug 18, 2025

I closed the parent issue and so will close this PR as it doesn't solve the issue as observed by @ZeroIntensity

@picnixz picnixz closed this Aug 18, 2025
@tangyuan0821 tangyuan0821 deleted the 137894-08 branch August 18, 2025 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants