Skip to content

gh-135552: Make the GC clear weakrefs later #136189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Aug 7, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 16 additions & 10 deletions InternalDocs/garbage_collector.md
Original file line number Diff line number Diff line change
Expand Up @@ -329,15 +329,16 @@ Once the GC knows the list of unreachable objects, a very delicate process start
with the objective of completely destroying these objects. Roughly, the process
follows these steps in order:

1. Handle and clear weak references (if any). Weak references to unreachable objects
are set to `None`. If the weak reference has an associated callback, the callback
is enqueued to be called once the clearing of weak references is finished. We only
invoke callbacks for weak references that are themselves reachable. If both the weak
reference and the pointed-to object are unreachable we do not execute the callback.
This is partly for historical reasons: the callback could resurrect an unreachable
object and support for weak references predates support for object resurrection.
Ignoring the weak reference's callback is fine because both the object and the weakref
are going away, so it's legitimate to say the weak reference is going away first.
1. Handle weak references with callbacks (if any). If the weak reference has
an associated callback, the callback is enqueued to be called after the weak
reference is cleared. We only invoke callbacks for weak references that
are themselves reachable. If both the weak reference and the pointed-to
object are unreachable we do not execute the callback. This is partly for
historical reasons: the callback could resurrect an unreachable object
and support for weak references predates support for object resurrection.
Ignoring the weak reference's callback is fine because both the object and
the weakref are going away, so it's legitimate to say the weak reference is
going away first.
2. If an object has legacy finalizers (`tp_del` slot) move it to the
`gc.garbage` list.
3. Call the finalizers (`tp_finalize` slot) and mark the objects as already
Expand All @@ -346,7 +347,12 @@ follows these steps in order:
4. Deal with resurrected objects. If some objects have been resurrected, the GC
finds the new subset of objects that are still unreachable by running the cycle
detection algorithm again and continues with them.
5. Call the `tp_clear` slot of every object so all internal links are broken and
5. Clear any weak references that still refer to unreachable objects. The
`wr_object` attribute for these weakrefs are set to `None`. Note that some
of these weak references maybe have been newly created during the running of
finalizers in step 3. Also, clear any weak references that are part of the
unreachable set.
6. Call the `tp_clear` slot of every object so all internal links are broken and
the reference counts fall to 0, triggering the destruction of all unreachable
objects.

Expand Down
46 changes: 30 additions & 16 deletions Lib/test/test_finalization.py
Original file line number Diff line number Diff line change
Expand Up @@ -174,7 +174,7 @@ def test_simple(self):
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
self.assertIs(wr(), None)
self.assertIsNone(wr())
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
Expand All @@ -188,12 +188,12 @@ def test_simple_resurrect(self):
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors(ids)
self.assertIsNot(wr(), None)
self.assertIsNotNone(wr())
self.clear_survivors()
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
self.assertIs(wr(), None)
self.assertIsNone(wr())

@support.cpython_only
def test_non_gc(self):
Expand Down Expand Up @@ -265,7 +265,7 @@ def test_simple(self):
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
self.assertIs(wr(), None)
self.assertIsNone(wr())
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
Expand All @@ -276,19 +276,24 @@ def test_simple_resurrect(self):
s = SelfCycleResurrector()
ids = [id(s)]
wr = weakref.ref(s)
wrc = weakref.ref(s, lambda x: None)
del s
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors(ids)
# XXX is this desirable?
self.assertIs(wr(), None)
# This used to be None because weakrefs were cleared before
# calling finalizers. Now they are cleared after.
self.assertIsNotNone(wr())
# A weakref with a callback is still cleared before calling
# finalizers.
self.assertIsNone(wrc())
# When trying to destroy the object a second time, __del__
# isn't called anymore (and the object isn't resurrected).
self.clear_survivors()
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
self.assertIs(wr(), None)
self.assertIsNone(wr())

def test_simple_suicide(self):
# Test the GC is able to deal with an object that kills its last
Expand All @@ -301,11 +306,11 @@ def test_simple_suicide(self):
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
self.assertIs(wr(), None)
self.assertIsNone(wr())
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors([])
self.assertIs(wr(), None)
self.assertIsNone(wr())


class ChainedBase:
Expand Down Expand Up @@ -378,18 +383,27 @@ def check_non_resurrecting_chain(self, classes):

def check_resurrecting_chain(self, classes):
N = len(classes)
def dummy_callback(ref):
pass
with SimpleBase.test():
nodes = self.build_chain(classes)
N = len(nodes)
ids = [id(s) for s in nodes]
survivor_ids = [id(s) for s in nodes if isinstance(s, SimpleResurrector)]
wrs = [weakref.ref(s) for s in nodes]
wrcs = [weakref.ref(s, dummy_callback) for s in nodes]
del nodes
gc.collect()
self.assert_del_calls(ids)
self.assert_survivors(survivor_ids)
# XXX desirable?
self.assertEqual([wr() for wr in wrs], [None] * N)
for wr in wrs:
# These values used to be None because weakrefs were cleared
# before calling finalizers. Now they are cleared after.
self.assertIsNotNone(wr())
for wr in wrcs:
# Weakrefs with callbacks are still cleared before calling
# finalizers.
self.assertIsNone(wr())
self.clear_survivors()
gc.collect()
self.assert_del_calls(ids)
Expand Down Expand Up @@ -491,7 +505,7 @@ def test_legacy(self):
self.assert_del_calls(ids)
self.assert_tp_del_calls(ids)
self.assert_survivors([])
self.assertIs(wr(), None)
self.assertIsNone(wr())
gc.collect()
self.assert_del_calls(ids)
self.assert_tp_del_calls(ids)
Expand All @@ -507,13 +521,13 @@ def test_legacy_resurrect(self):
self.assert_tp_del_calls(ids)
self.assert_survivors(ids)
# weakrefs are cleared before tp_del is called.
self.assertIs(wr(), None)
self.assertIsNone(wr())
self.clear_survivors()
gc.collect()
self.assert_del_calls(ids)
self.assert_tp_del_calls(ids * 2)
self.assert_survivors(ids)
self.assertIs(wr(), None)
self.assertIsNone(wr())

def test_legacy_self_cycle(self):
# Self-cycles with legacy finalizers end up in gc.garbage.
Expand All @@ -527,11 +541,11 @@ def test_legacy_self_cycle(self):
self.assert_tp_del_calls([])
self.assert_survivors([])
self.assert_garbage(ids)
self.assertIsNot(wr(), None)
self.assertIsNotNone(wr())
# Break the cycle to allow collection
gc.garbage[0].ref = None
self.assert_garbage([])
self.assertIs(wr(), None)
self.assertIsNone(wr())


if __name__ == "__main__":
Expand Down
27 changes: 24 additions & 3 deletions Lib/test/test_gc.py
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,26 @@ def func():
self.assertRegex(stdout, rb"""\A\s*func=None""")
self.assertFalse(stderr)

def test_datetime_weakref_cycle(self):
# https://github.com/python/cpython/issues/132413
# If the weakref used by the datetime extension gets cleared by the GC (due to being
# in an unreachable cycle) then datetime functions would crash (get_module_state()
# was returning a NULL pointer). This bug is fixed by clearing weakrefs without
# callbacks *after* running finalizers.
code = """if 1:
import _datetime
class C:
def __del__(self):
print('__del__ called')
_datetime.timedelta(days=1) # crash?

l = [C()]
l.append(l)
"""
rc, stdout, stderr = assert_python_ok("-c", code)
self.assertEqual(rc, 0)
self.assertEqual(stdout.strip(), b'__del__ called')

@refcount_test
def test_frame(self):
def f():
Expand Down Expand Up @@ -658,9 +678,8 @@ def callback(ignored):
gc.collect()
self.assertEqual(len(ouch), 2) # else the callbacks didn't run
for x in ouch:
# If the callback resurrected one of these guys, the instance
# would be damaged, with an empty __dict__.
self.assertEqual(x, None)
# The weakref should be cleared before executing the callback.
self.assertIsNone(x)

def test_bug21435(self):
# This is a poor test - its only virtue is that it happened to
Expand Down Expand Up @@ -1335,6 +1354,7 @@ def setUp(self):
def tearDown(self):
gc.disable()

@unittest.skipIf(Py_GIL_DISABLED, "requires GC generations or increments")
def test_bug1055820c(self):
# Corresponds to temp2c.py in the bug report. This is pretty
# elaborate.
Expand Down Expand Up @@ -1410,6 +1430,7 @@ def callback(ignored):
self.assertEqual(x, None)

@gc_threshold(1000, 0, 0)
@unittest.skipIf(Py_GIL_DISABLED, "requires GC generations or increments")
def test_bug1055820d(self):
# Corresponds to temp2d.py in the bug report. This is very much like
# test_bug1055820c, but uses a __del__ method instead of a weakref
Expand Down
22 changes: 18 additions & 4 deletions Lib/test/test_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -808,7 +808,12 @@ def test_closefd_attr(self):
def test_garbage_collection(self):
# FileIO objects are collected, and collecting them flushes
# all data to disk.
with warnings_helper.check_warnings(('', ResourceWarning)):
#
# Note that using warnings_helper.check_warnings() will keep the
# file alive due to the `source` argument to warn(). So, use
# catch_warnings() instead.
with warnings.catch_warnings():
warnings.simplefilter("ignore", ResourceWarning)
f = self.FileIO(os_helper.TESTFN, "wb")
f.write(b"abcxxx")
f.f = f
Expand Down Expand Up @@ -1809,7 +1814,11 @@ def test_garbage_collection(self):
# C BufferedReader objects are collected.
# The Python version has __del__, so it ends into gc.garbage instead
self.addCleanup(os_helper.unlink, os_helper.TESTFN)
with warnings_helper.check_warnings(('', ResourceWarning)):
# Note that using warnings_helper.check_warnings() will keep the
# file alive due to the `source` argument to warn(). So, use
# catch_warnings() instead.
with warnings.catch_warnings():
warnings.simplefilter("ignore", ResourceWarning)
rawio = self.FileIO(os_helper.TESTFN, "w+b")
f = self.tp(rawio)
f.f = f
Expand Down Expand Up @@ -2158,7 +2167,11 @@ def test_garbage_collection(self):
# all data to disk.
# The Python version has __del__, so it ends into gc.garbage instead
self.addCleanup(os_helper.unlink, os_helper.TESTFN)
with warnings_helper.check_warnings(('', ResourceWarning)):
# Note that using warnings_helper.check_warnings() will keep the
# file alive due to the `source` argument to warn(). So, use
# catch_warnings() instead.
with warnings.catch_warnings():
warnings.simplefilter("ignore", ResourceWarning)
rawio = self.FileIO(os_helper.TESTFN, "w+b")
f = self.tp(rawio)
f.write(b"123xxx")
Expand Down Expand Up @@ -4080,7 +4093,8 @@ def test_garbage_collection(self):
# C TextIOWrapper objects are collected, and collecting them flushes
# all data to disk.
# The Python version has __del__, so it ends in gc.garbage instead.
with warnings_helper.check_warnings(('', ResourceWarning)):
with warnings.catch_warnings():
warnings.simplefilter("ignore", ResourceWarning)
rawio = self.FileIO(os_helper.TESTFN, "wb")
b = self.BufferedWriter(rawio)
t = self.TextIOWrapper(b, encoding="ascii")
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Fix a bug caused by the garbage collector clearing weakrefs too early. The
weakrefs in the ``tp_subclasses`` dictionary are needed in order to correctly
invalidate type caches (for example, by calling ``PyType_Modified()``).
Clearing weakrefs before calling finalizers causes the caches to not be
correctly invalidated. That can cause crashes since the caches can refer to
invalid objects. Defer the clearing of weakrefs without callbacks until after
finalizers are executed.
2 changes: 1 addition & 1 deletion Modules/_datetimemodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -214,7 +214,7 @@ clear_current_module(PyInterpreterState *interp, PyObject *expected)
if (PyDict_GetItemRef(dict, INTERP_KEY, &ref) < 0) {
goto error;
}
if (ref != NULL) {
if (ref != NULL && ref != Py_None) {
PyObject *current = NULL;
int rc = PyWeakref_GetRef(ref, &current);
/* We only need "current" for pointer comparison. */
Expand Down
10 changes: 10 additions & 0 deletions Modules/gc_weakref.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,16 @@
Intro
=====

**************************************************************************
Note: this document was written long ago, before PEP 442 (safe object
finalization) was implemented. While that has changed some things, this
document is still mostly accurate. Just note that the rules being discussed
here apply to the unreachable set of objects *after* non-legacy finalizers
have been called. Also, the clearing of weakrefs has been changed to happen
later in the collection (after running finalizers but before tp_clear is
called).
**************************************************************************

The basic rule for dealing with weakref callbacks (and __del__ methods too,
for that matter) during cyclic gc:

Expand Down
Loading
Loading