Skip to content

Add section about the design of CPython's garbage collector #562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jan 21, 2020
Merged
Changes from 1 commit
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
10ee4fd
Add section about the design of CPython's garbage collector
pablogsal Jan 20, 2020
8aac485
Fix several typos
pablogsal Jan 20, 2020
7776e14
Update garbage_collector.rst
pablogsal Jan 20, 2020
c72ce11
Fix more typos
pablogsal Jan 20, 2020
410487b
Update garbage_collector.rst
pablogsal Jan 20, 2020
b18da79
Update garbage_collector.rst
pablogsal Jan 20, 2020
25d7555
Apply suggestions from code review
pablogsal Jan 20, 2020
811235f
Fix more typos
pablogsal Jan 20, 2020
8b72c71
Update garbage_collector.rst
pablogsal Jan 20, 2020
3b84282
Update garbage_collector.rst
pablogsal Jan 20, 2020
0d3f322
Apply suggestions from code review
pablogsal Jan 21, 2020
d84a267
Apply suggestions from code review
pablogsal Jan 21, 2020
5ca46f7
Fix indentation and rework incomplete sentence
pablogsal Jan 21, 2020
404fcc5
Fix more indentation
pablogsal Jan 21, 2020
8fc0547
Rework sentence about _gc_prev
pablogsal Jan 21, 2020
e956478
Update garbage_collector.rst
pablogsal Jan 21, 2020
47190cd
Fix indentation
pablogsal Jan 21, 2020
722a99c
Fix quotes
pablogsal Jan 21, 2020
b40b030
Fix typo and indentation
pablogsal Jan 21, 2020
111b2bb
Update garbage_collector.rst
pablogsal Jan 21, 2020
2ec5002
Update garbage_collector.rst
pablogsal Jan 21, 2020
7c8e02a
Add author section
pablogsal Jan 21, 2020
e51d354
Add a warning section regarding tagged pointers
pablogsal Jan 21, 2020
411037f
Add reference to the memory layout
pablogsal Jan 21, 2020
4081607
Fix link to the generation section
pablogsal Jan 21, 2020
03178a0
Apply suggestions from code review
pablogsal Jan 21, 2020
6d01e46
Fix more typos
pablogsal Jan 21, 2020
4321593
Address Petr feedback and fix more typos
pablogsal Jan 21, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Apply suggestions from code review
Co-Authored-By: Tim Peters <tim.peters@gmail.com>
  • Loading branch information
pablogsal and tim-one authored Jan 21, 2020
commit d84a267128d448d7b47da8736f9af745b0eb2cc3
12 changes: 8 additions & 4 deletions garbage_collector.rst
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ to the object when called):
>>> sys.getrefcount(x)
2

The main problem present with the reference count schema is that reference counting
The main problem with the reference count schema is that reference counting
does not handle reference cycles. For instance, consider this code:

.. code-block:: python
Expand Down Expand Up @@ -115,7 +115,7 @@ that the objects cannot form reference cycles with only objects of its type or i
type is immutable, a ``tp_clear`` implementation must also be provided.


Identifiying reference cycles reference cycles
Identifiying reference cycles
----------------------------------------------

The algorithm that CPython uses to detect those reference cycles is
Expand All @@ -128,6 +128,8 @@ the interpreter create cycles everywhere. Some notable examples:

* Exceptions contain traceback objects that contain a list of frames that
contain the exception itself.
* Module-level functions reference the module's dict (which is needed to resolve globals),
which in turn contains an entry for the module-level function.
* Instances have references to their class which itself references its module, and the module
contains references to everything that is inside (and maybe other modules)
and this can lead back to the original instance.
Expand Down Expand Up @@ -181,7 +183,7 @@ The GC then iterates over all containers in the first list and decrements by one
this makes use of the ``tp_traverse`` slot in the container class (implemented
using the C API or inherited by a superclass) to know what objects are referenced by
each container. After all the objects have been scanned, only the objects that have
references from outside the “objects to scan” list will have ``gc_ref > 0``.
references from outside the “objects to scan” list will have ``gc_refs > 0``.

.. figure:: images/python-cyclic-gc-2-new-page.png

Expand Down Expand Up @@ -226,6 +228,8 @@ process in really a breadth first search over the object graph. Once all the obj
are scanned, the GC knows that all container objects in the tentatively unreachable
list are really unreachable and can thus be garbage collected.

Pragmatically, it's important to note that no recursion is required by any of this, and neither does it in any other way require additional memory proportional to the number of objects, number of pointers, or the lengths of pointer chains. Apart from ``O(1)`` storage for internal C needs, the objects themselves contain all the storage the GC algorithms require.

Why moving unreachable objects is better
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down Expand Up @@ -303,7 +307,7 @@ it will be moved to the last generation (generation 2) where it will be
surveyed the least often.

Generations are collected when the number of objects that they contain reach some
predefined threshold which is unique of each generation and is lower than the older
predefined threshold which is unique for each generation and is lower than the older
generations are. These thresholds can be examined using the ``gc.get_threshold``
function:

Expand Down