Skip to content

Commit d8105aa

Browse files
committed
Fix several typos
1 parent 10ee4fd commit d8105aa

File tree

1 file changed

+95
-31
lines changed

1 file changed

+95
-31
lines changed

garbage_collector.rst

+95-31
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,8 @@ handle reference cycles. For instance, consider this code
3838
3939
>>> container = []
4040
>>> container.append(container)
41+
>>> sys.getrefcount(container)
42+
3
4143
4244
>>> del container
4345
@@ -65,7 +67,7 @@ Normally the C structure supporting a regular Python object looks as follows:
6567
6668
6769
In order to support the garbage collector, the memory layout of objects is altered
68-
to acomodate extra information **before** the normal layout:
70+
to accommodate extra information **before** the normal layout:
6971

7072
.. code-block:: none
7173
@@ -111,7 +113,7 @@ implemented in the ``gc`` module. The garbage collector **only focuses**
111113
on cleaning container objects (i.e. objects that can contain a reference
112114
to one or more objects). These can be arrays, dictionaries, lists, custom
113115
class instances, classes in extension modules, etc. One could think that
114-
cycles are uncommon but the thuth is that many internal references needed by
116+
cycles are uncommon but the truth is that many internal references needed by
115117
the interpreter create cycles everywhere. Some notable examples:
116118

117119
* Exceptions contain traceback objects that contain a list of frames that
@@ -127,36 +129,38 @@ be identified first. This is done in the `deduce_unreachable() <https://github.c
127129
function. Inside this component, two double-linked lists are maintained: one list contains
128130
all objects to be scanned, and the other will contain all objects "tentatively" unreachable.
129131

130-
To understand how the algorith works, Let’s take the case of a circular linked list which has
132+
To understand how the algorithm works, Let’s take the case of a circular linked list which has
131133
one link referenced by a variable A, and one self-referencing object which is completely
132134
unreachable
133135

134136
.. code-block:: python
135137
138+
>>> import gc
136139
137-
class Link:
138-
def __init__(self, next_link=None):
139-
self.next_link = next_link
140+
>>> class Link:
141+
... def __init__(self, next_link=None):
142+
... self.next_link = next_link
140143
141-
link_3 = Link()
142-
link_2 = Link(link3)
143-
link_1 = Link(link2)
144-
link_3.next_link = link_1
144+
>>> link_3 = Link()
145+
>>> link_2 = Link(link3)
146+
>>> link_1 = Link(link2)
147+
>>> link_3.next_link = link_1
145148
146-
link_4 = Link()
147-
link_4.next_link = link_4
149+
>>> link_4 = Link()
150+
>>> link_4.next_link = link_4
148151
149-
import gc
150-
gc.collect()
152+
>>> del link_4
153+
>>> gc.collect()
154+
2
151155
152156
When the GC starts, it has all the container objects it wants to scan
153-
on a the first linked list. The objective is to move all the unreachable
157+
on the first linked list. The objective is to move all the unreachable
154158
objects. As generally most objects turn out to be reachable, is much more
155159
efficient to move the unreachable as this involves fewer pointer updates.
156160

157161
Every object that supports garbage collection will have a extra reference
158162
count field initialized to the reference count (``gc_ref`` in the figures)
159-
of that object when the algorithm starts. This is because the algorith needs
163+
of that object when the algorithm starts. This is because the algorithm needs
160164
to modify the reference count to do the computations and in this way the
161165
interpreter will not modify the real reference count field.
162166

@@ -165,7 +169,7 @@ interpreter will not modify the real reference count field.
165169
The GC then iterates over all containers in the first list and decrements by one the
166170
``gc_ref`` field of any other object that container it is referencing. For doing
167171
this it makes use of the ``tp_traverse`` slot in the container class (implemented
168-
using the C API or inherited by a superclass) to know what objects are refered by
172+
using the C API or inherited by a superclass) to know what objects are referenced by
169173
each container. After all the objects have been scanned, only the objects that have
170174
references from outside the “objects to scan” list will have ``gc_ref > 0``.
171175

@@ -176,11 +180,11 @@ This is because another object that is reachable from the outside (``gc_refs > 0
176180
can still have references to it. For instance, the ``link_2`` object in our example
177181
ended having ``gc_refs == 0`` but is referenced still by the ``link_1`` object that
178182
is reachable from the outside. To obtain the set of objects that are really
179-
unreachanle, the garbage collector scans again the container objects using the
180-
``tp_traverse`` slot with a diferent traverse function that marks objects with
183+
unreachable, the garbage collector scans again the container objects using the
184+
``tp_traverse`` slot with a different traverse function that marks objects with
181185
``gc_refs == 0`` as "tentatively unreachable" and then moves them to the
182186
tentatively unreachable list. The following image depicts the state of the lists in a
183-
moment when the GC processed the ``link 3`` and ``link 4`` objects but hasn’t
187+
moment when the GC processed the ``link 3`` and ``link 4`` objects but has not
184188
processed ``link 1`` and ``link 2`` yet.
185189

186190
.. figure:: images/python-cyclic-gc-3-new-page.png
@@ -193,12 +197,12 @@ already in what will become the reachable list):
193197

194198
When the GC encounters an object which is reachable (``gc_refs > 0``), it traverses
195199
its references using the ``tp_traverse`` slot to find all the objects that are
196-
reachable from it, marking moving them to the end oflist of reachable objects (where
200+
reachable from it, marking moving them to the end of the list of reachable objects (where
197201
they started originally) and setting its ``gc_refs`` field to 1. This is what happens
198202
to ``link 2`` and ``link 3`` below as they are reachable from ``link 1``. From the
199203
state in the previous image and after examining the objects referred to by ``link1``
200204
the GC knows that ``link 3`` is reachable after all, so it is moved back to the
201-
original list and its ``gc_refs`` field is set to one so if the GC vitis it again, it
205+
original list and its ``gc_refs`` field is set to one so if the GC visits it again, it
202206
does not that is reachable. To avoid visiting a object twice, the GC marks all
203207
objects that are not visited yet with and once an object is processed is unmarked so
204208
the GC does not process it twice.
@@ -212,24 +216,49 @@ process in really a breadth first search over the object graph. Once all the obj
212216
are scanned, the GC knows that all container objects in the tentatively unreachable
213217
list are really unreachable and can thus be garbage collected.
214218

219+
Why moving unreachable objects is better
220+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
221+
222+
It sounds logical to move the unreachable objects under the premise that most object
223+
are usually reachable, until you think about it: the reason it pays isn't actually
224+
obvious.
225+
226+
Suppose we create objects A, B, C in that order. They appear in the young generation
227+
in the same order. If B points to A, and C to B, and C is reachable from outside,
228+
then the adjusted refcounts after the first step of the algorith runs will be 0, 0,
229+
and 1 respectively because the only reachable object from the outside is C.
230+
231+
When the next step of the algorithm finds A, A is moved to the unreachable list. The
232+
same for B when it's first encountered. Then C is traversed, B is moved *back* to
233+
the reachable list. B is eventually traversed, and then A is moved back to the reachable
234+
list.
235+
236+
So instead of not moving at all, the reachable objects B and A are moved twice each.
237+
Why is this a win? A straightforward algorithm to move the reachable objects instead
238+
would move A, B, and C once each. The key is that this dance leaves the objects in
239+
order C, B, A - it's reversed from the original order. On all *subsequent* scans,
240+
none of them will move. Since most objects aren't in cycles, this can save an
241+
unbounded number of moves across an unbounded number of later collections. The only
242+
time the cost can be higher is the first time the chain is scanned.
243+
215244
Destroying unreachable objects
216245
------------------------------
217246

218247
Once the GC knows the list of unreachable objects, a very delicate process starts
219-
with the objective of completely destroying these objects. Roughtly, the process
248+
with the objective of completely destroying these objects. Roughly, the process
220249
follows these steps in order:
221250

222251
1. Handle and clean weak references (if any). If an object that is in the unreachable
223252
set is going to be destroyed and has weak references with callbacks, these
224-
callbacks need to be honored. This proces is **very** delicate as any error can
253+
callbacks need to be honored. This process is **very** delicate as any error can
225254
cause objects that will be in an inconsistent state to be resurrected or reached
226255
by some python functions invoked from the callbacks. To avoid this weak references
227256
that also are part of the unreachable set (the object and its weak reference
228257
are in a cycles that are unreachable) then the weak reference needs to be clean
229-
inmediately and the callback must not be executed so it does not trigger later
230-
when the ``tp_clear`` slot is called, causing havok. This is fine because both
258+
immediately and the callback must not be executed so it does not trigger later
259+
when the ``tp_clear`` slot is called, causing havoc. This is fine because both
231260
the object and the weakref are going away, so it's legitimate to pretend the
232-
weakref is going away first so the callback is never executed.
261+
weak reference is going away first so the callback is never executed.
233262

234263
2. If an object has legacy finalizers (``tp_del`` slot) move them to the
235264
``gc.garbage`` list.
@@ -240,7 +269,7 @@ follows these steps in order:
240269
finds the new subset of objects that are still unreachable by running the cycle
241270
detection algorithm again and continues with them.
242271
5. Call the ``tp_clear`` slot of every object so all internal links are broken and
243-
the reference counts fall to 0, triggering the destruction of all unreachabke
272+
the reference counts fall to 0, triggering the destruction of all unreachable
244273
objects.
245274

246275
Optimization: generations
@@ -253,9 +282,9 @@ creation. This has proven to be very close to the reality of many Python program
253282
many temporary objects are created and destroyed very fast. The older an object is
254283
the less likely is to become unreachable.
255284

256-
To take advatange of this fact, all container objects are segregated across
285+
To take advantage of this fact, all container objects are segregated across
257286
three spaces or "generations" (CPython currently uses 3 generations). Every new
258-
object starts in the firstgeneration (generation 0). The previous algorithm is
287+
object starts in the first generation (generation 0). The previous algorithm is
259288
executed only over the objects of a particular generation and if an object
260289
survives a collection of its generation it will be moved to the next one
261290
(generation 1), where it will it will be surveyed for collection less often. If
@@ -279,6 +308,41 @@ The content of these generations can be examined using the
279308
``gc.get_objects(generation=NUM)`` function and collections can be triggered
280309
specifically in a generation by calling ``gc.collect(generation=NUM)``.
281310

311+
.. code-block:: python
312+
313+
>>> import gc
314+
>>> class MyObj:
315+
... pass
316+
...
317+
318+
# Move everything to the last generation so its easier to inspect
319+
# the younger generations.
320+
321+
>>> gc.collect()
322+
0
323+
324+
# Create a reference cycle
325+
326+
>>> x = MyObj()
327+
>>> x.self = x
328+
329+
# Initially the object is in the younguest generation
330+
331+
>>> gc.get_objects(generation=0)
332+
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
333+
334+
# After a collection of the younguest generation the object
335+
# moves to the next generation
336+
337+
>>> gc.collect(generation=0)
338+
0
339+
>>> gc.get_objects(generation=0)
340+
[]
341+
>>> gc.get_objects(generation=1)
342+
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
343+
344+
345+
282346
Collecting the oldest generation
283347
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
284348

@@ -302,7 +366,7 @@ Optimization: reusing fields to save memory
302366
-------------------------------------------
303367

304368
In order to save memory, the two linked list pointers in every object with gc
305-
support are reused for several pourposes. This is a common optimization known
369+
support are reused for several purposes. This is a common optimization known
306370
as "fat pointers" or "tagged pointers": pointers that carry additional data,
307371
"folded" into the pointer, meaning stored inline in the data representing the
308372
address, taking advantage of certain properties of memory addressing. This is

0 commit comments

Comments
 (0)