@@ -38,6 +38,8 @@ handle reference cycles. For instance, consider this code
38
38
39
39
>> > container = []
40
40
>> > container.append(container)
41
+ >> > sys.getrefcount(container)
42
+ 3
41
43
42
44
>> > del container
43
45
@@ -65,7 +67,7 @@ Normally the C structure supporting a regular Python object looks as follows:
65
67
66
68
67
69
In order to support the garbage collector, the memory layout of objects is altered
68
- to acomodate extra information **before ** the normal layout:
70
+ to accommodate extra information **before ** the normal layout:
69
71
70
72
.. code-block :: none
71
73
@@ -111,7 +113,7 @@ implemented in the ``gc`` module. The garbage collector **only focuses**
111
113
on cleaning container objects (i.e. objects that can contain a reference
112
114
to one or more objects). These can be arrays, dictionaries, lists, custom
113
115
class instances, classes in extension modules, etc. One could think that
114
- cycles are uncommon but the thuth is that many internal references needed by
116
+ cycles are uncommon but the truth is that many internal references needed by
115
117
the interpreter create cycles everywhere. Some notable examples:
116
118
117
119
* Exceptions contain traceback objects that contain a list of frames that
@@ -127,36 +129,38 @@ be identified first. This is done in the `deduce_unreachable() <https://github.c
127
129
function. Inside this component, two double-linked lists are maintained: one list contains
128
130
all objects to be scanned, and the other will contain all objects "tentatively" unreachable.
129
131
130
- To understand how the algorith works, Let’s take the case of a circular linked list which has
132
+ To understand how the algorithm works, Let’s take the case of a circular linked list which has
131
133
one link referenced by a variable A, and one self-referencing object which is completely
132
134
unreachable
133
135
134
136
.. code-block :: python
135
137
138
+ >> > import gc
136
139
137
- class Link :
138
- def __init__ (self , next_link = None ):
139
- self .next_link = next_link
140
+ >> > class Link :
141
+ ... def __init__ (self , next_link = None ):
142
+ ... self .next_link = next_link
140
143
141
- link_3 = Link()
142
- link_2 = Link(link3)
143
- link_1 = Link(link2)
144
- link_3.next_link = link_1
144
+ >> > link_3 = Link()
145
+ >> > link_2 = Link(link3)
146
+ >> > link_1 = Link(link2)
147
+ >> > link_3.next_link = link_1
145
148
146
- link_4 = Link()
147
- link_4.next_link = link_4
149
+ >> > link_4 = Link()
150
+ >> > link_4.next_link = link_4
148
151
149
- import gc
150
- gc.collect()
152
+ >> > del link_4
153
+ >> > gc.collect()
154
+ 2
151
155
152
156
When the GC starts, it has all the container objects it wants to scan
153
- on a the first linked list. The objective is to move all the unreachable
157
+ on the first linked list. The objective is to move all the unreachable
154
158
objects. As generally most objects turn out to be reachable, is much more
155
159
efficient to move the unreachable as this involves fewer pointer updates.
156
160
157
161
Every object that supports garbage collection will have a extra reference
158
162
count field initialized to the reference count (``gc_ref `` in the figures)
159
- of that object when the algorithm starts. This is because the algorith needs
163
+ of that object when the algorithm starts. This is because the algorithm needs
160
164
to modify the reference count to do the computations and in this way the
161
165
interpreter will not modify the real reference count field.
162
166
@@ -165,7 +169,7 @@ interpreter will not modify the real reference count field.
165
169
The GC then iterates over all containers in the first list and decrements by one the
166
170
``gc_ref `` field of any other object that container it is referencing. For doing
167
171
this it makes use of the ``tp_traverse `` slot in the container class (implemented
168
- using the C API or inherited by a superclass) to know what objects are refered by
172
+ using the C API or inherited by a superclass) to know what objects are referenced by
169
173
each container. After all the objects have been scanned, only the objects that have
170
174
references from outside the “objects to scan” list will have ``gc_ref > 0 ``.
171
175
@@ -176,11 +180,11 @@ This is because another object that is reachable from the outside (``gc_refs > 0
176
180
can still have references to it. For instance, the ``link_2 `` object in our example
177
181
ended having ``gc_refs == 0 `` but is referenced still by the ``link_1 `` object that
178
182
is reachable from the outside. To obtain the set of objects that are really
179
- unreachanle , the garbage collector scans again the container objects using the
180
- ``tp_traverse `` slot with a diferent traverse function that marks objects with
183
+ unreachable , the garbage collector scans again the container objects using the
184
+ ``tp_traverse `` slot with a different traverse function that marks objects with
181
185
``gc_refs == 0 `` as "tentatively unreachable" and then moves them to the
182
186
tentatively unreachable list. The following image depicts the state of the lists in a
183
- moment when the GC processed the ``link 3 `` and ``link 4 `` objects but hasn’t
187
+ moment when the GC processed the ``link 3 `` and ``link 4 `` objects but has not
184
188
processed ``link 1 `` and ``link 2 `` yet.
185
189
186
190
.. figure :: images/python-cyclic-gc-3-new-page.png
@@ -193,12 +197,12 @@ already in what will become the reachable list):
193
197
194
198
When the GC encounters an object which is reachable (``gc_refs > 0 ``), it traverses
195
199
its references using the ``tp_traverse `` slot to find all the objects that are
196
- reachable from it, marking moving them to the end oflist of reachable objects (where
200
+ reachable from it, marking moving them to the end of the list of reachable objects (where
197
201
they started originally) and setting its ``gc_refs `` field to 1. This is what happens
198
202
to ``link 2 `` and ``link 3 `` below as they are reachable from ``link 1 ``. From the
199
203
state in the previous image and after examining the objects referred to by ``link1 ``
200
204
the GC knows that ``link 3 `` is reachable after all, so it is moved back to the
201
- original list and its ``gc_refs `` field is set to one so if the GC vitis it again, it
205
+ original list and its ``gc_refs `` field is set to one so if the GC visits it again, it
202
206
does not that is reachable. To avoid visiting a object twice, the GC marks all
203
207
objects that are not visited yet with and once an object is processed is unmarked so
204
208
the GC does not process it twice.
@@ -212,24 +216,49 @@ process in really a breadth first search over the object graph. Once all the obj
212
216
are scanned, the GC knows that all container objects in the tentatively unreachable
213
217
list are really unreachable and can thus be garbage collected.
214
218
219
+ Why moving unreachable objects is better
220
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
221
+
222
+ It sounds logical to move the unreachable objects under the premise that most object
223
+ are usually reachable, until you think about it: the reason it pays isn't actually
224
+ obvious.
225
+
226
+ Suppose we create objects A, B, C in that order. They appear in the young generation
227
+ in the same order. If B points to A, and C to B, and C is reachable from outside,
228
+ then the adjusted refcounts after the first step of the algorith runs will be 0, 0,
229
+ and 1 respectively because the only reachable object from the outside is C.
230
+
231
+ When the next step of the algorithm finds A, A is moved to the unreachable list. The
232
+ same for B when it's first encountered. Then C is traversed, B is moved *back * to
233
+ the reachable list. B is eventually traversed, and then A is moved back to the reachable
234
+ list.
235
+
236
+ So instead of not moving at all, the reachable objects B and A are moved twice each.
237
+ Why is this a win? A straightforward algorithm to move the reachable objects instead
238
+ would move A, B, and C once each. The key is that this dance leaves the objects in
239
+ order C, B, A - it's reversed from the original order. On all *subsequent * scans,
240
+ none of them will move. Since most objects aren't in cycles, this can save an
241
+ unbounded number of moves across an unbounded number of later collections. The only
242
+ time the cost can be higher is the first time the chain is scanned.
243
+
215
244
Destroying unreachable objects
216
245
------------------------------
217
246
218
247
Once the GC knows the list of unreachable objects, a very delicate process starts
219
- with the objective of completely destroying these objects. Roughtly , the process
248
+ with the objective of completely destroying these objects. Roughly , the process
220
249
follows these steps in order:
221
250
222
251
1. Handle and clean weak references (if any). If an object that is in the unreachable
223
252
set is going to be destroyed and has weak references with callbacks, these
224
- callbacks need to be honored. This proces is **very ** delicate as any error can
253
+ callbacks need to be honored. This process is **very ** delicate as any error can
225
254
cause objects that will be in an inconsistent state to be resurrected or reached
226
255
by some python functions invoked from the callbacks. To avoid this weak references
227
256
that also are part of the unreachable set (the object and its weak reference
228
257
are in a cycles that are unreachable) then the weak reference needs to be clean
229
- inmediately and the callback must not be executed so it does not trigger later
230
- when the ``tp_clear `` slot is called, causing havok . This is fine because both
258
+ immediately and the callback must not be executed so it does not trigger later
259
+ when the ``tp_clear `` slot is called, causing havoc . This is fine because both
231
260
the object and the weakref are going away, so it's legitimate to pretend the
232
- weakref is going away first so the callback is never executed.
261
+ weak reference is going away first so the callback is never executed.
233
262
234
263
2. If an object has legacy finalizers (``tp_del `` slot) move them to the
235
264
``gc.garbage `` list.
@@ -240,7 +269,7 @@ follows these steps in order:
240
269
finds the new subset of objects that are still unreachable by running the cycle
241
270
detection algorithm again and continues with them.
242
271
5. Call the ``tp_clear `` slot of every object so all internal links are broken and
243
- the reference counts fall to 0, triggering the destruction of all unreachabke
272
+ the reference counts fall to 0, triggering the destruction of all unreachable
244
273
objects.
245
274
246
275
Optimization: generations
@@ -253,9 +282,9 @@ creation. This has proven to be very close to the reality of many Python program
253
282
many temporary objects are created and destroyed very fast. The older an object is
254
283
the less likely is to become unreachable.
255
284
256
- To take advatange of this fact, all container objects are segregated across
285
+ To take advantage of this fact, all container objects are segregated across
257
286
three spaces or "generations" (CPython currently uses 3 generations). Every new
258
- object starts in the firstgeneration (generation 0). The previous algorithm is
287
+ object starts in the first generation (generation 0). The previous algorithm is
259
288
executed only over the objects of a particular generation and if an object
260
289
survives a collection of its generation it will be moved to the next one
261
290
(generation 1), where it will it will be surveyed for collection less often. If
@@ -279,6 +308,41 @@ The content of these generations can be examined using the
279
308
``gc.get_objects(generation=NUM) `` function and collections can be triggered
280
309
specifically in a generation by calling ``gc.collect(generation=NUM) ``.
281
310
311
+ .. code-block :: python
312
+
313
+ >> > import gc
314
+ >> > class MyObj :
315
+ ... pass
316
+ ...
317
+
318
+ # Move everything to the last generation so its easier to inspect
319
+ # the younger generations.
320
+
321
+ >> > gc.collect()
322
+ 0
323
+
324
+ # Create a reference cycle
325
+
326
+ >> > x = MyObj()
327
+ >> > x.self = x
328
+
329
+ # Initially the object is in the younguest generation
330
+
331
+ >> > gc.get_objects(generation = 0 )
332
+ [... , < __main__.MyObj object at 0x 7fbcc12a3400> , ... ]
333
+
334
+ # After a collection of the younguest generation the object
335
+ # moves to the next generation
336
+
337
+ >> > gc.collect(generation = 0 )
338
+ 0
339
+ >> > gc.get_objects(generation = 0 )
340
+ []
341
+ >> > gc.get_objects(generation = 1 )
342
+ [... , < __main__.MyObj object at 0x 7fbcc12a3400> , ... ]
343
+
344
+
345
+
282
346
Collecting the oldest generation
283
347
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
284
348
@@ -302,7 +366,7 @@ Optimization: reusing fields to save memory
302
366
-------------------------------------------
303
367
304
368
In order to save memory, the two linked list pointers in every object with gc
305
- support are reused for several pourposes . This is a common optimization known
369
+ support are reused for several purposes . This is a common optimization known
306
370
as "fat pointers" or "tagged pointers": pointers that carry additional data,
307
371
"folded" into the pointer, meaning stored inline in the data representing the
308
372
address, taking advantage of certain properties of memory addressing. This is
0 commit comments