
[RFC] GC: Both Stop-the-World and On-the-fly #4158

Open
@discord9

Description

Draft PR: #4180

Summary

I propose we implement a cycle collector for RustPython's reference-counting garbage collection system, based on Bacon et al.'s work in these two papers:

  • Concurrent Cycle Collection in Reference Counted Systems: this paper presents a synchronous (stop-the-world) cycle collector and a concurrent version of it. I suggest implementing only the synchronous version, since the concurrent version is harder to implement and its performance still falls short of the approach in the second paper.
  • An Efficient On-the-Fly Cycle Collection: this paper builds on the synchronous cycle collector of the previous paper, adding a "sliding view" algorithm, which works somewhat like copy-on-write for objects: during tracing the GC sees a fixed version of the object reference graph, while mutators can still freely mutate objects however they want.

Detailed Explanation

The synchronous cycle collector is easier to implement, so I suggest we first implement a synchronous version of the cycle collector as a proof of concept, then modify it into an on-the-fly version.

Synchronous Cycle Collection

The algorithm in "Concurrent Cycle Collection in Reference Counted Systems" takes $O(N + E)$ worst-case time per collection, where $N$ and $E$ are the numbers of nodes and edges in the traced subgraph.

For an object T, its header needs fields denoted buffered(T), color(T), and RC(T), where:

  • buffered: a buffer (Roots) holds potential roots of cyclic garbage; buffered(T) denotes whether the object is currently in that buffer.
  • color is explained in this table (Red and Orange are only used in the concurrent version):
Color    Meaning
Black    In use or free
Gray     Possible member of cycle
White    Member of garbage cycle
Purple   Possible root of cycle
Green    Acyclic
Red      Candidate cycle undergoing $\Sigma$-computation
Orange   Candidate cycle awaiting epoch boundary
  • RC is just a reference count (which doesn't even need to be atomic in the synchronous version).
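
As a rough sketch (not the actual header layout, which is discussed in the Data Structures section below), these per-object fields could look like the following; how to pack them compactly is left open:

#[derive(Clone, Copy, PartialEq, Eq)]
enum Color {
    Black,  // in use or free
    Gray,   // possible member of a cycle
    White,  // member of a garbage cycle
    Purple, // possible root of a cycle
    Green,  // acyclic
    // Red and Orange are only needed by the concurrent collector.
}

struct GcHeader {
    rc: u32,        // plain reference count, non-atomic in the sync version
    color: Color,
    buffered: bool, // already placed in the Roots buffer this collection?
}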

Pseudocode and Explanation

In this section I will just quote the paper, because it explains things much better than I could. The basic idea is to first scan for possible roots of cycles, then check which of them belong to actual garbage cycles:

Our synchronous algorithm is similar to Lins’ algorithm: when reference counts are decremented, we place potential roots of cyclic garbage into a buffer called Roots. Periodically, we process this buffer and look for cycles by subtracting internal reference counts.

There are two major changes that make the algorithm linear time: first of all, we add a buffered flag to every object, which is used to prevent the same object being added to the root buffer more than once per cycle collection. This in turn places a linear bound on the size of the buffer.

Secondly, we analyze the entire transitive closure of Roots as a single graph, rather than as a set of graphs. This means that the complexity of the algorithm is limited by the size of that transitive closure, which in turn is limited by $N + E$ (since Roots is bounded by $N$ by the use of the buffered flag). Of course, in practice we hope that the transitive closure will be significantly smaller.

In practice we found that the first change (the use of the buffered flag) made almost no difference in the running time of the algorithm; however, the second change (analyzing the entire graph at once) made an enormous difference in run-time. When we applied Lins’ algorithm unmodified to large programs, garbage collection delays extended into minutes.
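
To make the phases concrete, here is a minimal sketch of the synchronous collection (the paper's MarkRoots / ScanRoots / CollectRoots phases), written against a toy arena of nodes indexed by usize and reusing the GcHeader and Color sketch above. Real object pointers, interior mutability, and the Green (acyclic) shortcut are left out; this is only to show the shape of the algorithm, not the final implementation:

struct Node {
    header: GcHeader,
    children: Vec<usize>, // outgoing references, by index
    freed: bool,
}

struct Heap {
    nodes: Vec<Node>,
    roots: Vec<usize>, // the Roots buffer of possible cycle roots
}

impl Heap {
    fn collect_cycles(&mut self) {
        self.mark_roots();    // paint candidate subgraphs gray, subtract internal counts
        self.scan_roots();    // re-blacken externally reachable nodes, whiten garbage
        self.collect_roots(); // free everything that stayed white
    }

    fn mark_roots(&mut self) {
        for s in std::mem::take(&mut self.roots) {
            if self.nodes[s].header.color == Color::Purple && self.nodes[s].header.rc > 0 {
                self.mark_gray(s);
                self.roots.push(s);
            } else {
                self.nodes[s].header.buffered = false;
                if self.nodes[s].header.color == Color::Black && self.nodes[s].header.rc == 0 {
                    self.free(s);
                }
            }
        }
    }

    fn mark_gray(&mut self, s: usize) {
        if self.nodes[s].header.color != Color::Gray {
            self.nodes[s].header.color = Color::Gray;
            for t in self.nodes[s].children.clone() {
                self.nodes[t].header.rc -= 1; // subtract the internal edge
                self.mark_gray(t);
            }
        }
    }

    fn scan_roots(&mut self) {
        for s in self.roots.clone() { self.scan(s); }
    }

    fn scan(&mut self, s: usize) {
        if self.nodes[s].header.color == Color::Gray {
            if self.nodes[s].header.rc > 0 {
                self.scan_black(s); // externally reachable: restore counts
            } else {
                self.nodes[s].header.color = Color::White; // member of a garbage cycle
                for t in self.nodes[s].children.clone() { self.scan(t); }
            }
        }
    }

    fn scan_black(&mut self, s: usize) {
        self.nodes[s].header.color = Color::Black;
        for t in self.nodes[s].children.clone() {
            self.nodes[t].header.rc += 1; // restore the internal edge
            if self.nodes[t].header.color != Color::Black { self.scan_black(t); }
        }
    }

    fn collect_roots(&mut self) {
        for s in std::mem::take(&mut self.roots) {
            self.nodes[s].header.buffered = false;
            self.collect_white(s);
        }
    }

    fn collect_white(&mut self, s: usize) {
        if self.nodes[s].header.color == Color::White && !self.nodes[s].header.buffered {
            self.nodes[s].header.color = Color::Black;
            for t in self.nodes[s].children.clone() { self.collect_white(t); }
            self.free(s);
        }
    }

    fn free(&mut self, s: usize) {
        self.nodes[s].freed = true; // stand-in for actual deallocation
    }
}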

The paper also presents a concurrent cycle collector, but that is not the focus here, because the on-the-fly version is better in both performance and ease of implementation.

Efficient On-the-Fly Cycle Collection

The basic idea is to use a "sliding view" method (which is a bit like copy-on-write): a "LogPointer" in each object's header records the object's original reference fields, creating a virtual snapshot of the heap that the cycle collector can trace while mutators keep running.

A crucial problem with repeated scanning arises when concurrent program threads modify the objects graph during the scan. This means that the collector cannot trust a scan to repeat the very same structure that a previous scan has traversed. Furthermore, as modifications occur concurrently with the scan, each specific scan cannot be guaranteed to view a consistent snapshot of the objects graph at any specific point in time. Such problems are the source of the two drawbacks of Bacon and Rajan’s on-the-fly cycle collector: a practical drawback and a theoretical one.
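
As a toy illustration of the sliding-view idea (independent of the actual LogPointer design below, and using made-up names), a mutator-side write barrier can log an object's original outgoing references on the first mutation within a collection cycle, and the collector then traces the logged values instead of the live, changing ones:

struct SnapshotNode {
    children: Vec<usize>,    // live outgoing references, by index
    log: Option<Vec<usize>>, // children as they were when the sliding view began
}

impl SnapshotNode {
    // Mutator-side write barrier: before the first pointer update in a
    // collection cycle, remember the original outgoing references.
    fn set_child(&mut self, slot: usize, new_child: usize) {
        if self.log.is_none() {
            self.log = Some(self.children.clone());
        }
        self.children[slot] = new_child;
    }

    // Collector side: trace the snapshot if one was taken, otherwise the
    // current children (which then have not changed since the snapshot).
    fn snapshot_children(&self) -> &[usize] {
        self.log.as_deref().unwrap_or(&self.children)
    }
}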

Data Structures

We can reuse the object-layout work in #2908 and add two new Cargo features, "gc" and "on_the_fly_cc", where "on_the_fly_cc" depends on "gc":

#![cfg(feature = "gc")]

struct PyGcObject {
    header: PyGcHeader,
    obj: PyObject, // or directly use PyInner<Erased>
}

struct PyGcHeader {
    ref_cnt: RefCount,
    color: Color,
    #[cfg(feature = "on_the_fly_cc")]
    log_ptr: Option<LogPointer>, // dirty bit and snapshot pointer for the on-the-fly collector
}

We also add a GcTrace trait that should be implemented on every type requiring GC.
Edit: in #4180 a field is added to PyInner instead, which is clearer and easier to implement.

pub trait GcTrace {
    /// Call `tracer_fn` for every GC object owned by this object.
    /// # API Contract
    /// Implementations must make sure `tracer_fn` is called on every owned object,
    /// or garbage collection won't work correctly.
    fn trace(&self, tracer_fn: &mut TracerFn);
}

/// A `TracerFn` is a callback function that is invoked for each `PyGcObjectRef` owned
/// by an instance of a traced type.
pub type TracerFn<'a> = dyn FnMut(&mut PyGcObjectRef) + 'a;
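
For illustration, an implementation on a hypothetical container payload could look like the sketch below. MyListPayload and its Mutex-guarded element vector are made up for the example; only GcTrace, TracerFn, and PyGcObjectRef come from the RFC:

use std::sync::Mutex;

// Hypothetical container payload, purely for illustration.
struct MyListPayload {
    elements: Mutex<Vec<PyGcObjectRef>>,
}

impl GcTrace for MyListPayload {
    fn trace(&self, tracer_fn: &mut TracerFn) {
        // Per the API contract, every owned PyGcObjectRef must be handed to
        // tracer_fn, otherwise the collector computes wrong internal counts.
        for element in self.elements.lock().unwrap().iter_mut() {
            tracer_fn(element);
        }
    }
}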

Drawbacks, Rationale, and Alternatives

Alternative: implement the CPython-style GC proposed in #2908.
