Skip to content

Object representation #371

Open
Open
@JohnBSmith

Description

@JohnBSmith

Your representation of objects is of particular bad design.

Let's examine how much memory we need to store an integer on the stack frame or in a vector. You need to store PyObjectRef, that's Rc<RefCell<PyObject>>, that's one pointer (64 bit), one strong reference counter (64 bit), one weak reference counter (64 bit), one borrow counter (64 bit). The strong and weak reference counters and the borrow counter are to be initialized each time a pointer is created.

The type PyObject consists of payload: PyObjectPayload and typ: Option<PyObjectRef>. That's again a pointer (64 bit), the Option::None is stored as a pointer to NULL, as PyObjectRef is non-NULL. The type PyObjectPayload is a large enum, 8 up to 64 bit for a tag, and the variant Integer stores BigInt. BigInt consists of sign: Sign (8 up to 64 bit) and data: BigUint (Vec<BigDigit>) (64 bit pointer, 64 bit capacity, 64 bit size). A BigDigit is an u32.

That's 70 Byte up to 84 Byte for each integer. There are three(!) pointer indirections and eleven(!) memory slots to be initialized. Fragmentation of the allocatator will result in catastrophically bad cache locality. At object destruction, the PyObjectPayload enum is branched (hopefully by a branchtable), but as the number of variants is large, it will be a subroutine call (otherwise the branchtable is shadowed by another branch).

Now consider the following object representation:

trait RefTrait {}
type DynRef = dyn RefTrait;
// You can create your own vtables instead of using dyn,
// given there is an unsolveable issue with dyn.

enum PyObject {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
    Complex(Complex64), // (f64,f64)
    Ref(Rc<RefCell<DynRef>>)
}

The tag stores 8 up to 64 bit. The maximal variants are (f64,f64) and (Rc,VTablePointer), that's two times 64 bit. This results in 17 Byte up to 24 Byte. Only two memory slots need to be initialized. No memory allocations. Best cache locality.

At object destruction, only one branch is needed to determine whether the polymorphic destructor of variant Ref is to be called or not. This branch should be inlined.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-designAbout RustPython's own implementation

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions