Skip to content

Use an array instead of a dict for local variables #2355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Dec 5, 2020

Conversation

coolreader18
Copy link
Member

@coolreader18 coolreader18 commented Dec 2, 2020

Benchmark #1: ./rustpython-norm benchmarks/pystone.py 50000
  Time (mean ± σ):      4.727 s ±  0.266 s    [User: 4.686 s, System: 0.015 s]
  Range (min … max):    4.416 s …  5.126 s    10 runs

Benchmark #2: ./rustpython-opt benchmarks/pystone.py 50000
  Time (mean ± σ):      3.875 s ±  0.211 s    [User: 3.847 s, System: 0.007 s]
  Range (min … max):    3.601 s …  4.261 s    10 runs

Summary
  './rustpython-opt benchmarks/pystone.py 50000' ran
    1.22 ± 0.10 times faster than './rustpython-norm benchmarks/pystone.py 50000'

Didn't get as much of a speedup as I was hoping for, but this should improve memory usage. Specifically, I was running into a problem where closures defined in a function would never get deallocated so it would really slow down after like 50 iterations. I think this should fix that, since now it's just a cell. It entire scope being kept around, including the function itself (locals()['bar'].outer_locals is locals()), making a reference cycle that would never dealloc.

def foo(big_object):
    def bar():
        use(big_object)
    # before, just because of bar, big_object would never be deallocated

for i in range(1000):
    inp = "x" * 10000
    foo(inp)

I wasn't able to get it to cause a slowdown on my machine with the master branch version of the binary, but the real code that this was happening with runs in wasm, so I assume that has something to do with it.

Edit: Oh, duh, here:

def foo(arg):
    def bar():
        pass

class Foo:
    def __del__(self):
        print("deleting")

for i in range(1000):
    foo(Foo())
$ benchmarks/rustpython-norm testt.py 
$ benchmarks/rustpython-opt testt.py 
deleting
deleting
deleting
deleting
deleting
deleting
deleting
deleting
deleting
<snip>

@@ -668,17 +669,24 @@ impl<C: Constant> CodeObject<C> {
}

pub fn map_bag<Bag: ConstantBag>(self, bag: &Bag) -> CodeObject<Bag::Constant> {
let map_names = |names: Box<[C::Name]>| {
names
.into_vec()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}
impl<T> StackStack<T> {
pub fn append<F, R>(&mut self, x: &mut T, f: F) -> R
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function itself seems like not appending something but the given f do that, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't totally sure on the naming, cause this is kinda a weird data structure. It calls f() in the context of x being appended to the stack; f can append more once inside that context but append is the one that actually calls push and pop. 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to call this but at lease append gives wrong sense of the role of the method. Giving something weird or long name might be better than append.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it to with_append and added a doc comment.

@@ -275,6 +284,20 @@ impl<T: Clone> Dict<T> {
Ok(ret)
}

pub fn get2<K: DictKey>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get2 doesn't look like the best name for this.
This is a sort of chained get and only used by frame.rs. I think these things implies this is not an expected feature as a method.
How about exposing _get_inner as get_raw(or somewhat common term in hash map libraries) as its low-level api and making a helper funciton in frame.rs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it get_chain; I could make it pub(crate) if you want, but I think it's best to keep the dict implementation in the dict implementation- builtins::dict already has all the logic regarding exact instance vs subclass checking, so I think it makes sense to keep it there rather than move it to frame.

let scope = if self.code.flags.contains(bytecode::CodeFlags::NEW_LOCALS) {
scope.new_child_scope(&vm.ctx)
let locals = if self.code.flags.contains(bytecode::CodeFlags::NEW_LOCALS) {
vm.ctx.new_dict()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if locals is given but the flag is NEW_LOCALS? locals looks like not used in this case.
Is this possible scenario?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cpython code for this basically ignores the passed locals if that happens:

    /* Most functions have CO_NEWLOCALS and CO_OPTIMIZED set. */
    if ((code->co_flags & (CO_NEWLOCALS | CO_OPTIMIZED)) ==
        (CO_NEWLOCALS | CO_OPTIMIZED))
        ; /* f_locals = NULL; will be set by PyFrame_FastToLocals() */
    else if (code->co_flags & CO_NEWLOCALS) {
        locals = PyDict_New();
        if (locals == NULL) {
            Py_DECREF(f);
            return NULL;
        }
        f->f_locals = locals;
    }
    else {
        if (locals == NULL)
            locals = globals;
        Py_INCREF(locals);
        f->f_locals = locals;
    }

Comment on lines +294 to +298
if let Some(x) = self._get_inner(vm, key, hash)? {
Ok(Some(x))
} else {
other._get_inner(vm, key, hash)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let Some(x) = self._get_inner(vm, key, hash)? {
Ok(Some(x))
} else {
other._get_inner(vm, key, hash)
}
self._get_inner(vm, key, hash).or_else(|| other._get_inner(vm, key, hash))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_inner returns Result<Option>, I think if we wanted to use combinators it would be a bunch of transpose() calls so I figured this would be the cleanest way to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yep. I missed Result. But not sure about transpose() because Result is unwrappable.

How about this?

Ok(self._get_inner(vm, key, hash)?.or_else(|| other._get_inner(vm, key, hash)?))

Copy link
Member Author

@coolreader18 coolreader18 Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ? inside the or_else wouldn't return from the outer function, it would return from the closure

@coolreader18 coolreader18 force-pushed the coolreader18/funcvars branch from ed81105 to 217f2c3 Compare December 4, 2020 20:54
@coolreader18 coolreader18 force-pushed the coolreader18/funcvars branch from 217f2c3 to 04d50da Compare December 5, 2020 22:32
@coolreader18 coolreader18 force-pushed the coolreader18/funcvars branch from 04d50da to 34bb5f0 Compare December 5, 2020 22:37
@coolreader18 coolreader18 merged commit 51c2ec5 into master Dec 5, 2020
@coolreader18 coolreader18 deleted the coolreader18/funcvars branch December 5, 2020 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants