
Refactor benchmarking suite to use Criterion and add microbenchmarks #2367


Merged · 16 commits · Dec 14, 2020

Conversation

@orf (Contributor) commented Dec 11, 2020

This PR refactors the benchmark suite to use Criterion (https://docs.rs/criterion/0.3.3/criterion/).

I've completely rewritten the benchmarks, which I don't think were running correctly before. Now every file inside benchmarks/ is both parsed and executed by rustpython and cpython, and the results are compared. Adding a new benchmark is as simple as adding a file to that directory, and any exceptions raised while executing the code are displayed on the console rather than just causing a panic.

I've also added an extra benchmark that runs pystone three times, each with a different step count, under both cpython and rustpython and compares the results.
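To make that layout concrete, here's a rough sketch of what a per-file comparison group could look like with Criterion. The `run_with_cpython` and `run_with_rustpython` helpers are placeholders standing in for "execute this source under the respective interpreter"; they are not the actual functions from this PR.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::fs;

// Placeholder helpers: in the real suite these execute `source` under
// CPython and RustPython respectively.
fn run_with_cpython(source: &str) {
    let _ = source.len();
}

fn run_with_rustpython(source: &str) {
    let _ = source.len();
}

fn benchmark_files(c: &mut Criterion) {
    // Every file dropped into benchmarks/ becomes its own comparison group.
    for entry in fs::read_dir("benchmarks").unwrap() {
        let path = entry.unwrap().path();
        let name = path.file_name().unwrap().to_string_lossy().into_owned();
        let source = fs::read_to_string(&path).unwrap();

        let mut group = c.benchmark_group(name);
        group.bench_function("cpython", |b| b.iter(|| run_with_cpython(&source)));
        group.bench_function("rustpython", |b| b.iter(|| run_with_rustpython(&source)));
        group.finish();
    }
}

criterion_group!(benches, benchmark_files);
criterion_main!(benches);
```

Because both measurements live in the same group, Criterion's report shows the cpython and rustpython timings for each file side by side.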

I added the ability to run microbenchmarks quickly and easily. For example, to benchmark context-manager exceptions you could add:

```python
from contextlib import contextmanager

@contextmanager
def try_catch(*args, **kwargs):
    try:
        yield
    except RuntimeError:
        pass

# ---

with try_catch():
    raise RuntimeError()
```

Only the part after the `# ---` marker will be benchmarked. Here are the results:

[Screenshot: microbenchmark results, 2020-12-13]
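For reference, here's a minimal sketch of how the `# ---` split could be handled in the harness. The `run_python` helper is again a placeholder for executing Python source, and the real code in this PR may do this differently.

```rust
use criterion::Criterion;
use std::fs;

// Placeholder: stands in for executing `source`; in a real harness the setup
// and the timed part would share interpreter state.
fn run_python(source: &str) {
    let _ = source.len();
}

fn bench_microbenchmark(c: &mut Criterion, path: &str) {
    let file = fs::read_to_string(path).unwrap();

    // Everything before the "# ---" marker is setup; only what follows it is timed.
    let (setup, timed) = match file.split_once("# ---") {
        Some((setup, timed)) => (setup.to_string(), timed.to_string()),
        None => (String::new(), file.clone()),
    };

    run_python(&setup); // run the setup code outside the measurement
    c.bench_function(path, |b| b.iter(|| run_python(&timed)));
}
```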

We could then also use this GitHub Action to run them as part of the CI steps on pull requests.

@orf (Contributor, Author) commented Dec 11, 2020

Pystone comparison:

[Screenshot: pystone comparison]

Parsing:

[Screenshot: parsing benchmark results]

File comparisons:

[Screenshots: per-file benchmark comparisons]

And the full report:

criterion.zip

@orf changed the title from "Refactor benchmarking suite to use Criterion" to "Refactor benchmarking suite to use Criterion and add microbenchmarks" on Dec 13, 2020
@coolreader18 (Member) left a comment

Wow, this is really great!! Definitely much cleaner than our old setup. I pushed a commit to still allow running pystone outside the context of the benchmarker; otherwise, I think this is good to go!

@coolreader18 (Member) commented

The main thing I use benchmarks for is determining whether a change I make is actually faster, and by how much. Usually, I just `cargo build --release` on master and move it to `benches/rustpython-norm`, then I do the same on my feature branch and move it to `benches/rustpython-opt`, and then I run something along the lines of `hyperfine -L X norm,opt './rustpython-{X} benchmarks/pystone.py' -w2`. Do you know if criterion has anything for that, benchmarking 2 versions of the program against each other?

@orf (Contributor, Author) commented Dec 13, 2020

Thank you 😊! I wasn't really sure what to add to the micro-benchmarks, so I kind of just guessed, but I figured we can easily add more in the future.

The benchmarks are not perfect right now - the cpython side includes parsing overhead while the rustpython side does not. If you're happy with that, we could merge this and work on fixing it later; I don't think the overhead is that significant.

@orf (Contributor, Author) commented Dec 13, 2020

> The main thing I use benchmarks for is determining whether a change I make is actually faster, and by how much. Usually, I just `cargo build --release` on master and move it to `benches/rustpython-norm`, then I do the same on my feature branch and move it to `benches/rustpython-opt`, and then I run something along the lines of `hyperfine -L X norm,opt './rustpython-{X} benchmarks/pystone.py' -w2`. Do you know if criterion has anything for that, benchmarking 2 versions of the program against each other?

It works similarly to the standard bench library. You'd run the benchmarks on master, then again on your branch, and it would tell you the changes in throughput or latency between the two. Any subsequent changes you make would also be benchmarked.

Honestly, I'm not sure how it would work with your specific workflow, but I do think it's more extensible and provides a more intuitive way of exploring results than the standard bench library.

Plus it doesn't need nightly, so... 🎉🎉

@coolreader18 (Member) commented

Yeah, it definitely sucks that there's no easy API in the cpython crate to compile and then run a code object. Maybe we could pass it as a local variable to a script that runs exec(code), or maybe we could just use the lower-level APIs from cpython.
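A minimal sketch of that first idea with the `cpython` crate, assuming the benchmark source is available as a string (the names here are illustrative, not code from this PR):

```rust
use cpython::{PyDict, Python};

fn time_exec_only(source: &str) {
    let gil = Python::acquire_gil();
    let py = gil.python();

    let locals = PyDict::new(py);
    locals.set_item(py, "src", source).unwrap();

    // Compile once, outside the timed region, using Python's builtin compile().
    py.run("code = compile(src, '<benchmark>', 'exec')", None, Some(&locals))
        .unwrap();

    // Inside a Criterion iteration this would be the only timed call.
    py.run("exec(code)", None, Some(&locals)).unwrap();
}
```

Keeping the compile step out of the timed loop would also remove the parsing-overhead asymmetry mentioned above.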

Yeah, I think criterion is definitely better than the nightly benching APIs, but I wasn't able to use those for this workflow and I probably won't be able to use criterion for it either. Although, maybe I could compile with `cargo bench --no-run` and then compare those binaries' outputs.

@coolreader18 merged commit c9a332c into RustPython:master on Dec 14, 2020
@coolreader18 (Member) commented

Thanks for contributing!

@mireille-raad (Member) commented

If you want, I can create a web page on rustpython.github.io called "benchmarks".

If you can automate the benchmark to run and spit out some images into a directory in https://github.com/RustPython/rustpython.github.io, then it would be trivial to show them on a web page.

Is this possible/neat, or unnecessary (and demotivating)?

@youknowone (Member) commented

@mireille-raad I think that would be a fun feature, so one upvote. But it can also be demotivating :p

@coolreader18 (Member) commented

I also think that would be a good idea; it would be a good way of tracking the performance improvements we make over time. In retrospect, it would've been nice to collect data on the perf changes I've been making over the last couple of months, but hindsight is 20/20 🙃

@mireille-raad (Member) commented

Well, let's do this then :) I will start a new issue :D
