
BENCH: Add a benchmark comparing block to copy in the 3D case #11965


Merged: 1 commit merged into numpy:master from the bench_concatenate branch on Oct 19, 2018

Conversation

hmaarrfk
Contributor

@hmaarrfk commented Sep 16, 2018

Enhances the existing block3D benchmark to show the performance difference between block and a direct memory copy of each of the individual blocks.

The way the test is written, a secondary series on the same graph plots copy alongside block, allowing us to easily compare the two.

This should help motivate the design of an improved blocking algorithm for the case where users are blocking 3D cubes.

=====  ============  =============
  n        block          copy
=====  ============  =============
    1  39.9±0.1μs    3.44±0.02μs
   10  209±200μs     38.6±0.6μs
  100  1.06±0.03s    390±6ms
=====  ============  =============
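
A minimal sketch of how such a parameterized asv benchmark could be structured; the class name, attribute names (arr_list, block), and array shapes here are illustrative assumptions rather than the exact code in this PR:

import numpy as np


class Block3D:
    # 'block' and 'copy' are values of a single 'mode' parameter, so asv
    # plots them as two series on the same graph instead of as two
    # separate benchmarks.
    params = [[1, 10, 100], ['block', 'copy']]
    param_names = ['n', 'mode']

    def setup(self, n, mode):
        # Eight n**3 sub-cubes arranged in the 2x2x2 nested-list layout
        # that np.block expects for the 3D case.
        self.arr_list = [np.full((n, n, n), i, dtype=np.int64)
                         for i in range(8)]
        a = self.arr_list
        self.block = [[[a[0], a[1]], [a[2], a[3]]],
                      [[a[4], a[5]], [a[6], a[7]]]]

    def time_3d(self, n, mode):
        if mode == 'block':
            np.block(self.block)
        else:
            # Copying every sub-array once is roughly the minimum memory
            # traffic any blocking implementation has to perform.
            [arr.copy() for arr in self.arr_list]

Because asv keys results on the benchmark name plus its parameters, both modes end up as lines on a single plot, which is what makes a comparison like the table above easy to read.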

@eric-wieser
Member

There are already benchmarks for block in another file; can you extend those instead of making a new suite?

@hmaarrfk
Contributor Author

Sorry, I'm really bad at navigating this codebase. Thanks!

@eric-wieser
Member

I'd recommend that before adding a new benchmark / test, you search the existing benchmarks / tests folders for places that already call the function you want to profile/test.

@hmaarrfk
Contributor Author

I had found the ones for masked arrays, but not the ones for dense arrays. :/

if mode == 'block':
    np.block(self.block)
else:
    np.concatenate(self.arr_list)
Member

What's the goal here of testing concatenate within an np.block test?

Member

And why use a mode parameter rather than a separate benchmark function? Does this give a more useful asv result if the asv name is the same?

Some comments answering those questions in the code might be a good thing

Contributor Author

This is mostly so that we can directly compare the performance of block and concatenate on the same graph; asv plots different parameter values of the same benchmark together. That way, when creating the new algorithm, we should be able to stop optimizing once we are close to the performance of concatenate.

Member

Sounds like a good reason to me

# allows us to directly compare the benchmark of block
# to that of `concatenate` with the ASV framework.
# block and concatenate will be plotted on the same graph
# as opposed to being displayed as seperate benchmarks
Member

typo: separate

@hmaarrfk
Contributor Author

Thanks for having me extend this benchmark, @eric-wieser. I think we can essentially use this benchmark's algorithm with a reshape to get block done in one shot.

@eric-wieser
Member

I think we can essentially use this benchmark's algorithm with a reshape

I find it very unlikely that that would work. You're right that we could do it in one allocation, but it will require some careful juggling of slices.
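
For illustration only, a rough sketch of the one-allocation idea for the simplest 2x2x2 case with equal-shaped cubes; the real np.block has to handle sub-arrays of unequal shapes, which is where the careful juggling of slices comes in:

import numpy as np


def block_2x2x2(blocks):
    # blocks[i][j][k] is an (n, n, n) array; the result is (2n, 2n, 2n).
    n = blocks[0][0][0].shape[0]
    out = np.empty((2 * n,) * 3, dtype=blocks[0][0][0].dtype)
    for i in range(2):
        for j in range(2):
            for k in range(2):
                out[i * n:(i + 1) * n,
                    j * n:(j + 1) * n,
                    k * n:(k + 1) * n] = blocks[i][j][k]
    return out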

@eric-wieser
Member

@pv: Is there any easy way to make this continue the existing benchmark graphs, rather than resulting in a benchmark with a new name?

@pv
Member

pv commented Sep 16, 2018

I don't think there is.

@hmaarrfk
Contributor Author

Apparently there is a way to build history for a new benchmark, but I'm not too sure how.

@charris changed the title from "Add a benchmark for concatenate and block" to "ENH: Add a benchmarks for concatenate and block" on Sep 17, 2018
@hmaarrfk
Contributor Author

@eric-wieser I would almost like to change this from calling concatenate to calling arr.copy() on all the arrays provided.

The goal is to show that blocking is a memory bandwidth limited algorithm.

Anyway, PR #11971 gets pretty close to the copy limit.

===== ============= ========== ========== ========== ==========
--                                     dtype                   
------------------- -------------------------------------------
n        mode       uint8      uint16     uint32     uint64  
===== ============= ========== ========== ========== ==========
1       block      180±0μs    172±0μs    183±0μs    171±0μs  
1    concatenate   47.7±0μs   50.1±0μs   47.2±0μs   45.9±0μs 
1        copy      30.4±0μs   34.6±0μs   31.5±0μs   36.1±0μs 

10      block      253±0μs    277±0μs    390±0μs    582±0μs  
10   concatenate   88.9±0μs   128±0μs    217±0μs    404±0μs  
10       copy      65.7±0μs   96.0±0μs   150±0μs    287±0μs  

50      block      6.77±0ms   14.6±0ms   27.6±0ms   52.3±0ms 
50   concatenate   6.31±0ms   12.7±0ms   26.8±0ms   53.7±0ms 
50       copy      6.50±0ms   13.2±0ms   27.1±0ms   54.6±0ms 

75      block      23.5±0ms   45.4±0ms   91.1±0ms   181±0ms  
75   concatenate   22.8±0ms   46.5±0ms   91.8±0ms   181±0ms  
75       copy      29.7±0ms   52.2±0ms   95.7±0ms   188±0ms  

100      block      57.8±0ms   121±0ms    208±0ms    394±0ms  
100   concatenate   55.8±0ms   110±0ms    236±0ms    420±0ms  
100       copy      63.2±0ms   115±0ms    212±0ms    426±0ms  

150      block      181±0ms    352±0ms    700±0ms    1.44±0s  
150   concatenate   181±0ms    350±0ms    707±0ms    1.41±0s  
150       copy      183±0ms    355±0ms    706±0ms    1.47±0s  

200      block      435±0ms    851±0ms    1.62±0s    3.45±0s  
200   concatenate   439±0ms    881±0ms    1.73±0s    3.50±0s  
200       copy      442±0ms    890±0ms    1.75±0s    3.54±0s  
===== ============= ========== ========== ========== ==========
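
Outside of asv, the same block-versus-copy comparison can be reproduced with a quick standalone timing; a sketch with arbitrarily chosen sizes, unrelated to the numbers above:

import timeit

import numpy as np

n = 100
a = [np.zeros((n, n, n), dtype=np.uint64) for _ in range(8)]
block = [[[a[0], a[1]], [a[2], a[3]]],
         [[a[4], a[5]], [a[6], a[7]]]]

# Copying each sub-cube once is the memory-bandwidth floor: every element
# has to be written to the output at least once, however block is implemented.
t_copy = timeit.timeit(lambda: [x.copy() for x in a], number=10) / 10
t_block = timeit.timeit(lambda: np.block(block), number=10) / 10
print(f'copy : {t_copy:.4f} s per call')
print(f'block: {t_block:.4f} s per call')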

@mattip
Member

mattip commented Sep 21, 2018

Is this ready to be merged?

@hmaarrfk changed the title from "ENH: Add a benchmarks for concatenate and block" to "ENH: Add a benchmark comparing block to copy in the 3D case" on Sep 21, 2018
@eric-wieser
Member

@mattip: My worry here is that we break continuity in our benchmark history by doing this, something we deliberately considered when we introduced the Block3D test.

@hmaarrfk
Contributor Author

@eric-wieser why is there going to be a new name for the benchmark? Do additional parameters trigger a new name?

@eric-wieser
Member

@hmaarrfk: Yes, I believe they do

@hmaarrfk
Contributor Author

:(

@eric-wieser
Member

An easy workaround would be to just expose the test twice, once under the old name and once under the new name. Failing that, we might want to look into patching asv to allow a params -> name function to be provided.
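
One way the first workaround might look, assuming asv identifies a benchmark by its method name; the method names below are hypothetical, not what this PR ended up doing:

import numpy as np


class Block:
    def setup(self):
        a = np.ones((3, 3, 3))
        self.block = [[[a, a], [a, a]],
                      [[a, a], [a, a]]]

    def _run(self):
        np.block(self.block)

    # The same body exposed under both the historical name and the new one:
    # the old graph keeps accumulating points while history also builds up
    # under the new benchmark.
    def time_3d(self):          # hypothetical old name
        self._run()

    def time_block_copy(self):  # hypothetical new name
        self._run()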

@hmaarrfk
Contributor Author

What happens when we add a new benchmark? Is history created automatically, or will it start from "this new commit"?

@hmaarrfk
Contributor Author

@eric-wieser, if we don't want to create a new benchmark, we would need to run the command:

asv run -E conda:3.6 -b Block.time_3d EXISTING 

to rebuild the history for all the commits that currently have benchmarks. I'm not sure if that is a possibility.

@charris changed the title from "ENH: Add a benchmark comparing block to copy in the 3D case" to "BENCH: Add a benchmark comparing block to copy in the 3D case" on Oct 10, 2018
@mattip force-pushed the bench_concatenate branch from c2086e7 to f37b0c6 on October 19, 2018 08:12
@mattip
Member

mattip commented Oct 19, 2018

rebased off master

@mattip
Member

mattip commented Oct 19, 2018

As discussed at the Oct 17 weekly status meeting, the general opinion was that breaking continuity is not a PR blocker. It is relatively easy to rerun newer benchmarks on older code to determine a baseline and compare across the gap, especially since this PR only touches the benchmark itself.

@mattip merged commit 8d7b7b5 into numpy:master on Oct 19, 2018
@mattip
Member

mattip commented Oct 19, 2018

Thanks @hmaarrfk

@hmaarrfk deleted the bench_concatenate branch on November 5, 2018 03:20