Skip to content

MAINT Adapt scheduling of PairwiseDistancesRadiusNeighborhood #22829

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Mar 14, 2022

Reference Issues/PRs

Follow-up of #22320

What does this implement/fix? Explain your changes.

Test various configuration of scheduling for OpenMP and pick the best one using this script with mimalloc.

Results

main

6eb25ca8db

Raw results
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1   100000  100000          50     49.852703               0
1           2   100000  100000          50     25.524960               0
2           4   100000  100000          50     13.016662               0
3           8   100000  100000          50      6.593470               0
4          16   100000  100000          50      3.460975               0
5          32   100000  100000          50      1.926909               0
6          64   100000  100000          50      1.190385               0
7         128   100000  100000          50      3.591503               0
8           1   100000  100000         100     81.334735               0
9           2   100000  100000         100     42.853891               0
10          4   100000  100000         100     21.615465               0
11          8   100000  100000         100     10.993820               0
12         16   100000  100000         100      5.942740               0
13         32   100000  100000         100      3.217021               0
14         64   100000  100000         100      2.211893               0
15        128   100000  100000         100      4.553800               0
16          1   100000  100000         500    266.815465               0
17          2   100000  100000         500    136.234599               0
18          4   100000  100000         500     68.475497               0
19          8   100000  100000         500     34.828070               0
20         16   100000  100000         500     18.272350               0
21         32   100000  100000         500     10.241434               0
22         64   100000  100000         500      6.827677               0
23        128   100000  100000         500     12.496416               0
bce219e

bce219e

Raw results
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1   100000  100000          50     49.647817               0
1           2   100000  100000          50     25.303432               0
2           4   100000  100000          50     13.012075               0
3           8   100000  100000          50      6.557708               0
4          16   100000  100000          50      3.449857               0
5          32   100000  100000          50      1.995586               0
6          64   100000  100000          50      1.379598               0
7         128   100000  100000          50      3.237672               0
8           1   100000  100000         100     81.184791               0
9           2   100000  100000         100     43.860247               0
10          4   100000  100000         100     21.843178               0
11          8   100000  100000         100     10.951101               0
12         16   100000  100000         100      5.777386               0
13         32   100000  100000         100      3.126215               0
14         64   100000  100000         100      2.094214               0
15        128   100000  100000         100      4.396689               0
16          1   100000  100000         500    266.283220               0
17          2   100000  100000         500    136.463786               0
18          4   100000  100000         500     68.596914               0
19          8   100000  100000         500     34.902744               0
20         16   100000  100000         500     18.462613               0
21         32   100000  100000         500     10.252221               0
22         64   100000  100000         500      7.240885               0
23        128   100000  100000         500     12.582132               0

@jeremiedbb
Copy link
Member

jeremiedbb commented Mar 14, 2022

Your dataset is generated uniformly. I'd expect that in that case all tasks should be roughly the same and 'static' being overprforming. Could you run the same benchmark with a dataset with non-uniform density, i.e. regions with high density and regions with low density ?

@ogrisel
Copy link
Member

ogrisel commented Mar 14, 2022

make_blobs should work for this.

@jjerphan
Copy link
Member Author

Using bigger dataset ((n_train_samples, n_test_samples) = (1_000_000, 10_000)) generated with make_blobs using this new script show that limited difference between the two commits.

Results

main

6eb25ca8db

Raw results
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1  1000000  100000          50    465.744674               0
1           2  1000000  100000          50    238.000873               0
2           4  1000000  100000          50    122.582661               0
3           8  1000000  100000          50     62.315836               0
4          16  1000000  100000          50     33.552833               0
5          32  1000000  100000          50     18.391456               0
6          64  1000000  100000          50     11.659778               0
7         128  1000000  100000          50     15.090411               0
8           1  1000000  100000         100    683.785338               0
9           2  1000000  100000         100    356.785669               0
10          4  1000000  100000         100    180.691720               0
11          8  1000000  100000         100     91.533770               0
12         16  1000000  100000         100     47.840042               0
13         32  1000000  100000         100     25.850661               0
14         64  1000000  100000         100     16.302002               0
15        128  1000000  100000         100     21.486514               0
16          1  1000000  100000         500   2638.878539               0
17          2  1000000  100000         500   1345.797336               0
18          4  1000000  100000         500    676.665076               0
19          8  1000000  100000         500    343.195362               0
20         16  1000000  100000         500    180.101859               0
21         32  1000000  100000         500     99.880551               0
22         64  1000000  100000         500     65.878373               0
23        128  1000000  100000         500     76.451764               0
bce219e

bce219e

Raw results
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1  1000000  100000          50    426.701481               0
1           2  1000000  100000          50    217.417825               0
2           4  1000000  100000          50    110.665541               0
3           8  1000000  100000          50     56.944343               0
4          16  1000000  100000          50     29.309760               0
5          32  1000000  100000          50     16.003685               0
6          64  1000000  100000          50     10.157650               0
7         128  1000000  100000          50     13.362762               0
8           1  1000000  100000         100    756.731398               0
9           2  1000000  100000         100    403.638701               0
10          4  1000000  100000         100    197.599104               0
11          8  1000000  100000         100    101.587471               0
12         16  1000000  100000         100     53.660125               0
13         32  1000000  100000         100     28.843284               0
14         64  1000000  100000         100     18.140341               0
15        128  1000000  100000         100     22.790443               0
16          1  1000000  100000         500   2723.716804               0
17          2  1000000  100000         500   1378.848296               0
18          4  1000000  100000         500    695.409763               0
19          8  1000000  100000         500    354.146897               0
20         16  1000000  100000         500    186.446191               0
21         32  1000000  100000         500    104.105621               0
22         64  1000000  100000         500     65.748025               0
23        128  1000000  100000         500     71.299621               0

Since it scales similarly to PairwiseDistancesArgKmin, I think it's fair to use static scheduling every time. WDYT?

@ogrisel
Copy link
Member

ogrisel commented Apr 1, 2022

So no significant difference even for non-uniform distances.

+1 for using static to make the execution more deterministic between machine with different number of threads by avoid non-commutative rounding errors in the reduction operations.

@jjerphan jjerphan marked this pull request as ready for review April 1, 2022 14:42
@jeremiedbb
Copy link
Member

Also the benchmarks are a little bit in favor of "static" so I'm +1 as well

@jeremiedbb jeremiedbb merged commit 6d17cd2 into scikit-learn:main Apr 1, 2022
@jjerphan jjerphan deleted the pdr/radius-neighborhood-improve-scheduling branch April 1, 2022 15:37
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants