MAINT Adapt scheduling of `PairwiseDistancesRadiusNeighborhood` #22829

jjerphan · 2022-03-14T11:01:44Z

Reference Issues/PRs

Follow-up of #22320

What does this implement/fix? Explain your changes.

Test various configuration of scheduling for OpenMP and pick the best one using this script with mimalloc.

Results

main

Raw results

    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1   100000  100000          50     49.852703               0
1           2   100000  100000          50     25.524960               0
2           4   100000  100000          50     13.016662               0
3           8   100000  100000          50      6.593470               0
4          16   100000  100000          50      3.460975               0
5          32   100000  100000          50      1.926909               0
6          64   100000  100000          50      1.190385               0
7         128   100000  100000          50      3.591503               0
8           1   100000  100000         100     81.334735               0
9           2   100000  100000         100     42.853891               0
10          4   100000  100000         100     21.615465               0
11          8   100000  100000         100     10.993820               0
12         16   100000  100000         100      5.942740               0
13         32   100000  100000         100      3.217021               0
14         64   100000  100000         100      2.211893               0
15        128   100000  100000         100      4.553800               0
16          1   100000  100000         500    266.815465               0
17          2   100000  100000         500    136.234599               0
18          4   100000  100000         500     68.475497               0
19          8   100000  100000         500     34.828070               0
20         16   100000  100000         500     18.272350               0
21         32   100000  100000         500     10.241434               0
22         64   100000  100000         500      6.827677               0
23        128   100000  100000         500     12.496416               0

`bce219e`

Raw results

    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1   100000  100000          50     49.647817               0
1           2   100000  100000          50     25.303432               0
2           4   100000  100000          50     13.012075               0
3           8   100000  100000          50      6.557708               0
4          16   100000  100000          50      3.449857               0
5          32   100000  100000          50      1.995586               0
6          64   100000  100000          50      1.379598               0
7         128   100000  100000          50      3.237672               0
8           1   100000  100000         100     81.184791               0
9           2   100000  100000         100     43.860247               0
10          4   100000  100000         100     21.843178               0
11          8   100000  100000         100     10.951101               0
12         16   100000  100000         100      5.777386               0
13         32   100000  100000         100      3.126215               0
14         64   100000  100000         100      2.094214               0
15        128   100000  100000         100      4.396689               0
16          1   100000  100000         500    266.283220               0
17          2   100000  100000         500    136.463786               0
18          4   100000  100000         500     68.596914               0
19          8   100000  100000         500     34.902744               0
20         16   100000  100000         500     18.462613               0
21         32   100000  100000         500     10.252221               0
22         64   100000  100000         500      7.240885               0
23        128   100000  100000         500     12.582132               0

jeremiedbb · 2022-03-14T13:25:50Z

Your dataset is generated uniformly. I'd expect that in that case all tasks should be roughly the same and 'static' being overprforming. Could you run the same benchmark with a dataset with non-uniform density, i.e. regions with high density and regions with low density ?

ogrisel · 2022-03-14T14:03:34Z

make_blobs should work for this.

jjerphan · 2022-03-18T16:17:54Z

Using bigger dataset ((n_train_samples, n_test_samples) = (1_000_000, 10_000)) generated with make_blobs using this new script show that limited difference between the two commits.

Results

main

Raw results

    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1  1000000  100000          50    465.744674               0
1           2  1000000  100000          50    238.000873               0
2           4  1000000  100000          50    122.582661               0
3           8  1000000  100000          50     62.315836               0
4          16  1000000  100000          50     33.552833               0
5          32  1000000  100000          50     18.391456               0
6          64  1000000  100000          50     11.659778               0
7         128  1000000  100000          50     15.090411               0
8           1  1000000  100000         100    683.785338               0
9           2  1000000  100000         100    356.785669               0
10          4  1000000  100000         100    180.691720               0
11          8  1000000  100000         100     91.533770               0
12         16  1000000  100000         100     47.840042               0
13         32  1000000  100000         100     25.850661               0
14         64  1000000  100000         100     16.302002               0
15        128  1000000  100000         100     21.486514               0
16          1  1000000  100000         500   2638.878539               0
17          2  1000000  100000         500   1345.797336               0
18          4  1000000  100000         500    676.665076               0
19          8  1000000  100000         500    343.195362               0
20         16  1000000  100000         500    180.101859               0
21         32  1000000  100000         500     99.880551               0
22         64  1000000  100000         500     65.878373               0
23        128  1000000  100000         500     76.451764               0

`bce219e`

Raw results

    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1  1000000  100000          50    426.701481               0
1           2  1000000  100000          50    217.417825               0
2           4  1000000  100000          50    110.665541               0
3           8  1000000  100000          50     56.944343               0
4          16  1000000  100000          50     29.309760               0
5          32  1000000  100000          50     16.003685               0
6          64  1000000  100000          50     10.157650               0
7         128  1000000  100000          50     13.362762               0
8           1  1000000  100000         100    756.731398               0
9           2  1000000  100000         100    403.638701               0
10          4  1000000  100000         100    197.599104               0
11          8  1000000  100000         100    101.587471               0
12         16  1000000  100000         100     53.660125               0
13         32  1000000  100000         100     28.843284               0
14         64  1000000  100000         100     18.140341               0
15        128  1000000  100000         100     22.790443               0
16          1  1000000  100000         500   2723.716804               0
17          2  1000000  100000         500   1378.848296               0
18          4  1000000  100000         500    695.409763               0
19          8  1000000  100000         500    354.146897               0
20         16  1000000  100000         500    186.446191               0
21         32  1000000  100000         500    104.105621               0
22         64  1000000  100000         500     65.748025               0
23        128  1000000  100000         500     71.299621               0

Since it scales similarly to PairwiseDistancesArgKmin, I think it's fair to use static scheduling every time. WDYT?

ogrisel · 2022-04-01T14:40:22Z

So no significant difference even for non-uniform distances.

+1 for using static to make the execution more deterministic between machine with different number of threads by avoid non-commutative rounding errors in the reduction operations.

jeremiedbb · 2022-04-01T14:42:19Z

Also the benchmarks are a little bit in favor of "static" so I'm +1 as well

…ius neighborhood (scikit-learn#22829)

MAINT Convert all OpenMP scheduling to 'static'

bce219e

github-actions bot added module:metrics cython labels Mar 14, 2022

jjerphan mentioned this pull request Mar 14, 2022

PERF PairwiseDistancesReductions initial work #22587

Closed

ogrisel approved these changes Apr 1, 2022

View reviewed changes

jjerphan marked this pull request as ready for review April 1, 2022 14:42

jeremiedbb merged commit 6d17cd2 into scikit-learn:main Apr 1, 2022

jjerphan deleted the pdr/radius-neighborhood-improve-scheduling branch April 1, 2022 15:37

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022

MAINT Convert OpenMP scheduling to 'static' in pairwise distances rad…

72e8d48

…ius neighborhood (scikit-learn#22829)

jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Apr 29, 2022

MAINT Convert OpenMP scheduling to 'static' in pairwise distances rad…

4a19af6

…ius neighborhood (scikit-learn#22829)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

MAINT Adapt scheduling of `PairwiseDistancesRadiusNeighborhood` #22829

MAINT Adapt scheduling of `PairwiseDistancesRadiusNeighborhood` #22829

Uh oh!

jjerphan commented Mar 14, 2022 •

edited

Loading

Uh oh!

jeremiedbb commented Mar 14, 2022 •

edited

Loading

Uh oh!

ogrisel commented Mar 14, 2022

Uh oh!

jjerphan commented Mar 18, 2022

Uh oh!

ogrisel commented Apr 1, 2022

Uh oh!

jeremiedbb commented Apr 1, 2022

Uh oh!

Uh oh!

Uh oh!

MAINT Adapt scheduling of PairwiseDistancesRadiusNeighborhood #22829

MAINT Adapt scheduling of PairwiseDistancesRadiusNeighborhood #22829

Uh oh!

Conversation

jjerphan commented Mar 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Results

main

bce219e

Uh oh!

jeremiedbb commented Mar 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Mar 14, 2022

Uh oh!

jjerphan commented Mar 18, 2022

Results

main

bce219e

Uh oh!

ogrisel commented Apr 1, 2022

Uh oh!

jeremiedbb commented Apr 1, 2022

Uh oh!

Uh oh!

MAINT Adapt scheduling of `PairwiseDistancesRadiusNeighborhood` #22829

MAINT Adapt scheduling of `PairwiseDistancesRadiusNeighborhood` #22829

jjerphan commented Mar 14, 2022 •

edited

Loading

`bce219e`

jeremiedbb commented Mar 14, 2022 •

edited

Loading

`bce219e`