ENH Pairwise Distances ArgKmin #21462

jjerphan · 2021-10-26T08:12:15Z

Reference Issues/PRs

Supersedes #20254.

Context

Reductions over pairwise distances (argmin, argkmin, threshold filtering, cumulative sum, etc.) are one of the foundations used by many algorithms within scikit-learn.

Various strategies exist to compute them differently, including some using chunks (see sklearn/metrics/pairwise.py).

However, those are still mainly implemented at the level of Python leaving some rooms for manœuvre, especially for potential optimizations which can be made at a lower level.

Proposals of this PR

This PR proposes:

introducing PairwiseDistancesReduction, an abstraction allowing computing reductions efficiently in parallel
introducing PairwiseDistancesArgkmin, a first implementation of PairwiseDistancesReduction for k-Nearest Neighbors search
introduces FastEuclideanPairwiseDistancesArgKmin a specialization of PairwiseDistancesArgkmin for the euclidean and squared euclidean distances which uses the GEMM trick
introducing DatasetsPair, an abstraction to wrap a pair of two datasets and compute their vectors pairwise distances
DenseDenseDatasetsPair, a first implementation of DatasetsPair for pair of two dense datasets
adapting existing interfaces implementations to use PairwiseDistancesArgkmin when and where possible
adapting existing tests and introducing new ones for existing and for the new implementations

Results

Scalability results (I beg readers' pardon for my plotting skills)

1.0 implementation do not scale well (not even sublinearly).

This PR implementation does scales linearly up to 64 threads.

Results are obtained via this script.

This PR @ `19dd7ca`

(n_train, n_test, k) = (1 000 000, 10 000, 10)

(n_train, n_test, k) = (100 000, 100 000, 10)

Raw results

    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1   100000  100000          50     53.319188               0
1           2   100000  100000          50     27.118464               0
2           4   100000  100000          50     13.899768               0
3           8   100000  100000          50      6.993012               0
4          16   100000  100000          50      3.801222               0
5          32   100000  100000          50      2.243082               0
6          64   100000  100000          50      1.370284               0
7         128   100000  100000          50      2.007565               0
8           1   100000  100000         100     77.488982               0
9           2   100000  100000         100     39.340334               0
10          4   100000  100000         100     20.142855               0
11          8   100000  100000         100     10.352673               0
12         16   100000  100000         100      5.556708               0
13         32   100000  100000         100      3.059276               0
14         64   100000  100000         100      2.183617               0
15        128   100000  100000         100      3.138607               0
16          1   100000  100000         500    265.572455               0
17          2   100000  100000         500    133.960014               0
18          4   100000  100000         500     68.229143               0
19          8   100000  100000         500     35.094786               0
20         16   100000  100000         500     18.780773               0
21         32   100000  100000         500     10.413006               0
22         64   100000  100000         500      6.948780               0
23        128   100000  100000         500     11.336498               0
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1  1000000   10000          50     54.181024               0
1           2  1000000   10000          50     27.885327               0
2           4  1000000   10000          50     14.291922               0
3           8  1000000   10000          50      7.112637               0
4          16  1000000   10000          50      3.886164               0
5          32  1000000   10000          50      2.156768               0
6          64  1000000   10000          50      1.375637               0
7         128  1000000   10000          50      1.563360               0
8           1  1000000   10000         100     77.440228               0
9           2  1000000   10000         100     39.710930               0
10          4  1000000   10000         100     20.238788               0
11          8  1000000   10000         100     10.565105               0
12         16  1000000   10000         100      5.549846               0
13         32  1000000   10000         100      3.070918               0
14         64  1000000   10000         100      1.955702               0
15        128  1000000   10000         100      2.145065               0
16          1  1000000   10000         500    266.191423               0
17          2  1000000   10000         500    136.932209               0
18          4  1000000   10000         500     70.186863               0
19          8  1000000   10000         500     35.348363               0
20         16  1000000   10000         500     18.673352               0
21         32  1000000   10000         500     10.232006               0
22         64  1000000   10000         500      7.026281               0
23        128  1000000   10000         500      7.118796               0

1.0 (`9b03375`)

(n_train, n_test, k) = (1 000 000, 10 000, 10)

(n_train, n_test, k) = (100 000, 100 000, 10)

Raw results

    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1   100000  100000          50    155.998673               0
1           2   100000  100000          50    178.260460               0
2           4   100000  100000          50    162.936344               0
3           8   100000  100000          50    157.636250               0
4          16   100000  100000          50    159.273397               0
5          32   100000  100000          50    160.395220               0
6          64   100000  100000          50    163.525406               0
7         128   100000  100000          50    197.601314               0
8           1   100000  100000         100    176.557523               0
9           2   100000  100000         100    186.601761               0
10          4   100000  100000         100    168.457660               0
11          8   100000  100000         100    158.684159               0
12         16   100000  100000         100    160.464695               0
13         32   100000  100000         100    162.835237               0
14         64   100000  100000         100    167.569254               0
15        128   100000  100000         100    181.630751               0
16          1   100000  100000         500    348.425143               0
17          2   100000  100000         500    277.428042               0
18          4   100000  100000         500    215.084920               0
19          8   100000  100000         500    187.254540               0
20         16   100000  100000         500    179.741570               0
21         32   100000  100000         500    174.659939               0
22         64   100000  100000         500    190.489453               0
23        128   100000  100000         500    227.535006               0
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1  1000000   10000          50    155.084480               0
1           2  1000000   10000          50    189.289008               0
2           4  1000000   10000          50    174.326572               0
3           8  1000000   10000          50    169.130710               0
4          16  1000000   10000          50    169.489740               0
5          32  1000000   10000          50    169.501073               0
6          64  1000000   10000          50    176.410910               0
7         128  1000000   10000          50    188.036263               0
8           1  1000000   10000         100    198.885173               0
9           2  1000000   10000         100    217.164905               0
10          4  1000000   10000         100    194.695082               0
11          8  1000000   10000         100    184.083839               0
12         16  1000000   10000         100    184.801623               0
13         32  1000000   10000         100    178.922246               0
14         64  1000000   10000         100    189.154418               0
15        128  1000000   10000         100    201.639427               0
16          1  1000000   10000         500    425.073250               0
17          2  1000000   10000         500    331.314108               0
18          4  1000000   10000         500    254.123434               0
19          8  1000000   10000         500    203.032605               0
20         16  1000000   10000         500    194.956694               0
21         32  1000000   10000         500    194.276188               0
22         64  1000000   10000         500    190.638631               0
23        128  1000000   10000         500    191.943951               0

Benchmarks information

Machine specification

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        2
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               49
Model name:          AMD EPYC 7742 64-Core Processor
Stepping:            0
CPU MHz:             3388.360
BogoMIPS:            4491.59
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0-63,128-191
NUMA node1 CPU(s):   64-127,192-255
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

Benchmarks results between 1.0 (`e7fb5b8`) and this PR @ `19dd7ca` (via `5f3f389`)

Between ×1.2 and ×20 speed-ups in normal configurations.
Small regression when using 1 core on small datasets.

1 core

       before           after         ratio
     [e7fb5b8c]       [19dd7cab]
     <main>           <benchmarks/pairwise-distances-argkmin~6>
+         286±5ms          374±5ms     1.31  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_locally_linear_embedding(1000, 1000, 100)                                                                                                     
-         1.22±0s       1.10±0.01s     0.90  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)                                                                                                                      
-      9.59±0.03s       8.28±0.06s     0.86  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_locally_linear_embedding(1000, 100000, 100)                                                                                                   
-      20.4±0.04s       16.6±0.05s     0.82  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 100000, 100)                                                                                                                     
-      6.40±0.02s       5.21±0.02s     0.81  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)                                                                                                                     
-         241±2ms          195±3ms     0.81  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 10000, 100)                                                                                                                       
-       124±0.9ms         90.9±1ms     0.73  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)                                                                                                   
-      1.22±0.01s         874±10ms     0.72  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)                                                                                              
-      12.1±0.03s       8.64±0.01s     0.71  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)                                                                                             
-      12.2±0.02s       8.68±0.01s     0.71  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)                                                                                                 
-      1.44±0.01s       1.02±0.01s     0.71  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 100000, 100)                                                                                                                      
-      1.23±0.01s          868±8ms     0.71  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)                                                                                                  
-         126±1ms         87.7±2ms     0.70  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)                                                                                               
-         133±1ms         90.6±1ms     0.68  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)                                                                                               
-         133±1ms         90.5±1ms     0.68  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)                                                                                                   
-      1.34±0.01s          905±7ms     0.68  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)                                                                                              
-      1.34±0.01s          894±7ms     0.67  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)                                                                                                  
-      22.9±0.4ms       12.6±0.4ms     0.55  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)                                                                                                            
-         2.35±0s       1.11±0.01s     0.47  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)                                                                                                          
-         232±2ms          109±2ms     0.47  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)                                                                                                           
-         230±2ms         92.2±2ms     0.40  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)                                                                                                           
-      22.5±0.02s       8.96±0.03s     0.40  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 100000, 100)                                                                                                         
-      2.25±0.01s          896±8ms     0.40  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)

2 cores

       before           after         ratio
     [e7fb5b8c]       [19dd7cab]
     <main>           <pairwise-distances-argkmin>
-      1.51±0.01s       1.36±0.01s     0.90  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
-      10.5±0.01s       9.35±0.02s     0.89  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 10000, 100)
-         17.0±0s          14.9±0s     0.88  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_locally_linear_embedding(1000, 100000, 100)
-      9.21±0.01s       7.65±0.04s     0.83  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)
-         317±1ms        241±0.7ms     0.76  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 10000, 100)
-     12.6±0.04ms      9.33±0.08ms     0.74  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-     12.7±0.05ms      9.32±0.08ms     0.74  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-      21.9±0.01s       15.5±0.01s     0.71  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 100000, 100)
-         1.52±0s          880±1ms     0.58  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 100000, 100)
-         1.27±0s          704±7ms     0.55  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-       133±0.4ms       72.9±0.8ms     0.55  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-         1.27±0s          699±8ms     0.55  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-       133±0.1ms       72.7±0.8ms     0.55  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-      12.7±0.01s          6.92±0s     0.54  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)
-         12.7±0s       6.85±0.01s     0.54  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
-       138±0.2ms       74.1±0.4ms     0.54  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-       139±0.2ms       74.0±0.1ms     0.53  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-         1.35±0s          704±2ms     0.52  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         1.35±0s        702±0.8ms     0.52  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-     21.6±0.04ms      11.2±0.06ms     0.52  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-         2.24±0s          805±3ms     0.36  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)
-       229±0.9ms      81.7±0.08ms     0.36  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-       234±0.1ms       76.3±0.4ms     0.33  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         2.22±0s          717±8ms     0.32  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)
-         22.1±0s          7.00±0s     0.32  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 100000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

4 cores

To come soon.

8 cores

       before           after         ratio
     [e7fb5b8c]       [19dd7cab]
     <main>           <pairwise-distances-argkmin>
-         1.47±0s       1.31±0.01s     0.89  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
-         16.6±0s          14.6±0s     0.88  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_locally_linear_embedding(1000, 100000, 100)
-         9.62±0s       8.28±0.01s     0.86  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 10000, 100)
-      8.63±0.01s          7.07±0s     0.82  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)
-       283±0.8ms        202±0.3ms     0.71  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 10000, 100)
-    10.00±0.01ms      6.50±0.01ms     0.65  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-    10.00±0.01ms      6.40±0.01ms     0.64  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-      17.4±0.01s          9.97±0s     0.57  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 100000, 100)
-      19.9±0.2ms      8.07±0.03ms     0.41  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-         1.12±0s          371±1ms     0.33  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 100000, 100)
-      97.9±0.2ms      22.8±0.02ms     0.23  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-      97.4±0.2ms      22.6±0.02ms     0.23  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-       870±0.6ms       197±0.05ms     0.23  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-       869±0.7ms          195±1ms     0.22  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-       106±0.1ms      22.7±0.04ms     0.21  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-       106±0.1ms      22.5±0.03ms     0.21  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-       963±0.9ms        203±0.5ms     0.21  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-       963±0.6ms        203±0.6ms     0.21  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-       203±0.2ms      25.4±0.03ms     0.13  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-         1.85±0s        232±0.1ms     0.13  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)
-       206±0.2ms      25.3±0.06ms     0.12  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         1.81±0s       201±0.05ms     0.11  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

16 cores

       before           after         ratio
     [e7fb5b8c]       [19dd7cab]
     <main>           <pairwise-distances-argkmin>
-         16.4±0s       14.5±0.01s     0.88  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_locally_linear_embedding(1000, 100000, 100)
-      9.57±0.02s       8.13±0.01s     0.85  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 10000, 100)
-       2.84±0.3s       2.39±0.01s     0.84  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 10000, 100)
-      8.63±0.02s       7.05±0.03s     0.82  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)
-      2.73±0.03s        2.18±0.1s     0.80  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_tsne(1000, 1000, 100)
-         285±4ms        224±0.6ms     0.79  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 10000, 100)
-       111±0.7ms        85.0±20ms     0.77  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-      2.72±0.03s       2.07±0.03s     0.76  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_tsne(1000, 10000, 100)
-       2.78±0.3s       2.10±0.02s     0.75  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_tsne(1000, 100000, 100)
-     10.3±0.01ms      6.84±0.02ms     0.66  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-       17.2±0.1s       9.04±0.02s     0.53  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 100000, 100)
-         105±3ms       45.5±0.2ms     0.43  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-     20.4±0.07ms      8.38±0.03ms     0.41  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-         206±3ms       58.8±0.2ms     0.29  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-       101±0.5ms       27.0±0.1ms     0.27  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-      1.09±0.01s        278±0.4ms     0.25  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 100000, 100)
-         101±1ms      15.4±0.05ms     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-         849±7ms        123±0.1ms     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-        919±40ms        124±0.4ms     0.13  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-         924±4ms        112±0.6ms     0.12  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-        931±10ms        111±0.4ms     0.12  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-         209±1ms      17.7±0.05ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-      1.79±0.02s        142±0.3ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)
-      1.85±0.05s        129±0.1ms     0.07  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

20 cores

       before           after         ratio
     [e7fb5b8c]       [19dd7cab]
     <main>           <pairwise-distances-argkmin>
-      16.3±0.02s          14.4±0s     0.88  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_locally_linear_embedding(1000, 100000, 100)
-      9.13±0.02s       8.05±0.03s     0.88  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 1000, 100)
-         8.59±0s       6.96±0.02s     0.81  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)
-       279±0.7ms        225±0.2ms     0.81  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 10000, 100)
-      10.0±0.05s       8.05±0.01s     0.80  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 10000, 100)
-     9.38±0.02ms      6.92±0.02ms     0.74  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-     9.42±0.02ms       6.65±0.2ms     0.71  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-       17.2±0.2s       8.75±0.01s     0.51  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(10000, 100000, 100)
-       102±0.3ms       46.1±0.3ms     0.45  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-       104±0.6ms       46.3±0.1ms     0.45  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-      19.7±0.1ms      8.47±0.04ms     0.43  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-       200±0.8ms       59.3±0.2ms     0.30  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-         1.09±0s        264±0.5ms     0.24  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 100000, 100)
-      93.5±0.7ms      11.5±0.04ms     0.12  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-      94.2±0.4ms      11.2±0.05ms     0.12  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-        844±10ms       88.1±0.4ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-        864±20ms       88.4±0.2ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-        963±20ms       94.9±0.7ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         932±2ms       90.7±0.2ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-       202±0.3ms      13.8±0.06ms     0.07  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-      1.87±0.02s        108±0.3ms     0.06  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)
-      1.84±0.04s        105±0.5ms     0.06  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Benchmarks information

Machine specification

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Stepping:            4
CPU MHz:             1000.223
BogoMIPS:            4600.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            16896K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsave

Any other comments?

#20254 originally proposed more changes.

Those changes have been removed from the branch (in fa424a4 and 5678666) but will be introduced in follow-up PRs.

Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>

Introduce PairwiseDistancesReduction.is_usable_for encapsulating the test for branching logic. Also removed unused code due to new implementation.

StdVectorSentinel makes a proper life-cycle management for std::vectors' buffers possible. Duplication seems needed as fused types can't be used as attributes. It's possible to obtain a missing symbol (`_ZSt28__throw_bad_array_new_lengthv`) at runtime. This is unrelated to the implementation here, and there are issues reporting the problem, e.g.: cython/cython#4218. A temporary workaround: stan-dev/pystan#294 (comment)

Also removes _finalise_compute.

Add RadiusNeighborhood as a PairwiseDistancesReduction

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Probably this should get its own test.

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

This was mentionned by one of Oliviers' review but I forgot it. Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

…cikit-learn#21501) This modify the test configuration so that it makes sense for when a sole sample is provided for MeanShift. This test was passing previously for this configuration but was not supposed to. The new implementation strategy for kneighbors which uses PairwiseDistancesArgKmin (see scikit-learn#21462) is numerically stabler for this case, motivating this modication. Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>

…21501) This modify the test configuration so that it makes sense for when a sole sample is provided for MeanShift. This test was passing previously for this configuration but was not supposed to. The new implementation strategy for kneighbors which uses PairwiseDistancesArgKmin (see #21462) is numerically stabler for this case, motivating this modication. Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>

ogrisel · 2022-02-11T15:34:53Z

I did some quick benchmark with a k-NN classifier on my Apple M1 laptop:

`main`

single thread (OMP_NUM_THREADS=1):

n_samples_train: 1000000, n_samples_test: 1000, acc: 0.994, duration: 18.120 s

all available cores (default)

n_samples_train: 1000000, n_samples_test: 1000, acc: 0.994, duration: 14.243 s

`#21462`

single thread (OMP_NUM_THREADS=1):

n_samples_train: 1000000, n_samples_test: 1000, acc: 0.994, duration: 5.670 s

all available cores (default):

n_samples_train: 1000000, n_samples_test: 1000, acc: 0.994, duration: 1.196 s

This machine just has 4 performance cores and 4 efficiency cores, so a speed up of ~12x is a nice surprise :) As expected this PR both improves both the hardware scalability (more efficient use of the cores) and the CPU cache (to avoid io bottlenecks).

ogrisel

I fixed the merged conflicts but I realized I made a mistake in the ordering of the sections of the changelog. Let me fix it quickly.

sklearn/neighbors/tests/test_neighbors.py

lorentzenchr · 2022-02-17T09:51:19Z

#22134 is merged

ogrisel · 2022-02-17T10:33:17Z

Ah sorry for commenting here. I mixed it up with #22134.

jjerphan and others added 30 commits July 9, 2021 17:23

Remove duplicate code to vector-to-ndarray coercion

40e369d

Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>

Use consistent names

a3de08a

Plug RadiusNeighborhood in RadiusNeighborsMixin.radius_neighbors

81b1b1b

Introduce PairwiseDistancesReduction.is_usable_for encapsulating the test for branching logic. Also removed unused code due to new implementation.

Correctly free vectors using del

b8fe6e1

Remove temporary consistency test

9df4047

Add comments

7c713a1

Also removes _finalise_compute.

Merge pull request #3 from jjerphan/pairwise_aggregation_cython-radius

7ea6daa

Add RadiusNeighborhood as a PairwiseDistancesReduction

Merge branch 'main' into pairwise_aggregation_cython

8e54572

Revert to 'euclidean' when 'fast_sqeuclidean' can't be used

1d3336d

Use 'fast_sqeuclidean' for Birch internals

d96b163

Fix PairwiseDistancesReduction.is_usable

f660dbf

Make array C-ordered for test

bb24d95

Annotate with cython.final when relevant

6eea1aa

Black contains all the color that I like

15c4150

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Use relative imports

a9706d6

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Use method for flatten

0935975

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Correct cross-referencing for metrics.DistanceMetric

e2b5398

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Precise that p is the parameter used by 'minkowski'

ce1ccdc

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Prefer assert_allclose over assert_array_equal

a9fe71f

Prefer csr_matrix.toarray over csr_matrix.A

1593dab

Rework test for correct behavior regarding the radius

01c1294

Probably this should get its own test.

Inline heap pushes

4b6a041

Mirror ValueError for incorrectly set sort_results and return_distances

c26b583

Parametrise test_radius_neighbors_graph_sparse

5bcea9f

Lighten and correct test

474804c

Allow other dtypes than np.float64

e84ff1b

Parametrise test_kneighbors_graph_sparse

ac2ce70

fixup! Remove uncalled snippet

9f612a1

Mark a test case as xfail for test_fast_sqeuclidean_correctness

74240bd

jjerphan and others added 11 commits December 21, 2021 17:27

DOC Add and correct comments

a6ab85f

Update sklearn/metrics/_pairwise_distances_reduction.pyx

7435885

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

fixup! DOC Add and correct comments

27329c1

FIX Change strategy default value to None

8754998

MAINT Rename n_threads variables and document them.

9cda95f

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

DOC Improve remark regarding support for discrete distance metrics

5d77467

This was mentionned by one of Oliviers' review but I forgot it. Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

MAINT Remove useless private _compute method

af5a991

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Merge pull request #5 from jjerphan/pairwise-distances-argkmin-raii

8a0771a

Merge branch 'main' into pairwise-distances-argkmin

dd1dde3

Remove duplicated submodules registration

decb029

fixup! Remove duplicated submodules registration

527555d

This was referenced Dec 23, 2021

MAINT Introduce Pairwise Distances Reductions private submodule #22064

Merged

MAINT Introduce FastEuclideanPairwiseArgKmin #22065

Merged

jjerphan mentioned this pull request Jan 6, 2022

ENH Introduce PairwiseDistancesReduction and PairwiseDistancesArgKmin (feature branch) #22134

Merged

10 tasks

jjerphan mentioned this pull request Jan 14, 2022

MAINT Plug PairwiseDistancesArgKmin as a back-end #22214

Closed

Merge branch 'main' into pairwise-distances-argkmin

c265944

jjerphan mentioned this pull request Jan 24, 2022

MAINT Plug PairwiseDistancesArgKmin as a back-end #22288

Merged

Merge branch 'main' into pairwise-distances-argkmin

25446ef

ogrisel reviewed Feb 11, 2022

View reviewed changes

sklearn/neighbors/tests/test_neighbors.py Show resolved Hide resolved

ogrisel added 2 commits February 11, 2022 16:51

Move changelog entry at the top

b11109c

Merge branch 'main' into pairwise-distances-argkmin

5d6c9c2

lorentzenchr closed this Feb 17, 2022

jjerphan deleted the pairwise-distances-argkmin branch February 20, 2022 11:37

jjerphan mentioned this pull request Feb 23, 2022

POC 32bit datasets support for PairwiseDistancesReduction #22590

Closed

3 tasks

jjerphan mentioned this pull request Jun 5, 2024

FIX use correct formulation for ExpSineSquared kernel #20070

Draft

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH Pairwise Distances ArgKmin #21462

ENH Pairwise Distances ArgKmin #21462

jjerphan commented Oct 26, 2021 •

edited

Loading

ogrisel commented Feb 11, 2022 •

edited

Loading

ogrisel left a comment

lorentzenchr commented Feb 17, 2022

ogrisel commented Feb 17, 2022

ENH Pairwise Distances ArgKmin #21462

ENH Pairwise Distances ArgKmin #21462

Conversation

jjerphan commented Oct 26, 2021 • edited Loading

Reference Issues/PRs

Context

Proposals of this PR

Results

Scalability results (I beg readers' pardon for my plotting skills)

This PR @ 19dd7ca

1.0 (9b03375)

Benchmarks information

Benchmarks results between 1.0 (e7fb5b8) and this PR @ 19dd7ca (via 5f3f389)

Benchmarks information

Any other comments?

ogrisel commented Feb 11, 2022 • edited Loading

main

#21462

ogrisel left a comment

Choose a reason for hiding this comment

lorentzenchr commented Feb 17, 2022

ogrisel commented Feb 17, 2022

jjerphan commented Oct 26, 2021 •

edited

Loading

This PR @ `19dd7ca`

1.0 (`9b03375`)

Benchmarks results between 1.0 (`e7fb5b8`) and this PR @ `19dd7ca` (via `5f3f389`)

ogrisel commented Feb 11, 2022 •

edited

Loading

`main`

`#21462`