ENH Introduce PairwiseDistancesRadiusNeighborhood
#22320
Conversation
…it-learn#22064) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Remove duplicated Bunch
Also move the error message upfront if results have to be sorted without the distances being returned.
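As a hedged illustration of that constraint (the exact error message is assumed, not quoted from the code), requesting sorted results without returning the distances is expected to be rejected upfront:

import numpy as np
from sklearn.metrics._pairwise_distances_reduction import (
    PairwiseDistancesRadiusNeighborhood,
)

X = np.random.rand(100, 10)
Y = np.random.rand(50, 10)
try:
    # sort_results=True needs the distances to sort by, so combining it with
    # return_distance=False should raise an error (exact message assumed).
    PairwiseDistancesRadiusNeighborhood.compute(
        X, Y, radius=1.0, return_distance=False, sort_results=True
    )
except ValueError as e:
    print(e)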
No memory leaks. Tested with this script:

from sklearn.metrics._pairwise_distances_reduction import (
    PairwiseDistancesRadiusNeighborhood,
)

import numpy as np
import psutil
import pandas as pd
import gc
import matplotlib.pyplot as plt

if __name__ == "__main__":
    memory_info = []
    mem_info = psutil.Process().memory_full_info()
    memory_info.append(mem_info)

    rng = np.random.RandomState(0)
    X = rng.rand(10000, 100)
    Y = rng.rand(10, 100)

    mem_info = psutil.Process().memory_full_info()
    memory_info.append(mem_info)

    n_calls = 100
    radius = 10000
    return_distance = True

    for i in range(n_calls):
        PairwiseDistancesRadiusNeighborhood.compute(
            X, Y,
            radius=radius,
            return_distance=return_distance,
            sort_results=True,
            strategy="parallel_on_Y",
        )
        mem_info = psutil.Process().memory_full_info()
        print(f"Iteration {i} / {n_calls}", mem_info)
        memory_info.append(mem_info)

    gc.collect()
    mem_info = psutil.Process().memory_full_info()
    memory_info.append(mem_info)

    title = "Memory usage on calls to PairwiseDistancesRadiusNeighborhood.compute"
    title += f"\n X.shape={X.shape} -- Y.shape={Y.shape} " \
             f"radius={radius} -- return_distance={return_distance}"

    df = pd.DataFrame(data=memory_info)
    df[["data", "rss", "uss"]].plot(title=title,
                                    xlabel="Number of calls",
                                    ylabel="Size of segment (in bytes)",
                                    figsize=(16, 9), fontsize=10)
    plt.savefig(f"/tmp/n_calls={n_calls}_radius={radius}_return_distance={return_distance}.png")
    plt.show()
I think we can consider reviewing this PR for inclusion and merging it without requiring the integration of mimalloc. Integrating mimalloc (in a vendored dynamic lib in the PyPI wheels or as an explicit dependency for the conda-forge package) might be doable with manageable maintenance overhead, but can be explored in a follow-up PR.
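For reference, a minimal sketch of how mimalloc could be tried out without any packaging change, assuming a locally installed libmimalloc (the shared-library path below is an assumption and must be adapted to your system):

import os
import subprocess

# Hypothetical path to the mimalloc shared library; adjust for your platform.
MIMALLOC = "/usr/lib/x86_64-linux-gnu/libmimalloc.so"

# LD_PRELOAD-ing mimalloc in the child process replaces the default malloc(3)
# for the whole benchmark run.
env = dict(os.environ, LD_PRELOAD=MIMALLOC)
subprocess.run(["python", "bench_radius_neighbors.py"], env=env, check=True)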
This looks good to me (pending that the full [cd build] is green)!
I did some tests on my local Apple M1 laptop (without mimalloc) and I typically observe 3x to 5x speed-ups over the main branch and slightly improved memory usage.
In particular I tried with the following script:
from time import perf_counter
import numpy as np
from joblib import Memory
from scipy.spatial.distance import cdist
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsClassifier
n_samples_train = 1_000_000
n_samples_test = 1000
n_features = 100
m = Memory(location="/tmp/joblib")
make_blobs = m.cache(make_blobs)
X, y = make_blobs(
n_samples=n_samples_train + n_samples_test,
centers=30,
n_features=n_features,
random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=n_samples_test, random_state=0
)
radius = np.quantile(a=cdist(X_train[:10000], X_test[:10]).ravel(), q=0.001)
print(f"{radius=:.3f}")
rnc = RadiusNeighborsClassifier(radius=radius, outlier_label="most_frequent").fit(
X_train, y_train
)
for i in range(10):
    t0 = perf_counter()
    accuracy = rnc.score(X_test, y_test)
    t1 = perf_counter()
    print(
        f"{n_samples_train=}, {n_samples_test=}, {n_features=}, "
        f"{accuracy=}, duration={t1 - t0:.3f}s"
    )
and launched it with mprof run bench_radius_neighbors.py / mprof plot, which gave the following results:
- on main:
radius=12.098
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=5.216s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.714s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.996s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.937s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=5.077s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.868s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.766s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.959s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.835s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=4.938s
- on this branch:
radius=12.098
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.193s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.227s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.221s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.253s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.264s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.364s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.271s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.321s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.323s
n_samples_train=1000000, n_samples_test=1000, n_features=100, accuracy=0.997, duration=1.291s
So this branch is both faster and more memory efficient. More importantly, the memory management seems to be correct: memory remains stable after several iterations (no memory leak)! I also tried moving the constructor under the for loop (same result) and running for a larger number of iterations (memory usage remained stable).
I also tried viztracer --open bench_radius_neighbors.py and did not see anything suspicious.
Also, all scikit-learn tests pass on macos/arm64, which is currently not tested on the CI.
Some minor comments and suggestions below:
Thank you for this thoughtful review, @ogrisel: I highly appreciate it. 🤝 I will address your comments tomorrow.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
BLD Add compiler args for C++11
I opened jjerphan#13 and jjerphan#12 on your fork with ideas about using vectors and reducing derefs in the StdVectorSentinel.
At some point, I feel like StdVectorSentinel can be useful in other parts of the codebase. I am okay with keeping it in pairwise for now.
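For intuition only, here is a pure-Python analogue of what the sentinel achieves (this is not the Cython StdVectorSentinel itself): the returned numpy array keeps a reference to the object that owns its memory, so the buffer outlives the scope that created it:

import numpy as np

owner = bytearray(8 * 5)  # stands in for the C++ std::vector's heap buffer
arr = np.frombuffer(owner, dtype=np.float64)

# The array keeps a reference to its memory owner, which plays the sentinel
# role: the buffer stays alive as long as the array does.
print(arr.base)
del owner
print(arr)  # still valid, since the array holds the last reference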
Otherwise, the implementation looks solid and I see the same performance benefits when running the benchmarks from #22320 (review)
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Yes, I do agree. Probably this can be refactored with other fixtures using C++ to have proper common Cython interfaces.
I ran more benchmarks using #22320 (review) with other metrics such as metric="chebyshev" to go down the PairwiseDistancesRadiusNeighborhood code path, and I see memory and runtime improvements compared to main.
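For completeness, a hedged sketch of that kind of run on small synthetic data (sizes and the radius are arbitrary here, chosen only to exercise the code path quickly):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsClassifier

X, y = make_blobs(n_samples=21_000, centers=30, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1000, random_state=0
)

# metric="chebyshev" avoids the specialized Euclidean implementation and goes
# down the generic PairwiseDistancesRadiusNeighborhood code path.
rnc = RadiusNeighborsClassifier(
    radius=5.0, metric="chebyshev", outlier_label="most_frequent"
).fit(X_train, y_train)
print(rnc.score(X_test, y_test))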
I pushed a [cd build] commit and the wheels build successfully with the adoption of vector for dist_middle_terms_chunks in FastEuclideanPairwiseDistancesRadiusNeighborhood.
LGTM
Thank you once again, @thomasjpfan and @ogrisel, for the thorough reviews.
The radius definition is from Olivier: scikit-learn#22320 (review) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
Comes after #22134.
What does this implement/fix? Explain your changes.
This introduces two new concrete PairwiseDistancesReductions for radius-neighborhood querying.
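For illustration, a hedged sketch of how such a reduction is used; the exact return types are assumed to mirror the existing radius_neighbors API (one array of indices and one of distances per query row of X):

import numpy as np
from sklearn.metrics._pairwise_distances_reduction import (
    PairwiseDistancesRadiusNeighborhood,
)

rng = np.random.RandomState(0)
X = rng.rand(1000, 20)  # query vectors
Y = rng.rand(2000, 20)  # database vectors

neigh_dist, neigh_ind = PairwiseDistancesRadiusNeighborhood.compute(
    X, Y, radius=1.5, return_distance=True, sort_results=True
)
# Radius queries are "dynamic": each query gets a different number of
# neighbors, hence the need for resizable buffers internally.
print([ind.shape[0] for ind in neigh_ind[:5]])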
Any other comments?
It uses resizable buffers as provided via std::vector, with some adaptation to return them as numpy arrays safely. However, it suffers from concurrent reallocation across threads (namely when the vectors' buffers are being reallocated).
This concurrent reallocation causes some drops in performance because calls to malloc(3) (used under the hood for the buffers' reallocations) lock by default in the compilers' standard library implementations. Concretely, the speed-up for the new PairwiseDistancesRadiusNeighborhood only gets up to 4, contrary to 20 for PairwiseDistancesArgKmin, which does not suffer from this problem.
A possible solution to alleviate this problem would be to use another implementation of malloc(3), such as mimalloc's.
Finally, contrary to k-nn querying which is static, radius-neighbors querying is dynamic, hence we might want to adapt the scheduling of the prange loops with dynamic or guided. IMO, this should be done in another PR.
Hardware Scalability on 1.0 (9b03375) using this script
The current implementation on 1.0 does not scale, notably because of PyObjects (this is especially the case when n_features is small).
Hardware Scalability for this PR @ aeb79ca using this script
With mimalloc, this PR's implementation scales linearly up to 8 threads and sublinearly up to 64 threads.
Without mimalloc, performance is worse and variable.
Without mimalloc:
Raw results
With mimalloc:
Raw results
Speed-ups between 1.0 (998e8f2) and this PR @ c4d8c4a (via 5f3f389a67a2fe914959ee3a0abc7ad3c6e220bd)
From ×2 up to ×20 speed-ups in normal configurations.
Small regression in some cases.
(I need to check if mimalloc can be LD_PRELOAD-ed when using asv to report results.)
Without mimalloc:
1 core
2 cores
4 cores
8 cores
16 cores
32 cores
64 cores
128 cores
Benchmarks information
Machine specification