ENH add pairwise distances backend for LocalOutlierFactor #26316


Conversation

OmarManzoor
Contributor

Reference Issues/PRs

Towards #25888

What does this implement/fix? Explain your changes.

  • Adds a dedicated pairwise distances backend to handle computation of local reachability density in the fit method of LocalOutlierFactor.

Any other comments?

CC: @jjerphan Could you kindly take a look and share your thoughts?

Contributor

@Micky774 Micky774 left a comment


Hey there @OmarManzoor, always wonderful seeing a PR from you :)

I know you are still early in the process of finalizing this PR and it is only marked as a draft, but I figured some (potentially unsolicited) advice may help smooth out the road in front of you.

I also wanted to mention that this PR will require thorough benchmarking, both to confirm that there is indeed a performance gain and to evaluate the best deployment strategies (i.e. parallel_on_{X, Y}), since the asymmetry of the algorithm may require us to adopt a different heuristic than the default for ArgKmin.

Please do not hesitate to ping if you have any questions or concerns. Thanks!

Contributor

@Micky774 Micky774 left a comment


Looking good so far. Aside from the missing test, please ping me once you have some benchmark results ready :)

Comment on lines +750 to +753

```python
raise ValueError(
    "Only float64 or float32 datasets pairs are supported at this time, "
    f"got: X.dtype={X.dtype} and Y.dtype={Y.dtype}."
)
```
Contributor


Need to add a test for this.

@jjerphan
Member

@OmarManzoor: you can use asv to easily perform benchmarks between those revisions.

You can take some inspiration from those ASV benchmark definitions and adapt them for LocalOutlierFactor.

@OmarManzoor
Contributor Author

> @OmarManzoor: you can use asv to easily perform benchmarks between those revisions.
>
> You can take some inspiration from those ASV benchmark definitions and adapt them for LocalOutlierFactor.

Right, thank you.

@OmarManzoor
Contributor Author

@jjerphan Could you kindly check whether this benchmark looks correct: argkmin_lrd_benchmark?

@OmarManzoor
Contributor Author

> @jjerphan Could you kindly check whether this benchmark looks correct: argkmin_lrd_benchmark?

I think this is going to take a lot of time to run.

@OmarManzoor
Contributor Author

OmarManzoor commented May 16, 2023

@jjerphan @Micky774 These are the results that I got so far

For scikit-learn commit 881123aa <pairwise_distance_backend_for_lof> (round 1/1):

```
argkmin_lrd.ArgKminLRDBenchmark.time_fit
========== ============ ============= ==============
--                    n_test / n_features
---------- -----------------------------------------
 n_train    1000 / 100   10000 / 100   100000 / 100
========== ============ ============= ==============
   1000      19.3±2ms     18.2±0.1ms    18.0±0.1ms
  10000     1.06±0.02s    1.02±0.02s    1.04±0.02s
 10000000     failed        failed        failed
========== ============ ============= ==============
```

For scikit-learn commit c5f10c8b <main> (round 1/1):

```
argkmin_lrd.ArgKminLRDBenchmark.time_fit
========== ============ ============= ==============
--                    n_test / n_features
---------- -----------------------------------------
 n_train    1000 / 100   10000 / 100   100000 / 100
========== ============ ============= ==============
   1000     4.48±0.2ms   4.81±0.1ms    4.65±0.1ms
  10000      142±6ms      146±2ms       147±5ms
 10000000     failed        failed        failed
========== ============ ============= ==============
```

I don't think this looks good. The performance seems to have decreased.

@Micky774
Contributor

Micky774 commented May 16, 2023

> I don't think this looks good. The performance seems to have decreased.

Can you try profiling (e.g. with Linux perf, cProfile or whatever else you prefer) to see what the hotspots are? I'll also take a look soon to see if I can duplicate your results. We'll get this sorted :)

@OmarManzoor
Contributor Author

> > I don't think this looks good. The performance seems to have decreased.
>
> Can you try profiling (e.g. with Linux perf, cProfile or whatever else you prefer) to see what the hotspots are? I'll also take a look soon to see if I can duplicate your results. We'll get this sorted :)

Thank you!

@OmarManzoor
Contributor Author

OmarManzoor commented May 17, 2023

@Micky774 I tried using cProfile, and these are the results for the argkmin_lrd.pyx file, using this simple script:

```python
import cProfile

import numpy as np

from sklearn.neighbors import LocalOutlierFactor

n_train = 1000
n_features = 100

rng = np.random.RandomState(0)
X_train = rng.rand(n_train, n_features).astype(np.float32)
cProfile.run("LocalOutlierFactor(n_neighbors=10).fit(X=X_train)")
```

```
ncalls  tottime  percall  cumtime  percall   filename:lineno(function)
1        0.000    0.000    0.000    0.000     _argkmin_lrd.pyx:134(_finalize_results)
1        0.031    0.031    0.074    0.074     _argkmin_lrd.pyx:34(compute)
1        0.000    0.000    0.015    0.015     _argkmin_lrd.pyx:64(__init__)
```

It seems like the compute method is taking most of the time.

@jjerphan
Member

Python and native code can be profiled with py-spy and the results exported for SpeedScope using:

```shell
py-spy record --native -o py-spy.profile -f speedscope -- python ./script.py
```

I would recommend setting thread affinity to use only one core. On GNU/Linux, this can be done using taskset(1):

```shell
taskset -c 0 py-spy record --native -o py-spy.profile -f speedscope -- python ./script.py
```

Profiles' files can be uploaded here so that they can be inspected by readers as well.

@OmarManzoor
Contributor Author

> Python and native code can be profiled with py-spy and the results exported for SpeedScope using:
>
> ```shell
> py-spy record --native -o py-spy.profile -f speedscope -- python ./script.py
> ```
>
> I would recommend setting thread affinity to use only one core. On GNU/Linux, this can be done using taskset(1):
>
> ```shell
> taskset -c 0 py-spy record --native -o py-spy.profile -f speedscope -- python ./script.py
> ```
>
> Profiles' files can be uploaded here so that they can be inspected by readers as well.

I think I realized what the issue is with the significant drop: the metric was being set to euclidean, which has a specialized implementation when we use kneighbors.
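To illustrate the difference (a minimal sketch, not code from this PR): scikit-learn dispatches the default minkowski-with-p=2 (i.e. euclidean) metric to a specialized, GEMM-accelerated ArgKmin path, while a metric like manhattan goes through the generic distance loop, so the latter is the fairer baseline when benchmarking a new backend.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
X = rng.rand(1000, 100).astype(np.float32)

# The default metric is "minkowski" with p=2, i.e. euclidean, which is
# dispatched to a specialized (GEMM-accelerated) ArgKmin implementation.
lof_euclidean = LocalOutlierFactor(n_neighbors=10).fit(X)

# A metric such as "manhattan" uses the generic pairwise-distance loop
# instead, which is what a new dedicated backend would be competing with.
lof_manhattan = LocalOutlierFactor(n_neighbors=10, metric="manhattan").fit(X)

print(lof_euclidean.negative_outlier_factor_.shape)
```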

@jjerphan
Member

Cool!

@OmarManzoor
Contributor Author

OmarManzoor commented May 17, 2023

These are the latest benchmarks with float32.

Benchmark Code

```python
import numpy as np

from .common import Benchmark

from sklearn.neighbors import LocalOutlierFactor


class ArgKminLRDBenchmark(Benchmark):
    param_names = ["n_train", "n_features"]
    params = [
        [1000, 10000, 100000],
        [100],
    ]

    def setup(self, n_train, n_features):
        rng = np.random.RandomState(0)
        self.X_train = rng.rand(n_train, n_features).astype(np.float32)
        self.y_train = rng.randint(low=-1, high=1, size=(n_train,))

    def time_fit(self, n_train, n_features):
        est = LocalOutlierFactor(n_neighbors=10, metric="manhattan").fit(X=self.X_train)
        self.estimator_ = est
```
PR - time_fit

```
========= ============
    --     n_features
--------- ------------
 n_train      100
========= ============
   1000    16.9±0.9ms
  10000     926±20ms
 100000    1.54±0.2m
========= ============
```

MAIN - time_fit

```
========= ============
    --     n_features
--------- ------------
 n_train      100
========= ============
   1000    16.5±0.7ms
  10000     926±9ms
 100000    1.49±0.01m
========= ============
```

@OmarManzoor
Contributor Author

OmarManzoor commented May 17, 2023

Benchmarks with float64.

Benchmark Code

```python
import numpy as np

from .common import Benchmark

from sklearn.neighbors import LocalOutlierFactor


class ArgKminLRDBenchmark(Benchmark):
    param_names = ["n_train", "n_features"]
    params = [
        [1000, 10000, 100000],
        [100],
    ]

    def setup(self, n_train, n_features):
        rng = np.random.RandomState(0)
        self.X_train = rng.rand(n_train, n_features).astype(np.float64)
        self.y_train = rng.randint(low=-1, high=1, size=(n_train,))

    def time_fit(self, n_train, n_features):
        LocalOutlierFactor(n_neighbors=10, metric="manhattan").fit(X=self.X_train)
```
PR - time_fit

```
========= ============
    --     n_features
--------- ------------
 n_train      100
========= ============
   1000    18.6±0.1ms
  10000    1.11±0.02s
 100000    1.77±0m
========= ============
```

MAIN - time_fit

```
========= ============
    --     n_features
--------- ------------
 n_train      100
========= ============
   1000    18.7±0.4ms
  10000    1.11±0.02s
 100000    1.80±0.01m
========= ============
```

@OmarManzoor
Contributor Author

It looks like no specific improvement is observed.

@OmarManzoor
Contributor Author

Even with the number of neighbors set to 1000, no improvement was observed.

Benchmark Code

```python
import numpy as np

from .common import Benchmark

from sklearn.neighbors import LocalOutlierFactor


class ArgKminLRDBenchmark(Benchmark):
    param_names = ["n_train", "n_features"]
    params = [
        [100000],
        [100],
    ]

    def setup(self, n_train, n_features):
        rng = np.random.RandomState(0)
        self.X_train = rng.rand(n_train, n_features).astype(np.float32)

    def time_fit(self, n_train, n_features):
        LocalOutlierFactor(n_neighbors=1000, metric="manhattan").fit(X=self.X_train)
```
```
[ 0.00%] · For scikit-learn commit d15e8c09 <pairwise_distance_backend_for_lof> (round 1/1):
[ 0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[50.00%] ··· argkmin_lrd.ArgKminLRDBenchmark.time_fit                              ok
[50.00%] ··· ========= ============
             --         n_features
             --------- ------------
              n_train      100
             ========= ============
               100000   1.68±0.01m
             ========= ============

[50.00%] · For scikit-learn commit 55af30d9 <main> (round 1/1):
[50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl..
[50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[100.00%] ··· argkmin_lrd.ArgKminLRDBenchmark.time_fit                             ok
[100.00%] ··· ========= ============
              --         n_features
              --------- ------------
               n_train      100
              ========= ============
                100000    1.68±0m
              ========= ============

BENCHMARKS NOT SIGNIFICANTLY CHANGED.
```

@Micky774
Contributor

After inspecting w/ py-spy it seems that, quite simply, the computational bottleneck of LocalOutlierFactor is the computation of kneighbors, which is already ArgKmin accelerated. The actual _local_reachability_density work comprises ~1.2% of runtime. There's not even much to optimize out, unfortunately.
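As a rough, self-contained way to reproduce this kind of observation without py-spy (a sketch using cProfile; the exact share will vary by machine and dataset):

```python
import cProfile
import pstats

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
X = rng.rand(2000, 20).astype(np.float32)

profiler = cProfile.Profile()
profiler.enable()
LocalOutlierFactor(n_neighbors=10).fit(X)
profiler.disable()

stats = pstats.Stats(profiler)
# Largest cumulative time spent in any function whose name mentions
# "kneighbors" -- a rough proxy for the neighbor-search share of fit().
kneighbors_time = max(
    (ct for (_, _, name), (_, _, _, ct, _) in stats.stats.items()
     if "kneighbors" in name),
    default=0.0,
)
share = kneighbors_time / stats.total_tt
print(f"kneighbors share of fit(): {share:.1%}")
```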

@jjerphan @OmarManzoor please do sanity check me on this, but afaik this is just ill-suited for further optimization via a dedicated backend. This is good information though, since it allows us to focus efforts in other areas -- and it also probably should set the precedent of confirming bottlenecks with an initial profiling pass rather than after the work has been done :)

Once you two are satisfied with this explanation, we can close this and move on.

@OmarManzoor
Contributor Author

> After inspecting w/ py-spy it seems that, quite simply, the computational bottleneck of LocalOutlierFactor is the computation of kneighbors, which is already ArgKmin accelerated. The actual _local_reachability_density work comprises ~1.2% of runtime. There's not even much to optimize out, unfortunately.
>
> @jjerphan @OmarManzoor please do sanity check me on this, but afaik this is just ill-suited for further optimization via a dedicated backend. This is good information though, since it allows us to focus efforts in other areas -- and it also probably should set the precedent of confirming bottlenecks with an initial profiling pass rather than after the work has been done :)
>
> Once you two are satisfied with this explanation, we can close this and move on.

Thank you for inspecting with py-spy. I think what you are saying makes sense and reinforces the observed benchmarks.

@OmarManzoor
Contributor Author

@jjerphan I think we can close this PR then?

@jjerphan
Member

Closing since no performance improvements can be reached.

Sorry, @OmarManzoor. 🤦‍♂️ I wish I had provided stronger evidence about the room for performance improvement beforehand, for instance with a profile of the current execution of the algorithms.

RadiusNeighbors.{predict, predict_proba} might be good candidates, yet we need to profile their execution to be sure dedicated backends will bring benefits.

I think I will reword the issue description in this regard.
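For whoever picks this up, a minimal harness for profiling those candidates might look like this (a sketch with arbitrary parameters, not an established benchmark):

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
y = rng.randint(0, 2, size=2000)

clf = RadiusNeighborsClassifier(radius=1.0).fit(X, y)

# predict/predict_proba are the candidate hotspots: wrap these calls with
# cProfile or py-spy to see whether radius_neighbors dominates the runtime.
pred = clf.predict(X[:100])
proba = clf.predict_proba(X[:100])
print(pred.shape, proba.shape)
```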

@jjerphan jjerphan closed this May 20, 2023
@OmarManzoor
Contributor Author

> Closing since no performance improvements can be reached.
>
> Sorry, @OmarManzoor. 🤦‍♂️ I wish I had provided stronger evidence about the room for performance improvement beforehand, for instance with a profile of the current execution of the algorithms.
>
> RadiusNeighbors.{predict, predict_proba} might be good candidates, yet we need to profile their execution to be sure dedicated backends will bring benefits.
>
> I think I will reword the issue description in this regard.

No, no, there is no need to apologize. It was a good learning experience. We can try out RadiusNeighbors.{predict, predict_proba} next, as you suggest. Thank you.

@OmarManzoor OmarManzoor deleted the pairwise_distance_backend_for_lof branch May 20, 2023 12:45
@Micky774
Contributor

Micky774 commented May 20, 2023

> No, no, there is no need to apologize. It was a good learning experience.

❤️

> We can try out RadiusNeighbors.{predict, predict_proba} next, as you suggest. Thank you.

I have updated #25888 to reflect the other ongoing/open work for finalizing the optimization of KNeighbors*.predict* in case you would be interested in that as well. Whatever you prefer though!
