PERF PairwiseDistancesReductions subsequent work #25888

Open
3 of 21 tasks
jjerphan opened this issue Mar 17, 2023 · 54 comments
Labels
cython · Meta-issue (General issue associated to an identified list of tasks) · Performance

Comments

@jjerphan
Member

jjerphan commented Mar 17, 2023

PairwiseDistancesReductions have been introduced as a hierarchy of Cython classes implementing the back-ends of several scikit-learn algorithms.

Initial work was listed in #22587.

💡 See this presentation for a better understanding of the design of PairwiseDistancesReductions.

💡 Before working on improving performance, one must profile the current execution of the algorithms to check whether there are substantial benefits to be gained.

Subsequent possible work includes:

github-actions bot added the Needs Triage (Issue requires triage) label on Mar 17, 2023
jjerphan added the cython, Long Term, Meta-issue (General issue associated to an identified list of tasks), and Performance labels and removed the Needs Triage label on Mar 17, 2023
@jjerphan
Member Author

@OmarManzoor: you also might be interested in items of this issue. 🙂

@OmarManzoor
Contributor

@OmarManzoor: you also might be interested in items of this issue. 🙂

Nice. Thanks!

@adam2392
Member

  • Support "precomputed" distances

Looking into this...

@jjerphan
Member Author

I started working on this item with jjerphan@78b3aa4.

The logic of PairwiseDistancesReductions has to be adapted for metric="precomputed". Feel free to take an alternative approach. 🙂

@OmarManzoor
Contributor

@jjerphan Would the remaining items under "Implement back-end for other Estimators or Transformers" require a significant amount of work?

@jjerphan
Member Author

jjerphan commented Mar 20, 2023

Basically, some of the interfaces whose performance was improved in 1.1 with ArgKmin and RadiusNeighbors (via kneighbors and radius_neighbors) might be optimizable with dedicated back-ends.

Hence, I think this requires a significant amount of work, yes.

Edit: for now, I have added a link to a presentation explaining the design of those implementations to the description of this issue. I think I will write more about the design (cc @adam2392).

@OmarManzoor
Contributor

Basically, some of the interfaces whose performance was improved in 1.1 with ArgKmin and RadiusNeighbors (via kneighbors and radius_neighbors) might be optimizable with dedicated back-ends.

Hence, I think this requires a significant amount of work, yes.

Edit: for now, I have added a link to a presentation explaining the design of those implementations to the description of this issue. I think I will write more about the design (cc @adam2392).

Thank you for sharing the presentation.

@jjerphan
Member Author

You are welcome.

@OmarManzoor
Contributor

You are welcome.

This looks interesting but I might need time to understand it properly.

@jjerphan
Member Author

jjerphan commented Mar 20, 2023

This presentation is a bit out of date.

You might also want to have a look at the in-code documentation here:

# Overview
# --------
#
# This module provides routines to compute pairwise distances between a set
# of row vectors of X and another set of row vectors of Y and apply a
# reduction on top. The canonical example is the brute-force computation
# of the top k nearest neighbors by leveraging the arg-k-min reduction.
#
# The reduction takes a matrix of pairwise distances between rows of X and Y
# as input and outputs an aggregate data-structure for each row of X. The
# aggregate values are typically smaller than the number of rows in Y, hence
# the term reduction.
#
# For computational reasons, the reductions are performed on the fly on chunks
# of rows of X and Y so as to keep intermediate data-structures in CPU cache
# and avoid unnecessary round trips of large distance arrays with RAM,
# which would otherwise severely degrade speed by making the overall
# processing memory-bound.
#
# Finally, the routines follow a generic parallelization template to process
# chunks of data with OpenMP loops (via Cython prange), either on rows of X
# or rows of Y depending on their respective sizes.
#
#
# Dispatching to specialized implementations
# ------------------------------------------
#
# Dispatchers are meant to be used in the Python code. Under the hood, a
# dispatcher must only define the logic to choose, at runtime, the correct
# dtype-specialized :class:`BaseDistancesReductionDispatcher` implementation based
# on the dtype of X and of Y.
#
#
# High-level diagram
# ------------------
#
# Legend:
#
# A ---⊳ B: A inherits from B
# A ---x B: A dispatches to B
#
#
#                        (base dispatcher)
#               BaseDistancesReductionDispatcher
#                              ∆
#                              |
#                              |
#         +--------------------+--------------------+
#         |                                         |
#    (dispatcher)                              (dispatcher)
#      ArgKmin                               RadiusNeighbors
#         |                                         |
#         |                                         |
#         |      (float{32,64} implem.)             |
#         |   BaseDistancesReduction{32,64}         |
#         |                ∆                        |
#         |                |                        |
#         |                |                        |
#         |      +---------+----------+             |
#         |      |                    |             |
#         |      |                    |             |
#         x      |                    |             x
#     ArgKmin{32,64}            RadiusNeighbors{32,64}
#         |      ∆                    ∆             |
#         |      |                    |             |
#  ===================== Specializations ====================
#         |      |                    |             |
#         |      |                    |             |
#         x      |                    |             x
#  EuclideanArgKmin{32,64}     EuclideanRadiusNeighbors{32,64}
#
# For instance :class:`ArgKmin` dispatches to:
# - :class:`ArgKmin64` if X and Y are two `float64` array-likes
# - :class:`ArgKmin32` if X and Y are two `float32` array-likes
#
# In addition, if the metric parameter is set to "euclidean" or "sqeuclidean",
# then `ArgKmin{32,64}` further dispatches to `EuclideanArgKmin{32,64}`. For
# example, :class:`ArgKmin64` would dispatch to :class:`EuclideanArgKmin64`, a
# specialized subclass that optimally handles the Euclidean distance case
# using Generalized Matrix Multiplication over `float64` data (see the
# docstring of :class:`GEMMTermComputer64` for details).
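For intuition, here is a minimal NumPy sketch of the arg-k-min reduction described in the overview above (illustrative only; the actual implementation performs this per chunk, in Cython, without materializing the full distance matrix):

import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
X = rng.standard_normal((5, 3))    # query vectors
Y = rng.standard_normal((20, 3))   # database vectors

# Naive arg-k-min: for each row of X, the indices of its k closest rows of Y.
k = 3
D = pairwise_distances(X, Y)                    # shape (n_X, n_Y)
argkmin_indices = np.argsort(D, axis=1)[:, :k]  # shape (n_X, k), the reduced output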

@OmarManzoor
Contributor

This presentation is a bit out of date.

You might also want to have a look at the in-code documentation here: [...]

I think I'll have a look at this after wrapping up the binary tree PR since I have already started it. What do you think?

@jjerphan
Member Author

Yes, let's probably finish #25914 first.

@OmarManzoor
Contributor

@jjerphan I have started looking at LocalOutlierFactor. The method kneighbors is being used within fit and score_samples. Do we need to try out a dedicated backend for both these cases?

@jjerphan
Member Author

Yes.

Similarly to ArgKminClassMode, which was introduced for KNeighbors*.predict_proba by @Micky774, we could imagine back-ends for LocalOutlierFactor.{fit,score_samples}.

@OmarManzoor
Contributor

OmarManzoor commented Mar 31, 2023

@jjerphan At this point we actually need the distances to all of the points in order to extract the distances of the neighbors.

# k-distance of each neighbor, taken from the distances stored during fit
dist_k = self._distances_fit_X_[neighbors_indices, self.n_neighbors_ - 1]
# reachability distance: max(d(query, neighbor), k-distance(neighbor))
reach_dist_array = np.maximum(distances_X, dist_k)

Where would this be available in pairwise distances?

@jjerphan
Member Author

jjerphan commented Mar 31, 2023

I do not think we need to compute and store the distance matrices.

The new back-end would change the implementation so that self._distances_fit_X_ returned by kneighbors would not need to be stored: kneighbors, _local_reachability_density and the computation of lrd_ratios_array should be replaceable by a new back-end with logic similar to that of ArgKmin.

This might be a sizeable rework, and I have not been able to look at it yet, but I think it is feasible.

Let me know if this is clear or not.
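For illustration only, here is a NumPy sketch of the per-query reduction such a back-end could fuse; the function name and signature are hypothetical, not scikit-learn API:

import numpy as np

def local_reachability_density_of_query(dist_row, neigh_idx, kdist_fit):
    # dist_row  : distances from one query point to its k nearest training points
    # neigh_idx : indices of those k training points
    # kdist_fit : precomputed k-distance of every training point
    # reachability distance: max(d(query, neighbor), k-distance(neighbor))
    reach_dist = np.maximum(dist_row, kdist_fit[neigh_idx])
    return 1.0 / (np.mean(reach_dist) + 1e-10)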

@OmarManzoor
Contributor

OmarManzoor commented May 30, 2023

Hi @jjerphan , @Micky774

If we want to optimize RadiusNeighborsClassifier's predict_proba and predict methods, should we run some profiling first? This does seem similar to KNeighborsClassifier's predict in terms of how the methods are called.

@jjerphan
Member Author

I would run some profiling ahead of implementations, yes.

@OmarManzoor
Contributor

I would run some profiling ahead of implementations, yes.

Could you kindly guide me on how to run such profiling? You mentioned that we can use py-spy, but should I create a Python script to run the classifier's desired methods under py-spy?

@Micky774
Contributor

Micky774 commented May 30, 2023

Could you kindly guide me on how to run such profiling? You mentioned that we can use py-spy, but should I create a Python script to run the classifier's desired methods under py-spy?

Yes, basically. Once you create a Python script whose main time bottleneck is the code section you want to profile (ideally, the majority of the time is spent in predict_proba), you can run py-spy record via the CLI to analyze it.

In particular you'll want to run something like py-spy record --native -o py-spy.profile -f speedscope -- python <script to profile>

This ensures that native C/Cython functions are included in the profile, and it will output a speedscope-compatible profile report. Note that this is not the only output format you can use, but it is especially convenient for general profiling.
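As a concrete, hypothetical example, a script like the following keeps predict_proba as the dominant cost (the dataset sizes and radius are arbitrary and only meant to make the profile readable):

# profile_radius_predict_proba.py (hypothetical file name)
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.standard_normal((20_000, 10))
y = rng.randint(0, 3, size=X.shape[0])

# outlier_label avoids errors if a query has no neighbors within the radius.
clf = RadiusNeighborsClassifier(radius=4.0, outlier_label="most_frequent").fit(X, y)

# Repeat the call so that predict_proba dominates the flame graph.
for _ in range(10):
    clf.predict_proba(X[:1_000])

You can then run it under the py-spy command above and open the resulting file in speedscope.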

@kyrajeep

I thought supporting "precomputed" distances would be a good task to start with, but after reading the discussion I am not sure it is something I can do. Does that part have to be reorganized? Perhaps I should start with a different estimator?

@jjerphan
Member Author

jjerphan commented Jun 17, 2024

Hi @kyrajeep ,

Introducing the support for precomputed distance matrices is definitely something you can do. 👍

Note that @adam2392 started this work with #26032. I would recommend discussing the state of this PR before starting anything new.

@kyrajeep

Hello @jjerphan @OmarManzoor @ogrisel,
I decided to look at other estimators since the precomputed option does not seem to require too much more work.

I have some questions regarding sklearn/neighbors. It seems like _kd_tree.pyx.tp and _ball_tree.pyx.tp are generated from _binary_tree.pxi.tp. Is that the case?

For _graph.py in neighbors, I do not see a .pyx file, so I created one and wrote a little bit of code to get some clarity. There is a lot of prior discussion to catch up on for this topic, which confuses me sometimes. :) I'd like to keep working on _graph.pyx.

What was your and others' workflow for _kd_tree.pyx.tp and _binary_tree.pxi.tp? Thanks so much!

@jjerphan
Member Author

jjerphan commented Jun 25, 2024

I decided to look at other estimators since the precomputed option does not seem to require too much more work.

If we want to do it correctly, it does require a fairly large amount of work for a first contribution (we need to adapt part of the class hierarchy).

I have some questions regarding sklearn/neighbors. It seems like _kd_tree.pyx.tp and _ball_tree.pyx.tp are generated from _binary_tree.pxi.tp. Is that the case?

The latter is included in the first two after Tempita expansion.

@jjerphan
Member Author

jjerphan commented Jun 25, 2024

What was your and others workflow for _kd_tree.pyx.tp and _binary_tree.pxi.tp? Thanks so much!

I would recommend starting by using Cython in a Jupyter notebook.

@kyrajeep

Great, thanks for letting me know. I will take another look and reach out to the previous contributor. That looks like a good place to start!

@kyrajeep

kyrajeep commented Jun 27, 2024

I have not gotten a reply about the status of PR #26032 yet. How should I proceed? Should I wait longer?

@jjerphan
Member Author

Responses in open-source projects sometimes take weeks. You can start a draft PR and mention it there. :)

@adam2392
Member

Hi, feel free to move the work along, @kyrajeep.

I never got around to figuring out how the tempita cython code works.

Found it harder than reading C++ 😝. Hope to revisit this issue though to contribute down the line.

@jjerphan
Member Author

I never got around to figuring out how the tempita cython code works.
Found it harder than reading C++ 😝

Some maintainers share this perspective, some do not.

I think there would be benefits to using something other than Tempita, but that would take some time and mental bandwidth (which I personally don't have).

@OmarManzoor
Contributor

Plus I think Tempita is used quite a lot now in the project so it seems like a reasonable standard.

@kyrajeep

kyrajeep commented Jun 27, 2024

Thank you for the update, @adam2392. @jjerphan, do you have a suggestion for where to start with Tempita and Cython? I did the short Cython tutorial linked above on my local MacBook M1. It was a simple .pyx script with a setup.py.

I have never used tempita before. Thank you 😊

@jjerphan
Member Author

Tempita is a micro pre-processor for Cython source files. We use it to work around the absence of template classes in Cython, so that we can handle both float32 and float64 while maintaining a single implementation.

I would recommend writing the float64 implementation first. Porting an implementation to float32 with Tempita is then straightforward (e.g. see this simple implementation).

Regarding getting started, I hope the instructions regarding Cython in the documentation of scikit-learn and in scikit-learn's tooling are clear and helpful enough.
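For reference, here is a minimal sketch of the Tempita pattern used in the .pyx.tp files (illustrative, not an excerpt from scikit-learn; the real files use scikit-learn's own typedefs):

{{py:
# dtype-specific values substituted by Tempita before Cython sees the file
implementation_specific_values = [
    ('64', 'double'),
    ('32', 'float'),
]
}}

{{for name_suffix, c_type in implementation_specific_values}}

cdef class Squarer{{name_suffix}}:
    # Toy example: one Cython extension type is generated per floating-point dtype.
    cdef {{c_type}} value

    def __init__(self, {{c_type}} value):
        self.value = value

    def square(self):
        return self.value * self.value

{{endfor}}

After Tempita expansion this yields two classes, Squarer64 and Squarer32, from a single template.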

@kyrajeep

Thanks! Yes, this is a great starting point.

@kyrajeep

kyrajeep commented Jul 2, 2024

I started to work on the issue but realized I did not know what 'precomputed' meant exactly. Does it mean that the user already knows the distance between the two matrices? If so, does that chunk of data bypass computing the distance altogether? Thank you.

@Micky774
Contributor

Micky774 commented Jul 3, 2024

I started to work on the issue but realized I did not know what 'precomputed' meant exactly. Does it mean that the user already knows the distance between the two matrices?

@kyrajeep Yes. In that sense, you have an array D where entry D[i, j] is the distance between the points X[i] and X[j].

If so, does that chunk of data bypass computing the distance altogether? Thank you.

Yes. The goal is to have something that "looks" like the other PairwiseDistanceReductions, but behaves in a very special way. That way, code that expects such an object can be kept the same, but benefits from not having to do any new computation. So, if we normally expect a PairwiseDistanceReduction to "look" like a metric, then we want our special precomputed version to have the same API and usage, but without computation since we already have the answers and just need to "look up" the answers and return them.
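For context, here is how a precomputed distance matrix is already consumed through the existing public API (a minimal sketch; the goal of this item is to provide the same behaviour through the PairwiseDistancesReductions back-end):

import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).standard_normal((100, 5))

# D[i, j] is the (already known) distance between X[i] and X[j].
D = pairwise_distances(X, metric="euclidean")

# With metric="precomputed", the estimator looks distances up in D instead of
# recomputing them from the raw features.
nn = NearestNeighbors(n_neighbors=3, metric="precomputed").fit(D)
distances, indices = nn.kneighbors(D)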

I'd strongly recommend starting with the work that @adam2392 has done in #26032 and taking over the PR (while preserving them as a co-author). This means checking out those changes, including them in your own new branch, and finishing whatever is left to be done, instead of starting from scratch.

@kyrajeep

kyrajeep commented Jul 5, 2024

Thank you, @Micky774 for the clarification. I ran 'gh pr checkout 26032' to have a local copy of the PR by @adam2392. As scikit-learn has other commits, how should I fetch/pull from the remote repo? I have tried this in the past with confusing results at times so I'd appreciate your input. Thank you.

@Micky774
Contributor

Micky774 commented Jul 5, 2024

scikit-learn has other commits, how should I fetch/pull from the remote repo? I have tried this in the past with confusing results at times so I'd appreciate your input.

In general you should be able to cleanly merge changes from the upstream main branch (except maybe a few simple conflicts here and there). So e.g.

git checkout <PR branch>
git merge main   # assumes your local main is in sync with the upstream scikit-learn main

if there are big conflicts, they can be analyzed individually

@OmarManzoor
Contributor

OmarManzoor commented Jul 22, 2024

Hi @jjerphan

I was experimenting with mst_linkage_core, and it seems that if we want to adapt a pairwise distances implementation for this function, we need to compute all the distances before handling the actual conditional code in the inner loop. This is because I don't think it is possible to share variables like current_node, new_distance and new_node between different threads.

On the other hand, the issue with computing the complete distance matrix is that it would require considerably more memory for larger data sizes, even if it allows us to speed up the code.

What do you think about this?
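For illustration, here is a simplified NumPy sketch of the Prim-style loop in question (schematic only, not the actual mst_linkage_core Cython code): each iteration depends on the node selected by the previous one, which is the shared state mentioned above.

import numpy as np

def mst_prim_sketch(X):
    n = X.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    d = np.full(n, np.inf)   # best known distance from each node to the growing tree
    current_node = 0
    edges = []
    for _ in range(n - 1):
        in_tree[current_node] = True
        # Distances from the newly added node to all others: this row could be
        # computed in parallel / by chunks ...
        row = np.linalg.norm(X - X[current_node], axis=1)
        d = np.where(in_tree, np.inf, np.minimum(d, row))
        # ... but selecting the next node is an argmin over the shared state,
        # which makes the outer loop inherently sequential.
        new_node = int(np.argmin(d))
        new_distance = float(d[new_node])
        edges.append((current_node, new_node, new_distance))
        current_node = new_node
    return edges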

@jjerphan
Member Author

Hi Omar,

I need to have a look at it in detail to be able to tell.

@OmarManzoor
Contributor

I need to have a look at it in detail to be able to tell.

Sure

@jjerphan
Member Author

I do not have time to have a look at it, unfortunately.

@OmarManzoor
Contributor

I do not have time to have a look at it, unfortunately.

No problem at all.

@kyrajeep

kyrajeep commented Nov 18, 2024

The tests have been added for #29483 and I believe the code is ready for a review! Sorry it took a while!

@kyrajeep

kyrajeep commented Dec 9, 2024

@jjerphan, the review comments for #29483 have been addressed!
