MAINT Plug PairwiseDistancesArgKmin as a back-end #22214
Conversation
Here is a quick first pass
sklearn/metrics/pairwise.py (Outdated)

        )
        indices = indices.flatten()
    else:
        # TODO: once PairwiseDistancesArgKmin supports sparse input matrices and 32 bit,
Are there options to get 32-bit to work besides `Tempita`?
Apart from duplicating the code based on dtype in Cython, I can hardly see any other option. C++ template classes would come in handy here.
I guess we can start with Tempita first and see how much more complex it makes the code. What do the others think?
If it works with fused types in Cython, that would be great. If not, C++ and Tempita are both fine for me.
Fused types can't be used for attributes unfortunately (hence the need to use Tempita, similarly to what exists for WeightVectors).
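For illustration only (not code from this PR), here is a tiny Python sketch of what the Tempita step does at build time: it expands one template into a dtype-specialised class per floating-point type, which is needed precisely because Cython fused types cannot be used for cdef class attributes. The class name ArgKmin and its attribute are placeholders.

CYTHON_TEMPLATE = """
cdef class ArgKmin{suffix}:
    # The attribute needs a concrete C type (fused types are not allowed
    # here), hence one generated class per dtype.
    cdef const {c_type}[:, ::1] X
"""

# Expand the template once per supported dtype, as Tempita does at build time.
for suffix, c_type in [("64", "double"), ("32", "float")]:
    print(CYTHON_TEMPLATE.format(suffix=suffix, c_type=c_type))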
sklearn/neighbors/_base.py (Outdated)

             raise ValueError("p must be greater or equal to one for minkowski metric")

     def _fit(self, X, y=None):
         if self._get_tags()["requires_y"]:
             if not isinstance(X, (KDTree, BallTree, NeighborsBase)):
-                X, y = self._validate_data(X, y, accept_sparse="csr", multi_output=True)
+                X, y = self._validate_data(
+                    X, y, accept_sparse="csr", multi_output=True, order="C"
Note to other reviewers: this is required by the new Cython implementation. It triggers a potential memory copy in the rare cases where X is not already C-contiguous, even in cases where we cannot use the new Cython code (e.g. for float32 data, which is not supported yet).
But since the plan is to also write the Cython code for 32-bit floats in the future, let's keep the input data validation logic simpler by always enforcing contiguity from the start.
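For context (a sketch, not part of the PR), check_array is used below as a stand-in for _validate_data to show that requesting C order only copies inputs that are not already C-contiguous:

import numpy as np
from sklearn.utils import check_array

X_c = np.random.rand(5, 3)        # already C-contiguous
X_f = np.asfortranarray(X_c)      # Fortran-ordered copy of the same data

X_c_checked = check_array(X_c, order="C")   # no copy needed
X_f_checked = check_array(X_f, order="C")   # copied into a C-contiguous array

print(np.shares_memory(X_c, X_c_checked))   # True: passed through
print(np.shares_memory(X_f, X_f_checked))   # False: a C-ordered copy was made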
#22241 has been opened to fix a test failure.
Force-pushed from dce919f to 3448b01.
Looking over the tests, a bunch of the changes look like refactors to use pytest's parametrize. I think they can be broken out into their own PRs and merged directly into main.
The implementation looks good to me. The major change is having to set order="C" everywhere, which I think is okay.
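As an illustration of the kind of refactor mentioned above (a hypothetical example, not a test from this PR), a loop over metrics inside a single test body becomes a pytest parametrization:

import numpy as np
import pytest
from sklearn.neighbors import NearestNeighbors

# Each metric becomes its own test case instead of an iteration
# inside one test function.
@pytest.mark.parametrize("metric", ["euclidean", "manhattan", "chebyshev"])
def test_kneighbors_shapes(metric):
    X = np.random.RandomState(0).rand(20, 3)
    nn = NearestNeighbors(n_neighbors=3, metric=metric).fit(X)
    distances, indices = nn.kneighbors(X)
    assert distances.shape == indices.shape == (20, 3)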
sklearn/multioutput.py (Outdated)

@@ -399,7 +399,7 @@ class MultiOutputClassifier(ClassifierMixin, _MultiOutputEstimator):
     >>> X, y = make_multilabel_classification(n_classes=3, random_state=0)
     >>> clf = MultiOutputClassifier(KNeighborsClassifier()).fit(X, y)
     >>> clf.predict(X[-2:])
-    array([[1, 1, 0], [1, 1, 1]])
+    array([[1, 1, 1], [1, 1, 1]])
Does this mean the new implementation gives different results than the previous one?
test_neighbors_metrics normally tests the consistency of the three algorithms' results. Yet, in the specific case of this doc-test, the result is different.
I think the best approach is to test the new 'brute' back-end against the current implementation. I will come up with a test for this.
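A minimal sketch of such a consistency test (assumed name and data, not the test that was eventually added), comparing pairwise_distances_argmin against an exhaustive computation over the full distance matrix:

import numpy as np
from sklearn.metrics import pairwise_distances, pairwise_distances_argmin

def test_pairwise_distances_argmin_matches_exhaustive():
    # The fast back-end must return the same indices as a plain argmin
    # over the full distance matrix.
    rng = np.random.RandomState(0)
    X = rng.rand(50, 10)
    Y = rng.rand(30, 10)

    expected = pairwise_distances(X, Y, metric="euclidean").argmin(axis=1)
    result = pairwise_distances_argmin(X, Y, metric="euclidean")

    assert np.array_equal(result, expected)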
@thomasjpfan: To address your comments, some changes made in this PR have been extracted into #22280 and #22281.
Force-pushed from cb27fdf to 92a23ba.
Superseded by #22288.
Reference Issues/PRs
Part of #21462 (comment).
What does this implement/fix? Explain your changes.
This plugs PairwiseDistancesArgKmin as a back-end so that it is used by several APIs (a usage sketch follows the list), namely:
sklearn.metrics.pairwise_distances_argmin
sklearn.metrics.pairwise_distances_argmin_min
sklearn.cluster.AffinityPropagation
sklearn.cluster.Birch
sklearn.cluster.MeanShift
sklearn.cluster.OPTICS
sklearn.cluster.SpectralClustering
sklearn.feature_selection.mutual_info_regression
sklearn.neighbors.KNeighborsClassifier
sklearn.neighbors.KNeighborsRegressor
sklearn.neighbors.LocalOutlierFactor
sklearn.neighbors.NearestNeighbors
sklearn.manifold.Isomap
sklearn.manifold.LocallyLinearEmbedding
sklearn.manifold.TSNE
sklearn.manifold.trustworthiness
sklearn.semi_supervised.LabelPropagation
sklearn.semi_supervised.LabelSpreading
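For instance (an illustrative sketch, not taken from this PR), the public functions in the list keep their signature and behaviour; the new back-end is only used internally when the inputs are supported (e.g. dense float64 data):

import numpy as np
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
Y = rng.rand(20, 5)

# For each row of X: the index of the closest row of Y and the distance
# to it. PairwiseDistancesArgKmin is used under the hood; the public API
# is unchanged.
indices, distances = pairwise_distances_argmin_min(X, Y)
print(indices.shape, distances.shape)   # (100,) (100,)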
Existing tests are adapted and extended accordingly.
Any other comments?
Currently checking whether any original change has been left behind.