FIX wminkowski removed in 1.8.0.dev0 #21741

ogrisel · 2021-11-22T14:24:08Z

This should fix the scipy-dev nightly build failure:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=35203&view=logs&j=dfe99b15-50db-5d7b-b1e9-4105c42527cf&t=ef785ae2-496b-5b02-9f0e-07a6c3ab3081

________________________ test_cdist[X10-X20-wminkowski] ________________________
[gw0] linux -- Python 3.10.0 /usr/share/miniconda/envs/testvenv/bin/python

[...]
        elif isinstance(metric, str):
            mstr = metric.lower()
            metric_info = _METRIC_ALIAS.get(mstr, None)
            if metric_info is not None:
                cdist_fn = metric_info.cdist_func
                return cdist_fn(XA, XB, out=out, **kwargs)
            elif mstr.startswith("test_"):
                metric_info = _TEST_METRICS.get(mstr, None)
                if metric_info is None:
                    raise ValueError(f'Unknown "Test" Distance Metric: {mstr[5:]}')
                XA, XB, typ, kwargs = _validate_cdist_input(
                    XA, XB, mA, mB, n, metric_info, **kwargs)
                return _cdist_callable(
                    XA, XB, metric=metric_info.dist_func, out=out, **kwargs)
            else:
>               raise ValueError('Unknown Distance Metric: %s' % mstr)
E               ValueError: Unknown Distance Metric: wminkowski

I tested locally and it seems to work as expected.

EDIT: to be consistent with scipy 1.8 overall, it was also necessary to make it possible to pass a w parameter when metric="minkowski". In that case, the weights correspond to the old wminkowski weights raised to the power p, as in scipy 1.8.

ogrisel · 2021-11-22T14:26:08Z

There is no need for backport because it will only impact the nightly build before the release of 1.8.0 final, and only the scikit-learn main branch.

glemaitre

LGTM. Does it mean that parse_version does not parse dev and rc numbers?

ogrisel · 2021-11-22T15:22:20Z

LGTM. Does it mean that parse_version does not parse dev and rc numbers?

parse_version works correctly, it's just that:

parse_version("1.7.2") < parse_version("1.8.0.dev0") < parse_version("1.8.0.dev0.whatever") < parse_version("1.8.0")

as expected. It's just our code that was wrong since the wminkowski has been removed in the dev version, after any 1.7.x but before 1.8.0 and it crashes the nightly build that runs against 1.8.0.dev0.something.
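The ordering described above can be checked with a small sketch. It assumes the third-party `packaging` library is installed (scikit-learn's own `parse_version` helper is not used here), and the `+git...` suffix of the nightly version string is made up for illustration.

```python
from packaging.version import parse as parse_version

# Hypothetical nightly version string; the "+git..." local suffix is made up.
nightly = parse_version("1.8.0.dev0+git20211122.abc1234")

# Dev builds sort after every 1.7.x release but before the final 1.8.0.
assert (
    parse_version("1.7.2")
    < parse_version("1.8.0.dev0")
    < nightly
    < parse_version("1.8.0")
)

# A guard written against the final release does not trigger on the nightly...
guard_final = nightly >= parse_version("1.8.0")
# ...while a guard written against the first dev version does.
guard_dev = nightly >= parse_version("1.8.0.dev0")

print(guard_final, guard_dev)
```

This is why a check such as `sp_version >= parse_version("1.8.0")` lets the removed metric through on a nightly build.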

glemaitre · 2021-11-22T15:34:45Z

as expected. It's just our code that was wrong since the wminkowski has been removed in the dev version, after any 1.7.x but before 1.8.0 and it crashes the nightly build that runs against 1.8.0.dev0.something.

Right. Is parse_version doing some loose checking for parse_version("1.8")?

adrinjalali · 2021-11-22T15:53:19Z

It'd be nice to do a scipy dev commit before merging

ogrisel · 2021-11-22T16:11:32Z

It'd be nice to do a scipy dev commit before merging

Forgot we could do this ;)

ogrisel · 2021-11-22T16:39:21Z

Hum, I realize that there are plenty of other failures in the original log caused by a new DeprecationWarning in np.percentile:

E       DeprecationWarning: the `interpolation=` argument to percentile was renamed to `method=`, which has additional options.
E       Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they. (Deprecated NumPy 1.22)

ogrisel · 2021-11-22T17:21:10Z

There is another failure in test_neighbors_metrics related to wminkowski's removal. I will fix it as part of this PR. The other failures should best be tackled in a dedicated PR.

ogrisel · 2021-11-23T10:40:57Z

I pushed a new commit to make the code work consistently with scipy 1.8 while still being backward compatible with the scipy 1.7 and earlier parametrization. I think we need to backport this to 1.0.2 to make that release compatible with scipy 1.8, which might be released at approximately the same time and very probably before we release 1.1.

sklearn/metrics/_dist_metrics.pyx

adrinjalali

Otherwise LGTM.

This error is making it hard to see if there's any relevant error in the logs though:

E       DeprecationWarning: the `interpolation=` argument to percentile was renamed to `method=`, which has additional options.
E       Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they. (Deprecated NumPy 1.22)

adrinjalali · 2021-11-23T13:17:34Z

sklearn/metrics/_dist_metrics.pyx

+                # WMinkowskiDistance in sklearn implements the weighting
+                # scheme of the old 'wminkowski' in scipy < 1.8 and the
+                # following adaptation:
+                return WMinkowskiDistance(p, w ** (1/p), **kwargs)


The issue then is that the user can create a WMinkowskiDistance(p, q, **kwargs) themselves, which would be different from get_metric("wminkowski", p, w), and kinda similar to get_metric("minkowski", p, w), but with a transformation on w. I think this can cause quite a bit of confusion, and I'm not sure how to solve it.

If we remove "wminkowski", then it'd be more consistent I guess

I can rewrite MinkowskiDistance and WMinkowskiDistance to implement the new scheme in MinkowskiDistance directly and make WMinkowskiDistance inherit from MinkowskiDistance with the transformation of the weight in its constructor and leave DistanceMetric.get_metric to be more straightforward.

And then later we can deprecate "wminkowski" (but probably after a year or two, once scipy 1.8 is commonly installed).

I think it will be easier to do this change once we no longer support versions of scipy (pre-1.6) where wminkowski is not deprecated.

I think this PR is good enough for now. WDYT?

I think this PR is good enough for now. WDYT?

I agree with you and believe the steps you gave in the first comment are proper ones to take in another PR.
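The refactoring proposed in this exchange could look roughly like the following plain-Python sketch. The class names `MinkowskiSketch` and `WMinkowskiSketch` are hypothetical stand-ins, not the Cython classes in `_dist_metrics.pyx`, and the weights are assumed to be non-negative.

```python
import math


class MinkowskiSketch:
    """Hypothetical stand-in computing sum(w * |x - y|^p)^(1/p)."""

    def __init__(self, p, w=None):
        self.p = p
        self.w = w

    def dist(self, x, y):
        # Without weights, every coordinate counts with weight 1.
        w = self.w if self.w is not None else [1.0] * len(x)
        total = sum(wi * abs(xi - yi) ** self.p for xi, yi, wi in zip(x, y, w))
        return total ** (1.0 / self.p)


class WMinkowskiSketch(MinkowskiSketch):
    """Hypothetical legacy wrapper computing sum(|w * (x - y)|^p)^(1/p)."""

    def __init__(self, p, w):
        # Legacy weights enter the sum as w^p, so convert them once here
        # (assumes non-negative weights).
        super().__init__(p, [wi ** p for wi in w])


x = [0.2, 0.5, 0.9]
y = [0.7, 0.1, 0.4]
w = [0.5, 2.0, 1.5]
p = 3

legacy = WMinkowskiSketch(p, w).dist(x, y)
# Direct evaluation of the legacy formula for comparison.
direct = sum(abs(wi * (xi - yi)) ** p for xi, yi, wi in zip(x, y, w)) ** (1.0 / p)
print(math.isclose(legacy, direct))
```

With this layout the weight conversion lives in one constructor, and the metric lookup would not need to transform w itself.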

I'm looking at the metric across the board, and here's what I see:

_dist_metrics.pyx:558-594

# W-Minkowski Distance
# d = sum(w_i^p * (x_i^p - y_i^p)) ^ (1/p)
cdef class WMinkowskiDistance(DistanceMetric):
    r"""Weighted Minkowski Distance

    .. math::
       D(x, y) = [\sum_i |w_i * (x_i - y_i)|^p] ^ (1/p)

the comment and the docstring are inconsistent, but the docstring seems to be consistent with the implementation:

for j in range(size):
    d += pow(self.vec[j] * fabs(x1[j] - x2[j]), self.p)

Now this PR is making this change (on the minkowski part):

"minkowski" MinkowskiDistance p, w ``sum(w * |x - y|^p)^(1/p)`` "wminkowski" WMinkowskiDistance p, w ``sum(|w * (x - y)|^p)^(1/p)``

and the two are just different and not consistent with one another.

Since scipy has deprecated wminkowski already for a while, and now it's removed, I'd say it makes sense for this PR to deprecate WMinkowskiDistance, and make everything consistent with scipy? Especially since from what I see, it seems to be the case that we also have been a little bit flaky on how we incorporate w there, and now is a good time to just stick to one model.

Actually, if one of the components of w is negative, WMinkowskiDistance's call to pow might crash.

Edit: actually, this gives nan on main:

from sklearn.neighbors import DistanceMetric
import numpy as np

rng = np.random.RandomState(1)
X = rng.rand(10, 100)
w = rng.rand(X.shape[1],)
w[0] = -1337
dist = DistanceMetric.get_metric("wminkowski", p=3, w=w)

# This produces nan
dm = dist.pairwise(X)
print(dm)

[[ 0. nan nan nan nan nan nan nan nan nan]
 [nan 0. nan nan nan nan nan nan nan nan]
 [nan nan 0. nan nan nan nan nan nan nan]
 [nan nan nan 0. nan nan nan nan nan nan]
 [nan nan nan nan 0. nan nan nan nan nan]
 [nan nan nan nan nan 0. nan nan nan nan]
 [nan nan nan nan nan nan 0. nan nan nan]
 [nan nan nan nan nan nan nan 0. nan nan]
 [nan nan nan nan nan nan nan nan 0. nan]
 [nan nan nan nan nan nan nan nan nan 0.]]
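One plausible explanation for the nan values, sketched in plain Python rather than taken from the Cython code: with an odd p, a large negative weight makes the accumulated sum negative, and a real p-th root of a negative number is undefined. The accumulation mirrors the loop quoted earlier; the final root step and the helper name `wminkowski_inner_sum` are assumptions made for illustration.

```python
import math


def wminkowski_inner_sum(x, y, p, w):
    # Same accumulation as the quoted loop: d += pow(w_j * |x_j - y_j|, p)
    return sum((wi * abs(xi - yi)) ** p for xi, yi, wi in zip(x, y, w))


x = [0.2, 0.5, 0.9]
y = [0.7, 0.1, 0.4]
w = [-1337.0, 0.3, 0.8]  # one negative weight, as in the report
p = 3

inner = wminkowski_inner_sum(x, y, p, w)

# C's pow returns nan for a negative base with a fractional exponent;
# Python's math.pow raises ValueError instead, so map that to nan here.
try:
    root = math.pow(inner, 1.0 / p)
except ValueError:
    root = math.nan

print(inner < 0, math.isnan(root))
```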

Since scipy has deprecated wminkowski already for a while, and now it's removed, I'd say it makes sense for this PR to deprecate WMinkowskiDistance, and make everything consistent with scipy? Especially since from what I see, it seems to be the case that we also have been a little bit flaky on how we incorporate w there, and now is a good time to just stick to one model.

I intended this PR to be minimal so as to be backportable to 1.0.2.

I would rather not deprecate a scikit-learn component in a scikit-learn minor release, even if that component maps more or less directly to a deprecated component in a dependency.

I can do the proposed change but I think it should target 1.1 and then 1.0.2 will have 2 broken tests under scipy 1.8 but maybe we don't care. Or maybe we care because it can make the release process automation fail...

Actually, if one of the components of w is negative, WMinkowskiDistance's call to pow might crash.

I don't think we care about negative weights. We could raise a better error message but this is out of the scope of this PR.

the comment and the docstring are inconsistent, but the docstring seems to be consistent with the implementation

The inline comment is just plain wrong. Let me delete it, it's just useless.

Now this PR is making this change (on the minkowski part):
"minkowski" MinkowskiDistance p, w sum(w * |x - y|^p)^(1/p)
"wminkowski" WMinkowskiDistance p, w sum(|w * (x - y)|^p)^(1/p)
and the two are just different and not consistent with one another.

This is correct. DistanceMetric.get_metric("minkowski", p, w) is not the same as DistanceMetric.get_metric("wminkowski", p, w) as you need to take the p-root or p-power of one w to get the equivalent w of the other. This is all consistent with what scipy does.
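The relationship between the two weightings can be checked numerically. This is a minimal plain-Python sketch of the two formulas from the table quoted above, not the scikit-learn or scipy implementation; the function names and sample values are made up for illustration.

```python
import math


def minkowski_weighted(x, y, p, w):
    """sum(w * |x - y|^p)^(1/p), the "minkowski" with w convention."""
    return sum(wi * abs(xi - yi) ** p for xi, yi, wi in zip(x, y, w)) ** (1.0 / p)


def wminkowski_legacy(x, y, p, w):
    """sum(|w * (x - y)|^p)^(1/p), the old "wminkowski" convention."""
    return sum(abs(wi * (xi - yi)) ** p for xi, yi, wi in zip(x, y, w)) ** (1.0 / p)


x = [0.2, 0.5, 0.9]
y = [0.7, 0.1, 0.4]
w = [0.5, 2.0, 1.5]  # arbitrary non-negative weights
p = 3

d_new = minkowski_weighted(x, y, p, w)
# Taking the p-th root of the weights recovers the same distance...
d_old = wminkowski_legacy(x, y, p, [wi ** (1.0 / p) for wi in w])
# ...while reusing the weights unchanged does not.
d_naive = wminkowski_legacy(x, y, p, w)

print(math.isclose(d_new, d_old), math.isclose(d_new, d_naive))
```

The same conversion appears in the reviewed hunk as `w ** (1/p)`.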

jjerphan

LGTM. Thank you, @ogrisel.

Some minor suggestions.

sklearn/metrics/_dist_metrics.pyx

jjerphan · 2021-11-23T16:01:14Z

sklearn/metrics/_dist_metrics.pyx

+                # WMinkowskiDistance in sklearn implements the weighting
+                # scheme of the old 'wminkowski' in scipy < 1.8 and the
+                # following adaptation:
+                return WMinkowskiDistance(p, w ** (1/p), **kwargs)


I think this PR is good enough for now. WDYT?

I agree with you and believe the steps you gave in the first comment are proper ones to take in another PR.

sklearn/metrics/tests/test_dist_metrics.py

jjerphan · 2021-11-23T16:09:26Z

sklearn/metrics/tests/test_dist_metrics.py

+def test_cdist(metric_param_grid, X1, X2):
+    metric, param_grid = metric_param_grid
+    keys = param_grid.keys()
+    for vals in itertools.product(*param_grid.values()):
        kwargs = dict(zip(keys, vals))
        if metric == "mahalanobis":
            # See: https://github.com/scipy/scipy/issues/13861
-            pytest.xfail("scipy#13861: cdist with 'mahalanobis' fails onmemmap data")
-        elif metric == "wminkowski":
-            if sp_version >= parse_version("1.8.0"):
-                pytest.skip("wminkowski will be removed in SciPy 1.8.0")
-
-            # wminkoski is deprecated in SciPy 1.6.0 and removed in 1.8.0
-            ExceptionToAssert = None
-            if sp_version >= parse_version("1.6.0"):
-                ExceptionToAssert = DeprecationWarning
-            with pytest.warns(ExceptionToAssert):
-                D_true = cdist(X1, X2, metric, **kwargs)
-        else:
-            D_true = cdist(X1, X2, metric, **kwargs)
-
-        check_cdist(metric, kwargs, D_true)
+            # Possibly caused by: https://github.com/joblib/joblib/issues/563
+            pytest.xfail(
+                "scipy#13861: cdist with 'mahalanobis' fails on joblib memmap data"
+            )
+        check_cdist(metric, kwargs, X1, X2)


This test is way clearer now: thanks 👍

adrinjalali

I agree that this is suitable for the minor release, but it'd be nice if you could do a follow up PR for 1.1 :)

ogrisel · 2021-11-23T18:02:26Z

I agree that this is suitable for the minor release, but it'd be nice if you could do a follow up PR for 1.1 :)

I opened #21765 ;)

MAINT wminkowski removed in 1.8.0.dev0

7c88bec

github-actions bot added the module:metrics label Nov 22, 2021

ogrisel added the Build / CI label Nov 22, 2021

ogrisel added this to the 1.0.2 milestone Nov 22, 2021

ogrisel added the To backport label Nov 22, 2021

ogrisel removed this from the 1.0.2 milestone Nov 22, 2021

ogrisel removed the To backport label Nov 22, 2021

ogrisel added the No Changelog Needed label Nov 22, 2021

glemaitre reviewed Nov 22, 2021

adrinjalali approved these changes Nov 22, 2021

[scipy-dev]

6caa195

Fix compat for minkowski in scipy 1.8 with backward compat for wminkowski

8c95060

github-actions bot added the cython label Nov 23, 2021

ogrisel added the To backport label Nov 23, 2021

ogrisel added this to the 1.0.2 milestone Nov 23, 2021

ogrisel changed the title ~~MAINT wminkowski removed in 1.8.0.dev0~~ FIX wminkowski removed in 1.8.0.dev0 Nov 23, 2021

[scipy-dev]

b5526dc

ogrisel removed the No Changelog Needed label Nov 23, 2021

ogrisel commented Nov 23, 2021

ogrisel requested a review from adrinjalali November 23, 2021 11:02

adrinjalali reviewed Nov 23, 2021

jjerphan approved these changes Nov 23, 2021

Typos and phrasing from code review

d57efcb

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

adrinjalali approved these changes Nov 23, 2021

Remove wrong and useless inline comments and add missing abs value in MinkowskiDistance's docstring

31fe12f

ogrisel mentioned this pull request Nov 23, 2021

Deprecate WMinkowskiDistance and make MinkowskiDistance accept weights directly #21765

Closed

ogrisel commented Nov 23, 2021

Phrasing in comment

fa7bdaf

jjerphan merged commit f924bc8 into scikit-learn:main Nov 23, 2021

ogrisel deleted the main-skip-wminkowski-in-test_cdist branch November 24, 2021 13:20

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021

FIX wminkowski removed in 1.8.0.dev0 (scikit-learn#21741)

8c0ec41

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021

FIX wminkowski removed in 1.8.0.dev0 (scikit-learn#21741)

9003f37

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021

FIX wminkowski removed in 1.8.0.dev0 (scikit-learn#21741)

a4a13a8

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

glemaitre pushed a commit that referenced this pull request Dec 25, 2021

FIX wminkowski removed in 1.8.0.dev0 (#21741)

ca71a3d

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 18, 2022

MAINT Fallback to 'brute' when using 'wminkowski' for 'kdtree'

9c1e318

This follow-ups with scikit-learn#21741

jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 18, 2022

MAINT Fallback to 'brute' when using 'wminkowski' for 'kdtree'

3812d4f

This is follow-up for scikit-learn#21741.

jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 18, 2022

MAINT Fallback to 'brute' when using 'wminkowski' for 'kdtree'

cde97ae

This is follow-up for scikit-learn#21741.

jjerphan mentioned this pull request Jan 18, 2022

FIX Fallback to ball_tree using minkowski with w for kd_tree #22241

Merged
