MNT Deprecate metrics.pairwise.paired_*_distances and paired_distances public functions #27129

Open: wants to merge 13 commits into base: main
4 changes: 0 additions & 4 deletions doc/modules/classes.rst
@@ -1133,10 +1133,6 @@ See the :ref:`metrics` section of the user guide for further details.
metrics.pairwise.polynomial_kernel
metrics.pairwise.rbf_kernel
metrics.pairwise.sigmoid_kernel
metrics.pairwise.paired_euclidean_distances
metrics.pairwise.paired_manhattan_distances
metrics.pairwise.paired_cosine_distances
metrics.pairwise.paired_distances
Review comment (Member):

Can these be moved to the "Recently deprecated" section?

metrics.pairwise_distances
metrics.pairwise_distances_argmin
metrics.pairwise_distances_argmin_min
7 changes: 7 additions & 0 deletions doc/whats_new/v1.4.rst
@@ -391,6 +391,13 @@ Changelog
:func:`metrics.root_mean_squared_log_error` instead.
:pr:`26734` by :user:`Alejandro Martin Gil <101AlexMartin>`.

- |API| :func:`~metrics.pairwise.paired_distances`,
:func:`~metrics.pairwise.paired_euclidean_distances`,
:func:`~metrics.pairwise.paired_manhattan_distances` and
:func:`~metrics.pairwise.paired_cosine_distances` are now deprecated and
will be removed in 1.6.
Review comment (Member):

Since 1.4 is released, this changelog entry needs to be moved to v1.5.rst and:

Suggested change:
- will be removed in 1.6.
+ will be removed in 1.7.

:pr:`27129` by :user:`Shreesha Kumar Bhat <Shreesha3112>`.

- |Fix| :func:`metrics.make_scorer` now raises an error when using a regressor on a
scorer requesting a non-thresholded decision function (from `decision_function` or
`predict_proba`). Such scorers are specific to classification.
4 changes: 2 additions & 2 deletions sklearn/cluster/_agglomerative.py
@@ -23,7 +23,7 @@
)
from ..metrics import DistanceMetric
from ..metrics._dist_metrics import METRIC_MAPPING64
from ..metrics.pairwise import _VALID_METRICS, paired_distances
from ..metrics.pairwise import _VALID_METRICS, _paired_distances
from ..utils import check_array
from ..utils._fast_dict import IntFloatDict
from ..utils._param_validation import (
@@ -588,7 +588,7 @@ def linkage_tree(
else:
# FIXME We compute all the distances, while we could have only computed
# the "interesting" distances
distances = paired_distances(
distances = _paired_distances(
X[connectivity.row], X[connectivity.col], metric=affinity
)
connectivity.data = distances
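
Note on the call above: the metric is evaluated only on the pairs of samples joined by an edge of the connectivity matrix (`connectivity.row[k]`, `connectivity.col[k]`), not on every pair. A minimal, dependency-free sketch of that idea follows. The name `edge_distances` and the use of plain lists instead of a SciPy COO matrix are assumptions made for illustration, not code from this PR.

```python
import math


def edge_distances(X, rows, cols):
    """Euclidean distance between X[r] and X[c] for each edge (r, c).

    Hypothetical stand-in for calling a paired-distance function on
    X[connectivity.row] and X[connectivity.col].
    """
    return [math.dist(X[r], X[c]) for r, c in zip(rows, cols)]


X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
rows, cols = [0, 1], [1, 2]  # only the edges 0-1 and 1-2 are evaluated
distances = edge_distances(X, rows, cols)
print(distances)  # [5.0, 5.0]
```

The pair (0, 2) is never computed because it is not an edge, which is the sense in which paired distances differ from a full pairwise matrix.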
4 changes: 2 additions & 2 deletions sklearn/cluster/tests/test_hierarchical.py
@@ -32,7 +32,7 @@
from sklearn.metrics import DistanceMetric
from sklearn.metrics.cluster import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.pairwise import (
PAIRED_DISTANCES,
_PAIRED_DISTANCES,
cosine_distances,
manhattan_distances,
pairwise_distances,
@@ -237,7 +237,7 @@ def test_agglomerative_clustering(global_random_seed, lil_container):
clustering.fit(X)

# Test using another metric than euclidean works with linkage complete
for metric in PAIRED_DISTANCES.keys():
for metric in _PAIRED_DISTANCES.keys():
# Compare our (structured) implementation to scipy
clustering = AgglomerativeClustering(
n_clusters=10,
156 changes: 154 additions & 2 deletions sklearn/metrics/pairwise.py
@@ -36,6 +36,7 @@
StrOptions,
validate_params,
)
from ..utils.deprecation import deprecated
from ..utils.extmath import row_norms, safe_sparse_dot
from ..utils.fixes import parse_version, sp_base_version
from ..utils.parallel import Parallel, delayed
@@ -1129,6 +1130,11 @@ def cosine_distances(X, Y=None):
return S


# TODO(1.6): Remove in 1.6
Review comment (Member):

Suggested change:
- # TODO(1.6): Remove in 1.6
+ # TODO(1.7): Remove in 1.7

@deprecated(
"The public function `sklearn.metrics.pairwise.paired_euclidean_distances` has been "
"deprecated in 1.4 and will be removed in 1.6."
Review comment (Member):

Suggested change:
- "deprecated in 1.4 and will be removed in 1.6."
+ "deprecated in 1.5 and will be removed in 1.7."

)
# Paired distances
@validate_params(
{"X": ["array-like", "sparse matrix"], "Y": ["array-like", "sparse matrix"]},
@@ -1139,6 +1145,29 @@ def paired_euclidean_distances(X, Y):

Read more in the :ref:`User Guide <metrics>`.

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Input array/matrix X.

Y : {array-like, sparse matrix} of shape (n_samples, n_features)
Input array/matrix Y.

Returns
-------
distances : ndarray of shape (n_samples,)
Output array/matrix containing the calculated paired euclidean
distances.
"""

return _paired_euclidean_distances(X, Y)


def _paired_euclidean_distances(X, Y):
"""Compute the paired euclidean distances between X and Y.

Read more in the :ref:`User Guide <metrics>`.

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples, n_features)
@@ -1157,6 +1186,11 @@ def paired_euclidean_distances(X, Y):
return row_norms(X - Y)


# TODO(1.6): Remove in 1.6
Review comment (Member):

Suggested change:
- # TODO(1.6): Remove in 1.6
+ # TODO(1.7): Remove in 1.7

@deprecated(
"The public function `sklearn.metrics.pairwise.paired_manhattan_distances` has been "
"deprecated in 1.4 and will be removed in 1.6."
Review comment (Member):

Suggested change:
- "deprecated in 1.4 and will be removed in 1.6."
+ "deprecated in 1.5 and will be removed in 1.7."

)
@validate_params(
{"X": ["array-like", "sparse matrix"], "Y": ["array-like", "sparse matrix"]},
prefer_skip_nested_validation=True,
@@ -1192,6 +1226,31 @@ def paired_manhattan_distances(X, Y):
>>> paired_manhattan_distances(X, Y)
array([1., 2., 1.])
"""
return _paired_manhattan_distances(X, Y)


def _paired_manhattan_distances(X, Y):
"""Compute the paired L1 distances between X and Y.

Distances are calculated between (X[0], Y[0]), (X[1], Y[1]), ...,
(X[n_samples - 1], Y[n_samples - 1]).

Read more in the :ref:`User Guide <metrics>`.

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples, n_features)
An array-like where each row is a sample and each column is a feature.

Y : {array-like, sparse matrix} of shape (n_samples, n_features)
An array-like where each row is a sample and each column is a feature.

Returns
-------
distances : ndarray of shape (n_samples,)
L1 paired distances between the row vectors of `X`
and the row vectors of `Y`.
"""
X, Y = check_paired_arrays(X, Y)
diff = X - Y
if issparse(diff):
@@ -1201,6 +1260,11 @@ def paired_manhattan_distances(X, Y):
return np.abs(diff).sum(axis=-1)
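
Note on the dense branch above: for each row pair it sums the absolute coordinate differences. A small plain-Python sketch of the same computation follows. `paired_manhattan` is a hypothetical name and the input data is made up for illustration.

```python
def paired_manhattan(X, Y):
    """Sum of absolute coordinate differences for each pair of rows."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    return [sum(abs(a - b) for a, b in zip(x, y)) for x, y in zip(X, Y)]


X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[1.0, 0.0], [0.0, 0.0]]
l1 = paired_manhattan(X, Y)
print(l1)  # [2.0, 7.0]
```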


# TODO*1.6: Remove in 1.6
Review comment (Member):

Suggested change:
- # TODO*1.6: Remove in 1.6
+ # TODO(1.7): Remove in 1.7

@deprecated(
"The public function `sklearn.metrics.pairwise.paired_cosine_distances` has been "
"deprecated in 1.4 and will be removed in 1.6."
Review comment (Member):

Suggested change:
- "deprecated in 1.4 and will be removed in 1.6."
+ "deprecated in 1.5 and will be removed in 1.7."

)
@validate_params(
{"X": ["array-like", "sparse matrix"], "Y": ["array-like", "sparse matrix"]},
prefer_skip_nested_validation=True,
@@ -1211,6 +1275,35 @@ def paired_cosine_distances(X, Y):

Read more in the :ref:`User Guide <metrics>`.

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples, n_features)
An array where each row is a sample and each column is a feature.

Y : {array-like, sparse matrix} of shape (n_samples, n_features)
An array where each row is a sample and each column is a feature.

Returns
-------
distances : ndarray of shape (n_samples,)
Returns the distances between the row vectors of `X`
and the row vectors of `Y`, where `distances[i]` is the
distance between `X[i]` and `Y[i]`.

Notes
-----
The cosine distance is equivalent to half the squared
euclidean distance if each sample is normalized to unit norm.
"""
return _paired_cosine_distances(X, Y)


def _paired_cosine_distances(X, Y):
"""
Compute the paired cosine distances between X and Y.

Read more in the :ref:`User Guide <metrics>`.

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples, n_features)
@@ -1235,6 +1328,8 @@ def paired_cosine_distances(X, Y):
return 0.5 * row_norms(normalize(X) - normalize(Y), squared=True)
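
Note on the return statement above: for unit vectors a and b, the squared euclidean distance is 2 - 2 * (a . b), so half of it is 1 - (a . b), which is the cosine distance. The sketch below checks this numerically in plain Python. All function names are hypothetical, and it assumes non-zero input vectors.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (math.hypot(*x) * math.hypot(*y))


def half_squared_distance_of_normalized(x, y):
    """Half the squared euclidean distance between the unit-norm copies of x and y."""
    xn, yn = normalize(x), normalize(y)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(xn, yn))


x, y = [1.0, 0.0], [1.0, 1.0]
d_direct = cosine_distance(x, y)
d_via_norm = half_squared_distance_of_normalized(x, y)
print(math.isclose(d_direct, d_via_norm))  # True
```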


# TODO(1.6): Remove PAIRED_DISTANCES dictionary since the paired_*_distances public
# functions are deprecated in 1.4 and will be removed in 1.6
PAIRED_DISTANCES = {
"cosine": paired_cosine_distances,
"euclidean": paired_euclidean_distances,
@@ -1244,7 +1339,21 @@ def paired_cosine_distances(X, Y):
"cityblock": paired_manhattan_distances,
}

_PAIRED_DISTANCES = {
"cosine": _paired_cosine_distances,
"euclidean": _paired_euclidean_distances,
"l2": _paired_euclidean_distances,
"l1": _paired_manhattan_distances,
"manhattan": _paired_manhattan_distances,
"cityblock": _paired_manhattan_distances,
}


# TODO(1.6): Remove in 1.6
Review comment (Member):

Suggested change:
- # TODO(1.6): Remove in 1.6
+ # TODO(1.7): Remove in 1.7

@deprecated(
"The public function `sklearn.metrics.pairwise.paired_distances` has been "
"deprecated in 1.4 and will be removed in 1.6."
Review comment (Member):

Suggested change:
- "deprecated in 1.4 and will be removed in 1.6."
+ "deprecated in 1.5 and will be removed in 1.7."

)
@validate_params(
{
"X": ["array-like"],
@@ -1301,9 +1410,52 @@ def paired_distances(X, Y, *, metric="euclidean", **kwds):
>>> paired_distances(X, Y)
array([0., 1.])
"""
return _paired_distances(X, Y, metric=metric, **kwds)


def _paired_distances(X, Y, *, metric="euclidean", **kwds):
"""
Compute the paired distances between X and Y.

Compute the distances between (X[0], Y[0]), (X[1], Y[1]), etc...

Read more in the :ref:`User Guide <metrics>`.

Parameters
----------
X : ndarray of shape (n_samples, n_features)
Array 1 for distance computation.

Y : ndarray of shape (n_samples, n_features)
Array 2 for distance computation.

metric : str or callable, default="euclidean"
The metric to use when calculating distance between instances in a
feature array. If metric is a string, it must be one of the options
specified in _PAIRED_DISTANCES, including "euclidean",
"manhattan", or "cosine".
Alternatively, if metric is a callable function, it is called on each
pair of instances (rows) and the resulting value recorded. The callable
should take two arrays from `X` as input and return a value indicating
the distance between them.

**kwds : dict
Unused parameters.

Returns
-------
distances : ndarray of shape (n_samples,)
Returns the distances between the row vectors of `X`
and the row vectors of `Y`.

See Also
--------
sklearn.metrics.pairwise_distances : Computes the distance between every pair of
samples.
"""

if metric in PAIRED_DISTANCES:
func = PAIRED_DISTANCES[metric]
if metric in _PAIRED_DISTANCES:
func = _PAIRED_DISTANCES[metric]
return func(X, Y)
elif callable(metric):
# Check the matrix first (it is usually done by the metric)
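
Note on the dispatch above: a string metric is looked up in a dictionary of named functions, while a callable is applied to each pair of rows. The remainder of the callable branch is not shown in this hunk, so the sketch below is only an illustration under assumptions. `paired`, `_NAMED` and `chebyshev` are hypothetical names, and the row-by-row loop is one plausible way to apply a callable, not necessarily what scikit-learn does.

```python
import math

# Hypothetical registry with a single named metric, for brevity.
_NAMED = {"euclidean": math.dist}


def paired(X, Y, metric="euclidean"):
    """Distance between X[i] and Y[i] for each i; metric is a name or a callable."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    if isinstance(metric, str) and metric in _NAMED:
        func = _NAMED[metric]
    elif callable(metric):
        func = metric
    else:
        raise ValueError(f"Unknown metric: {metric!r}")
    return [func(x, y) for x, y in zip(X, Y)]


def chebyshev(x, y):
    # User-supplied metric: largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(x, y))


X = [[0.0, 0.0], [1.0, 5.0]]
Y = [[3.0, 4.0], [4.0, 1.0]]
print(paired(X, Y))                    # [5.0, 5.0]
print(paired(X, Y, metric=chebyshev))  # [4.0, 4.0]

# Paired distances are the diagonal of the full pairwise matrix.
pairwise = [[math.dist(x, y) for y in Y] for x in X]
print([pairwise[i][i] for i in range(len(X))] == paired(X, Y))  # True
```

The last check shows the relationship to `pairwise_distances` mentioned in the "See Also" section: the paired result has n_samples entries, while the pairwise matrix has n_samples squared.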