Description
Describe the bug
Three issues, quite related.
-
Using a metric that is not Manhattan or Euclidean triggers the following warning:
UserWarning: Averaging for metrics other than euclidean and manhattan not supported. The average is set to be the mean.
, but I believe an error message should be thrown instead. Indeed, computing the centroids as the average of the class samples is wrong for (I think) all distances except Euclidean, so it is misleading to actually do it even with a warning. Throwing an error here would fix the 2 following issues. -
Calling
.predict
on aNearestCentroid
with metricwminkowski
,seuclidean
ormahalanobis
throws an error, because those metrics need params that we can't pass toNearestCentroid
. DOC Improve doc of Nearest Neighbors metric param #23806 would fix the documentation, but maybe it should be nice to handle this properly in the code. -
Using any string as a
metric
(even a random string likefoobar
) does not throws an error when fitting, but does when calling.predict
. Doing this with another classifier likeKNeighborsClassifier
throws an error at the fitting step.
If relevant I'd like to work on fixing those issues (new contributor here :)) I noticed a stalled PR (#19262) related to implementing partial_fit
to NearestCentroid
. Maybe I should start working on it first, as it seems to improve the code quality of _nearest_centroid.py
, so it would be an interesting base to start with.
Steps/Code to Reproduce
#Reproduces the second issue mentionned
from sklearn.neighbors import NearestCentroid
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
nearest_centroid = NearestCentroid(metric="seuclidean")
nearest_centroid.fit(X,y)
nearest_centroid.predict(X)
Expected Results
Fitting a NearestCentroid
with a metric that is not Manhattan or Euclidean should trigger a proper error message.
Actual Results
Error related to the second issue:
ValueError: The 'V' parameter is required for the seuclidean metric when Y is passed.
Versions
System:
python: 3.8.6 (default, Mar 29 2021, 18:42:37) [Clang 12.0.0 (clang-1200.0.32.29)]
executable: /Users/valentinlaurent/.pyenv/versions/3.8.6/envs/lewagon/bin/python3.8
machine: macOS-10.15.7-x86_64-i386-64bit
Python dependencies:
pip: 22.1.1
setuptools: 49.2.1
sklearn: 1.0.2
numpy: 1.19.5
scipy: 1.6.2
Cython: 0.29.26
pandas: 1.1.3
matplotlib: 3.4.1
joblib: 1.1.0
threadpoolctl: 2.1.0
Built with OpenMP: True