
Commit 39cc03f

jeremiedbb and betatim authored
MNT Make scorers return python floats (#30575)
Co-authored-by: Tim Head <betatim@gmail.com>
1 parent a996f43 commit 39cc03f

11 files changed
+211 -159 lines changed

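All of the changes below follow one idea: under NumPy 2 a NumPy scalar prints as np.float64(...) rather than as a bare number, which makes doctest output noisy, while a built-in Python float prints plainly. A minimal illustration of the difference, assuming NumPy >= 2.0 (where the scalar repr includes the type name):

>>> import numpy as np
>>> np.float64(0.66)   # NumPy >= 2.0 repr includes the type
np.float64(0.66)
>>> float(np.float64(0.66))
0.66
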
doc/modules/clustering.rst

+17 -17
@@ -1305,7 +1305,7 @@ ignoring permutations::
 >>> labels_true = [0, 0, 0, 1, 1, 1]
 >>> labels_pred = [0, 0, 1, 1, 2, 2]
 >>> metrics.rand_score(labels_true, labels_pred)
-np.float64(0.66...)
+0.66...

 The Rand index does not ensure to obtain a value close to 0.0 for a
 random labelling. The adjusted Rand index **corrects for chance** and
@@ -1319,7 +1319,7 @@ labels, rename 2 to 3, and get the same score::

 >>> labels_pred = [1, 1, 0, 0, 3, 3]
 >>> metrics.rand_score(labels_true, labels_pred)
-np.float64(0.66...)
+0.66...
 >>> metrics.adjusted_rand_score(labels_true, labels_pred)
 0.24...

@@ -1328,7 +1328,7 @@ Furthermore, both :func:`rand_score` :func:`adjusted_rand_score` are
 thus be used as **consensus measures**::

 >>> metrics.rand_score(labels_pred, labels_true)
-np.float64(0.66...)
+0.66...
 >>> metrics.adjusted_rand_score(labels_pred, labels_true)
 0.24...

@@ -1348,7 +1348,7 @@ will not necessarily be close to zero.::
 >>> labels_true = [0, 0, 0, 0, 0, 0, 1, 1]
 >>> labels_pred = [0, 1, 2, 3, 4, 5, 5, 6]
 >>> metrics.rand_score(labels_true, labels_pred)
-np.float64(0.39...)
+0.39...
 >>> metrics.adjusted_rand_score(labels_true, labels_pred)
 -0.07...

@@ -1644,16 +1644,16 @@ We can turn those concept as scores :func:`homogeneity_score` and
 >>> labels_pred = [0, 0, 1, 1, 2, 2]

 >>> metrics.homogeneity_score(labels_true, labels_pred)
-np.float64(0.66...)
+0.66...

 >>> metrics.completeness_score(labels_true, labels_pred)
-np.float64(0.42...)
+0.42...

 Their harmonic mean called **V-measure** is computed by
 :func:`v_measure_score`::

 >>> metrics.v_measure_score(labels_true, labels_pred)
-np.float64(0.51...)
+0.51...

 This function's formula is as follows:

@@ -1662,12 +1662,12 @@ This function's formula is as follows:
 `beta` defaults to a value of 1.0, but for using a value less than 1 for beta::

 >>> metrics.v_measure_score(labels_true, labels_pred, beta=0.6)
-np.float64(0.54...)
+0.54...

 more weight will be attributed to homogeneity, and using a value greater than 1::

 >>> metrics.v_measure_score(labels_true, labels_pred, beta=1.8)
-np.float64(0.48...)
+0.48...

 more weight will be attributed to completeness.

@@ -1678,14 +1678,14 @@ Homogeneity, completeness and V-measure can be computed at once using
 :func:`homogeneity_completeness_v_measure` as follows::

 >>> metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)
-(np.float64(0.66...), np.float64(0.42...), np.float64(0.51...))
+(0.66..., 0.42..., 0.51...)

 The following clustering assignment is slightly better, since it is
 homogeneous but not complete::

 >>> labels_pred = [0, 0, 0, 1, 2, 2]
 >>> metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)
-(np.float64(1.0), np.float64(0.68...), np.float64(0.81...))
+(1.0, 0.68..., 0.81...)

 .. note::

@@ -1815,21 +1815,21 @@ between two clusters.
 >>> labels_pred = [0, 0, 1, 1, 2, 2]

 >>> metrics.fowlkes_mallows_score(labels_true, labels_pred)
-np.float64(0.47140...)
+0.47140...

 One can permute 0 and 1 in the predicted labels, rename 2 to 3 and get
 the same score::

 >>> labels_pred = [1, 1, 0, 0, 3, 3]

 >>> metrics.fowlkes_mallows_score(labels_true, labels_pred)
-np.float64(0.47140...)
+0.47140...

 Perfect labeling is scored 1.0::

 >>> labels_pred = labels_true[:]
 >>> metrics.fowlkes_mallows_score(labels_true, labels_pred)
-np.float64(1.0)
+1.0

 Bad (e.g. independent labelings) have zero scores::

@@ -1912,7 +1912,7 @@ cluster analysis.
 >>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
 >>> labels = kmeans_model.labels_
 >>> metrics.silhouette_score(X, labels, metric='euclidean')
-np.float64(0.55...)
+0.55...

 .. topic:: Advantages:

@@ -1969,7 +1969,7 @@ cluster analysis:
 >>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
 >>> labels = kmeans_model.labels_
 >>> metrics.calinski_harabasz_score(X, labels)
-np.float64(561.59...)
+561.59...


 .. topic:: Advantages:
@@ -2043,7 +2043,7 @@ cluster analysis as follows:
 >>> kmeans = KMeans(n_clusters=3, random_state=1).fit(X)
 >>> labels = kmeans.labels_
 >>> davies_bouldin_score(X, labels)
-np.float64(0.666...)
+0.666...


 .. topic:: Advantages:

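With this change, the clustering metrics shown above return built-in Python floats, which is what the updated doctest outputs reflect. A quick sanity check, assuming a scikit-learn build that includes this commit (the 0.66... output relies on the ELLIPSIS doctest option, as in the docs themselves):

>>> # requires a scikit-learn build that includes this commit
>>> from sklearn import metrics
>>> labels_true = [0, 0, 0, 1, 1, 1]
>>> labels_pred = [0, 0, 1, 1, 2, 2]
>>> metrics.rand_score(labels_true, labels_pred)
0.66...
>>> type(metrics.rand_score(labels_true, labels_pred)) is float
True
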
doc/modules/model_evaluation.rst

+33 -33
@@ -377,7 +377,7 @@ You can create your own custom scorer object using
 >>> import numpy as np
 >>> def my_custom_loss_func(y_true, y_pred):
 ...     diff = np.abs(y_true - y_pred).max()
-...     return np.log1p(diff)
+...     return float(np.log1p(diff))
 ...
 >>> # score will negate the return value of my_custom_loss_func,
 >>> # which will be np.log(2), 0.693, given the values for X
@@ -389,9 +389,9 @@ You can create your own custom scorer object using
 >>> clf = DummyClassifier(strategy='most_frequent', random_state=0)
 >>> clf = clf.fit(X, y)
 >>> my_custom_loss_func(y, clf.predict(X))
-np.float64(0.69...)
+0.69...
 >>> score(clf, X, y)
-np.float64(-0.69...)
+-0.69...

 .. dropdown:: Custom scorer objects from scratch

@@ -673,10 +673,10 @@ where :math:`k` is the number of guesses allowed and :math:`1(x)` is the
 ... [0.2, 0.4, 0.3],
 ... [0.7, 0.2, 0.1]])
 >>> top_k_accuracy_score(y_true, y_score, k=2)
-np.float64(0.75)
+0.75
 >>> # Not normalizing gives the number of "correctly" classified samples
 >>> top_k_accuracy_score(y_true, y_score, k=2, normalize=False)
-np.int64(3)
+3.0

 .. _balanced_accuracy_score:

@@ -786,7 +786,7 @@ and not for more than two annotators.
 >>> labeling1 = [2, 0, 2, 2, 0, 1]
 >>> labeling2 = [0, 0, 2, 2, 0, 2]
 >>> cohen_kappa_score(labeling1, labeling2)
-np.float64(0.4285714285714286)
+0.4285714285714286

 .. _confusion_matrix:

@@ -837,9 +837,9 @@ false negatives and true positives as follows::

 >>> y_true = [0, 0, 0, 1, 1, 1, 1, 1]
 >>> y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
->>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
+>>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel().tolist()
 >>> tn, fp, fn, tp
-(np.int64(2), np.int64(1), np.int64(2), np.int64(3))
+(2, 1, 2, 3)

 .. rubric:: Examples

@@ -1115,7 +1115,7 @@ Here are some small examples in binary classification::
 >>> threshold
 array([0.1 , 0.35, 0.4 , 0.8 ])
 >>> average_precision_score(y_true, y_scores)
-np.float64(0.83...)
+0.83...



@@ -1234,19 +1234,19 @@ In the binary case::
 >>> y_pred = np.array([[1, 1, 1],
 ... [1, 0, 0]])
 >>> jaccard_score(y_true[0], y_pred[0])
-np.float64(0.6666...)
+0.6666...

 In the 2D comparison case (e.g. image similarity):

 >>> jaccard_score(y_true, y_pred, average="micro")
-np.float64(0.6)
+0.6

 In the multilabel case with binary label indicators::

 >>> jaccard_score(y_true, y_pred, average='samples')
-np.float64(0.5833...)
+0.5833...
 >>> jaccard_score(y_true, y_pred, average='macro')
-np.float64(0.6666...)
+0.6666...
 >>> jaccard_score(y_true, y_pred, average=None)
 array([0.5, 0.5, 1. ])

@@ -1258,9 +1258,9 @@ multilabel problem::
 >>> jaccard_score(y_true, y_pred, average=None)
 array([1. , 0. , 0.33...])
 >>> jaccard_score(y_true, y_pred, average='macro')
-np.float64(0.44...)
+0.44...
 >>> jaccard_score(y_true, y_pred, average='micro')
-np.float64(0.33...)
+0.33...

 .. _hinge_loss:

@@ -1315,7 +1315,7 @@ with a svm classifier in a binary class problem::
 >>> pred_decision
 array([-2.18..., 2.36..., 0.09...])
 >>> hinge_loss([-1, 1, 1], pred_decision)
-np.float64(0.3...)
+0.3...

 Here is an example demonstrating the use of the :func:`hinge_loss` function
 with a svm classifier in a multiclass problem::
@@ -1329,7 +1329,7 @@ with a svm classifier in a multiclass problem::
 >>> pred_decision = est.decision_function([[-1], [2], [3]])
 >>> y_true = [0, 2, 3]
 >>> hinge_loss(y_true, pred_decision, labels=labels)
-np.float64(0.56...)
+0.56...

 .. _log_loss:

@@ -1445,7 +1445,7 @@ function:
 >>> y_true = [+1, +1, +1, -1]
 >>> y_pred = [+1, -1, +1, +1]
 >>> matthews_corrcoef(y_true, y_pred)
-np.float64(-0.33...)
+-0.33...

 .. rubric:: References

@@ -1640,12 +1640,12 @@ We can use the probability estimates corresponding to `clf.classes_[1]`.

 >>> y_score = clf.predict_proba(X)[:, 1]
 >>> roc_auc_score(y, y_score)
-np.float64(0.99...)
+0.99...

 Otherwise, we can use the non-thresholded decision values

 >>> roc_auc_score(y, clf.decision_function(X))
-np.float64(0.99...)
+0.99...

 .. _roc_auc_multiclass:

@@ -1951,13 +1951,13 @@ Here is a small example of usage of this function::
 >>> y_prob = np.array([0.1, 0.9, 0.8, 0.4])
 >>> y_pred = np.array([0, 1, 1, 0])
 >>> brier_score_loss(y_true, y_prob)
-np.float64(0.055)
+0.055
 >>> brier_score_loss(y_true, 1 - y_prob, pos_label=0)
-np.float64(0.055)
+0.055
 >>> brier_score_loss(y_true_categorical, y_prob, pos_label="ham")
-np.float64(0.055)
+0.055
 >>> brier_score_loss(y_true, y_prob > 0.5)
-np.float64(0.0)
+0.0

 The Brier score can be used to assess how well a classifier is calibrated.
 However, a lower Brier score loss does not always mean a better calibration.
@@ -2236,7 +2236,7 @@ Here is a small example of usage of this function::
 >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
 >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
 >>> coverage_error(y_true, y_score)
-np.float64(2.5)
+2.5

 .. _label_ranking_average_precision:

@@ -2283,7 +2283,7 @@ Here is a small example of usage of this function::
 >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
 >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
 >>> label_ranking_average_precision_score(y_true, y_score)
-np.float64(0.416...)
+0.416...

 .. _label_ranking_loss:

@@ -2318,11 +2318,11 @@ Here is a small example of usage of this function::
 >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
 >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
 >>> label_ranking_loss(y_true, y_score)
-np.float64(0.75...)
+0.75...
 >>> # With the following prediction, we have perfect and minimal loss
 >>> y_score = np.array([[1.0, 0.1, 0.2], [0.1, 0.2, 0.9]])
 >>> label_ranking_loss(y_true, y_score)
-np.float64(0.0)
+0.0


 .. dropdown:: References
@@ -2700,7 +2700,7 @@ function::
 >>> y_true = [3, -0.5, 2, 7]
 >>> y_pred = [2.5, 0.0, 2, 8]
 >>> median_absolute_error(y_true, y_pred)
-np.float64(0.5)
+0.5



@@ -2732,7 +2732,7 @@ Here is a small example of usage of the :func:`max_error` function::
 >>> y_true = [3, 2, 7, 1]
 >>> y_pred = [9, 2, 7, 1]
 >>> max_error(y_true, y_pred)
-np.int64(6)
+6.0

 The :func:`max_error` does not support multioutput.

@@ -3011,15 +3011,15 @@ of 0.0.
 >>> y_true = [3, -0.5, 2, 7]
 >>> y_pred = [2.5, 0.0, 2, 8]
 >>> d2_absolute_error_score(y_true, y_pred)
-np.float64(0.764...)
+0.764...
 >>> y_true = [1, 2, 3]
 >>> y_pred = [1, 2, 3]
 >>> d2_absolute_error_score(y_true, y_pred)
-np.float64(1.0)
+1.0
 >>> y_true = [1, 2, 3]
 >>> y_pred = [2, 2, 2]
 >>> d2_absolute_error_score(y_true, y_pred)
-np.float64(0.0)
+0.0


 .. _visualization_regression_evaluation:

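One update above is slightly different in kind: confusion_matrix returns a NumPy array, so instead of converting the function's return value, the example now appends .tolist() after .ravel() to unpack built-in ints rather than np.int64 scalars. A standalone version of that snippet, with the same values as in the diff:

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [0, 0, 0, 1, 1, 1, 1, 1]
>>> y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
>>> # .tolist() turns the NumPy integers into plain Python ints
>>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel().tolist()
>>> tn, fp, fn, tp
(2, 1, 2, 3)
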
sklearn/metrics/_base.py

+1 -1
@@ -118,7 +118,7 @@ def _average_binary_score(binary_metric, y_true, y_score, average, sample_weight
 # score from being affected by 0-weighted NaN elements.
 average_weight = np.asarray(average_weight)
 score[average_weight == 0] = 0
-return np.average(score, weights=average_weight)
+return float(np.average(score, weights=average_weight))
 else:
 return score

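The _base.py hunk is the only library-side change in this excerpt: the weighted average is wrapped in float() so averaged scores come back as built-in floats rather than NumPy scalars. A simplified sketch of the same conversion applied to np.average, not the actual sklearn helper:

>>> import numpy as np
>>> # illustrative values only; the float() call is the point
>>> avg = float(np.average(np.array([0.5, 1.0]), weights=[1, 3]))
>>> avg
0.875
>>> type(avg)
<class 'float'>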