Make scorers return python floats #30575
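This PR updates doctest outputs in the model evaluation user guide to match scorers and metrics that return built-in Python numbers instead of NumPy scalars. As a rough sketch of the behaviour the updated doctests assume (the dataset and estimator below are arbitrary stand-ins, not code from this PR):

    # Hedged illustration only: after this change, a metric result is expected
    # to print as a plain Python float rather than a numpy scalar.
    from sklearn.datasets import load_iris
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    clf = DummyClassifier(strategy="most_frequent").fit(X, y)
    score = accuracy_score(y, clf.predict(X))
    print(type(score))  # expected: <class 'float'>
    print(score)        # e.g. 0.333..., without an np.float64(...) wrapper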
Changes from all commits
8b45e53
3e0a15b
7181297
e86436e
4946b11
c595588
9b0c165
d7f57b6
@@ -377,7 +377,7 @@ You can create your own custom scorer object using
 >>> import numpy as np
 >>> def my_custom_loss_func(y_true, y_pred):
 ...     diff = np.abs(y_true - y_pred).max()
-...     return np.log1p(diff)
+...     return float(np.log1p(diff))
 ...
 >>> # score will negate the return value of my_custom_loss_func,
 >>> # which will be np.log(2), 0.693, given the values for X
@@ -389,9 +389,9 @@ You can create your own custom scorer object using
 >>> clf = DummyClassifier(strategy='most_frequent', random_state=0)
 >>> clf = clf.fit(X, y)
 >>> my_custom_loss_func(y, clf.predict(X))
-np.float64(0.69...)
+0.69...
 >>> score(clf, X, y)
-np.float64(-0.69...)
+-0.69...

 .. dropdown:: Custom scorer objects from scratch

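The `score` object in the hunk above comes from the surrounding guide example. A self-contained sketch of that flow with the new `float(...)` cast, assuming small inputs like `X = [[1], [1]]`, `y = [0, 1]` (chosen here so the maximum error is 1) and a scorer built with `make_scorer(..., greater_is_better=False)` as in the guide:

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import make_scorer

    def my_custom_loss_func(y_true, y_pred):
        diff = np.abs(y_true - y_pred).max()
        return float(np.log1p(diff))  # cast added by this diff

    X, y = [[1], [1]], [0, 1]
    # greater_is_better=False makes the scorer negate the loss value.
    score = make_scorer(my_custom_loss_func, greater_is_better=False)
    clf = DummyClassifier(strategy='most_frequent', random_state=0).fit(X, y)
    print(my_custom_loss_func(y, clf.predict(X)))  # ~0.693, a Python float
    print(score(clf, X, y))                        # ~-0.693, negated by the scorer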
@@ -673,10 +673,10 @@ where :math:`k` is the number of guesses allowed and :math:`1(x)` is the
 ... [0.2, 0.4, 0.3],
 ... [0.7, 0.2, 0.1]])
 >>> top_k_accuracy_score(y_true, y_score, k=2)
-np.float64(0.75)
+0.75
 >>> # Not normalizing gives the number of "correctly" classified samples
 >>> top_k_accuracy_score(y_true, y_score, k=2, normalize=False)
-np.int64(3)
+3.0

 .. _balanced_accuracy_score:

@@ -786,7 +786,7 @@ and not for more than two annotators.
 >>> labeling1 = [2, 0, 2, 2, 0, 1]
 >>> labeling2 = [0, 0, 2, 2, 0, 2]
 >>> cohen_kappa_score(labeling1, labeling2)
-np.float64(0.4285714285714286)
+0.4285714285714286

 .. _confusion_matrix:

@@ -837,9 +837,9 @@ false negatives and true positives as follows::

 >>> y_true = [0, 0, 0, 1, 1, 1, 1, 1]
 >>> y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
->>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
+>>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel().tolist()
 >>> tn, fp, fn, tp
-(np.int64(2), np.int64(1), np.int64(2), np.int64(3))
+(2, 1, 2, 3)

 .. rubric:: Examples

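The switch to `.ravel().tolist()` above is what makes the unpacked counts print as built-in ints: `tolist()` converts NumPy values to Python scalars. A minimal standalone check (the counts are copied from the doctest output above):

    import numpy as np

    # tolist() returns built-in Python numbers, hence (2, 1, 2, 3) rather than
    # a tuple of np.int64(...) values in the updated doctest.
    counts = np.array([2, 1, 2, 3])
    tn, fp, fn, tp = counts.ravel().tolist()
    print(type(tn), (tn, fp, fn, tp))  # <class 'int'> (2, 1, 2, 3)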
@@ -1115,7 +1115,7 @@ Here are some small examples in binary classification::
 >>> threshold
 array([0.1 , 0.35, 0.4 , 0.8 ])
 >>> average_precision_score(y_true, y_scores)
-np.float64(0.83...)
+0.83...

@@ -1234,19 +1234,19 @@ In the binary case::
 >>> y_pred = np.array([[1, 1, 1],
 ... [1, 0, 0]])
 >>> jaccard_score(y_true[0], y_pred[0])
-np.float64(0.6666...)
+0.6666...

 In the 2D comparison case (e.g. image similarity):

 >>> jaccard_score(y_true, y_pred, average="micro")
-np.float64(0.6)
+0.6

 In the multilabel case with binary label indicators::

 >>> jaccard_score(y_true, y_pred, average='samples')
-np.float64(0.5833...)
+0.5833...
 >>> jaccard_score(y_true, y_pred, average='macro')
-np.float64(0.6666...)
+0.6666...
 >>> jaccard_score(y_true, y_pred, average=None)
 array([0.5, 0.5, 1. ])

@@ -1258,9 +1258,9 @@ multilabel problem::
 >>> jaccard_score(y_true, y_pred, average=None)
 array([1. , 0. , 0.33...])
 >>> jaccard_score(y_true, y_pred, average='macro')
-np.float64(0.44...)
+0.44...
 >>> jaccard_score(y_true, y_pred, average='micro')
-np.float64(0.33...)
+0.33...

 .. _hinge_loss:

@@ -1315,7 +1315,7 @@ with a svm classifier in a binary class problem::
 >>> pred_decision
 array([-2.18..., 2.36..., 0.09...])
 >>> hinge_loss([-1, 1, 1], pred_decision)
-np.float64(0.3...)
+0.3...

 Here is an example demonstrating the use of the :func:`hinge_loss` function
 with a svm classifier in a multiclass problem::

@@ -1329,7 +1329,7 @@ with a svm classifier in a multiclass problem::
 >>> pred_decision = est.decision_function([[-1], [2], [3]])
 >>> y_true = [0, 2, 3]
 >>> hinge_loss(y_true, pred_decision, labels=labels)
-np.float64(0.56...)
+0.56...

 .. _log_loss:

@@ -1445,7 +1445,7 @@ function:
 >>> y_true = [+1, +1, +1, -1]
 >>> y_pred = [+1, -1, +1, +1]
 >>> matthews_corrcoef(y_true, y_pred)
-np.float64(-0.33...)
+-0.33...

 .. rubric:: References

@@ -1640,12 +1640,12 @@ We can use the probability estimates corresponding to `clf.classes_[1]`.

 >>> y_score = clf.predict_proba(X)[:, 1]
 >>> roc_auc_score(y, y_score)
-np.float64(0.99...)
+0.99...

 Otherwise, we can use the non-thresholded decision values

 >>> roc_auc_score(y, clf.decision_function(X))
-np.float64(0.99...)
+0.99...

 .. _roc_auc_multiclass:

@@ -1951,13 +1951,13 @@ Here is a small example of usage of this function::
 >>> y_prob = np.array([0.1, 0.9, 0.8, 0.4])
 >>> y_pred = np.array([0, 1, 1, 0])
 >>> brier_score_loss(y_true, y_prob)
-np.float64(0.055)
+0.055
 >>> brier_score_loss(y_true, 1 - y_prob, pos_label=0)
-np.float64(0.055)
+0.055
 >>> brier_score_loss(y_true_categorical, y_prob, pos_label="ham")
-np.float64(0.055)
+0.055
 >>> brier_score_loss(y_true, y_prob > 0.5)
-np.float64(0.0)
+0.0

 The Brier score can be used to assess how well a classifier is calibrated.
 However, a lower Brier score loss does not always mean a better calibration.

@@ -2236,7 +2236,7 @@ Here is a small example of usage of this function::
 >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
 >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
 >>> coverage_error(y_true, y_score)
-np.float64(2.5)
+2.5

 .. _label_ranking_average_precision:

@@ -2283,7 +2283,7 @@ Here is a small example of usage of this function::
 >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
 >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
 >>> label_ranking_average_precision_score(y_true, y_score)
-np.float64(0.416...)
+0.416...

 .. _label_ranking_loss:

@@ -2318,11 +2318,11 @@ Here is a small example of usage of this function::
 >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
 >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
 >>> label_ranking_loss(y_true, y_score)
-np.float64(0.75...)
+0.75...
 >>> # With the following prediction, we have perfect and minimal loss
 >>> y_score = np.array([[1.0, 0.1, 0.2], [0.1, 0.2, 0.9]])
 >>> label_ranking_loss(y_true, y_score)
-np.float64(0.0)
+0.0

 .. dropdown:: References

@@ -2700,7 +2700,7 @@ function::
 >>> y_true = [3, -0.5, 2, 7]
 >>> y_pred = [2.5, 0.0, 2, 8]
 >>> median_absolute_error(y_true, y_pred)
-np.float64(0.5)
+0.5

@@ -2732,7 +2732,7 @@ Here is a small example of usage of the :func:`max_error` function::
 >>> y_true = [3, 2, 7, 1]
 >>> y_pred = [9, 2, 7, 1]
 >>> max_error(y_true, y_pred)
-np.int64(6)
+6.0
Review comment: This is a change from a numpy scalar with an int dtype to a Python float. This may be a bit more surprising than going from a numpy scalar with a float dtype to a Python float. I spent 5 minutes trying to find a way it could have unintended side-effects but I could not find anything. Maybe somebody else wants to think about it for 5 minutes as well? For example,

    from sklearn.metrics import max_error
    1 / max_error([1, 2], [3, 5])

returns […]. I seem to remember there were some differences in numpy scalar handling in numpy 2.0, so it may be worth a look at what happens with […].

Review comment: Maybe a reason to not change this (or to change to using […]). I think for very large integers you have to be careful when converting to floats, as some of them can't be represented as float.

Review comment: This one is arguably a fix for an inconsistency with the rest of the code base. We don't enforce this behavior for any other scorer when the usual return type is float but could be an int in some specific setting. For instance, […].

Review comment: Also, the docstring of […].

Review comment: Well then, no need for an exception :)

 The :func:`max_error` does not support multioutput.

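A short sketch of the concerns discussed in the review thread above (plain Python, not code from this PR): dividing by the new float return value behaves like ordinary float arithmetic, and the precision caveat only appears for integers too large for a double's 53-bit significand.

    from sklearn.metrics import max_error

    # Reviewer's example: the maximum absolute error here is 3, so this is ~0.333
    # and stays in plain Python float arithmetic after the change.
    print(1 / max_error([1, 2], [3, 5]))

    # The large-integer caveat: not every int survives a round-trip through float.
    big = 2**53 + 1
    print(float(big) == big)  # False: 2**53 + 1 is not exactly representable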
@@ -3011,15 +3011,15 @@ of 0.0.
 >>> y_true = [3, -0.5, 2, 7]
 >>> y_pred = [2.5, 0.0, 2, 8]
 >>> d2_absolute_error_score(y_true, y_pred)
-np.float64(0.764...)
+0.764...
 >>> y_true = [1, 2, 3]
 >>> y_pred = [1, 2, 3]
 >>> d2_absolute_error_score(y_true, y_pred)
-np.float64(1.0)
+1.0
 >>> y_true = [1, 2, 3]
 >>> y_pred = [2, 2, 2]
 >>> d2_absolute_error_score(y_true, y_pred)
-np.float64(0.0)
+0.0

 .. _visualization_regression_evaluation: