[MRG+1] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) #9593
Conversation
Thanks! Running ROC over the output of `cross_val_predict` is a common use case.
Yes, perhaps 0 isn't the best pick, particularly for `predict_proba`. NaN?
I think it's better not to error, in the context of using this in `ClassifierChain` or a stacking meta-estimator.
Hmm, you're right. One note though, I can't speak for `ClassifierChain`, but for stacking, these zeroed out columns would mess with how the secondary learner learns and I wouldn't trust the results. Perhaps display a warning?

NaN makes it clearer that these columns weren't considered at all for this particular split, but wouldn't that break things? For instance, I frequently use `np.argmax(cross_val_predict(..., method='predict_proba'))` to recover the true labels for calculating accuracy, etc. Of course, I always make sure my folds are stratified so I'm probably not going to face it, but still... Also, wouldn't that break ClassifierChains and stacking meta-estimators anyway?
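A minimal sketch of that argmax pattern (iris and `LogisticRegression` are illustrative stand-ins, not from this thread):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import cross_val_predict

    X, y = load_iris(return_X_y=True)

    # One probability column per class; argmax over columns recovers hard
    # labels (works here because the classes are the integers 0..2).
    proba = cross_val_predict(LogisticRegression(), X, y,
                              method='predict_proba')
    y_pred = np.argmax(proba, axis=1)
    print(accuracy_score(y, y_pred))

    # A NaN fill value for absent classes would break this pattern:
    # np.argmax returns the index of the first NaN it encounters.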
Yes, NaN would probably break it all. A warning is probably a good idea. But first we need to fix the regression.
Could you please add a non-regression test?
I've added the necessary unit tests. Although at this point, I'd like to point out there's another regression for the case of SVC with `decision_function_shape='ovo'`.

Code to test:

    from sklearn.datasets import load_digits
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_predict

    X, y = load_digits(return_X_y=True)
    cross_val_predict(SVC(kernel='linear', decision_function_shape='ovo'),
                      X, y, method='decision_function').shape

Expected Results: the OvO decision-function shape, `(1797, 45)`.

Actual Results: an error is raised inside `cross_val_predict`, which assumes an output of shape `(n_samples, n_classes)`.

This error is due to the shape of an SVC decision function with `decision_function_shape='ovo'`. Should I include the fix in this PR?
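For context, the shape difference is visible on a plain fit (a minimal sketch on the same digits data; `kernel='linear'` matches the repro above):

    from sklearn.datasets import load_digits
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)  # 1797 samples, 10 classes

    # 'ovr': one column per class.
    clf = SVC(kernel='linear', decision_function_shape='ovr').fit(X, y)
    print(clf.decision_function(X).shape)  # (1797, 10)

    # 'ovo': one column per pair of classes, n_classes * (n_classes - 1) / 2.
    clf = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
    print(clf.decision_function(X).shape)  # (1797, 45)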
Thanks. Maybe fix that in a separate PR unless they're going to step on each others' toes.
LGTM
Hi @jnothman, I changed this back to WIP since there was a conflict with a recent merge. I also figured out a better fix that also fixes the SVC "ovo" regression.
Okay. Let me know when it's MRG, i.e. ready for review.
@jnothman Ready for review.
Did you mean to add a test for the OvO case?
    else:
        predictions_[:, estimator.classes_] = predictions
        predictions = predictions_
    if not n_classes == len(estimator.classes_):
Replace `not ... ==` with `... !=`.
Yup, sorry. Added unit test and addressed comments.
I think I've mostly sorted this out for the non-special case of `decision_function`. Apologies for the confusing code previously; I should've done it this way in the first place, but I didn't expect `decision_function` to have so many special cases.
I think we still need a warning in any case when `n_classes_training != n_classes`.

I would still like feedback from some core devs about #9593 (comment), especially about the default value for missing classes for `predict_log_proba` and `decision_function` (personally I would favour `-inf`, but maybe there are consumers of `cross_val_predict` which would raise an error with infinite values).
    'use a cross-validation technique resulting '
    'in properly stratified folds\.',
    cross_val_predict, est, X, y,
    cv=KFold(), method='decision_function')
Use an explicit number of splits here: `n_splits=3`.
    predictions_[:, estimator.classes_] = predictions
    predictions = predictions_
    if n_classes != len(estimator.classes_):
        if predictions.ndim == 2 and \
Nitpick: I would use parentheses rather than backslash.
    if n_classes != len(estimator.classes_):
        if predictions.ndim == 2 and \
                predictions.shape[1] != len(estimator.classes_):
            # This handles the case when the shape of predictions
I don't think the comment matches the error message. IIUC the error can happen for one of these reasons:
- you use SVC with `decision_function_shape='ovo'`, whose `decision_function` output does not follow the same convention as all the other classifiers. This corresponds to the comment. In this case the recommendation in the error message is not helpful at all.
- you don't have all the classes in your training fold. This corresponds to the error message.
Actually, for the first case (ovo), the recommendation in the error message is helpful. We can handle the ovo case but only if the folds are stratified.
    predictions_[:, estimator.classes_] = predictions

    else:  # Special handling logic for decision_function
        err_mess = 'Output shape {} of {} does not match ' \
I'd rather have parentheses than backslashes:

    error_message = ('asdfsa '
                     'asdfds '
                     'asdfsa ')
    predictions_ = np.full((_num_samples(X_test), n_classes),
                           np.finfo(predictions.dtype).min)
    predictions_[:, estimator.classes_] = predictions
What changes for each method is the default value; you could simplify the code and use better variable names, e.g. something like this in the non-error case:

    float_min = np.finfo(predictions.dtype).min
    default_values = {'decision_function': float_min,
                      'predict_log_proba': float_min,
                      'predict_proba': 0}
    predictions_for_all_classes = np.full((_num_samples(predictions),
                                           n_classes),
                                          default_values[method])
    predictions_for_all_classes[:, estimator.classes_] = predictions
    ' in different folds. To fix this, use ' \
    'a cross-validation technique resulting' \
    ' in properly stratified folds'
    if predictions.ndim == 2 and \
Parentheses rather than backslashes.
    @@ -1267,6 +1347,18 @@ def test_cross_val_predict_class_subset():
                                             est, method)
        assert_array_almost_equal(expected_predictions, predictions)

        # Special test for decision_function. This makes sure not to trigger
        # any of the numerous edge cases in decision_function
        X = np.arange(200).reshape(100, 2)
I think you should keep decision_function in the loop above. Just find a dataset that works for decision_function, e.g. this one, or one with many classes, to avoid the edge cases that error.
You have not tackled the earlier comment ("I think we still need a warning in any case when `n_classes_training != n_classes`") AFAICT. It would be good to add a test for this with `assert_warns` as well.
                                     len(estimator.classes_)))
        if len(estimator.classes_) <= 2:
            # In this special case, `predictions` contains a 1D array.
            raise ValueError(err_mess.format(predictions.shape, method,
I don't think this error message makes sense here, since `predictions` has the right shape compared to `len(estimator.classes_)`. I am not sure what the error message should say, to be honest; maybe something like: only two classes in the training fold, this is not supported for `method='decision_function'`, plus the error message above about using properly stratified folds.
    @@ -1259,7 +1339,8 @@ def test_cross_val_predict_class_subset():
        assert_array_almost_equal(expected_predictions, predictions)

        # Testing unordered labels
    -   y = [1, 1, -4, 6]
    +   y = np.array([x // 20 for x in range(-100, 100, 2)])
    +   y = shuffle(y, random_state=0)
Or just `y = np.repeat([1, 1, -4, 6], 25)`. Always a good idea to avoid unnecessary complications...
    @@ -776,6 +777,87 @@ def split(self, X, y=None, groups=None):

        assert_raises(ValueError, cross_val_predict, est, X, y, cv=BadCV())

        X, y = load_iris(return_X_y=True)
        assert_warns(RuntimeWarning, cross_val_predict, LogisticRegression(),
Can you use assert_warns_message to test the warning message?
Apart from this, do you have any concerns with the PR?
I don't think so, I am fine with the general strategy as I mentioned above. There are some minor things I will fix shortly.
If you haven't already, maybe you can have a quick look at the warnings/error message and make sure they are fine.
Another thing: are you fine with `float_min` as the default value for missing classes for `predict_log_proba` and `decision_function`? Another option is `-inf`, but maybe some consumers downstream will raise an error with infinite values.
predict_log_proba should probably use -inf... decision_function is a messy case.
I don't think `float_min` is perfect, but I'm documenting it at least.
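To make the trade-off concrete, a minimal sketch of the two candidate fill values:

    import numpy as np

    # Finite stand-in for minus infinity: huge in magnitude, but downstream
    # consumers that reject inf/NaN keep working.
    float_min = np.finfo(np.float64).min
    print(float_min)               # -1.7976931348623157e+308
    print(np.isfinite(float_min))  # True
    print(np.isfinite(-np.inf))    # False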
Codecov Report

    @@           Coverage Diff            @@
    ##           master    #9593    +/-   ##
    =========================================
    - Coverage   96.16%   95.97%   -0.19%
    =========================================
      Files         336      336
      Lines       62442    62566    +124
    =========================================
    + Hits        60046    60049      +3
    - Misses       2396     2517    +121

Continue to review full report at Codecov.
    # it with. This case is found when sklearn.svm.SVC is
    # set to `decision_function_shape='ovo'`.
    raise ValueError('Output shape {} of {} does not match '
                     'number of classes ({}) in fold. Cannot '
I think this error message is wrong:
- I would remove "Cannot reconcile different number of classes in different folds", because the problem is specific to one fold.
- The recommendation of using properly stratified folds does not apply. If we are sure that this problem only exists for SVC with `decision_function_shape='ovo'`, we should say use `decision_function_shape='ovr'` maybe?
Maybe. It could happen in other situations? Could also be a custom estimator?
Or just say that wacky decision functions are not supported :)
I would like to see this merged and 0.19.1 released by the end of the week. Is that reasonable, @amueller?
    default score needs to be assigned to all instances for that class if
    ``method`` produces columns per class, as in {'decision_function',
    'predict_proba', 'predict_log_proba'}. For ``predict_proba`` this value is
    0. In order to ensure finite output, we approximate negative infinity by
I think this behaviour and the note are sufficient for a first release.
Fix cross_val_predict behavior for binary classification in decision_function (Fixes scikit-learn#9589) (scikit-learn#9593)

* fix cross_val_predict for binary classification in decision_function
* Add unit tests
* Add unit tests
* Add unit tests
* better fix
* fix conflict
* fix broken
* only calculate n_classes if one of 'decision_function', 'predict_proba', 'predict_log_proba'
* add test for SVC ovo in cross_val_predict
* flake8 fix
* fix case of ovo and imbalanced folds for binary classification
* change assert_raises to assert_raise_message for ovo case
* fix flake8 line too long
* add comments and clearer tests
* improve comments and error message for OvO
* fix .format error with L
* use assert_raises_regex for better error message
* raise error in decision_function special cases. change predict_log_proba missing classes to minimum numpy value
* fix broken tests due to special cases of decision_function
* add modified test for decision_function behavior that does not trigger edge cases
* fix typos
* fix typos
* escape regex .
* escape regex .
* address comments. one unaddressed comment
* simplify code
* flake
* wrong classes range
* address comments. adjust error message
* add warning
* change warning to runtimewarning
* add test for the warning
* Use assert_warns_message rather than assert_warns. Other minor fixes
* Note on class-absent replacement values
* Improve error message
Reference Issue
Fixes #9589
What does this implement/fix? Explain your changes.
I figured that the purpose of this code (original) is to handle cases where the cross-validation done in `cross_val_predict` doesn't properly stratify the classes, i.e. one fold in the CV may not have all classes present.

However, it misses an edge case for `decision_function` when `len(estimator.classes_) == 2` (train split has only 2 classes) and `n_classes == 2` (total data has only 2 classes). It wrongly assumes that the output will be of shape `(n_samples, n_classes)`, when in fact this is only true for `decision_function` if `n_classes > 2`. For the correct behavior, we must first check if the total number of classes is greater than 2, and if it is not, we must stick to an output shape of `(n_samples,)`.
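A minimal sketch of the binary case (`LogisticRegression` and the toy data are just illustrative stand-ins):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.arange(20).reshape(10, 2)
    y = np.array([0, 1] * 5)

    clf = LogisticRegression().fit(X, y)
    # Two classes: decision_function returns one signed score per sample,
    # shape (n_samples,), not one column per class.
    print(clf.decision_function(X).shape)  # (10,)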
Any other comments?
While this PR fixes the case raised in #9589, it still doesn't take into account the fact that for `sklearn.svm.SVC`, `decision_function` does not always follow the shape `(n_samples, n_classes)` or `(n_samples,)`. If `decision_function_shape` is not set to `ovr`, the shape is `(n_samples, n_classes * (n_classes - 1) / 2)`, and if different train splits have different numbers of classes (not stratified), this inconsistency can't be fixed by the approach currently taken by `cross_val_predict`.

I would also like to point out my personal opinion that `cross_val_predict` should probably just throw an error if the classes are not stratified properly (by this I mean some classes are entirely absent in some splits) instead of handling it automatically as it tries to do now. Metrics like log loss and ROC might be affected if, for some splits, entire columns are zeroed out. It just feels inconsistent. Of course, you guys have the final say on this.

What I mean by inconsistent:
The first example (snippet elided) returns an array (cut down in size here) in which entire columns are "zeroed out".

And even worse, the second example (snippet elided) returns an array (cut down in size here) which is hardly of use to anyone.
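A minimal sketch of the kind of call that produces such zeroed-out columns (assuming sorted iris labels with an unstratified KFold, for illustration):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_predict

    X, y = load_iris(return_X_y=True)  # y is sorted: 50 samples per class

    # Each training fold misses the class held out in its test fold, so that
    # class's probability column is filled with the default value (0) rather
    # than a real prediction.
    proba = cross_val_predict(LogisticRegression(), X, y,
                              cv=KFold(n_splits=3), method='predict_proba')
    print(proba[:3])  # column 0 is all zeros: class 0 was absent at fit time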