FIX make sure OutputCodeClassifier rejects complex labels #20219
Conversation
BTW, I could not reproduce the original failure on my machine. I am not sure why the nested call to …
@@ -911,6 +912,7 @@ def fit(self, X, y):
         """
         y = column_or_1d(y, warn=True)
         _assert_all_finite(y)
+        _ensure_no_complex_data(y)
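A minimal sketch of the behavior this patch enforces, mirroring the common estimator check (check_complex_data) that triggered the failure: fitting OutputCodeClassifier on complex-valued data should raise a ValueError. The exact error message and the exact place where validation fires depend on the installed scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

rng = np.random.RandomState(42)
# Complex-valued features and labels, as in the common estimator check.
X = rng.uniform(size=(10, 1)) + 1j * rng.uniform(size=(10, 1))
y = rng.randint(0, 2, size=10) + 1j

clf = OutputCodeClassifier(LogisticRegression(), random_state=0)
try:
    clf.fit(X, y)
except ValueError as exc:
    # Complex data is rejected during input validation.
    print("rejected:", exc)
```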
Is check_X_y doing it? We might have forgotten to add this check in some other PR of @jeremiedbb.
No it's not. It's just that the _ConstantPredictor does no check at all. There's a discussion in #20205
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we don't, but we probably should:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/validation.py#L883-L890
> No it's not. It's just that the _ConstantPredictor does no check at all. There's a discussion in #20205
But why does this test pass most of the time and fail randomly? I could not reproduce the failure on my machine. Also, I do not understand why the constant predictor would kick in in this case. The RNG seems properly seeded, so one should always get the validation of X by LogisticRegression._validate_data.
Actually, no: we do not set the random_state in check_complex_data on the estimator clone. That does not explain why this test does not fail for me, though, but that is a bug of its own. I will submit another PR.
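A small illustration of the seeding point above: clone() copies constructor parameters, so cloning an estimator whose random_state is None yields another unseeded estimator; the test helper set_random_state must be called on the clone explicitly to make the check deterministic. (SGDClassifier here is just an arbitrary estimator with a random_state parameter.)

```python
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier
from sklearn.utils._testing import set_random_state

est = SGDClassifier(random_state=None)

# The clone inherits the unseeded random_state from the original.
cloned = clone(est)
print(cloned.get_params()["random_state"])  # None

# Seeding must be done explicitly on the clone.
set_random_state(cloned, random_state=0)
print(cloned.get_params()["random_state"])  # 0
```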
Closing in favor of #20221 (hopefully).
This is a quick fix for the random failure reported in #20218 after the merge of #20192. I am not sure about the root cause of the randomness, but we do need to factorize y validation for the case where X is not validated in a meta-estimator, as @jeremiedbb suggested elsewhere. In the meantime, let's make sure that the CI can be green in main to avoid disturbing concurrent PRs.