FIX make sure OutputCodeClassifier rejects complex labels #20219

Conversation

@ogrisel ogrisel commented Jun 7, 2021

This is a quick fix for the random failure reported in #20218 after the merge of #20192. I am not sure about the root cause of the randomness, but we do indeed need to factorize y validation when X is not validated in a meta-estimator, as @jeremiedbb suggested elsewhere.

In the meantime, let's make sure that the CI can be green in main to avoid disturbing concurrent PRs.

@ogrisel ogrisel added the Bug label Jun 7, 2021
@ogrisel ogrisel changed the title FIX make sure OutputCodeClassifier reject complex labels FIX make sure OutputCodeClassifier rejects complex labels Jun 7, 2021
ogrisel commented Jun 7, 2021

BTW, I could not reproduce the original failure on my machine. I am not sure why the nested call to LogisticRegression._validate_data would not have raised the required ValueError in the first place for this estimator check.

@ogrisel ogrisel requested review from jeremiedbb and glemaitre June 7, 2021 14:44
@@ -911,6 +912,7 @@ def fit(self, X, y):
         """
         y = column_or_1d(y, warn=True)
         _assert_all_finite(y)
+        _ensure_no_complex_data(y)
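The added line delegates to scikit-learn's private `_ensure_no_complex_data` helper. As a rough sketch (the helper name comes from the diff above, but the body and error message below are assumptions, not the actual scikit-learn implementation), the check amounts to:

```python
import numpy as np

def ensure_no_complex_data(array):
    # Sketch of a check in the spirit of sklearn's private
    # _ensure_no_complex_data helper: reject any array whose
    # dtype is complex. The message wording is an assumption.
    array = np.asarray(array)
    if np.issubdtype(array.dtype, np.complexfloating):
        raise ValueError("Complex data not supported\n{!r}".format(array))

# A real-valued y passes through silently; a complex-valued y is rejected.
ensure_no_complex_data([0, 1, 2])
try:
    ensure_no_complex_data([1 + 2j, 3 + 4j])
except ValueError as exc:
    print("rejected:", exc.args[0].splitlines()[0])
```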
Member

Is check_X_y doing it? We might have forgotten to add this check in some other PR of @jeremiedbb's.

Member

No it's not. It's just that the _ConstantPredictor does no check at all. There's a discussion in #20205
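For illustration, here is a minimal sketch (hypothetical class names; the real private `_ConstantPredictor` code differs) of why a fit that performs no input validation lets complex labels through, while a validating fit raises:

```python
import numpy as np

class ConstantPredictorSketch:
    # Hypothetical stand-in for a predictor whose fit performs no
    # input validation: complex y slips through unnoticed.
    def fit(self, X, y):
        self.y_ = np.asarray(y)
        return self

class ValidatingPredictorSketch:
    # A fit that validates its labels, as an estimator calling
    # _validate_data would, and rejects complex dtypes.
    def fit(self, X, y):
        y = np.asarray(y)
        if np.issubdtype(y.dtype, np.complexfloating):
            raise ValueError("Complex data not supported")
        self.y_ = y
        return self

y_complex = np.array([1 + 1j, 2 + 0j])
ConstantPredictorSketch().fit(None, y_complex)      # silently accepts
try:
    ValidatingPredictorSketch().fit(None, y_complex)
except ValueError:
    print("rejected")
```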

Member Author

> No it's not. It's just that the _ConstantPredictor does no check at all. There's a discussion in #20205

But why does this test pass most of the time and fail only randomly? I could not reproduce the failure on my machine. Also, I don't understand why the constant predictor would kick in in this case. The RNG seems properly seeded, so one should always get the validation of X by LogisticRegression._validate_data.

Member Author

Actually, no: we do not set the random_state on the estimator clone in check_complex_data. That does not explain why this test does not fail on my machine, but it is a bug in its own right. I will submit another PR.
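For illustration, a pure-NumPy sketch (simplified; not the actual check_complex_data or clone code) of why a forgotten random_state makes such a test nondeterministic:

```python
import numpy as np

def make_check_input(random_state=None):
    # Simplified stand-in for the data generation inside an estimator
    # check: without a fixed seed, each run draws different data, so a
    # data-dependent code path can differ from run to run.
    rng = np.random.default_rng(random_state)
    return rng.normal(size=5)

# Unseeded: two runs of the "check" see different inputs.
a = make_check_input()
b = make_check_input()

# Seeded: every run is reproducible.
c = make_check_input(random_state=0)
d = make_check_input(random_state=0)
print(np.allclose(c, d))
```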


ogrisel commented Jun 7, 2021

Closing in favor of #20221 (hopefully).

@ogrisel ogrisel closed this Jun 7, 2021
@ogrisel ogrisel deleted the fix-complex-data-outputcodeclassifier branch June 7, 2021 15:42