Raise an error when all fits fail in cross-validation or grid-search #21026
Conversation
You only need to add an entry in the 1.1 changelog to make the CI pass.
While in theory this could be considered a breaking change, I have the feeling that, from a backward-compatibility point of view, this enhancement is also a usability bug fix, so I don't think we need to go through the usual deprecation cycle in this case: I don't really see a valid case where an all-NaN output could be of any use.
I am fine with merging this as it is.
I am +1 on raising an error when all the fits fail during cross-validation.
I left a minor comment, otherwise LGTM!
The number of unique groups was less than the number of splits.
Force-pushed from d5e9bfb to defae07.
From some discussion with @ogrisel and @glemaitre using

    "You can try to debug the error by setting error_score='raise'.\n\n"
    f"Below are more details about the failures:\n{fit_errors_summary}"
)
raise NotFittedError(all_fits_failed_message)
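To illustrate the failure mode this message targets, here is a minimal sketch (FailingClassifier is a made-up stand-in, not a scikit-learn class): every fit raises, so all cross-validation fits fail.

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.model_selection import cross_val_score

    class FailingClassifier(BaseEstimator, ClassifierMixin):
        def fit(self, X, y):
            # Every fit fails, e.g. because of a misconfigured model.
            raise ValueError("deliberately failing fit")

        def predict(self, X):
            return np.zeros(len(X), dtype=int)

    X = np.random.RandomState(0).rand(20, 2)
    y = np.r_[np.zeros(10), np.ones(10)]

    # Before this PR: warnings plus an all-NaN score array.
    # After this PR: a single error summarizing the underlying fit failures.
    cross_val_score(FailingClassifier(), X, y, cv=5)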
As discussed IRL, maybe NotFittedError is not the best choice of exception type here. If all fits fail, it's very likely to be caused by an incompatible choice of parameters or a bad interaction with statistical properties of the input data. So we could use ValueError instead.
Alternatively, we could record the types of the underlying exceptions and, if the type is uniformly the same for all fit calls, raise an exception of that type instead; see the sketch below.
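A rough sketch of what that alternative could look like (the helper name and signature are hypothetical, not scikit-learn API):

    def raise_for_all_failed_fits(fit_errors, message):
        # Hypothetical helper: if every failed fit raised the same exception
        # type, re-raise that type with the summary message; otherwise fall
        # back to a generic exception.
        error_types = {type(exc) for exc in fit_errors}
        if len(error_types) == 1:
            raise error_types.pop()(message)
        raise ValueError(message)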
Note that apparently, not raising NotFittedError here would cause a slight change of behavior in some edge case of multimetric handling, but this can probably be considered a bug fix. Maybe @lesteve can give us more details about this point.
Sorry I had not read #21026 (comment) before starting this thread ;)
@thomasjpfan any opinion on the above? I also have the feeling that @jnothman has been gifted with uncommon abilities and judgement in the subtle art of choosing exception types.
Note that apparently, not raising NotFittedError here would cause a slight change of behavior in some edge case of multimetric handling, but this can probably be considered a bug fix. Maybe @lesteve can give us more details about this point.
Quoting @thomasjpfan in #20619 (review):
With callable multimetric, at least one _fit_and_score has to succeed so that *SearchCV can create error_score dictionaries for the failed cases.
Before this PR, when all the fits failed in callable multimetric, the error was NotFittedError. This was done here.
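For context, a callable multimetric scorer returns a dict of scores, along the lines of this generic illustration (not code from this PR):

    from sklearn.metrics import accuracy_score, balanced_accuracy_score

    def multi_scorer(estimator, X, y):
        # Returns a dict of metric name -> score. For failed fits, *SearchCV
        # must build a dict with the same keys filled with error_score, which
        # is why at least one call has to succeed.
        y_pred = estimator.predict(X)
        return {
            "accuracy": accuracy_score(y, y_pred),
            "balanced_accuracy": balanced_accuracy_score(y, y_pred),
        }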
Given the description we have for NotFittedError:
Exception class to raise if estimator is used before fitting.
I do not think NotFittedError is supposed to be used when fitting. I am in favor of changing the exception to ValueError.
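The quoted description matches how NotFittedError is used elsewhere in scikit-learn, e.g. when predicting with an unfitted estimator:

    from sklearn.exceptions import NotFittedError
    from sklearn.linear_model import LogisticRegression

    try:
        LogisticRegression().predict([[0.0, 1.0]])
    except NotFittedError as exc:
        print(exc)  # the estimator was used before fitting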
There seems to be consensus on this, so I changed it to ValueError.
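To show how the merged behavior surfaces to users (an illustration, not code from this PR): with an invalid parameter grid, every fit fails and a single ValueError is raised instead of an all-NaN result.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(random_state=0)
    # C must be positive, so every candidate fit fails.
    grid = GridSearchCV(LogisticRegression(), {"C": [-1, -2]})
    try:
        grid.fit(X, y)
    except ValueError as exc:
        print(exc)  # includes a summary of the underlying fit failures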
splits failed. Similarly raise an error during grid-search when the fits for
all the models and all the splits failed. :pr:`21026` by :user:`Loïc Estève <lesteve>`.

:mod:`sklearn.pipeline`
Is this change intentional?
Oh yes, it was not in alphabetical order.
Merged! Thanks for the usability improvement @lesteve!
Reference Issues/PRs
This is a follow-up on #20619 (comment)
What does this implement/fix? Explain your changes.
Raise an error when all fits fail in cross-validation or grid-search, even if error_score is not set to 'raise'. When that happens, it is very likely that something is wrong in the model definition.
Any other comments?
Things to look at: