Skip to content

Raise an error when all fits fail in cross-validation or grid-search #21026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Sep 13, 2021

Reference Issues/PRs

This is a follow-up on #20619 (comment)

What does this implement/fix? Explain your changes.

Raise an error when all fits fail in cross-validation or grid-search even if error_score='warn'. When that happens it is very likely that something is wrong in the model definition.

Any other comments?

Things to look at:

  • error message wording
  • exception type: NotFittedError for now. This was the error type in a multi-metric edge case but not sure whether this is the most appropriate exception type.
  • for grid-search the error only happens when the fits fail for every models on all the splits. Do we want an error when the fits failed for a single model on all the splits?

@glemaitre
Copy link
Member

You only need to add an entry in 1.1 changelog to have the CI passing.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While is theory this could be considered a breaking change, I have the feeling that from a backward compatible point of view, this enhancement is also a usability bug fix, so I don't think we need to go through the usual deprecation cycle in this case: I don't really see a valid case where an all nan output could be of any use.

I am fine with merging this as it is.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am +1 with raising an error when all the fits fail during cross validation.

I left a minor comment, otherwise LGTM!

@lesteve lesteve force-pushed the cross-val-score-error-when-all-fits-fail branch from d5e9bfb to defae07 Compare September 27, 2021 09:21
@lesteve
Copy link
Member Author

lesteve commented Sep 28, 2021

exception type: NotFittedError for now. This was the error type in a multi-metric edge case but not sure whether this is the most appropriate exception type.

From some discussion with @ogrisel and @glemaitre using ValueError may be more appropriate. This need that it will change the error in the multi-metric edge case but this is probably fine.

"You can try to debug the error by setting error_score='raise'.\n\n"
f"Below are more details about the failures:\n{fit_errors_summary}"
)
raise NotFittedError(all_fits_failed_message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed IRL, maybe NotFittedError is not the best choice of exception type here. If all fit fail, it's very likely to be caused by an incompatible choice of parameters or a bad interaction with statiscal properties of the input data. So we could use ValueError instead.

Alternatively, we could record the type of the underlying exceptions, if it's uniformly the same for all fit calls and raise an exception of that type instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that apparently, not raising NotFittedError here would cause a slight change of behavior in some edge case of multimetrics handling but this can probably considered a bug fix. Maybe @lesteve can give us more details about this point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I had not read #21026 (comment) before starting this thread ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan any opinion on the above? I have also the feeling that @jnothman has been gifted with uncommon abilities and judgement in the subtle art of choosing exception types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that apparently, not raising NotFittedError here would cause a slight change of behavior in some edge case of multimetrics handling but this can probably considered a bug fix. Maybe @lesteve can give us more details about this point.

Quoting @thomasjpfan in #20619 (review):

With callable multimetric, at least one _fit_and_score has to succeed so that *SearchCV can create error_score dictionaries for the failed cases.

Before this PR, when all the fits failed in callable multimetric, the error was NotFittedError. This was done here.

Copy link
Member

@thomasjpfan thomasjpfan Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the description we have for NotFittedError:

Exception class to raise if estimator is used before fitting.

I do not think NotFittedError is suppose to be used when fitting. I am in favor of changing the exception to ValueError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seem to be consensus on this so I changed it to ValueError.

splits failed. Similarly raise an error during grid-search when the fits for
all the models and all the splits failed. :pr:`21026` by :user:`Loïc Estève <lesteve>`.

:mod:`sklearn.pipeline`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change normal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes it was not in alphabetic order

@ogrisel ogrisel merged commit 93bc20f into scikit-learn:main Oct 5, 2021
@ogrisel
Copy link
Member

ogrisel commented Oct 5, 2021

Merged! Thanks for the usability improvement @lesteve!

@glemaitre glemaitre mentioned this pull request Oct 23, 2021
10 tasks
@lesteve lesteve deleted the cross-val-score-error-when-all-fits-fail branch December 23, 2021 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants