Raise an error when all fits fail in cross-validation or grid-search #21026
Conversation
You only need to add an entry in the 1.1 changelog to make the CI pass.
While in theory this could be considered a breaking change, I have the feeling that, from a backward-compatibility point of view, this enhancement is also a usability bug fix, so I don't think we need to go through the usual deprecation cycle in this case: I don't really see a valid case where an all-NaN output could be of any use.
I am fine with merging this as it is.
I am +1 on raising an error when all the fits fail during cross-validation.
I left a minor comment, otherwise LGTM!
The number of unique groups was less than the number of splits.
Force-pushed from d5e9bfb to defae07.
From some discussion with @ogrisel and @glemaitre using

    "You can try to debug the error by setting error_score='raise'.\n\n"
    f"Below are more details about the failures:\n{fit_errors_summary}"
)
raise NotFittedError(all_fits_failed_message)
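To illustrate the failure mode this message targets, here is a minimal sketch (FailingClassifier is a made-up stand-in, not a scikit-learn class): every fit raises, so all cross-validation fits fail.

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.model_selection import cross_val_score

    class FailingClassifier(BaseEstimator, ClassifierMixin):
        def fit(self, X, y):
            # Every fit fails, e.g. because of a misconfigured model.
            raise ValueError("deliberately failing fit")

        def predict(self, X):
            return np.zeros(len(X), dtype=int)

    X = np.random.RandomState(0).rand(20, 2)
    y = np.r_[np.zeros(10), np.ones(10)]

    # Before this PR: warnings plus an all-NaN score array.
    # After this PR: a single error summarizing the underlying fit failures.
    cross_val_score(FailingClassifier(), X, y, cv=5)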
As discussed IRL, maybe NotFittedError is not the best choice of exception type here. If all fits fail, it's very likely to be caused by an incompatible choice of parameters or a bad interaction with statistical properties of the input data. So we could use ValueError instead.
Alternatively, we could record the types of the underlying exceptions and, if the type is uniformly the same for all fit calls, raise an exception of that type instead; see the sketch below.
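A rough sketch of what that alternative could look like (the helper name and signature are hypothetical, not scikit-learn API):

    def raise_for_all_failed_fits(fit_errors, message):
        # Hypothetical helper: if every failed fit raised the same exception
        # type, re-raise that type with the summary message; otherwise fall
        # back to a generic exception.
        error_types = {type(exc) for exc in fit_errors}
        if len(error_types) == 1:
            raise error_types.pop()(message)
        raise ValueError(message)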
Note that apparently, not raising NotFittedError here would cause a slight change of behavior in some edge case of multimetric handling, but this can probably be considered a bug fix. Maybe @lesteve can give us more details about this point.
Sorry I had not read #21026 (comment) before starting this thread ;)
@thomasjpfan any opinion on the above? I also have the feeling that @jnothman has been gifted with uncommon abilities and judgement in the subtle art of choosing exception types.
Note that apparently, not raising NotFittedError here would cause a slight change of behavior in some edge case of multimetric handling, but this can probably be considered a bug fix. Maybe @lesteve can give us more details about this point.
Quoting @thomasjpfan in #20619 (review):
With callable multimetric, at least one _fit_and_score has to succeed so that *SearchCV can create error_score dictionaries for the failed cases.
Before this PR, when all the fits failed in callable multimetric, the error was NotFittedError. This was done here.
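For context, a callable multimetric scorer returns a dict of scores, along the lines of this generic illustration (not code from this PR):

    from sklearn.metrics import accuracy_score, balanced_accuracy_score

    def multi_scorer(estimator, X, y):
        # Returns a dict of metric name -> score. For failed fits, *SearchCV
        # must build a dict with the same keys filled with error_score, which
        # is why at least one call has to succeed.
        y_pred = estimator.predict(X)
        return {
            "accuracy": accuracy_score(y, y_pred),
            "balanced_accuracy": balanced_accuracy_score(y, y_pred),
        }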
Given the description we have for NotFittedError:
Exception class to raise if estimator is used before fitting.
I do not think NotFittedError is supposed to be used when fitting. I am in favor of changing the exception to ValueError.
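The quoted description matches how NotFittedError is used elsewhere in scikit-learn, e.g. when predicting with an unfitted estimator:

    from sklearn.exceptions import NotFittedError
    from sklearn.linear_model import LogisticRegression

    try:
        LogisticRegression().predict([[0.0, 1.0]])
    except NotFittedError as exc:
        print(exc)  # the estimator was used before fitting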
There seems to be consensus on this, so I changed it to ValueError.
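To show how the merged behavior surfaces to users (an illustration, not code from this PR): with an invalid parameter grid, every fit fails and a single ValueError is raised instead of an all-NaN result.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(random_state=0)
    # C must be positive, so every candidate fit fails.
    grid = GridSearchCV(LogisticRegression(), {"C": [-1, -2]})
    try:
        grid.fit(X, y)
    except ValueError as exc:
        print(exc)  # includes a summary of the underlying fit failures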
splits failed. Similarly raise an error during grid-search when the fits for
all the models and all the splits failed. :pr:`21026` by :user:`Loïc Estève <lesteve>`.

:mod:`sklearn.pipeline`
Is this change intentional?
Oh yes, it was not in alphabetical order.
Merged! Thanks for the usability improvement @lesteve!
Reference Issues/PRs
This is a follow-up on #20619 (comment)
What does this implement/fix? Explain your changes.
Raise an error when all fits fail in cross-validation or grid-search, even if error_score is not set to 'raise'. When that happens, it is very likely that something is wrong in the model definition.
Any other comments?
Things to look at: