Enable grid_search with classifiers that fail on individual training folds #2587
The changed hunk in `fit_grid_point`:

```diff
@@ -264,18 +264,33 @@ def fit_grid_point(X, y, base_estimator, parameters, train, test, scorer,
     if y is not None:
         y_test = y[safe_mask(y, test)]
         y_train = y[safe_mask(y, train)]
-        clf.fit(X_train, y_train, **fit_params)
-
-        if scorer is not None:
-            this_score = scorer(clf, X_test, y_test)
+        try:
+            clf.fit(X_train, y_train, **fit_params)
+        except ValueError as e:
+            # If the classifier fails, the score is set to 0 by default
+            this_score = 0
+            warnings.warn("Classifier fit failed. The score for this fold on "
+                          "this test point will be set to zero. Details: " +
+                          str(e), RuntimeWarning)
         else:
-            this_score = clf.score(X_test, y_test)
+            if scorer is not None:
+                this_score = scorer(clf, X_test, y_test)
+            else:
+                this_score = clf.score(X_test, y_test)
     else:
-        clf.fit(X_train, **fit_params)
-        if scorer is not None:
-            this_score = scorer(clf, X_test)
+        try:
+            clf.fit(X_train, **fit_params)
+        except ValueError as e:
+            # If the classifier fails, the score is set to 0 by default
+            this_score = 0
+            warnings.warn("Classifier fit failed. The score for this fold on "
+                          "this test point will be set to zero. Details: " +
+                          str(e), RuntimeWarning)
         else:
-            this_score = clf.score(X_test)
+            if scorer is not None:
+                this_score = scorer(clf, X_test)
+            else:
+                this_score = clf.score(X_test)

     if not isinstance(this_score, numbers.Number):
         raise ValueError("scoring must return a number, got %s (%s)"
```

Review thread:

On `except ValueError as e:`:

Reviewer: We should probably be catching […]

Author: My reasoning here was that if the classifier fails on some specific folds but not others, then this is due to the data and should probably result in a `ValueError`. It is generally recommended to catch only a subset of exceptions rather than all of them, but perhaps assuming that it will be a `ValueError` is going too far.

On `this_score = 0`:

Reviewer: I'm not certain 0 is always the correct value. For example, for correlation metrics, might -1 be preferable sometimes? Maybe to support this functionality we should have an additional parameter to […]

Author: I like this idea.

On the `warnings.warn(...)` call:

Reviewer: I wonder how much we need to help the user understand why and where it failed. In terms of where: should we be noting the parameters and the fold number that broke? Perhaps not if all non-failing folds scored > 0, in which case it's easy enough to inspect the search results. In terms of why: firstly, […]

Author: I suppose we could print out the parameters that break the classifier, but with the fold number it could be more tricky because […] Switching from […] Same for providing a switch to raise errors, although again this means an extra parameter for […]

Reviewer: I don't think there's harm in an extra parameter for […]
On the duplicated `try`/`except` blocks:

Reviewer: The duplication could be avoided with something like `clf.fit(*fit_args, **fit_params)`, where `fit_args` is set differently for the `y is None` and `y is not None` cases. (In some PR related to grid search somewhere I have implemented it this way, but the remainder of the PR was too controversial to merge as yet.)
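The refactoring this comment sketches could look as follows. The function name and signature are hypothetical, chosen only to show how building `fit_args` up front collapses the supervised and unsupervised branches into one `try`/`except`:

```python
import warnings


def fit_grid_point_sketch(clf, X_train, X_test, y_train=None, y_test=None,
                          scorer=None, **fit_params):
    """Fit and score one grid point; sketch of the deduplicated version."""
    # Build the positional fit/score arguments once, so a single
    # try/except serves both the y=None and y!=None cases.
    fit_args = (X_train,) if y_train is None else (X_train, y_train)
    score_args = (X_test,) if y_test is None else (X_test, y_test)
    try:
        clf.fit(*fit_args, **fit_params)
    except ValueError as e:
        warnings.warn("Classifier fit failed. The score for this fold "
                      "will be set to zero. Details: %s" % e,
                      RuntimeWarning)
        return 0
    if scorer is not None:
        return scorer(clf, *score_args)
    return clf.score(*score_args)
```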