
[MRG+2] Enable grid search with classifiers that may fail on individual fits. #2795


Conversation

romaniukm

Supersedes #2587. The old pull request was so outdated (and the functionality affected was moved to different files) that I decided to re-do the work starting with a fresh checkout from master. Since this is technically a new branch, I'm opening a new pull request.

        self.parameter = parameter

    def fit(self, X, y=None):
        raise ValueError("Failing classifier failed as requiered")
Member

Instead do:

if self.parameter:
    raise ...
return self

Member

"requiered" -> "required"

Author

What does raise ... do?

Member

I was writing shorthand. I meant that your test would be more robust if the classifier didn't always fail, so you can check that it doesn't give a zero score to all parameters on failure.
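
A minimal sketch of such a conditionally failing test classifier (the FAILING_PARAMETER value and the predict body are illustrative assumptions, not the final test code):

    import numpy as np
    from sklearn.base import BaseEstimator

    class FailingClassifier(BaseEstimator):
        """Classifier that raises a ValueError on fit() for one parameter value."""

        FAILING_PARAMETER = 2  # illustrative sentinel value

        def __init__(self, parameter=None):
            self.parameter = parameter

        def fit(self, X, y=None):
            # Fail only for the designated parameter value, so the grid
            # search still produces real scores for the other candidates.
            if self.parameter == self.FAILING_PARAMETER:
                raise ValueError("Failing classifier failed as required")
            return self

        def predict(self, X):
            # Trivial constant prediction, enough for a scorer to run.
            return np.zeros(X.shape[0], dtype=int)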

@jnothman
Member

I think fit_exceptions_to_warnings and fit_exception_score are very long names that are hard to remember (the missing s in the second one is extra tricky). What's wrong with on_error : float or 'error' (or on_fit_error if you must).

You might as well warn always; use a custom warning class and the user can ignore it with warnings.simplefilter.
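
A hedged sketch of how a user could then silence these warnings (the FitFailedWarning name here is a placeholder for whatever RuntimeWarning subclass the PR ends up defining):

    import warnings

    class FitFailedWarning(RuntimeWarning):
        """Placeholder warning class for fits that fail during a search."""

    # Users who expect some candidate fits to fail can opt out of the noise:
    warnings.simplefilter("ignore", category=FitFailedWarning)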

    if return_train_score:
        train_score = fit_exception_score
    warnings.warn("Classifier fit failed. The score on this train-test"
                  " partition will be set to zero. Details: " +
Member

'zero' should be variable: format with %0.1f, and %r for the exception.

@romaniukm
Author

I made some changes according to the feedback received here.

Meanwhile, I'm also having problems with some tests failing but these tests seem to be unrelated to the code I changed:

======================================================================
FAIL: sklearn.feature_extraction.tests.test_image.test_connect_regions
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/feature_extraction/tests/test_image.py", line 63, in test_connect_regions
    assert_equal(ndimage.label(mask)[1], connected_components(graph)[0])
AssertionError: 777 != 767
    '777 != 767' = '%s != %s' % (safe_repr(777), safe_repr(767))
    '777 != 767' = self._formatMessage('777 != 767', '777 != 767')
>>  raise self.failureException('777 != 767')


======================================================================
FAIL: sklearn.feature_extraction.tests.test_image.test_connect_regions_with_grid
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/feature_extraction/tests/test_image.py", line 70, in test_connect_regions_with_grid
    assert_equal(ndimage.label(mask)[1], connected_components(graph)[0])
AssertionError: 777 != 767
    '777 != 767' = '%s != %s' % (safe_repr(777), safe_repr(767))
    '777 != 767' = self._formatMessage('777 != 767', '777 != 767')
>>  raise self.failureException('777 != 767')

    train_score = exception_score
    warnings.warn("Classifier fit failed. The score on this train-test"
                  " partition will be set to %f. Details: \n%r" %
                  (exception_score, e), RuntimeWarning)
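
For context, a hedged sketch of how this branch might sit around the fit call in _fit_and_score at this stage of the review (variable names follow the excerpts in this conversation, e.g. skip_exceptions and exception_score; the structure of the merged code may differ):

    try:
        estimator.fit(X_train, y_train, **fit_params)
    except Exception as e:
        if not skip_exceptions:
            raise
        # Substitute the configured score for this failed fold and warn.
        test_score = exception_score
        if return_train_score:
            train_score = exception_score
        warnings.warn("Classifier fit failed. The score on this train-test"
                      " partition will be set to %f. Details: \n%r" %
                      (exception_score, e), RuntimeWarning)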
Member

Please sub-class RuntimeWarning so that the user can easily filter these warnings.

@jnothman
Member

jnothman commented Feb 3, 2014

I'm still not convinced you need separate parameters 'skip_exceptions' and 'exception_score', but all in all this is looking good.

Does someone else want to pitch in? @ogrisel, you've mentioned failing CV before...

@romaniukm
Author

@jnothman, I subclassed RuntimeWarning as you suggested. Any more ideas for improvement?

@GaelVaroquaux
Member

I'm still not convinced you need separate parameters 'skip_exceptions' and 'exception_score'

I agree. Fewer parameters are better.

@romaniukm
Author

Can you explain how I can do this with one parameter?

@jnothman
Member

For example:

GridSearchCV(on_error='raise') will do what the code currently does: raise the exception.

GridSearchCV(on_error=0) will return 0 in place of the breaking fold.

This assumes no one uses a class label called 'raise', but that may be a fair enough assumption.
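
A usage sketch of this overloaded parameter (the on_error keyword is the name proposed in this comment, not a released scikit-learn argument, so treat it as illustrative):

    from sklearn.svm import SVC
    from sklearn.grid_search import GridSearchCV  # module path at the time of this PR

    clf = SVC()
    param_grid = {'C': [0.1, 1, 10]}

    # Propagate the exception from a failing fit, as the current code does:
    grid_raise = GridSearchCV(clf, param_grid, on_error='raise')

    # Or record 0 as the score for any fold whose fit raises:
    grid_zero = GridSearchCV(clf, param_grid, on_error=0)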


@romaniukm
Author

@jnothman @GaelVaroquaux I'm not too keen on having one parameter with a special value but I changed it as you requested. Anything else?

@@ -1142,6 +1147,10 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters,
    verbose : integer
        The verbosity level.

    on_error : numeric or 'raise'
Member

I think that I would call this parameter 'error_score'. When set to 'raise' it would indeed raise the exception.

@jnothman : what do you think of the proposed name?

Author

I don't mind what we call it.

Author

I'm ok with error_score.

@GaelVaroquaux
Member

I am OK with the general lines of this PR. I made just a few minor comments; the only one that might be subject to discussion is the proposed name change for the argument.

Also, could you rebase your branch on master? It is not mergeable in its current state.

Thanks a lot for your work.

@romaniukm
Author

Hi @GaelVaroquaux and @jnothman,

I'm back to working on this after a long-ish break. I made the requested changes, but it looks like something went wrong when I was trying to figure out how to do the rebase. Can you tell me if it's OK, and if not, what I should do?

@coveralls

Coverage Status

Coverage remained the same when pulling 0d3c952 on romaniukm:gridsearch-failing-classifiers-2 into c49723d on scikit-learn:master.

@jnothman
Member

I've not looked at the damage (though it looks like you've done a merge rather than a rebase), but you can just do the rebase again if it's not arduous:

$ git checkout master
$ git pull https://github.com/scikit-learn/scikit-learn master:master
$ git checkout gridsearch-failing-classifiers-2
$ git reset --hard b5e5c5d  # take us back to "Renamed on_error ..."
$ git rebase master
# fix any conflicts, followed by git add and git rebase --continue
$ git push --force https://github.com/romaniukm/scikit-learn gridsearch-failing-classifiers-2

@coveralls

Coverage Status

Coverage remained the same when pulling ae70cdf on romaniukm:gridsearch-failing-classifiers-2 into b4625c7 on scikit-learn:master.

@romaniukm
Author

Rebase done.


def _fit_and_score(estimator, X, y, scorer, train, test, verbose,
                   parameters, fit_params, return_train_score=False,
                   return_parameters=False, skip_exceptions=False,
Member

You still have the skip_exceptions parameter though it does nothing.

@jnothman
Member

I'm not too keen on having one parameter with a special value but I changed it as you requested.

We have parameter overloading all over the place in scikit-learn. In some places it's arguably more problematic than here, where 'raise' simply can't be a valid score, and score is irrelevant if an error is to be raised.

@romaniukm
Author

Thanks @jnothman for reviewing my code. I corrected the mistakes you pointed out and I hope it's all well now.

@@ -1141,6 +1145,11 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters,
    verbose : integer
        The verbosity level.

    error_score : numeric or 'raise'
Member

Could you make this 'raise' (default) or numeric?

@@ -1141,6 +1145,12 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters,
    verbose : integer
        The verbosity level.

    error_score : 'raise' (default) or numeric
        Value to assign to the score if an error occurs in estimator fitting.
        If set to 'raise', the error is raised. If a numeric value is given,
Member

What does the numerical value mean? Why a numerical value?

Member

This is stated on the line before: "Value to assign to the score if an error occurs in estimator fitting."

But yes, the ordering makes it a little unclear. Perhaps "otherwise" would suffice. I wish there were also a better word than "raise" for warnings.

Member

OK, maybe change the ordering of the lines in the docstring. But feel free to ignore this comment.
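
One possible reordering of that docstring, as a sketch only (the exact merged wording may differ):

    error_score : 'raise' (default) or numeric
        If set to 'raise', the error raised by estimator fitting is re-raised.
        Otherwise, the given numeric value is assigned as the score for the
        train-test partition on which the fit failed, and a warning is issued.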

@GaelVaroquaux
Member

@romaniukm : sorry for the slow reply, it's just that I am swamped. The amount of activity that scikit-learn pull requests create is crazy, and it adds to my day job.

@@ -1192,13 +1202,25 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters,

X_train, y_train = _safe_split(estimator, X, y, train)
X_test, y_test = _safe_split(estimator, X, y, test, train)
if y_train is None:
    estimator.fit(X_train, **fit_params)
Member

You've lost the lines calling the estimator without y_train when y was None. Do we have a clear view of the consequences? If not, I'd rather keep them.

Member

Yes, this is fair enough: we can't be sure that all unsupervised downstream estimators have a second argument.

Author

You mean to replace this with the following?

        if y_train is None:
            estimator.fit(X_train, **fit_params)
        else:
            estimator.fit(X_train, y_train, **fit_params)

@GaelVaroquaux
Member

OK, aside from the two comments above (y_train being None and the elif clause), this is good to go as far as I am concerned. Good job!

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling 60ceb95 on romaniukm:gridsearch-failing-classifiers-2 into d0f6052 on scikit-learn:master.

@romaniukm
Author

@GaelVaroquaux I rebased and made the changes as requested.

I also decided to do some blatant self-promotion by adding my name to the contributors list in whats_new.rst ... I'm not sure, though, whether I contributed enough, so it's up to you to leave it or delete it.

Thanks for the feedback!

@romaniukm
Author

The Travis CI build seems to be failing but only in the configuration DISTRIB="ubuntu" PYTHON_VERSION="2.7" INSTALL_ATLAS="true" COVERAGE="true". Is there a way to see what went wrong with that build?

On my machine, two tests fail but I didn't change those files and they seem to be completely unrelated to my code:

======================================================================
FAIL: sklearn.feature_extraction.tests.test_image.test_connect_regions
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/feature_extraction/tests/test_image.py", line 63, in test_connect_regions
    assert_equal(ndimage.label(mask)[1], connected_components(graph)[0])
AssertionError: 777 != 767
    '777 != 767' = '%s != %s' % (safe_repr(777), safe_repr(767))
    '777 != 767' = self._formatMessage('777 != 767', '777 != 767')
>>  raise self.failureException('777 != 767')

======================================================================
FAIL: sklearn.feature_extraction.tests.test_image.test_connect_regions_with_grid
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/feature_extraction/tests/test_image.py", line 70, in test_connect_regions_with_grid
    assert_equal(ndimage.label(mask)[1], connected_components(graph)[0])
AssertionError: 777 != 767
    '777 != 767' = '%s != %s' % (safe_repr(777), safe_repr(767))
    '777 != 767' = self._formatMessage('777 != 767', '777 != 767')
>>  raise self.failureException('777 != 767')

@ogrisel
Member

ogrisel commented Jun 6, 2014

Can you please try to rebase your whole branch on top of the current master? I think those failures have been fixed there in the meantime.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 01392cb on romaniukm:gridsearch-failing-classifiers-2 into b16ffbb on scikit-learn:master.

@romaniukm
Author

@ogrisel So I rebased and the Travis build passed this time, but those two tests still fail on my machine... Do you know where I can find some information about what causes this problem?

@romaniukm
Author

Does [MRG+1] mean it's planned to be merged in a future release?

@jnothman
Member

MRG+1 means one reviewer has said this should be accepted. Generally, a PR requires a second +1 before merging.

])
gs = GridSearchCV(p, {'classifier__foo_param': [1, 2, 3]}, cv=2).fit(X, y)

class FailingClassifier(BaseEstimator):
    """ Classifier that raises a ValueError on fit() """
Member

Cosmetics: please remove the whitespace at the beginning and end of the docstring, to be consistent with PEP 257 style.
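
For reference, a PEP 257-style version of that docstring would look like this (a trivial sketch):

    class FailingClassifier(BaseEstimator):
        """Classifier that raises a ValueError on fit()."""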

@ogrisel
Member

ogrisel commented Aug 1, 2014

Please update the what's new file to move that change to the block of the 0.16 release (you might need to rebase again on current master first).

Also please squash this PR as a single commit.

Then +1 for merge on my side as well.

@ogrisel
Member

ogrisel commented Aug 1, 2014

git fetch upstream
git rebase -i upstream/master

Pick the first commit, squash the others, and write a descriptive commit message.

Then run: git push -f romaniukm gridsearch-failing-classifiers-2

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling b29a54c on romaniukm:gridsearch-failing-classifiers-2 into 0a7bef6 on scikit-learn:master.

@romaniukm
Author

@jnothman I'm not sure how your DOC commit ended up in my local master but not the GitHub one... Was it rolled back? If so, what should I do about it?

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling bf8c7b6 on romaniukm:gridsearch-failing-classifiers-2 into 22cafa6 on scikit-learn:master.

@jnothman
Member

jnothman commented Aug 5, 2014

Somehow you have cherry-picked in that commit (or based your branch on top of it); in master it is included via a merge commit that appears to be absent here. You should be able to do a rebase on an updated master, but don't worry about it too much.

@romaniukm
Author

It looks like this branch went out of sync with master again...

@jnothman
Member

jnothman commented Oct 1, 2014

Okay. This PR has 3 +1s, and the title should reflect this... I'll try to rebase and merge today.

@jnothman jnothman changed the title [MRG+1] Enable grid search with classifiers that may fail on individual fits. [MRG+2] Enable grid search with classifiers that may fail on individual fits. Oct 1, 2014
@jnothman
Member

jnothman commented Oct 2, 2014

Merged as 58be184. Thanks @romaniukm (and for your patience!)

@jnothman jnothman closed this Oct 2, 2014