Skip to content

[MRG] Add deprecation warning for iid in BaseSearchCV #9103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

ldirer
Copy link
Contributor

@ldirer ldirer commented Jun 10, 2017

Reference Issue

#9085

What does this implement/fix? Explain your changes.

Deprecate iid param.

Any other comments?

  • I did not find references to iid in the doc/benchmark/examples.
  • This will emit a couple additional DeprecationWarnings when running the tests. I'm not sure it's worth removing them.

@ldirer ldirer changed the title [WIP] Add deprecation warning for iid in BaseSearchCV [MRG] Add deprecation warning for iid in BaseSearchCV Jun 10, 2017
@vene
Copy link
Member

vene commented Jun 10, 2017

sorry, what is the connection between this and #9031?

@@ -398,6 +398,11 @@ def __init__(self, estimator, scoring=None,
self.pre_dispatch = pre_dispatch
self.error_score = error_score

if not self.iid:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iid should be set to None and if it's not none, the warning should be shown, and if it is None, it should behave as True (as that's the current behavior)

@ldirer
Copy link
Contributor Author

ldirer commented Jun 10, 2017

Sorry about linking to the wrong issue, failed tab completion on my side.

@vene
Copy link
Member

vene commented Jun 10, 2017

lol, I was hoping the corresponding issue would have some links to resources that I could use to authoritatively convince others why this kind of cv is wrong. Thanks though!

@ldirer ldirer changed the title [MRG] Add deprecation warning for iid in BaseSearchCV [WIP] Add deprecation warning for iid in BaseSearchCV Jun 10, 2017
@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jun 10, 2017 via email

@vene
Copy link
Member

vene commented Jun 11, 2017

This is a great writeup, thanks @GaelVaroquaux!

@ldirer ldirer force-pushed the deprecate-iid-gridsearch branch from e513b8f to 451811b Compare June 17, 2017 15:01
@ldirer ldirer force-pushed the deprecate-iid-gridsearch branch from 451811b to 7ad4b06 Compare June 17, 2017 15:23
@ldirer ldirer changed the title [WIP] Add deprecation warning for iid in BaseSearchCV [MRG] Add deprecation warning for iid in BaseSearchCV Jun 17, 2017
..deprecated:: 0.19
Parameter ``iid`` has been deprecated in version 0.19 and
will be removed in 0.21.
Future (and default) behavior is equivalent to `iid=true`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't @GaelVaroquaux saying that this behaviour is inappropriate? #9103 (comment)

Rather we should be both deprecating the parameter and changing the default.

@raghavrv
Copy link
Member

raghavrv commented Jul 13, 2017

Needs a merge with master

@amueller
Copy link
Member

Ok this is really bad. We want this to be False. So what we need to do, and need to do before 0.19, is to warn people that the default behavior will change.
So set the default to None, if it's None, warn and behave as True and say that the default will be False in 0.21 and it will be removed in 0.23. 😢

This will likely raise a warning for everybody using GridSearchCV.

@amueller amueller added this to the 0.19 milestone Jul 15, 2017
@amueller
Copy link
Member

Actually, we should detect whether score is accuracy or the score method of ClassifierMixin and not raise a warning in that case.

@amueller
Copy link
Member

Hmm questions is whether we will special case the pipeline score to avoid warning in that case if possible.

@amueller
Copy link
Member

wait, I thought the iid was doing something different from what it was doing. It's just doing a weighted mean instead of a non-weighted mean? So that is independent of what metric we are using.... and we need to warn whenever the sample numbers are not the same?

@amueller
Copy link
Member

hm... not sure that is actually a blocker... I thought it would to actually equal treatment of all data points, but it doesn't really.

@agramfort
Copy link
Member

agramfort commented Jul 16, 2017 via email

@jnothman
Copy link
Member

jnothman commented Jul 16, 2017 via email

@agramfort
Copy link
Member

agramfort commented Jul 17, 2017 via email

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2017

+1 for deprecating the iid parameter and changing the default behavior (to do the equivalent of iid=False) right away without a FutureWarning but with proper documentation in the whats_new.rst.

In practice, it's very unlikely to significantly change the outcome of a grid search as most of our CV strategies will yield CV-folds with approximately equal sizes. The impact will probably be at noise level.

@jnothman
Copy link
Member

jnothman commented Jul 17, 2017 via email

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2017

I am afraid that the FutureWarning is going to be very annoying. It cannot be silenced by passing iid=False explicitly because this would trigger a DeprecationWarning and we don't want to encourage our users to explicit use the iid parameter in their code.

@jnothman
Copy link
Member

jnothman commented Jul 17, 2017 via email

@jnothman
Copy link
Member

jnothman commented Jul 17, 2017 via email

@amueller
Copy link
Member

amueller commented Jul 18, 2017

So we don't really have a consensus, right?
Options are

1- warn to change in two versions (my PR: #9379). Noisy but can be silenced but takes 4 releases to go away.
2- change now and raise ChangedBehaviorWarning (or something) for two versions. Noisy and can't be silenced - except for setting a filter for ChangedBehaviorWarning -- which isn't that bad?
3- Change now and document in whatsnew.rst and don't warn.

When I created my PR I thought the change was much more significant. I agree that it will be very small in most cases. It's hard to warn selectively as even KFold and StratifiedKFold can create these warnings.

One option would be to only warn for splitters where the folds have very uneven sizes, so the grouped ones, the TimeSeriesSplit and any custom ones.
So I guess that's a fourth option - or rather fourth and fifth, because it could be combined with the "change now" (option 4) or the "warn about future change" (option 5).

@ogrisel
Copy link
Member

ogrisel commented Jul 18, 2017

@GaelVaroquaux what's your opinion?

@jnothman
Copy link
Member

jnothman commented Jul 18, 2017 via email

@ogrisel ogrisel modified the milestones: 0.20, 0.19 Jul 26, 2017
@amueller
Copy link
Member

It would be good to get input from @GaelVaroquaux before releasing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants