[MRG+1] MAINT Reuse unaltered classes/functions from model_selection #5568


Closed
wants to merge 2 commits from the reuse_mod_sel branch

Conversation

raghavrv
Member

Also clean up some of the cross_validation and grid_search tests.

NOTE: A lot more lines could be removed from further redundant tests, but I feel leaving them as they are (at the cost of 3-4 extra seconds) will ensure that we haven't regressed on the old modules. To be clear: if we remove a redundant test for, say, cross_val_score and later modify model_selection's version of the function and its test, people using the deprecated module will be surprised by that change, since we simply import it from model_selection... Having the test will make sure such a situation won't happen, by failing the old tests and forcing us to leave a copy of the old implementation in the old module...

@amueller @vene @jnothman @GaelVaroquaux @ogrisel reviews?

from .utils.metaestimators import if_delegate_has_method
from .metrics.scorer import check_scoring
from .exceptions import ChangedBehaviorWarning
from .model_selection.search import ParameterGrid
Member

Shouldn't need to specify .search
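
For illustration, the suggested change would import ParameterGrid from the package itself rather than its .search submodule; this is only a sketch, assuming ParameterGrid is re-exported at the package level, which is what the comment implies:

# Suggested: import from the package, not the .search submodule
from .model_selection import ParameterGrid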

@jnothman
Member

jnothman commented Nov 1, 2015

In general I approve of this cut-down on lines of code, and it also avoids people modifying the deprecated versions. There are test failures though.

@raghavrv
Member Author

raghavrv commented Nov 1, 2015

@jnothman all the tests should pass now!

@raghavrv
Member Author

raghavrv commented Nov 7, 2015

@jnothman :P

@jnothman
Member

jnothman commented Nov 8, 2015

my feelings about reviewing right now

@raghavrv
Member Author

hahaha 😆

@@ -61,6 +62,16 @@
'train_test_split']


class FitFailedWarning(FitFailedWarning_):
Member

This should be _FitFailedWarning according to your tab completion explanation before....

Member Author

No this should be FitFailedWarning so people can import this just like before, but with a deprecation warning of course.

Member

Oh, I meant FitFailedWarning_ -> _FitFailedWarning

Member Author

Ah that yes.
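
To illustrate the pattern being discussed: the deprecated module keeps the old name importable while warning on use. The following is only a sketch of that idea; the stand-in base class, message text, and warning mechanism are assumptions, not the PR's exact code.

import warnings


class _FitFailedWarning(RuntimeWarning):
    # Stand-in for the relocated warning class; in the PR it is imported
    # from its new home under an aliased name (FitFailedWarning_ in the
    # diff, _FitFailedWarning as suggested in the review).
    pass


class FitFailedWarning(_FitFailedWarning):
    # Deprecated alias so `from sklearn.cross_validation import
    # FitFailedWarning` keeps working during the deprecation period.
    def __init__(self, *args, **kwargs):
        warnings.warn("FitFailedWarning has moved to the new module and "
                      "this alias will be removed after the deprecation "
                      "period.", DeprecationWarning)
        super(FitFailedWarning, self).__init__(*args, **kwargs)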

@raghavrv
Member Author

raghavrv commented Feb 8, 2016

@MechCoder Thanks heaps for the review!

@raghavrv
Member Author

Fixed both your comments!

from abc import ABCMeta, abstractmethod

import numpy as np
import scipy.sparse as sp
import sklearn.model_selection as mod_sel
Member

You should use relative imports in modules (and absolute imports in tests), see http://scikit-learn.org/stable/developers/contributing.html#coding-guidelines.

You may as well move this import after the from scipy.misc import comb line. As far as I can see, imports tend to be roughly ordered like this: stdlib, numpy, scipy, sklearn.

Member Author

Ah okay! Thanks...

Member

We also frown on "import as" for non-standard imports: it makes reading the code harder, because people need to scroll up to understand what the name of the module means.
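
Putting both pieces of advice together, the top of the module might be reordered roughly like this. This is only a sketch: the exact placement is an assumption, with the neighbouring imports taken from the diff and the review comments above.

from abc import ABCMeta, abstractmethod  # stdlib first

import numpy as np                       # then numpy
import scipy.sparse as sp                # then scipy
from scipy.misc import comb

from . import model_selection            # relative sklearn import, no alias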

Member Author

Thanks for the advice! (BTW if you can spare a few minutes, would you be able to review this? This has been around for a long time ;( )

@raghavrv raghavrv force-pushed the reuse_mod_sel branch 2 times, most recently from 121612c to 9266ca4 Compare February 17, 2016 14:30
@raghavrv raghavrv changed the title [MRG+2] MAINT Reuse unaltered classes/functions from model_selection [MRG+1] MAINT Reuse unaltered classes/functions from model_selection Feb 18, 2016
@raghavrv
Member Author

I hope you meant to rename it to MRG+1 and not MRG+2, although it would have been awesome ;)

@MechCoder
Member

I thought @lesteve also gave a +1

@@ -25,6 +25,8 @@
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    from sklearn import cross_validation as cval
    # We should be able to import this from cross_validation
    from sklearn.cross_validation import FitFailedWarning as _FFW
Member

You are not using _FFW anywhere ...

Member Author

This is to test that FitFailedWarning can be imported from our old path (which was public and needs to be supported for at least 2 more versions).

from sklearn.cross_validation import FitFailedWarning as _ would be better in that case then.
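
A minimal sketch of such an import-compatibility check in the test module; the test name, the assertion, and the warning filtering are assumptions rather than the PR's exact code:

import warnings


def test_fit_failed_warning_importable_from_old_path():
    # The old public path must stay importable for at least two more
    # versions; silence the deprecation noise and bind to `_` as suggested.
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        from sklearn.cross_validation import FitFailedWarning as _
    assert issubclass(_, Warning)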

@raghavrv
Member Author

@lesteve Apart from these comments, could I consider the review a +1 from your side?

@jnothman
Member

I woke from my nap, it seems. In hindsight I'm not sure it was worth the effort, but there are indeed a lot of lines saved! I'm not able to look through this in detail right now, but the tests seem to pass...

@raghavrv
Member Author

Haha!! :D

And this PR basically does 90% of what we will have to do once the deprecations for cross_validation expire...

One huge advantage is that when patching minor issues, we can do it once for the model_selection alone and not worry about fixing cross_validation.

@lesteve
Member

lesteve commented Feb 25, 2016

@lesteve Apart from these comments, could I consider the review a +1 from your side?

I don't have the necessary rights to change the PR title, so let's say it is an honorary +1 ;-).

One huge advantage is that when patching minor issues, we can do it once for the model_selection alone and not worry about fixing cross_validation.

I wholeheartedly agree on this one.

TST Clean up the cross_validation and grid_search tests.
ENH Use scipy's binomial coefficient function comb for calculation of nCk
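
For context on the second commit, a small example of computing nCk with scipy's binomial coefficient; the numbers are illustrative only, scipy.misc.comb is the import mentioned in the review above, and newer scipy versions expose the same function as scipy.special.comb:

from scipy.misc import comb  # scipy.special.comb in newer scipy versions

# Number of ways to choose k=3 items out of n=5, i.e. 5C3
n_choose_k = comb(5, 3, exact=True)
print(n_choose_k)  # 10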
@raghavrv
Member Author

@lesteve Thanks again for the great catch! Have modified the test to make sure that the old (deprecated) FitFailedWarning is tested properly...

@amueller @ogrisel one final review please?

if not np.all(hit):
return False
return True
return model_selection.cross_val_predict(
Member

Is the signature of this one changed? Or why didn't you just import it? Though maybe this is more clear and allows changes in the new function.

Member Author

I did this because the order of the parameters changed in the new module... Previously it was (X, y, cv, ...); in model_selection it is (X, y, labels, ...)...
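
A sketch of the kind of thin wrapper being described, forwarding by keyword so the old positional order still works against the new signature; the parameter list here is abbreviated and the real wrapper forwards more arguments:

from sklearn import model_selection


def cross_val_predict(estimator, X, y=None, cv=None, n_jobs=1):
    # The old public signature puts cv right after y, while the
    # model_selection version inserts labels before cv, so forward by
    # keyword instead of simply re-importing the function.
    return model_selection.cross_val_predict(
        estimator, X, y=y, cv=cv, n_jobs=n_jobs)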

@amueller
Member

looks good except we can't remove GridSearchCV because we want to make backward-incompatible changes to grid_scores_ for multiple metric support. An interesting question is whether we want to make similar backward-incompatible changes to cross_val_score and learning_curve, too.
I feel returning a dict in all cases would be easier in the multiple metric case.

@raghavrv
Member Author

looks good except we can't remove GridSearchCV because we want to make backward-incompatible changes to grid_scores_ for multiple metric support.

Yes you are correct! The main reason why the old tests are not removed is exactly that! To allow backward incompatible changes to any of model_selection classes and functions...

An interesting question is whether we want to make similar backward-incompatible changes to cross_val_score and learning_curve, too.

I feel we can merge this PR as such and re-add the old classes/functions with their old functionality (and ensure that the old tests pass) as and when we make more backward incompatible changes to the model_selection module.

The reason why I suggest that this PR be merged is that it is tough to anticipate in advance and retain the old code for all the functions/classes which could get a backward incompatible change in the future. I think it would be easier to copy the old code back in the PR (that does the backward incompatible change) itself.

WDYT? @amueller @jnothman @MechCoder

@amueller
Member

So you want to remove code and then add it back in later? Why?
I agree that it's hard to anticipate what we want to change, so I would just leave it in.

@raghavrv
Member Author

So you want to remove code and then add it back in later? Why?
I agree that it's hard to anticipate what we want to change, so I would just leave it in.

And revisit this PR at the end of all the model_selection changes?

@amueller
Member

Aka just before 0.18

@raghavrv
Member Author

Should we revisit this one now? @amueller

@jnothman
Member

I'm now +0 on this idea. I don't think there's anything broken here except for contributors changing deprecated files.

@raghavrv
Member Author

Okay. And there are a lot of changes too. I'm not sure how cleanly we can avoid code duplication... I'm closing this PR. Let me know if this needs to be done...

@raghavrv raghavrv closed this Sep 14, 2016
@raghavrv raghavrv deleted the reuse_mod_sel branch September 14, 2016 14:25