Skip to content

[MRG] BUG: check equality instead of identity in check_cv #12155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 25, 2018

Conversation

glemaitre
Copy link
Member

closes #12154

Check for equality instead of identity which does not hold when n_jobs=-1 due to some copy.

@glemaitre glemaitre changed the title BUG: check equality instead of identity in check_cv [MRG] BUG: check equality instead of identity in check_cv Sep 25, 2018
@glemaitre
Copy link
Member Author

I did not put any entry in what's new since that we introduce it in 0.20 and that we should add it inside the upcoming release.

Need to be back-ported to #11948

@glemaitre glemaitre added this to the 0.20 milestone Sep 25, 2018
@glemaitre
Copy link
Member Author

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the PR and sorry for the bug.

@qinhanmin2014
Copy link
Member

If this is included in 0.20.1, maybe we need an entry. But I prefer to hurry it into 0.20 since it seems to be serious.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this!

There are a few other similar cases there that might need to be fixed,

$ rg "\w+ is '"
model_selection/_split.py
425:        if n_splits is 'warn':
496:        if n_splits is 'warn':
597:        if n_splits is 'warn':
751:        if n_splits is 'warn':
1942:    if cv is None or cv is 'warn':

# the parameters leading to a failure in check_cv due to cv is 'warn'
# instead of cv == 'warn'.
X, y = load_iris(return_X_y=True)
clf = SVC(gamma='auto')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use a LogisticRegresion? (Trying to gain ms on the test suite run, also just makes sense to use the simplest estimator there is since it's not relevant)

X, y = load_iris(return_X_y=True)
clf = SVC(gamma='auto')
grid = GridSearchCV(clf, param_grid={'C': [1, 10]})
cross_validate(grid, X, y, n_jobs=-1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_jobs=2 should be enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True

@glemaitre
Copy link
Member Author

sorry for the bug

Don't worry I think that this one of the side of sprinting :)

@jorisvandenbossche
Copy link
Member

+1 on including this in 0.20.0 (so re-doing the tag)

@qinhanmin2014
Copy link
Member

More generally, maybe we should avoid comparing strings with is in the repo.

@rth
Copy link
Member

rth commented Sep 25, 2018

More generally, maybe we should avoid comparing strings with is in the repo.

Haven't seen other occurrences, but I may have missed something (e.g. when a = 'something' and comparing b is a, but I don't think it's a typical pattern).

@glemaitre
Copy link
Member Author

We cannot tests n_splits is 'warns' because it will be modify in __init__ which is make it works. However, I think that it is much better to change it.

@rth
Copy link
Member

rth commented Sep 25, 2018

+1 for fixing it even without tests..

@qinhanmin2014
Copy link
Member

I'm fine with changing is to == without tests since is is not stable.
I'm not with my PC now and will do a git grep in the repo soon.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's merge when green.

@qinhanmin2014 qinhanmin2014 merged commit 5431a1a into scikit-learn:master Sep 25, 2018
@ogrisel
Copy link
Member

ogrisel commented Sep 25, 2018

I am ok with redoing the tag as well to include this as long as nothing has been pushed to pypi yet.

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Nov 29, 2018
* tag '0.20.0': (77 commits)
  ColumnTransformer generalization to work on empty lists (scikit-learn#12084)
  add sparse_threshold to make_columns_transformer (scikit-learn#12152)
  [MRG] Convert ColumnTransformer input list to numpy array (scikit-learn#12104)
  Change version to 0.20.0
  BUG: check equality instead of identity in check_cv (scikit-learn#12155)
  [MRG] Fix FutureWarnings in logistic regression examples (scikit-learn#12114)
  [MRG] Update test_metaestimators to pass y parameter when calling score (scikit-learn#12089)
  DOC Removed duplicated doc in tree.rst (scikit-learn#11922)
  [MRG] DOC covariance doctest examples (scikit-learn#12124)
  typo and formatting fixes in 0.20 doc (scikit-learn#11963)
  DOC Replaced the deprecated early_stopping parameter with n_iter_no_change. (scikit-learn#12133)
  [MRG +1] ColumnTransformer: store evaluated function column specifier during fit (scikit-learn#12107)
  Fix typo (scikit-learn#12126)
  DOC Typo in OneHotEncoder
  DOC Update fit_transform docstring of OneHotEncoder (scikit-learn#12117)
  DOC Removing quotes from variant names. (scikit-learn#12113)
  DOC BaggingRegressor missing default value for oob_score in docstring (scikit-learn#12108)
  [MRG] MNT Re-enable PyPy CI (scikit-learn#12039)
  MNT Only checks warnings on latest depedendencies versions in CI (scikit-learn#12048)
  TST Ignore warnings in common test to avoid collection errors (scikit-learn#12093)
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Nov 29, 2018
* releases: (77 commits)
  ColumnTransformer generalization to work on empty lists (scikit-learn#12084)
  add sparse_threshold to make_columns_transformer (scikit-learn#12152)
  [MRG] Convert ColumnTransformer input list to numpy array (scikit-learn#12104)
  Change version to 0.20.0
  BUG: check equality instead of identity in check_cv (scikit-learn#12155)
  [MRG] Fix FutureWarnings in logistic regression examples (scikit-learn#12114)
  [MRG] Update test_metaestimators to pass y parameter when calling score (scikit-learn#12089)
  DOC Removed duplicated doc in tree.rst (scikit-learn#11922)
  [MRG] DOC covariance doctest examples (scikit-learn#12124)
  typo and formatting fixes in 0.20 doc (scikit-learn#11963)
  DOC Replaced the deprecated early_stopping parameter with n_iter_no_change. (scikit-learn#12133)
  [MRG +1] ColumnTransformer: store evaluated function column specifier during fit (scikit-learn#12107)
  Fix typo (scikit-learn#12126)
  DOC Typo in OneHotEncoder
  DOC Update fit_transform docstring of OneHotEncoder (scikit-learn#12117)
  DOC Removing quotes from variant names. (scikit-learn#12113)
  DOC BaggingRegressor missing default value for oob_score in docstring (scikit-learn#12108)
  [MRG] MNT Re-enable PyPy CI (scikit-learn#12039)
  MNT Only checks warnings on latest depedendencies versions in CI (scikit-learn#12048)
  TST Ignore warnings in common test to avoid collection errors (scikit-learn#12093)
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Nov 29, 2018
* dfsg: (77 commits)
  ColumnTransformer generalization to work on empty lists (scikit-learn#12084)
  add sparse_threshold to make_columns_transformer (scikit-learn#12152)
  [MRG] Convert ColumnTransformer input list to numpy array (scikit-learn#12104)
  Change version to 0.20.0
  BUG: check equality instead of identity in check_cv (scikit-learn#12155)
  [MRG] Fix FutureWarnings in logistic regression examples (scikit-learn#12114)
  [MRG] Update test_metaestimators to pass y parameter when calling score (scikit-learn#12089)
  DOC Removed duplicated doc in tree.rst (scikit-learn#11922)
  [MRG] DOC covariance doctest examples (scikit-learn#12124)
  typo and formatting fixes in 0.20 doc (scikit-learn#11963)
  DOC Replaced the deprecated early_stopping parameter with n_iter_no_change. (scikit-learn#12133)
  [MRG +1] ColumnTransformer: store evaluated function column specifier during fit (scikit-learn#12107)
  Fix typo (scikit-learn#12126)
  DOC Typo in OneHotEncoder
  DOC Update fit_transform docstring of OneHotEncoder (scikit-learn#12117)
  DOC Removing quotes from variant names. (scikit-learn#12113)
  DOC BaggingRegressor missing default value for oob_score in docstring (scikit-learn#12108)
  [MRG] MNT Re-enable PyPy CI (scikit-learn#12039)
  MNT Only checks warnings on latest depedendencies versions in CI (scikit-learn#12048)
  TST Ignore warnings in common test to avoid collection errors (scikit-learn#12093)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: check_cv check for cv is 'warn' instead of cv == 'warn' which is raising an exception with n_jobs=-1
5 participants