[MRG+2] Label K-Fold #5190

Merged
merged 4 commits into from
Sep 8, 2015
Conversation

glouppe
Contributor

@glouppe glouppe commented Aug 30, 2015

This supersedes #4444

@glouppe glouppe changed the title [MRG+1] Label K-Fold [WIP] Label K-Fold Aug 30, 2015
@glouppe
Contributor Author

glouppe commented Aug 30, 2015

I am switching this back to WIP, as I am not sure the label_kfold function is needed in addition to the LabelKFold class. I will refactor it to follow the structure of the other iterators more closely.

@JeanKossaifi
Contributor

About the name: we had chosen DisjointLabelKFold rather than simply LabelKFold because it is more informative. In that instance the goal was to perform subject/label-independent experiments. It doesn't really matter whether the method acts on labels or something else; the main point is that the resulting folds are disjoint. Isn't it clearer to the user if the name indicates that?

@glouppe glouppe changed the title [WIP] Label K-Fold [MRG] Label K-Fold Aug 31, 2015
@glouppe
Contributor Author

glouppe commented Aug 31, 2015

I removed the standalone function and moved its logic inside LabelKFold.__init__. This is ready for review. I kept @JeanKossaifi's original code as much as possible.

For the naming, I have no strong opinion, but we should aim for something consistent across all CV iterators. For now, "label" appears to be the terminology in use, but I agree that "grouped" is maybe clearer.

@JeanKossaifi
Contributor

Seems good to me.

@glouppe glouppe changed the title [MRG] Label K-Fold [MRG+1] Label K-Fold Sep 1, 2015

>>> labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]

>>> lkf = LabelKFold(labels, 3)
Member

I would use lkf = LabelKFold(labels, n_folds=3).

@arjoly
Member

arjoly commented Sep 3, 2015

I made some nitpicks, but overall the code looks good.

The naming questions shouldn't be neglected.

@glouppe
Contributor Author

glouppe commented Sep 3, 2015

Indeed the code needs some polishing (I kept most of #4444), I'll address your comments tomorrow.

JeanKossaifi and others added 3 commits September 7, 2015 09:53
Changed SubjectIndependentKFold to DisjointGroupKFold

cosmetic changes  test (fix seed correctly, use assert_equal for
meaningful error messages)

Changed name to DisjointLabelKFold

Added example of use

FIX: whitespace related doctest failure

FIX: Python 2.6 requires the field numbers in print

FIX: change docstring to comment in test function

DOC: moved docstring from function to class

FIX: added call to parent class

FIX: error in calling the parent

DOC: fixed doctest

FIX: doctest

Cosmetic changes (minor refactoring)

Optimised code (use np.bincount)

Cosmetic: use samples instead of weight for clarity

Minor fix: removed shuffle parameter

Cosmetic

Use mergesort instead of quicksort for reproducibility.

Changed variable name 'y' to 'label'.

Added test for degenerate case where n_folds > n_labels.

Documented the requirement n_labels > n_folds.

DOC: improved description + added see also sections.

Fixed dtype of temporary arrays.

Improved test: check that one label is not in both test and training.

Added documentation for DisjointLabelKFold.
COSMIT: doc, pep8, etc

Refactor code
@glouppe
Contributor Author

glouppe commented Sep 7, 2015

@arjoly I think I addressed all your comments.

@arjoly
Member

arjoly commented Sep 7, 2015

thanks!

@jnothman
Member

jnothman commented Sep 7, 2015

LGTM... Are we certain we need this and the new LabelShuffleSplit, though?

@JeanKossaifi
Contributor

Yes, they have different uses: LabelShuffleSplit doesn't maintain balance between folds, whereas this class creates K disjoint, approximately balanced folds, usable, for instance, for person-specific K-fold cross-validation.
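
To make "K disjoint, approximately balanced folds" concrete, here is a minimal pure-Python sketch of one way such folds could be built: count the samples per label, then greedily give each label (largest first) to the fold that currently holds the fewest samples. The function name `label_k_fold` and the error raised when there are fewer labels than folds are assumptions made for illustration; this is not the code merged in this PR.

```python
from collections import Counter


def label_k_fold(labels, n_folds):
    """Yield (train_indices, test_indices); no label appears in two folds.

    Hypothetical helper for illustration, not the scikit-learn class.
    """
    counts = Counter(labels)
    if n_folds > len(counts):
        # Assumed behaviour: refuse to build more folds than distinct labels.
        raise ValueError("n_folds cannot exceed the number of distinct labels")

    # Largest labels first; sorted() is stable, so ties keep first-seen order.
    by_size = sorted(counts, key=lambda label: -counts[label])

    fold_sizes = [0] * n_folds
    fold_of = {}
    for label in by_size:
        # Assign the whole label to the fold with the fewest samples so far.
        lightest = fold_sizes.index(min(fold_sizes))
        fold_of[label] = lightest
        fold_sizes[lightest] += counts[label]

    for fold in range(n_folds):
        test = [i for i, label in enumerate(labels) if fold_of[label] == fold]
        train = [i for i, label in enumerate(labels) if fold_of[label] != fold]
        yield train, test


labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
folds = list(label_k_fold(labels, n_folds=3))
```

With the labels from the docstring example discussed above, every sample lands in exactly one test fold and no label is split between the training and test sides of a fold, which is the property the name discussion is about.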

@jnothman
Member

jnothman commented Sep 7, 2015

Well apart from that nitpick about attribute initialisation above, I think we can merge this...

@glouppe
Contributor Author

glouppe commented Sep 8, 2015

@jnothman Indeed, thanks for noticing! This has been fixed.

I added a what's new entry. Waiting for CI to give the green light to merge.

@glouppe glouppe changed the title [MRG+1] Label K-Fold [MRG+2] Label K-Fold Sep 8, 2015
glouppe added a commit that referenced this pull request Sep 8, 2015
@glouppe glouppe merged commit 44b1f3a into scikit-learn:master Sep 8, 2015
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Sep 13, 2015
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Sep 14, 2015
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 5, 2015
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 5, 2015
From scikit-learn#5161
 - MAINT remove redundant p variable
 - Add check for sparse prediction in cross_val_predict
From scikit-learn#5201 - DOC improve random_state param doc
From scikit-learn#5190 - LabelKFold and test
From scikit-learn#4583 - LabelShuffleSplit and tests
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 5, 2015
From scikit-learn#5161
 - MAINT remove redundant p variable
 - Add check for sparse prediction in cross_val_predict
From scikit-learn#5201 - DOC improve random_state param doc
From scikit-learn#5190 - LabelKFold and test
From scikit-learn#4583 - LabelShuffleSplit and tests
From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 5, 2015
From scikit-learn#5161
 - MAINT remove redundant p variable
 - Add check for sparse prediction in cross_val_predict
From scikit-learn#5201 - DOC improve random_state param doc
From scikit-learn#5190 - LabelKFold and test
From scikit-learn#4583 - LabelShuffleSplit and tests
From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests

Other minor changes
-------------------

Fix cross_validation reference
Fix the labels param doc
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 9, 2015
Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
    To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
    pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
  -  - MAINT remove redundant p variable
  -  - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201 - DOC improve random_state param doc
  - From scikit-learn#5190 - LabelKFold and test
  - From scikit-learn#4583 - LabelShuffleSplit and tests
  - From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 12, 2015
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 15, 2015
Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
    To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
    pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
  -  - MAINT remove redundant p variable
  -  - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201 - DOC improve random_state param doc
  - From scikit-learn#5190 - LabelKFold and test
  - From scikit-learn#4583 - LabelShuffleSplit and tests
  - From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From scikit-learn#5378 - Make the GridSearchCV docs more accurate.

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 16, 2015
@glouppe glouppe deleted the labelkfold branch October 20, 2015 07:39
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 20, 2015
Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
    To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
    pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
  -  - MAINT remove redundant p variable
  -  - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201 - DOC improve random_state param doc
  - From scikit-learn#5190 - LabelKFold and test
  - From scikit-learn#4583 - LabelShuffleSplit and tests
  - From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
  - From scikit-learn#5458 - Remove shuffle from LabelKFold
  - From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 21, 2015
Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
    To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
    pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
  -  - MAINT remove redundant p variable
  -  - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201 - DOC improve random_state param doc
  - From scikit-learn#5190 - LabelKFold and test
  - From scikit-learn#4583 - LabelShuffleSplit and tests
  - From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
  - From scikit-learn#5458 - Remove shuffle from LabelKFold
  - From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
  - From scikit-learn#4826 - Move custom error / warnings into sklearn.exception

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 21, 2015
Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
    To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
    pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
  -  - MAINT remove redundant p variable
  -  - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201 - DOC improve random_state param doc
  - From scikit-learn#5190 - LabelKFold and test
  - From scikit-learn#4583 - LabelShuffleSplit and tests
  - From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
  - From scikit-learn#5458 - Remove shuffle from LabelKFold
  - From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
  - From scikit-learn#4826 - Move custom error / warnings into sklearn.exception
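The `LabelKFold` entries above concern a K-fold variant in which no label ends up in two different folds. A minimal pure-Python sketch of that idea follows. The function name `label_kfold_sketch` and the greedy largest-label-first assignment are assumptions made for illustration, not scikit-learn's actual code.

```python
from collections import Counter


def label_kfold_sketch(labels, n_folds):
    """Yield (train_indices, test_indices) with no label shared across folds.

    Hypothetical helper for illustration; not the scikit-learn implementation.
    """
    counts = Counter(labels)
    if n_folds > len(counts):
        raise ValueError("n_folds cannot exceed the number of distinct labels")

    # Assign the most frequent labels first, each to the currently lightest fold,
    # so that fold sizes stay roughly balanced.
    fold_sizes = [0] * n_folds
    label_to_fold = {}
    ordered = sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))
    for label, count in ordered:
        lightest = fold_sizes.index(min(fold_sizes))
        label_to_fold[label] = lightest
        fold_sizes[lightest] += count

    # Every sample follows its label into exactly one test fold.
    for fold in range(n_folds):
        test = [i for i, lab in enumerate(labels) if label_to_fold[lab] == fold]
        train = [i for i, lab in enumerate(labels) if label_to_fold[lab] != fold]
        yield train, test
```

Because whole labels are moved rather than individual samples, fold sizes can only be approximately equal when label counts differ.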

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 22, 2015
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 22, 2015
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 23, 2015
amueller pushed a commit that referenced this pull request Oct 23, 2015
--------------------

* ENH Reorganize classes/fn from grid_search into search.py
* ENH Reorganize classes/fn from cross_validation into split.py
* ENH Reorganize cls/fn from cross_validation/learning_curve into validate.py

* MAINT Merge _check_cv into check_cv inside the model_selection module
* MAINT Update all the imports to point to the model_selection module
* FIX use iter_cv to iterate through the new style/old style cv objs
* TST Add tests for the new model_selection members
* ENH Wrap the old-style cv obj/iterables instead of using iter_cv

* ENH Use scipy's binomial coefficient function comb for calculation of nCk
* ENH Few enhancements to the split module
* ENH Improve check_cv input validation and docstring
* MAINT _get_test_folds(X, y, labels) --> _get_test_folds(labels)
* TST if 1d arrays for X introduce any errors
* ENH use 1d X arrays for all tests;
* ENH X_10 --> X (global var)
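The `comb` entry above refers to computing the binomial coefficient nCk. One place such a count arises is leave-p-out style splitting, which is assumed here as the use case. The sketch below uses the standard library's `math.comb` as a stand-in for scipy's function, and both helper names are hypothetical.

```python
from itertools import combinations
from math import comb  # stand-in for scipy's comb; needs Python 3.8+


def n_leave_p_out_splits(n_items, p):
    """Number of ways to hold out p of n_items: the binomial coefficient nCk."""
    return comb(n_items, p)


def leave_p_out_sketch(items, p):
    """Enumerate every held-out subset of size p (illustrative only)."""
    n_items = len(items)
    for held_out in combinations(range(n_items), p):
        test = list(held_out)
        train = [i for i in range(n_items) if i not in held_out]
        yield train, test
```

Computing the count in closed form avoids enumerating all subsets just to learn how many splits there will be.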

Minor
-----

* ENH _PartitionIterator --> _BaseCrossValidator;
* ENH CVIterator --> CVIterableWrapper
* TST Import the old SKF locally
* FIX/TST Clean up the split module's tests.
* DOC Improve documentation of the cv parameter
* COSMIT consistently hyphenate cross-validation/cross-validator
* TST Calculate n_samples from X
* COSMIT Use separate lines for each import.
* COSMIT cross_validation_generator --> cross_validator

Commits merged manually
-----------------------

* FIX Document the random_state attribute in RandomSearchCV
* MAINT Use check_cv instead of _check_cv
* ENH refactor OVO decision function, use it in SVC for sklearn-like
  decision_function shape
* FIX avoid memory cost when sampling from large parameter grids

ENH Major to Minor incremental enhancements to the model_selection

Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
    To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
    pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From #5161
  -  - MAINT remove redundant p variable
  -  - Add check for sparse prediction in cross_val_predict
  - From #5201 - DOC improve random_state param doc
  - From #5190 - LabelKFold and test
  - From #4583 - LabelShuffleSplit and tests
  - From #5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From #5378 - Make the GridSearchCV docs more accurate.
  - From #5458 - Remove shuffle from LabelKFold
  - From #5466(#4270) - Gaussian Process by Jan Metzen
  - From #4826 - Move custom error / warnings into sklearn.exception
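The KFold entry above says the random number generator is created at every split call for reproducibility. A minimal sketch of that design, using only the standard library, might look as follows. The class name `ShuffledKFoldSketch` and its `seed` parameter are hypothetical and do not mirror scikit-learn's signature.

```python
import random


class ShuffledKFoldSketch:
    """Hypothetical splitter that builds its RNG inside split(), not __init__."""

    def __init__(self, n_folds, seed):
        self.n_folds = n_folds
        self.seed = seed  # only the seed is stored, never a live generator

    def split(self, n_samples):
        rng = random.Random(self.seed)  # fresh generator on every call
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(self.n_folds):
            # Take every n_folds-th shuffled index as this fold's test set.
            test = sorted(indices[fold::self.n_folds])
            test_set = set(test)
            train = [i for i in range(n_samples) if i not in test_set]
            yield train, test
```

Since no generator state survives between calls, calling `split` twice on the same object yields identical folds.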

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc

FIX/DOC/MAINT Addressing the review comments by Arnaud and Andy

COSMIT Sort the members alphabetically
COSMIT len_cv --> n_splits
COSMIT Merge 2 if; FIX Use kwargs
DOC Add my name to the authors :D
DOC make labels parameter consistent
FIX Remove hack for boolean indices; + COSMIT idx --> indices; DOC Add Returns
COSMIT preds --> predictions
DOC Add Returns and neatly arrange X, y, labels
FIX idx(s)/ind(s)--> indice(s)
COSMIT Merge if and else to elif
COSMIT n --> n_samples
COSMIT Use bincount only once
COSMIT cls --> class_i / class_i (ith class indices) -->
perm_indices_class_i

FIX/ENH/TST Addressing the final reviews

COSMIT c --> count
FIX/TST make check_cv raise ValueError for string cv value
TST nested cv (gs inside cross_val_score) works for diff cvs
FIX/ENH Raise ValueError when labels is None for label based cvs;
TST if labels is being passed correctly to the cv and that the
ValueError is being propagated to the cross_val_score/predict and grid
search
FIX pass labels to cross_val_score
FIX use make_classification
DOC Add Returns; COSMIT Remove scaffolding
TST add a test to check the _build_repr helper
REVERT the old GS/RS should also be tested by the common tests.
ENH Add a tuple of all/label based CVS
FIX raise VE even at get_n_splits if labels is None
FIX Fabian's comments
PEP8
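The final-review entries above mention raising `ValueError` when `labels` is None for label-based splitters, including at `get_n_splits`. A rough sketch of that check is shown below. The class `LabelSplitterSketch`, its error message, and the leave-one-label-out behaviour are assumptions chosen for illustration.

```python
class LabelSplitterSketch:
    """Hypothetical label-based splitter that refuses to run without labels."""

    def _check_labels(self, labels):
        if labels is None:
            raise ValueError("The labels parameter should not be None")
        return list(labels)

    def get_n_splits(self, labels=None):
        # One split per distinct label (leave-one-label-out style).
        return len(set(self._check_labels(labels)))

    def split(self, labels=None):
        # split() is a generator, so the check runs when iteration starts.
        labels = self._check_labels(labels)
        for held_out in sorted(set(labels)):
            test = [i for i, lab in enumerate(labels) if lab == held_out]
            train = [i for i, lab in enumerate(labels) if lab != held_out]
            yield train, test
```

Sharing one validation helper between `split` and `get_n_splits` keeps the two entry points consistent about when the error is raised.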