Skip to content

Conversation

jeremiedbb
Copy link
Member

Complement of #13443 in order to clean the deprecations for 0.21

I moved the _split part a this separate PR because the change is not straightforward and needs more work.

I've set test_size default value to None, which will be interpreted as the default value (depending on the estimator) when train_size is also None.

@jeremiedbb jeremiedbb changed the title [WIP] Update default test_size for 0.21 [WIP] Update default test_size of ShuffleSplit for 0.21 Mar 20, 2019
@jnothman
Copy link
Member

The not doing validation in __init__ is a convention for objects supporting set_params. CV splitters do not, so it doesn't apply here. But I'm not sure if it matters whether we validate then. Certainly can factor the code nicer.

@jeremiedbb jeremiedbb changed the title [WIP] Update default test_size of ShuffleSplit for 0.21 [MRG] Update default test_size of ShuffleSplit for 0.21 Mar 21, 2019
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good but I think we need new tests in test_train_test_split to cover the new behavior when train_size is provided but test_size is left to it's default value.

    # simple test
    split = train_test_split(X, y, test_size=None, train_size=.5)
    X_train, X_test, y_train, y_test = split
    assert_equal(len(y_test), len(y_train))

should be changed to:

    # simple test
    split = train_test_split(X, y, train_size=.5)
    X_train, X_test, y_train, y_test = split
    assert_equal(len(y_test), len(y_train))

and probably something similar for ShuffleSplit / StratifiedShuffleSplit.

if train_size is not None and train_size_type not in ('i', 'f'):
raise ValueError("Invalid value for train_size: {}".format(train_size))
if test_size is not None and test_size_type not in ('i', 'f'):
raise ValueError("Invalid value for test_size: {}".format(test_size))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it should be a TypeError but I guess it's too late to change now as it was already raising a ValueError for this case in released versions of scikit-learn.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I guess so...

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found another problem:

@jeremiedbb
Copy link
Member Author

I added tests for the new default value of test_size for all these classes/functions.

@ogrisel
Copy link
Member

ogrisel commented Mar 24, 2019

@jeremiedbb the circle ci status come from your account instead of scikit-learn's. Therefore I cannot restart those to check if the sphinx-gallery error can go away with a rebuild.

@jeremiedbb
Copy link
Member Author

I changed it and reran the CI.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jeremiedbb!

@jnothman jnothman merged commit 358c692 into scikit-learn:master Mar 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants