StratifiedKFold with shuffle is not reproducible in 0.19. #10274

@canard0328

Description

The results of StratifiedKFold with shuffle=True and a fixed random_state differ between 0.18 and 0.19. Without shuffling, both versions give identical results.

Code to Reproduce

from sklearn.utils import shuffle
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold

# Shuffle the iris data once with a fixed seed so the input order is fixed
iris = load_iris()
X, y = shuffle(iris.data, iris.target, random_state=0)

clf = LinearSVC(random_state=0)

# Without shuffling (n_splits=3 is the default in 0.18 and 0.19)
cv = StratifiedKFold(n_splits=3)
print(cross_val_score(clf, X, y, cv=cv).mean())
# With shuffling and a fixed integer seed
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
print(cross_val_score(clf, X, y, cv=cv).mean())

Results of 0.18.2

0.959967320261
0.960375816993

Results of 0.19.1

0.959967320261
0.966503267974
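
The mean scores only hint at the difference. Below is a minimal sketch for comparing the folds themselves, assuming it is run unchanged under each installed version. fold_fingerprints is a hypothetical helper, not part of scikit-learn: it hashes the indices of each test fold so that the printed lists can be compared between 0.18.2 and 0.19.1. It also checks that, within one version, an integer random_state yields the same folds on every call to split().

```python
import hashlib

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import shuffle


def fold_fingerprints(X, y, cv):
    """Hypothetical helper: short hash of each test fold's indices."""
    prints = []
    for _, test in cv.split(X, y):
        # Fix dtype and byte order so the hash does not depend on the
        # platform's default integer type (e.g. int32 on Windows)
        data = np.sort(test).astype("<i8").tobytes()
        prints.append(hashlib.sha1(data).hexdigest()[:10])
    return prints


iris = load_iris()
X, y = shuffle(iris.data, iris.target, random_state=0)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

first = fold_fingerprints(X, y, cv)
second = fold_fingerprints(X, y, cv)
print(first)            # compare this list across versions
print(first == second)  # an integer seed gives the same folds on each call
```

If the fingerprint lists differ between versions while first == second holds in each, the folds are deterministic within a version but assigned differently across versions.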

Versions

Windows-10-10.0.16299-SP0
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.18.2

and

Windows-10-10.0.16299-SP0
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.1

An idea

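I think the difference comes from the change below. With shuffle=True, 0.18.2 converts random_state into a RandomState instance via check_random_state, whereas 0.19.1 keeps the raw value of random_state: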

model_selection._split.py (0.18.2)

class StratifiedKFold(_BaseKFold):
    ...
    def _make_test_folds(self, X, y=None, groups=None):
        if self.shuffle:
            rng = check_random_state(self.random_state)
        else:
            rng = self.random_state
    ...

model_selection._split.py (0.19.1)

class StratifiedKFold(_BaseKFold):
    ...
    def _make_test_folds(self, X, y=None):
        rng = self.random_state
    ...
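
A minimal numpy sketch of the suspected mechanism follows. It assumes that the value of rng is later used to shuffle each class separately (that part of _make_test_folds is elided above and not confirmed here). per_class_permutations is a hypothetical helper, not scikit-learn code. Passing an int re-seeds the generator for every class, so classes of equal size, such as the three iris classes of 50 samples, receive identical within-class permutations. A shared RandomState gives each class a different one.

```python
import numpy as np


def per_class_permutations(class_sizes, random_state):
    """Hypothetical helper: shuffle the positions within each class once.

    An int is turned into a fresh RandomState for every class; a RandomState
    instance is shared and consumed class after class.
    """
    perms = []
    for n in class_sizes:
        if isinstance(random_state, np.random.RandomState):
            rng = random_state  # one generator shared by all classes
        else:
            rng = np.random.RandomState(random_state)  # re-seeded per class
        perms.append(rng.permutation(n))
    return perms


sizes = [50, 50, 50]  # iris: 50 samples in each of the 3 classes

# 0.18.2 style: check_random_state(0) yields one shared RandomState
shared = per_class_permutations(sizes, np.random.RandomState(0))
# 0.19.1 style: the int 0 is passed on, so every class restarts from seed 0
reseeded = per_class_permutations(sizes, 0)

same_shared = all(np.array_equal(shared[0], p) for p in shared)
same_reseeded = all(np.array_equal(reseeded[0], p) for p in reseeded)
print(same_shared, same_reseeded)  # False True
```

Under the same assumption, passing random_state=np.random.RandomState(0) instead of 0 in 0.19.1 would let the per-class shuffles share one generator again. The trade-off is that the instance is consumed, so each call to split() would return different folds. Whether this reproduces the 0.18.2 scores exactly has not been checked.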
