Closed
Description
The results of StratifiedKFold with shuffle=True in 0.19 differ from those in 0.18 for the same random_state.
Code to Reproduce
from sklearn.utils import shuffle
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
iris = load_iris()
X, y = shuffle(iris.data, iris.target, random_state=0)
clf = LinearSVC(random_state=0)
cv = StratifiedKFold()
print(cross_val_score(clf, X, y, cv=cv).mean())
cv = StratifiedKFold(shuffle=True, random_state=0)
print(cross_val_score(clf, X, y, cv=cv).mean())
Results of 0.18.2
0.959967320261
0.960375816993
Results of 0.19.1
0.959967320261
0.966503267974
Versions
Windows-10-10.0.16299-SP0
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.18.2
and
Windows-10-10.0.16299-SP0
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.1
An idea
I think the difference comes from the following change in how rng is set:
model_selection._split.py (0.18.2)
class StratifiedKFold(_BaseKFold):
    ...
    def _make_test_folds(self, X, y=None, groups=None):
        if self.shuffle:
            rng = check_random_state(self.random_state)
        else:
            rng = self.random_state
        ...
model_selection._split.py (0.19.1)
class StratifiedKFold(_BaseKFold):
    ...
    def _make_test_folds(self, X, y=None):
        rng = self.random_state
        ...
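A minimal sketch of why this could matter, using plain NumPy rather than the scikit-learn internals. It assumes that rng is consumed more than once inside _make_test_folds (the elided code above would need to confirm that). For an integer seed, check_random_state(seed) returns np.random.RandomState(seed), so the two cases below stand in for "one shared instance" (0.18.2 with shuffle=True) and "the raw integer passed along and re-seeded by each consumer" (0.19.1). The variable names are illustrative only.

```python
import numpy as np

seed = 0
n_items = 10
n_draws = 3

# Case A: one RandomState instance is created once and shared,
# so every draw advances the same stream.
shared_rng = np.random.RandomState(seed)
perms_shared = [shared_rng.permutation(n_items) for _ in range(n_draws)]

# Case B: the raw integer seed is handed to each consumer, and each
# consumer builds its own fresh RandomState from it.
perms_int = [np.random.RandomState(seed).permutation(n_items)
             for _ in range(n_draws)]

# Are all draws within each case identical to the first draw?
same_shared = all(np.array_equal(perms_shared[0], p) for p in perms_shared[1:])
same_int = all(np.array_equal(perms_int[0], p) for p in perms_int[1:])

# The very first draw is the same in both cases; later draws are not.
first_draw_matches = np.array_equal(perms_shared[0], perms_int[0])

print("shared instance, all draws identical:", same_shared)
print("integer seed, all draws identical:   ", same_int)
print("first draw identical in both cases:  ", first_draw_matches)
```

If the assumption holds, the same random_state=0 would produce a different sequence of shuffles in the two versions, which would be consistent with the unshuffled scores matching while the shuffled ones differ.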