Description
Nested CV of LeaveOneGroupOut fails in permutation_test_score
Steps/Code to Reproduce
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV, permutation_test_score

# clf_object, clf_params, sample_x and labels_y are defined elsewhere;
# folds_groups holds one group label per sample in sample_x
folds_groups = ['group1', 'group2', 'group3']
clf = GridSearchCV(clf_object, clf_params, cv=LeaveOneGroupOut())
perm_res = permutation_test_score(
    clf, sample_x, labels_y, scoring='accuracy', cv=LeaveOneGroupOut(),
    n_permutations=5000, groups=folds_groups,
    n_jobs=-1)
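For completeness, a self-contained version of the snippet reproduces the same error with synthetic data; the SVC estimator, parameter grid and array shapes below are placeholders, not the actual classifier from my pipeline.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV, permutation_test_score

rng = np.random.RandomState(0)
sample_x = rng.randn(30, 4)                                    # 30 samples, 4 features
labels_y = rng.randint(0, 2, size=30)                          # binary target
folds_groups = np.repeat(['group1', 'group2', 'group3'], 10)   # one group label per sample

clf = GridSearchCV(SVC(), {'C': [0.1, 1.0]}, cv=LeaveOneGroupOut())
perm_res = permutation_test_score(
    clf, sample_x, labels_y, scoring='accuracy', cv=LeaveOneGroupOut(),
    n_permutations=10, groups=folds_groups, n_jobs=1)
# raises ValueError: The groups parameter should not be None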
Expected Results
With N being the total number of groups, the inner cross-validation should train on all possible combinations of N-2 groups and validate on the remaining group. The outer permutation-test loop should then refit the best classifier from the inner search on N-1 groups and validate it on the left-out group, as sketched below.
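For reference, here is a rough sketch of how that nested scheme could be written out by hand, reusing the placeholder names from the snippet above and passing the group labels explicitly to the inner GridSearchCV; this only illustrates the behaviour I expected, it is not a proposed fix.

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV

groups = np.asarray(folds_groups)
outer_cv = LeaveOneGroupOut()
outer_scores = []
for train, test in outer_cv.split(sample_x, labels_y, groups=groups):
    inner = GridSearchCV(clf_object, clf_params, cv=LeaveOneGroupOut(), scoring='accuracy')
    # hand the group labels of the outer training set to the inner CV
    inner.fit(sample_x[train], labels_y[train], groups=groups[train])
    outer_scores.append(inner.score(sample_x[test], labels_y[test]))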
Actual Results
The inner CV loop is not passed the groups and raises a ValueError:
Traceback (most recent call last):
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/bin/mriqc_fit", line 9, in <module>
load_entry_point('mriqc', 'console_scripts', 'mriqc_fit')()
File "/home/oesteban/workspace/mriqc/mriqc/classifier/cli.py", line 95, in main
cvhelper.fit(folds=folds)
File "/home/oesteban/workspace/mriqc/mriqc/classifier/cv.py", line 181, in fit
n_jobs=self.n_jobs)
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 609, in permutation_test_score
score = _permutation_test_score(clone(estimator), X, y, groups, cv, scorer)
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 627, in _permutation_test_score
estimator.fit(X[train], y[train])
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 945, in fit
return self._fit(X, y, groups, ParameterGrid(self.param_grid))
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 543, in _fit
n_splits = cv.get_n_splits(X, y, groups)
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 810, in get_n_splits
raise ValueError("The groups parameter should not be None")
ValueError: The groups parameter should not be None
Versions
Linux-4.4.0-53-generic-x86_64-with-debian-stretch-sid
('Python', '2.7.11 |Continuum Analytics, Inc.| (default, Dec 6 2015, 18:08:32) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.2')
('SciPy', '0.17.0')
('Scikit-Learn', '0.18.1')
Comments
Does this nested cross-validation scheme make sense? I would appreciate any comments on this.
In particular, I have a binary classification problem with ~1000 samples split into ~20 groups, each group holding 20-300 samples. I want the classifier to generalize well when an unseen new group (20-300 samples) arrives. Does that sound about right?
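As a sanity check on the evaluation itself, something along these lines (again with placeholder names and a stand-in estimator) would give one accuracy per held-out group, which is roughly what I am after:

from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC  # stand-in for the tuned classifier

# one score per completely held-out group; the spread across groups hints at
# how well the model might do on a new, unseen group
per_group_scores = cross_val_score(
    SVC(), sample_x, labels_y, groups=folds_groups,
    cv=LeaveOneGroupOut(), scoring='accuracy')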