Nested CV of LeaveOneGroupOut fails in permutation_test_score #8127

@oesteban

Description

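Nested cross-validation fails in permutation_test_score when LeaveOneGroupOut is used both as the inner splitter (inside GridSearchCV) and as the outer splitter, because the groups are not passed on to the inner splitter.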

Steps/Code to Reproduce

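import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV, permutation_test_score
from sklearn.svm import SVC

# Hypothetical stand-ins for clf_object, clf_params, sample_x and labels_y
rng = np.random.RandomState(0)
sample_x, labels_y = rng.randn(60, 5), rng.randint(0, 2, 60)
# groups needs one label per sample, not one entry per fold
folds_groups = np.repeat(['group1', 'group2', 'group3'], 20)
clf = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=LeaveOneGroupOut())
perm_res = permutation_test_score(
    clf, sample_x, labels_y, scoring='accuracy', cv=LeaveOneGroupOut(),
    n_permutations=5000, groups=folds_groups, n_jobs=-1)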

Expected Results

With N the total number of groups, the inner cross-validation trains on every combination of N-2 groups and validates on the remaining group. The outer permutation-test loop then trains the best classifier from the inner loop on N-1 groups and validates it on the left-out group.
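The expected scheme can be written out by hand, with the outer LeaveOneGroupOut loop passing `groups[train]` to the inner search. This is a minimal sketch, not a change to scikit-learn: `SVC` and the `{'C': [...]}` grid are hypothetical stand-ins for `clf_object` and `clf_params`, `nested_logo_scores` is a hypothetical helper name, the data are random, and only the unpermuted nested scores are computed (no permutations).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.svm import SVC


def nested_logo_scores(estimator, param_grid, X, y, groups):
    """Outer LOGO loop; the inner GridSearchCV receives the training groups."""
    scores = []
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        # Inner search over the N-1 training groups: each inner fit uses N-2 groups
        search = GridSearchCV(estimator, param_grid, cv=LeaveOneGroupOut())
        search.fit(X[train], y[train], groups=groups[train])
        # With refit=True (the default) the best model is refit on all N-1 groups
        scores.append(search.score(X[test], y[test]))
    return np.array(scores)


# Hypothetical toy data: 4 groups of 15 samples, both classes in every group
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = np.tile([0, 1], 30)
groups = np.repeat(['group1', 'group2', 'group3', 'group4'], 15)

scores = nested_logo_scores(SVC(), {'C': [0.1, 1.0]}, X, y, groups)
print(scores)  # one accuracy per left-out group
```

Each entry of `scores` is the accuracy of the inner-loop winner on one left-out group.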

Actual Results

The inner CV loop is not passed the groups, so it raises a ValueError:

Traceback (most recent call last):
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/bin/mriqc_fit", line 9, in <module>
    load_entry_point('mriqc', 'console_scripts', 'mriqc_fit')()
  File "/home/oesteban/workspace/mriqc/mriqc/classifier/cli.py", line 95, in main
    cvhelper.fit(folds=folds)
  File "/home/oesteban/workspace/mriqc/mriqc/classifier/cv.py", line 181, in fit
    n_jobs=self.n_jobs)
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 609, in permutation_test_score
    score = _permutation_test_score(clone(estimator), X, y, groups, cv, scorer)
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 627, in _permutation_test_score
    estimator.fit(X[train], y[train])
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 945, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 543, in _fit
    n_splits = cv.get_n_splits(X, y, groups)
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 810, in get_n_splits
    raise ValueError("The groups parameter should not be None")
ValueError: The groups parameter should not be None
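The root cause can be reproduced with the splitter alone: `LeaveOneGroupOut.get_n_splits` requires `groups`, and, as the traceback shows, the inner `GridSearchCV.fit` is called with only `X[train]` and `y[train]`. The toy arrays here are hypothetical, and the exact wording of the error message differs between scikit-learn versions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

logo = LeaveOneGroupOut()
X = np.zeros((6, 1))
groups = np.array(['g1', 'g1', 'g2', 'g2', 'g3', 'g3'])

# With groups: one split per distinct group
n_with_groups = logo.get_n_splits(X, groups=groups)
print(n_with_groups)  # 3

# Without groups, as in the inner GridSearchCV fit inside permutation_test_score
error_message = None
try:
    logo.get_n_splits(X)
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)
```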

Versions

Linux-4.4.0-53-generic-x86_64-with-debian-stretch-sid
('Python', '2.7.11 |Continuum Analytics, Inc.| (default, Dec  6 2015, 18:08:32) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.2')
('SciPy', '0.17.0')
('Scikit-Learn', '0.18.1')

Comments

Does this nested cross-validation scheme make sense? I would appreciate any comments on this.

Specifically, I have a binary classification problem with ~1000 samples split into ~20 groups, each containing 20-300 samples. I want the classifier to generalize well to an unseen new group (also of 20-300 samples). Does this sound about right?
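One practical consideration for this setup is cost. A rough count of estimator fits, assuming `refit=True` (the `GridSearchCV` default), one unpermuted run plus `n_permutations` permuted runs, and a hypothetical grid of 10 parameter candidates; `count_fits` is a hypothetical helper, not a scikit-learn function.

```python
def count_fits(n_groups, n_candidates, n_permutations, refit=True):
    """Estimator fits for permutation_test_score(GridSearchCV(cv=LOGO), cv=LOGO)."""
    # Inner search: every candidate on each of the N-1 inner folds, plus one refit
    inner_fits = n_candidates * (n_groups - 1) + (1 if refit else 0)
    # Outer LOGO loop: one inner search per left-out group
    outer_fits = n_groups * inner_fits
    # One unpermuted run plus n_permutations runs with shuffled labels
    return (n_permutations + 1) * outer_fits


# ~20 groups, 5000 permutations, hypothetical 10-candidate grid
print(count_fits(20, 10, 5000))  # 19103820
```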
