Unable to pass splits to SequentialFeatureSelector #25957

krimha · 2023-03-23T20:23:07Z

Describe the bug

This runs fine with e.g. cv=5, but according to the documentation, it should also be able to take an iterable of splits.
However, passing splits from the cross validator fails

Im fairly certain I have done similar things in the past to other classes in scikit-learn requiring a cv parameter.

If somebody could confirm wether this is a bug, or I'm doing something wrong, that would great. Sorry if this is unneeded noise in the feed.

Steps/Code to Reproduce

from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneGroupOut

import numpy as np

X, y = make_classification()


groups = np.zeros_like(y, dtype=int)
groups[y.size//2:] = 1

cv = LeaveOneGroupOut()
splits = cv.split(X, y, groups=groups)

clf = KNeighborsClassifier(n_neighbors=5)

seq = SequentialFeatureSelector(clf, n_features_to_select=5, scoring='accuracy', cv=splits)
seq.fit(X, y)

Expected Results

Expected to run without errors

Actual Results

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

[<ipython-input-18-d4c8f5222560>](https://localhost:8080/#) in <module>
     19 
     20 seq = SequentialFeatureSelector(clf, n_features_to_select=5, scoring='accuracy', cv=splits)
---> 21 seq.fit(X, y)

4 frames

[/usr/local/lib/python3.9/dist-packages/sklearn/model_selection/_validation.py](https://localhost:8080/#) in _aggregate_score_dicts(scores)
   1928         if isinstance(scores[0][key], numbers.Number)
   1929         else [score[key] for score in scores]
-> 1930         for key in scores[0]
   1931     }

IndexError: list index out of range

Versions

1.2.2

The text was updated successfully, but these errors were encountered:

glemaitre · 2023-03-23T22:17:36Z

The internal algorithm will use the cv parameter in a for loop. If cv is a generator, it will be consumed at the first iteration only. Later it trigger the error because we did not complete the other iteration of the for loop.

Passing a list (e.g. cv=list(splits)) will solve the problem because we can reuse it.

I think that there is no obvious way to make a clone of the generator. Instead, I think that the best solution would be to alternate the documentation and mention that the iterable need to be a list and not a generator.

krimha · 2023-03-24T06:04:29Z

Thank you! Passing a list works. Updating the documentation seems like a good idea.

Charlie-XIAO · 2023-03-24T13:17:44Z

Hi, is anyone working on updating the documentation? If not I'm willing to do that. It should be an API documentation for the ·SequentialFeatureSelector· class right? For instance, add

NOTE that when using an iterable, it should not be a generator.

By the way, is it better to also add something to _parameter_constraints? Though that may involve modifying _CVObjects or create another class such as _CVObjectsNotGenerator and use something like inspect.isgenerator to make the check.

glemaitre · 2023-03-24T16:16:42Z

Thinking a bit more about it, we could call check_cv on self.cv and transform it into a list if the output is a generator. We should still document it since it will take more memory but we would be consistent with other cv objects.

Charlie-XIAO · 2023-03-25T00:55:17Z

/take

Charlie-XIAO · 2023-03-25T01:05:56Z

@glemaitre Just to make sure: should we

note that a generator is accepted but not recommended
call check_cv to transform self.cv into a list if it is a generator
create nonregression test to make sure no exception would occur in this case

or

note that a generator is not accepted
do not do any modification to the code

adrinjalali · 2023-03-25T10:26:36Z

We don't need a warning, check_cv already accepts an iterable, and we don't warn on other classes such as GridSearchCV. The accepted values and the docstring of cv should be exactly the same as *SearchCV classes.

Charlie-XIAO · 2023-03-25T12:35:31Z

Okay I understand, thanks for your explanation.

krimha added Bug Needs Triage Issue requires triage labels Mar 23, 2023

glemaitre added Documentation and removed Bug Needs Triage Issue requires triage labels Mar 23, 2023

github-actions bot assigned Charlie-XIAO Mar 25, 2023

Charlie-XIAO mentioned this issue Mar 25, 2023

FIX SequentialFeatureSelector throws IndexError when cv is a generator #25973

Merged

thomasjpfan closed this as completed in #25973 Mar 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unable to pass splits to SequentialFeatureSelector #25957

Unable to pass splits to SequentialFeatureSelector #25957

krimha commented Mar 23, 2023

glemaitre commented Mar 23, 2023

krimha commented Mar 24, 2023

Charlie-XIAO commented Mar 24, 2023 •

edited

Loading

glemaitre commented Mar 24, 2023

Charlie-XIAO commented Mar 25, 2023

Charlie-XIAO commented Mar 25, 2023 •

edited

Loading

adrinjalali commented Mar 25, 2023

Charlie-XIAO commented Mar 25, 2023

Unable to pass splits to SequentialFeatureSelector #25957

Unable to pass splits to SequentialFeatureSelector #25957

Comments

krimha commented Mar 23, 2023

Describe the bug

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

glemaitre commented Mar 23, 2023

krimha commented Mar 24, 2023

Charlie-XIAO commented Mar 24, 2023 • edited Loading

glemaitre commented Mar 24, 2023

Charlie-XIAO commented Mar 25, 2023

Charlie-XIAO commented Mar 25, 2023 • edited Loading

adrinjalali commented Mar 25, 2023

Charlie-XIAO commented Mar 25, 2023

Charlie-XIAO commented Mar 24, 2023 •

edited

Loading

Charlie-XIAO commented Mar 25, 2023 •

edited

Loading