FeatureSelector for Pipeline (New Feature)

Hi,
I was wondering whether it would be worthwhile to add a simple `FeatureSelector` class that can be used in scikit-learn's `Pipeline`. It would let a user select particular columns by index (useful for cross-validation, for example) and compare that manual choice against feature selection techniques, etc.

E.g., 

```
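from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# FeatureSelector is the proposed class (see the sketch further down)
clf1 = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', FeatureSelector(cols=(1, 3))),    # select the 2nd and 4th features (0-based columns 1 and 3)
    ('classifier', GaussianNB())
    ])

clf2 = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA(n_components=2)),
    ('classifier', GaussianNB())
    ])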

...
```

The code could be as simple as:

```
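import numpy as np

class FeatureSelector(object):
    """Select the columns listed in `cols` from a 2-D NumPy array."""

    def __init__(self, cols):
        # Iterable of 0-based column indices to keep
        self.cols = cols

    def transform(self, X, y=None):
        # Slice each column as X[:, c:c+1] so it stays 2-D, then join them side by side
        col_list = []
        for c in self.cols:
            col_list.append(X[:, c:c+1])
        return np.concatenate(col_list, axis=1)

    def fit(self, X, y=None):
        # Nothing to learn from the data; return self so calls can be chained
        return self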
```
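One caveat with the plain-`object` version: recent scikit-learn releases clone pipeline steps during fitting and cross-validation, which relies on `get_params()`. Below is a rough sketch of how the same idea might look when it inherits from `BaseEstimator` and `TransformerMixin`, followed by the comparison I had in mind. The Iris data, `cv=5`, and the variable names `manual` and `pca` are just assumptions for illustration, not part of the proposal.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class FeatureSelector(BaseEstimator, TransformerMixin):
    """Select a fixed set of columns from a 2-D array (sketch of the proposal)."""

    def __init__(self, cols):
        # Store the argument unmodified so get_params() and clone() work.
        self.cols = cols

    def fit(self, X, y=None):
        # Nothing is learned from the data.
        return self

    def transform(self, X, y=None):
        # Integer-array indexing keeps the result 2-D, in the order given by cols.
        return np.asarray(X)[:, list(self.cols)]


# Illustrative data set (an assumption, any 2-D feature matrix would do).
X, y = load_iris(return_X_y=True)

manual = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', FeatureSelector(cols=(1, 3))),
    ('classifier', GaussianNB()),
])

pca = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA(n_components=2)),
    ('classifier', GaussianNB()),
])

# Compare the hand-picked columns against PCA under the same CV splits.
for name, clf in (('columns 1 and 3', manual), ('PCA', pca)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Inheriting from `TransformerMixin` also provides `fit_transform` for free, and keeping `__init__` free of any logic is what lets `clone` rebuild the step inside each cross-validation fold.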

PS: How do I add a label to this issue in the tracker? (I read in the scikit-learn docs that labels are recommended.)


FeatureSelector for Pipeline (New Feature) #3560
