Closed
Hi,
I was wondering if it would be worthwhile to add a simple FeatureSelector class that can be used in scikit-learn's Pipeline. For example, a user might want to select particular columns (useful for cross-validation, for example) and compare the result against feature selection techniques, etc.
E.g.,
clf1 = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', FeatureSelector(cols=(1, 3))),  # select features 2 and 4 (zero-based columns 1 and 3)
    ('classifier', GaussianNB())
])
clf2 = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA(n_components=2)),
    ('classifier', GaussianNB())
])
...
The code could be as simple as:
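import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class FeatureSelector(BaseEstimator, TransformerMixin):
    """Select columns of X by index so the step can be used in a Pipeline."""

    def __init__(self, cols):
        self.cols = cols  # iterable of zero-based column indices

    def transform(self, X, y=None):
        # Slice each column as a 2-D (n_samples, 1) array, then join them side by side.
        col_list = []
        for c in self.cols:
            col_list.append(X[:, c:c+1])
        return np.concatenate(col_list, axis=1)

    def fit(self, X, y=None):
        # Nothing to learn: the columns are fixed at construction time.
        return self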
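As a side note, the column loop in transform could probably be shortened with NumPy's integer-array indexing. The sketch below is only meant to check that both approaches select the same columns; the helper names select_columns_concat and select_columns_indexed are made up for illustration and are not part of scikit-learn.

```python
import numpy as np


def select_columns_concat(X, cols):
    """Slice each requested column as (n_samples, 1) and concatenate them."""
    return np.concatenate([X[:, c:c + 1] for c in cols], axis=1)


def select_columns_indexed(X, cols):
    """Select the same columns in one step with integer-array indexing."""
    # list(cols) lets the caller pass a tuple such as (1, 3)
    return X[:, list(cols)]


# Toy data: 3 samples, 4 features
X = np.arange(12).reshape(3, 4)

a = select_columns_concat(X, (1, 3))
b = select_columns_indexed(X, (1, 3))

print(b)                     # columns 1 and 3 of X
print(np.array_equal(a, b))  # both approaches agree
```

Both variants return a new array rather than a view, so modifying the result should not affect X.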
PS: How would I add a label to this issue in the tracker? (I read in the scikit-learn docs that labels are recommended.)