Pipeline object does not have a `get_feature_names` method - intentional? #6421

elgehelge · 2016-02-22T15:35:57Z

The Pipeline object does not have a get_feature_names method. Is this intentional?

A get_feature_names method is useful for parallel feature extraction, as in this blog post or in the short example below:

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline

class WordFeatureExtractor(BaseEstimator, TransformerMixin):
    """Turn every string value into one boolean '<key>.word=<word>' feature per word."""

    def transform(self, X, **transform_params):
        return [dict(self._extract_features(data.items())) for data in X]

    def fit(self, X, y=None, **fit_params):
        return self  # stateless: nothing to learn

    def _extract_features(self, key_value_pairs):
        for key, value in key_value_pairs:
            if isinstance(value, str):  # basestring on Python 2
                for word in value.split():
                    yield '%s.word=%s' % (key, word), True

# Plain dict features alongside word features extracted from the string values
feature_extractor = FeatureUnion([
    ('dict', DictVectorizer()),
    ('word', make_pipeline(
        WordFeatureExtractor(),
        DictVectorizer())
    )
])

feature_extractor.fit_transform([{'number': 123, 'text': 'foo bar'}, {'number': 321}])
feature_extractor.get_feature_names()  # raises AttributeError

The above code example results in AttributeError: Transformer word does not provide get_feature_names.
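For context, FeatureUnion at the time built its combined names by asking each transformer for its own names and prefixing them with the transformer's name plus a double underscore, raising the error above when a transformer lacked the method. The pure-Python sketch below mimics that behaviour so it runs without scikit-learn; `ToyVectorizer`, `ToyPipeline` and `union_feature_names` are hypothetical stand-ins, not scikit-learn API, and the feature names are illustrative.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a fitted DictVectorizer: it only knows its names."""

    def __init__(self, names):
        self.names = list(names)

    def get_feature_names(self):
        return list(self.names)


class ToyPipeline:
    """Hypothetical stand-in for Pipeline as it was: no get_feature_names."""

    def __init__(self, steps):
        self.steps = steps


def union_feature_names(transformer_list):
    """Prefix each transformer's names with '<name>__', as FeatureUnion did."""
    feature_names = []
    for name, trans in transformer_list:
        if not hasattr(trans, "get_feature_names"):
            raise AttributeError(
                "Transformer %s does not provide get_feature_names." % name)
        feature_names.extend(name + "__" + f for f in trans.get_feature_names())
    return feature_names


ok = [("dict", ToyVectorizer(["number", "text=foo bar"]))]
print(union_feature_names(ok))  # ['dict__number', 'dict__text=foo bar']

# Adding a pipeline-like transformer without get_feature_names breaks the union
broken = ok + [("word", ToyPipeline([("vec", ToyVectorizer(["text.word=foo"]))]))]
try:
    union_feature_names(broken)
except AttributeError as exc:
    print(exc)  # Transformer word does not provide get_feature_names.
```

A single transformer without the method is enough to make the whole union fail, even when the step that actually produces the features (the inner vectorizer) knows its names.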

I would be happy to create a pull request if someone with an overview of the project could confirm that this functionality is actually wanted. Looking at the Pipeline source, it seems we only need 3 lines of code:

    @if_delegate_has_method(delegate='_final_estimator')
    def get_feature_names(self):
        return self.steps[-1][-1].get_feature_names()
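The proposal amounts to delegating to the last step. Below is a minimal pure-Python sketch of that idea, assuming a hypothetical `DelegatingPipeline` class (not scikit-learn's `Pipeline`) and using a property in place of `if_delegate_has_method`, so the method only appears to exist when the final step provides it.

```python
class ToyVectorizer:
    """Hypothetical final step that knows its output feature names."""

    def __init__(self, names):
        self.names = list(names)

    def get_feature_names(self):
        return list(self.names)


class ToyExtractor:
    """Hypothetical step like WordFeatureExtractor: exposes no feature names."""


class DelegatingPipeline:
    """Hypothetical pipeline that forwards get_feature_names to its last step."""

    def __init__(self, steps):
        self.steps = steps

    @property
    def _final_estimator(self):
        return self.steps[-1][1]

    @property
    def get_feature_names(self):
        # Raises AttributeError when the final step lacks the method, so
        # hasattr(pipe, "get_feature_names") mirrors the delegate, roughly
        # what if_delegate_has_method achieves.
        return self._final_estimator.get_feature_names


word = DelegatingPipeline([
    ("extract", ToyExtractor()),
    ("vec", ToyVectorizer(["text.word=bar", "text.word=foo"])),
])
print(word.get_feature_names())  # ['text.word=bar', 'text.word=foo']

no_names = DelegatingPipeline([("extract", ToyExtractor())])
print(hasattr(no_names, "get_feature_names"))  # False
```

This covers the issue's example because the final DictVectorizer learns its names from its own input. It does not help when the last step's names depend on names produced by earlier steps, which is one plausible reason the reply below calls the problem more nuanced.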


jnothman · 2016-02-22T22:45:41Z

Please search for prior issues before creating new ones. See #2007.

It is more nuanced than your suggestion. We will be implementing something along the lines of #2007 (comment), as per #6372 for another transformer. You're welcome to have a go at the implementation and submit a PR (eventually with tests, docs, etc.).
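One way to generalise, sketched here under the assumption that each step can map its input feature names to output names, is to thread the names through every step instead of asking only the last one. The `feature_names(input_names)` method and the `Scale`, `Square` and `Select` steps are hypothetical; scikit-learn later adopted a similar idea as `get_feature_names_out(input_features)`, but this is not that implementation.

```python
class Scale:
    """Hypothetical step that keeps one output column per input column."""

    def feature_names(self, input_names):
        return list(input_names)


class Square:
    """Hypothetical step that appends a squared copy of every column."""

    def feature_names(self, input_names):
        return list(input_names) + [n + "^2" for n in input_names]


class Select:
    """Hypothetical step that keeps only the columns at the given indices."""

    def __init__(self, indices):
        self.indices = indices

    def feature_names(self, input_names):
        return [input_names[i] for i in self.indices]


def pipeline_feature_names(steps, input_names):
    """Feed each step the names produced by the step before it."""
    names = list(input_names)
    for _name, step in steps:
        names = step.feature_names(names)
    return names


steps = [("scale", Scale()), ("square", Square()), ("select", Select([0, 3]))]
print(pipeline_feature_names(steps, ["a", "b"]))  # ['a', 'b^2']
```

Asking only the final `select` step would fail here: on its own it knows column indices, not names, so the names have to flow through the whole pipeline.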

jnothman closed this as completed Feb 22, 2016

jnothman mentioned this issue Feb 23, 2016

RFC generalised Pipeline.get_feature_names #6424 (closed)
