Pipeline object does not have a get_feature_names method - intentional? #6421

Closed
elgehelge opened this issue Feb 22, 2016 · 1 comment

@elgehelge

The Pipeline object does not have a get_feature_names method. Is this intentional?

A get_feature_names method is useful when dealing with parallel feature extraction like in this blog post or in the short example below:

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import make_pipeline
from sklearn.base import TransformerMixin

class WordFeatureExtractor(TransformerMixin):
    def transform(self, X, **transform_params):
        return [dict(self._extract_features(data.items())) for data in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def _extract_features(self, key_value_pair):
        for key, value in key_value_pair:
            if isinstance(value, str):  # `basestring` exists only in Python 2
                for word in value.split():
                    yield '%s.word=%s' % (key, word), True

feature_extractor = FeatureUnion([
    ('dict', DictVectorizer()),
    ('word', make_pipeline(
        WordFeatureExtractor(),
        DictVectorizer())
    )
])

feature_extractor.fit_transform([{'number': 123, 'text': 'foo bar'}, {'number': 321}])
feature_extractor.get_feature_names()

The above code example results in AttributeError: Transformer word does not provide get_feature_names.

I would be happy to create a pull request if someone with an overview of the project could confirm that this functionality is actually wanted. Judging from the Pipeline source, it looks like only three lines of code are needed:

    @if_delegate_has_method(delegate='_final_estimator')
    def get_feature_names(self):
        return self.steps[-1][-1].get_feature_names()
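
The delegation idea in the snippet above can be sketched without scikit-learn. This is only an illustration under stated assumptions: `TinyDictVectorizer`, `UppercaseKeys`, and `TinyPipeline` are hypothetical stand-ins written for this example, not scikit-learn classes, and the sketch omits the `if_delegate_has_method` guard.

```python
class TinyDictVectorizer:
    """Hypothetical stand-in for a vectorizer that learns feature names in fit."""

    def fit(self, X, y=None):
        # Collect every key seen in the input dicts, sorted for a stable order.
        self.feature_names_ = sorted({key for row in X for key in row})
        return self

    def transform(self, X):
        # One column per learned feature name; missing keys become 0.0.
        return [[float(row.get(name, 0)) for name in self.feature_names_]
                for row in X]

    def get_feature_names(self):
        return self.feature_names_


class UppercaseKeys:
    """Hypothetical intermediate step that has no get_feature_names method."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [{key.upper(): value for key, value in row.items()} for row in X]


class TinyPipeline:
    """Hypothetical stand-in for a pipeline: a list of (name, estimator) steps."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, X, y=None):
        # Fit each step on the output of the previous one.
        for _, step in self.steps:
            X = step.fit(X, y).transform(X)
        return X

    def get_feature_names(self):
        # Delegate to the final step, mirroring the three-line proposal.
        return self.steps[-1][-1].get_feature_names()


pipeline = TinyPipeline([("upper", UppercaseKeys()), ("vec", TinyDictVectorizer())])
matrix = pipeline.fit_transform([{"number": 123, "text": 1}, {"number": 321}])
names = pipeline.get_feature_names()
```

Delegating to the last step works here because that step learns its own names from the data it receives. It would not be enough if the final step's output names depended on names produced by earlier steps.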

@jnothman
Member

Please search for prior issues before creating new ones. See #2007.

It is more nuanced than your suggestion. We will be implementing something along the lines of #2007 (comment), as per #6372 for another transformer. You're welcome to have a go at the implementation and submit a PR (eventually with tests, docs, etc.).