The Pipeline object does not have a get_feature_names method. Is this intentional?
A get_feature_names method is useful when dealing with parallel feature extraction like in this blog post or in the short example below:
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import make_pipeline
from sklearn.base import TransformerMixin


class WordFeatureExtractor(TransformerMixin):
    def transform(self, X, **transform_params):
        # Turn each input dict into a dict of boolean word features.
        return [dict(self._extract_features(data.items())) for data in X]

    def fit(self, X, y=None, **fit_params):
        # Stateless transformer: nothing to learn.
        return self

    def _extract_features(self, key_value_pair):
        for key, value in key_value_pair:
            # str on Python 3 (basestring on Python 2)
            if isinstance(value, str):
                for word in value.split():
                    yield '%s.word=%s' % (key, word), True


feature_extractor = FeatureUnion([
    ('dict', DictVectorizer()),
    ('word', make_pipeline(
        WordFeatureExtractor(),
        DictVectorizer())
    )
])
feature_extractor.fit_transform([{'number': 123, 'text': 'foo bar'}, {'number': 321}])
feature_extractor.get_feature_names()
The above code example results in AttributeError: Transformer word does not provide get_feature_names.
I would be happy to create a pull request if someone who has an overview of the project could verify that this functionality is actually wanted. Looking at the Pipeline source, it looks like we only need 3 lines of code:
Please search for prior issues before creating new ones. See #2007.
It is more nuanced than your suggestion. We will be implementing something along the lines of #2007 (comment), as per #6372 for another transformer. You're welcome to have a go at the implementation and submit a PR (eventually with tests, docs, etc.).
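To make the nuance concrete, here is a minimal sketch of the naive approach: ask only the final step of a pipeline for its feature names. This is an assumption about the kind of delegation being discussed, not the code from the report and not scikit-learn's implementation. The helper `final_step_feature_names` and the two toy classes are hypothetical stand-ins; the helper accepts any list of `(name, step)` pairs shaped like `Pipeline.steps`.

```python
def final_step_feature_names(steps):
    """Return the feature names exposed by the last (name, step) pair.

    Hypothetical helper: it only looks at the final step, so it is correct
    only when that step generates its names from scratch.
    """
    name, last = steps[-1]
    # Try the newer method name first, then the one used in this issue.
    for method in ("get_feature_names_out", "get_feature_names"):
        getter = getattr(last, method, None)
        if callable(getter):
            return list(getter())
    raise AttributeError(
        "Transformer %s does not provide get_feature_names." % name
    )


class ToyVectorizer:
    """Hypothetical stand-in for a fitted vectorizer that knows its names."""

    def __init__(self, names):
        self._names = names

    def get_feature_names(self):
        return self._names


class ToyExtractor:
    """Hypothetical stand-in for a step that exposes no feature names."""


steps = [
    ("extract", ToyExtractor()),
    ("vectorize", ToyVectorizer(["text.word=bar", "text.word=foo"])),
]
print(final_step_feature_names(steps))
```

Delegating to the last step is enough when that step builds its names itself, as a vectorizer does. It breaks down when the last step selects or transforms columns produced earlier, because the names then have to be threaded through every step, which is presumably part of why a simple three-line change was not accepted as is.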