RFC generalised Pipeline.get_feature_names #6424

Closed
jnothman opened this issue Feb 23, 2016 · 4 comments

Comments

@jnothman
Member

There has been some demand for Pipeline.get_feature_names (#2007, #5172, #6421) for the case where the last element in the pipeline is a feature extractor. Following on from #6372, we should instead make get_feature_names able to transform a list of input feature names in the general case. I propose the following behaviour:

  1. Pipeline.get_feature_names may be called with a list input_features as an argument only if all its estimators support get_feature_names with an argument. The initial input_features is transformed iteratively through the estimators.
  2. Pipeline.get_feature_names may be called without an argument only if a suffix of its estimators support get_feature_names. The first of that suffix may or may not accept input_features, and the remainder must accept input_features; the output of the first get_feature_names call is iteratively modified by downstream transformers' get_feature_names.
    • To be cautious until we find a use-case otherwise, get_feature_names will not be supported in the case that get_feature_names is available before (but not adjacent to) that suffix.
  3. Otherwise, a ValueError is raised. Or: should the attribute become invisible, as for predict et al.?
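
The three rules above can be sketched in plain Python. This is only an illustration of the proposal, not scikit-learn code: `pipeline_feature_names`, `Upper`, `Vocab` and `NoNames` are hypothetical names, and "supports get_feature_names" is approximated with a simple `hasattr` check.

```python
class Upper:
    """Hypothetical transformer whose get_feature_names needs input_features."""
    def get_feature_names(self, input_features):
        return [name.upper() for name in input_features]


class Vocab:
    """Hypothetical extractor that can name its outputs without any input."""
    def get_feature_names(self, input_features=None):
        return ["apple", "pear"]


class NoNames:
    """Hypothetical step with no get_feature_names at all."""


def pipeline_feature_names(steps, input_features=None):
    """Sketch of the proposed rules for a list of fitted steps (assumption)."""
    def has_names(step):
        return hasattr(step, "get_feature_names")

    if input_features is not None:
        # Rule 1: every step must support get_feature_names with an argument.
        if not all(has_names(step) for step in steps):
            raise ValueError("all steps must support get_feature_names")
        names = list(input_features)
        for step in steps:
            names = step.get_feature_names(names)
        return names

    # Rule 2: find the longest suffix of steps that have get_feature_names.
    start = len(steps)
    while start > 0 and has_names(steps[start - 1]):
        start -= 1
    if start == len(steps):
        # Rule 3: the last step cannot name its features.
        raise ValueError("the last step has no get_feature_names")
    # Cautious sub-rule: reject get_feature_names appearing before the suffix.
    if any(has_names(step) for step in steps[:start]):
        raise ValueError("get_feature_names found before the final suffix")

    # The first step of the suffix is called without an argument;
    # every later step transforms the names produced so far.
    names = steps[start].get_feature_names()
    for step in steps[start + 1:]:
        names = step.get_feature_names(names)
    return names
```

In this sketch, `pipeline_feature_names([NoNames(), Vocab(), Upper()])` starts from the names produced by `Vocab` and passes them through `Upper`, while `[Vocab(), NoNames(), Upper()]` is rejected by the cautious sub-rule. A suffix whose first step requires an argument would fail with a TypeError here rather than the proposed ValueError.
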
@amueller
Member

Agreed on 1) and 2).
For 3): maybe an AttributeError: the last step has no get_feature_names

@jnothman
Member Author

Do you mean an AttributeError if the last step has no get_feature_names? The problem with the AttributeError is that the definition currently allows for get_feature_names that does not take an argument. Testing for this when doing the attribute lookup is fairly heavy. (Though I suspect that we will require get_feature_names to take an argument, even if unused, in any estimator where the pipeline functionality is sought.)
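
To make the cost concrete, here is one way such a test could look in Python 3, using `inspect.signature`. The helper name `accepts_input_features` and the two example classes are hypothetical; this is a sketch of the kind of check being discussed, not something the thread settled on.

```python
import inspect


def accepts_input_features(estimator):
    """Return True if estimator.get_feature_names can take one positional argument."""
    method = getattr(estimator, "get_feature_names", None)
    if method is None:
        return False
    try:
        # bind() raises TypeError when the call signature does not match.
        inspect.signature(method).bind(["x0"])
    except TypeError:
        return False
    return True


class NoArg:
    """Hypothetical extractor in the older style: no argument accepted."""
    def get_feature_names(self):
        return ["a", "b"]


class WithArg:
    """Hypothetical transformer that accepts (and here ignores) input_features."""
    def get_feature_names(self, input_features=None):
        return ["a", "b"]
```

Running this signature inspection on every step at attribute-lookup time is the overhead the comment refers to; requiring the argument everywhere would make the check unnecessary.
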

@amueller
Member

Ah, I didn't think about that. But these are two different errors, right? One is that there is no suffix with get_feature_names, and the other is that input_features was passed and there is no suffix that takes input_features.

@thomasjpfan
Member

Now that we released Pipeline.get_feature_names_out in 1.0, I think this issue can be closed.
