MRG add n_features_out_ attribute #14241


Closed · wants to merge 32 commits

Conversation

@amueller (Member) commented Jul 2, 2019

This should be easier once we have an n_features_in_ attribute (#13603) and can use a OneToOneMixin or something like that.
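
A OneToOneMixin along those lines might look like the following. This is a hypothetical sketch, not the PR's code: the class names FakeScaler and the exact behavior are assumptions, and it presumes the n_features_in_ attribute from #13603 is recorded during fit.

```python
class OneToOneMixin:
    """Hypothetical mixin: output has one feature per input feature.

    Assumes the estimator records n_features_in_ during fit, as
    proposed in #13603; suitable for scalers and similar transformers.
    """

    @property
    def n_features_out_(self):
        return self.n_features_in_


class FakeScaler(OneToOneMixin):
    """Toy estimator standing in for e.g. StandardScaler."""

    def fit(self, X):
        # record the input width, as the n_features_in_ proposal would
        self.n_features_in_ = len(X[0])
        return self
```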

@amueller (Member Author) commented Jul 2, 2019

Does this look good, and do we want it like this?
Right now the implementation via the property is a bit weird: anyone inheriting from BaseEstimator might get an error if they don't implement any of the things that tell us the number of output features.
So it adds a strange clause to the API contract for third parties: "if we can't figure out how to get the number of output dimensions, you have to define _n_features_out."

I also don't like that we can't just overwrite the property :-/

@amueller amueller changed the title WIP add n_features_out_ attribute RFC add n_features_out_ attribute Jul 2, 2019
@jnothman (Member) left a comment

I also don't like that we can't just overwrite the property :-/

Why not provide a setter??

@n_features_out_.setter
def n_features_out_(self, val):
    # the setter must reuse the property's name so the decorator
    # rebinds the same class attribute with a setter attached
    self._n_features_out = val

@@ -558,6 +558,35 @@ def fit_transform(self, X, y=None, **fit_params):
# fit method of arity 2 (supervised transformation)
return self.fit(X, y, **fit_params).transform(X)

@property
def n_features_out_(self):
A Member commented:

I can't say I like this magic determination. I'd rather it be done by specialised mixins for decomposition and feature selection.
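
For the feature-selection side, such a specialised mixin might look like this. A hypothetical sketch: the name SelectorOutMixin is made up here, and it assumes the estimator exposes a SelectorMixin-style get_support() returning a boolean mask.

```python
class SelectorOutMixin:
    """Hypothetical feature-selection mixin: n_features_out_ is the
    number of features the selector keeps."""

    @property
    def n_features_out_(self):
        # feature selectors expose a boolean support mask via get_support()
        return int(sum(self.get_support()))


class FakeSelector(SelectorOutMixin):
    """Toy selector keeping features 0 and 2."""

    def get_support(self):
        return [True, False, True]
```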

amueller (Member Author) replied:

Sure, I could do that.
I have to check how much of these are actually in the decomposition module

amueller (Member Author) added:

I'm not sure if I prefer having these be mixins or base classes. It seems unlikely you want to mix those and base classes make the code shorter.

@amueller (Member Author) commented:

Ok so I did everything with mixins. That seems a bit verbose right now, but I'm pretty sure once we add feature names in some way, this will pay off.
I'm not entirely certain if we should make the mixin public. We could leave it for now but by the time the release rolls around we need to have a plan. But I expect a couple of things will change before then.

I'm really not sure why we'd use mixins here instead of base classes, to be honest. I would rather use base classes.

@amueller amueller changed the title RFC add n_features_out_ attribute MRG add n_features_out_ attribute Jul 30, 2019
@amueller (Member Author) commented:

@adrinjalali can you please review this? I think having this will be helpful for feature names, and this one is actually one of the easier parts.

sklearn/base.py Outdated
@property
def n_features_out_(self):
    if not hasattr(self, 'transform'):
        raise AttributeError("{} doesn't have n_features_out_"
A Member commented:

When I was working on feature_names_out_, I realized you could think of the feature_names_out_ of a classifier xxx as xxx0 to xxx{n-1} for multiclass classification, for instance. That would also make sense in the context of a stacking estimator. What are the feature_names_in_ of a stacking estimator if we're talking about interpreting the model? Shouldn't it be {classifier_{i}_{c} | for i in classifiers and c in classes}, for instance?

Alternatively, you could implement a stacking estimator with a meta-estimator that defines transform and returns the output of each estimator, take the union of those, and then put a model on top of that. Then the problem is solved.

amueller (Member Author) replied:

Sorry, I'm not sure what your point is. This is for the ClusterMixin.
And I thought we only wanted to define feature names out for transformers.
Stacking should either be a transformer or a meta-estimator.
Also, check what this PR does for VotingClassifier, by the way. But I'm really not sure I follow.

@adrinjalali adrinjalali self-assigned this Jul 31, 2019
        n_features = self.n_components
    elif hasattr(self, 'components_'):
        n_features = self.components_.shape[0]
    return n_features
A Member commented:

Should there be a "default" value here? Maybe None?

amueller (Member Author) replied:

Here? This is for ComponentsMixin. If it doesn't have components it probably shouldn't have the ComponentsMixin.

In general: the route I'm taking here is to require the user to set it to None and error otherwise in the tests. That allowed me to test that it's actually implemented.
What we want to enforce for third-party estimators is a slightly different discussion. This PR currently adds new requirements in check_estimator.
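
Putting the snippets quoted in this review together, the ComponentsMixin under discussion might be completed roughly as follows. A hypothetical sketch: the error message and the FakePCA demo class are assumptions, not the PR's actual code.

```python
class ComponentsMixinSketch:
    """Hypothetical completion of the ComponentsMixin discussed here."""

    @property
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
            # estimators resolve 'auto'/None into a fitted integer here
            return self.n_components_
        if hasattr(self, 'components_'):
            # fall back to the number of fitted components
            return self.components_.shape[0]
        raise AttributeError(
            "{} has no components; it should not use this mixin"
            .format(type(self).__name__))


class FakePCA(ComponentsMixinSketch):
    """Toy estimator with a stand-in for a fitted components_ array."""

    class _Components:
        shape = (4, 10)  # 4 components over 10 input features

    components_ = _Components()
```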

@adrinjalali (Member) commented Aug 15, 2019 via email

@amueller (Member Author) commented:

@jnothman thoughts?

@adrinjalali (Member) commented:

@amueller there are merge conflicts in the meantime.

@jnothman (Member) left a comment

@amueller, this is looking pretty good, but you wrote that "This PR implements it for all estimators but FunctionTransformer, which sets it explicitly to None."... but it doesn't include vectorizers.

It also doesn't add n_features_out_ for Pipeline, FeatureUnion, GridSearchCV, etc., as far as I can see.

class ComponentsMixin:
    @property
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
A Member commented:

Should we consider deprecating n_components_, given the availability of n_features_out_?

amueller (Member Author) replied:

I would consider it ;)

# Conflicts:
#	sklearn/cluster/birch.py
#	sklearn/decomposition/base.py
#	sklearn/decomposition/factor_analysis.py
#	sklearn/decomposition/fastica_.py
#	sklearn/decomposition/kernel_pca.py
#	sklearn/decomposition/nmf.py
#	sklearn/decomposition/online_lda.py
#	sklearn/decomposition/sparse_pca.py
#	sklearn/decomposition/truncated_svd.py
#	sklearn/kernel_approximation.py
#	sklearn/manifold/isomap.py
#	sklearn/manifold/locally_linear.py
#	sklearn/neighbors/nca.py
#	sklearn/neural_network/rbm.py
#	sklearn/preprocessing/data.py
#	sklearn/random_projection.py
@amueller (Member Author) commented Sep 9, 2019

Updated to reflect changes from #14812.

@amueller (Member Author) commented Sep 9, 2019

It also doesn't add n_features_out_ for Pipeline, FeatureUnion, GridSearchCV, etc., as far as I can see.

Correct, because they are not covered in common tests :(

@amueller (Member Author) commented Sep 9, 2019

Should I add this to meta-estimators in this PR as well?

@jnothman (Member) commented Sep 9, 2019 via email

@amueller (Member Author) commented Sep 10, 2019

@jnothman they are tested in #9741 now ;) - at least somewhat.

And yes, I agree that we want the attributes. I wasn't sure if I should add them here, but happy to do so.

def n_features_out_(self):
    if hasattr(self, 'n_components_'):
        # n_components could be 'auto' or None;
        # this is more likely to be an int
A Member commented:

We can also include the isinstance(..., numbers.Integral) check here to be sure.
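
Folding that isinstance check in might look like the sketch below. Hypothetical: the class name, error message, and FakeEstimator demo are assumptions made for illustration; only trust n_components_ when it is an actual integer, otherwise fall back to components_.

```python
import numbers


class ComponentsMixinWithCheck:
    """Hypothetical variant using the suggested numbers.Integral check."""

    @property
    def n_features_out_(self):
        n_components = getattr(self, 'n_components_', None)
        if isinstance(n_components, numbers.Integral):
            # only use n_components_ when it is a real integer,
            # since it could still be 'auto' or None
            return int(n_components)
        if hasattr(self, 'components_'):
            return self.components_.shape[0]
        raise AttributeError(
            "{} has no n_features_out_".format(type(self).__name__))


class FakeEstimator(ComponentsMixinWithCheck):
    """Toy estimator whose n_components_ is not an int."""

    n_components_ = 'auto'  # ignored by the Integral check

    class _Components:
        shape = (2, 8)  # 2 components over 8 input features

    components_ = _Components()
```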

@adrinjalali (Member) commented:

I guess since we're doing a SLEP for n_features_in_, we should also do one for this one.

@amueller (Member Author) commented:

@adrinjalali we could. Or we could do one for both? The issues are basically the same. The API is pretty easy and obvious, I think the only questions are about backward compatibility of check_estimator.

@amueller (Member Author) commented:

Still needs Pipeline (and XSearchCV? And CountVectorizer?).
Pipeline is a bit strange in that we need to find the last step that's not passthrough, and if all of them are passthrough, then we need n_features_in_...
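
That Pipeline logic could be sketched as follows. Hypothetical: PipelineSketch and StepStub are stand-ins invented here, and it assumes each step exposes n_features_out_ and the pipeline records n_features_in_, both of which are proposals of this PR rather than released API.

```python
class PipelineSketch:
    """Hypothetical: how a pipeline could derive n_features_out_."""

    def __init__(self, steps, n_features_in):
        self.steps = steps              # list of (name, estimator) pairs
        self.n_features_in_ = n_features_in

    @property
    def n_features_out_(self):
        # walk backwards to the last step that actually transforms
        for _, step in reversed(self.steps):
            if step != 'passthrough':
                return step.n_features_out_
        # every step is passthrough: output width equals input width
        return self.n_features_in_


class StepStub:
    """Toy step exposing a precomputed n_features_out_."""

    def __init__(self, n):
        self.n_features_out_ = n
```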

@cmarmo cmarmo added the Needs Decision Requires decision label Sep 8, 2020
Base automatically changed from master to main January 22, 2021 10:51
@amueller (Member Author) commented:

Closing, as it might be less useful now that we have get_feature_names_out.

@amueller amueller closed this Jul 17, 2022
Labels
Needs Decision Requires decision

5 participants