MRG add n_features_out_ attribute #14241
Conversation
Does this look good, and do we want it like this? I also don't like that we can't just overwrite the property :-/
> I also don't like that we can't just overwrite the property :-/

Why not provide a setter?

```python
@n_features_out_.setter
def n_features_out_(self, val):
    self._n_features_out = val
```

(Note the setter function must also be named `n_features_out_`; naming it `_` would rebind the property under the wrong name.)
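For context, here is a minimal runnable sketch of the property-plus-setter pattern being suggested. The `Estimator` class is a stand-in for illustration, not scikit-learn's actual implementation:

```python
# Hypothetical sketch: a read/write n_features_out_ property backed by a
# private attribute, so plain assignment works despite the property.
class Estimator:
    @property
    def n_features_out_(self):
        # fall back to whatever was stored via the setter
        return self._n_features_out

    @n_features_out_.setter
    def n_features_out_(self, val):
        self._n_features_out = val


est = Estimator()
est.n_features_out_ = 3      # the setter makes overwriting possible
print(est.n_features_out_)   # 3
```

With a setter in place, subclasses (or `fit`) can simply assign to `n_features_out_` instead of having to override the property.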
```python
@@ -558,6 +558,35 @@ def fit_transform(self, X, y=None, **fit_params):
        # fit method of arity 2 (supervised transformation)
        return self.fit(X, y, **fit_params).transform(X)

    @property
    def n_features_out_(self):
```
I can't say I like this magic determination. I'd rather it be done by specialised mixins for decomposition and feature selection.
Sure, I could do that. I have to check how many of these are actually in the decomposition module.
I'm not sure whether I prefer having these be mixins or base classes. It seems unlikely you'd want to mix them, and base classes make the code shorter.
```
# Conflicts:
#	sklearn/impute/_base.py

# Conflicts:
#	sklearn/utils/estimator_checks.py
```
OK, so I did everything with mixins. That seems a bit verbose right now, but I'm pretty sure this will pay off once we add feature names in some way. I'm really not sure why we'd use mixins here instead of base classes, tbh. I would rather use base classes.
@adrinjalali can you please review this? I think having this will be helpful for feature names, and this one is actually one of the easier parts.
sklearn/base.py (outdated)

```python
@property
def n_features_out_(self):
    if not hasattr(self, 'transform'):
        raise AttributeError("{} doesn't have n_features_out_"
```
When I was working on `feature_names_out_`, I realized you could think of the `feature_names_out_` of classifier xxx as `xxx0` to `xxx{n-1}` for a multiclass classification, for instance. That would also make sense in the context of a stacking estimator. What are the `feature_names_in_` of a stacking estimator if we're talking about interpreting the model? Should it not be `{classifier_{i}_{c} | for i in classifiers and c in classes}`, for instance?

Although you could also implement a stacking estimator with a meta-estimator which defines `transform` and returns the output of each estimator, do the union of those, then have a model on top of that. Then the problem's solved.
Sorry, I'm not sure what your point is. This is for the ClusterMixin. And I thought we only wanted to define feature names out for transformers. Stacking should either be a transformer or a meta-estimator. Also check what this PR does for `VotingClassifier`, btw. But I'm really not sure I follow.
```python
            n_features = self.n_components
        elif hasattr(self, 'components_'):
            n_features = self.components_.shape[0]
        return n_features
```
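To make the fallback logic above concrete, here is a hedged, self-contained sketch of a mixin of this shape. The `ToyTransformer` class and its `fit` are illustrative inventions, not code from the PR:

```python
# Sketch of the mixin logic under discussion: prefer an explicit
# n_components_, otherwise fall back to the shape of components_.
import numpy as np


class ComponentsMixin:
    @property
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
            n_features = self.n_components_
        elif hasattr(self, 'components_'):
            # one output feature per learned component
            n_features = self.components_.shape[0]
        else:
            raise AttributeError(
                "{} doesn't have n_features_out_".format(type(self).__name__))
        return n_features


class ToyTransformer(ComponentsMixin):
    """Hypothetical transformer that only stores components_."""

    def fit(self, X):
        # pretend we learned 2 components over X's features
        self.components_ = np.zeros((2, X.shape[1]))
        return self


t = ToyTransformer().fit(np.ones((5, 4)))
print(t.n_features_out_)  # 2
```

An estimator that lacks both attributes raises `AttributeError`, which matches the point below that such a class probably shouldn't carry the mixin at all.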
Should there be a "default" value here? Maybe `None`?
Here? This is for `ComponentsMixin`. If it doesn't have `components_`, it probably shouldn't have the `ComponentsMixin`.

In general, the route I'm taking here is to require the user to set it to `None` and error otherwise in the tests. That allowed me to test that it's actually implemented. What we want to enforce for third-party estimators is a slightly different discussion. This PR currently adds new requirements in `check_estimator`.
Yes, I'm happy with this now :)
…On Thu., Aug. 15, 2019, 19:08 Andreas Mueller wrote:

> So is that approval from you @adrinjalali <https://github.com/adrinjalali>? ;)
@jnothman thoughts?

@amueller merge conflicts in the meantime.
@amueller, this is looking pretty good, but you wrote that "This PR implements it for all estimators but FunctionTransformer, which sets it explicitly to None."... but it doesn't include vectorizers. It also doesn't add `n_features_out_` for Pipeline, FeatureUnion, GridSearchCV, etc., as far as I can see.
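For meta-estimators like Pipeline, the attribute would presumably be exposed by delegating to the final step. A hypothetical sketch of that idea (`ToyPipeline` and `ToyStep` are invented here, not code from the PR):

```python
# Hypothetical delegation pattern: a meta-estimator forwards
# n_features_out_ to its last step rather than computing it itself.
class ToyStep:
    # stand-in for a fitted transformer exposing n_features_out_
    n_features_out_ = 5


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps

    @property
    def n_features_out_(self):
        # the pipeline's output width is whatever its final step produces
        return self.steps[-1].n_features_out_


pipe = ToyPipeline([ToyStep()])
print(pipe.n_features_out_)  # 5
```

If the last step lacks the attribute, the `AttributeError` propagates naturally, which is arguably the right behavior for a non-transformer final step.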
```python
class ComponentsMixin:
    @property
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
```
Should we consider deprecating `n_components_`, given the availability of `n_features_out_`?
I would consider it ;)
```
# Conflicts:
#	sklearn/cluster/birch.py
#	sklearn/decomposition/base.py
#	sklearn/decomposition/factor_analysis.py
#	sklearn/decomposition/fastica_.py
#	sklearn/decomposition/kernel_pca.py
#	sklearn/decomposition/nmf.py
#	sklearn/decomposition/online_lda.py
#	sklearn/decomposition/sparse_pca.py
#	sklearn/decomposition/truncated_svd.py
#	sklearn/kernel_approximation.py
#	sklearn/manifold/isomap.py
#	sklearn/manifold/locally_linear.py
#	sklearn/neighbors/nca.py
#	sklearn/neural_network/rbm.py
#	sklearn/preprocessing/data.py
#	sklearn/random_projection.py
```
updated to reflect changes from #14812.

Correct, because they are not covered in common tests :(

Should I add this to meta-estimators in this PR as well?
I think if we are certain we want the attribute on meta-estimators, it makes sense to do them here.

I think even if those estimators are tested, they're not tested as transformers... and GridSearchCV very rarely would be used as a transformer.
```python
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
            # n_components could be auto or None
            # this is more likely to be an int
```
We can also include the `isinstance(..., numbers.Integral)` check here to be sure.
I guess since we're doing a SLEP for

@adrinjalali we could. Or we could do one for both? The issues are basically the same. The API is pretty easy and obvious; I think the only questions are about backward compatibility of
still needs pipeline (and XSearchCV? And CountVectorizer?).
Closing as it might be less useful since we have get_feature_names_out now |
This should be easier once we have an `n_features_in_` attribute (#13603), and use a `OneToOneMixin` or something like that.