MRG add n_features_out_ attribute #14241


Closed · wants to merge 32 commits

Conversation

@amueller (Member) commented Jul 2, 2019

This should be easier once we have an n_features_in_ attribute (#13603) and can use a OneToOneMixin or something like that.
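
A OneToOneMixin along those lines might look like the following. This is a hypothetical sketch, not the PR's code: the class names FakeScaler and the exact behavior are assumptions, and it presumes the n_features_in_ attribute from #13603 is recorded during fit.

```python
class OneToOneMixin:
    """Hypothetical mixin: output has one feature per input feature.

    Assumes the estimator records n_features_in_ during fit, as
    proposed in #13603; suitable for scalers and similar transformers.
    """

    @property
    def n_features_out_(self):
        return self.n_features_in_


class FakeScaler(OneToOneMixin):
    """Toy estimator standing in for e.g. StandardScaler."""

    def fit(self, X):
        # record the input width, as the n_features_in_ proposal would
        self.n_features_in_ = len(X[0])
        return self
```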

@amueller (Member Author) commented Jul 2, 2019

Does this look good, and do we want it like this?
Right now the implementation via the property is a bit weird: anyone inheriting from BaseEstimator might get an error if they don't implement any of the things that tell us the number of output features.
So it adds a strange clause to the API contract for third parties: "if we can't figure out how to get the number of output dimensions, you have to define _n_features_out."

I also don't like that we can't just overwrite the property :-/

@amueller amueller changed the title WIP add n_features_out_ attribute RFC add n_features_out_ attribute Jul 2, 2019
@jnothman (Member) left a comment

I also don't like that we can't just overwrite the property :-/

Why not provide a setter??

@n_features_out_.setter
def n_features_out_(self, val):
    # the setter must reuse the property's name so the decorator
    # rebinds the same class attribute with a setter attached
    self._n_features_out = val

@@ -558,6 +558,35 @@ def fit_transform(self, X, y=None, **fit_params):
# fit method of arity 2 (supervised transformation)
return self.fit(X, y, **fit_params).transform(X)

@property
def n_features_out_(self):
A Member commented:

I can't say I like this magic determination. I'd rather it be done by specialised mixins for decomposition and feature selection.
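
For the feature-selection side, such a specialised mixin might look like this. A hypothetical sketch: the name SelectorOutMixin is made up here, and it assumes the estimator exposes a SelectorMixin-style get_support() returning a boolean mask.

```python
class SelectorOutMixin:
    """Hypothetical feature-selection mixin: n_features_out_ is the
    number of features the selector keeps."""

    @property
    def n_features_out_(self):
        # feature selectors expose a boolean support mask via get_support()
        return int(sum(self.get_support()))


class FakeSelector(SelectorOutMixin):
    """Toy selector keeping features 0 and 2."""

    def get_support(self):
        return [True, False, True]
```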

amueller (Member Author) replied:

Sure, I could do that.
I have to check how much of these are actually in the decomposition module

amueller (Member Author) added:

I'm not sure if I prefer having these be mixins or base classes. It seems unlikely you want to mix those and base classes make the code shorter.

@amueller (Member Author) commented:

Ok so I did everything with mixins. That seems a bit verbose right now, but I'm pretty sure once we add feature names in some way, this will pay off.
I'm not entirely certain if we should make the mixin public. We could leave it for now but by the time the release rolls around we need to have a plan. But I expect a couple of things will change before then.

I'm really not sure why we'd use mixins here instead of base classes, to be honest. I would rather use base classes.

@amueller amueller changed the title RFC add n_features_out_ attribute MRG add n_features_out_ attribute Jul 30, 2019
@amueller (Member Author) commented:

@adrinjalali can you please review this? I think having this will be helpful for feature names, and this one is actually one of the easier parts.

sklearn/base.py Outdated
@property
def n_features_out_(self):
    if not hasattr(self, 'transform'):
        raise AttributeError("{} doesn't have n_features_out_"
A Member commented:

When I was working on feature_names_out_, I realized you could think of the feature_names_out_ of a classifier xxx as xxx0 to xxx{n-1} for multiclass classification, for instance. That would also make sense in the context of a stacking estimator. What are the feature_names_in_ of a stacking estimator if we're talking about interpreting the model? Shouldn't it be {classifier_{i}_{c} | for i in classifiers and c in classes}, for instance?

Alternatively, you could implement a stacking estimator with a meta-estimator that defines transform and returns the output of each estimator, take the union of those, and then put a model on top of that. Then the problem is solved.

amueller (Member Author) replied:

Sorry, I'm not sure what your point is. This is for the ClusterMixin.
And I thought we only wanted to define feature names out for transformers.
Stacking should either be a transformer or a meta-estimator.
Also, check what this PR does for VotingClassifier, by the way. But I'm really not sure I follow.

@adrinjalali adrinjalali self-assigned this Jul 31, 2019
        n_features = self.n_components
    elif hasattr(self, 'components_'):
        n_features = self.components_.shape[0]
    return n_features
A Member commented:

Should there be a "default" value here? Maybe None?

amueller (Member Author) replied:

Here? This is for ComponentsMixin. If it doesn't have components it probably shouldn't have the ComponentsMixin.

In general: the route I'm taking here is to require the user to set it to None and error otherwise in the tests. That allowed me to test that it's actually implemented.
What we want to enforce for third-party estimators is a slightly different discussion. This PR currently adds new requirements in check_estimator.
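
Putting the snippets quoted in this review together, the ComponentsMixin under discussion might be completed roughly as follows. A hypothetical sketch: the error message and the FakePCA demo class are assumptions, not the PR's actual code.

```python
class ComponentsMixinSketch:
    """Hypothetical completion of the ComponentsMixin discussed here."""

    @property
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
            # estimators resolve 'auto'/None into a fitted integer here
            return self.n_components_
        if hasattr(self, 'components_'):
            # fall back to the number of fitted components
            return self.components_.shape[0]
        raise AttributeError(
            "{} has no components; it should not use this mixin"
            .format(type(self).__name__))


class FakePCA(ComponentsMixinSketch):
    """Toy estimator with a stand-in for a fitted components_ array."""

    class _Components:
        shape = (4, 10)  # 4 components over 10 input features

    components_ = _Components()
```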

@adrinjalali (Member) commented Aug 15, 2019 via email

@amueller (Member Author) commented:

@jnothman thoughts?

@adrinjalali (Member) commented:

@amueller there are merge conflicts in the meantime.

@jnothman (Member) left a comment

@amueller, this is looking pretty good, but you wrote that "This PR implements it for all estimators but FunctionTransformer, which sets it explicitly to None."... but it doesn't include vectorizers.

It also doesn't add n_features_out_ for Pipeline, FeatureUnion, GridSearchCV, etc., as far as I can see.

class ComponentsMixin:
    @property
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
A Member commented:

Should we consider deprecating n_components_, given the availability of n_features_out_?

amueller (Member Author) replied:

I would consider it ;)

# Conflicts:
#	sklearn/cluster/birch.py
#	sklearn/decomposition/base.py
#	sklearn/decomposition/factor_analysis.py
#	sklearn/decomposition/fastica_.py
#	sklearn/decomposition/kernel_pca.py
#	sklearn/decomposition/nmf.py
#	sklearn/decomposition/online_lda.py
#	sklearn/decomposition/sparse_pca.py
#	sklearn/decomposition/truncated_svd.py
#	sklearn/kernel_approximation.py
#	sklearn/manifold/isomap.py
#	sklearn/manifold/locally_linear.py
#	sklearn/neighbors/nca.py
#	sklearn/neural_network/rbm.py
#	sklearn/preprocessing/data.py
#	sklearn/random_projection.py
@amueller (Member Author) commented Sep 9, 2019

Updated to reflect changes from #14812.

@amueller (Member Author) commented Sep 9, 2019

It also doesn't add n_features_out_ for Pipeline, FeatureUnion, GridSearchCV, etc., as far as I can see.

Correct, because they are not covered in common tests :(

@amueller (Member Author) commented Sep 9, 2019

Should I add this to meta-estimators in this PR as well?

@jnothman (Member) commented Sep 9, 2019 via email

@amueller (Member Author) commented Sep 10, 2019

@jnothman they are tested in #9741 now ;) - at least somewhat.

And yes, I agree that we want the attributes. I wasn't sure if I should add them here, but happy to do so.

def n_features_out_(self):
    if hasattr(self, 'n_components_'):
        # n_components could be 'auto' or None;
        # this is more likely to be an int
A Member commented:

We can also include the isinstance(..., numbers.Integral) check here to be sure.
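
Folding that isinstance check in might look like the sketch below. Hypothetical: the class name, error message, and FakeEstimator demo are assumptions made for illustration; only trust n_components_ when it is an actual integer, otherwise fall back to components_.

```python
import numbers


class ComponentsMixinWithCheck:
    """Hypothetical variant using the suggested numbers.Integral check."""

    @property
    def n_features_out_(self):
        n_components = getattr(self, 'n_components_', None)
        if isinstance(n_components, numbers.Integral):
            # only use n_components_ when it is a real integer,
            # since it could still be 'auto' or None
            return int(n_components)
        if hasattr(self, 'components_'):
            return self.components_.shape[0]
        raise AttributeError(
            "{} has no n_features_out_".format(type(self).__name__))


class FakeEstimator(ComponentsMixinWithCheck):
    """Toy estimator whose n_components_ is not an int."""

    n_components_ = 'auto'  # ignored by the Integral check

    class _Components:
        shape = (2, 8)  # 2 components over 8 input features

    components_ = _Components()
```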

@adrinjalali (Member) commented:

I guess since we're doing a SLEP for n_features_in_, we should also do one for this one.

@amueller (Member Author) commented:

@adrinjalali we could. Or we could do one for both? The issues are basically the same. The API is pretty easy and obvious, I think the only questions are about backward compatibility of check_estimator.

@amueller (Member Author) commented:

Still needs Pipeline (and XSearchCV? And CountVectorizer?).
Pipeline is a bit strange in that we need to find the last step that's not passthrough, and if all of them are passthrough, then we need n_features_in_...
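
That Pipeline logic could be sketched as follows. Hypothetical: PipelineSketch and StepStub are stand-ins invented here, and it assumes each step exposes n_features_out_ and the pipeline records n_features_in_, both of which are proposals of this PR rather than released API.

```python
class PipelineSketch:
    """Hypothetical: how a pipeline could derive n_features_out_."""

    def __init__(self, steps, n_features_in):
        self.steps = steps              # list of (name, estimator) pairs
        self.n_features_in_ = n_features_in

    @property
    def n_features_out_(self):
        # walk backwards to the last step that actually transforms
        for _, step in reversed(self.steps):
            if step != 'passthrough':
                return step.n_features_out_
        # every step is passthrough: output width equals input width
        return self.n_features_in_


class StepStub:
    """Toy step exposing a precomputed n_features_out_."""

    def __init__(self, n):
        self.n_features_out_ = n
```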

@cmarmo cmarmo added the Needs Decision Requires decision label Sep 8, 2020
Base automatically changed from master to main January 22, 2021 10:51
@amueller (Member Author) commented:

Closing, as it might be less useful now that we have get_feature_names_out.

@amueller amueller closed this Jul 17, 2022
Labels
Needs Decision Requires decision

5 participants