
[WIP] Implemented SelectFromModel meta-transformer #3011


Closed
wants to merge 17 commits

Conversation

@maheshakya (Contributor)

Fix for #2160
Implemented a meta-transformer with _LearntSelectorMixin. Test cases are included.
Documentation (with examples) still needs to be completed.

@jnothman (Member)

Thanks for tackling this. A number of points:

  • please put this in the same file as _LearntSelectorMixin
  • although I suggested inheriting from _LearntSelectorMixin, it isn't sufficient for correct operation. _LearntSelectorMixin.transform inspects the estimator directly, e.g. to inspect the penalty parameter. You will need to turn _LearntSelectorMixin.transform into one or more functions.
  • You shouldn't set any underscore-suffixed attributes in your estimator's __init__. This should be done in fit. It's fine to postpone validation to fit.
  • You shouldn't need to have estimator_params. clone will keep any settings, and BaseEstimator will handle the delegation of attribute setting so that this can work in grid search.
  • SelectFromModel should accept two parameters: estimator and threshold.
  • I am not sure about duplicating the coefficients here to store them as an attribute. It would be more relevant to store the aggregate feature importances: because _LearntSelectorMixin currently has to sum over the coefficients for multiclass linear models, it might be useful to see the summed features. But it's not essential to store these locally.
  • It would also be nice to have an attribute threshold_ since _LearntSelectorMixin's calculation of the actual threshold can be non-trivial.
  • SelectFromModel should inherit from sklearn.feature_selection.base.SelectorMixin and implement _get_support_mask.
  • This PR should include the deprecation of _LearntSelectorMixin. Use the @deprecated decorator on transform.

I'm changing this PR to a [WIP]. I hope that's alright with you.
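Taken together, these points suggest a class shaped roughly like the following sketch (not the final implementation; the helpers _get_feature_importances and _calculate_threshold are hypothetical stand-ins for functions factored out of _LearntSelectorMixin.transform):

import numpy as np
from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone
from sklearn.feature_selection.base import SelectorMixin

def _get_feature_importances(estimator):
    # Hypothetical helper: prefer feature_importances_, otherwise
    # aggregate coef_, summing absolute values over classes for
    # multiclass linear models.
    if hasattr(estimator, "feature_importances_"):
        return estimator.feature_importances_
    coef = estimator.coef_
    return np.abs(coef) if coef.ndim == 1 else np.abs(coef).sum(axis=0)

def _calculate_threshold(estimator, importances, threshold):
    # Hypothetical helper: the mixin's non-trivial threshold logic
    # ("mean", "median", floats, penalty-dependent defaults) would
    # live here; only the simplest cases are shown.
    return np.mean(importances) if threshold is None else float(threshold)

class SelectFromModel(BaseEstimator, MetaEstimatorMixin, SelectorMixin):
    def __init__(self, estimator, threshold=None):
        # Only store constructor parameters; validation and any
        # underscore-suffixed attributes belong in fit.
        self.estimator = estimator
        self.threshold = threshold

    def fit(self, X, y=None, **fit_params):
        self.estimator_ = clone(self.estimator).fit(X, y, **fit_params)
        self.feature_importances_ = _get_feature_importances(self.estimator_)
        self.threshold_ = _calculate_threshold(
            self.estimator_, self.feature_importances_, self.threshold)
        return self

    def _get_support_mask(self):
        return self.feature_importances_ >= self.threshold_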

@jnothman (Member)

Also, the failing tests happen because this can't be constructed as it is. I think it should be included here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/testing.py#L441.

And although it does nothing at the moment, please inherit from sklearn.base.MetaEstimatorMixin.

@jnothman changed the title from [MRG] Implemented SelectFromModel meta-transformer to [WIP] Implemented SelectFromModel meta-transformer on Mar 27, 2014
@maheshakya (Contributor, Author)

I have separated out the section of _LearntSelectorMixin where the threshold, importances, and penalty are retrieved into two functions. One of them uses the @deprecated decorator.

The other issues are fixed.
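For illustration, the deprecation half might look something like this (a hedged sketch; the warning text is made up):

from sklearn.utils import deprecated

class _LearntSelectorMixin(object):
    @deprecated("Support for using estimators directly as feature "
                "selectors will be removed; wrap the estimator in "
                "SelectFromModel instead.")
    def transform(self, X, threshold=None):
        # Would delegate to the shared helper functions split out above.
        ...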


return estimator

def __init__(self, estimator=None, threshold=None):
Member

Please move this method to right below the class docstring.

estimator=None is not useful. Leave it without a default.

@maheshakya (Contributor, Author)

Since threshold is now a parameter to SelectFromModel, the transform function should use that value. So in that case the threshold value passed to transform will be useless.
Is this the correct way?

When transform is called from another estimator, it will act in the usual way.

@coveralls

Coverage Status

Coverage remained the same when pulling 9506120 on maheshakya:select_from_model into eb10c4c on scikit-learn:master.

@jnothman (Member)

So in that case the threshold value passed to transform will be useless.

The correct way is to redefine transform in the metaestimator not to take a threshold parameter.
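An illustrative usage, building on the sketch further up (the dataset and estimator are arbitrary choices):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
selector = SelectFromModel(RandomForestClassifier(n_estimators=50),
                           threshold=0.1)
X_reduced = selector.fit(X, y).transform(X)  # no threshold argument here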

if self.estimator_ is None:
raise ValueError("estimator cannot be None")

def _make_estimator(self):
Member

Please inline this method.

@jnothman (Member)

I don't think it's a good idea to store support_mask_. It is inconsistent with other feature selectors, duplicates the get_support method provided by SelectorMixin, and means you can't change the threshold and just call get_support to reevaluate it.
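A sketch of the lazy alternative being described: compute the mask on demand from the current threshold parameter instead of caching it, so set_params(threshold=...) followed by get_support() re-evaluates the selection without refitting (this would replace _get_support_mask in the sketch further up, reusing the same hypothetical helper):

def _get_support_mask(self):
    # Nothing is cached: resolve the threshold from the current
    # parameter value on every call.
    threshold = _calculate_threshold(
        self.estimator_, self.feature_importances_, self.threshold)
    return self.feature_importances_ >= threshold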

self.estimator = self._make_estimator()

# Convert data
X, y = check_arrays(X, y)
Member

I don't think this belongs here. It's the base estimator's business.

@jnothman (Member)

Btw, it may be cleaner to implement this with the metaestimator not inheriting from the mixin. Please feel free to do it that way if you think it improves the code.

@maheshakya (Contributor, Author)

I couldn't find a way to implement the _get_support_mask function without either storing the mask in the transform function or duplicating the code. So I made it a private member.

@coveralls

Coverage Status

Coverage remained the same when pulling be4f070 on maheshakya:select_from_model into fbe974b on scikit-learn:master.

@jnothman (Member)

This should probably support partial_fit. It should possibly also support warm_start, which would involve not using clone.
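A hedged sketch of the warm_start idea, reusing the fitted estimator instead of cloning a fresh one (warm_start would be an additional constructor parameter of the metaestimator sketched further up):

def fit(self, X, y=None, **fit_params):
    # Clone a fresh estimator unless we are warm-starting from a
    # previous fit.
    if not (getattr(self, "warm_start", False)
            and hasattr(self, "estimator_")):
        self.estimator_ = clone(self.estimator)
    self.estimator_.fit(X, y, **fit_params)
    self.feature_importances_ = _get_feature_importances(self.estimator_)
    return self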

@maheshakya (Contributor, Author)

@jnothman, I apologize for the inconvenience. Can you explain what exactly needs to be done in partial_fit? I checked out several estimators that define it, but they seem to perform different tasks.

importances = estimator.feature_importances_
if importances is None:
raise ValueError("Importance weights not computed. Please set "
"the compute_importances parameter before fit.")
Contributor

This is deprecated in forests. You can remove this if-statement.

@jnothman (Member)

partial_fit allows a model to be trained without keeping all the training data in memory (or providing it) at once. Here, partial_fit should create estimator_ only on the first call, and call estimator_.partial_fit for each call.

The only reason I say it's necessary is because it's a way the current mixin can be used.
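Concretely, that description corresponds to a method of roughly this shape (a sketch, slotting into the metaestimator class sketched above):

def partial_fit(self, X, y=None, **fit_params):
    # Create the wrapped estimator only on the first call, then
    # delegate every call to its own partial_fit.
    if not hasattr(self, "estimator_"):
        self.estimator_ = clone(self.estimator)
    self.estimator_.partial_fit(X, y, **fit_params)
    return self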

@maheshakya (Contributor, Author)

I have added partial_fit and warm_start.

BTW is there something wrong with Travis CI? It doesn't seem to be building my last two commits.

@jnothman (Member)

jnothman commented Apr 1, 2014

BTW is there something wrong with Travis CI? It doesn't seem to be building my last two commits.

It's highly non-user-friendly, but this means that Travis can't automatically merge your work with master, which it does before testing. Could you please rebase on the current master and force-push the rebased branch?

@coveralls

Coverage Status

Coverage remained the same when pulling 9a41f2f on maheshakya:select_from_model into fec2867 on scikit-learn:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 1469c48 on maheshakya:select_from_model into fec2867 on scikit-learn:master.

Returns self.
"""
if not hasattr(self.estimator, "partial_fit"):
raise(AttributeError, "estimator does not have"
Member

That ( can't be there for this to be correct syntax in Python 3.

But I think it's fine if you don't explicitly check this case. The AttributeError produced by self.estimator_.partial_fit (e.g. "'LinearSVC' object has no attribute 'partial_fit'") is clear enough.

Contributor (Author)

Yes, I agree.
I removed the test case for that as well.

@coveralls

Coverage Status

Coverage remained the same when pulling ccbab10 on maheshakya:select_from_model into fec2867 on scikit-learn:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 77b30e4 on maheshakya:select_from_model into fec2867 on scikit-learn:master.

@jnothman (Member)

jnothman commented Apr 1, 2014

I'm not sure the tests are quite satisfactory yet (and I might not be around for the next few days to take another look), but I think we need to seek opinions as to whether a meta-estimator improves on the mixin. @glouppe, WDYT, and who else is likely to be opinionated on this?

@maheshakya (Contributor, Author)

What are the tests that need to be improved? I can work on those.

@jnothman (Member)

jnothman commented Apr 2, 2014

I'm not able to look in great detail at the tests right now, but in order for this patch to be complete, internal uses of the old (mixin) behaviour need to be changed, including in documentation and examples: https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/feature_selection.rst#l1-based-feature-selection, https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/feature_selection.rst#tree-based-feature-selection, and probably other places.


@maheshakya (Contributor, Author)

Thanks. I will give it a shot (improved test cases, examples, and documentation).

@coveralls

Coverage Status

Coverage remained the same when pulling 7f12af3 on maheshakya:select_from_model into fec2867 on scikit-learn:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 11d1e81 on maheshakya:select_from_model into fec2867 on scikit-learn:master.

@maheshakya (Contributor, Author)

I suppose we need to change every example that uses feature selection based on _LearntSelectorMixin, right?

@jnothman (Member)

Yes, unfortunately.

@MechCoder (Member)

It seems that there are unrelated changes. I'm not sure how adding a meta-transformer should affect .travis.yml.

@MechCoder (Member)

Closed in favor of #4242. I rescued this PR from git hell!

@MechCoder closed this on Feb 12, 2015