[MRG] Refactor pipeline namespace to make it more reusable #11446

Closed
wants to merge 3 commits

Conversation

caioaao
Contributor

@caioaao caioaao commented Jul 5, 2018

Related: #8960
Explanation: #8960 (comment)

Basically, the Pipeline and FeatureUnion classes are really useful for combining models, and making the code more extensible may provide a nice starting point for other features. My personal use case is building a stacked generalization framework on top of it.

Summary of changes

  • Extracted the code that stacks estimators' outputs into a private method (_stack_results);
  • Extracted the snippet responsible for applying weights to results (_apply_weight);
  • Added the functions that were extracted for parallel computation as attributes of FeatureUnion and Pipeline so they can be overridden when needed (see the sketch after this list).
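
As a rough illustration of that last point, a subclass could rebind one of these attributes to reuse the parallel plumbing with its own per-transformer logic. The names StackingUnion and _stacked_transform_one below are made-up examples, not part of this PR:

from sklearn.pipeline import FeatureUnion


def _stacked_transform_one(transformer, X, y, weight, **fit_params):
    # e.g. return out-of-fold predictions instead of a plain transform,
    # which is what a stacked-generalization framework would need
    ...


class StackingUnion(FeatureUnion):
    # _transform_one is exposed as a class attribute by this PR, so a
    # subclass only needs to rebind it; the Parallel machinery is reused.
    _transform_one = staticmethod(_stacked_transform_one)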

@caioaao caioaao changed the title Refactor pipeline namespace to make it more reusable [MRG] Refactor pipeline namespace to make it more reusable Jul 9, 2018
Member

@rth rth left a comment

Thanks for your work on stacking @caioaao !

A few comments are below, but generally this LGTM. Note that if you create a new package that depends on this, there is no guarantee that these private functions won't change in the future.

    return _apply_weight(res, weight)


def _fit_transform_one(transformer, X, y, weight, **fit_params):
Member

I suppose it doesn't work if you leave _fit_transform_one in the same place as it was before?

@property
def _fit_transform_one(self):
    # property needed because `memory.cache` doesn't accept class methods
    return _fit_transform_one
Member

maybe,

_fit_transform_one = property(_fit_transform_one)

above the __init__, to be more concise.

Member

No sorry, I meant,

_fit_transform_one = staticmethod(_fit_transform_one)

Would that work?

Contributor Author

@caioaao caioaao Jul 10, 2018

I'm not sure, but I don't think so, since I think it needs to return the _fit_transform_one function for parallel to work. I'll give it a try anyway.

EDIT: it works :)
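
For clarity, a minimal self-contained sketch of the suggestion that ended up working (the helper body is simplified and the class is a stand-in, not the actual Pipeline code):

# Module-level helper kept as a plain function so that joblib's
# memory.cache and Parallel can work with it (simplified body).
def _fit_transform_one(transformer, X, y, weight, **fit_params):
    res = transformer.fit_transform(X, y, **fit_params)
    return res, transformer


class Pipeline:  # heavily abbreviated stand-in for sklearn.pipeline.Pipeline
    # Rebinding the helper as a staticmethod keeps `self._fit_transform_one`
    # a plain function, which memory.cache and Parallel accept, while still
    # letting subclasses override it.
    _fit_transform_one = staticmethod(_fit_transform_one)


# Accessing the attribute on an instance returns the undecorated function:
assert Pipeline()._fit_transform_one is _fit_transform_one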


@property
def _transform_one(self):
    return _transform_one
Member

Same comment as above. Add a comment that these are defined to facilitate customizing behavior by subclassing.

            for _, trans, _ in self._iter())
        self._update_transformer_list(transformers)
        return self

    def _stack_results(self, Xs):
Member

Decorate with @staticmethod ?

# if we have a weight for this transformer, multiply output
if weight is None:
    return Xt
return Xt * weight
Member

I'm not sure there is really a need to factor the code out into this additional function. Here we only have 2 repetitions with little code, and a rule of thumb (source forgotten) is to factor out after 3.

Contributor Author

looking at the code base right now I'd agree with you, but it won't hurt to keep it as a static method (and I use this as well 😬 )

Member

Yes, but if it's only used in _transform_one and _fit_transform_one you could always overwrite those two functions and define _apply_weight as needed, can't you? (Or copy this function into your package.)

Contributor Author

@caioaao caioaao Jul 11, 2018

I could, but what would be the benefit of doing so? Another benefit of wrapping snippets in functions, besides reusability, is readability. This way you don't need a comment describing the purpose of a snippet.
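
To make the readability point concrete, the extracted helper and its call site might look roughly like this (simplified sketch; the real functions carry extra return values):

def _apply_weight(Xt, weight):
    # Replaces the repeated "if we have a weight for this transformer,
    # multiply output" snippet quoted above.
    return Xt if weight is None else Xt * weight


def _transform_one(transformer, X, y, weight, **fit_params):
    return _apply_weight(transformer.transform(X), weight)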

@caioaao
Contributor Author

caioaao commented Jul 10, 2018

@rth thanks! That's something I hadn't considered, but I guess I'll have to cross that bridge when we come to it. Hopefully we'll have stacking in scikit-learn at that point 🤞

            Xs = np.hstack(Xs)
        return Xs

        return self._stack_results(Xs)
Member

Maybe call this _stack_arrays ?
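
For reference, a minimal sketch of what this helper wraps, assuming the usual dense/sparse stacking in FeatureUnion (a stand-in class, not the exact code of this PR):

import numpy as np
from scipy import sparse


class FeatureUnion:  # abbreviated stand-in; only the helper under discussion is shown
    @staticmethod
    def _stack_results(Xs):
        # Stack per-transformer outputs horizontally, switching to the
        # sparse hstack when any output is a sparse matrix.
        if any(sparse.issparse(X) for X in Xs):
            return sparse.hstack(Xs).tocsr()
        return np.hstack(Xs)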

@amueller
Member

amueller commented Aug 5, 2019

Is this still relevant?

Base automatically changed from master to main January 22, 2021 10:50
@thomasjpfan
Member

Now that stacking is merged in #11047, I do not think this PR is required anymore. With that in mind, I am closing this PR.
