-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Pipeline can now be sliced or indexed #2568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I have a strong dislike for overriding "__" methods in order to give a domain-specific behavior to objects. I find that it leads to code that is not explicit and hard to read unless you know the package very well. Of course, a method with an explicite name ("get_estimators") would not raise such criticism from me. However, when discussing @schwarty's usecase, the reason that we had envisaged a "get_estimated" method was that, for the better or the worst, it could be useful in many composite estimators other than the pipeline, for instance the GridSearch, and it would somewhat abstract the details of the compositing done by the |
And sorry, I forgot to commit changes to |
I have a strong like for APIs that are convenient and intuitive with minimal surprises, and in that, magic methods are little different. I don't find it surprising that a Pipeline should have syntactic behaviours akin to a list or an ordered dict (had Pipeline explicitly declared itself a Python I find the code resulting from this proposal much more intuitive, unsurprising and explicit than |
I actually like this idea because it follows a Python convention, but it still needs to be documented somehow. In particular, it's not immediately obvious that slicing a pipeline makes a shallow copy: settings the steps in a slice changes the slice, but fitting one of the estimators changes the original. |
Certainly a Also, the semantics are no different from other Python containers, but that On Fri, Nov 22, 2013 at 8:45 PM, Lars Buitinck notifications@github.comwrote:
|
No, but the semantics are different from those of NumPy arrays. For simplicity's sake, I think a shallow copy is fine, it's just that we have to spell it out somewhere. |
Well, I don't think we'll support |
I agree with @larsmans on this one - if it's documented well I'm quite +1 on this PR as it's quite intuitive. Though I see @GaelVaroquaux 's point too. Where are we in terms of moving forward on either this PR or the alternatives? |
Added to docstrings/tests, narrative doc, example |
I am busy preparing a course for statistics in Python for beginners. The Think about this PR is this respect: how will students or beginners To quote the zen of Python: Explicit is better than implicit. I am aware that "Beautiful is better than ugly" could be applied here. I In addition, I don't believe that this PR will not solve a very general |
As with any religious scripture, we can agree on the text of the Zen, but only marginally on its interpretation. We can probably also agree that numpy, matplotlib and probably pandas aren't exemplary disciples of Zen, where other priorities (chiefly compatibility) came into play. So I wish you luck in teaching people the "one obvious way to do it" in that context, knowing that they will read plenty of code that differs. Anyway, if there should be one obvious way to do it, let's consider the alternatives to pipeline.steps[-1][1] is both inexplicit and ugly, but we see it often enough. We could convert step tuples to namedtuples (does the API design allow us to do this in pipeline.steps[-1].estimator which is explicit and not ugly, but verbose. Or we could provide a For getting a sub-pipeline, |
PS: I can't find that error in the Travis log. Any clues? |
@jnothman regarding travis, I assume it's the doctest one:
Though this doesn't give much on what or how -- trying to see if I can narrow it down |
Here you go, I think:
|
That's the only failure |
Oh. I must have modified it after testing it locally. Thanks. On Tue, Nov 26, 2013 at 10:25 PM, Jaques Grobler
|
I also find that indexing pipeline is intuitive as it's fundamentally an ordered sequence. To me this is the one obvious way to do it, that is to construct a sub-pipeline without leading or trailing estimators and visualize the partial (inverse) transformations the produce on test data for model inspection purpose. |
:). I like that analogy.
Yes, and it's a big problem for beginners.
We shouldn't make things worse. They are already pretty bad.
I'd like to stress that this proposed solution doesn't help at all the |
I have some embarrassing environment issues on my laptop preventing me from running tests easily... so excuse me if I do this through more commits than I should |
just reading the PR number makes me laugh too 2568 :)
… |
I'm still not sold on overriding the dunder methods (after all these years :D ). I heard the arguments against a method called "get_slice" (which are that "slice" is a word that non-Python users might not identify with what we are doing here). I would suggest "get_segment", or "get_portion" (I prefer "get_segment". |
And continue using named_steps to pull the coefficients out?
I don't particularly like get_segment but will do it if others do.
Could also consider split_before but it doesn't make it easier to get the
coefficients out of the tail since two Pipelines would be returned.
|
I am personally fine with the current proposal.
… |
I feel like any possible word will be less intuitive than using slicing syntax and will make it more complicated. |
@@ -188,6 +199,26 @@ def _iter(self, with_final=True): | |||
if trans is not None and trans != 'passthrough': | |||
yield idx, name, trans | |||
|
|||
def __getitem__(self, ind): | |||
"""Returns a sub-pipeline or a single esimtator in the pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in "estimator"
returns another Pipeline instance which copies a slice of this | ||
Pipeline. This copy is shallow: modifying (or fitting) estimators in | ||
the sub-pipeline will affect the larger pipeline and vice-versa. | ||
However, replacing a value in `step` will not affect a copy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps clarify: replacing a value in step in the original pipeline instance of the sub-pipeline instance.
assert pipe['transf'] == transf | ||
assert pipe[-1] == clf | ||
assert pipe['clf'] == clf | ||
assert_raises(IndexError, lambda: pipe[3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps another test could be added for sub-pipeline index over several steps that exceeds the max. The present test gets at the case where a single estimator is returned, but not the case where a sub-pipeline is returned as a Pipeline() instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't allow pushing to the branch so I suggested changes ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good apart from my suggestions.
Co-Authored-By: jnothman <joel.nothman@gmail.com>
Co-Authored-By: jnothman <joel.nothman@gmail.com>
Co-Authored-By: jnothman <joel.nothman@gmail.com>
hm can you fix it up or allow me to push? |
see amueller@3733569 |
I haven't worked out how to allow you to push (it's a very old PR). I'm
leaving Berlin tonight, and I'll probably be able to pay these things a
little more attention soon. For now, just merging your branch.
…On Mon, 4 Mar 2019 at 17:10, Andreas Mueller ***@***.***> wrote:
see ***@***.***
<amueller@3733569>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz60GgvtZg7zR6IurdksS3IE6Vt7aQks5vTUWCgaJpZM4BKcwO>
.
|
right, you're still travelling. Sorry to bug you. I should have known given that they just send me your boarding pass ;) |
but looks good to merge? @rth @ogrisel @qinhanmin2014 any thoughts / wanna press the green button? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise looks good :)
Co-Authored-By: jnothman <joel.nothman@gmail.com>
oO
|
* ENH Pipeline can now be sliced or indexed * Additional assertion imports for testing * DOC Documentation and example for Pipeline slicing * FIX put doctest lines in correct order * DOC improve compose Pipeline docs * Fix doctest * Fix merge error * DOCs improved after Alex's comments * This is not the right place to change to LinearSVC * missed one * DOC add what's new * Fix doctest * doctest tweaks Co-Authored-By: jnothman <joel.nothman@gmail.com> * doctest tweaks Co-Authored-By: jnothman <joel.nothman@gmail.com> * doctest tweaks Co-Authored-By: jnothman <joel.nothman@gmail.com> * fix doctests * Correct step name * Update doc/whats_new/v0.21.rst Co-Authored-By: jnothman <joel.nothman@gmail.com>
This reverts commit a125129.
This reverts commit a125129.
* ENH Pipeline can now be sliced or indexed * Additional assertion imports for testing * DOC Documentation and example for Pipeline slicing * FIX put doctest lines in correct order * DOC improve compose Pipeline docs * Fix doctest * Fix merge error * DOCs improved after Alex's comments * This is not the right place to change to LinearSVC * missed one * DOC add what's new * Fix doctest * doctest tweaks Co-Authored-By: jnothman <joel.nothman@gmail.com> * doctest tweaks Co-Authored-By: jnothman <joel.nothman@gmail.com> * doctest tweaks Co-Authored-By: jnothman <joel.nothman@gmail.com> * fix doctests * Correct step name * Update doc/whats_new/v0.21.rst Co-Authored-By: jnothman <joel.nothman@gmail.com>
This PR offers an alternative to #2561 and #2562, making it easy to apply inverse transforms (or transforms) over only a sub-sequence of steps in a pipeline. Thus:
@schwarty, is this sufficient for your needs?
Closes #8431, an alternative API
Closes #8448, an alternative API