-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] FIX Pipelined fitting of Clustering algorithms, scoring of K-Means in pipelines #4064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Okay, I guess that's one option; the alternative is to not pass |
I am just now a bit confused. |
we should do it consistently, but I guess it is to late for that now? |
I actually prefer the |
import numpy as np,sklearn
from sklearn.pipeline import Pipeline
from sklearn.cluster import MeanShift
X=np.random.rand(100,2)
p=Pipeline([('clf',MeanShift())])
p.fit(X) breaks and makes me sad. |
For the pipeline, it could be that previous steps in the pipeline use |
Don't be sad. As Gaël likes to say: it's not version 1 yet, stuff's allowed I can't say I have the answer here, yet, but I think you're thinking about On 8 January 2015 at 13:35, Andreas Mueller notifications@github.com
|
Yeah, I think this is the only way to make pipeline work as expected. Testing |
9f2ec74
to
f89c72f
Compare
f89c72f
to
d55d272
Compare
reviews on this one would be nice as it makes testing somewhat simpler. |
y = (X[:, 0] * 4).astype(np.int) | ||
y = multioutput_estimator_convert_y_2d(name, y) | ||
with warnings.catch_warnings(record=True): | ||
estimator = Estimator() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add a warnings.simplefilter('always')
here so that the registry doesn't catch some warnings and create heisenbugs... Better, decorate the whole test with @ignore_warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is not particular to this test, though, right? and I'm not sure the whole tests should be decorated. These are more for the deprecation warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be reasonable to ignore all sklearn
warnings, at least, in common tests. Perhaps we should change ignore_warnings
to only apply to sklearn
modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly enough, if I put @ignore_warnings
on the test (not the check function) not test is run :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about adding a test for partial_fit
, but I'm not sure about it. The pipelines need to look quite a bit different to support partial_fit
, and there is no pressing reason currently to enforce taking a y
in the API....
27ddff9
to
f02335a
Compare
Firstly, doc/developers/index.rst needs fixing. Secondly, I think this change should encompass |
What is the benefit in |
Consistency with |
Fair enough. |
Actually, there was a comment about |
Oh, I didn't look at it closely enough. It actually gives an invocation On 29 January 2015 at 10:32, Andreas Mueller notifications@github.com
|
Obscurely, it shows 1-ary On 29 January 2015 at 14:05, Joel Nothman joel.nothman@sydney.edu.au
|
@@ -716,8 +716,11 @@ is not met, an exception of type ``ValueError`` should be raised. | |||
``y`` might be ignored in the case of unsupervised learning. However, to | |||
make it possible to use the estimator as part of a pipeline that can | |||
mix both supervised and unsupervised transformers, even unsupervised | |||
estimators are kindly asked to accept a ``y=None`` keyword argument in | |||
estimators need to accept a ``y=None`` keyword argument in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this doc was already here. I guess I only looked at a glance at what I thought was an issue, but not really.
Great. This LGTM apart from your test failure, which I take it will be fixed before the next reviewer gives a +1 ;) |
Thanks for the review. Damn, I thought I just fixed travis. Will look at it again. |
e37f56a
to
fa184fe
Compare
Fixed. |
Any other reviews? @ogrisel maybe? |
As the motivation of this change was to make it possible to call the clustering models in pipelines I would like to integrate a non-regression "integration" test in the spirit of #4064 (comment) in |
Other than that +1 on my side. |
Hum, I totally wanted to do that integration test with pipelines, seems I forgot about it. Not sure what should be tested, probably "everything". This PR has tests for the "API definition' of Pipeline, not doing the integration. |
I was thinking of a bunch of sanity checks such as:
and the similar clustering and regressors and maybe others as long as they implement |
I thought about which functions we want to test, I guess that this issue only exists for |
|
cfee0a1
to
a105327
Compare
added integration test, rebased. I am a bit confused.
That y is never used anywhere and not required by any API, is it? |
…f clustering algorithms!
a105327
to
cd63acc
Compare
Anyhow, this PR is good to go, I think. I removed the requirement on |
Great, merging this PR. Thanks again @amueller ! |
[MRG+1] FIX Pipelined fitting of Clustering algorithms, scoring of K-Means in pipelines
:) Rest assured this fix is in accordance with what's been in the docs for some time... |
Fixes #4063.
Also fixes the embarrassing bug where clustering algorithms can't be used in pipelines.