Pipeline on labels_ instead of just transforms #4543
Comments
Some clusterers can be used as transformers, e.g. centroid distance in KMeans.
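(To make that concrete, here is a minimal sketch of a clusterer used as a transformer inside an ordinary Pipeline; KMeans and LogisticRegression are arbitrary illustrative choices, not prescribed by the thread.)

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# KMeans.transform returns each sample's distance to every centroid,
# so the clusterer behaves like any other transformer mid-pipeline.
pipe = Pipeline([('km', KMeans(n_clusters=3)), ('lr', LogisticRegression())])
pipe.fit(X, y)
```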
I'm not entirely sure I understand the question. Do you want the …
Yes. Trying to be more explicit here: say I want to classify the …
I can see what you are trying to do, and it will probably be possible soonish.
Right. I get what you're doing now. It's not hard to construct a meta-estimator that does this, and I think it's a fairly common way of making clusterers inductive. I've not tested it, but:

```python
from sklearn.base import clone, BaseEstimator
from sklearn.utils.metaestimators import if_delegate_has_method


class InductiveClusterer(BaseEstimator):
    def __init__(self, clusterer, classifier):
        self.clusterer = clusterer
        self.classifier = classifier

    def fit(self, X, y=None):
        # Cluster the training data, then fit the classifier to
        # reproduce the cluster labels on unseen samples.
        self.clusterer_ = clone(self.clusterer)
        self.classifier_ = clone(self.classifier)
        y = self.clusterer_.fit_predict(X)
        self.classifier_.fit(X, y)
        return self

    @if_delegate_has_method(delegate='classifier_')
    def predict(self, X):
        return self.classifier_.predict(X)

    @if_delegate_has_method(delegate='classifier_')
    def decision_function(self, X):
        return self.classifier_.decision_function(X)

    # etc...
```
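For illustration, here is how that snippet might be used (an untested sketch; KMeans and LogisticRegression stand in for any clusterer with fit_predict and any classifier):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
X_new, _ = make_blobs(n_samples=10, centers=3, random_state=1)

model = InductiveClusterer(KMeans(n_clusters=3), LogisticRegression())
model.fit(X)                        # clusters X, then trains the classifier on those labels
new_labels = model.predict(X_new)   # inductive: labels for points the clusterer never saw
```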
Not if it's a clustering evaluation…?
PS: with …
Also, @amueller, this could be used to learn a transformed feature space, an L1 feature selector, etc., informed by the clustering, rather than merely for the purpose of clustering.
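A rough sketch of that idea, assuming SelectFromModel with an L1-penalised LinearSVC as the supervised transformer (both choices are mine, not from the thread):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

X, _ = make_blobs(n_samples=200, n_features=10, centers=3, random_state=0)

# Pseudo-labels from the clustering...
labels = KMeans(n_clusters=3).fit_predict(X)

# ...drive an L1 feature selector, i.e. a transformer "informed by the clustering".
selector = SelectFromModel(LinearSVC(C=0.1, penalty='l1', dual=False))
X_reduced = selector.fit_transform(X, labels)
```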
@selwyth, if you find the above code snippet useful, I think it would make a helpful example for sklearn's documentation. Feel free to flesh out an illustrative example of the technique using a shared dataset and submit a PR. |
@amueller That's a good point. Perhaps we have to warn the user to limit searches to higher …

@jnothman Thank you for the snippet. I understand it, and will study it some more, think through the other methods in the 'etc', and test before submitting a PR for documentation on this example. Forgot to mention that the feature space is different for both steps in my example, but I see how to easily work that into your meta-estimator.
Ah, I misunderstood the setting a bit. If you use an unsupervised clustering metric in the end, it makes sense. |
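For instance, the meta-estimator above could be wired into a grid search with an unsupervised scorer; this is a sketch of my own (the callable-scorer signature, silhouette_score choice, and parameter grid are all assumptions, not from the thread):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

def unsupervised_score(estimator, X, y=None):
    # Score the induced labelling of the held-out fold with an internal metric.
    return silhouette_score(X, estimator.predict(X))

search = GridSearchCV(
    InductiveClusterer(KMeans(), LogisticRegression()),
    param_grid={'clusterer__n_clusters': [2, 3, 4, 5]},
    scoring=unsupervised_score,
)
search.fit(X)   # no y: the scorer never needs ground truth
```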
I guess the point is then to add a …
Well, for the feature space to be different, you'll almost certainly need custom code. You could, however, do that with both the clusterer and the classifier being a Pipeline incorporating feature extraction / transformation. Pipeline doesn't currently support fit_predict, though; new issue at #4572.
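Concretely, that could look something like the following (a sketch assuming Pipeline.fit_predict from #4572 is available; the transformers chosen are arbitrary):

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X, _ = make_blobs(n_samples=200, n_features=5, centers=3, random_state=0)

# Each stage of the meta-estimator gets its own feature space:
clusterer = Pipeline([('pca', PCA(n_components=2)),
                      ('km', KMeans(n_clusters=3))])
classifier = Pipeline([('scale', StandardScaler()),
                       ('lr', LogisticRegression())])

model = InductiveClusterer(clusterer, classifier).fit(X)
```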
This issue has helped identify #4572, which has since been fixed! But otherwise, I don't think we need heavy internal changes to support this case; rather, we should either add the likes of …
Ah, rereading, I see that @selwyth intends to submit a PR. I look forward to it. Regarding "think through the other methods in the 'etc'", see the set of methods delegated by …
Feel free to close.
My current use case is to cluster a bunch of data to generate labels, then use a classifier to draw decision boundaries and make label predictions on new data.
I love using the pipeline and gridsearch packages in tandem, but can't use them for this because clustering does not have a transform method for X. Is it worthwhile to make pipeline components chainable by the labels_ attribute in addition to transform?
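For reference, the workflow being described, done by hand without Pipeline (the estimator choices are only illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.svm import SVC

X_train, _ = make_blobs(n_samples=300, centers=5, random_state=0)
X_new, _ = make_blobs(n_samples=20, centers=5, random_state=1)

# Step 1: cluster the training data to generate labels.
km = KMeans(n_clusters=5).fit(X_train)

# Step 2: train a classifier on those labels to get decision boundaries.
clf = SVC().fit(X_train, km.labels_)

# Step 3: predict cluster membership for new data.
predicted = clf.predict(X_new)
```

The sticking point for Pipeline is that the clusterer's labels_ would have to be passed to the next step as y, whereas Pipeline only ever chains each step's transform output along as X.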