[MRG+2] Classifier chain #7602
The appveyor test failure seems related.
import numpy as np
can you create an example that has a file name starting with plot_ and plots things? Plots make for more engaging examples, and only examples that start with plot_ are automatically run. Also, if you want to add a new folder to the examples, you need to create an empty README.txt file in that folder.
I will work on adding an example with plots. Also please see my comment below regarding the README.txt file.
I'm having a hard time coming up with something to plot here. I think the most compelling result is a comparison of the Jaccard similarity scores for independent models, a single classifier chain, and an ensemble of classifier chains. I suppose I could make a bar chart of those three numbers but that seems rather trivial. I experimented with plotting precision-recall curves (or ROC curves) for individual classes but the results are not that interesting to look at. I'm open to suggestions.
I was about to suggest the confusion matrix, but I think we haven't implemented the multi-label confusion matrix yet.
It's actually not really important to add a plot for it to be a plot_ example. That just ensures it runs. But it requires that the example runs quickly enough. If it doesn't, maybe try yeast instead?
I switched to yeast. I like it. It's much faster (the example runs in a few seconds). And it makes for a more convincing example.
there shouldn't be an
self.classifiers = []
self.chain_order = None

def fit(self, X, Y, chain_order=None, shuffle=True):
hm I'm not sure if we use capital Y for labels (though it would be logical).
I'd be happy to lower case it or change it to anything else. Just LMK what the correct convention is.
For what it's worth the convention of capitalizing the variable names of multi-label arrays already appears in the documentation here:
http://scikit-learn.org/stable/auto_examples/plot_multilabel.html#sphx-glr-auto-examples-plot-multilabel-py
""" | ||
|
||
def __init__(self, | ||
base_estimator=LogisticRegression(), |
I'm not sure if having a mutable default is a good idea.
Agreed. I got rid of it.
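For context, the hazard being discussed is that Python evaluates default arguments once, at class-definition time, so a default like `base_estimator=LogisticRegression()` would be a single object shared by every instance. A minimal sketch of the pitfall and the usual `None` fix (toy classes, not the PR's code):

```python
class SharedDefault:
    """Buggy: the default list is created once and shared by all instances."""
    def __init__(self, estimators=[]):
        self.estimators = estimators


class SafeDefault:
    """Fixed: default to None and build a fresh object per instance."""
    def __init__(self, estimators=None):
        self.estimators = [] if estimators is None else estimators


a, b = SharedDefault(), SharedDefault()
a.estimators.append("clf")
print(b.estimators)  # b was mutated through a!

c, d = SafeDefault(), SafeDefault()
c.estimators.append("clf")
print(d.estimators)  # instances stay independent
```

The same reasoning applies to a default estimator instance: two models constructed with the shared default would point at the very same estimator object.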
Thanks, this looks interesting. Can you please also add an entry to the user guide? And maybe ping @arjoly?
removed logistic regression dependency
I removed the __init__.py file from the examples folder and added a README.txt file. The following exception is now getting thrown on CircleCI.
I tried copying the format of the README.txt files in other /examples directories but that had no effect. Any ideas?
Put a docstring at the beginning of your example with a title and a text that describes what the example does.
Ah. Thanks @GaelVaroquaux!
How do I go about adding an entry to the user guide?
============================

Demonstrate that a single classifier chain
out performs 10 independent classifiers
out-performs
?
Also format so that it's about 79 char each line (hopefully your editor can do that for you)
from sklearn.datasets import fetch_rcv1

# Get the Reuters Corpus Volume I dataset
rcv1 = fetch_rcv1()
Uh, can you actually check how long this thing runs? If it's too long, it doesn't qualify as a plot_ example because it would make the doc build too slow.
Yeah. I was worried about that. With my connection I believe it took a couple minutes to download. Of course that time depends on connection speed. But it doesn't seem like something that should be run automatically.
Previously I was using make_multilabel_classification to generate a fake dataset, which is obviously much faster. But I don't think this works, because I believe the labels in the fake dataset are uncorrelated, which means there is no advantage to using a classifier chain.
I was actually not talking about the download (though that's also an issue) but about the computation. Have you tried yeast? That should have some correlations. And if make_multilabel_classification is not helpful for multi-label classification we should change/fix it.
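One way to check this concern empirically (a sketch; the parameter values are illustrative, not the PR's): generate a dataset with make_multilabel_classification and inspect the pairwise correlation between label columns. Off-diagonal correlations near zero would mean a chain has little to exploit over independent classifiers.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

# Illustrative parameters; the real example used rcv1 and later yeast.
X, Y = make_multilabel_classification(
    n_samples=1000, n_features=20, n_classes=5, random_state=0)

# Pearson correlation between every pair of label columns.
corr = np.corrcoef(Y.T)
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print("max |off-diagonal label correlation|:",
      round(float(np.abs(off_diag).max()), 3))
```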
----------
classifiers_ : array
    List of classifiers, which will be used to chain prediction.
chain_order : list of ints
Anything that's estimated during training should have a trailing underscore.
I'd be okay with dropping chain_ and just making this order_
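For reference, the convention under discussion in a toy sketch (the class and names are hypothetical): `__init__` stores constructor arguments unmodified, and anything estimated during `fit` gets a trailing underscore, e.g. `order_`.

```python
import random


class OrderSketch:
    """Toy object illustrating the trailing-underscore convention."""

    def __init__(self, order=None, random_state=0):
        # __init__ only stores constructor arguments, unmodified.
        self.order = order
        self.random_state = random_state

    def fit(self, n_outputs):
        # order_ is determined during fit, hence the trailing underscore;
        # the constructor argument `order` stays untouched.
        if self.order is None:
            rng = random.Random(self.random_state)
            self.order_ = list(range(n_outputs))
            rng.shuffle(self.order_)
        else:
            self.order_ = list(self.order)
        return self
```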
"""

def __init__(self,
on one line?
Parameters
----------
X : {array-like, sparse matrix}, shape (n_samples, n_features)
Same nitpick here.
return Y_pred

@if_delegate_has_method('base_estimator')
def predict_proba(self, X):
I think that this could use a little bit more documentation. Specifically, you may want to clarify that predict and predict_proba don't change the input to the later models in the chain; the chain always uses predict internally.
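The behaviour being described can be sketched in plain numpy (a conceptual sketch, not the PR's actual code): each model in the chain receives the hard `predict` output of its predecessors as extra features, even when the caller asked for probabilities.

```python
import numpy as np


def chain_predict(estimators, X):
    """Chained prediction sketch: model i sees X plus the *binary*
    predictions of models 0..i-1, whether the caller ultimately wants
    labels or probabilities."""
    n = X.shape[0]
    Y_pred = np.zeros((n, len(estimators)))
    Y_prob = np.zeros((n, len(estimators)))
    for i, est in enumerate(estimators):
        # Augment the features with the hard predictions so far.
        X_aug = np.hstack([X, Y_pred[:, :i]])
        Y_prob[:, i] = est.predict_proba(X_aug)[:, 1]
        Y_pred[:, i] = est.predict(X_aug)  # hard output feeds forward
    return Y_pred, Y_prob
```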
Parameters
----------
base_estimator : estimator
What are your thoughts on allowing arbitrary lists of classifiers? For instance, one could use a logistic regression for one task, a Random Forest classifier for another, etc. This is not something that would need to be implemented for this PR to get merged, I'm just asking out of curiosity.
Possibly. We'd have to think about how this would work with random chain orders but I could imagine this being a useful feature.
Classifier Chain
============================
Example of using classifier chain on a multilabel dataset.
You may want to elaborate on the connections between classifier chains and model stacking. Might it be worth it to also mention that regression chains are possible? Maybe that's better saved for when someone implements it.
I guess my preference is to keep the text concise and focused on the example. If there is a specific point you think we should make here I'd consider adding a sentence or two. Otherwise I'm inclined to leave it as is.
I think we should stay away from dealing with regression models in this PR. But I agree that is a good direction to go.
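For what it's worth, the same pattern does extend naturally to regression; a sketch using scikit-learn's RegressorChain (which was added to sklearn.multioutput after this discussion, so treat its availability as an assumption on older versions):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

# Multi-output regression problem with 3 targets.
X, Y = make_regression(n_samples=200, n_features=10, n_targets=3,
                       random_state=0)

# Regressor i is fit on X augmented with the preceding true targets,
# and predicts from X augmented with the preceding predictions.
chain = RegressorChain(Ridge(), order=[0, 1, 2])
Y_pred = chain.fit(X, Y).predict(X)
print(Y_pred.shape)  # (200, 3)
```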
voting ensemble of classifier chains by averaging the binary predictions of
the chains and apply a threshold of 0.5. The Jaccard similarity score of the
ensemble is greater than that of the independent models and tends to exceed
the score of each chain in the ensemble (although this is not guarenteed
guaranteed
indeed
Overall, like I've said before, this is strong work and I'm glad that we're close to merging it. At the risk of being reviewer 3, I'm wondering if it might be worthwhile to consider the idea of regression chains here, and abstract out a base chain object. @adamklec what are your thoughts? Are regression chains as popular as classification chains?
What do you mean by "at the risk of being reviewer 3"?
I'm curious, @jmschrei: Do you expect that there is any benefit in waiting until regression chains are implemented before we merge this? I.e. do you expect it will change the API or help find bugs in this implementation? Here is what I hope to create issues for once this is merged:
When submitting a paper, reviewer 3 tends to always be the annoying one that makes you do a bunch of work. I am OK with not having regression chains supported in this PR, I was just interested in discussing it. Otherwise this LGTM once my minor issues are taken care of.
@jmschrei I made the changes you requested. I did not do anything related to regression chains though. That seems like it could be a good idea but I think it should be addressed in another PR. @jnothman to your list of issues to be addressed I would add
In
Okay. I am in agreement with most of your comments, but will do a full review tonight (and hopefully merge!). I do think it might be valuable somewhere to mention the connection between stacking and classifier chains, since they are similar but not identical. Is there a place you think this might fit?
how does support for multilabel ensembles depend on this issue?
It doesn't. Like I said, I will review the changes when I get home. In the meantime I was asking a question.
LGTM!
Merged #7602.

Congratulations, @adamklec!
Awesome! Thanks everyone.
This PR implements Classifier Chains for multi-label classification as described here:
http://www.cs.waikato.ac.nz/ml/publications/2009/chains.pdf
#3727 is a similar PR that seems to be stalled. This PR has one significant improvement over #3727. Since in general it is impossible to know the optimal ordering of classifiers for a particular problem a priori, it is common to use an ensemble of randomly ordered classifier chains (as explained in the above paper). This implementation of classifier chains supports randomizing the chain order.
I included an example that demonstrates that on a multi-label dataset an ensemble of 10 randomly ordered classifier chains out-performs independently trained models.
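A minimal sketch of the ensemble described above, written against the ClassifierChain API as it was merged into sklearn.multioutput (synthetic data stands in for the yeast dataset used in the actual example, and jaccard_score is the modern name of the metric):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=600, n_classes=5,
                                      random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Ten chains, each trained with a random class ordering.
chains = [ClassifierChain(LogisticRegression(max_iter=1000),
                          order='random', random_state=i).fit(X_tr, Y_tr)
          for i in range(10)]

# Majority vote: average the binary predictions, threshold at 0.5.
Y_vote = np.mean([c.predict(X_te) for c in chains], axis=0) >= 0.5
print("ensemble Jaccard score:",
      round(jaccard_score(Y_te, Y_vote, average='samples'), 3))
```

The dataset sizes and number of chains here are chosen to keep the sketch fast; the published example uses the yeast dataset and compares the ensemble against independently trained per-label models.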