ENH add criterion log_loss as alternative to entropy in trees and forests #23047
Conversation
I don't think "log_loss" is a good name for a splitting criterion. As explained in #18248 (comment), log loss is a cross-entropy (KL divergence between an empirical distribution and the distribution induced by the predictions of the model), while the entropy criterion is a quantity computed on the PMF of the samples at a given node in the tree. The criterion is about the maximal change of entropy values before and after a split, but this does not explicitly compute a cross-entropy at any point.
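For concreteness, here is a minimal numerical sketch (not part of the PR; the data and variable names are made up for illustration) of how the two quantities relate at a single node: when a node predicts its own empirical class frequencies, the mean log loss over its samples coincides with the Shannon entropy of those frequencies, which is what makes the two names interchangeable as a split criterion. Natural logs are used here; a different log base would only rescale the values, not change which split is preferred.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical class labels of the samples sitting in one tree node.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=1000)

# Empirical class frequencies (the node's PMF) -- also the node's prediction.
classes, counts = np.unique(y, return_counts=True)
p = counts / counts.sum()

# Shannon entropy of the node PMF (natural log).
entropy = -np.sum(p * np.log(p))

# Mean log loss when every sample in the node is predicted with those frequencies.
proba = np.tile(p, (len(y), 1))
cross_entropy = log_loss(y, proba, labels=classes)

print(np.isclose(entropy, cross_entropy))  # True
```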
Following #18248 (comment), when we have both names,
I should read first.
LGTM!
LGTM
I'll add a what's new entry. It does seem the right thing to do to mention the new equivalent criterion `log_loss`.
Merged!
Reference Issues/PRs
Partially addresses #18248
What does this implement/fix? Explain your changes.
This PR adds `criterion="log_loss"` as an alternative to `criterion="entropy"` in `DecisionTreeClassifier`, `ExtraTreeClassifier`, `RandomForestClassifier`, and `ExtraTreesClassifier`.
Any other comments?
No changelog entry needed, as `criterion="gini"` stays the default and `"entropy"` stays available. New is the equivalent option `"log_loss"`.
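A usage sketch, assuming a scikit-learn version that already contains this change: since `"log_loss"` is simply a second name for the `"entropy"` criterion, fitting with either option should grow identical trees. The dataset is arbitrary and only used for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same estimator, same random_state, only the criterion name differs.
tree_entropy = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
tree_log_loss = DecisionTreeClassifier(criterion="log_loss", random_state=0).fit(X, y)

# The two names select the same splits, so structure and predictions match.
assert tree_entropy.get_depth() == tree_log_loss.get_depth()
assert np.array_equal(tree_entropy.predict(X), tree_log_loss.predict(X))
```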