ENH Allows plotting max class for multiclass in DecisionBoundaryDisplay
#29797
I'm also going to try to amend the example "Logistic Regression 3-class Classifier" to show a plot using this enhancement. Though as we use the same
I would almost argue that we be removing the
I'm thinking that we could instead modify this one: https://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html#sphx-glr-auto-examples-classification-plot-classification-probability-py and add an additional column to the plot with the max. That way we could remove the OvR LR (it is deprecated) and instead use a Nystroem with LR.
Oops, I see I didn't post some review comments that I had. These are only partial comments.
Do you want me to do this in this PR or a separate PR (since removing OvR LR may technically not be considered part of this work)?
We can do it in this PR and limit the change to this single example.
Cool. I will have a look now.
LGTM
Maybe @ogrisel could have a look since we discussed that feature some time ago.
@ogrisel gentle ping about this 😬 Thank you!
@lucyleeow I pushed two commits of changes I wanted to do while reviewing the PR:
I will do a full review tomorrow but this LGTM.
  # plot the probability estimate provided by the classifier
  disp = DecisionBoundaryDisplay.from_estimator(
      classifier,
-     X,
+     X_train,
      response_method="predict_proba",
This is a pitfall of this method. By default, it would use response_method="decision_function", which is much harder to interpret in my opinion (especially when comparing different model classes). I think we should change the "auto" policy to favor predict_proba when available, but this should rather be done in a dedicated follow-up PR.
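To illustrate the interpretability point, here is a minimal sketch that compares the two response methods directly, without plotting. The iris data and the two estimators are assumptions chosen for illustration and are not taken from the PR.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Illustrative data (an assumption): two features of iris, three classes.
X, y = load_iris(return_X_y=True)
X = X[:, :2]

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svc": SVC(probability=True, random_state=0),
}

ranges = {}
for name, model in models.items():
    model.fit(X, y)
    scores = model.decision_function(X)  # scale depends on the model class
    proba = model.predict_proba(X)       # always within [0, 1]
    ranges[name] = {
        "decision_function": (float(scores.min()), float(scores.max())),
        "predict_proba": (float(proba.min()), float(proba.max())),
    }
    print(name, ranges[name])
```

The decision_function ranges differ between the two models, while predict_proba stays on a shared [0, 1] scale, which is why passing response_method="predict_proba" explicitly can make side-by-side plots easier to read.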
BTW, if we use response_method="predict_proba", maybe the vmin=0 and vmax=1 parameters could be set automatically to make the use of DecisionBoundaryDisplay terser.
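One way this suggestion could work is sketched below. The helper name resolve_color_limits is hypothetical and is not part of scikit-learn; it only shows the idea of filling in the colour limits when the user has not set them.

```python
def resolve_color_limits(response_method, **plot_kwargs):
    """Hypothetical helper: default vmin/vmax to [0, 1] for probabilities."""
    if response_method == "predict_proba":
        # setdefault keeps any value the caller passed explicitly.
        plot_kwargs.setdefault("vmin", 0)
        plot_kwargs.setdefault("vmax", 1)
    return plot_kwargs


print(resolve_color_limits("predict_proba", cmap="Blues"))
print(resolve_color_limits("predict_proba", vmax=0.5))
print(resolve_color_limits("decision_function"))
```

Using setdefault means an explicit vmin or vmax from the caller still wins, so the automatic limits would only remove boilerplate rather than change existing behaviour.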
It does indeed look like a good suggestion.
FYI @ogrisel I'm just going to fix the merge conflicts
…oid raising warning
Thanks @lucyleeow!
Thanks for finishing this one off @ogrisel!
Thanks @ogrisel
@lucyleeow If you are looking for follow-up improvements, please consider those two comments:
FYI this broke the CI when matplotlib is not installed, fix in #30971
Reference Issues/PRs
Towards #27462
What does this implement/fix? Explain your changes.
Allows DecisionBoundaryDisplay to represent all classes for multiclass decision_function and predict_proba by plotting the class with the max response at each point.
Have closely followed the code here: #27291 (comment)
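The idea of plotting the class with the max response at each point can be sketched with plain NumPy. This is an illustration under stated assumptions, not the PR's implementation: the random response array stands in for predict_proba or decision_function output on a grid, and the one-masked-layer-per-class layout is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the estimator response on a 4x5 grid with 3 classes,
# shaped (n_grid_points, n_classes).
n_rows, n_cols, n_classes = 4, 5, 3
response = rng.random((n_rows * n_cols, n_classes))

# Index of the class with the largest response at each grid point.
max_class = response.argmax(axis=1)

# One masked layer per class: keep the response only where that class wins.
layers = []
for class_idx in range(n_classes):
    layer = np.ma.masked_where(max_class != class_idx, response[:, class_idx])
    layers.append(layer.reshape(n_rows, n_cols))

print(max_class.reshape(n_rows, n_cols))
```

Each grid point is unmasked in exactly one layer, so the layers could be drawn on the same axes, one per class, without overlapping.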
Not 100% sure what the colour API should be, open to change. contour and contourf have a colors parameter, which can be used instead of cmap, BUT pcolormesh only has cmap. For simplicity, I've decided to only allow users to pass cmap. This probably makes more sense in this context vs a list of colors to cycle through.
Any other comments?
WIP - need to add tests once we're happy with plot appearance and API
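For the colour API point above, a minimal sketch of why cmap is the common denominator: a single colormap can be passed unchanged to contourf, contour, and pcolormesh. The grid, the class-index surface, and the colour names are assumptions for illustration only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

xx, yy = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
# Illustrative class-index surface with values 0, 1 and 2.
Z = np.digitize(xx + yy, bins=[0.7, 1.3])

# A discrete colormap built from a list of colours, one per class.
cmap = ListedColormap(["tab:blue", "tab:orange", "tab:green"])

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, method in zip(axes, ["contourf", "contour", "pcolormesh"]):
    # The same cmap keyword works for all three plotting methods.
    getattr(ax, method)(xx, yy, Z, cmap=cmap)
    ax.set_title(method)
```

Wrapping a list of colours in ListedColormap is one way users who think in terms of per-class colours could still go through a cmap-only API.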