ENH accept class_of_interest in DecisionBoundaryDisplay to inspect multiclass classifiers #27291

glemaitre · 2023-09-04T19:15:11Z

This PR proposes an improvement to the DecisionBoundaryDisplay to address a TODO from the code.

We expose a class_of_interest parameter allowing us to plot the output of predict_proba or decision_function with binary and multiclass classifiers.

github-actions · 2023-09-04T19:16:32Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c990014.

ogrisel

I have no strong opinion about which strategy is best, the one in this PR or the one in #26995. However, I think we should choose the one that makes implementing what I suggest in the following the most natural:

doc/whats_new/v1.4.rst

examples/classification/plot_classification_probability.py

sklearn/inspection/_plot/decision_boundary.py

examples/classification/plot_classification_probability.py

ogrisel · 2023-09-05T16:32:36Z

BTW, I am ok to implement the max case in a follow-up PR but I would like to make sure that the design decisions made in this PR will not prevent a natural implementation of that case.

sklearn/inspection/_plot/decision_boundary.py

glemaitre · 2023-09-06T16:28:23Z

@ogrisel I quickly implemented your suggestion and it provides the following diff:

diff --git a/sklearn/inspection/_plot/decision_boundary.py b/sklearn/inspection/_plot/decision_boundary.py
index 6ac2816946..3a8b044bfd 100644
--- a/sklearn/inspection/_plot/decision_boundary.py
+++ b/sklearn/inspection/_plot/decision_boundary.py
@@ -41,16 +41,7 @@ def _check_boundary_response_method(estimator, response_method, class_of_interes
         msg = "Multi-label and multi-output multi-class classifiers are not supported"
         raise ValueError(msg)
 
-    if has_classes and len(estimator.classes_) > 2:
-        if response_method not in {"auto", "predict"} and class_of_interest is None:
-            msg = (
-                "Multiclass classifiers are only supported when response_method is"
-                " 'predict' or 'auto', or you must provide `class_of_interest` to "
-                " select a specific class to plot the decision boundary."
-            )
-            raise ValueError(msg)
-        prediction_method = "predict" if response_method == "auto" else response_method
-    elif response_method == "auto":
+    if response_method == "auto":
         prediction_method = ["decision_function", "predict_proba", "predict"]
     else:
         prediction_method = response_method
@@ -78,7 +69,8 @@ class DecisionBoundaryDisplay:
     xx1 : ndarray of shape (grid_resolution, grid_resolution)
         Second output of :func:`meshgrid <numpy.meshgrid>`.
 
-    response : ndarray of shape (grid_resolution, grid_resolution)
+    response : ndarray of shape (grid_resolution, grid_resolution) or \
+            (grid_resolution, grid_resolution, n_classes)
         Values of the response function.
 
     xlabel : str, default=None
@@ -89,7 +81,7 @@ class DecisionBoundaryDisplay:
 
     Attributes
     ----------
-    surface_ : matplotlib `QuadContourSet` or `QuadMesh`
+    surface_ : matplotlib `QuadContourSet` or `QuadMesh` or list of such objects
         If `plot_method` is 'contour' or 'contourf', `surface_` is a
         :class:`QuadContourSet <matplotlib.contour.QuadContourSet>`. If
         `plot_method` is 'pcolormesh', `surface_` is a
@@ -170,6 +162,7 @@ class DecisionBoundaryDisplay:
             Object that stores computed values.
         """
         check_matplotlib_support("DecisionBoundaryDisplay.plot")
+        import matplotlib as mpl
         import matplotlib.pyplot as plt  # noqa
 
         if plot_method not in ("contourf", "contour", "pcolormesh"):
@@ -181,7 +174,26 @@ class DecisionBoundaryDisplay:
             _, ax = plt.subplots()
 
         plot_func = getattr(ax, plot_method)
-        self.surface_ = plot_func(self.xx0, self.xx1, self.response, **kwargs)
+
+        if self.response.ndim == 2:
+            self.surface_ = plot_func(self.xx0, self.xx1, self.response, **kwargs)
+        else:  # self.response.ndim == 3
+            # create the colormap for each class
+            viridis = mpl.colormaps["viridis"].resampled(self.response.shape[-1])
+
+            self.surface_ = []
+            for class_idx, primary_color in enumerate(viridis.colors):
+                r, g, b, _ = primary_color
+                cmap = mpl.colors.LinearSegmentedColormap.from_list(
+                    f"colormap_{class_idx}", [(1.0, 1.0, 1.0, 1.0), (r, g, b, 1.0)]
+                )
+                response = np.ma.array(
+                    self.response[:, :, class_idx],
+                    mask=~(self.response.argmax(axis=2) == class_idx),
+                )
+                self.surface_.append(
+                    plot_func(self.xx0, self.xx1, response, cmap=cmap, **kwargs)
+                )
 
         if xlabel is not None or not ax.get_xlabel():
             xlabel = self.xlabel if xlabel is None else xlabel
@@ -379,11 +391,16 @@ class DecisionBoundaryDisplay:
             if is_regressor(estimator):
                 raise ValueError("Multi-output regressors are not supported")
 
-            # For the multiclass case, `_get_response_values` returns the response
-            # as-is. Thus, we have a column per class and we need to select the column
-            # corresponding to the positive class.
-            col_idx = np.flatnonzero(estimator.classes_ == class_of_interest)[0]
-            response = response[:, col_idx]
+            if class_of_interest is not None:
+                # For the multiclass case, `_get_response_values` returns the response
+                # as-is. Thus, we have a column per class and we need to select the column
+                # corresponding to the positive class.
+                col_idx = np.flatnonzero(estimator.classes_ == class_of_interest)[0]
+                response = response[:, col_idx].reshape(*xx0.shape)
+            else:
+                response = response.reshape(*xx0.shape, response.shape[-1])
+        else:
+            response = response.reshape(*xx0.shape)
 
         if xlabel is None:
             xlabel = X.columns[0] if hasattr(X, "columns") else ""
@@ -394,7 +411,7 @@ class DecisionBoundaryDisplay:
         display = DecisionBoundaryDisplay(
             xx0=xx0,
             xx1=xx1,
-            response=response.reshape(xx0.shape),
+            response=response,
             xlabel=xlabel,
             ylabel=ylabel,
         )

I feel that the changes are quite reasonable, and the good news is that it works for "contour", "contourf", and "pcolormesh".

While the changes are minor, having them in an additional PR would be better. We need to think about exposing parameters for the colormap, and that might not be straightforward. However, I am pretty happy with the default that I implemented (resampling viridis, which is the default colormap used in a scatter plot).
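
The reshaping and masking steps in the diff above can be tried in isolation with NumPy alone. This is a sketch under stated assumptions: `classes` stands in for `estimator.classes_`, `proba` is a made-up probability matrix rather than real `predict_proba` output, and the grid is tiny.

```python
import numpy as np

# Hypothetical stand-ins for the objects used inside `from_estimator`.
classes = np.array(["a", "b", "c"])  # plays the role of estimator.classes_
xx0, xx1 = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 3))

# Fake per-class probabilities, one row per grid point, rows summing to 1.
rng = np.random.default_rng(0)
scores = rng.random((xx0.size, classes.size))
proba = scores / scores.sum(axis=1, keepdims=True)

# Case 1: a class of interest is given. Select its column, then reshape
# the flat vector back onto the grid.
col_idx = np.flatnonzero(classes == "b")[0]
single = proba[:, col_idx].reshape(*xx0.shape)

# Case 2: no class of interest. Keep every column and add a trailing
# class axis to the grid.
full = proba.reshape(*xx0.shape, proba.shape[-1])

# One masked layer per class: a grid cell stays visible only in the layer
# of the class with the highest score there.
winner = full.argmax(axis=2)
layers = [
    np.ma.array(full[:, :, k], mask=~(winner == k))
    for k in range(classes.size)
]
```

Because every grid cell has exactly one arg-max class, the unmasked cells of the layers partition the grid, which is what lets each class be drawn with its own colormap without overlap.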

ogrisel · 2023-09-06T16:32:04Z

This looks great, ok for a follow-up PR then.

sklearn/inspection/_plot/decision_boundary.py

sklearn/inspection/_plot/tests/test_boundary_decision_display.py

ogrisel

Another pass of feedback. Besides that, LGTM.

sklearn/inspection/_plot/decision_boundary.py

sklearn/inspection/_plot/tests/test_boundary_decision_display.py

sklearn/utils/tests/test_response.py

ogrisel

LGTM once the following points are resolved.

sklearn/inspection/_plot/tests/test_boundary_decision_display.py

glemaitre · 2023-10-02T09:35:27Z

@adrinjalali Would you mind having a look at this PR? It is similar to a previous PR that we opened in August after a PyLadies Berlin event. As previously stated, it does not yet handle one of the cases, which I will implement later.

adrinjalali · 2023-10-05T20:04:04Z

sklearn/inspection/_plot/decision_boundary.py

@@ -248,6 +250,14 @@ def from_estimator(
            For multiclass problems, :term:`predict` is selected when
            `response_method="auto"`.

+        class_of_interest : int, float, bool or str, default=None


I'm not sure what float and bool mean here. Do we accept floats for classification targets? Is bool accepted when the target is boolean?

do we accept floats for classification targets?

(Unfortunately) yes, we do. The types reported here are the same as the ones accepted as pos_label in thresholded metrics.
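
To illustrate why several label types appear in the signature, here is a minimal sketch. It assumes that the class labels behave like the sorted output of `np.unique` on the training targets; `column_of` is a hypothetical helper written for this example, not a scikit-learn function.

```python
import numpy as np


def column_of(classes, class_of_interest):
    """Return the column index of `class_of_interest` in `classes`.

    Hypothetical helper mirroring the `np.flatnonzero` lookup in the diff.
    """
    matches = np.flatnonzero(classes == class_of_interest)
    if matches.size == 0:
        raise ValueError(f"{class_of_interest!r} is not among {classes!r}")
    return int(matches[0])


# Targets of different dtypes; np.unique sorts and deduplicates them.
str_classes = np.unique(["spam", "ham", "spam"])
float_classes = np.unique([0.5, 1.5, 0.5, 2.5])
bool_classes = np.unique([True, False, True])
int_classes = np.unique([3, 1, 2])

str_col = column_of(str_classes, "spam")
float_col = column_of(float_classes, 1.5)
bool_col = column_of(bool_classes, True)
int_col = column_of(int_classes, 3)
```

The same equality-based lookup works for every label dtype, so the accepted types for `class_of_interest` simply follow the types that can occur as class labels.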

adrinjalali · 2023-10-05T20:08:13Z

sklearn/inspection/_plot/decision_boundary.py

+        except AttributeError as exc:
+            # re-raise the AttributeError as a ValueError for backward compatibility
+            raise ValueError(str(exc)) from exc


This seems bad to me. An AttributeError should remain an AttributeError, not be converted to a ValueError. I would just change this. These displays are used in interactive mode, so the type of exception could be changed here IMO.

Both ValueError and AttributeError are valid. AttributeError is valid since the estimator does not expose the attribute but the ValueError is valid because this is a parameter value passed by the user.

Since we are already raising a ValueError that is not semantically wrong, I agree with @ogrisel that it would be a pity to break backward compatibility.

I dislike the kind of code which is only there because somebody did something in the past and it's there for historical reasons. It makes code less maintainable over time. And I don't consider this a major backward compatibility issue, since it's not like it was working before and doesn't now. We were failing, and we still fail, just with a different error type. If you really want, we can add a message here that the type of the error will be changed to an AttributeError in 1.6, but I don't think we should keep such code in the code base in the long term.

So let's take the second option of the @ogrisel comment: #27291 (comment)

Let's acknowledge the change in the changelog only.

I buy the fact that the displays are usually used in a more interactive mode, and I don't foresee a use case where someone would catch the error to do something else.
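
The two options debated in this thread can be compared side by side in plain Python. This is only a sketch: `NoProbaEstimator`, `resolve_strict`, and `resolve_compat` are hypothetical names introduced for illustration and are not the scikit-learn internals.

```python
class NoProbaEstimator:
    """Hypothetical estimator that does not implement predict_proba."""

    def predict(self, X):
        return [0 for _ in X]


def resolve_strict(estimator, method_name):
    # Option kept in the end: let the AttributeError propagate unchanged.
    return getattr(estimator, method_name)


def resolve_compat(estimator, method_name):
    # Option that was dropped: re-raise as ValueError so that callers
    # catching the old exception type keep working.
    try:
        return getattr(estimator, method_name)
    except AttributeError as exc:
        raise ValueError(str(exc)) from exc
```

In the compatibility variant the original AttributeError is still reachable through `__cause__`, but the wrapper exists only for historical reasons, which is the maintenance cost raised in the review.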

ArturoAmorQ

Some comments about the docstrings. Otherwise LGTM :)

sklearn/inspection/_plot/decision_boundary.py

ArturoAmorQ · 2023-10-09T14:19:37Z

sklearn/inspection/_plot/decision_boundary.py

                "Multiclass classifiers are only supported when response_method is"
-                " 'predict' or 'auto'"
+                " 'predict' or 'auto', or you must provide `class_of_interest` to "
+                " select a specific class to plot the decision boundary."


"Multiclass classifiers are only supported when `response_method`"
" is 'predict' or 'auto'. Else you must provide `class_of_interest`"
" to plot the decision boundary of a specific class."

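The check that emits this message can be sketched as a standalone function. This is a simplified stand-in that takes the number of classes directly; `check_multiclass_request` is a hypothetical name, and the real helper in the PR receives the estimator itself.

```python
def check_multiclass_request(n_classes, response_method, class_of_interest=None):
    """Return the prediction method(s) to try, or raise for an ambiguous request.

    Hypothetical, simplified version of the validation discussed above.
    """
    if n_classes > 2:
        if response_method not in {"auto", "predict"} and class_of_interest is None:
            raise ValueError(
                "Multiclass classifiers are only supported when `response_method`"
                " is 'predict' or 'auto'. Else you must provide `class_of_interest`"
                " to plot the decision boundary of a specific class."
            )
        return "predict" if response_method == "auto" else response_method
    if response_method == "auto":
        return ["decision_function", "predict_proba", "predict"]
    return response_method
```

Keeping the separating space at the start of each continued fragment avoids the doubled space that appears when one fragment ends with a space and the next one begins with one.
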
sklearn/inspection/_plot/decision_boundary.py

examples/classification/plot_classification_probability.py

sklearn/inspection/_plot/decision_boundary.py

ENH accept in

66eb147

github-actions bot added module:inspection module:utils labels Sep 4, 2023

change pr number

fad3a1c

glemaitre added 3 commits September 4, 2023 21:17

less diff

82a4150

modify example where the feature is useful

20461de

TST add dedicated test

a08fbf7

ogrisel reviewed Sep 5, 2023

rename pos_label to class_of_interest

c825ac4

glemaitre commented Sep 6, 2023

glemaitre added 2 commits September 6, 2023 14:04

better error message

6cb3a55

revert to original size

687b098

ogrisel reviewed Sep 6, 2023

TST add separate test for binary and multiclass

bcf6b52

glemaitre mentioned this pull request Sep 7, 2023

ENH handle mutliclass with scores and probailities in DecisionBoundaryDisplay #26995

Closed

ogrisel changed the title ~~ENH accept pos_label in DecisionBoundaryDisplay~~ ENH accept class_of_interest in DecisionBoundaryDisplay to inspect multiclass classifiers Sep 7, 2023

ogrisel reviewed Sep 8, 2023

glemaitre and others added 6 commits September 8, 2023 17:35

Apply suggestions from code review

010a292

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

iter

38e7628

iter

6377573

iter

2ce357f

TST properly tests _check_boundary_decision_response_method

0746325

Merge remote-tracking branch 'origin/main' into decision_boundary_pos…

6b43da5

…_label

ogrisel approved these changes Sep 22, 2023

glemaitre added 2 commits September 25, 2023 11:49

address ogrisel comment

52d877f

Merge remote-tracking branch 'origin/main' into decision_boundary_pos…

d3a9bc4

…_label

glemaitre mentioned this pull request Sep 25, 2023

Adding more functionality to DecisionBoundaryDisplay #27462

Open

5 tasks

adrinjalali reviewed Oct 5, 2023

ArturoAmorQ approved these changes Oct 9, 2023

glemaitre and others added 4 commits October 10, 2023 10:20

Apply suggestions from code review

d986e6a

Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>

add api change

7084b23

Merge remote-tracking branch 'origin/main' into decision_boundary_pos…

cac74dd

…_label

change error message

c990014

adrinjalali approved these changes Oct 11, 2023

adrinjalali merged commit 3597f0b into scikit-learn:main Oct 11, 2023

glemaitre mentioned this pull request Oct 11, 2023

FIX handle outlier detector in _get_response_values #27565

Merged

lucyleeow mentioned this pull request Sep 6, 2024

ENH Allows plotting max class for multiclass in DecisionBoundaryDisplay #29797

Merged

adrinjalali Oct 5, 2023

glemaitre Oct 6, 2023

adrinjalali Oct 5, 2023

glemaitre Oct 6, 2023

adrinjalali Oct 9, 2023

glemaitre Oct 10, 2023

glemaitre Oct 10, 2023
