[MRG] ENH Adds plot_confusion matrix #15083

thomasjpfan · 2019-09-24T16:08:29Z

Reference Issues/PRs

Related to #7116

What does this implement/fix? Explain your changes.

Adds plotting function for the confusion matrix.

amueller · 2019-09-24T18:40:58Z

lint ;)

amueller · 2019-09-25T16:10:29Z

sklearn/metrics/_plot/confusion_matrix.py

+            fmt = '.2f' if self.normalize else 'd'
+            thresh = cm.max() / 2.
+            for i, j in product(range(cm.shape[0]), range(cm.shape[1])):
+                color = "white" if cm[i, j] < thresh else "black"


I think that's weird as it doesn't depend on the colormap.
Here's how I usually do it:
https://github.com/amueller/mglearn/blob/master/mglearn/tools.py#L76

without depending on the colormap there's no way this works, right? because someone could use greys and greys_r and they clearly need the opposite colors.

I think it should be pcolormesh not pcolor, though.

also: shouldn't this go in a separate helper function? It's probably not the only time we want to show a heatmap (grid search will need this as well). The main question then is if that will be public or not :-/

For reference: https://matplotlib.org/3.1.1/gallery/images_contours_and_fields/image_annotated_heatmap.html#sphx-glr-gallery-images-contours-and-fields-image-annotated-heatmap-py

Updated PR, with an alternative: it uses the colormap to get the colors for the text.

Looks reasonable.
Can you maybe add a test? Like calling ConfusionMatrixDisplay with np.eye(2) and plt.cm.greys and check that the text colors are black white black white and with plt.cm.greys_r and check that the text colors are white black white black?
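
For illustration, a minimal sketch of colormap-dependent text colors, assuming the idea is to label each cell with the color at the opposite end of the colormap from its background. The `greys` and `greys_r` functions are hypothetical stand-ins for matplotlib's colormaps so that the sketch has no dependencies; this is not the code in the PR.

```python
WHITE = (1.0, 1.0, 1.0, 1.0)
BLACK = (0.0, 0.0, 0.0, 1.0)


def greys(t):
    """Hypothetical light-to-dark colormap: 0 maps to white, 1 to black."""
    level = 1.0 - t
    return (level, level, level, 1.0)


def greys_r(t):
    """Hypothetical reversed colormap: 0 maps to black, 1 to white."""
    return greys(1.0 - t)


def text_colors(cm, cmap):
    """Pick a text color per cell from the far end of the colormap."""
    lowest = min(min(row) for row in cm)
    highest = max(max(row) for row in cm)
    thresh = (lowest + highest) / 2.0
    cmap_min, cmap_max = cmap(0.0), cmap(1.0)
    # Cells below the midpoint are drawn near cmap_min, so their text uses
    # cmap_max, and the other way around for cells above the midpoint.
    return [[cmap_max if value < thresh else cmap_min for value in row]
            for row in cm]


eye = [[1, 0], [0, 1]]
colors_greys = text_colors(eye, greys)
colors_greys_r = text_colors(eye, greys_r)
```

With a 2x2 identity matrix, the two colormaps yield opposite text colors for every cell, which is the property the suggested test would check.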

amueller · 2019-09-25T16:14:29Z

examples/model_selection/plot_confusion_matrix.py

+titles_options = [("Confusion matrix, without normalization", False),
+                  ("Normalized confusion matrix", True)]
+for title, normalize in titles_options:
+    fig, ax = plt.subplots()


Why? There's no reason to pass ax, right?
For setting the title you could just do plt.gca().set_title(title).
Or do you not like using the state like that?

Updated with not having to define ax and passing it in and using the axes stored in the Display object.

ah, even better.

amueller · 2019-09-25T16:59:22Z

how about adding it in all the examples that use confusion_matrix:
https://scikit-learn.org/dev/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

jnothman · 2019-10-10T20:42:26Z

sklearn/metrics/_plot/confusion_matrix.py

+        select a subset of labels. If `None` is given, those that appear at
+        least once in `y_true` or `y_pred` are used in sorted order.
+
+    target_names : array-like of shape (n_classes,), default=None


Don't call this target names. That implies multiple targets. Rather, display_labels will be sufficient?

Hmmm, do you think classes or class_names would be better?

jnothman · 2019-10-10T20:43:14Z

sklearn/metrics/_plot/confusion_matrix.py

+        Includes values in confusion matrix.
+
+    normalize : bool, default=False
+        Normalizes confusion matrix.


The user might want to normalise over either axis, or altogether.

Four options? I guess we can do 'row', 'column', 'all', None?

I'm okay to not provide this flexibility, too. Another way to specify it is "all", "recall", "precision", None.

Would it make sense to use "truth" and "predicted" instead of "recall" and "precision"?

Updated PR to use 'truth' and 'predicted'. Almost feels like this should be in confusion_matrix itself.

thomasjpfan · 2019-10-23T18:59:23Z

Updated with using display_labels and using the dtype of confusion matrix to infer if the matrix is normalized.

thomasjpfan · 2019-10-25T17:35:13Z

CC @NicolasHug

NicolasHug

Needs a what's new
The example doesn't render properly https://79914-843222-gh.circle-artifacts.com/0/doc/auto_examples/model_selection/plot_confusion_matrix.html. Also I think the original color map was nicer but no strong opinion
The list at the end of https://79914-843222-gh.circle-artifacts.com/0/doc/visualizations.html#visualizations should be updated
The User Guide section about the confusion matrix should be updated too

NicolasHug · 2019-10-28T20:33:05Z

sklearn/metrics/_plot/confusion_matrix.py

+                          cmap='viridis', ax=None):
+    """Plot Confusion Matrix.
+
+    Read more in the :ref:`User Guide <visualizations>`.


This should probably link to https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix instead.

glemaitre · 2019-10-30T15:59:19Z

sklearn/metrics/_plot/confusion_matrix.py

+            Rotation of xtick labels.
+
+        values_format : str, default=None
+            Format specification for values in confusion matrix. If None,


Suggested change

Format specification for values in confusion matrix. If None,

Format specification for values in confusion matrix. If `None`,

Just this nitpick

thomasjpfan · 2019-11-06T19:53:38Z

Actually our estimators (to be checked in fact) would work if you pass target as an array or a list of strings.

@glemaitre yes, our classifiers work with a list of strings, but our simple example using load_iris returns the integer encoding and not the strings. A user using load_iris will need to pass in display_labels=iris.target_names to get the expected labeling.
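
To illustrate the labeling issue, here is a dependency-free sketch of the fallback being described. `resolve_tick_labels` is a hypothetical helper, not part of scikit-learn; the label names are the three iris species.

```python
def resolve_tick_labels(classes, display_labels=None):
    """Return tick labels for a confusion matrix plot (hypothetical helper)."""
    if display_labels is None:
        # Without explicit labels, fall back to the encoded class values.
        return [str(c) for c in classes]
    if len(display_labels) != len(classes):
        raise ValueError("display_labels must have one entry per class")
    return [str(label) for label in display_labels]


iris_classes = [0, 1, 2]  # integer encoding of the iris target
iris_target_names = ["setosa", "versicolor", "virginica"]

default_labels = resolve_tick_labels(iris_classes)
named_labels = resolve_tick_labels(iris_classes, iris_target_names)
```

With the integer encoding alone the ticks would read 0, 1, 2, which is why the species names have to be passed in explicitly.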

thomasjpfan · 2019-11-06T19:54:14Z

Updated PR by adding normalize={'all', 'truth', 'predicted'} and None support.

amueller · 2019-11-06T20:07:58Z

Conceptually normalize should be in confusion_matrix, but maybe it's fine to keep it in the plotting for this PR, to move forward faster?

amueller · 2019-11-06T20:08:56Z

lmk if you need reviews.

jnothman · 2019-11-06T21:09:10Z

Let's call them "true", "pred" for consistency?

glemaitre

With the latest changes, LGTM

glemaitre · 2019-11-07T14:30:24Z

sklearn/metrics/_plot/confusion_matrix.py

+            Rotation of xtick labels.
+
+        values_format : str, default=None
+            Format specification for values in confusion matrix. If None,


Just this nitpick

glemaitre · 2019-11-07T14:35:04Z

@glemaitre yes, our classifiers work with a list of strings, but our simple example using load_iris returns the integer encoding and not the strings. A user using load_iris will need to pass in display_labels=iris.target_names to get the expected labeling.

So it seems that we need them in case we want to override the labels. We can keep it as it is until the default no longer requires specifying them.

NicolasHug

Thanks @thomasjpfan , mostly looks good.

I'm slightly concerned about testing time and coupling though

NicolasHug · 2019-11-07T16:48:24Z

sklearn/metrics/_plot/confusion_matrix.py

+        Includes values in confusion matrix.
+
+    normalize : {'true', 'pred', 'all'}, default=None
+        Normalizes confusion matrix over the true, predicted conditions or


Just a suggestion

Suggested change

Normalizes confusion matrix over the true, predicted conditions or

Normalizes confusion matrix over the true (rows), predicted conditions (columns) or
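
As a sketch of what the three options mean, assuming rows hold the true classes and columns the predicted ones as stated in the suggestion above. Plain lists are used here to keep the example dependency-free (the PR itself operates on NumPy arrays), `normalize_confusion_matrix` is a hypothetical name, and every row and column is assumed to have a nonzero sum.

```python
def normalize_confusion_matrix(cm, normalize=None):
    """Normalize a confusion matrix given as a list of rows."""
    if normalize is None:
        return [list(row) for row in cm]
    if normalize == "true":
        # Each row (true class) sums to 1.
        return [[value / sum(row) for value in row] for row in cm]
    if normalize == "pred":
        # Each column (predicted class) sums to 1.
        col_sums = [sum(col) for col in zip(*cm)]
        return [[value / s for value, s in zip(row, col_sums)] for row in cm]
    if normalize == "all":
        # The whole matrix sums to 1.
        total = sum(sum(row) for row in cm)
        return [[value / total for value in row] for row in cm]
    raise ValueError("normalize must be one of 'true', 'pred', 'all', None")


cm = [[3, 1], [2, 4]]
by_true = normalize_confusion_matrix(cm, "true")
by_pred = normalize_confusion_matrix(cm, "pred")
by_all = normalize_confusion_matrix(cm, "all")
```

For this matrix, the first row becomes 0.75 and 0.25 under 'true', the first column becomes 0.6 and 0.4 under 'pred', and all four cells sum to 1 under 'all'.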

NicolasHug · 2019-11-07T16:52:12Z

sklearn/metrics/_plot/confusion_matrix.py

+                          labels=labels)
+
+    if normalize == 'true':
+        cm = cm.astype('float') / cm.sum(axis=1, keepdims=True)


I think we should not convert to float (see other msg about high coupling)

NicolasHug · 2019-11-07T16:53:49Z

sklearn/metrics/_plot/confusion_matrix.py

+
+        cm = self.confusion_matrix
+        n_classes = cm.shape[0]
+        normalized = np.issubdtype(cm.dtype, np.float_)


This logic involves a strong coupling between

confusion_matrix -> plot_confusion_matrix -> ConfusionMatrixDisplay

and might cause silent bugs in the future.

I would rather pass a is_normalized parameter (or remove, see below)
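
One way such a silent bug could arise, offered as an illustration rather than something raised in the thread: a matrix of weighted counts can hold floats without being normalized. The sketch below is a pure-Python analogue of the dtype check, with hypothetical function names and made-up weights.

```python
def inferred_is_normalized(cm):
    """Fragile heuristic: treat any float-valued matrix as normalized."""
    return any(isinstance(value, float) for row in cm for value in row)


def rows_sum_to_one(cm, tol=1e-9):
    """Direct check of what normalize='true' would actually guarantee."""
    return all(abs(sum(row) - 1.0) < tol for row in cm)


# Counts accumulated with fractional sample weights (assumed scenario).
weighted_counts = [[1.5, 0.5], [0.0, 2.0]]

heuristic_says_normalized = inferred_is_normalized(weighted_counts)
actually_normalized = rows_sum_to_one(weighted_counts)
```

Here the heuristic reports a normalized matrix while the rows sum to 2, so the caller and the display disagree without any error being raised. An explicit flag, or a format that does not depend on the distinction, avoids the guess.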

NicolasHug · 2019-11-07T17:02:27Z

sklearn/metrics/_plot/confusion_matrix.py

+        if include_values:
+            self.text_ = np.empty_like(cm, dtype=object)
+            if values_format is None:
+                values_format = '.2f' if normalized else 'd'


I think that the .2g option is what we need, and you wouldn't have to use the normalized variable anymore:

In [15]: "{:.2g} -- {:.2g} -- {:.2g}".format(2, 2.0000, 2.23425)
Out[15]: '2 -- 2 -- 2.2'
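
A slightly larger sketch of the same format spec, using only the standard library. The sample values are made up; the last one shows a caveat worth keeping in mind, since `.2g` switches to scientific notation once a value needs more than two significant digits.

```python
values = [2, 2.0, 2.23425, 0.3333, 12, 150]

# '.2g' keeps at most two significant digits and drops trailing zeros,
# so integer counts and normalized fractions share one format string.
formatted = [format(value, ".2g") for value in values]

for value, text in zip(values, formatted):
    print(repr(value), "->", text)
```

Counts of 100 or more therefore render as, for example, `1.5e+02`, so a caller with large counts may still want to pass an explicit values_format.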

NicolasHug · 2019-11-07T17:17:52Z

sklearn/metrics/_plot/tests/test_plot_confusion_matrix.py

+@pytest.mark.parametrize("normalize", ['true', 'pred', 'all', None])
+@pytest.mark.parametrize("with_sample_weight", [True, False])
+@pytest.mark.parametrize("with_labels", [True, False])
+@pytest.mark.parametrize("cmap", ['viridis', 'plasma'])
+@pytest.mark.parametrize("with_custom_axes", [True, False])
+@pytest.mark.parametrize("with_display_labels", [True, False])
+@pytest.mark.parametrize("include_values", [True, False])


Do we really need each of these combinations to be tested independently?

It seems to me that most of the checks in this test could be independent test functions. Parametrization is nice but seems way overkill here.

This will test 256 instances, and it takes about 10s on my machine, which is not negligible considering that small increments in testing time really add up over time.
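
The figure of 256 follows from stacking the decorators, since pytest generates one test per element of the Cartesian product. A small standard-library sketch of the count, using the parameter values from the diff above; the "independent" total assumes, hypothetically, one test function per parameter.

```python
from itertools import product

params = {
    "normalize": ["true", "pred", "all", None],
    "with_sample_weight": [True, False],
    "with_labels": [True, False],
    "cmap": ["viridis", "plasma"],
    "with_custom_axes": [True, False],
    "with_display_labels": [True, False],
    "include_values": [True, False],
}

# Stacked parametrize decorators: every combination becomes a test case.
n_combined = len(list(product(*params.values())))

# One test function per parameter: the case counts add instead of multiply.
n_independent = sum(len(options) for options in params.values())

print(n_combined, n_independent)
```

The product gives 4 * 2**6 = 256 cases, whereas testing each parameter on its own would give 4 + 6 * 2 = 16.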

thomasjpfan · 2019-11-07T21:46:18Z

To be consistent with plot_roc_curve, how do you feel about names or display_names instead of display_labels?

thomasjpfan · 2019-11-07T22:30:53Z

Ah, display_labels is okay, since this is a different context. Updated PR to reduce the number of tests and to address comments.

NicolasHug

last nits

NicolasHug · 2019-11-08T13:44:37Z

sklearn/metrics/_plot/tests/test_plot_confusion_matrix.py

+    assert disp.ax_ == ax
+
+    if normalize == 'true':
+        cm = cm.astype('float') / cm.sum(axis=1, keepdims=True)


You don't need the conversion anymore, right?

NicolasHug · 2019-11-08T13:46:11Z

sklearn/metrics/_plot/tests/test_plot_confusion_matrix.py

+@pytest.mark.parametrize("normalize", ['true', 'pred', 'all', None])
+@pytest.mark.parametrize("with_labels", [True, False])
+@pytest.mark.parametrize("with_display_labels", [True, False])
+@pytest.mark.parametrize("include_values", [True, False])


The main reason I'm not a fan of this is that such parametrization suggests that all 4 of these parameters are intertwined and depend on one another, but in reality this isn't the case.

I think we could still remove some parametrizations, but that's fine

NicolasHug · 2019-11-08T13:49:41Z

sklearn/metrics/_plot/confusion_matrix.py

+    create a :class:`ConfusionMatrixDisplay`. All parameters are stored as
+    attributes.
+
+    Read more in the :ref:`User Guide <confusion_matrix>`.


Shouldn't this link to the visualization UG?

qinhanmin2014 · 2019-11-13T08:23:26Z

sklearn/metrics/_plot/confusion_matrix.py

+    include_values : bool, default=True
+        Includes values in confusion matrix.
+
+    normalize : {'true', 'pred', 'all'}, default=None


If we decide to support normalize here, perhaps we should also support it in confusion_matrix (See #14478).
And I can't understand why we need normalize="all".

Good remark. normalize='all' will normalize by the total support.

However, I would suggest adding it in another PR.

glemaitre · 2019-11-14T10:57:29Z

I made a push to solve the conflicts

glemaitre · 2019-11-14T11:13:01Z

I also added a test for pipelines, similar to the one for the other plotting functions.
@qinhanmin2014 feel free to merge when it is green

glemaitre · 2019-11-14T13:44:46Z

OK merging this one. I will open a new PR to address the problem raised by @qinhanmin2014 in #15083 (comment)

thomasjpfan added 2 commits September 24, 2019 12:02

ENH Adds plot_confusion matrix

939df89

DOC Adds attributes

f106d48

thomasjpfan added 2 commits September 25, 2019 10:41

CLN Removes unneeded tests

eb36a09

Merge remote-tracking branch 'upstream/master' into plot_confusion_ma…

10d70e3

…trix_v2

amueller reviewed Sep 25, 2019

thomasjpfan added 2 commits September 25, 2019 12:44

ENH Colormap dependent text labels

7f5f029

DOC Use better name

4b29bb4

thomasjpfan added 6 commits October 10, 2019 10:17

ENH Adds format_values

835b889

ENH Adds text formating

e1ed771

Merge remote-tracking branch 'upstream/master' into plot_confusion_ma…

ed6d2fa

…trix_v2

API Changes normalize default

3671311

DOC Adds confusion matrix to another example

6a21e80

TST Adds constrast test

48c1281

jnothman reviewed Oct 10, 2019

CLN Uses display_labels

04df25d

thomasjpfan added this to the 0.22 milestone Oct 25, 2019

DOC Fix function call

511d60c

NicolasHug requested changes Oct 28, 2019

thomasjpfan added 5 commits October 28, 2019 16:39

Merge remote-tracking branch 'upstream/master' into plot_confusion_ma…

49e5afc

…trix_v2

WIP

7db73bb

WIP

7d3a802

BUG Address some comments

4eea3b1

DOC Fixes confusion matrix style

24da8db

glemaitre reviewed Oct 30, 2019

DOC Fix

449c0a1

CLN Updates options for normalization

8265856

glemaitre approved these changes Nov 7, 2019

NicolasHug reviewed Nov 7, 2019

thomasjpfan added 2 commits November 7, 2019 17:06

CLN Reduce number of tests

3fcecf6

STY Flake8

a89f662

NicolasHug approved these changes Nov 8, 2019

NicolasHug mentioned this pull request Nov 8, 2019

[MRG] DOC mention other plotting utilities in highlights #15569

Merged

CLN Address comments

c06843d

qinhanmin2014 reviewed Nov 13, 2019

glemaitre self-assigned this Nov 14, 2019

Merge remote-tracking branch 'origin/master' into pr/thomasjpfan/15083

9fcdecc

TST check fitted error with pipeline

c13b84b

glemaitre merged commit e650a20 into scikit-learn:master Nov 14, 2019

adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Nov 18, 2019

ENH Adds plot_confusion matrix (scikit-learn#15083)

3210ee3

adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Nov 18, 2019

ENH Adds plot_confusion matrix (scikit-learn#15083)

9407503

adrinjalali mentioned this pull request Nov 18, 2019

missing whats_new entries #15653

Closed

adrinjalali pushed a commit that referenced this pull request Nov 19, 2019

ENH Adds plot_confusion matrix (#15083)

634db6d

panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020

ENH Adds plot_confusion matrix (scikit-learn#15083)

714e445
