FIX: Improve c (color) kwarg checking in scatter and the related exceptions #11383

afvincent · 2018-06-05T22:45:05Z

PR Summary

Not-so-quick-anymore fix for #11373. One now tries to separate exceptions depending on if they are due to:

a size mismatch of the c argument => “simple” error message with the size of the relevant parameters;
c argument being (at least partially) unsuitable for value-mapping or as mpl color specs => more verbose error message.

Besides, I tried to make it a bit clearer in the docstring what actually happens with a 1D-sequence that can be interpreted as an RGB(A) color spec. This PR also now emits a warning when it looks like the user might have fallen into the trap.

For examples of the current behavior, please have a look at this comment below. Inputs are welcome on the direction this PR should take.

PR Checklist

Code is PEP 8 compliant: CI seems happy
Has pytest test

Original post (before 2018-06-06) [for archiving purposes]

Quick fix for #11373.

As suggested by @mwaskom, rely on np.shape(c) rather than c.shape, to be able to handle non-ndarray color sequences when printing an error message in axes.scatter.

With @ImportanceOfBeingErnest's snippet from #11373, i.e.

import matplotlib.pyplot as plt
x = [1,2,3,4,5]
c = [1,2,3,4,5,6]  # one color more than values
plt.scatter(x, x, c=c)
plt.show()

Before:

AttributeError: 'list' object has no attribute 'shape'

After:

ValueError: c of shape (6,) not acceptable as a color sequence for x with size 5, y with size 5

ImportanceOfBeingErnest · 2018-06-05T23:18:28Z

See this comment.

If you specify some wrong color name like c = "jaune" you would get

ValueError: c of shape () not acceptable as a color sequence for x with size 5, y with size 5

which is also not very helpful.

afvincent · 2018-06-06T00:34:42Z

Huh, is it expected that

import matplotlib.pyplot as plt
x = [1,2,3,4,5]
c = "rgbrgb"  # the last 'b' is superfluous
plt.scatter(x, x, c=c)
plt.show()

does not raise an exception but instead truncates the colors actually used (e.g. to "rgbrg" here)?

ImportanceOfBeingErnest · 2018-06-06T00:54:40Z

Given that arrays of numbers are not truncated this should probably be handled consistently for (lists of) strings as well.

afvincent · 2018-06-06T01:10:07Z

Well AFAICT, looking at the source code, there is actually no shape checking in the case of non-mappable sequences of colors. One may want to extend a bit the scope of this PR ^^...

timhoffm

Using np.shape(c) seemed like a quick fix. However, at a closer look there are more fundamental poblems with parameter/size checking. This needs a closer look and probably a broder fix.

afvincent · 2018-06-06T01:28:14Z

@timhoffm Agree on that.

afvincent · 2018-06-06T05:43:42Z

Shouldn't the lines

    c_rgb = (0.5, 0.5, 0.5)
    ax.scatter(x, y, c=c_rgb)

in test_axes.test_color_length_mismatch be causing a failure, accordingly to the ax.scatter docstring

        c : color, sequence, or sequence of color, optional, default: 'b'
            The marker color. Possible values:

            - A single color format string.
            - A sequence of color specifications of length n.
            - A sequence of n numbers to be mapped to colors using *cmap* and
              *norm*.
            - A 2-D array in which the rows are RGB or RGBA.

            Note that *c* should not be a single numeric RGB or RGBA sequence
            because that is indistinguishable from an array of values to be
            colormapped. If you want to specify the same RGB or RGBA value for
            all points, use a 2-D array with a single row.

?

ImportanceOfBeingErnest · 2018-06-06T09:58:39Z

I think this would only (need to) apply when the data has the same length as the color tuple. In such case the tuple would be interpreted as values to be colormapped. The note in the documentation is not wrong because this is generally the case (i.e. when writing some function that produces a scatter from arbitrary values), but on the other hand such rgb tuple should probably stay allowed for convenience reasons.

afvincent · 2018-06-07T02:26:42Z

First (and last?) overhaul of this PR, to try addressing the additional issues that were raised by @ImportanceOfBeingErnest and @timhoffm .

When running the snippet

import matplotlib.pyplot as plt

xx = yy = [0, 1, 2, 3]
print("Testing a scatter plot with {} data points.\n".format(len(xx)))

for c in (# Single letter-sequences
          "rgby",
          "rgb",
          "rgbrgb",
          ["rgby"],
          # Special cases
          "red",
          "none",
          None,
          ["r", "g", "b", "none"],
          # Non-valid color spec (FWIW, 'jaune' means yellow in French)
          "jaune",
          ["jaune"],
          ["jaune"]*4,
          # Value-mapping like
          [0.5]*3,
          [0.5]*4,
          [0.5]*5,
          # RGB values
          [[1, 0, 0]],
          [[1, 0, 0]]*3,
          [[1, 0, 0]]*4,
          [[1, 0, 0]]*5,
          # RGBA values
          [[1, 0, 0, 0.5]],
          [[1, 0, 0, 0.5]]*3,
          [[1, 0, 0, 0.5]]*4,
          [[1, 0, 0, 0.5]]*5,
          # Mix of valid color specs
          [[1, 0, 0, 0.5]]*3 + [[1, 0, 0]],
          [[1, 0, 0, 0.5], "red", "0.0"],
          [[1, 0, 0, 0.5], "red", "0.0", "C5"],
          [[1, 0, 0, 0.5], "red", "0.0", "C5", [0, 1, 0]],
          # Mix of valid and non valid color specs
          [[1, 0, 0, 0.5], "red", "jaune"],
          [[1, 0, 0, 0.5], "red", "0.0", "jaune"],
          [[1, 0, 0, 0.5], "red", "0.0", "C5", "jaune"],
         ):

    fig, ax = plt.subplots(num="debug", clear=True)
    try:
        print("With *c* = {}:".format(c))
        ax.scatter(xx, yy, c=c, edgecolors="k")
        print("=> No exception raised.")
        # Visually check a special case that caused me some troubles...
        if c == 'none':
            ax.set_facecolor("0.75")  # to be sure that markers are not filled
            fig.savefig("none_string.png")
    except ValueError as e:
        print("=> An exception has been raised:")
        print(e)
    finally:
        print("")  # to better separate the different trials

the current overhaul of this PR would results in things like:

Testing a scatter plot with 4 data points.

With *c* = rgby:
=> No exception raised.

With *c* = rgb:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = rgbrgb:
=> An exception has been raised:
'c' kwarg has 6 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = ['rgby']:  # ***** Not sure if this one should actually raise an exception :/ *****
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = ['rgby'].

With *c* = red:
=> No exception raised.

With *c* = none:  # ***** Seems now OK: see PNG export. *****
=> No exception raised.

With *c* = None:
=> No exception raised.

With *c* = ['r', 'g', 'b', 'none']:
=> No exception raised.

With *c* = jaune:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = jaune.

With *c* = ['jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = ['jaune'].

With *c* = ['jaune', 'jaune', 'jaune', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = ['jaune', 'jaune', 'jaune', 'jaune'].

With *c* = [0.5, 0.5, 0.5]:
=> No exception raised.

With *c* = [0.5, 0.5, 0.5, 0.5]:  # ***** No warning because its length matches for value-mapping ! *****
=> No exception raised.

With *c* = [0.5, 0.5, 0.5, 0.5, 0.5]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0]]:
=> No exception raised.

With *c* = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]:
=> No exception raised.

With *c* = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5]]:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5]]:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5]]:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5]]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0]]:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], 'red', '0.0']:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'C5']:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'C5', [0, 1, 0]]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], 'red', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = [[1, 0, 0, 0.5], 'red', 'jaune'].

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = [[1, 0, 0, 0.5], 'red', '0.0', 'jaune'].

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'C5', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = [[1, 0, 0, 0.5], 'red', '0.0', 'C5', 'jaune'].

Edit: updated after ee7a854.

afvincent · 2018-06-07T02:49:28Z

Note to myself: the failure in matplotlib/examples/animation/rain.py looks real... There seems to be an issue with facecolor='none'.

Edit: Huh...

In [1]: from matplotlib.colors import to_rgba, to_rgba_array

In [2]: to_rgba('none')
Out[2]: (0.0, 0.0, 0.0, 0.0)  # OK

In [3]: to_rgba_array(['none'])
Out[3]: array([[ 0.,  0.,  0.,  0.]])  # OK

In [4]: to_rgba_array('none')
Out[4]: array([], shape=(0, 4), dtype=float64)  # Why ?!

The latter case ([4]) is the one causing the aforementioned error, while I would have expected something similar to the case [3], which should have been OK AFAICT.

afvincent · 2018-06-07T04:02:05Z

FWIW, the PNG that is exported in the “special” case c='none' of the snippet above:

ImportanceOfBeingErnest · 2018-06-07T09:31:13Z

The snippet is great and looks like it should be used as a test for the c argument.

timhoffm · 2018-06-07T18:09:07Z

It would be great to pull all the logic out into a static private method. Then we could run all the great test examples in almost no time (no need to create axes and figures for this).

afvincent · 2018-06-07T18:27:03Z

@timhoffm Were should live such a private method? At the root of _axes.py or as as static method of Axes?

timhoffm · 2018-06-07T18:34:37Z

Technically, both would be ok. I'd make it a static method of Axes, because then you can write it directly before scatter() so the method says close to it's only use.

afvincent · 2018-06-07T19:02:54Z

@timhoffm Just to check one more time before I start doing a second overhaul of that PR : you are suggesting to pull out more or less all of those lines into a private static method, which would then allow to test things like facecolors (i.e. colors) and edgecolors (among other things) in a more systematic manner (like the snippet above)?

Edit: Well actually it will be slightly more difficult than expected to write a static method for almost all the parameter preprocessing: there are a few calls to self in the middle of the flow :/... I still have to think about it ^^

timhoffm

Actually, you are right, it is a bit more difficult due to the interweaved self. I didn't spot them on my quick look. Most of them could be disentangled and the code could generally benfit from some rearrangements. However, it might still be tricky.

You are welcome to try to get this done, but I don't require it as part of the PR. The example demonstrates that it works.

Note: To test that a specific exception occurs, you can use

with pytest.raises(ValueError):
    # the failing code

or pytest.raises(exception, match=some_regexp) to also check that the exception message matches with some_regexp.

afvincent · 2018-06-08T05:24:17Z

@timhoffm Thanks, I knew about the basic use of pytest.raises, but not about it regexp matching feature ! To be honest I would prefer to leave out the logic refactoring for another PR, as I expect it to be a bit tricky (with things like c = self._get_patches_for_fill.get_next_color() to process...).

I refactored the tests related to scatter in test_axes, and I added a new one (test_scatter_c_kwarg) based on the example snippet above. This new test does check that the exception message (if there is one) is the expected one by matching the beginning of it. I do not know if that is very clean but it seems to do the job... FWIW, executing the whole test_axes still takes about 100-105 s on my machine: it did not seems to increase noticeably with the extra test.

ImportanceOfBeingErnest · 2018-06-08T08:11:37Z

lib/matplotlib/axes/_axes.py

                if c_array.shape in xy_shape:
                    c = np.ma.ravel(c_array)
                else:
+                    if c_array.shape in ((3,), (4,)):
+                        _log.warning(
+                            "'c' kwarg looks like a **single** numeric RGB or "


Strictly speaking c is not a kwarg, but a named argument. Same in line 3971 and 3978.

Should now be fixed by f2fc436 🐑

afvincent · 2018-06-08T17:13:21Z

@ImportanceOfBeingErnest Actually I had simply re-used the kind of phrasings already there in the code base of scatter. Hopefully I have caught all of those not-so-correct phrasings with f2fc436 ;).

timhoffm · 2018-06-09T10:33:14Z

lib/matplotlib/tests/test_axes.py

-    # Reuse testcase from above for a labeled data test
-    fig, ax = plt.subplots()
-    ax.scatter("x", "y", c="c", s="s", data=data)
+class TestScatter(object):


Is there a particular reason, you put the tests into a class?

I am no expert in test writing but at least it seems clearer to me to group all the tests related to the same “main method/function/stuff” into a single place (a very strong argument, huh \o/). Besides if I am correct, you can then do things like

pytest lib/matplotlib/tests/test_axes.py::TestScatter

to run only the relevant subset of the tests (rather than having to find all of them with autocompletion), which can be significantly faster (and more pleasant when you are still in the debugging phase) than running the whole test file (e.g. 1.5 s vs. ~80 s on my current local machine).

But TBH, I could not find any guideline for Matplotlib that suggests to use classes rather than dumping all test functions at the root of the test file : some test files like test_ticker.py use them a lot while other files do not (like test_axes.py ^^).

I don‘t think there is a policy on using or nor using Test classes. So you‘re probably right whatever you do.

Personally, I would not use classes (flat is better than nested). The grouping is already done by name ˋtest_scatter...ˋ. You can use keyword expressions ˋpytest -k scatterˋ to select a subset of tests.

Wha???? That’s amazing.

I prefer test functions over test classes (unless you need to scope fixtures to the class).

But that is not worth blocking the PR over.

timhoffm · 2018-06-09T10:51:03Z

lib/matplotlib/axes/_axes.py

-                                 "For a sequence of values to be"
-                                 " color-mapped, use the 'c' kwarg instead.")
+                                 "For a sequence of values to be color-mapped,"
+                                 " use the 'c' named argument instead.")


I don't think there is a strict distinction in the terminology between keyword arguments and named arguments. I agree that it's unfortunate to use "kwarg" here because that's the name of the variable keyword argument list.

The technical details are irrelevant here. I'd just go for "use the argument 'c' instead."

See 613b5aa :) .

BTW, if there is a native English-speaker around, would they mind answering a question that often bothers me when editing docstrings ^^ : is it better to say “the x argument” or “the argument x”?

ImportanceOfBeingErnest · 2018-06-09T21:15:59Z

lib/matplotlib/tests/test_axes.py

+            "conversion": "^'c' argument must either be valid",  # bad vals
+            }
+        x = y = [0, 1, 2, 3]
+        fig, ax = plt.subplots()


If the tests take too much time, did you consider not creating a new figure for each of them? I guess the could be possible and even quite easy given that you have a class which could store the figure as a class variable.

Well, it‘s not too bad. The test runs a little less than a second on my machine. That‘s a lot for a single test. On the other hand it‘s negligible compared to the whole test suite. Moreover, we‘ve discussed a possible refactoring of the parameter parsing to a private method. That way, one could test it without ever instantiating a figure.

This is a good idea, but lets leave it to a follow on PR.

tacaswell · 2018-07-09T20:01:53Z

~~@timhoffm This needs a rebase.~~ 🐑 sorry,

@afvincent This needs a rebase

timhoffm · 2018-07-09T20:04:38Z

@afvincent can you do a rebase?

afvincent · 2018-07-10T03:48:52Z

@timhoffm I'll try to do that tomorrow when I'll have access to one of my computers with a development version of Matplotlib.

afvincent · 2018-07-11T01:22:05Z

Rebased! (I did not modified anything else with respect to the last push.)

I guess one could still reduce the execution time of the test by trying something like what @ImportanceOfBeingErnest and @timhoffm suggested. Unfortunately, I currently have a quite low bandwidth right now to explore that: if this PR is supposed to make it into the incoming release, one might prefer merging it as-it-is (once CI & everything else agree) and possibly improve the test in the next release if it matters.

afvincent added this to the v3.0 milestone Jun 5, 2018

afvincent requested review from timhoffm and tacaswell June 5, 2018 22:46

timhoffm reviewed Jun 6, 2018

View reviewed changes

afvincent added the status: work in progress label Jun 6, 2018

afvincent force-pushed the fix_scatter_error_color_shape_11373 branch from 703c0ce to e3cea18 Compare June 7, 2018 02:22

afvincent changed the title ~~FIX: Use np.shape for error message with a color sequence in scatter~~ FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions Jun 7, 2018

afvincent added the API: argument checking label Jun 7, 2018

timhoffm approved these changes Jun 7, 2018

View reviewed changes

ImportanceOfBeingErnest reviewed Jun 8, 2018

View reviewed changes

timhoffm reviewed Jun 9, 2018

View reviewed changes

afvincent added status: needs review and removed status: work in progress labels Jun 9, 2018

ImportanceOfBeingErnest reviewed Jun 9, 2018

View reviewed changes

dstansby added the status: needs rebase label Jun 25, 2018

jason-neal mentioned this pull request Jul 2, 2018

Scheduled monthly dependency update for July DanielAndreasen/SWEETer-Cat#122

Merged

tacaswell requested a review from efiring July 9, 2018 20:02

afvincent added 5 commits July 10, 2018 18:05

improve c kwarg checking (and error messages) in scatter

b1cd0a3

fix an issue when *c* == 'none'

c77ee58

factor existing scatter tests into a class

b68cc51

add an explicit test about the 'c' kwarg for scatter

607b4d6

fix naming of 'c' and drop the adjective 'named'

e9cdfc0

afvincent force-pushed the fix_scatter_error_color_shape_11373 branch from 613b5aa to e9cdfc0 Compare July 11, 2018 01:14

afvincent removed the status: needs rebase label Jul 11, 2018

tacaswell merged commit e3a8004 into matplotlib:master Jul 15, 2018

timhoffm mentioned this pull request Jul 15, 2018

Refactor color parsing of Axes.scatter #11663

Merged

2 tasks

QuLogic removed the status: needs review label Jul 20, 2018

ImportanceOfBeingErnest mentioned this pull request Jul 25, 2018

Passing an incorrectly sized colour list to scatter should raise a relevant error #11373

Closed

jason-neal mentioned this pull request Aug 21, 2018

Scatter plot with non-sequence ´c´ color should give a better Error message. #10365

Closed

afvincent mentioned this pull request Aug 23, 2018

Wrong Error Message #11919

Closed

jklymak mentioned this pull request Sep 5, 2018

scatter() behavior with different size inputs depends on the backend but both variants are inconsistent with the docs #12021

Closed

jklymak mentioned this pull request Oct 7, 2018

scatter() does not accept gray strings anymore #12429

Closed

ImportanceOfBeingErnest mentioned this pull request Oct 29, 2018

plt.scatter crashes because overwrites the colors to an empty list #12664

Closed

Uh oh!

FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions #11383

FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions #11383

Uh oh!

Conversation

afvincent commented Jun 5, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

PR Checklist

Original post (before 2018-06-06) [for archiving purposes]

Uh oh!

ImportanceOfBeingErnest commented Jun 5, 2018

Uh oh!

afvincent commented Jun 6, 2018

Uh oh!

ImportanceOfBeingErnest commented Jun 6, 2018

Uh oh!

afvincent commented Jun 6, 2018

Uh oh!

timhoffm left a comment

Choose a reason for hiding this comment

Uh oh!

afvincent commented Jun 6, 2018

Uh oh!

afvincent commented Jun 6, 2018

Uh oh!

ImportanceOfBeingErnest commented Jun 6, 2018

Uh oh!

afvincent commented Jun 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afvincent commented Jun 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afvincent commented Jun 7, 2018

Uh oh!

ImportanceOfBeingErnest commented Jun 7, 2018

Uh oh!

timhoffm commented Jun 7, 2018

Uh oh!

afvincent commented Jun 7, 2018

Uh oh!

timhoffm commented Jun 7, 2018

Uh oh!

afvincent commented Jun 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timhoffm left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

afvincent commented Jun 8, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

afvincent commented Jun 8, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timhoffm Jun 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

FIX: Improve c (color) kwarg checking in scatter and the related exceptions #11383

FIX: Improve c (color) kwarg checking in scatter and the related exceptions #11383

afvincent commented Jun 5, 2018 •

edited

Loading

afvincent commented Jun 7, 2018 •

edited

Loading

afvincent commented Jun 7, 2018 •

edited

Loading

afvincent commented Jun 7, 2018 •

edited

Loading

timhoffm left a comment •

edited

Loading

timhoffm Jun 9, 2018 •

edited

Loading

tacaswell commented Jul 9, 2018 •

edited

Loading