Skip to content

FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions #11383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

afvincent
Copy link
Contributor

@afvincent afvincent commented Jun 5, 2018

PR Summary

Not-so-quick-anymore fix for #11373. One now tries to separate exceptions depending on if they are due to:

  1. a size mismatch of the c argument => “simple” error message with the size of the relevant parameters;
  2. c argument being (at least partially) unsuitable for value-mapping or as mpl color specs => more verbose error message.

Besides, I tried to make it a bit clearer in the docstring what actually happens with a 1D-sequence that can be interpreted as an RGB(A) color spec. This PR also now emits a warning when it looks like the user might have fallen into the trap.

For examples of the current behavior, please have a look at this comment below. Inputs are welcome on the direction this PR should take.

PR Checklist

  • Code is PEP 8 compliant: CI seems happy
  • Has pytest test

Original post (before 2018-06-06) [for archiving purposes]

Quick fix for #11373.

As suggested by @mwaskom, rely on np.shape(c) rather than c.shape, to be able to handle non-ndarray color sequences when printing an error message in axes.scatter.

With @ImportanceOfBeingErnest's snippet from #11373, i.e.

import matplotlib.pyplot as plt
x = [1,2,3,4,5]
c = [1,2,3,4,5,6]  # one color more than values
plt.scatter(x, x, c=c)
plt.show()
  • Before:
AttributeError: 'list' object has no attribute 'shape'
  • After:
ValueError: c of shape (6,) not acceptable as a color sequence for x with size 5, y with size 5

@afvincent afvincent added this to the v3.0 milestone Jun 5, 2018
@afvincent afvincent requested review from timhoffm and tacaswell June 5, 2018 22:46
@ImportanceOfBeingErnest
Copy link
Member

See this comment.

If you specify some wrong color name like c = "jaune" you would get

ValueError: c of shape () not acceptable as a color sequence for x with size 5, y with size 5

which is also not very helpful.

@afvincent
Copy link
Contributor Author

Huh, is it expected that

import matplotlib.pyplot as plt
x = [1,2,3,4,5]
c = "rgbrgb"  # the last 'b' is superfluous
plt.scatter(x, x, c=c)
plt.show()

does not raise an exception but instead truncates the colors actually used (e.g. to "rgbrg" here)?

@ImportanceOfBeingErnest
Copy link
Member

Given that arrays of numbers are not truncated this should probably be handled consistently for (lists of) strings as well.

@afvincent
Copy link
Contributor Author

Well AFAICT, looking at the source code, there is actually no shape checking in the case of non-mappable sequences of colors. One may want to extend a bit the scope of this PR ^^...

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using np.shape(c) seemed like a quick fix. However, at a closer look there are more fundamental poblems with parameter/size checking. This needs a closer look and probably a broder fix.

@afvincent
Copy link
Contributor Author

@timhoffm Agree on that.

@afvincent
Copy link
Contributor Author

Shouldn't the lines

    c_rgb = (0.5, 0.5, 0.5)
    ax.scatter(x, y, c=c_rgb)

in test_axes.test_color_length_mismatch be causing a failure, accordingly to the ax.scatter docstring

        c : color, sequence, or sequence of color, optional, default: 'b'
            The marker color. Possible values:

            - A single color format string.
            - A sequence of color specifications of length n.
            - A sequence of n numbers to be mapped to colors using *cmap* and
              *norm*.
            - A 2-D array in which the rows are RGB or RGBA.

            Note that *c* should not be a single numeric RGB or RGBA sequence
            because that is indistinguishable from an array of values to be
            colormapped. If you want to specify the same RGB or RGBA value for
            all points, use a 2-D array with a single row.

?

@ImportanceOfBeingErnest
Copy link
Member

I think this would only (need to) apply when the data has the same length as the color tuple. In such case the tuple would be interpreted as values to be colormapped. The note in the documentation is not wrong because this is generally the case (i.e. when writing some function that produces a scatter from arbitrary values), but on the other hand such rgb tuple should probably stay allowed for convenience reasons.

@afvincent afvincent force-pushed the fix_scatter_error_color_shape_11373 branch from 703c0ce to e3cea18 Compare June 7, 2018 02:22
@afvincent
Copy link
Contributor Author

afvincent commented Jun 7, 2018

First (and last?) overhaul of this PR, to try addressing the additional issues that were raised by @ImportanceOfBeingErnest and @timhoffm .

When running the snippet

import matplotlib.pyplot as plt

xx = yy = [0, 1, 2, 3]
print("Testing a scatter plot with {} data points.\n".format(len(xx)))

for c in (# Single letter-sequences
          "rgby",
          "rgb",
          "rgbrgb",
          ["rgby"],
          # Special cases
          "red",
          "none",
          None,
          ["r", "g", "b", "none"],
          # Non-valid color spec (FWIW, 'jaune' means yellow in French)
          "jaune",
          ["jaune"],
          ["jaune"]*4,
          # Value-mapping like
          [0.5]*3,
          [0.5]*4,
          [0.5]*5,
          # RGB values
          [[1, 0, 0]],
          [[1, 0, 0]]*3,
          [[1, 0, 0]]*4,
          [[1, 0, 0]]*5,
          # RGBA values
          [[1, 0, 0, 0.5]],
          [[1, 0, 0, 0.5]]*3,
          [[1, 0, 0, 0.5]]*4,
          [[1, 0, 0, 0.5]]*5,
          # Mix of valid color specs
          [[1, 0, 0, 0.5]]*3 + [[1, 0, 0]],
          [[1, 0, 0, 0.5], "red", "0.0"],
          [[1, 0, 0, 0.5], "red", "0.0", "C5"],
          [[1, 0, 0, 0.5], "red", "0.0", "C5", [0, 1, 0]],
          # Mix of valid and non valid color specs
          [[1, 0, 0, 0.5], "red", "jaune"],
          [[1, 0, 0, 0.5], "red", "0.0", "jaune"],
          [[1, 0, 0, 0.5], "red", "0.0", "C5", "jaune"],
         ):

    fig, ax = plt.subplots(num="debug", clear=True)
    try:
        print("With *c* = {}:".format(c))
        ax.scatter(xx, yy, c=c, edgecolors="k")
        print("=> No exception raised.")
        # Visually check a special case that caused me some troubles...
        if c == 'none':
            ax.set_facecolor("0.75")  # to be sure that markers are not filled
            fig.savefig("none_string.png")
    except ValueError as e:
        print("=> An exception has been raised:")
        print(e)
    finally:
        print("")  # to better separate the different trials

the current overhaul of this PR would results in things like:

Testing a scatter plot with 4 data points.

With *c* = rgby:
=> No exception raised.

With *c* = rgb:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = rgbrgb:
=> An exception has been raised:
'c' kwarg has 6 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = ['rgby']:  # ***** Not sure if this one should actually raise an exception :/ *****
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = ['rgby'].

With *c* = red:
=> No exception raised.

With *c* = none:  # ***** Seems now OK: see PNG export. *****
=> No exception raised.

With *c* = None:
=> No exception raised.

With *c* = ['r', 'g', 'b', 'none']:
=> No exception raised.

With *c* = jaune:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = jaune.

With *c* = ['jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = ['jaune'].

With *c* = ['jaune', 'jaune', 'jaune', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = ['jaune', 'jaune', 'jaune', 'jaune'].

With *c* = [0.5, 0.5, 0.5]:
=> No exception raised.

With *c* = [0.5, 0.5, 0.5, 0.5]:  # ***** No warning because its length matches for value-mapping ! *****
=> No exception raised.

With *c* = [0.5, 0.5, 0.5, 0.5, 0.5]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0]]:
=> No exception raised.

With *c* = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]:
=> No exception raised.

With *c* = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5]]:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5]]:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5]]:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5]]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0, 0.5], [1, 0, 0]]:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], 'red', '0.0']:
=> An exception has been raised:
'c' kwarg has 3 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'C5']:
=> No exception raised.

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'C5', [0, 1, 0]]:
=> An exception has been raised:
'c' kwarg has 5 elements, which is not acceptable for use with 'x' with size 4, 'y' with size 4.

With *c* = [[1, 0, 0, 0.5], 'red', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = [[1, 0, 0, 0.5], 'red', 'jaune'].

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = [[1, 0, 0, 0.5], 'red', '0.0', 'jaune'].

With *c* = [[1, 0, 0, 0.5], 'red', '0.0', 'C5', 'jaune']:
=> An exception has been raised:
'c' kwarg must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = [[1, 0, 0, 0.5], 'red', '0.0', 'C5', 'jaune'].

Edit: updated after ee7a854.

@afvincent afvincent changed the title FIX: Use np.shape for error message with a color sequence in scatter FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions Jun 7, 2018
@afvincent
Copy link
Contributor Author

afvincent commented Jun 7, 2018

Note to myself: the failure in matplotlib/examples/animation/rain.py looks real... There seems to be an issue with facecolor='none'.

Edit: Huh...

In [1]: from matplotlib.colors import to_rgba, to_rgba_array

In [2]: to_rgba('none')
Out[2]: (0.0, 0.0, 0.0, 0.0)  # OK

In [3]: to_rgba_array(['none'])
Out[3]: array([[ 0.,  0.,  0.,  0.]])  # OK

In [4]: to_rgba_array('none')
Out[4]: array([], shape=(0, 4), dtype=float64)  # Why ?!

The latter case ([4]) is the one causing the aforementioned error, while I would have expected something similar to the case [3], which should have been OK AFAICT.

@afvincent
Copy link
Contributor Author

FWIW, the PNG that is exported in the “special” case c='none' of the snippet above:
none_string

@ImportanceOfBeingErnest
Copy link
Member

The snippet is great and looks like it should be used as a test for the c argument.

@timhoffm
Copy link
Member

timhoffm commented Jun 7, 2018

It would be great to pull all the logic out into a static private method. Then we could run all the great test examples in almost no time (no need to create axes and figures for this).

@afvincent
Copy link
Contributor Author

@timhoffm Were should live such a private method? At the root of _axes.py or as as static method of Axes?

@timhoffm
Copy link
Member

timhoffm commented Jun 7, 2018

Technically, both would be ok. I'd make it a static method of Axes, because then you can write it directly before scatter() so the method says close to it's only use.

@afvincent
Copy link
Contributor Author

afvincent commented Jun 7, 2018

@timhoffm Just to check one more time before I start doing a second overhaul of that PR : you are suggesting to pull out more or less all of those lines into a private static method, which would then allow to test things like facecolors (i.e. colors) and edgecolors (among other things) in a more systematic manner (like the snippet above)?

Edit: Well actually it will be slightly more difficult than expected to write a static method for almost all the parameter preprocessing: there are a few calls to self in the middle of the flow :/... I still have to think about it ^^

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you are right, it is a bit more difficult due to the interweaved self. I didn't spot them on my quick look. Most of them could be disentangled and the code could generally benfit from some rearrangements. However, it might still be tricky.

You are welcome to try to get this done, but I don't require it as part of the PR. The example demonstrates that it works.

Note: To test that a specific exception occurs, you can use

with pytest.raises(ValueError):
    # the failing code

or pytest.raises(exception, match=some_regexp) to also check that the exception message matches with some_regexp.

@afvincent
Copy link
Contributor Author

@timhoffm Thanks, I knew about the basic use of pytest.raises, but not about it regexp matching feature ! To be honest I would prefer to leave out the logic refactoring for another PR, as I expect it to be a bit tricky (with things like c = self._get_patches_for_fill.get_next_color() to process...).

I refactored the tests related to scatter in test_axes, and I added a new one (test_scatter_c_kwarg) based on the example snippet above. This new test does check that the exception message (if there is one) is the expected one by matching the beginning of it. I do not know if that is very clean but it seems to do the job... FWIW, executing the whole test_axes still takes about 100-105 s on my machine: it did not seems to increase noticeably with the extra test.

if c_array.shape in xy_shape:
c = np.ma.ravel(c_array)
else:
if c_array.shape in ((3,), (4,)):
_log.warning(
"'c' kwarg looks like a **single** numeric RGB or "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking c is not a kwarg, but a named argument. Same in line 3971 and 3978.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should now be fixed by f2fc436 🐑

@afvincent
Copy link
Contributor Author

@ImportanceOfBeingErnest Actually I had simply re-used the kind of phrasings already there in the code base of scatter. Hopefully I have caught all of those not-so-correct phrasings with f2fc436 ;).

# Reuse testcase from above for a labeled data test
fig, ax = plt.subplots()
ax.scatter("x", "y", c="c", s="s", data=data)
class TestScatter(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason, you put the tests into a class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no expert in test writing but at least it seems clearer to me to group all the tests related to the same “main method/function/stuff” into a single place (a very strong argument, huh \o/). Besides if I am correct, you can then do things like

pytest lib/matplotlib/tests/test_axes.py::TestScatter

to run only the relevant subset of the tests (rather than having to find all of them with autocompletion), which can be significantly faster (and more pleasant when you are still in the debugging phase) than running the whole test file (e.g. 1.5 s vs. ~80 s on my current local machine).

But TBH, I could not find any guideline for Matplotlib that suggests to use classes rather than dumping all test functions at the root of the test file : some test files like test_ticker.py use them a lot while other files do not (like test_axes.py ^^).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don‘t think there is a policy on using or nor using Test classes. So you‘re probably right whatever you do.

Personally, I would not use classes (flat is better than nested). The grouping is already done by name ˋtest_scatter...ˋ. You can use keyword expressions ˋpytest -k scatterˋ to select a subset of tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wha???? That’s amazing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer test functions over test classes (unless you need to scope fixtures to the class).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is not worth blocking the PR over.

"For a sequence of values to be"
" color-mapped, use the 'c' kwarg instead.")
"For a sequence of values to be color-mapped,"
" use the 'c' named argument instead.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a strict distinction in the terminology between keyword arguments and named arguments. I agree that it's unfortunate to use "kwarg" here because that's the name of the variable keyword argument list.

The technical details are irrelevant here. I'd just go for "use the argument 'c' instead."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 613b5aa :) .

BTW, if there is a native English-speaker around, would they mind answering a question that often bothers me when editing docstrings ^^ : is it better to say “the x argument” or “the argument x”?

"conversion": "^'c' argument must either be valid", # bad vals
}
x = y = [0, 1, 2, 3]
fig, ax = plt.subplots()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests take too much time, did you consider not creating a new figure for each of them? I guess the could be possible and even quite easy given that you have a class which could store the figure as a class variable.

Copy link
Member

@timhoffm timhoffm Jun 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it‘s not too bad. The test runs a little less than a second on my machine. That‘s a lot for a single test. On the other hand it‘s negligible compared to the whole test suite. Moreover, we‘ve discussed a possible refactoring of the parameter parsing to a private method. That way, one could test it without ever instantiating a figure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, but lets leave it to a follow on PR.

@tacaswell
Copy link
Member

tacaswell commented Jul 9, 2018

@timhoffm This needs a rebase. 🐑 sorry,

@afvincent This needs a rebase

@tacaswell tacaswell requested a review from efiring July 9, 2018 20:02
@timhoffm
Copy link
Member

timhoffm commented Jul 9, 2018

@afvincent can you do a rebase?

@afvincent
Copy link
Contributor Author

@timhoffm I'll try to do that tomorrow when I'll have access to one of my computers with a development version of Matplotlib.

@afvincent afvincent force-pushed the fix_scatter_error_color_shape_11373 branch from 613b5aa to e9cdfc0 Compare July 11, 2018 01:14
@afvincent
Copy link
Contributor Author

Rebased! (I did not modified anything else with respect to the last push.)

I guess one could still reduce the execution time of the test by trying something like what @ImportanceOfBeingErnest and @timhoffm suggested. Unfortunately, I currently have a quite low bandwidth right now to explore that: if this PR is supposed to make it into the incoming release, one might prefer merging it as-it-is (once CI & everything else agree) and possibly improve the test in the next release if it matters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants