-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions #11383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions #11383
Conversation
See this comment. If you specify some wrong color name like
which is also not very helpful. |
Huh, is it expected that import matplotlib.pyplot as plt
x = [1,2,3,4,5]
c = "rgbrgb" # the last 'b' is superfluous
plt.scatter(x, x, c=c)
plt.show() does not raise an exception but instead truncates the colors actually used (e.g. to "rgbrg" here)? |
Given that arrays of numbers are not truncated this should probably be handled consistently for (lists of) strings as well. |
Well AFAICT, looking at the source code, there is actually no shape checking in the case of non-mappable sequences of colors. One may want to extend a bit the scope of this PR ^^... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using np.shape(c)
seemed like a quick fix. However, at a closer look there are more fundamental poblems with parameter/size checking. This needs a closer look and probably a broder fix.
@timhoffm Agree on that. |
Shouldn't the lines c_rgb = (0.5, 0.5, 0.5)
ax.scatter(x, y, c=c_rgb) in
? |
I think this would only (need to) apply when the data has the same length as the color tuple. In such case the tuple would be interpreted as values to be colormapped. The note in the documentation is not wrong because this is generally the case (i.e. when writing some function that produces a scatter from arbitrary values), but on the other hand such rgb tuple should probably stay allowed for convenience reasons. |
703c0ce
to
e3cea18
Compare
First (and last?) overhaul of this PR, to try addressing the additional issues that were raised by @ImportanceOfBeingErnest and @timhoffm . When running the snippet import matplotlib.pyplot as plt
xx = yy = [0, 1, 2, 3]
print("Testing a scatter plot with {} data points.\n".format(len(xx)))
for c in (# Single letter-sequences
"rgby",
"rgb",
"rgbrgb",
["rgby"],
# Special cases
"red",
"none",
None,
["r", "g", "b", "none"],
# Non-valid color spec (FWIW, 'jaune' means yellow in French)
"jaune",
["jaune"],
["jaune"]*4,
# Value-mapping like
[0.5]*3,
[0.5]*4,
[0.5]*5,
# RGB values
[[1, 0, 0]],
[[1, 0, 0]]*3,
[[1, 0, 0]]*4,
[[1, 0, 0]]*5,
# RGBA values
[[1, 0, 0, 0.5]],
[[1, 0, 0, 0.5]]*3,
[[1, 0, 0, 0.5]]*4,
[[1, 0, 0, 0.5]]*5,
# Mix of valid color specs
[[1, 0, 0, 0.5]]*3 + [[1, 0, 0]],
[[1, 0, 0, 0.5], "red", "0.0"],
[[1, 0, 0, 0.5], "red", "0.0", "C5"],
[[1, 0, 0, 0.5], "red", "0.0", "C5", [0, 1, 0]],
# Mix of valid and non valid color specs
[[1, 0, 0, 0.5], "red", "jaune"],
[[1, 0, 0, 0.5], "red", "0.0", "jaune"],
[[1, 0, 0, 0.5], "red", "0.0", "C5", "jaune"],
):
fig, ax = plt.subplots(num="debug", clear=True)
try:
print("With *c* = {}:".format(c))
ax.scatter(xx, yy, c=c, edgecolors="k")
print("=> No exception raised.")
# Visually check a special case that caused me some troubles...
if c == 'none':
ax.set_facecolor("0.75") # to be sure that markers are not filled
fig.savefig("none_string.png")
except ValueError as e:
print("=> An exception has been raised:")
print(e)
finally:
print("") # to better separate the different trials the current overhaul of this PR would results in things like:
Edit: updated after ee7a854. |
Note to myself: the failure in Edit: Huh... In [1]: from matplotlib.colors import to_rgba, to_rgba_array
In [2]: to_rgba('none')
Out[2]: (0.0, 0.0, 0.0, 0.0) # OK
In [3]: to_rgba_array(['none'])
Out[3]: array([[ 0., 0., 0., 0.]]) # OK
In [4]: to_rgba_array('none')
Out[4]: array([], shape=(0, 4), dtype=float64) # Why ?! The latter case ([4]) is the one causing the aforementioned error, while I would have expected something similar to the case [3], which should have been OK AFAICT. |
The snippet is great and looks like it should be used as a test for the |
It would be great to pull all the logic out into a static private method. Then we could run all the great test examples in almost no time (no need to create axes and figures for this). |
@timhoffm Were should live such a private method? At the root of |
Technically, both would be ok. I'd make it a static method of |
@timhoffm Just to check one more time before I start doing a second overhaul of that PR : you are suggesting to pull out more or less all of those lines into a private static method, which would then allow to test things like Edit: Well actually it will be slightly more difficult than expected to write a static method for almost all the parameter preprocessing: there are a few calls to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you are right, it is a bit more difficult due to the interweaved self
. I didn't spot them on my quick look. Most of them could be disentangled and the code could generally benfit from some rearrangements. However, it might still be tricky.
You are welcome to try to get this done, but I don't require it as part of the PR. The example demonstrates that it works.
Note: To test that a specific exception occurs, you can use
with pytest.raises(ValueError):
# the failing code
or pytest.raises(exception, match=some_regexp)
to also check that the exception message matches with some_regexp
.
@timhoffm Thanks, I knew about the basic use of I refactored the tests related to |
lib/matplotlib/axes/_axes.py
Outdated
if c_array.shape in xy_shape: | ||
c = np.ma.ravel(c_array) | ||
else: | ||
if c_array.shape in ((3,), (4,)): | ||
_log.warning( | ||
"'c' kwarg looks like a **single** numeric RGB or " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking c
is not a kwarg, but a named argument. Same in line 3971 and 3978.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should now be fixed by f2fc436 🐑
@ImportanceOfBeingErnest Actually I had simply re-used the kind of phrasings already there in the code base of scatter. Hopefully I have caught all of those not-so-correct phrasings with f2fc436 ;). |
# Reuse testcase from above for a labeled data test | ||
fig, ax = plt.subplots() | ||
ax.scatter("x", "y", c="c", s="s", data=data) | ||
class TestScatter(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason, you put the tests into a class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am no expert in test writing but at least it seems clearer to me to group all the tests related to the same “main method/function/stuff” into a single place (a very strong argument, huh \o/). Besides if I am correct, you can then do things like
pytest lib/matplotlib/tests/test_axes.py::TestScatter
to run only the relevant subset of the tests (rather than having to find all of them with autocompletion), which can be significantly faster (and more pleasant when you are still in the debugging phase) than running the whole test file (e.g. 1.5 s vs. ~80 s on my current local machine).
But TBH, I could not find any guideline for Matplotlib that suggests to use classes rather than dumping all test functions at the root of the test file : some test files like test_ticker.py use them a lot while other files do not (like test_axes.py ^^).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don‘t think there is a policy on using or nor using Test classes. So you‘re probably right whatever you do.
Personally, I would not use classes (flat is better than nested). The grouping is already done by name ˋtest_scatter...ˋ. You can use keyword expressions ˋpytest -k scatterˋ to select a subset of tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wha???? That’s amazing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer test functions over test classes (unless you need to scope fixtures to the class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that is not worth blocking the PR over.
lib/matplotlib/axes/_axes.py
Outdated
"For a sequence of values to be" | ||
" color-mapped, use the 'c' kwarg instead.") | ||
"For a sequence of values to be color-mapped," | ||
" use the 'c' named argument instead.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a strict distinction in the terminology between keyword arguments and named arguments. I agree that it's unfortunate to use "kwarg" here because that's the name of the variable keyword argument list.
The technical details are irrelevant here. I'd just go for "use the argument 'c' instead."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 613b5aa :) .
BTW, if there is a native English-speaker around, would they mind answering a question that often bothers me when editing docstrings ^^ : is it better to say “the x argument” or “the argument x”?
"conversion": "^'c' argument must either be valid", # bad vals | ||
} | ||
x = y = [0, 1, 2, 3] | ||
fig, ax = plt.subplots() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the tests take too much time, did you consider not creating a new figure for each of them? I guess the could be possible and even quite easy given that you have a class which could store the figure as a class variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it‘s not too bad. The test runs a little less than a second on my machine. That‘s a lot for a single test. On the other hand it‘s negligible compared to the whole test suite. Moreover, we‘ve discussed a possible refactoring of the parameter parsing to a private method. That way, one could test it without ever instantiating a figure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea, but lets leave it to a follow on PR.
@afvincent This needs a rebase |
@afvincent can you do a rebase? |
@timhoffm I'll try to do that tomorrow when I'll have access to one of my computers with a development version of Matplotlib. |
613b5aa
to
e9cdfc0
Compare
Rebased! (I did not modified anything else with respect to the last push.) I guess one could still reduce the execution time of the test by trying something like what @ImportanceOfBeingErnest and @timhoffm suggested. Unfortunately, I currently have a quite low bandwidth right now to explore that: if this PR is supposed to make it into the incoming release, one might prefer merging it as-it-is (once CI & everything else agree) and possibly improve the test in the next release if it matters. |
PR Summary
Not-so-quick-anymore fix for #11373. One now tries to separate exceptions depending on if they are due to:
Besides, I tried to make it a bit clearer in the docstring what actually happens with a 1D-sequence that can be interpreted as an RGB(A) color spec. This PR also now emits a warning when it looks like the user might have fallen into the trap.
For examples of the current behavior, please have a look at this comment below. Inputs are welcome on the direction this PR should take.
PR Checklist
Original post (before 2018-06-06) [for archiving purposes]
Quick fix for #11373.
As suggested by @mwaskom, rely on
np.shape(c)
rather thanc.shape
, to be able to handle non-ndarray color sequences when printing an error message inaxes.scatter
.With @ImportanceOfBeingErnest's snippet from #11373, i.e.