Skip to content

FIX: Improve *c* (color) kwarg checking in scatter and the related exceptions #11383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 45 additions & 12 deletions lib/matplotlib/axes/_axes.py
Original file line number Diff line number Diff line change
Expand Up @@ -3792,7 +3792,9 @@ def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
Note that *c* should not be a single numeric RGB or RGBA sequence
because that is indistinguishable from an array of values to be
colormapped. If you want to specify the same RGB or RGBA value for
all points, use a 2-D array with a single row.
all points, use a 2-D array with a single row. Otherwise, value-
matching will have precedence in case of a size matching with *x*
and *y*.

marker : `~matplotlib.markers.MarkerStyle`, optional, default: 'o'
The marker style. *marker* can be either an instance of the class
Expand Down Expand Up @@ -3876,15 +3878,15 @@ def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
except ValueError:
raise ValueError("'color' kwarg must be an mpl color"
" spec or sequence of color specs.\n"
"For a sequence of values to be"
" color-mapped, use the 'c' kwarg instead.")
"For a sequence of values to be color-mapped,"
" use the 'c' argument instead.")
if edgecolors is None:
edgecolors = co
if facecolors is None:
facecolors = co
if c is not None:
raise ValueError("Supply a 'c' kwarg or a 'color' kwarg"
" but not both; they differ but"
raise ValueError("Supply a 'c' argument or a 'color'"
" kwarg but not both; they differ but"
" their functionalities overlap.")
if c is None:
if facecolors is not None:
Expand Down Expand Up @@ -3925,29 +3927,60 @@ def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
# c is an array for mapping. The potential ambiguity
# with a sequence of 3 or 4 numbers is resolved in
# favor of mapping, not rgb or rgba.

# Convenience vars to track shape mismatch *and* conversion failures.
valid_shape = True # will be put to the test!
n_elem = -1 # used only for (some) exceptions

if c_none or co is not None:
c_array = None
else:
try:
try: # First, does 'c' look suitable for value-mapping?
c_array = np.asanyarray(c, dtype=float)
n_elem = c_array.shape[0]
if c_array.shape in xy_shape:
c = np.ma.ravel(c_array)
else:
if c_array.shape in ((3,), (4,)):
_log.warning(
"'c' argument looks like a single numeric RGB or "
"RGBA sequence, which should be avoided as value-"
"mapping will have precedence in case its length "
"matches with 'x' & 'y'. Please use a 2-D array "
"with a single row if you really want to specify "
"the same RGB or RGBA value for all points.")
# Wrong size; it must not be intended for mapping.
valid_shape = False
c_array = None
except ValueError:
# Failed to make a floating-point array; c must be color specs.
c_array = None

if c_array is None:
try:
# must be acceptable as PathCollection facecolors
try: # Then is 'c' acceptable as PathCollection facecolors?
colors = mcolors.to_rgba_array(c)
n_elem = colors.shape[0]
if colors.shape[0] not in (0, 1, x.size, y.size):
# NB: remember that a single color is also acceptable.
# Besides *colors* will be an empty array if c == 'none'.
valid_shape = False
raise ValueError
except ValueError:
# c not acceptable as PathCollection facecolor
raise ValueError("c of shape {} not acceptable as a color "
"sequence for x with size {}, y with size {}"
.format(c.shape, x.size, y.size))
if not valid_shape: # but at least one conversion succeeded.
raise ValueError(
"'c' argument has {nc} elements, which is not "
"acceptable for use with 'x' with size {xs}, "
"'y' with size {ys}."
.format(nc=n_elem, xs=x.size, ys=y.size)
)
# Both the mapping *and* the RGBA conversion failed: pretty
# severe failure => one may appreciate a verbose feedback.
raise ValueError(
"'c' argument must either be valid as mpl color(s) "
"or as numbers to be mapped to colors. "
"Here c = {}." # <- beware, could be long depending on c.
.format(c)
)
else:
colors = None # use cmap, norm after collection is created

Expand Down
163 changes: 110 additions & 53 deletions lib/matplotlib/tests/test_axes.py
Original file line number Diff line number Diff line change
Expand Up @@ -1660,63 +1660,120 @@ def test_hist2d_transpose():
ax.hist2d(x, y, bins=10, rasterized=True)


@image_comparison(baseline_images=['scatter', 'scatter'])
def test_scatter_plot():
fig, ax = plt.subplots()
data = {"x": [3, 4, 2, 6], "y": [2, 5, 2, 3], "c": ['r', 'y', 'b', 'lime'],
"s": [24, 15, 19, 29]}

ax.scatter(data["x"], data["y"], c=data["c"], s=data["s"])

# Reuse testcase from above for a labeled data test
fig, ax = plt.subplots()
ax.scatter("x", "y", c="c", s="s", data=data)
class TestScatter(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason, you put the tests into a class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no expert in test writing but at least it seems clearer to me to group all the tests related to the same “main method/function/stuff” into a single place (a very strong argument, huh \o/). Besides if I am correct, you can then do things like

pytest lib/matplotlib/tests/test_axes.py::TestScatter

to run only the relevant subset of the tests (rather than having to find all of them with autocompletion), which can be significantly faster (and more pleasant when you are still in the debugging phase) than running the whole test file (e.g. 1.5 s vs. ~80 s on my current local machine).

But TBH, I could not find any guideline for Matplotlib that suggests to use classes rather than dumping all test functions at the root of the test file : some test files like test_ticker.py use them a lot while other files do not (like test_axes.py ^^).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don‘t think there is a policy on using or nor using Test classes. So you‘re probably right whatever you do.

Personally, I would not use classes (flat is better than nested). The grouping is already done by name ˋtest_scatter...ˋ. You can use keyword expressions ˋpytest -k scatterˋ to select a subset of tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wha???? That’s amazing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer test functions over test classes (unless you need to scope fixtures to the class).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is not worth blocking the PR over.

@image_comparison(baseline_images=['scatter', 'scatter'])
def test_scatter_plot(self):
fig, ax = plt.subplots()
data = {"x": [3, 4, 2, 6], "y": [2, 5, 2, 3],
"c": ['r', 'y', 'b', 'lime'], "s": [24, 15, 19, 29]}

ax.scatter(data["x"], data["y"], c=data["c"], s=data["s"])

@image_comparison(baseline_images=['scatter_marker'], remove_text=True,
extensions=['png'])
def test_scatter_marker():
fig, (ax0, ax1, ax2) = plt.subplots(ncols=3)
ax0.scatter([3, 4, 2, 6], [2, 5, 2, 3],
c=[(1, 0, 0), 'y', 'b', 'lime'],
s=[60, 50, 40, 30],
edgecolors=['k', 'r', 'g', 'b'],
marker='s')
ax1.scatter([3, 4, 2, 6], [2, 5, 2, 3],
c=[(1, 0, 0), 'y', 'b', 'lime'],
s=[60, 50, 40, 30],
edgecolors=['k', 'r', 'g', 'b'],
marker=mmarkers.MarkerStyle('o', fillstyle='top'))
# unit area ellipse
rx, ry = 3, 1
area = rx * ry * np.pi
theta = np.linspace(0, 2 * np.pi, 21)
verts = np.column_stack([np.cos(theta) * rx / area,
np.sin(theta) * ry / area])
ax2.scatter([3, 4, 2, 6], [2, 5, 2, 3],
c=[(1, 0, 0), 'y', 'b', 'lime'],
s=[60, 50, 40, 30],
edgecolors=['k', 'r', 'g', 'b'],
marker=verts)


@image_comparison(baseline_images=['scatter_2D'], remove_text=True,
extensions=['png'])
def test_scatter_2D():
x = np.arange(3)
y = np.arange(2)
x, y = np.meshgrid(x, y)
z = x + y
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=200, edgecolors='face')
# Reuse testcase from above for a labeled data test
fig, ax = plt.subplots()
ax.scatter("x", "y", c="c", s="s", data=data)

@image_comparison(baseline_images=['scatter_marker'], remove_text=True,
extensions=['png'])
def test_scatter_marker(self):
fig, (ax0, ax1, ax2) = plt.subplots(ncols=3)
ax0.scatter([3, 4, 2, 6], [2, 5, 2, 3],
c=[(1, 0, 0), 'y', 'b', 'lime'],
s=[60, 50, 40, 30],
edgecolors=['k', 'r', 'g', 'b'],
marker='s')
ax1.scatter([3, 4, 2, 6], [2, 5, 2, 3],
c=[(1, 0, 0), 'y', 'b', 'lime'],
s=[60, 50, 40, 30],
edgecolors=['k', 'r', 'g', 'b'],
marker=mmarkers.MarkerStyle('o', fillstyle='top'))
# unit area ellipse
rx, ry = 3, 1
area = rx * ry * np.pi
theta = np.linspace(0, 2 * np.pi, 21)
verts = np.column_stack([np.cos(theta) * rx / area,
np.sin(theta) * ry / area])
ax2.scatter([3, 4, 2, 6], [2, 5, 2, 3],
c=[(1, 0, 0), 'y', 'b', 'lime'],
s=[60, 50, 40, 30],
edgecolors=['k', 'r', 'g', 'b'],
verts=verts)

@image_comparison(baseline_images=['scatter_2D'], remove_text=True,
extensions=['png'])
def test_scatter_2D(self):
x = np.arange(3)
y = np.arange(2)
x, y = np.meshgrid(x, y)
z = x + y
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=200, edgecolors='face')

def test_scatter_color(self):
# Try to catch cases where 'c' kwarg should have been used.
with pytest.raises(ValueError):
plt.scatter([1, 2], [1, 2], color=[0.1, 0.2])
with pytest.raises(ValueError):
plt.scatter([1, 2, 3], [1, 2, 3], color=[1, 2, 3])

# Parameters for *test_scatter_c*. NB: assuming that the
# scatter plot will have 4 elements. The tuple scheme is:
# (*c* parameter case, exception regexp key or None if no exception)
params_test_scatter_c = [
# Single letter-sequences
("rgby", None),
("rgb", "shape"),
("rgbrgb", "shape"),
(["rgby"], "conversion"),
# Special cases
("red", None),
("none", None),
(None, None),
(["r", "g", "b", "none"], None),
# Non-valid color spec (FWIW, 'jaune' means yellow in French)
("jaune", "conversion"),
(["jaune"], "conversion"), # wrong type before wrong size
(["jaune"]*4, "conversion"),
# Value-mapping like
([0.5]*3, None), # should emit a warning for user's eyes though
([0.5]*4, None), # NB: no warning as matching size allows mapping
([0.5]*5, "shape"),
# RGB values
([[1, 0, 0]], None),
([[1, 0, 0]]*3, "shape"),
([[1, 0, 0]]*4, None),
([[1, 0, 0]]*5, "shape"),
# RGBA values
([[1, 0, 0, 0.5]], None),
([[1, 0, 0, 0.5]]*3, "shape"),
([[1, 0, 0, 0.5]]*4, None),
([[1, 0, 0, 0.5]]*5, "shape"),
# Mix of valid color specs
([[1, 0, 0, 0.5]]*3 + [[1, 0, 0]], None),
([[1, 0, 0, 0.5], "red", "0.0"], "shape"),
([[1, 0, 0, 0.5], "red", "0.0", "C5"], None),
([[1, 0, 0, 0.5], "red", "0.0", "C5", [0, 1, 0]], "shape"),
# Mix of valid and non valid color specs
([[1, 0, 0, 0.5], "red", "jaune"], "conversion"),
([[1, 0, 0, 0.5], "red", "0.0", "jaune"], "conversion"),
([[1, 0, 0, 0.5], "red", "0.0", "C5", "jaune"], "conversion"),
]

@pytest.mark.parametrize('c_case, re_key', params_test_scatter_c)
def test_scatter_c(self, c_case, re_key):
# Additional checking of *c* (introduced in #11383).
REGEXP = {
"shape": "^'c' argument has [0-9]+ elements", # shape mismatch
"conversion": "^'c' argument must either be valid", # bad vals
}
x = y = [0, 1, 2, 3]
fig, ax = plt.subplots()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests take too much time, did you consider not creating a new figure for each of them? I guess the could be possible and even quite easy given that you have a class which could store the figure as a class variable.

Copy link
Member

@timhoffm timhoffm Jun 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it‘s not too bad. The test runs a little less than a second on my machine. That‘s a lot for a single test. On the other hand it‘s negligible compared to the whole test suite. Moreover, we‘ve discussed a possible refactoring of the parameter parsing to a private method. That way, one could test it without ever instantiating a figure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, but lets leave it to a follow on PR.


def test_scatter_color():
# Try to catch cases where 'c' kwarg should have been used.
with pytest.raises(ValueError):
plt.scatter([1, 2], [1, 2], color=[0.1, 0.2])
with pytest.raises(ValueError):
plt.scatter([1, 2, 3], [1, 2, 3], color=[1, 2, 3])
if re_key is None:
ax.scatter(x, y, c=c_case, edgecolors="black")
else:
with pytest.raises(ValueError, match=REGEXP[re_key]):
ax.scatter(x, y, c=c_case, edgecolors="black")


def test_as_mpl_axes_api():
Expand Down