-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix scatter edgecolor for unfilled points #10008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmmm. I wonder if this the right approach. If the markers don't have faces, it seems a little odd to throw out the specified edge color to use the specified face color. |
I think I understand the motivation. If I want a bunch of markers w the same color it’s a bit of a pain that some have to be specified w facecolor and some w edgecolor and it’s not always clear which is which. Not sure this is the right solution to that problem. Perhaps you had another motivation or use case in mind as well? |
Agreed. I think we should respect the edge color if it's specified. But if it's not and face color is specified, then go with that. |
Oh wait, I misread the original description. maybe that's what's going on. |
Oh, OK, I see. Shouldn't read issues on my phone. I think I support this change. But its a breaking API change, so that could be a problem. If this is the way to go:
For non-filled markers, the `edgecolors` kwarg
is ignored and forced to 'face' internally.
|
The rationale for the existing behavior is probably something like this: The primary color for filled markers is the face color; the edge color is a style detail (e.g., black outlines, versus no outlines). This implies that the primary color--which is the sole color--for unfilled markers should be the face. The present behavior allows this to be achieved with a single set of color kwargs, regardless of whether it is applied to filled or unfilled markers. The behavior proposed in this PR would mean that if an application allows the user to select the marker, and it is desired that filled markers have that black outline, the application logic would have to modify the color kwargs depending on whether the user selected a filled or an unfilled marker. |
I think we're still getting things backwards. In the spirit of TDD, here's the test that I think should be included. Curious what you think: # in test_axes.py
import numpy.testing as nptest
import matplotlib.colors as mcolors
@pytest.mark.parametrize(('facecolor', 'edgecolor', 'expected_color'), [
('red', 'cyan', 'cyan'),
('none', 'cyan', 'cyan'),
('red', 'none', 'red')
])
def test_scatter_color_handling(facecolor, edgecolor, expected_color):
fig, ax = plt.subplots()
artist = ax.scatter(x=[1, 2, 3], y=[1, 2, 3], s=550,
linewidth=5, marker='2',
facecolor=facecolor,
edgecolor=edgecolor)
result_color = artist.get_edgecolor()[0]
assert mcolors.to_hex(result_color) == mcolors.to_hex(expected_color) |
For all markers the The current behaviour is awkward because it breaks these associations:
|
28696b9
to
88cd5fd
Compare
@phobson I think thats what @has21k would like. I think that @efiring was suggesting was that the first test should come back red so that: for marker in ['x', 's']:
ax.scatter([1], [1], marker=marker, edgecolor='r', facecolor='c') would return cyan markers for the x and cyan markers w/ a red border for the next call. I could go either way. Its a little annoying that there are filled markers and stroked markers at all... |
On 2017/12/14 11:36 AM, Hassan Kibirige wrote:
For all markers the |linewidth| controls the size of the edge. If you go
with existing behaviour, the |linewidth| gets to control the size of the
marker itself.
I'm sorry, but I don't understand what you mean here, in the second
sentence above.
|
It is my fault, the sentence is murky, discard it. The one below emphasises better what I was trying to say. |
a27175f
to
5f85db6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the typo and the fact it needs a rebase, this behaviour seems right to me. It is an API change, but I don't think it will break too many people.
lib/matplotlib/axes/_axes.py
Outdated
@@ -3873,8 +3873,8 @@ def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None, | |||
If it is 'none', the patch boundary will not | |||
be drawn. | |||
For non-filled markers, the `edgecolors` kwarg | |||
is ignored and forced to 'face' internally. | |||
For non-filled markers , the `edgecolors` kwarg (if it is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-filled markers , the `edgecolors` kwarg (if it is not | |
For non-filled markers, the `edgecolors` kwarg (if it is not |
5f85db6
to
493660a
Compare
For unfilled markers, the edgecolor -- if specified -- has precedence over the facecolor. closes has2k1/plotnine#100
493660a
to
d2bf038
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes more changes than you might think, so I'm blocking it for now. I will elaborate in the comments.
#18480 is a related attempt to improve the handling of color options, which may occur in bewildering variety. In general, facecolor can be None, "none", or a color spec or array of such. Edgecolor can be None, "none", "face", or a colorspec or array of such. In both cases, None generally means a default of some sort, possibly from an rcParam. In addition, there may be an array to be used for color mapping, which case it could be for the face, the edge, or both (if edgecolor is "face"). Collections have built-in logic for deciding what the rendered face and edge colors are; scatter uses its own logic, even though it returns a collection. In mpl 3.3 (and probably going a long way back) scatter gives the following for the case without color mapping: This PR (#10008) does the following: As you see, 4 of the cases are changed. The one in the lower RH corner is what I was referring to in #10008 (comment) above. I'm coming around to the view that the #10008 version is much more internally consistent than current behavior. I'm wondering whether it might be implemented by relying more on standard collection behavior, with less special logic for scatter. This could cause quite a bit of breakage; how to handle it might need more thought. I suspect a lot of people use scatter and are expecting the current behavior. |
Thanks for the great example - the question is whether we think the color we see in a stroked marker like "x" is an edgecolor or a facecolor. I guess I can see both points of view. Given this hasn't seen a lot of take-up in the last 3 years, I wonder if we should just close it, and/or turn it into a documentation issue... |
Also related: #17850. |
As discussed on the call today, the entire In the meantime, here's my proposal for the "ideal" API. Intro/Justification/Existing uses for unfilled markersMarkers as outlinesMarkers with no For example, when you're highlighting certain points in a scatter plot, you can easily encircle them with boxes. You probably want these boxes to be big enough to really highlight the points of interest (i.e. have large plt.scatter(x, y, s=10, color='g')
plt.scatter(x[special], y[special], marker='x', s=100, lw=1, color='k') Now, if we switch to using squares instead, I don't think it's too much to ask of the user that they figure out to set "facecolor" to plt.scatter(x, y, s=10, c='g')
plt.scatter(x[special], y[special], marker='s', s=100, lw=1, color='k', fc='none') This provides justification for why Markers as "dots"On the other hand, using filled markers typically comes with a different desired semantics. Say a user instead wants to make a scatter plot with circles and x's side-by-side: plt.scatter(x, y, marker='o', color='b')
plt.scatter(x, z, marker='x', color='g') this works fine. But now suppose the user notices that they can add nice edges to their markers, so get pretty outlines: plt.scatter(x, y, marker='o', color='b', edgecolor='k')
plt.scatter(x, z, marker='x', color='g', edgecolor='k') Oof, that doesn't work very well. Well, after all, what the user probably wanted all along was: plt.scatter(x, y, marker='o', color='b', edgecolor='k')
plt.scatter(x, z, marker='X', color='g', edgecolor='k') which gives you "what you want". One might argue that the user never wanted to use 'x' in the first place. So we should probably warn the user if they try to set Marker size issuesThere are already many issues with marker sizes not being the most intuitive thing in the world. Even once people get around to figuring out # same "size" marker, but markedly different *actual* size
plt.scatter(x[special], y[special], s=100, marker='X', lw=5, color='b')
plt.scatter(x[special], z[special], s=100, marker='X', lw=1, color='g') Another justification that mpl.rcParams['lines.linewidth'] = 2 # reasonable default for publication
plt.figure(figsize=(3, 2)) # column width in publication size
# these come out just fine
plt.scatter(x, z, s=1, marker='o', color='g')
# but if we try to make them smaller....
plt.scatter(x, y, s=0.1, marker='o', color='b') To me, this suggests that actually, we should only be setting However, even doing that doesn't solve the dual issue, for 1D paths as they get smaller.... mpl.rcParams['lines.linewidth'] = 2 # reasonable default for publication
plt.figure(figsize=(3, 2)) # column width in publication size
# these come out just fine
plt.scatter(x, z, s=20, marker='x', color='g')
# but if we try to make them smaller....
plt.scatter(x, y, s=5, marker='x', color='b') I claim that the real issue tying all of these funny edge cases together is that the user has no way to distinguish what parts of the shape they are looking at are an "edge" and which parts of the shape are a "face" just by looking at it. This is made especially bad by the fact that we use Proposed "ideal" solution
plt.scatter(x, y, marker=mpl.markers.unfilled_x) or similar. (1) solves 90% of the problems above. (2) makes the whole issue of what is a "face" and what is an "edge" totally moot. (3) makes sure we keep feature parity. (4) makes the behavior "discoverable" (i.e., users can figure out how to use the interface correctly by guess-and-read-the-error). This could easily be accomplished by warning whenever someone uses an unfilled marker, that there is new behavior (which you can silence by specifying whether you want the new or old behavior for now, using some Other failure modes this fixesThis solution fixes the original complaint in #17849, since then any time you pass in something to Once #16859 is merged (someone plz review!), this will fix most issues with marker sizes. As the markers stand now, even if that PR was merged (and the user can then specify the area of the desired marker in pts squared exactly), the user is still faced with the confusing task of figuring out why two markers that they have specified the exact size for don't actually match the exact width they requested (because of the interaction with And either way, it would greatly simply the architecting of the more large-scale marker sizing fixes in #16891 (which fixes #15703), since we wouldn't have to special case markers that have zero area, instead trusting the user to very clearly know that area-based calculations don't apply to those markers, since they had to explicitly request them. Proposed "next best" solutionAssuming that changes to the default marker paths are too big of a change (or only makes sense on a 4.x timescale), we can still solve a lot of our problems with the following strategy:
Conclusion
Fundamentally, I think the lesson is that all this could have been altogether prevented by forcing the user to tell us what they want (e.g. by making the fact that you're choosing between 1D and 2D more explicit). I think this PR creates behavior that is more consistent than previous behavior. That said, I am still -1 overall, because this breaks API, and if we're going to break API, my opinion is that we should fix the whole mess up in one go. Some miscellaneous responses
The fact of the matter is that given the marker paths that were chosen many years ago for 'x' and 'X', 'x' is an unfilled marker and 'X' is a filled marker (it says so right in the docs). The whole point of unfilled markers is to not have a I think the real issue is that it is too easy for the user to get ahold of a 1D marker without knowing that it is a totally different kind of object. Instead of trying to deal with every corner case and guess what the user wants ourselves, we should simply make it clear through our API what is going on, by making it hard for the user to end up using an object that they don't already have a mental model for. For completeness, here's the data I used: import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
mpl.rcParams['figure.figsize'] = (3.5, 2)
x = np.random.normal(size=(50,))
y = x + 0.3*np.random.normal(size=(50,))
special = np.arange(5)
y[special] += 1
z = -y |
@brunobeltran I don't claim to have understood every word you wrote, but some remarks:
|
I'm tempted to close this and argue that we should live with the imperfect world we have now. No definition fo these terms will satisfy everyone, and somehow we have all survived the current state of affairs, which at least has the benefit of being backwards compatible. However, if someone sees a clean path to a new API, we can discuss. |
I agree with @jklymak's judgement. |
To maintain back-compatibility, matplotlib decided to ignore the issue. fixes #100 Ref: matplotlib/matplotlib#10008, matplotlib/matplotlib#17850
PR Summary
For unfilled markers, the edgecolor -- if specified -- has precedence
over the facecolor.
With PR

Without PR

closes has2k1/plotnine#100
PR Checklist