Skip to content

Fix scatter edgecolor for unfilled points #10008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

has2k1
Copy link
Contributor

@has2k1 has2k1 commented Dec 14, 2017

PR Summary

For unfilled markers, the edgecolor -- if specified -- has precedence
over the facecolor.

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(x=[1, 2, 3], y=[1, 2, 3], s=550, linewidth=5, marker='2', facecolor='red', edgecolor='cyan')

With PR
with-pr

Without PR
without-pr

closes has2k1/plotnine#100

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@phobson
Copy link
Member

phobson commented Dec 14, 2017

Hmmm. I wonder if this the right approach. If the markers don't have faces, it seems a little odd to throw out the specified edge color to use the specified face color.

@jklymak
Copy link
Member

jklymak commented Dec 14, 2017

I think I understand the motivation. If I want a bunch of markers w the same color it’s a bit of a pain that some have to be specified w facecolor and some w edgecolor and it’s not always clear which is which. Not sure this is the right solution to that problem. Perhaps you had another motivation or use case in mind as well?

@phobson
Copy link
Member

phobson commented Dec 14, 2017

Agreed. I think we should respect the edge color if it's specified. But if it's not and face color is specified, then go with that.

@phobson
Copy link
Member

phobson commented Dec 14, 2017

Oh wait, I misread the original description. maybe that's what's going on.

@jklymak
Copy link
Member

jklymak commented Dec 14, 2017

Oh, OK, I see. Shouldn't read issues on my phone. I think I support this change. But its a breaking API change, so that could be a problem.

If this is the way to go:

  1. the docstring needs to change:
            For non-filled markers, the `edgecolors` kwarg
             is ignored and forced to 'face' internally.
  1. it needs a test for the new behaviour. A test was missing before - this change should have broken a test, but it doesn't.

@efiring
Copy link
Member

efiring commented Dec 14, 2017

The rationale for the existing behavior is probably something like this: The primary color for filled markers is the face color; the edge color is a style detail (e.g., black outlines, versus no outlines). This implies that the primary color--which is the sole color--for unfilled markers should be the face. The present behavior allows this to be achieved with a single set of color kwargs, regardless of whether it is applied to filled or unfilled markers. The behavior proposed in this PR would mean that if an application allows the user to select the marker, and it is desired that filled markers have that black outline, the application logic would have to modify the color kwargs depending on whether the user selected a filled or an unfilled marker.
Perhaps I could be convinced otherwise, but for now I favor keeping the behavior as it is; I don't see sufficient justification for changing it.

@phobson
Copy link
Member

phobson commented Dec 14, 2017

@jklymak

I think we're still getting things backwards. In the spirit of TDD, here's the test that I think should be included. Curious what you think:

# in test_axes.py
import numpy.testing as nptest
import matplotlib.colors as mcolors

@pytest.mark.parametrize(('facecolor', 'edgecolor', 'expected_color'), [
    ('red', 'cyan', 'cyan'),
    ('none', 'cyan', 'cyan'),
    ('red', 'none', 'red')
])
def test_scatter_color_handling(facecolor, edgecolor, expected_color):
    fig, ax = plt.subplots()
    artist = ax.scatter(x=[1, 2, 3], y=[1, 2, 3], s=550,
                        linewidth=5, marker='2', 
                        facecolor=facecolor,
                        edgecolor=edgecolor)
    result_color = artist.get_edgecolor()[0]
    assert mcolors.to_hex(result_color) == mcolors.to_hex(expected_color)

@has2k1
Copy link
Contributor Author

has2k1 commented Dec 14, 2017

This implies that the primary color--which is the sole color--for unfilled markers should be the face

For all markers the linewidth controls the size of the edge. If you go with existing behaviour, the linewidth gets to control the size of the marker itself.

The current behaviour is awkward because it breaks these associations:

  • facecolor -> size|s
  • edgecolor -> linewidth

@has2k1 has2k1 force-pushed the fix-unfilled-scatter-edgecolor branch from 28696b9 to 88cd5fd Compare December 15, 2017 05:09
@jklymak
Copy link
Member

jklymak commented Dec 15, 2017

@phobson I think thats what @has21k would like. I think that @efiring was suggesting was that the first test should come back red so that:

for marker in ['x', 's']:
    ax.scatter([1], [1], marker=marker, edgecolor='r', facecolor='c')

would return cyan markers for the x and cyan markers w/ a red border for the next call.

I could go either way. Its a little annoying that there are filled markers and stroked markers at all...

@efiring
Copy link
Member

efiring commented Dec 15, 2017 via email

@has2k1
Copy link
Contributor Author

has2k1 commented Dec 15, 2017

I'm sorry, but I don't understand what you mean here, in the second sentence above.

It is my fault, the sentence is murky, discard it. The one below emphasises better what I was trying to say.

@has2k1 has2k1 force-pushed the fix-unfilled-scatter-edgecolor branch 2 times, most recently from a27175f to 5f85db6 Compare December 15, 2017 11:16
Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the typo and the fact it needs a rebase, this behaviour seems right to me. It is an API change, but I don't think it will break too many people.

@@ -3873,8 +3873,8 @@ def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
If it is 'none', the patch boundary will not
be drawn.
For non-filled markers, the `edgecolors` kwarg
is ignored and forced to 'face' internally.
For non-filled markers , the `edgecolors` kwarg (if it is not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For non-filled markers , the `edgecolors` kwarg (if it is not
For non-filled markers, the `edgecolors` kwarg (if it is not

@jklymak jklymak added this to the v3.4.0 milestone Dec 20, 2020
@jklymak
Copy link
Member

jklymak commented Dec 20, 2020

@phobson and @efiring I think we were confusing ourselves a fair bit. Can one of you re-review and decide if this relatively modest change is OK or not?

@has2k1 has2k1 force-pushed the fix-unfilled-scatter-edgecolor branch from 5f85db6 to 493660a Compare December 20, 2020 21:49
For unfilled markers, the edgecolor -- if specified -- has precedence
over the facecolor.

closes has2k1/plotnine#100
Copy link
Member

@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes more changes than you might think, so I'm blocking it for now. I will elaborate in the comments.

@efiring
Copy link
Member

efiring commented Dec 21, 2020

#18480 is a related attempt to improve the handling of color options, which may occur in bewildering variety. In general, facecolor can be None, "none", or a color spec or array of such. Edgecolor can be None, "none", "face", or a colorspec or array of such. In both cases, None generally means a default of some sort, possibly from an rcParam. In addition, there may be an array to be used for color mapping, which case it could be for the face, the edge, or both (if edgecolor is "face"). Collections have built-in logic for deciding what the rendered face and edge colors are; scatter uses its own logic, even though it returns a collection.

In mpl 3.3 (and probably going a long way back) scatter gives the following for the case without color mapping:
Old_scatter_without

This PR (#10008) does the following:
New_scatter_without

As you see, 4 of the cases are changed. The one in the lower RH corner is what I was referring to in #10008 (comment) above.

I'm coming around to the view that the #10008 version is much more internally consistent than current behavior. I'm wondering whether it might be implemented by relying more on standard collection behavior, with less special logic for scatter.

This could cause quite a bit of breakage; how to handle it might need more thought. I suspect a lot of people use scatter and are expecting the current behavior.

@jklymak
Copy link
Member

jklymak commented Dec 21, 2020

Thanks for the great example - the question is whether we think the color we see in a stroked marker like "x" is an edgecolor or a facecolor. I guess I can see both points of view. Given this hasn't seen a lot of take-up in the last 3 years, I wonder if we should just close it, and/or turn it into a documentation issue...

@timhoffm
Copy link
Member

Also related: #17850.

@brunobeltran
Copy link
Contributor

brunobeltran commented Dec 23, 2020

As discussed on the call today, the entire scatter color API is a mess. So, instead of agreeing on whether to merge this PR today, we want to first decide what we would in principle had wanted the API to look like if it were being written from scratch today. Once we're in agreement on what the ideal API would look like (will discuss that on next dev call, once Tim is available), then we can decide whether this is the best first step.

In the meantime, here's my proposal for the "ideal" API.

Intro/Justification/Existing uses for unfilled markers

Markers as outlines

Markers with no Face are kind of their own "type" of object, and their "correct" use, semantically, is to create a "variable weight line drawings".

For example, when you're highlighting certain points in a scatter plot, you can easily encircle them with boxes. You probably want these boxes to be big enough to really highlight the points of interest (i.e. have large size) but also a small linewidth, so that they don't look too bold.

plt.scatter(x, y, s=10, color='g')
plt.scatter(x[special], y[special], marker='x', s=100, lw=1, color='k')

image

Now, if we switch to using squares instead, I don't think it's too much to ask of the user that they figure out to set "facecolor" to 'none':

plt.scatter(x, y, s=10, c='g')
plt.scatter(x[special], y[special], marker='s', s=100, lw=1, color='k', fc='none')

image

This provides justification for why color (and c) should affect the edgecolor. Otherwise, the squares in the above plot would be invisible.

Markers as "dots"

On the other hand, using filled markers typically comes with a different desired semantics. Say a user instead wants to make a scatter plot with circles and x's side-by-side:

plt.scatter(x, y, marker='o', color='b')
plt.scatter(x, z, marker='x', color='g')

image

this works fine. But now suppose the user notices that they can add nice edges to their markers, so get pretty outlines:

plt.scatter(x, y, marker='o', color='b', edgecolor='k')
plt.scatter(x, z, marker='x', color='g', edgecolor='k')

image

Oof, that doesn't work very well. Well, after all, what the user probably wanted all along was:

plt.scatter(x, y, marker='o', color='b', edgecolor='k')
plt.scatter(x, z, marker='X', color='g', edgecolor='k')

image

which gives you "what you want". One might argue that the user never wanted to use 'x' in the first place. So we should probably warn the user if they try to set color and edgecolor for locally-1D marker paths, with a helpful message pointing them both to the relevant "filled" version of the marker they are using (if one exists) and to the fact that these two options are in fact overwriting each other, and so are redundant in this case.

Marker size issues

There are already many issues with marker sizes not being the most intuitive thing in the world. Even once people get around to figuring out s scales like the square, the fact that the color argument affects the edgecolor by default makes it very hard for users to figure out how to make markers of a particular size. And the interaction with ``lines.linewidth'` is very unexpected, in particular for filled markers:

# same "size" marker, but markedly different *actual* size
plt.scatter(x[special], y[special], s=100, marker='X', lw=5, color='b')
plt.scatter(x[special], z[special], s=100, marker='X', lw=1, color='g')

image

Another justification that 'x' is probably not what the user wanted comes when we try to make the markers smaller, or use too large of a linewidth. These should both be filled circles but what we actually get is due to the edgecolor being set by default is:

mpl.rcParams['lines.linewidth'] = 2  # reasonable default for publication
plt.figure(figsize=(3, 2))  # column width in publication size
# these come out just fine
plt.scatter(x, z, s=1, marker='o', color='g')
# but if we try to make them smaller....
plt.scatter(x, y, s=0.1, marker='o', color='b')

tmp

To me, this suggests that actually, we should only be setting edgecolor = color iff we're dealing with a 1D Path or if facecolor='none' (or filled=False).

However, even doing that doesn't solve the dual issue, for 1D paths as they get smaller....

mpl.rcParams['lines.linewidth'] = 2  # reasonable default for publication
plt.figure(figsize=(3, 2))  # column width in publication size
# these come out just fine
plt.scatter(x, z, s=20, marker='x', color='g')
# but if we try to make them smaller....
plt.scatter(x, y, s=5, marker='x', color='b')

tmp

I claim that the real issue tying all of these funny edge cases together is that the user has no way to distinguish what parts of the shape they are looking at are an "edge" and which parts of the shape are a "face" just by looking at it. This is made especially bad by the fact that we use color to set both ec and fc. However, even if we fixed that, we still have to set edgecolor using c/ color for 1D markers (otherwise they'd be invisible by default!)

Proposed "ideal" solution

  1. Deprecate all unfillable paths (so 'x' is a synonym for 'X', for example). And add equivalent "filled" versions when they don't already exist (like for '1', '2', etc.).
  2. Change color kwarg to only set facecolor, and deprecate it.
  3. Add custom marker style objects to markers.py. So if someone really wants that unfilled X, they have to do
plt.scatter(x, y, marker=mpl.markers.unfilled_x)

or similar.
4) Unfilled markers should error if someone tries to set facecolor. c will set only edgecolor for unfilled markers, and only facecolor for filled markers.

(1) solves 90% of the problems above. (2) makes the whole issue of what is a "face" and what is an "edge" totally moot. (3) makes sure we keep feature parity. (4) makes the behavior "discoverable" (i.e., users can figure out how to use the interface correctly by guess-and-read-the-error).

This could easily be accomplished by warning whenever someone uses an unfilled marker, that there is new behavior (which you can silence by specifying whether you want the new or old behavior for now, using some thick_markers=True flag).

Other failure modes this fixes

This solution fixes the original complaint in #17849, since then any time you pass in something to sc = plt.scatter(facecolor=x, edgecolor=y), you are guaranteed to get that same value back out (i.e. in this case sc.get_facecolor() == x and sc.get_edgecolor() == y).

Once #16859 is merged (someone plz review!), this will fix most issues with marker sizes. As the markers stand now, even if that PR was merged (and the user can then specify the area of the desired marker in pts squared exactly), the user is still faced with the confusing task of figuring out why two markers that they have specified the exact size for don't actually match the exact width they requested (because of the interaction with linewidth).

And either way, it would greatly simply the architecting of the more large-scale marker sizing fixes in #16891 (which fixes #15703), since we wouldn't have to special case markers that have zero area, instead trusting the user to very clearly know that area-based calculations don't apply to those markers, since they had to explicitly request them.

Proposed "next best" solution

Assuming that changes to the default marker paths are too big of a change (or only makes sense on a 4.x timescale), we can still solve a lot of our problems with the following strategy:

  1. Passing fc, or passing both color and ec whenever we are dealing with a 1D marker should warn.
  2. For 2D markers, color should affect only fc, unless filled is false or fc is none, in which case it should affect only ec. For 1D markers, color should always affect only ec. fc should always return 'none' for these markers.
  3. In order to store this new "default" color information, MarkerStyle will need a get_color in addition to get_facecolor and get_edgecolor.

Conclusion

MarkerStyle is not the only place in the library where we let users pass in arbitrary paths, and I think that having Markers respond differently to edge and facecolor than every other bit of drawing API in the library was a very understandable choice at the time it was made (and was probably necessary for Matlab compat), but fundamentally is a workaround designed to help users that aren't privy to these types of internals. However, instead of trul helping these users, it has instead cause the side effect that they simply need to figure out a much more complicated mental model than if we had "told" them that the marker is basically a PathPatch in the first place.

Fundamentally, I think the lesson is that all this could have been altogether prevented by forcing the user to tell us what they want (e.g. by making the fact that you're choosing between 1D and 2D more explicit).

I think this PR creates behavior that is more consistent than previous behavior. That said, I am still -1 overall, because this breaks API, and if we're going to break API, my opinion is that we should fix the whole mess up in one go.

Some miscellaneous responses

the question is whether we think the color we see in a stroked marker like "x" is an edgecolor or a facecolor

The fact of the matter is that given the marker paths that were chosen many years ago for 'x' and 'X', 'x' is an unfilled marker and 'X' is a filled marker (it says so right in the docs). The whole point of unfilled markers is to not have a Face, and so it's never made sense to me that they respond to facecolor at all (except in the sense that fc=None, ec='face' is a reasonable set of defaults).

I think the real issue is that it is too easy for the user to get ahold of a 1D marker without knowing that it is a totally different kind of object. Instead of trying to deal with every corner case and guess what the user wants ourselves, we should simply make it clear through our API what is going on, by making it hard for the user to end up using an object that they don't already have a mental model for.

For completeness, here's the data I used:

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

mpl.rcParams['figure.figsize'] = (3.5, 2)

x = np.random.normal(size=(50,))
y = x + 0.3*np.random.normal(size=(50,))
special = np.arange(5)
y[special] += 1
z = -y

@brunobeltran brunobeltran added the status: needs comment/discussion needs consensus on next step label Jan 5, 2021
@timhoffm
Copy link
Member

@brunobeltran I don't claim to have understood every word you wrote, but some remarks:

  • Outlined markers (Marker(..., fillstyle='none')) can also be used as "dots". Say you have two related groups of datasets, then you can use filled and unfilled circles, filled and unfilled squares etc. Such a style is particularly print-friendly.
  • I'm skeptical on deprecating unfillable markers. If I just want a green 'x', I don't want to think about face and edge; and as you wrote outlines on filled markers can render surprises. Deprecating would just move the burden from thinking about filled/unfilled to face/edge/linewidth in such a case. Given that unfillable markers "just work" most of the time, it's not worth it.
  • Deprecating color is not straight forward. While you could deprecate the keyword argument, the "primary" color concept is also present in plt.plot() format strings (plt.plot(x, y, 'rx')) and in scatter(x, y, c=...).

@jklymak
Copy link
Member

jklymak commented Mar 24, 2021

I'm tempted to close this and argue that we should live with the imperfect world we have now. No definition fo these terms will satisfy everyone, and somehow we have all survived the current state of affairs, which at least has the benefit of being backwards compatible.

However, if someone sees a clean path to a new API, we can discuss.

@QuLogic QuLogic modified the milestones: v3.5.0, v3.6.0 Aug 21, 2021
@timhoffm
Copy link
Member

I agree with @jklymak's judgement.

@timhoffm timhoffm closed this Feb 16, 2022
@tacaswell tacaswell modified the milestones: v3.6.0, unassigned Feb 16, 2022
has2k1 added a commit to has2k1/plotnine that referenced this pull request Apr 6, 2022
To maintain back-compatibility, matplotlib decided to ignore the issue.

fixes #100
Ref: matplotlib/matplotlib#10008, matplotlib/matplotlib#17850
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

geom_point happily colorizes unfilled_shapes on main plot but shows only black on legend
8 participants