-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Legend for Scatter #11127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Legend for Scatter #11127
Conversation
I like the idea of a generalized way to do legends for PathCollections & PatchCollections, but wonder if this is overly complicated: # produce a legend with the unique colors from the scatter
ax.legend(*scatter.legend_items(), loc=3, title="Classes")
# produce a legend with a cross section of sizes from the scatter
handles, labels = scatter.legend_items(mode="sizes", alpha=0.6)
ax.legend(handles, labels, loc=1, title="Sizes") overlapping with the discussion in # #7383, I wonder if there shouldn't just be a subclass of legend that handles Path/Patch and cmap/norm stuff. |
How is a single line "too complicated"? If I understand correctly, a Legend subclass would then need to take the artist (in this case the PathCollection) as input, as well as those arguments needed to steer the output. In that sense you could replace I also don't see much overlap to What is proposed in this PR is completely analogue to the existing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this a useful thing to add. Some minor suggestions/comments below that I think will improve it. I feel somewhat strongly that the name should not just be legend_items()
as the returned tuple is non-unique.
My only other issue is that we are going to add a ax.get_legend_handles_labels
. I've not quite wrapped my brain around where this fits into that.
lib/matplotlib/collections.py
Outdated
|
||
Parameters | ||
---------- | ||
mode : string, optional, default *"colors"* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I like "mode" here, though I guess its OK. "key" would be more natural to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes key
more natural? Any good argument for key
?
In principle I'm open to other names here, e.g. prop
because colors or sizes are properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, prop
works for me.
lib/matplotlib/collections.py
Outdated
for obtaining a legend for a :meth:`~.Axes.scatter` plot. E.g.:: | ||
|
||
scatter = plt.scatter([1,2,3], [4,5,6], c=[7,2,3]) | ||
plt.legend(*scatter.legend_items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like the name legend_items
just because it makes it sound like these are a property of the PathCollection; I think a verb would help like create_legend_items
or make_legend_items
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency I would go with legend_elements
as commented above already. Unless people are prepared to break the existing contour method and call both something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still like having a verb because you are actually doing something versus just returning a property of the collection. But I don't feel super strongly...
lib/matplotlib/collections.py
Outdated
Can be *"colors"* or *"sizes"*. In case of *"colors"*, the legend | ||
handles will show the different colors of the collection. In case | ||
of "sizes", the legend will show the different sizes. | ||
useall : bool or string "auto", optional (default "auto") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: oops, this somehow got lost: I don't think you need both useall
and num
kwargs. num=None is all data, num=int is number bins, num='auto' is auto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that won't work, but I will recheck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you're right, that works. The only "drawback" is that in that case num
cannot act as a parameter for the automatic determination; but I guess it's fine to have that parameter hardcoded (choosing num=9 now for the border between showing all data or some cross section of it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get that num interacted w/ auto somehow. (still don't get it w/o checking the code).
I wonder if what you really want is a "bins" argument, and make this completely analagous to the histogram bins argument. https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html You could even optionally put in the legend how many elements there are in each bin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I want "bins". I want "ticks"; and I get them using a Locator. From my side this is all working fine. The idea is of course to get nice "tick positions", unlike in the case of histogram bins where you'd end up with arbitrarily ugly numbers like 1.57214.
What this function does not allow for is to add custom ticks (which is possible for bins in histogram
). I decided against that because I think this is exactly the point where you'd anyways simply create the legend the way you want it with the artists created manually. But if someone feels that this is needed, I could also add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thats cool. What about letting a (power) user pass in a Locator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is sure possible. I think it would then simply ignore num
completely. I'm not sure if it's needed, but if there is the wish to add it, I can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant let them pass in a locator as the argument to num. I don’t think anyone who knows how to write a locator would be put off by the slight semantic incongruity.
The format or formatter to use for the labels. If a string must be | ||
a valid input for a `~.StrMethodFormatter`. If None (the default), | ||
use a `~.ScalarFormatter`. | ||
func : function, default *lambda x: x* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this too confusing. What does *func*
do and why do you need it? Maybe this needs an example...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func
would allow to calculate the values to be shown in the legend. This is mainly useful for "sizes", because unlike the c
parameter in scatter, which takes numbers and maps them to color using a Normalize
, the s
parameter does not have a normalization. This means that usually you would encode your data into sizes and provide those sizes in the scatter call. However in the legend you might still want to show the original data's values. Example: You use the s
to show price ranging between 0.5 and 10 dollars. Now you cannot use those numbers directly as scatter sizes, because that would result in tiny dots. So you convert them to sizes via s = lambda price : 0.5*(price*10)^2
. Now func
would allow you to provide price = lambda s: np.sqrt(2*s)/10
to have a nice legend using the actual prices in dollars.
I think you are right that an example might be useful here. I could add that in addition to the already extended example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I was particularly flummoxed by "This converts the initial values for color or size and needs to take a numpy array as input." I wasn't sure what "initial" meant.
For the doc string maybe: "Function to calculate the labels. Often the size (or color) argument to scatter will have been pre-processed by the user using a function s = f(x)
to make the markers visible; eg size = np.log10(x)`. Providing this function allows that pre-processing to be inverted, so that the legend labels have the correct values. eg
np.exp(x, 10)``"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reformulated the docstring and also added an additional example now.
Sorry, I was reacting to the * notation and should have looked at this way more carefully. It's fine as is, especially if you go with |
Actually ax.get_legend_handles_labels already exists. As does the concept of legend handlers. Any way this can be turned into a legend handler? |
would produce a single legend row with one scatter dot and the label "mylabel". It is hence equalvalent to simply calling
Now what is possible using a legend handler would be to change the single handle from a single point to something more characteristic of a scatter, e.g.: However, this is still a single legend entry for the complete scatter. What this PR proposes is a way to get several legend entries for one single scatter plot. This is not possible with a One might consider putting the proposed API for |
lib/matplotlib/collections.py
Outdated
@@ -899,6 +900,118 @@ def set_paths(self, paths): | |||
def get_paths(self): | |||
return self._paths | |||
|
|||
def legend_items(self, mode="colors", useall="auto", num=10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How far up the class hierarchy does it make sense to move this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular method only makes sense for the PathCollection
. A less useful version of it could be created for any Collection
and yes, even Artist
could in principle have such method. However the point being that at least for more complex things like scatter (PathCollection), LineCollection, contour (ContourSet), imshow (AxesImage) there need to be some options for the user to decide on the final appearance. I'm unsure how to generalize such arguments and how to use them, other than letting the user call this method and supply the returned handles/labels to legend
- see comment below.
The issue here is that the "one artist : one dataset : one legend entry" mapping that exist for ex A concern with the way we currently do legends is that the handlers are created by lookup up state held in the How far can the notion of 'legend_elements' be generalized? Even if we only have a real implemenation for PathCollection, having a I think for now coming up with a consistent name and API for |
I wasn't exactly sure the scope of what legend_handler could do. I agree that the functionality here is useful, and I don't really see how it can easily be made extensible. What I'm hung up on is that I am not sure I like the API of having it a method on OTOH, if we do this for every different artist type, I imagine this gets to be a mess so I can see the point of having it attached to the artist. Again, just trying to think the API through a bit. I think this is a good idea overall. |
we currently have:
Concerning the API, i.e. the needed arguments, I think there might be little to no overlap between what different artists need as input. Even |
a801e21
to
809c130
Compare
Sorry, I missed the point about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is useful, and consistent w/ the similar method to ContourSet
. "num" could possible pass a Locator if we want more flexibility in the binning of the sizes, but that could either happen now or as a follow up if someone wants more flexibility. Thanks a lot for this!
dfe09f3
to
1097412
Compare
Update: You may now pass a locator to Concerning the discussion about having a
However, as long as this isn't used anywhere internally, it's not really useful. In any case this can be part of a different PR, in case one wants to let |
@ImportanceOfBeingErnest we talks about this on the weekly phone call, and the issue is with attaching this method to The request would be that we add a If you don't feel like doing this, I imagine someone else will be up for picking it up! Thanks for a great idea... |
Stale: requesting a bit more of a refactor...
But this is not only useful for scatters. If you have a different PathCollection than the one being created by a scatter, you may still use it to create legend entries for it. As of now, it seems no other matplotlib function other than scatter creates PathCollections, but if there were others, why restrict this functionality to only |
What's the status here? |
My understanding which may be incomplete, is that If you still disagree with this organization, then we can bring other folks into the conversation. |
As said, this is not specific to |
I don't have any skin in this game - you'll have to work it out w/ @tacaswell |
ping @tacaswell |
Marking as "Needs Review" to hopefully burble to the top of people's lists... |
Looks like doc build and travis aren't happy at the moment? |
3d85295
to
431e074
Compare
I'm happy to fix all the little hickups that come and go with updated dependencies (pep8 to flake, sphinx 1.7 to 1.8 etc.) once the general fate of this PR is decided upon. |
... is this really release critical? No, but I can imagine @ImportanceOfBeingErnest is a bit frustrated its sat around for months... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this seems like important functionality and a reasonable implementation. I would be reluctant to hold it up indefinitely based on the controversy over whether the method should be on something other than the PathCollection
. I don't see any harm in putting it there; that doesn't preclude some future refactoring, so long as the user would still be able to call it as a method of whatever a future scatter
returns. I would like to see some consideration of my suggestion regarding the "num" kwarg (possibly with a new "locator" kwarg) to facilitate what I would expect to be the very common use case of wanting to specify the levels as a sequence.
scatter = ax.scatter(x, y, c=c, s=s) | ||
|
||
# produce a legend with the unique colors from the scatter | ||
legend1 = ax.legend(*scatter.legend_elements(), loc=3, title="Classes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit-pick: I think that the numeric location codes are a terrible API and should not be used in examples. (I would favor removing them entirely from the legend documentation.)
lib/matplotlib/collections.py
Outdated
Can be *"colors"* or *"sizes"*. In case of *"colors"*, the legend | ||
handles will show the different colors of the collection. In case | ||
of "sizes", the legend will show the different sizes. | ||
num : int, None, "auto" (default), or `~.ticker.Locator`, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the ability to use a Locator
is good, and provides the ability to put in a list via a FixedLocator
, wouldn't one of the most common--maybe the most common--use cases be an explicit list? For example, sizes represent populations of cities, so the legend shows sizes for [10000, 100000, 1000000, 10000000]
. If this would be a common use case, then it might be worth supporting it directly by accepting a sequence. To make all this more like contour
with its 'levels', a separate kwarg could be used to specify a Locator
for the rare cases when it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use the same argument num
though, because a list of values is an alternative to those other options. Else one would need to go through checking what happens if both are provided etc.
So what is a canonical way of checking if something is a list, tuple or array of numbers? (isinstance(x, collections.abc.Iterable)
only does half the job; is there something in cbook
that I overlooked?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.iterable()
should suffice, I think.
lib/matplotlib/collections.py
Outdated
------- | ||
tuple (handles, labels) | ||
with *handles* being a list of `.Line2D` objects | ||
and *labels* a list of strings of the same length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor suggestion: 'and labels being a matching list of strings.' (Otherwise, it could be misinterpreted as meaning the strings must all be the same length.)
lib/matplotlib/collections.py
Outdated
u = np.unique(self.get_sizes()) | ||
color = kwargs.pop("color", "k") | ||
else: | ||
warnings.warn("Invalid prop provided, or collection without " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for a warning rather than an Exception here? At least in the case of an invalid prop, shouldn't this raise an Exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my rationale here was that similar to how a legend is still created even if no labeled objects are found, here it should not break the code in case no handles/labels can be determined. So one would still get a plot out, just without propper legend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but then I think that the "invalid prop" case should raise ValueError
and only the lack of an array should raise the warning.
lib/matplotlib/collections.py
Outdated
loc = mpl.ticker.MaxNLocator(nbins=num, min_n_ticks=num-1, | ||
steps=[1, 2, 2.5, 3, 5, 6, 8, 10]) | ||
label_values = loc.tick_values(func(arr).min(), func(arr).max()) | ||
cond = (label_values >= func(arr).min()) & \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style preference: when possible, as it is here, use parentheses rather than trailing backslashes for line continuation.
lib/matplotlib/collections.py
Outdated
kw.update(kwargs) | ||
|
||
for val, lab in zip(values, label_values): | ||
if prop == "colors" and hasarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, if I am reading the code correctly, the hasarray
check is not needed. (But it makes me wonder whether I am missing something, which is quite possible.)
af5bb74
to
646e2c8
Compare
I am 👍 on this, @efiring is responsible for getting this merged. |
646e2c8
to
d9bd109
Compare
Thanks for the review @efiring. I think I have now incoorporated all the requested changes. I think it's good to have the API fixed for the case of the scatter (which is the most common usecase I assume). We can now still implement this for other collections. If you want I can already now put a |
Looks good, so I merged it. I don't think that putting the method into the Collection base would be good, because it probably makes sense to implement it only for some types of Collection. |
Thanks @efiring and thanks @ImportanceOfBeingErnest for your patience with this! |
🎉 I am glad this landed! |
PR Summary
This PR proposes to include an easy, yet versatile option to create legends for scatterplots.
Motivation:
Scatter plots create a Collection of points, for which it is rather straight forward to create a colorbar. This is useful for continuous variables mapped to color. However, creating a legend with discrete entries requires to manually set up the necessary proxy artists. It appears that scatter plots are more and more used to show discrete of categorical variables. In such cases, a discrete legend is often more desireable than a (continuous, or even discrete) colorbar.
ggplot offers an easy method to create legends for scatter plots,

ggplot(...) + geom_point(aes(x=.., y=.., size=.., color=..))
, resulting in something likeThis PR adds a method
legend_items
to thePathCollection
, which returns the handles and labels for a legend.Creating a scatter legend would then look like
PR Checklist