
[ENH]: Legend entries for boxplot #27792


Closed
timhoffm opened this issue Feb 14, 2024 · 12 comments · Fixed by #27840

@timhoffm
Member

timhoffm commented Feb 14, 2024

Problem

Currently, boxplots do not get legend entries. #27711 was an attempt to introduce them, but it only solved one very particular use case and did not generalize well. See #27780.

There is a labels parameter, but that only sets x-tick labels. And it has one entry per box. For the legend we should have only one entry per boxplot() call no matter how many boxes that has, because boxplot draws N identically styled boxes and it does not make sense to have N identical handles with different labels.

Because of the x-tick relation of labels and its relation to individual boxes, this parameter is not suited for the legend labels.
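To make the "one entry per boxplot() call" idea concrete, here is a minimal sketch of how it can be approximated by hand today. It assumes two made-up data groups and labels only the first box of each call; the names group_a, group_b and the offsets are illustrative, not part of any proposed API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(19680801)
# Two hypothetical groups with three boxes each.
group_a = [rng.normal(130, 10, size=100) for _ in range(3)]
group_b = [rng.normal(120, 20, size=100) for _ in range(3)]

fig, ax = plt.subplots()
for data, color, name, offset in [(group_a, "peachpuff", "group A", -0.2),
                                  (group_b, "tomato", "group B", 0.2)]:
    positions = np.arange(1, len(data) + 1) + offset
    bplot = ax.boxplot(data, positions=positions, widths=0.3,
                       patch_artist=True, manage_ticks=False)
    for patch in bplot["boxes"]:
        patch.set_facecolor(color)
    # Label only the first box, so each boxplot() call yields one legend entry.
    bplot["boxes"][0].set_label(name)

ax.legend()
handles, legend_labels = ax.get_legend_handles_labels()
print(legend_labels)
```

All boxes of one call share a style, so a single handle per call is enough to identify the group.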

Proposed solution

We need a separate parameter. There are basically two options:

Variant 1) legend_label: str

pro: simple and clear, no interference with the existing API
con: inconsistent with the rest of the library: all other functions use label for the legend entry.

Variant 2) label: str

pro: consistent with the rest of the library
con: Having label alongside labels is quite confusing and easily leads to errors. (We could runtime-check for the type str vs list of str, to give helpful error messages but still ...)
If we want to go this way, we should rename labels to tick_labels or similar.
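The runtime check mentioned under Variant 2 could look roughly like the sketch below. check_label_args is a hypothetical helper written for illustration only; it is not Matplotlib code, and the error wording is an assumption.

```python
def check_label_args(label=None, labels=None):
    """Hypothetical check: label is one legend str, labels is a list of tick strs."""
    if label is not None and not isinstance(label, str):
        raise TypeError(
            "label must be a single str for the legend entry, got "
            f"{type(label).__name__}; did you mean labels=... for x-tick labels?")
    if isinstance(labels, str):
        raise TypeError(
            "labels must be a list of str with one x-tick label per box, got a "
            "single str; did you mean label=... for the legend entry?")
    return label, (None if labels is None else list(labels))


# Correct usage passes through unchanged.
print(check_label_args(label="fruit", labels=["a", "b"]))

# Swapping the two parameters produces a pointed error message.
try:
    check_label_args(label=["a", "b"])
except TypeError as exc:
    print(exc)
```

Even with such a check, the two names differ by a single character, which is the confusion the "con" above describes.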

@timhoffm
Member Author

Side note: We may later allow a list of labels if we want to ease labeling when styling individually, e.g. (currently not working code):

bplot = ax.boxplot(fruit_weights, patch_artist=True, legend_labels=labels)
for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)


@rcomer
Member

rcomer commented Feb 14, 2024

I'm unconvinced we would need to support passing multiple legend labels. If you are already looping through your artists to set the colours, you can set the labels at the same time. This works with v3.8.2:

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
fruit_weights = [
    np.random.normal(130, 10, size=100),
    np.random.normal(125, 20, size=100),
    np.random.normal(120, 30, size=100),
]
labels = ['peaches', 'oranges', 'tomatoes']
colors = ['peachpuff', 'orange', 'tomato']

fig, ax = plt.subplots()
ax.set_ylabel('fruit weight (g)')

bplot = ax.boxplot(fruit_weights,
                   patch_artist=True)  # fill with color

# fill with colors and add legend labels
for patch, color, label in zip(bplot['boxes'], colors, labels):
    patch.set_facecolor(color)
    patch.set_label(label)
    
ax.legend()
ax.get_xaxis().set_visible(False)

plt.show()


@timhoffm
Member Author

I'm unconvinced we would need to support passing multiple legend labels.

True. OTOH it doesn't cost us much. And it leaves the theoretical option open to add per-box coloring later (note that we've recently done this for bar; see https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_colors.html). Not that I have any concrete plans to do this, but it's not unreasonable.

What I actually wanted to say with this is that extensions in that direction are possible in a consistent way.

@story645
Member

Because of the x-tick relation of labels and its relation to individual boxes, this parameter is not suited for the legend labels.
If we want to go this way, we should rename labels to tick_labels or similar.

How bad would it be to do a shuffle labels->tick_labels then labels gets used for legend like everywhere else?

@timhoffm
Member Author

Note: There's currently labels (plural). If anything, the new one would be label (singular). So strictly there's no overlap.

But

  • if you don't rename labels, as said above having labels and label would be really confusing.
  • if you rename labels, I assume that will break about every boxplot. As soon as you have two boxes, labels is by far the simplest way to identify them.

@story645
Member

If anything, the new one would be label (singular).

Yeah, sorry, I missed that label is what gets used everywhere for the legend. Then variant 2 + rename makes sense. I assume the rename will go through a deprecation process to not break everything?

@timhoffm
Member Author

Disregarding the existing API and how we would migrate, would label + tick_labels be good?

  • yes in terms of API consistency
  • meh in terms of specificity: label is generic while tick_labels is specific. As long as you only have one label (like in plot), generic is OK. The alternative here would be to have two specific names: legend_label + tick_labels.

@rcomer
Member

rcomer commented Feb 15, 2024

Could we introduce tick_labels and legend_label and have a longer-than-usual deprecation period for labels?

@story645
Member

story645 commented Feb 15, 2024

I lean towards label>legend_label b/c API consistency >> name specificity.

@timhoffm
Member Author

timhoffm commented Feb 18, 2024

Note to self/ to whoever is interested (I haven't made up my mind on this yet):

Should positions also support str labels? In other functions we allow str as inputs on coordinate labels (e.g. x in bar()).
And if so, should we then remove the current labels? While a little less capable (one can currently use labels alongside positions to create str labeled boxes at arbitrary positions), the main use case for simple labeling would be covered.

@saranti
Contributor

saranti commented Feb 24, 2024

bar(h) also uses the tick_label and label semantics so this would be consistent with that.
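For reference, a small sketch of the bar() semantics referred to here: tick_label names the x-ticks while label feeds the legend. It assumes a Matplotlib release recent enough that bar() accepts a list for label (the per-bar variant mentioned earlier in the thread); the heights and colors are made-up values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fruits = ["peaches", "oranges", "tomatoes"]

fig, ax = plt.subplots()
ax.bar([1, 2, 3], [130, 125, 120],
       tick_label=["a", "b", "c"],   # text shown under each bar
       label=fruits,                 # one legend entry per bar
       color=["peachpuff", "orange", "tomato"])
ax.legend()

# Collect what the legend would show, without rendering anything.
_, bar_legend_labels = ax.get_legend_handles_labels()
print(bar_legend_labels)
```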

How bad would it be to do a shuffle labels->tick_labels then labels gets used for legend like everywhere else?

I have a solution almost ready for this using variant 2 where the labels are set this way:

labels = ['peaches', 'oranges', 'tomatoes']
ticklabels = ['a', 'b', 'c']

bplot = ax.boxplot(fruit_weights,
                   tick_labels=ticklabels, 
                   label=labels) # legend labels
ax.legend()

I can send in a PR as it is, unless you want to discuss this more (I don't want to create a bias for this variant.)

@timhoffm
Member Author

bar(h) also uses the tick_label and label semantics so this would be consistent with that.

This is a very good point. Let's go with that.

I have a solution almost ready for this using variant 2 where the labels are set this way:

labels = ['peaches', 'oranges', 'tomatoes']
ticklabels = ['a', 'b', 'c']

bplot = ax.boxplot(fruit_weights,
                   tick_labels=ticklabels, 
                   label=labels) # legend labels
ax.legend()

I can send in a PR as it is, unless you want to discuss this more (I don't want to create a bias for this variant.)

Go for it! Please note that label should accept str and list of str, as with bar().
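A version-tolerant sketch of the agreed semantics follows. It assumes the tick_labels and label keywords are available from Matplotlib 3.9 on (the milestone attached to this issue) and falls back to labeling the box artists by hand on older releases; the version threshold is an assumption taken from that milestone, not something verified here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(19680801)
fruit_weights = [rng.normal(130, 10, size=100),
                 rng.normal(125, 20, size=100),
                 rng.normal(120, 30, size=100)]
labels = ['peaches', 'oranges', 'tomatoes']
ticklabels = ['a', 'b', 'c']

fig, ax = plt.subplots()
if matplotlib.__version_info__[:2] >= (3, 9):
    # Assumed new API: tick_labels for the x-ticks, label for the legend.
    bplot = ax.boxplot(fruit_weights, patch_artist=True,
                       tick_labels=ticklabels, label=labels)
else:
    # Older releases: labels sets the x-ticks; legend labels are set manually.
    bplot = ax.boxplot(fruit_weights, patch_artist=True, labels=ticklabels)
    for patch, lbl in zip(bplot['boxes'], labels):
        patch.set_label(lbl)

ax.legend()
_, shown_labels = ax.get_legend_handles_labels()
print(shown_labels)
```

Passing a single str for label instead of a list is expected to give one legend entry for the whole boxplot() call, matching the original request.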

@QuLogic QuLogic added this to the v3.9.0 milestone Mar 21, 2024