Feature Request: Label Grouping and Multi-Level Axis Labels #6321
Comments
attn @story645 this roughly falls under the categorical umbrella. |
I wish this worked with pandas multiindices. |
possibly a dupe of #1257 |
Implement this as extension of formatters? Not just major/minor but arbitrary? |
I agree that some kind of nested formatters may be useful here. Also note that in the original example the ticks are drawn "halfway" between where two ticks would normally be drawn by matplotlib (but not the labels), also something to look at... In any case may also be useful for someone to do a small review of the API proposed by other tools (already mentioned: R, ROOT, Excel). |
https://stackoverflow.com/questions/52323446/graphing-a-multi-level-index-dataframe-with-pandas-seaborn is another such question on Stack Overflow. Simple things can be done in Excel but not easily in Matplotlib. |
Pull requests always welcome. FWIW, Excel was released 31 years ago, and is developed by a company with an operating budget on the order of a billion a year. Yet, Excel still can't contour a 2-D matrix of data. |
Setting aside the discussion comparing Matplotlib with data analysis software that has plotting capabilities, and their respective funding: I remember this request and some personal attempts to tackle it back when I opened it. I even remember exchanging some emails with @tacaswell about what a possible API layout might look like for this problem. Unfortunately I can't find them anymore. Since this has been a topic since 2012, here's what I remember about it, whether it's helpful or not.
In short: What I remember being the main problem is that it's easy to confuse the labeling problem with using the hierarchical properties of a data structure as a metadata source for your labels. Hierarchical labeling can be, or better said should be, a property of the labels and not of the plot type or the manner in which the plot was created. The crux of the problem is the API design. A review of existing APIs might help.
In long: If we commit to dictionaries, where keys can hold dictionaries with more keys under which data might live, as the structure from which plots will be made, then we are committing to two things:
To clear up what I meant by that - regardless of whether the plot is a line plot connecting dots, a histogram, or a bar plot, users should be able to organize their labels hierarchically. For the bar example the trivial dict then might look like:
Ok, seems easy enough to write a function that converts that to hierarchical keys and plots a bar graph.
Suddenly the same functionality for the bar graph is not applicable and somehow an adapter to a line plot is needed.
but then what would flatten the data to make a line plot? On the other hand, a possible solution is to completely decouple the label-making process from the actual plots, for example:
The users become responsible for the translation of their structures to MPL abstractions and in many cases this can be pretty tedious. There is also an underlying assumption that somehow all the ticks will be "consecutive", i.e. that the plots are made from data that are sorted on the hierarchy key such that the subgroups are always bunched together in the plots. It is not obvious to me that this assumption is given a-priori, even though it is definitely natural to expect it being fulfilled. Of course another solution would be to have a factory for the HierarchicalTicks where you would This is without considering potential pathologies with code that exists and does something like:
which people will likely find online and then try to jam the hierarchical ticks in. I don't mind implementing the solution personally*, but figuring out the use cases, test cases and the API design for this particular issue is not something I'm willing to tackle. What is still needed here is the same thing originally called for by @anntzer: "a small review of the API proposed by other tools (already mentioned: R, ROOT, Excel)." * the solution to actually displaying the ticks themselves does not seem trivial either; a nice, general, one would likely not be manual setting of Text instances in positions, but having a general Formatter for such text. |
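The decoupling discussed in the comment above can be sketched without touching any plotting code. This is a minimal illustration, not the commenter's original example (which did not survive in this thread): the function name `flatten_hierarchy`, the span format, and the sample data are all assumptions made up for this sketch.

```python
def flatten_hierarchy(nested):
    """Flatten {group: {leaf: value}} into leaf labels, values and group spans.

    Each span is (group_name, first_index, last_index) over the flat leaf
    positions. Groups come out consecutive by construction, which is the
    "sorted on the hierarchy key" assumption mentioned in the comment.
    """
    leaf_labels, values, spans = [], [], []
    for group, leaves in nested.items():
        if not leaves:
            continue  # an empty group has no positions to span
        start = len(values)
        for leaf, value in leaves.items():
            leaf_labels.append(leaf)
            values.append(value)
        spans.append((group, start, len(values) - 1))
    return leaf_labels, values, spans


# Hypothetical data, invented for illustration only.
data = {
    "Room A": {"Shelf 1": 3, "Shelf 2": 5},
    "Room B": {"Shelf 1": 2, "Shelf 2": 4, "Shelf 3": 1},
}
labels, values, spans = flatten_hierarchy(data)
```

Here `values` and `labels` are plain sequences that a bar plot or a line plot could consume equally well, while `spans` carries the hierarchy separately for whatever ends up drawing the second label level.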
A secondary axis would be a reasonable way to implement the hierarchy of ticks. But as you say, actually defining the hierarchy in the data is the problem. I'd suggest that a higher-level library like pandas, with well-defined data hierarchies, would be a better place to implement this. |
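As a rough sketch of the secondary-axis idea, the helper below stacks two secondary x-axes under an existing Axes: one carrying the group names and one carrying long unlabeled ticks at the group boundaries. It assumes Matplotlib 3.1 or later (for `Axes.secondary_xaxis` with a float location) and that the groups are contiguous `(name, first_index, last_index)` spans. The names `group_ticks`, `add_group_axis`, the offset and the sample data are hypothetical.

```python
def group_ticks(spans):
    """Return (centers, edges) for contiguous (name, first, last) spans."""
    centers = [(first + last) / 2 for _, first, last in spans]
    # one edge before the first group, then one after each group
    edges = [spans[0][1] - 0.5] + [last + 0.5 for _, _, last in spans]
    return centers, edges


def add_group_axis(ax, spans, offset=-0.12):
    """Draw group names and group boundaries below `ax`."""
    centers, edges = group_ticks(spans)
    # group names, centred under each group, with no tick marks
    names_ax = ax.secondary_xaxis(offset)
    names_ax.set_xticks(centers)
    names_ax.set_xticklabels([name for name, _, _ in spans])
    names_ax.tick_params(axis="x", length=0)
    names_ax.spines["bottom"].set_linewidth(0)
    # long, unlabeled ticks marking where one group ends and the next begins
    edges_ax = ax.secondary_xaxis(0)
    edges_ax.set_xticks(edges)
    edges_ax.tick_params(axis="x", length=30, labelbottom=False)
    edges_ax.spines["bottom"].set_linewidth(0)
    return names_ax, edges_ax


def demo():
    # imported here so the helpers above do not depend on pyplot
    import matplotlib.pyplot as plt

    spans = [("Room A", 0, 1), ("Room B", 2, 4)]  # made-up example data
    fig, ax = plt.subplots()
    ax.bar(range(5), [3, 5, 2, 4, 1])
    ax.set_xticks(range(5))
    ax.set_xticklabels(["Shelf 1", "Shelf 2", "Shelf 1", "Shelf 2", "Shelf 3"])
    add_group_axis(ax, spans)
    fig.subplots_adjust(bottom=0.2)
    plt.show()
```

Calling `demo()` shows the result. Note that the hierarchy still has to be handed over explicitly as `spans`, which is exactly the open problem this comment points at.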
I'm running into the same issue: district->county->A/B. I can plot each district separately, but I really wish there were an easy way to show everything in one figure. |
I just came across this issue when addressing this question on SO. I think solving the full problem is hard and the attempt I made on SO is probably not the way to go in the long run. But, I think a lower level and small and easy to implement step towards addressing the full problem (without having to solve the API issues discussed by @ljetibo ) would be providing a means to annotate a range of x or y values on the axes (in analogy to the existing capabilities to annotate a single data point). I imagine the annotation object to consist of a rectangle and two lines delimiting the start and end (i.e. the 'ticks') with the following attributes:
This would be fairly easy to implement, would immediately solve the base case, and could be a foundation on which to build a solution to the full problem. |
For example, the following solves the base case for annotating a range of y-values along the left spine and demonstrates how this solution could scale to the hierarchical case.
#!/usr/bin/env python
"""
Annotate a range of y-values.
"""
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
from matplotlib.patches import Rectangle
from matplotlib.lines import Line2D


def annotate_yrange(ymin, ymax,
                    label=None,
                    offset=-0.1,
                    width=-0.1,
                    ax=None,
                    patch_kwargs={'facecolor':'yellow'},
                    line_kwargs={'color':'black'},
                    text_kwargs={'rotation':'vertical'}
                    ):
    if ax is None:
        ax = plt.gca()

    # x-coordinates in axis coordinates,
    # y coordinates in data coordinates
    trans = transforms.blended_transform_factory(
        ax.transAxes, ax.transData)

    # a bar indicating the range of values
    rect = Rectangle((offset, ymin), width=width, height=ymax-ymin,
                     transform=trans, clip_on=False, **patch_kwargs)
    ax.add_patch(rect)

    # delimiters at the start and end of the range mimicking ticks
    min_delimiter = Line2D((offset+width, offset), (ymin, ymin),
                           transform=trans, clip_on=False, **line_kwargs)
    max_delimiter = Line2D((offset+width, offset), (ymax, ymax),
                           transform=trans, clip_on=False, **line_kwargs)
    ax.add_artist(min_delimiter)
    ax.add_artist(max_delimiter)

    # label
    if label:
        x = offset + 0.5 * width
        y = ymin + 0.5 * (ymax - ymin)
        # we need to fix the alignment as otherwise our choice of x
        # and y leads to unexpected results;
        # e.g. 'right' does not align with the minimum_delimiter
        ax.text(x, y, label,
                horizontalalignment='center', verticalalignment='center',
                clip_on=False, transform=trans, **text_kwargs)


def demo():
    fig, ax = plt.subplots(1, 1)
    # add some extra space for the annotations
    fig.subplots_adjust(left=0.2)
    annotate_yrange(0.1, 0.9, 'test', ax=ax)
    plt.show()


def demo_hierarchy():
    fig, ax = plt.subplots(1, 1)
    fig.subplots_adjust(left=0.3)
    width = -0.1
    offsets = [-0.1, -0.2]
    lower = [(0.1, 0.3), (0.3, 0.5), (0.5, 0.7), (0.7, 0.9)]
    upper = [(0.1, 0.5), (0.5, 0.9)]
    # one pass per hierarchy level, each drawn further left of the spine
    for ii, (level, offset) in enumerate(zip((lower, upper), offsets)):
        for jj, (ymin, ymax) in enumerate(level):
            annotate_yrange(ymin, ymax, f'test {ii}.{jj}',
                            offset=offset, width=width, ax=ax)
    plt.show()


if __name__ == '__main__':
    demo()
    demo_hierarchy()
|
This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help! |
This is still very much wanted, especially if we implement hierarchical visualizations like grouped-stacked bar charts #24313 |
Is there a compelling reason this would need to be in the core Matplotlib versus a third party package? |
Because one potential way of implementing this would be to generalize the core {minor/major} axis hierarchy/formatting/labeling/scaling we already have, which is going to be easier to do internally than as a patch. And because this is functionality that many tools built on top of us would like, so that they can build their own labeling, rather than a domain-specific request. If we can get agreement on a proposed approach, could this be doable as a GSoC project, or does it require too much understanding of the core API? |
Adding a second level of ticking is probably fine for core, and I agree would be easier if we did in Matplotlib. Adding how to automatically decide on that second level of ticking from the structure of data seems too domain specific. I'm not sure what the second level of ticking API interface would look like though, nor what is to stop us at two levels (versus 3, ..., versus N) |
1000% agree. I think Matplotlib should be providing something like the following:
    ax.{xyaxis}.set_arbitrary_formatter(name="", formatter object, level)
    ax.{xyaxis}.set_arbitrary_locator(name="", formatter object, level)
or don't even touch that level of API and do:
    ax.{x,y}axis.add_level(name, level)
    ax.{x,y}axis.level[f'{name}'].set_ticks/set_ticklabels/set_locator/set_formatter
where Well constrained flexibility basically. (And my advisor is the one who proposed this years ago 😓) |
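To make the shape of the `add_level` proposal concrete, here is a toy model of it in plain Python. Nothing below exists in Matplotlib: `LeveledAxis` and `TickLevel` are hypothetical stand-ins, and only the `add_level(name, level)` / `level[name].set_ticks(...)` spelling is taken from the comment above. The sketch only shows that named levels generalize the fixed major/minor pair to N levels. It does no drawing.

```python
class TickLevel:
    """One named level of ticks: positions plus the labels shown at them."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.ticks = []
        self.ticklabels = []

    def set_ticks(self, ticks, labels=None):
        ticks = list(ticks)
        labels = [str(t) for t in ticks] if labels is None else list(labels)
        if len(labels) != len(ticks):
            raise ValueError("need exactly one label per tick")
        self.ticks, self.ticklabels = ticks, labels


class LeveledAxis:
    """Hypothetical stand-in for an axis that holds N named tick levels."""

    def __init__(self):
        self.level = {}  # name -> TickLevel, mirroring `axis.level[name]`

    def add_level(self, name, level):
        if name in self.level:
            raise ValueError(f"level {name!r} already exists")
        self.level[name] = TickLevel(name, level)
        return self.level[name]

    def ordered_levels(self):
        """Levels sorted from innermost (lowest number) to outermost."""
        return sorted(self.level.values(), key=lambda lv: lv.level)


# Made-up example: two levels, registered out of order on purpose.
xaxis = LeveledAxis()
xaxis.add_level("room", 1).set_ticks([0.5, 3.0], labels=["Room A", "Room B"])
xaxis.add_level("shelf", 0).set_ticks(range(5))
```

A third or fourth level is just another `add_level` call, which is one possible answer to the "what is to stop us at two levels" question. How each level would be located, formatted and drawn is left open here, as it is in the thread.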
Even though users can deal with axis labels on a higher level through the Formatter and Text classes, Matplotlib's handling of Labels is still very flat. At best users can change the string formatting through the Formatter class and text formatting (font, size, rotation...) through the Text class.
Neither approach, however, offers users the ability to represent their data more accurately by using multiple label levels on the same plot in the way Excel, ROOT or R do.
In ROOT, axis instances can be plotted individually as objects, and each one can be assigned a new Label instance, which allows maximal flexibility for the user.
Excel and R handle this issue similarly: they use the data name in the legend instead of as an axis label, and then use the group of the data name as the axis label. In both cases the end result looks like:
Needless to say, the ROOT approach seems like overkill, and I'm more partial to the Excel and R approach. A workaround using hand-set text objects is provided in this StackOverflow question. The frequency of similar questions is the main motivation behind this request.