Feature Request: Label Grouping and Multi-Level Axis Labels #6321

Open
ghost opened this issue Apr 20, 2016 · 18 comments
Labels
Difficulty: Hard https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues keep Items to be ignored by the “Stale” Github Action New feature topic: categorical

Comments

@ghost

ghost commented Apr 20, 2016

Even though users can deal with axis labels on a higher level through the Formatter and Text classes, Matplotlib's handling of Labels is still very flat. At best users can change the string formatting through the Formatter class and text formatting (font, size, rotation...) through the Text class.

Neither approach, however, offers users the ability to represent their data more accurately by using multiple label levels on the same plot in the way Excel, Root or R do.

In ROOT, axis instances can be drawn individually as objects, and each one can be assigned its own Label instance, which gives the user maximal flexibility.

Excel and R handle this issue similarly: they use the data name in the legend instead of as axis labels, and then use the group of the data name as the axis labels. In both cases the end result looks like:

[Image "latex": example of the grouped axis labels described above]

Needless to say, the ROOT approach seems like overkill, and I'm more partial to the Excel and R approach. A workaround using hand-set text objects is provided in this StackOverflow question. The frequency of similar questions is the main motivation behind this request.

@jenshnielsen jenshnielsen added this to the 2.1 (next point release) milestone Apr 25, 2016
@tacaswell
Member

attn @story645 this roughly falls under the categorical umbrella.

@michalkahle

I wish this worked with pandas multiindices.

@story645
Member

possibly a dupe of #1257

@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Oct 3, 2017
@story645
Member

story645 commented Nov 1, 2017

Implement this as extension of formatters? Not just major/minor but arbitrary?

@anntzer
Contributor

anntzer commented Nov 24, 2017

I agree that some kind of nested formatters may be useful here. Also note that in the original example the ticks are drawn "halfway" between where two ticks would normally be drawn by matplotlib (but not the labels), also something to look at...

In any case, it may also be useful for someone to do a small review of the API proposed by other tools (already mentioned: R, ROOT, Excel).
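For reference, the existing major/minor mechanism already gives two label levels. The following is a minimal sketch of that technique, not a proposed API: the helper name `two_level_bar` and the sample data are made up for illustration, and it assumes every group is non-empty.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedFormatter, FixedLocator


def two_level_bar(groups):
    """Bar plot with item names on minor ticks and group names on major ticks.

    `groups` maps a group name to a dict of {item name: value}.
    (Hypothetical helper, for illustration only.)
    """
    positions, values, item_labels, group_centers = [], [], [], []
    for group_items in groups.values():
        start = len(positions)
        for name, value in group_items.items():
            positions.append(len(positions))
            values.append(value)
            item_labels.append(name)
        # centre of the run of bars belonging to this group
        group_centers.append((start + len(positions) - 1) / 2)

    fig, ax = plt.subplots()
    ax.bar(positions, values)

    # inner level: one minor tick per bar
    ax.xaxis.set_minor_locator(FixedLocator(positions))
    ax.xaxis.set_minor_formatter(FixedFormatter(item_labels))

    # outer level: one major tick per group, pushed below the item labels
    ax.xaxis.set_major_locator(FixedLocator(group_centers))
    ax.xaxis.set_major_formatter(FixedFormatter(list(groups)))
    ax.tick_params(axis="x", which="major", pad=20, length=0)
    return fig, ax


fig, ax = two_level_bar({"g1": {"sg1": 10, "sg2": 20},
                         "g2": {"sg3": 30, "sg4": 40}})
```

This stops at two levels, which is exactly the limitation discussed here. It also has a known wrinkle: when a group has an odd number of items, the group's major tick lands on a minor tick, and matplotlib by default hides the overlapping minor tick.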

@story645 story645 added the Difficulty: Hard https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues label Feb 5, 2018
@story645 story645 modified the milestones: v2.2, v3.0 Feb 6, 2018
@dstansby dstansby modified the milestones: v3.0, needs sorting Jul 26, 2018
@Khalilsqu

https://stackoverflow.com/questions/52323446/graphing-a-multi-level-index-dataframe-with-pandas-seaborn is another such question on StackOverflow. Simple things can be done in Excel but not easily in matplotlib.

@jklymak
Member

jklymak commented Sep 14, 2018

Pull requests always welcome.

FWIW, Excel was released 31 years ago, and is developed by a company with an operating budget on the order of a billion a year. Yet, Excel still can't contour a 2-D matrix of data.

@ghost
Author

ghost commented Sep 14, 2018

Setting aside any discussion comparing matplotlib with data analysis software that has plotting capabilities, and their respective funding: I remember this request and some personal attempts to tackle it back when I opened it. I even remember exchanging some emails with @tacaswell about what a possible API layout might look like for this problem. Unfortunately I can't find them anymore. Since this has been a topic since 2012, here's what I remember about it, whether it's helpful or not.

In short: What I remember being the main problem is that it's easy to confuse the labeling problem with using the hierarchical properties of a data structure as a metadata source for your labels. Hierarchical labeling can be, or rather should be, a property of the labels and not of the plot type or the manner in which the plot was created. The crux of the problem is the API design. A review of existing APIs might help.

In long: If we go about committing to dictionaries, where keys can have dictionaries with more keys under which data might live in, as the structure from which plots will be made then we are committing to two things:

  • translating various data structures capable of storing hierarchical datasets to dicts
  • writing adapters to all currently existing plot types

To clear up what I meant by that - regardless of whether the plot is a line plot connecting dots, a histogram, or a bar plot, users should be able to organize their labels hierarchically. For the bar example the trivial dict then might look like:

{"g1": {"sg1": 10,
        "sg2": 20},
 "g2": {"sg3": 30,
        "sg4": 40},
 # ...
}

Ok, seems easy enough to write a function that converts that to hierarchical keys and plots a bar graph.
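As a rough illustration of that conversion step (a sketch only; `flatten_groups` is a hypothetical name, not an existing Matplotlib function), the nested dict can be flattened into bar labels, bar values, and the index range each group covers:

```python
def flatten_groups(nested):
    """Flatten {group: {name: value}} into parallel lists plus group spans.

    Returns (labels, values, spans), where spans maps each group name to the
    (first, last) index of its items in the flattened lists.
    """
    labels, values, spans = [], [], {}
    for group, items in nested.items():
        if not items:
            continue  # an empty group has no bars to span
        start = len(labels)
        for name, value in items.items():
            labels.append(name)
            values.append(value)
        spans[group] = (start, len(labels) - 1)
    return labels, values, spans


data = {"g1": {"sg1": 10, "sg2": 20}, "g2": {"sg3": 30, "sg4": 40}}
labels, values, spans = flatten_groups(data)
```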
But if, for example, the users want a line plot of stock price behavior over time with quarters as the hierarchical group, the timestamp labels would trivially be organized as

{"q1": {"timestamps": [t1, t2, t3, ...],
        "stockprices": [p1, p2, p3, ...]},
 "q2": {"timestamps": [...],
        "stockprices": [...]},
 # ...
}

Suddenly the same functionality used for the bar graph is not applicable, and somehow an adapter to a line plot is needed.
Of course, each timestamp could be organized as an individual subgroup of a quarter

{"q1": {t1: p1,
        t2: p2,
        # ...
        },
 # ...
}

but then what would flatten the data to make a line plot?
The same points stand for plots made by the histogram functionality.

On the other hand, a possible solution is to completely decouple the label-making process from the actual plots, for example:

timestamps = [t1, t2, t3, ...]
stockprices = [p1, p2, p3, ...]
tick_hierarchy = {"q1": timestamps[:10],
                  "q2": timestamps[10:],
                  # ...
                  }
# plt.HierarchicalTicks is the proposed API; it does not exist in Matplotlib
hierarchTicks = plt.HierarchicalTicks(tick_hierarchy)

The users become responsible for translating their structures to MPL abstractions, and in many cases this can be pretty tedious. There is also an underlying assumption that all the ticks will be "consecutive", i.e. that the plots are made from data sorted on the hierarchy key, so that the subgroups are always bunched together in the plots. It is not obvious to me that this assumption holds a priori, even though it is definitely natural to expect it to be fulfilled.
In this case, with 3rd-party hierarchical data structures such as Pandas DataFrames, the timestamps, stockprices and tick_hierarchy have to be created by slicing the data. Are we confident this is guaranteed by these data structures? Will columnar value slicing of DataFrames produce data in the required order? How about formats like netCDF or HDF5? Don't misunderstand me, they may well do so; I just don't know. The point I'm making is that if they don't, then nobody will use this functionality, because it would require so much looping and manual translation.
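The "consecutive" assumption can at least be checked cheaply before any labels are built. Below is a small sketch, assuming the caller can supply one group key per data point in plotting order; `contiguous_spans` is a made-up helper name, not part of any library.

```python
from itertools import groupby


def contiguous_spans(group_keys):
    """Return [(key, first_index, last_index), ...] for runs of equal keys.

    Raises ValueError if a key shows up in two separate runs, i.e. if the
    data are not sorted/bunched on the hierarchy key.
    """
    spans, seen, index = [], set(), 0
    for key, run in groupby(group_keys):
        length = sum(1 for _ in run)
        if key in seen:
            raise ValueError(f"group {key!r} is not contiguous")
        seen.add(key)
        spans.append((key, index, index + length - 1))
        index += length
    return spans


spans = contiguous_spans(["q1", "q1", "q1", "q2", "q2"])
```

Failing loudly on non-contiguous keys is one possible design choice; another would be to sort the data first, but that changes the plotted order and so is arguably the caller's decision.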

Of course another solution would be to have a factory for the HierarchicalTicks where you would add_group_levels, add_group("", level), associate_data and other monstrosities.

This is without considering potential pathologies with code that exists and does something like:

import matplotlib.pyplot as plt

a = [1, 2, 3]
b = ["a", "b", "c"]
plt.bar(a, a, label="a")
plt.xticks(a, b)  # flat tick labels set by hand

which people will likely find online and then try to jam the hierarchical ticks in.

I don't mind implementing the solution personally*, but figuring out the use cases, test cases and the API design for this particular issue is not something I'm willing to tackle. What is still needed here is the same thing originally called for by @anntzer: "a small review of the API proposed by other tools (already mentioned: R, ROOT, Excel)."

* the solution to actually displaying the ticks themselves does not seem trivial either; a nice, general, one would likely not be manual setting of Text instances in positions, but having a general Formatter for such text.

@jklymak
Member

jklymak commented Sep 14, 2018

A secondary axis would be a reasonable way to implement the hierarchy of ticks.

But as you say actually defining the hierarchy in the data is the problem. I’d suggest a higher level library like pandas with well defined data hierarchies would be a better place to implement this.
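A minimal sketch of the secondary-axis idea, using `Axes.secondary_xaxis` (available since Matplotlib 3.1). The data, group names, and tick positions are invented for illustration, and the leading newline in the group labels is just one way to push them below the first level.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1, 2, 3], [10, 20, 30, 40])

# first level: the ordinary tick labels, one per bar
ax.set_xticks([0, 1, 2, 3])
ax.set_xticklabels(["sg1", "sg2", "sg3", "sg4"])

# second level: a secondary x-axis placed on the bottom spine (location=0
# is in axes coordinates), with one tick at the centre of each group
sec = ax.secondary_xaxis(location=0)
sec.set_xticks([0.5, 2.5])
sec.set_xticklabels(["\ng1", "\ng2"])  # newline moves the text one row down
sec.tick_params(axis="x", length=0)    # hide the tick marks, keep the labels
```

The group positions still have to be computed by the caller, which is the data-hierarchy problem described above; the secondary axis only takes care of drawing.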

@Jingxiang6

I'm running into the same issue: district->county->A/B. I can plot each district separately, but I really wish there were an easy way to show everything in one figure.

@paulbrodersen

paulbrodersen commented Nov 19, 2019

I just came across this issue when addressing this question on SO. I think solving the full problem is hard and the attempt I made on SO is probably not the way to go in the long run.

But, I think a lower level and small and easy to implement step towards addressing the full problem (without having to solve the API issues discussed by @ljetibo ) would be providing a means to annotate a range of x or y values on the axes (in analogy to the existing capabilities to annotate a single data point). I imagine the annotation object to consist of a rectangle and two lines delimiting the start and end (i.e. the 'ticks') with the following attributes:

  • text
  • min, max (in data coordinates)
  • position ('left', 'bottom', 'right', 'top', 'x'=='bottom', 'y'=='left')
  • offset (in axis coordinates)
  • width (in axis coordinates)
  • facecolor (default None)
  • (face-)alpha
  • linecolor (default black)
  • linewidth (default tick width)
  • text properties (font, fontsize, alignment, orientation, etc)

This would be fairly easy to implement, would immediately solve the base case, and could be a foundation on which to build a solution to the full problem.
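The attribute list above could be captured in a plain container along these lines. This is only a sketch: the class name `RangeAnnotation`, the field names `vmin`/`vmax`, and the default values are assumptions, not an existing Matplotlib class.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RangeAnnotation:
    """Hypothetical description of an annotated range along one axis."""
    text: str
    vmin: float                      # start of the range, data coordinates
    vmax: float                      # end of the range, data coordinates
    position: str = "bottom"         # 'left', 'bottom', 'right', 'top', 'x', 'y'
    offset: float = -0.1             # axes coordinates
    width: float = -0.1              # axes coordinates
    facecolor: Optional[str] = None
    alpha: float = 1.0
    linecolor: str = "black"
    linewidth: Optional[float] = None  # None stands for "use the tick width"
    text_kwargs: dict = field(default_factory=dict)

    def __post_init__(self):
        # 'x' and 'y' are aliases for 'bottom' and 'left'
        self.position = {"x": "bottom", "y": "left"}.get(self.position,
                                                         self.position)
        if self.position not in {"left", "bottom", "right", "top"}:
            raise ValueError(f"unknown position {self.position!r}")
        if self.vmin > self.vmax:
            raise ValueError("vmin must not exceed vmax")

    @property
    def center(self):
        """Midpoint of the range, where the label text would be placed."""
        return 0.5 * (self.vmin + self.vmax)


quarter = RangeAnnotation("q1", 0.0, 3.0, position="x")
```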

@paulbrodersen

paulbrodersen commented Nov 19, 2019

For example, the following solves the base case for annotating a range of y-values along the left spine and demonstrates how this solution could scale to the hierarchical case.

[Image: Figure_1]

[Image: Figure_2]

#!/usr/bin/env python
"""
Annotate a range of y-values.
"""

import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

from matplotlib.patches import Rectangle
from matplotlib.lines import Line2D

def annotate_yrange(ymin, ymax,
                    label=None,
                    offset=-0.1,
                    width=-0.1,
                    ax=None,
                    patch_kwargs={'facecolor':'yellow'},
                    line_kwargs={'color':'black'},
                    text_kwargs={'rotation':'vertical'}
):
    if ax is None:
        ax = plt.gca()

    # x-coordinates in axis coordinates,
    # y coordinates in data coordinates
    trans = transforms.blended_transform_factory(
        ax.transAxes, ax.transData)

    # a bar indicating the range of values
    rect = Rectangle((offset, ymin), width=width, height=ymax-ymin,
                     transform=trans, clip_on=False, **patch_kwargs)
    ax.add_patch(rect)

    # delimiters at the start and end of the range mimicking ticks
    min_delimiter = Line2D((offset+width, offset), (ymin, ymin),
                           transform=trans, clip_on=False, **line_kwargs)
    max_delimiter = Line2D((offset+width, offset), (ymax, ymax),
                           transform=trans, clip_on=False, **line_kwargs)
    ax.add_artist(min_delimiter)
    ax.add_artist(max_delimiter)

    # label
    if label:
        x = offset + 0.5 * width
        y = ymin + 0.5 * (ymax - ymin)
        # we need to fix the alignment as otherwise our choice of x
        # and y leads to unexpected results;
        # e.g. 'right' does not align with the minimum_delimiter
        ax.text(x, y, label,
                horizontalalignment='center', verticalalignment='center',
                clip_on=False, transform=trans, **text_kwargs)


def demo():
    fig, ax = plt.subplots(1, 1)

    # add some extra space for the annotations
    fig.subplots_adjust(left=0.2)

    annotate_yrange(0.1, 0.9, 'test')

    plt.show()


def demo_hierarchy():
    fig, ax = plt.subplots(1, 1)
    fig.subplots_adjust(left=0.3)

    width   = -0.1
    offsets = [-0.1, -0.2]
    lower   = [(0.1, 0.3), (0.3, 0.5), (0.5, 0.7), (0.7, 0.9)]
    upper   = [(0.1,             0.5), (0.5,             0.9)]

    for ii, (level, offset) in enumerate(zip((lower, upper), offsets)):
        for jj, (ymin, ymax) in enumerate(level):
            annotate_yrange(ymin, ymax, f'test {ii}.{jj}', offset=offset, width=width)

    plt.show()


if __name__ == '__main__':

    demo()
    demo_hierarchy()

@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Mar 24, 2023
@story645 story645 added the keep Items to be ignored by the “Stale” Github Action label Mar 24, 2023
@story645 story645 removed the status: inactive Marked by the “Stale” Github Action label Mar 24, 2023
@story645
Member

This is still very much wanted, especially if we implement hierarchical visualizations like grouped-stacked bar charts #24313

@jklymak
Member

jklymak commented Mar 24, 2023

Is there a compelling reason this would need to be in the core Matplotlib versus a third party package?

@story645
Member

story645 commented Nov 29, 2023

> Is there a compelling reason this would need to be in the core Matplotlib versus a third party package?

Because one potential way of implementing this would be to generalize the core {minor/major} axis hierarchy/formatting/labeling/scaling we already have, which is going to be easier to do internally than as a patch. And because this is functionality that many tools built on top of us would like, so that they can build their own labeling, rather than a domain-specific request. If we can get agreement on a proposed approach, could this be doable as a GSoC project, or does it require too much understanding of the core API?

@jklymak
Member

jklymak commented Nov 29, 2023

Adding a second level of ticking is probably fine for core, and I agree it would be easier if we did it in Matplotlib. Adding how to automatically decide on that second level of ticking from the structure of data seems too domain specific.

I'm not sure what the API for a second level of ticking would look like, though, nor what is to stop us at two levels (versus 3, ..., versus N).

@story645
Member

story645 commented Nov 29, 2023

> Adding how to automatically decide on that second level of ticking from the structure of data seems too domain specific.

1000% agree. I think Matplotlib should be providing something like the following:

ax.{x,y}axis.set_arbitrary_formatter(name="", formatter object, level)
ax.{x,y}axis.set_arbitrary_locator(name="", locator object, level)

or don't even touch that level of api and do:

ax.{x,y}axis.add_level(name, level) 
ax.{x,y}axis.level[f'{name}'].set_ticks/set_ticklabels/set_locator/set_formatter

where major and minor are our already defined special cases. And maybe in this case require tick width/height and fontsize/position or compute dynamically relative to a level 0='major', level -1 = 'minor'

Well constrained flexibility basically. (And my advisor is the one who proposed this years ago 😓)
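To make the second variant a bit more concrete, here is a pure-Python sketch of the bookkeeping such an `add_level` API might need. Nothing here exists in Matplotlib: the class `AxisLevels` and its methods are hypothetical, and only the convention that 'major' is level 0 and 'minor' is level -1 is taken from the comment above.

```python
class AxisLevels:
    """Hypothetical registry of named tick levels for a single axis."""

    def __init__(self):
        self._levels = {}
        # the two levels Matplotlib already has, as special cases
        self.add_level("major", 0)
        self.add_level("minor", -1)

    def add_level(self, name, level):
        """Register a new named level; names and level numbers are unique."""
        if name in self._levels:
            raise ValueError(f"level name {name!r} already exists")
        if any(entry["level"] == level for entry in self._levels.values()):
            raise ValueError(f"level number {level} already in use")
        self._levels[name] = {"level": level, "ticks": [], "labels": []}

    def set_ticks(self, name, ticks, labels):
        """Store tick positions and their labels for one level."""
        ticks, labels = list(ticks), list(labels)
        if len(ticks) != len(labels):
            raise ValueError("ticks and labels must have the same length")
        self._levels[name]["ticks"] = ticks
        self._levels[name]["labels"] = labels

    def ordered(self):
        """Level names from innermost (lowest number) to outermost."""
        return sorted(self._levels, key=lambda n: self._levels[n]["level"])


levels = AxisLevels()
levels.add_level("quarter", 1)
levels.set_ticks("quarter", [1.5, 5.5], ["q1", "q2"])
```

Ordering by level number is what would let tick length, font size, or label offset be computed relative to the major level, as suggested above; how the levels would actually be drawn is left open here.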


10 participants