Skip to content

DOC: add units to user/explain [ci doc] #26969

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 12, 2023
Merged

Conversation

jklymak
Copy link
Member

@jklymak jklymak commented Oct 1, 2023

PR summary

PR checklist

@jklymak jklymak force-pushed the doc-units branch 2 times, most recently from dcd3f05 to 3df4d95 Compare October 1, 2023 23:06
Copy link
Member

@story645 story645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+- my usual concern that it doesn't step through the code enough to really break down the concepts for someone who is new to Matplotlib

ETA: positives: I think the intent is spot on - I agree that the user guide should provide a brief overview of what Matplotib means by units & where they show up, and here's the key things you may need to know if the units are misbehaving so you can debug things. And I do think that's all in here, which is awesome.

I just also think the guide would be more helpful if the signposting was a bit cleaner. Like instead of burying in a comment/prose "units sometimes mess up this way", add a section subtitle "inspecting the conversion", and if the structure was a really consistent intro, debug - which is why I think that last section is throwing me for the loop, it's breaking the structure of the rest of the page.

Here we briefly overview the built-in date and string converters.

Date conversion
===============
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can datetime be put near dates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear what you mean here. The first sentence notes datetime and datetime64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 217/219 -> batch all the datetime stuff together

@@ -0,0 +1,278 @@
"""
.. _user_axes_units:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno that it's untuitive that this is in axes-I had a really hard time finding it in the TOC - I know that the implementation is on the axes, but using it feels more akin to the first box of more general functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first box is the quick start guide. This support could/should be mentioned there (if its not already), but no where near in this detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but no where near in this detail

I honestly think this guide is trying to do too many things, which is why I'm harping on more signposting please.
I think the general units stuff should be a sentence or so here and fleshed out in the units tutorial that I think @ksunden is working on. Which that's also on second thought where I'd put the info about convertors and formatters and stuff. That leaves two or three examples of dates and categories for the quick start guide. Basically I think folks generally shouldn't need loads of detail for something we're in theory providing as an automagic unless they're trying to make their own.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also to my original point, I don';t think folks would think to look for it in Axes cause this is a data thing and everything else is layout or labeling
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general units stuff should be a sentence or so here and fleshed out in the units tutorial that I think

There is a sentence at the top that says:

The method to add converters to Matplotlib is described in `matplotlib.units`.
Here we briefly overview the built-in date and string converters.

The final section is because people are quite likely to see more general units support in other libraries or even in our examples. I would like to keep it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don';t think folks would think to look for it in Axes cause this is a data thing and everything else is layout or labeling

I disagree that this is not about labelling - that is the major advantage of unit support, providing useful tick locators and formatters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is the major advantage of unit support, providing useful tick locators and formatters.

Yes, but using unitid data hands over control of locators and formatters to the unit implementation. The users interaction with the unit framework is in the data they plot, not in the ticks/labels (annotations) they set. It's analogous to colormapping, which is also mapping data to the unit (color) that the heatmap visualizations can work with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but using unitid data hands over control of locators and formatters to the unit implementation. The users interaction with the unit framework is in the data they plot, not in the ticks/labels (annotations) they set.

After passing unitful data to the plotting routines, which the user doesn't control, the only way to interact with the units machinery is via tick locators, formatters, and axis limits. Indeed these are the things that cause confusion, and the main motivation for this section. eg. user uses mplfinances converter which ignores weekends, and then tries to use our Locators and Formatters and get nonsense results. I'd say a lot of people who use datetimes end up setting the Locator and customizing the Formatters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say a lot of people who use datetimes end up setting the Locator and customizing the Formatter

But we have documentation to explain the datetime formatters/convertors. They literally can't use any other locator/formatter if they're using categorical or everything will break.

Copy link
Member

@story645 story645 Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which actually, I would support this document focusing on datetime (+ probably consolidating or linking out to some of the other datetime guides) and the general discussion being in what's currently called the quick start guide (and I think should be transitioned into overview).

Comment on lines 259 to 260
# General unit support
# ====================
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section at the bottom here is sort of distracting - it's repeating a lot of the information from the top, which is where I expect this section to be -> introduce the topic, then show the two specifics that are implemented. I think the most important bits here should be pulled to the top and this section go away. Most important bits being:

  1. mpl maintains a registry of units
  2. external libraries can register their convertors
  3. to see a list, type: print(munits.registery.keys())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

99.9% of users never need to encounter how the unit support takes place or the idea of the registry. I think explaining it at the top is needlessly distracting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think explaining it at the top is needlessly distracting.

And I think jumping into "we handle dates and times differently" without a sentence first grounding why is disorienting. Like this whole document is giving all these tricks for debugging units, but then you wait til the bottom to explain what units are.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also, I think this section can be more or less integrated with the first paragraph and end up at roughly the same length.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole document is giving all these tricks for debugging units, but then you wait til the bottom to explain what units are.

The document I submitted does not talk about debugging units until the clearly labelled subsection "Determine converter, formatter, and locator on an axis".

And I think jumping into "we handle dates and times differently" without a sentence first grounding why is disorienting.

The first two paragraphs are about "why".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

he document I submitted does not talk about debugging units

implicitly it does when you describe the expected behavior of datetime and categoricals - they work like all the other functions, so the only reason to tell me is so that when it behaves weird I can go to this doc that tells me "no, it's supposed to behave like normal"

The first two paragraphs are about "why".

No, the first two paragraphs are about "what" Matplotlib does with data. But also I like this section as a standalone and am ok enough w/ the bookending that I can waive this objection.

Comment on lines +197 to +207
# The category axes are helpful for some plot types, but can lead to confusion
# if data is read in as a list of strings, even if it is meant to be a list of
# floats or dates. This sometimes happens when reading comma-separated value
# (CSV) files. The categorical locator and formatter will put a tick at every
# string value and label each one as well:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already an FAQ entry, can it please only be in one place?

Suggested change
# The category axes are helpful for some plot types, but can lead to confusion
# if data is read in as a list of strings, even if it is meant to be a list of
# floats or dates. This sometimes happens when reading comma-separated value
# (CSV) files. The categorical locator and formatter will put a tick at every
# string value and label each one as well:
# The category axes are helpful for some plot types, but can lead to confusion
# if data is read in as a list of strings, even if it is meant to be a list of
# floats or dates. This sometimes happens when reading comma-separated value
# (CSV) files. The categorical locator and formatter will put a tick at every
# string value and label each one as well:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It comes up naturally here as well. I don't see the harm in discussing it here in context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means maintaining two sets of documentation, which introduces the documentation maintenance cost of keeping them in sync. I think it's fine to have here, but either this should link to the trouble shooting entry, or (my preference honestly), the troubleshooting reference should be deleted and point folks here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need both - this is a super-common problem. I don't think the duplication is at all a maintenance burden.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the duplication is at all a maintenance burden.

It means that if there's a change to the unit framework that changes this behavior, there are 2 places this information needs to be updated. And there are 2 places to link folks when they ask "hey, what should I do here?", which is the doc equivalent of the API complaint we get all the time about having way too many ways to do the same thing. And we've been working at that on the API side & trying to be more explicit about when to reach for which function & same here - is there a reason that the FAQ needs to carry around a second version of this troubleshooting rather than linking here?

An added bonus of linking the FAQ to this document is that then the user is pointed to it & if they're having trouble with datetimes they can get more context here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood your point the first time. I simply disagree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll do a follow up PR where I delete the FAQ entry and link here.
https://en.wikipedia.org/wiki/Single_source_of_truth

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Jody, there the small harm in discussing this in multiple places is out weighed by making any given section readable on its own.

Copy link
Member

@story645 story645 Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would linking the FAQ entry back to this guide harm readability? the semi-consensus in #17920 was to transitioning the FAQ into more of an alternative TOC, and linking the FAQ entry back into the guide is inline w/ that consensus.

@jklymak jklymak force-pushed the doc-units branch 3 times, most recently from 1817f3b to b76e921 Compare October 3, 2023 14:27
@jklymak jklymak marked this pull request as ready for review October 3, 2023 17:19

# 0 gets labeled as "apple"
ax.plot(0, 0, 'd', color='C1')
ax.text(0, 0, ' Float x=0', rotation=45, color='C1')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ax.text(0, 0, ' Float x=0', rotation=45, color='C1')
ax.text(0, 0, ' Float x=0', rotation=45, color='C1', bbox={'color': 'gray', 'alpha': .2, 'boxstyle': 'round'})

I would stick a bounding box on all of these. The thin orange text is really hard to read against the blue/white background.

@tacaswell tacaswell added this to the v3.9.0 milestone Oct 3, 2023
Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The review metric on documentation PR is "does it make things better" and this clearly clears that bar.

I have one suggestion for formatting the "converted value" text labels.

@story645
Copy link
Member

story645 commented Oct 4, 2023

The review metric on documentation PR is "does it make things better"

There's no set project standard for better because there's no coordination on the docs which is leading to this stuff gets added and then consolidated/removed cycle that's causing frustration for the folks who want to add more stuff and for the folks who think the priority should be organizing what we have. And doing both clearly isn't working.

Copy link
Member

@ksunden ksunden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall 👍, a few comments here and there. I'd be okay with those being followups if we want to get this in, but I'll hold off on hitting the green button to give you a chance to consider them.

Comment on lines +15 to +17
converter" exists for the data type. Matplotlib has two built-in converters,
one for dates and the other for lists of strings. Other downstream libraries
have their own converters to handle their data types.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically there is also one for decimal.Decimal, though admittedly it is just calling float(val).

(This is also included in the printout at the bottom of the page)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thats such a weird edge case, I'd prefer not to mention it. I assume few people use this.

Comment on lines 278 to 279
for k in munits.registry:
print(f"type: {k};\n converter: {munits.registry[k]}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for k in munits.registry:
print(f"type: {k};\n converter: {munits.registry[k]}")
for k, v in munits.registry.items():
print(f"type: {k};\n converter: {type(v)}")

The repr of these clases is not super useful and so I think dropping at least the object at 0x1234ABCD will improve readability.

Could go even further and do type(...).__name__, though that loses the namespacing, and don't want to complicate the code to get __module__.__name__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - I admit I didn't figure out an easy way to just say the type, which really should have been more obvious to me ;-)

Comment on lines 283 to 285
# Downstream libraries like `pandas <https://pandas.pydata.org>`_,
# `astropy <https://www.astropy.org>`_, `pint <https://pint.readthedocs.io>`_
# and others supply their own converters that can be used with Matplotlib.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see expanding this into a couple subheadings, e.g:

The following downstream libraries provide support for physical units integrated with matplotlib:

  • astropy
  • pint
  • unyt

The following provide support for additional date/time formats:

  • pandas
  • nc-time-axis (used by xarray)

I don't want to get into a comprehensive list, but a little more context as to what is provided by each popular downstream may be helpful to a reader who comes to this page.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to

# There are a number of downstream libraries that provide their own converters
# with locators and formatters.  Physical unit support is provided by
# `astropy <https://www.astropy.org>`_, `pint <https://pint.readthedocs.io>`_, and
# `unyt <https://unyt.readthedocs.io>`_, among others.
#
# High level libraries like `pandas <https://pandas.pydata.org>`_ and
# `nc-time-axis <https://nc-time-axis.readthedocs.io>`_ (and thus
# `xarray <https://docs.xarray.dev>`_) provide their own datetime support.
# This support can sometimes be incompatible with Matplotlib native datetime
# support, so care should be taken when using Matplotlib locators and
# formatters if these libraries are being used.

@jklymak jklymak marked this pull request as draft October 4, 2023 18:08
@jklymak jklymak marked this pull request as ready for review October 4, 2023 22:24
@ksunden ksunden merged commit cf0116b into matplotlib:main Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants