-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
DOC: add units to user/explain [ci doc] #26969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dcd3f05
to
3df4d95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+- my usual concern that it doesn't step through the code enough to really break down the concepts for someone who is new to Matplotlib
ETA: positives: I think the intent is spot on - I agree that the user guide should provide a brief overview of what Matplotib means by units & where they show up, and here's the key things you may need to know if the units are misbehaving so you can debug things. And I do think that's all in here, which is awesome.
I just also think the guide would be more helpful if the signposting was a bit cleaner. Like instead of burying in a comment/prose "units sometimes mess up this way", add a section subtitle "inspecting the conversion", and if the structure was a really consistent intro, debug - which is why I think that last section is throwing me for the loop, it's breaking the structure of the rest of the page.
Here we briefly overview the built-in date and string converters. | ||
|
||
Date conversion | ||
=============== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can datetime be put near dates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear what you mean here. The first sentence notes datetime
and datetime64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 217/219 -> batch all the datetime stuff together
@@ -0,0 +1,278 @@ | |||
""" | |||
.. _user_axes_units: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno that it's untuitive that this is in axes-I had a really hard time finding it in the TOC - I know that the implementation is on the axes, but using it feels more akin to the first box of more general functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first box is the quick start guide. This support could/should be mentioned there (if its not already), but no where near in this detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but no where near in this detail
I honestly think this guide is trying to do too many things, which is why I'm harping on more signposting please.
I think the general units stuff should be a sentence or so here and fleshed out in the units tutorial that I think @ksunden is working on. Which that's also on second thought where I'd put the info about convertors and formatters and stuff. That leaves two or three examples of dates and categories for the quick start guide. Basically I think folks generally shouldn't need loads of detail for something we're in theory providing as an automagic unless they're trying to make their own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the general units stuff should be a sentence or so here and fleshed out in the units tutorial that I think
There is a sentence at the top that says:
The method to add converters to Matplotlib is described in `matplotlib.units`.
Here we briefly overview the built-in date and string converters.
The final section is because people are quite likely to see more general units support in other libraries or even in our examples. I would like to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don';t think folks would think to look for it in Axes cause this is a data thing and everything else is layout or labeling
I disagree that this is not about labelling - that is the major advantage of unit support, providing useful tick locators and formatters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is the major advantage of unit support, providing useful tick locators and formatters.
Yes, but using unitid data hands over control of locators and formatters to the unit implementation. The users interaction with the unit framework is in the data they plot, not in the ticks/labels (annotations) they set. It's analogous to colormapping, which is also mapping data to the unit (color) that the heatmap visualizations can work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but using unitid data hands over control of locators and formatters to the unit implementation. The users interaction with the unit framework is in the data they plot, not in the ticks/labels (annotations) they set.
After passing unitful data to the plotting routines, which the user doesn't control, the only way to interact with the units machinery is via tick locators, formatters, and axis limits. Indeed these are the things that cause confusion, and the main motivation for this section. eg. user uses mplfinances converter which ignores weekends, and then tries to use our Locators and Formatters and get nonsense results. I'd say a lot of people who use datetimes end up setting the Locator and customizing the Formatters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say a lot of people who use datetimes end up setting the Locator and customizing the Formatter
But we have documentation to explain the datetime formatters/convertors. They literally can't use any other locator/formatter if they're using categorical or everything will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which actually, I would support this document focusing on datetime (+ probably consolidating or linking out to some of the other datetime guides) and the general discussion being in what's currently called the quick start guide (and I think should be transitioned into overview).
# General unit support | ||
# ==================== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section at the bottom here is sort of distracting - it's repeating a lot of the information from the top, which is where I expect this section to be -> introduce the topic, then show the two specifics that are implemented. I think the most important bits here should be pulled to the top and this section go away. Most important bits being:
- mpl maintains a registry of units
- external libraries can register their convertors
- to see a list, type: print(munits.registery.keys())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
99.9% of users never need to encounter how the unit support takes place or the idea of the registry. I think explaining it at the top is needlessly distracting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think explaining it at the top is needlessly distracting.
And I think jumping into "we handle dates and times differently" without a sentence first grounding why is disorienting. Like this whole document is giving all these tricks for debugging units, but then you wait til the bottom to explain what units are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also, I think this section can be more or less integrated with the first paragraph and end up at roughly the same length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole document is giving all these tricks for debugging units, but then you wait til the bottom to explain what units are.
The document I submitted does not talk about debugging units until the clearly labelled subsection "Determine converter, formatter, and locator on an axis".
And I think jumping into "we handle dates and times differently" without a sentence first grounding why is disorienting.
The first two paragraphs are about "why".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
he document I submitted does not talk about debugging units
implicitly it does when you describe the expected behavior of datetime and categoricals - they work like all the other functions, so the only reason to tell me is so that when it behaves weird I can go to this doc that tells me "no, it's supposed to behave like normal"
The first two paragraphs are about "why".
No, the first two paragraphs are about "what" Matplotlib does with data. But also I like this section as a standalone and am ok enough w/ the bookending that I can waive this objection.
# The category axes are helpful for some plot types, but can lead to confusion | ||
# if data is read in as a list of strings, even if it is meant to be a list of | ||
# floats or dates. This sometimes happens when reading comma-separated value | ||
# (CSV) files. The categorical locator and formatter will put a tick at every | ||
# string value and label each one as well: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already an FAQ entry, can it please only be in one place?
# The category axes are helpful for some plot types, but can lead to confusion | |
# if data is read in as a list of strings, even if it is meant to be a list of | |
# floats or dates. This sometimes happens when reading comma-separated value | |
# (CSV) files. The categorical locator and formatter will put a tick at every | |
# string value and label each one as well: | |
# The category axes are helpful for some plot types, but can lead to confusion | |
# if data is read in as a list of strings, even if it is meant to be a list of | |
# floats or dates. This sometimes happens when reading comma-separated value | |
# (CSV) files. The categorical locator and formatter will put a tick at every | |
# string value and label each one as well: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It comes up naturally here as well. I don't see the harm in discussing it here in context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means maintaining two sets of documentation, which introduces the documentation maintenance cost of keeping them in sync. I think it's fine to have here, but either this should link to the trouble shooting entry, or (my preference honestly), the troubleshooting reference should be deleted and point folks here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need both - this is a super-common problem. I don't think the duplication is at all a maintenance burden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the duplication is at all a maintenance burden.
It means that if there's a change to the unit framework that changes this behavior, there are 2 places this information needs to be updated. And there are 2 places to link folks when they ask "hey, what should I do here?", which is the doc equivalent of the API complaint we get all the time about having way too many ways to do the same thing. And we've been working at that on the API side & trying to be more explicit about when to reach for which function & same here - is there a reason that the FAQ needs to carry around a second version of this troubleshooting rather than linking here?
An added bonus of linking the FAQ to this document is that then the user is pointed to it & if they're having trouble with datetimes they can get more context here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood your point the first time. I simply disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll do a follow up PR where I delete the FAQ entry and link here.
https://en.wikipedia.org/wiki/Single_source_of_truth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jody, there the small harm in discussing this in multiple places is out weighed by making any given section readable on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would linking the FAQ entry back to this guide harm readability? the semi-consensus in #17920 was to transitioning the FAQ into more of an alternative TOC, and linking the FAQ entry back into the guide is inline w/ that consensus.
1817f3b
to
b76e921
Compare
|
||
# 0 gets labeled as "apple" | ||
ax.plot(0, 0, 'd', color='C1') | ||
ax.text(0, 0, ' Float x=0', rotation=45, color='C1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ax.text(0, 0, ' Float x=0', rotation=45, color='C1') | |
ax.text(0, 0, ' Float x=0', rotation=45, color='C1', bbox={'color': 'gray', 'alpha': .2, 'boxstyle': 'round'}) |
I would stick a bounding box on all of these. The thin orange text is really hard to read against the blue/white background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The review metric on documentation PR is "does it make things better" and this clearly clears that bar.
I have one suggestion for formatting the "converted value" text labels.
There's no set project standard for better because there's no coordination on the docs which is leading to this stuff gets added and then consolidated/removed cycle that's causing frustration for the folks who want to add more stuff and for the folks who think the priority should be organizing what we have. And doing both clearly isn't working. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall 👍, a few comments here and there. I'd be okay with those being followups if we want to get this in, but I'll hold off on hitting the green button to give you a chance to consider them.
converter" exists for the data type. Matplotlib has two built-in converters, | ||
one for dates and the other for lists of strings. Other downstream libraries | ||
have their own converters to handle their data types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically there is also one for decimal.Decimal, though admittedly it is just calling float(val)
.
(This is also included in the printout at the bottom of the page)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thats such a weird edge case, I'd prefer not to mention it. I assume few people use this.
for k in munits.registry: | ||
print(f"type: {k};\n converter: {munits.registry[k]}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for k in munits.registry: | |
print(f"type: {k};\n converter: {munits.registry[k]}") | |
for k, v in munits.registry.items(): | |
print(f"type: {k};\n converter: {type(v)}") |
The repr of these clases is not super useful and so I think dropping at least the object at 0x1234ABCD
will improve readability.
Could go even further and do type(...).__name__
, though that loses the namespacing, and don't want to complicate the code to get __module__.__name__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - I admit I didn't figure out an easy way to just say the type, which really should have been more obvious to me ;-)
# Downstream libraries like `pandas <https://pandas.pydata.org>`_, | ||
# `astropy <https://www.astropy.org>`_, `pint <https://pint.readthedocs.io>`_ | ||
# and others supply their own converters that can be used with Matplotlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see expanding this into a couple subheadings, e.g:
The following downstream libraries provide support for physical units integrated with matplotlib:
- astropy
- pint
- unyt
The following provide support for additional date/time formats:
- pandas
- nc-time-axis (used by xarray)
I don't want to get into a comprehensive list, but a little more context as to what is provided by each popular downstream may be helpful to a reader who comes to this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to
# There are a number of downstream libraries that provide their own converters
# with locators and formatters. Physical unit support is provided by
# `astropy <https://www.astropy.org>`_, `pint <https://pint.readthedocs.io>`_, and
# `unyt <https://unyt.readthedocs.io>`_, among others.
#
# High level libraries like `pandas <https://pandas.pydata.org>`_ and
# `nc-time-axis <https://nc-time-axis.readthedocs.io>`_ (and thus
# `xarray <https://docs.xarray.dev>`_) provide their own datetime support.
# This support can sometimes be incompatible with Matplotlib native datetime
# support, so care should be taken when using Matplotlib locators and
# formatters if these libraries are being used.
PR summary
PR checklist