Skip to content

ENH: new date formatter #10841

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 5, 2019
Merged

Conversation

jklymak
Copy link
Member

@jklymak jklymak commented Mar 19, 2018

PR Summary

Update 22 Sep 2018: New localization mechanism.

This can now be localized. Its a bit verbose because there are basically 3*7 = 21 formatting strings. 7 for the normal ticks, 7 for the ticks that are major time divisions (i.e. if ticks are 22 Mar, 1 Apr, 8 Apr, the tick labels will be "22", "Apr", "8". ). I think the default is good, and ISO compliant. The verbose lists of strings allow localization with a bit of work. I include an example of how to create a trivial subclass to encapsulate that customization. Maybe someone more clever than me can come up with a better way to do this.

..its described in detail at https://13776-1385122-gh.circle-artifacts.com/0/home/circleci/project/doc/build/html/gallery/ticks_and_spines/date_concise_formatter.html#sphx-glr-gallery-ticks-and-spines-date-concise-formatter-py

[x] still needs tests

ping @anntzer and @ImportanceOfBeingErnest who were both interested in how this would localize.

Update 4 July 2018

Renamed ConciseDateFormatter and included examples of how to use it.

I think mechanisms or dicussion of whether it shoudl be the defautl can be deferred. Certainly its nice to be able to set the date format manually as pointed out by @ImportanceOfBeingErnest below, and the current method allows this, whereas this Formatter has no localization like that.

...

This is a proposed new date formatter. I find the Automatic one less than useful, though I appreciate the functional flexibility of the default one.

I'd argue the default should be something like this, but apprecaite we are conservative on such matters.

This isn't ready for serious review attention yet, but thoughts on the API or the result are welcome...

I'll note a couple fo flaws (in my opinion) of the default locator. As per #9801, interval_multiples=True should be the default. Secondly, it would be nice if the locator only located the first and 15th of the month rather than 1, 15, 29.

EDIT: added extra info in the "offset" region of the axes.

import datetime
import numpy as np
import matplotlib
matplotlib.use('Qt5Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


fig, axs = plt.subplots(2, 3, figsize=(8,8), constrained_layout=True)
axs = axs.flatten()
if False:
    for ax in axs:
        locator = mdates.AutoDateLocator(interval_multiples=True)
        formatter = mdates.ConciseDateFormatter(locator)
        ax.yaxis.set_major_locator(locator)
        ax.yaxis.set_major_formatter(formatter)
    fname = 'New.png'
else:
    for ax in axs:
        locator = mdates.AutoDateLocator(interval_multiples=True)
        formatter = mdates.AutoDateFormatter(locator)
        ax.yaxis.set_major_locator(locator)
        ax.yaxis.set_major_formatter(formatter)
    fname = 'Old.png'

ax = axs[0]

t0 = datetime.datetime(2009, 1, 20)
tf = datetime.datetime(2009, 1, 21)
ax.axhspan(t0, tf, facecolor="blue", alpha=0.25)
ax.set_ylim(t0 - datetime.timedelta(days=5),
            tf + datetime.timedelta(days=5))

ax.set_title('Days', loc='Right')

ax = axs[1]
t0 = datetime.datetime(2009, 1, 20)
tf = datetime.datetime(2009, 3, 21)
ax.axhspan(t0, tf, facecolor="blue", alpha=0.25)
ax.set_ylim(t0 - datetime.timedelta(days=15),
            tf + datetime.timedelta(days=15))

ax.set_title('2 Months', loc='Right')

ax = axs[2]
t0 = datetime.datetime(2008, 8, 20)
tf = datetime.datetime(2010, 3, 21)
ax.axhspan(t0, tf, facecolor="blue", alpha=0.25)
ax.set_ylim(t0 - datetime.timedelta(days=105),
            tf + datetime.timedelta(days=105))

ax.set_title(' Months/Y', loc='Right')


ax = axs[3]
t0 = datetime.datetime(2005, 8, 20)
tf = datetime.datetime(2015, 3, 21)
ax.axhspan(t0, tf, facecolor="blue", alpha=0.25)
ax.set_ylim(t0 - datetime.timedelta(days=1005),
            tf + datetime.timedelta(days=1005))

ax.set_title(' Years', loc='Right')

ax = axs[4]
t0 = datetime.datetime(2009, 8, 20)
tf = datetime.datetime(2009, 8, 21)
ax.axhspan(t0, tf, facecolor="blue", alpha=0.25)
ax.set_ylim(t0 - datetime.timedelta(hours=3),
            tf + datetime.timedelta(hours=3))

ax.set_title(' Hours', loc='Right')

ax = axs[5]
t0 = datetime.datetime(2009, 8, 20, 9)
tf = datetime.datetime(2009, 8, 20, 10)
ax.axhspan(t0, tf, facecolor="blue", alpha=0.25)
ax.set_ylim(t0 - datetime.timedelta(minutes=3),
            tf + datetime.timedelta(minutes=3))

ax.set_title(' Minutes', loc='Right')


fig.savefig(fname)
plt.show()

Before:

old

After:

new

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak jklymak added this to the v3.0 milestone Mar 19, 2018
@ImportanceOfBeingErnest
Copy link
Member

It may be that I misunderstand something here, but shouldn't the date format be determined though the rcParams date.autoformatter.xyz parameters?

@jklymak
Copy link
Member Author

jklymak commented Mar 20, 2018

@ImportanceOfBeingErnest

The current formatters are:

# date.autoformatter.year     : %Y
# date.autoformatter.month    : %Y-%m
# date.autoformatter.day      : %Y-%m-%d
# date.autoformatter.hour     : %m-%d %H
# date.autoformatter.minute   : %d %H:%M
# date.autoformatter.second   : %H:%M:%S
# date.autoformatter.microsecond   : %M:%S.%f

and which one gets used is set by logic inside dates.py that chooses which interval is most appropriate. But only one formatter gets used for every tick once that formatter is chosen.

This PR takes into account surrounding ticks and changes the format accordingly. For instance, its extra info to repeat 2009-01- for all the ticks in the "Days" example above, or the date in the "Hours" example. Taking into account the surrounding ticks to minimize visual clutter is what this PR does.

It would make sense for this PR to have a way to specify the format of the individual date elements - there are already placeholders for that. i.e.

self._yearfmt = '%Y'
self._monthfmt = '%b'
self._dayfmt = '%d'
# etc...

but I've not added RC params or kwargs yet.

It would of course be necessary to get the old behaviour back, either by leaving it the default, or by leaving instructions:

locator = mdates.LegacyAutoDateLocator()
formatter = mdates.LegacyAutoDateFormatter(locator)
ax.yaxis.set_major_locator(locator)
ax.yaxis.set_major_formatter(formatter)

@ImportanceOfBeingErnest
Copy link
Member

I see. Just note that it is very convenient to set the preferred format once through the rcParams and have every plot the way you want it. So if that is going to change, intruducing a parameter which reverts back would definitely be useful, date.autoformatter.use : old or similar.

@jklymak
Copy link
Member Author

jklymak commented Mar 20, 2018

@ImportanceOfBeingErnest That makes sense to me.

It sounds like we are also moving to a "startup-file" like startup sequence, so folks will be able to put slightly more sophisticated recipes in their new matplotlibrc.py, like changing the default date handler. However I'm not sure if the architecture of that file is going to just affect rcParams, or arbitrary code. I think the former, so some way for rcParams to specify the DateFormatter is what will be needed. ie. rcParams['dates.locator']=mdates.LegacayAutoDateLocator

@QuLogic
Copy link
Member

QuLogic commented Mar 21, 2018

This PR takes into account surrounding ticks and changes the format accordingly. For instance, its extra info to repeat 2009-01- for all the ticks in the "Days" example above, or the date in the "Hours" example. Taking into account the surrounding ticks to minimize visual clutter is what this PR does.

This seems to be something used downstream in ObsPy (e.g., on waveform plots). I have not looked at or compared either implementation though.

@jklymak
Copy link
Member Author

jklymak commented Mar 21, 2018

@QuLogic I think ObPy just makes the first tick have more info, but doesn't try to be intelligent about
which tick gets the extra info. This PR makes, for instance, midnight be labeled by the day, or second zero labeled by HH:MM, or 1 Jan by YYYY, etc. Note I've mildly improved the plot above...

@story645
Copy link
Member

Working through the example, I think my question may be more of what resolution does the formatter pick up?

fig, ax = plt.subplots()
t0 = datetime.datetime(2009, 1, 20)
tf = datetime.datetime(2009, 1, 21)
ax.axvspan(t0, tf, facecolor="blue", alpha=0.25)
ax.set_xlim(t0 - datetime.timedelta(days=5), tf + datetime.timedelta(days=5))
ax.vlines([datetime.datetime(2009,1,17,5) +
           datetime.timedelta(hours=x) for x in range(0,200,25)], 0, 1)
ax.set_ylim(0,1)
plt.show()

index

@jklymak
Copy link
Member Author

jklymak commented Mar 26, 2018

@story645 It returns:

figure_1-3

Note this doesn't change the locator (which chooses the ticks), it sets the formatter.

@anntzer
Copy link
Contributor

anntzer commented Mar 26, 2018

Actually, coming back to the comment about localization mentioned during the call:

  1. In the "hours" example, can "Aug-21" be replaced by "Aug 21"? (looks better even in English) (and in general, perhaps just get rid of the dashes?)
  2. in French, that gives "août-21" and "2009-août-20" ("minutes" example), both of which are quite weird in French (I'd say "21 août", "21 août 2009").

A quick search did not yield any way to obtain localized separators or formats (especially "date without year") though :/ Not saying this should be a blocker, but just raising the issue.

@jklymak
Copy link
Member Author

jklymak commented Mar 26, 2018

I'm fine w/o the dashes. Easy change.

I'm personally against any formats that don't go year month day. As someone who writes hand written logs, nothing drives me nuttier than seeing "12-11-10" in a log book I know was written in November 2010, even though I would totally say 12 November 2010 in speech; written and spoken speech and scientific notation on a plot can/should be different.

@story645
Copy link
Member

@jklymak thanks! This only works with the major locator? (And I know that's probably a question about the autoformatter in general...)

@jklymak
Copy link
Member Author

jklymak commented Mar 26, 2018

Ummm... I think you can pass whatever formatter you want:

ax.yaxis.set_minor_formatter(formatter)

I think to make nicely formatted labels for minor date ticks would be a challenge given that they are already too big for normal axes sizes!

I think having a decent minor locator would be a fun challenge, but one thing at a time ;-)

@jklymak
Copy link
Member Author

jklymak commented Mar 27, 2018

This was largely met w/ approval on the call to turn this into AutoDateFormatter, and move the current AutoDateFormatter to a Mpl22DateFormatter so folks could still access it. But some key folks weren't on the call. @tacaswell and @efiring, or anyone else want to weigh in before I do all the work of changing it?

@ImportanceOfBeingErnest I'll still try to figure out the best API for getting the old behaviour back, and ping you for your opinion.

@ImportanceOfBeingErnest
Copy link
Member

Since my opinion on this is being asked for: I would not change the name. AutoDateFormatter is the automatic formatter which labels the ticks according to the rules given in the date.autoformatter.xyz rcParams. I would leave it at this, because this is how people use it now and will still want to use it. There are all kinds of useful as well as weird standards for datetime formats and indeed as @anntzer commented, different language backgrounds need different formatting capabilities. They might have set their rcParams up to do the plots they like that way and I think one should not require them to change anything more than possibly add a single additional rcParam to get the old behaviour back.

In view of that I could imagine to allow the rcParam date.formatter : auto to get the AutoDateFormatter.

Concerning the new formatter: I'm not sure if I grasp this correctly. Does it not allow to specify any custom formats for individual components like days, minutes, years etc, or is it still not decided? As the overall aim of the new formatter seems to be to have nicely looking plots I could imagine to name it PrettyDateFormatter. Setting this would then be done via the rcParam date.formatter : pretty, which would also be the default if that is what people agree upon.

@jklymak
Copy link
Member Author

jklymak commented Mar 27, 2018

Thanks - I'll certainly make it so the current behaviour is achievable with rcParams.

As written the new formatter does not allow you to pass different formats for the elements. It'd be nice if it did. My tendancy would be to try to make it fit into one rcParam versus six or seven, but I'd have to think about whether parsing a format string is the best way to do that or passing a dict. Kind of needs @anntzer new rcParam methodology in place to see how that'll work 😉

@jklymak jklymak modified the milestones: v3.0, v3.1 Jul 4, 2018
@jklymak jklymak force-pushed the enh-new-date-formatter branch 2 times, most recently from 8d3014d to 3718266 Compare July 4, 2018 21:07
@jklymak jklymak force-pushed the enh-new-date-formatter branch from 26ff858 to c59b5a4 Compare November 25, 2018 23:17
in future versions of Matplotlib. Please report any issues or
suggestions for improvement to the github repository or mailing list.

First, this is the default formatter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this line be put after imports&data and right before the figure? It seems a bit odd here by itself.


#############################################################################
# If all calls to axes that have dates are to be made using this converter,
# it is probably most convenient to use the units reigistry where you do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo:

Suggested change
# it is probably most convenient to use the units reigistry where you do
# it is probably most convenient to use the units registry where you do


#############################################################################
# The default date formater is quite verbose, so Matplotlib has an
# alternative that can be chosen with a few extra lines of code. Note,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should mention what the alternative is here; it may be in the code, but that doesn't mean it's obvious what's important.

# manipulating one of three lists of strings.
#
# The ``formatter.formats`` list of formats is for the normal tick labels,
# There are seven levels, years, months, days, hours, minutes, seconds, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# There are seven levels, years, months, days, hours, minutes, seconds, and
# There are seven levels: years, months, days, hours, minutes, seconds, and

# ticks that are "zeros". These are tick values that are either the first of
# the year, month, or day of month, or the zeroth hour, minute, or second.
# These are usually the same as the format of
# the ticks a level above. e.g. if the axis limts mean the ticks are mostly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reflow.
Also, "e.g." cannot start a sentence; use "For example".
typo: limts

self._formatter = DateFormatter(fmt, self._tz)
tickdate = np.array([num2date(tick).timetuple()[:6]
for tick in ticks])
tickdatetime = [num2date(tick) for tick in ticks]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you flip the order of these two lines, you won't need to call num2date twice and create duplicate datetime objects, I think?

Suggested change
tickdatetime = [num2date(tick) for tick in ticks]
tickdatetime = [num2date(tick) for tick in ticks]
tickdate = np.array([tick.timetuple()[:6]
for tick in tickdatetime])

else:
fmt = fmts[level]
else:
# special handling for sconds + microseconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# special handling for sconds + microseconds
# special handling for seconds + microseconds

# now loop through and decide the actual ticklabels
zerovals = [0, 1, 1, 0, 0, 0, 0]
ticknew = []
for nn in range(len(tickdate)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be zipd, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, you don't seem to need nn: for something, something_else in zip(tickdate, tickdatetime): (I'm just not sure what to name those two.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, but I find that actually makes me have to read the code more than what this does. However, to make things better, I got rid of the append and pre-allocate the list of string first (and hence need the index to know which string to populate).

Return the :class:`~matplotlib.units.AxisInfo` for *unit*.

*unit* is a tzinfo instance or None.
The *axis* argument is required but not used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, well... Thats just copied over from DateConverter 😉

return sts

d1 = datetime.datetime(1997, 1, 1)
results = ([datetime.timedelta(weeks=52 * 200),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using test parametrization seems the better option here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but all the other similar tests are written this way, and I', not going to change all of them, particularly given my lack of comfort with parameterized tests...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fairly straightforward rearrangement of the for-loop, essentially:

@pytest.mark.parametrize(
    't_delta expected',
    # Here put the list/tuple of things that was in `results`.
)
def test_concise_formatter(t_delta, expected):
     d1 = datetime.datetime(1997, 1, 1)
     d2 = d1 + t_delta
    # And then put the rest of `_create_auto_date_locator`, but without the function.

You could do more fancy things with ids or pytest.param, but it's not really required here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem any more readable to me. I guess there is some advantage to parameterizing tests from the point of view of running them all rather than having them fail sequentially, but....

@jklymak jklymak force-pushed the enh-new-date-formatter branch from c59b5a4 to ec5fc34 Compare November 26, 2018 00:43
@jklymak
Copy link
Member Author

jklymak commented Nov 26, 2018

Thanks @QuLogic - one of your comments I didn't grok, and a couple of them I think can be left to future efforts (parameterizing the tests)....

@jklymak jklymak force-pushed the enh-new-date-formatter branch 2 times, most recently from 9b7966b to 211fc06 Compare November 26, 2018 05:22
@jklymak
Copy link
Member Author

jklymak commented Nov 29, 2018

Pinging for this to be merged...

locator_unit_scale = 1.0
ticks = self._locator()
if pos is not None:
if not np.array_equal(ticks, self._oldticks):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get this correctly, _oldticks/_oldlabels is an optimization to avoid recomputing all the ticks when the formatter gets called with pos=0, pos=1, pos=2, etc.
But that also means that if one assigns to formats/offset_formats/zero_formats, the labels don't get invalidated, so the assignment has no effect... until one changes the axes limits enough to make the ticks change.
I guess the options are either to make formats/... properties such that assigning to them invalidates _oldlabels, or also keep track of _oldformats, or use some helper that goes through functools.lru_cache(1), or just not worry about the performance cost of doing the formatting multiple times.
(FWIW I think the formatter API is not optimal here; transforming tick positions to tick labels should be done as a single method call that takes all tick positions at the same time and returns all labels.)

Copy link
Member Author

@jklymak jklymak Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it works w properties. OTOH, the API is not what I'd have expected. The "setter" doesn't get called if I do something like:

formatter.format[0] = '%y'

and instead you must do:

fmt = formatter.format
fmt[0] = '%y'
formatter.format = fmt

which I guess makes sense, but is not what I think would happen if it was just a bare class variable.

So, should I just track the old formats? Thats a bit ugly, and at that point I'm not convinced its not just as fast to reformat everything for each call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I failed to notice the problem with mutability, which means you need to track the old formats (in a local copy, as the lists can get modified in place...).
I agree with just redoing the formatting n times for now and change that if someone ever comes up with a use case where the performance cost matters...

The setter doesn't get called as foo.bar[0] = baz is equivalent to tmp = foo.bar; tmp[0] = baz so you call the getter, not the setter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the other thing we could do is add set_formats, get_formats etc and make self.formats private. Then it will be clear that you have to pass the whole list of formats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that just seems to make the API more awkward to work around what may not even be a real performance issue.

@efiring
Copy link
Member

efiring commented Nov 29, 2018 via email

@anntzer
Copy link
Contributor

anntzer commented Nov 29, 2018

It's not particularly hard (keep track of the various foo_formats in copies and see whether things have been invalidated); the question is just whether the whole thing is worth it.

Of course the datetime formatters already break the underlying assumption that formatters operate independently of locators and format each value independently of the others: the formatter proposed by this PR needs to call (n times... even with the fix above) self._locator() to know what are all the tick values before formatting each of them (or, with the fix above, to see whether invalidation occurred). Which again argues that formatters should just grow a format_tick_labels(self, ticks) (name up to bikeshedding) method that computes all the labels in one go. Actually now that I think of it, it's not even that hard to do the API change while providing a deprecation period for third-party formatters (if there's no format_tick_labels, then fallback on calling __call__ n times after emitting a DeprecationWarning).

@jklymak jklymak force-pushed the enh-new-date-formatter branch from ab328bd to 46305a8 Compare November 29, 2018 22:20
@jklymak
Copy link
Member Author

jklymak commented Nov 29, 2018

Added the check. It wasn't the hardness of it that I was worried about, it was just the performance. But the algorithm itself is a bit of a pig, so I think its worth adding the checks.

FIX: allow non-unit aware tickers

FIX: make str list length more strict

DOC

FIX: add kwargs to ConciseDateConverter

FIX: fix cursor formatter

FIX: cleanup loop logic

FIX: cleanup append
@jklymak jklymak force-pushed the enh-new-date-formatter branch from 3cd7b6c to a76c70f Compare January 1, 2019 17:36
@jklymak
Copy link
Member Author

jklymak commented Jan 1, 2019

Rebased. I think this is OK to merge assuming it all passes CI

@jklymak
Copy link
Member Author

jklymak commented Jan 5, 2019

This has two approvals - would someone do the honours and merge it?

@anntzer
Copy link
Contributor

anntzer commented Jan 5, 2019

Let's roll.

@anntzer anntzer merged commit ddde7e8 into matplotlib:master Jan 5, 2019
@jklymak
Copy link
Member Author

jklymak commented Jan 6, 2019

Thanks @anntzer (ahem, particularly as you don't use date times very much 😉)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants