Skip to content

ENH: add variable epoch #15008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 29, 2020
Merged

Conversation

jklymak
Copy link
Member

@jklymak jklymak commented Aug 8, 2019

PR Summary

EDIT: 9 April 2020:

The default epoch is now "1970-01-01T00:00:00". This has microsecond resolution for dates pithing 70 years of either side of 1970. A more recent epoch would likely be better, but this is the UNIX standard, and is hence defendable. (per @efiring: #15008 (review))

There is a set_epoch, and an rcParam date.epoch that take datetime64-compatible strings. Must be called before the epoch is used, otherwise a sentinel is tripped to stop confusion (per @ImportanceOfBeingErnest and @tacaswell #15008 (comment)).

Who will get tripped up by this? Anyone who stored their data as floats in the old epoch. But that is relatively easy to fix:

neword = oldord + mdates.date2num(np.datetime64('0000-12-31'))

EDIT: 14 Oct 2019:

So, big question is if we want to change the default epoch, and how we can do that w/o breaking many users....

EDIT: Updated 9 Aug, 2019

Allow a variable epoch via a mdates.set_epoch('2001-01-01') so matplotlib dates can be relative to that instead of 0001-01-01. This allows folks who insist on driving down to micro-seconds a bit more leeway to do so precisely.

Note I'm also depreciating epoch2num and num2epoch. I don't know why they are there, and they seem to refer to 1970 epochs, which is of course not the only epoch you can have, and isn't terribly relevant to python dates.

[To come: I think its totally possible to change the locators to get down to whatever resolution the user wants. How the formatters handle that is another issue, but using datetime64 and a close enough epoch we can get pretty small. But this PR still improves things for the current situation]

i.e.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np 

x = np.arange('2000-01-01T00:00:00.0', '2000-01-01T00:00:00.000100',
              dtype='datetime64[us]')
y = np.arange(0, len(x))
fig, ax = plt.subplots(constrained_layout=True)
ax.plot(x, y)
ax.set_title('Epoch: ' + mdates.get_epoch())
plt.setp(ax.xaxis.get_majorticklabels(), rotation=40)
plt.show()

Before

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np 

mdates.set_epoch('1999-01-01')

x = np.arange('2000-01-01T00:00:00.0', '2000-01-01T00:00:00.000100',
              dtype='datetime64[us]')
y = np.arange(0, len(x))
fig, ax = plt.subplots(constrained_layout=True)
ax.plot(x, y)
ax.set_title('Epoch: ' + mdates.get_epoch())
plt.setp(ax.xaxis.get_majorticklabels(), rotation=40)
plt.show()

After

Addresses: #7138

Note, this also gets rid of the restriction of matplotlib times must be greater than 1, because it no longer uses the datetime.toordinal function, but rather just specifies the epoch in numpy.datetime64. Of course this might mean folks will plot data far off where they think they want it to go, but...

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak jklymak added this to the v3.3.0 milestone Aug 8, 2019
@jklymak jklymak force-pushed the enh-add-variable-epoch branch from 12f8f99 to 8419417 Compare August 8, 2019 20:03
@jklymak jklymak changed the title WIP: ENH: add variable epoch ENH: add variable epoch Aug 8, 2019
@jklymak jklymak marked this pull request as ready for review August 8, 2019 20:12
@ImportanceOfBeingErnest
Copy link
Member

I find it pretty dangerous that depending on when you set epoch you may get totally different outcomes and potentially inconsistent results. Especially if you think about some package that you import setting the epoch - a nightmare to debug.

@jklymak
Copy link
Member Author

jklymak commented Aug 9, 2019

You shouldn't get "totally different outcomes", unless you mix datetime with floats in your code (and you change the epoch). Of course you get different outcomes if one of them is broken, but thats what this is meant to fix

You can always check the epoch with mdates.get_epoch() if something funky starts going on.

If we want sub-millisecond plotting (which you kind of want for zooming) then this is the way to do it.

The ultimate problem is that the original choice was to make the epoch year 0001 which is too far away for modern dates to get much floating point resolution.

@ImportanceOfBeingErnest
Copy link
Member

Clearly, it's not the way to do it; it's one way to do it. Pandas uses a totally different approach -which is also non-ideal and subject to a lot of confusion. But e.g. the case from above works out of the box in pandas.

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt

x = np.arange('2000-01-01T00:00:00.0', '2000-01-01T00:00:00.000100',
              dtype='datetime64[us]')
y = np.arange(0, len(x))

df = pd.DataFrame({"XX" : x, "YY" : y})
fig, ax = plt.subplots(constrained_layout=True)
df.plot(x="XX", y="YY")
plt.show()

image


Concerning totally different outcomes, what I mean is that you may e.g. (re)set epoch too early, like

x = np.arange('2000-01-01T00:00:00.0', '2000-01-01T00:00:00.000100',
              dtype='datetime64[us]')
y = np.arange(0, len(x))

mdates.set_epoch('1999-01-01')
fig, ax = plt.subplots(constrained_layout=True)
ax.plot(x, y)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S.%f\n%Y-%m-%d"))

mdates.reset_epoch()

plt.show()

resulting in the axes showing dates from when Jesus was born,

image

In this case it's rather obvious, but as said, some module you import in cell 32 of your notebook may (re)set epoch and it would still affect the plot you started in cell 2 and show in cell 35.

Similarly you may read in some data without having set an epoch, and only set it later before plotting,

x,y = np.loadtxt(..., 
                 converters={0: mdates.datestr2num})

#lots of other stuff

mdates.set_epoch('1999-01-01')
plt.plot(x,y)

@tacaswell
Copy link
Member

I had the same thought as @ImportanceOfBeingErnest walking home from the train, probably need to bind the epoch to the axes some how?

@jklymak
Copy link
Member Author

jklymak commented Aug 9, 2019

x,y = np.loadtxt(..., 
                 converters={0: mdates.datestr2num})

#lots of other stuff

mdates.set_epoch('1999-01-01')
plt.plot(x,y)

is an interesting one. I didn't know people did that. Binding the epoch to the axes won't help there (and is in fact why the epoch needs to be a global). Is there an advantage to the user using datestr2num instead of one of the standard libraries and keeping their dates as datetime?

What does Pandas do?

I guess one approach would be to sentinel the epoch so that once its used anywhere, set_epoch throws an error (or a warning and doesn't change the epoch). ie. you can really only call it before you use anything from dates.py.

@jklymak
Copy link
Member Author

jklymak commented Aug 9, 2019

I guess also to be clear, this is really not meant to be used willy-nilly. But we do get issues where people complain about microseconds not plotting correctly, and its kind of a shame because we have chosen a poor epoch for 99% of the people who want microsecond accuracy (not many folks were tracking microseconds back in year 0001). I really expect this to be a seldom used feature.

@anntzer
Copy link
Contributor

anntzer commented Aug 9, 2019

In #7138 (comment) I suggested making this callable only once, and only before any call to date2num (if a library calls it before you, that's too bad -- we could even explicitly discourage other libraries to call it, suggesting that the use should be reserved to end users). Perhaps having a per-axes epoch would be even better, but that's likely much harder to do properly...

@timhoffm
Copy link
Member

timhoffm commented Aug 9, 2019

Perhaps having a per-axes epoch would be even better, but that's likely much harder to do properly...

If you go that way, probably a per-axis epoch ...

@jklymak
Copy link
Member Author

jklymak commented Aug 9, 2019

A sentinel is easy enoiugh except it makes writing the tests and docs really hard because I need to change the epoch which then would expose to users the private method to reset the epoch.

A per-axis epoch is basically a new locator/formatter. Then the date2num etc could go be methods on the locator as would the epoch. We could easily go that route but then the new locators are not backwards compatible with the old ones and we have the rigamarole of making the new ones standard. Ie I vastly prefer ConciseDateFormatter to the default AutoDateFormatter, but it’s hard to see how we will ever make it the default.

@anntzer
Copy link
Contributor

anntzer commented Aug 9, 2019

The tests could use a private method to reset it, I agree it's less nice for the docs :/

@jklymak
Copy link
Member Author

jklymak commented Aug 9, 2019

This new version has a sentinel. So:

import matplotlib.date as mdate
import numpy as np 
import matplotlib.pyplot as plt 

x = np.arange('2000-01-01T00:00:00.0', '2000-01-01T00:00:20.000100',
              dtype='datetime64[s]')
y = np.arange(0, len(x))
x2 = np.arange('2001-01-01T00:00:00.0', '2001-01-01T00:00:20.000100',
              dtype='datetime64[s]')
y2 = np.arange(0, len(x))
fig, ax = plt.subplots()
ax.plot(x, y)
mdates.set_epoch('1990-01-01')
ax.plot(x2, y2)

fails w/

  File "testepoch.py", line 13, in <module>
    mdates.set_epoch('1990-01-01')
  File "/Users/jklymak/matplotlib/lib/matplotlib/dates.py", line 253, in set_epoch
    raise RuntimeError('set_epoch must be called before dates plotted.')
RuntimeError: set_epoch must be called before dates plotted.

The docs need to call _reset_epoch, and just have a huge caveat about doing so.

@jklymak
Copy link
Member Author

jklymak commented Aug 11, 2019

ping @efiring, as having some interest in dates

@efiring
Copy link
Member

efiring commented Aug 11, 2019

There is a lot here, and I appreciate the work. I am sympathetic to the idea of providing a way to retreat from the unfortunate choices of epoch used by mpl and by Matlab, for that matter. This is a bit like a convention that I started very long ago for my work with shipboard ADCP data: using floating-point time in days (which I call "decimal days") relative to the start of a specified year (which I call the "yearbase"). It also fits in with many satellite-based data products which typically give time in days since the start of 1950. Other data products use other years. Unix uses seconds since the start of 1970. In my understanding, the "epoch" in such a case is the start of the specified year, so noon on January 1 of the epoch year is 0.5 days. I think this is consistent with the usage in https://en.wikipedia.org/wiki/Epoch, for example.

In mpl datenums, noon on January 1 of the year 1 is 1.5, so I would have to say that the real epoch for mpl is December 31 on 1 BCE (since there is no year zero). But this conflicts with the way you are using "epoch", claiming that the mpl default is January 1 of the year 1. This is a recipe for confusion.

Putting aside the question of what the epoch really is, some of the motivations for this change seem contrived, though. Is there a real use case for zooming from dates to microseconds, especially given that all of the date-time formats in question are locally incorrect at the 1-second level because of leap-seconds?

My experience with my decimal day and yearbase also makes me cautious here. I found that I had to be very careful to keep track of the yearbase, adding complexity to the code. There were valid reasons for the design when it was developed, but if I were starting now I would probably pick a single epoch and stick with it.

@jklymak
Copy link
Member Author

jklymak commented Aug 12, 2019

Well the other application is people who have floats in seconds since 1970 (for instance). In order to get in mTpltolib yeardays they have to use datetime anyways to figure out how many days to add to that number. Or, we could just do it for them by letting them set the epoch. Or the example you just gave of yeardays 2017. Right now they need to know the offset

@jklymak
Copy link
Member Author

jklymak commented Oct 1, 2019

Discussed this briefly on the 1 Oct call. This seems a reasonable way forward.

While I agree that going to sub-seconds looses absolute accuracy, its the precision you usually want at such small scales.

@jklymak jklymak force-pushed the enh-add-variable-epoch branch 2 times, most recently from dd0f7ea to a1ec8a1 Compare October 14, 2019 19:10
@jklymak
Copy link
Member Author

jklymak commented Oct 14, 2019

I don't think the test failures are from this PR, but could be misunderstanding...

@jklymak
Copy link
Member Author

jklymak commented Apr 20, 2020

Putting back in draft mode to better understand #15008 (comment)

@jklymak jklymak marked this pull request as ready for review April 20, 2020 22:08
@jklymak jklymak force-pushed the enh-add-variable-epoch branch from f91b603 to fd4fc70 Compare April 21, 2020 21:06
@@ -229,6 +229,7 @@ date.autoformatter.hour : %H:%M:%S
date.autoformatter.minute : %H:%M:%S.%f
date.autoformatter.second : %H:%M:%S.%f
date.autoformatter.microsecond : %H:%M:%S.%f
date.epoch : 0000-12-31T00:00:00 # old epoch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not add date.epoch to styles because I see it as a functional property not a style property.

Btw. what would happen if I change the style later after having already used date operations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Styles are just rcParams. The tests are run using the classic style. This is how you tell all the tests to run with the old epoch. The alternative is changing every date test which I'm fine with doing, but there will be a lot of changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def mpl_test_settings(request):

That is where we do the setup / clean up if we want to force this in the tests.

I think I agree that this is not something that should be in the style sheets (and probably should be added to the banned list).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, blacklisted and changed the tests to use the new epoch.

@jklymak jklymak force-pushed the enh-add-variable-epoch branch 2 times, most recently from 1095ba5 to 2ae42f5 Compare April 28, 2020 00:49
Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know too much about datetime in Matplotlib, so these comments are not really about it.

@QuLogic
Copy link
Member

QuLogic commented Apr 28, 2020

Why does the first plot in the example get broken Axis?

@jklymak
Copy link
Member Author

jklymak commented Apr 28, 2020

@QuLogic The axis in that example is one of the things this PR is addressing. It gets broken by roundoff error. Please see the example at the top of this PR as well

@jklymak jklymak force-pushed the enh-add-variable-epoch branch 3 times, most recently from 77df9ac to e9fa873 Compare April 28, 2020 19:46
@@ -487,3 +487,13 @@ of all renderer classes is deprecated.

`.transforms.AffineDeltaTransform` can be used as a replacement. This API is
experimental and may change in the future.

``ColorbarBase`` parameters will become keyword-only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addition seems accidental.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm. Happened during rebase. Not sure why...

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. 🎉

Just needs a final rebase.

Co-Authored-By: Tim Hoffmann <2836374+timhoffm@users.noreply.github.com>
@jklymak jklymak force-pushed the enh-add-variable-epoch branch from 1234d70 to a3fbc49 Compare April 29, 2020 14:23
@jklymak
Copy link
Member Author

jklymak commented Apr 29, 2020

Rebased. Thanks for everyone's help with this!

@timhoffm timhoffm merged commit e8c7579 into matplotlib:master Apr 29, 2020
@jklymak jklymak deleted the enh-add-variable-epoch branch April 29, 2020 15:58
@tacaswell
Copy link
Member

🎉 Thank you everyone!

trexfeathers added a commit to SciTools/iris that referenced this pull request Sep 3, 2020
* unpin proj, move cartopy pin forward

* Explicit resolution of coastlines in tests - see SciTools/cartopy#1105.

* Removed gdal optional requirement.

* New target image for test_plot_custom_aggregation due to auto-sizing coastlines in SciTools/cartopy#1105. Twin commit: SciTools/test-iris-imagehash@9b4e50e.

* New acceptable images for various tests due to minor changes in Matplotlib 3.3. Twin commit: SciTools/test-iris-imagehash@3babde5.

* New target images for test plots affected by the gridline spacing change in SciTools/cartopy@2f5e568. Twin commit: SciTools/test-iris-imagehash@f071f2c.

* New acceptable images to allow for minute colormap range changes in Matplotlib 3.3. Twin commit: SciTools/test-iris-imagehash@9f4b04e.

* Improvement to quickplot time axis labelling and accompanying graphics test target changes. Twin commit: SciTools/test-iris-imagehash@3036a6f

* Made quickplot time axis label sensitive to MPL version (see matplotlib/matplotlib#15008).

* More target images following from 41b3b2a. Twin commit: SciTools/test-iris-imagehash@804ff68.

* More target images following from 27ea2f2. Twin commit: SciTools/test-iris-imagehash@f559a36.

* Mirroring _draw_2d_from_points() use of mpl date2num in _draw_2d_from_bounds(). Twin commit: SciTools/test-iris-imagehash@3c582dc.

* Re-instated all valid image targets for TestPlotCoordinatesGiven.test_tx.

* New target image for TestSimple.test_bounds following change to plot axis labelling. Twin commit: SciTools/test-iris-imagehash@770dc92.

* modify raster tests to handle new gdal behaviour

* modify raster tests to handle new gdal naming behaviour

* modify tests to reflect new PROJ behaviour

* modify parts of test_project to use Transverse Mercator

* revert test_project to use Robinson, add warning to docstring

* keep results consistent

Co-authored-by: Martin Yeo <martin.yeo@metoffice.gov.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants