-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
ENH: add variable epoch #15008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add variable epoch #15008
Conversation
12f8f99
to
8419417
Compare
I find it pretty dangerous that depending on when you set epoch you may get totally different outcomes and potentially inconsistent results. Especially if you think about some package that you import setting the epoch - a nightmare to debug. |
You shouldn't get "totally different outcomes", unless you mix datetime with floats in your code (and you change the epoch). Of course you get different outcomes if one of them is broken, but thats what this is meant to fix You can always check the epoch with If we want sub-millisecond plotting (which you kind of want for zooming) then this is the way to do it. The ultimate problem is that the original choice was to make the epoch year 0001 which is too far away for modern dates to get much floating point resolution. |
Clearly, it's not the way to do it; it's one way to do it. Pandas uses a totally different approach -which is also non-ideal and subject to a lot of confusion. But e.g. the case from above works out of the box in pandas.
Concerning totally different outcomes, what I mean is that you may e.g. (re)set epoch too early, like
resulting in the axes showing dates from when Jesus was born, In this case it's rather obvious, but as said, some module you import in cell 32 of your notebook may (re)set epoch and it would still affect the plot you started in cell 2 and show in cell 35. Similarly you may read in some data without having set an epoch, and only set it later before plotting,
|
I had the same thought as @ImportanceOfBeingErnest walking home from the train, probably need to bind the epoch to the axes some how? |
x,y = np.loadtxt(...,
converters={0: mdates.datestr2num})
#lots of other stuff
mdates.set_epoch('1999-01-01')
plt.plot(x,y) is an interesting one. I didn't know people did that. Binding the epoch to the axes won't help there (and is in fact why the epoch needs to be a global). Is there an advantage to the user using datestr2num instead of one of the standard libraries and keeping their dates as datetime? What does Pandas do? I guess one approach would be to sentinel the epoch so that once its used anywhere, |
I guess also to be clear, this is really not meant to be used willy-nilly. But we do get issues where people complain about microseconds not plotting correctly, and its kind of a shame because we have chosen a poor epoch for 99% of the people who want microsecond accuracy (not many folks were tracking microseconds back in year 0001). I really expect this to be a seldom used feature. |
In #7138 (comment) I suggested making this callable only once, and only before any call to date2num (if a library calls it before you, that's too bad -- we could even explicitly discourage other libraries to call it, suggesting that the use should be reserved to end users). Perhaps having a per-axes epoch would be even better, but that's likely much harder to do properly... |
If you go that way, probably a per-axis epoch ... |
A sentinel is easy enoiugh except it makes writing the tests and docs really hard because I need to change the epoch which then would expose to users the private method to reset the epoch. A per-axis epoch is basically a new locator/formatter. Then the date2num etc could go be methods on the locator as would the epoch. We could easily go that route but then the new locators are not backwards compatible with the old ones and we have the rigamarole of making the new ones standard. Ie I vastly prefer |
The tests could use a private method to reset it, I agree it's less nice for the docs :/ |
This new version has a sentinel. So: import matplotlib.date as mdate
import numpy as np
import matplotlib.pyplot as plt
x = np.arange('2000-01-01T00:00:00.0', '2000-01-01T00:00:20.000100',
dtype='datetime64[s]')
y = np.arange(0, len(x))
x2 = np.arange('2001-01-01T00:00:00.0', '2001-01-01T00:00:20.000100',
dtype='datetime64[s]')
y2 = np.arange(0, len(x))
fig, ax = plt.subplots()
ax.plot(x, y)
mdates.set_epoch('1990-01-01')
ax.plot(x2, y2) fails w/
The docs need to call |
ping @efiring, as having some interest in dates |
There is a lot here, and I appreciate the work. I am sympathetic to the idea of providing a way to retreat from the unfortunate choices of epoch used by mpl and by Matlab, for that matter. This is a bit like a convention that I started very long ago for my work with shipboard ADCP data: using floating-point time in days (which I call "decimal days") relative to the start of a specified year (which I call the "yearbase"). It also fits in with many satellite-based data products which typically give time in days since the start of 1950. Other data products use other years. Unix uses seconds since the start of 1970. In my understanding, the "epoch" in such a case is the start of the specified year, so noon on January 1 of the epoch year is 0.5 days. I think this is consistent with the usage in https://en.wikipedia.org/wiki/Epoch, for example. In mpl datenums, noon on January 1 of the year 1 is 1.5, so I would have to say that the real epoch for mpl is December 31 on 1 BCE (since there is no year zero). But this conflicts with the way you are using "epoch", claiming that the mpl default is January 1 of the year 1. This is a recipe for confusion. Putting aside the question of what the epoch really is, some of the motivations for this change seem contrived, though. Is there a real use case for zooming from dates to microseconds, especially given that all of the date-time formats in question are locally incorrect at the 1-second level because of leap-seconds? My experience with my decimal day and yearbase also makes me cautious here. I found that I had to be very careful to keep track of the yearbase, adding complexity to the code. There were valid reasons for the design when it was developed, but if I were starting now I would probably pick a single epoch and stick with it. |
Well the other application is people who have floats in seconds since 1970 (for instance). In order to get in mTpltolib yeardays they have to use datetime anyways to figure out how many days to add to that number. Or, we could just do it for them by letting them set the epoch. Or the example you just gave of yeardays 2017. Right now they need to know the offset |
Discussed this briefly on the 1 Oct call. This seems a reasonable way forward. While I agree that going to sub-seconds looses absolute accuracy, its the precision you usually want at such small scales. |
dd0f7ea
to
a1ec8a1
Compare
I don't think the test failures are from this PR, but could be misunderstanding... |
Putting back in draft mode to better understand #15008 (comment) |
f91b603
to
fd4fc70
Compare
@@ -229,6 +229,7 @@ date.autoformatter.hour : %H:%M:%S | |||
date.autoformatter.minute : %H:%M:%S.%f | |||
date.autoformatter.second : %H:%M:%S.%f | |||
date.autoformatter.microsecond : %H:%M:%S.%f | |||
date.epoch : 0000-12-31T00:00:00 # old epoch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not add date.epoch
to styles because I see it as a functional property not a style property.
Btw. what would happen if I change the style later after having already used date operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Styles are just rcParams. The tests are run using the classic style. This is how you tell all the tests to run with the old epoch. The alternative is changing every date test which I'm fine with doing, but there will be a lot of changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def mpl_test_settings(request): |
That is where we do the setup / clean up if we want to force this in the tests.
I think I agree that this is not something that should be in the style sheets (and probably should be added to the banned list).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, blacklisted and changed the tests to use the new epoch.
1095ba5
to
2ae42f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know too much about datetime in Matplotlib, so these comments are not really about it.
Why does the first plot in the example get broken Axis? |
@QuLogic The axis in that example is one of the things this PR is addressing. It gets broken by roundoff error. Please see the example at the top of this PR as well |
77df9ac
to
e9fa873
Compare
@@ -487,3 +487,13 @@ of all renderer classes is deprecated. | |||
|
|||
`.transforms.AffineDeltaTransform` can be used as a replacement. This API is | |||
experimental and may change in the future. | |||
|
|||
``ColorbarBase`` parameters will become keyword-only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addition seems accidental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm. Happened during rebase. Not sure why...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now. 🎉
Just needs a final rebase.
Co-Authored-By: Tim Hoffmann <2836374+timhoffm@users.noreply.github.com>
1234d70
to
a3fbc49
Compare
Rebased. Thanks for everyone's help with this! |
🎉 Thank you everyone! |
* unpin proj, move cartopy pin forward * Explicit resolution of coastlines in tests - see SciTools/cartopy#1105. * Removed gdal optional requirement. * New target image for test_plot_custom_aggregation due to auto-sizing coastlines in SciTools/cartopy#1105. Twin commit: SciTools/test-iris-imagehash@9b4e50e. * New acceptable images for various tests due to minor changes in Matplotlib 3.3. Twin commit: SciTools/test-iris-imagehash@3babde5. * New target images for test plots affected by the gridline spacing change in SciTools/cartopy@2f5e568. Twin commit: SciTools/test-iris-imagehash@f071f2c. * New acceptable images to allow for minute colormap range changes in Matplotlib 3.3. Twin commit: SciTools/test-iris-imagehash@9f4b04e. * Improvement to quickplot time axis labelling and accompanying graphics test target changes. Twin commit: SciTools/test-iris-imagehash@3036a6f * Made quickplot time axis label sensitive to MPL version (see matplotlib/matplotlib#15008). * More target images following from 41b3b2a. Twin commit: SciTools/test-iris-imagehash@804ff68. * More target images following from 27ea2f2. Twin commit: SciTools/test-iris-imagehash@f559a36. * Mirroring _draw_2d_from_points() use of mpl date2num in _draw_2d_from_bounds(). Twin commit: SciTools/test-iris-imagehash@3c582dc. * Re-instated all valid image targets for TestPlotCoordinatesGiven.test_tx. * New target image for TestSimple.test_bounds following change to plot axis labelling. Twin commit: SciTools/test-iris-imagehash@770dc92. * modify raster tests to handle new gdal behaviour * modify raster tests to handle new gdal naming behaviour * modify tests to reflect new PROJ behaviour * modify parts of test_project to use Transverse Mercator * revert test_project to use Robinson, add warning to docstring * keep results consistent Co-authored-by: Martin Yeo <martin.yeo@metoffice.gov.uk>
PR Summary
EDIT: 9 April 2020:
The default epoch is now "1970-01-01T00:00:00". This has microsecond resolution for dates pithing 70 years of either side of 1970. A more recent epoch would likely be better, but this is the UNIX standard, and is hence defendable. (per @efiring: #15008 (review))
There is a
set_epoch
, and an rcParamdate.epoch
that take datetime64-compatible strings. Must be called before the epoch is used, otherwise a sentinel is tripped to stop confusion (per @ImportanceOfBeingErnest and @tacaswell #15008 (comment)).Who will get tripped up by this? Anyone who stored their data as floats in the old epoch. But that is relatively easy to fix:
EDIT: 14 Oct 2019:
So, big question is if we want to change the default epoch, and how we can do that w/o breaking many users....
EDIT: Updated 9 Aug, 2019
Allow a variable epoch via a
mdates.set_epoch('2001-01-01')
so matplotlib dates can be relative to that instead of0001-01-01
. This allows folks who insist on driving down to micro-seconds a bit more leeway to do so precisely.Note I'm also depreciating
epoch2num
andnum2epoch
. I don't know why they are there, and they seem to refer to 1970 epochs, which is of course not the only epoch you can have, and isn't terribly relevant to python dates.[To come: I think its totally possible to change the locators to get down to whatever resolution the user wants. How the formatters handle that is another issue, but using datetime64 and a close enough epoch we can get pretty small. But this PR still improves things for the current situation]
i.e.
Addresses: #7138
Note, this also gets rid of the restriction of matplotlib times must be greater than 1, because it no longer uses the
datetime.toordinal
function, but rather just specifies the epoch in numpy.datetime64. Of course this might mean folks will plot data far off where they think they want it to go, but...PR Checklist