-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
FIX: translate timedeltas in _to_ordinalf #12863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thats indeed what happens; datetimes are translated to an ordinal day (days since 1 Jan 0000 plus one). So the proposed behaviour for time deltas in bar plots (for instance) is to translate them to days so they can be relative to an actual datetime value that anchors the bars. I don't think the approach in #9120 will help much - users are going to supply lists that mix datetime and timedelta objects, so this PR just decides on an element by element basis what to do w the object. |
c569ca7
to
bec6488
Compare
This is ready for review |
lib/matplotlib/tests/test_dates.py
Outdated
|
||
def test_timedelta(): | ||
""" | ||
test that timedelta objects are properly translated into days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test that timedelta objects are properly translated into days | |
Test that timedelta objects are properly translated into days. |
# check that mixed lists work.... | ||
assert mdates.date2num(dt)[0] == 730120.0 | ||
assert mdates.date2num(dt)[1] == 1 + 2 / 24 | ||
dt = (np.datetime64('2000-01-01'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a blank line since a new part of the test starts here.
lib/matplotlib/dates.py
Outdated
|
||
elif (isinstance(d, np.ndarray) and | ||
(np.issubdtype(d.dtype, np.datetime64) or | ||
np.issubdtype(d.dtype, np.timedelta64))): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.issubdtype(d.dtype, np.timedelta64))): | |
np.issubdtype(d.dtype, np.timedelta64))): |
lib/matplotlib/dates.py
Outdated
return _to_ordinalf_np_vectorized(d) | ||
elif hasattr(d, 'size') and not d.size: | ||
# this is an empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an empty what? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure. If I remove that test (which I didn't write), all the tests pass, so its not tested. OTOH, who knows if someone needs it? Left comment admiring my ignorance, and maybe someone will remove.
lib/matplotlib/dates.py
Outdated
dnew = np.zeros(len(d)) | ||
for nn, dd in enumerate(d): | ||
dnew[nn] = _dt64_to_ordinalf(dd) | ||
return(dnew) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work as well:
return np.array(_dt64_to_ordinalf(dd) for dd in d)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That fails some tests and returns <generator object _dt64_to_ordinalf_iterable.<locals>.<genexpr>
:-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe this is numpy-version dependent. Anyway, there has to be a way to not add each element one-by-one on the python level, which is slow. My next guess would be:
return np.from_iter((_dt64_to_ordinalf(dd) for dd in d), float, count=len(d))
and if all fails, we can still create a list-comprehension and turn that into an array:
return np.array([_dt64_to_ordinalf(dd) for dd in d])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.fromiter
works. Not at all sure I understand why...
bec6488
to
e277333
Compare
FIX: translate timedelta64 in _dt64_to_ordinalf
e277333
to
a79a42d
Compare
I am not sure how much this addresses my main concerns in #9120. My biggest issue is that Consider:
I think there are complications that you'll run into, particularly having to do with units (and compatibility with |
@pganssle Unfortunately, we have lots of API that require x and delta x be specified in the same tuple or array, and the units machinery is just called on the whole tuple and array. That works fine for most data (i.e 30 kg +/- 2 kg are both the same units...) but time is time, and in order to support some functions we need to be able to process both t and dt in the same call to |
@jklymak What do you mean by "the right thing"? If I do What happens if I do this: plt.plot([1, 2], [datetime.timedelta(hours=3),
datetime.datetime(2018, 1, 1)])` Will it throw an exception? I realize that this is not a simple problem to solve, but if you merge something that just forces
My suggestion for handling spans and other situations like this is the same as it was in #9120. I believe these problems would go away if you did unit conversions after any arithmetical operations. By converting everything to float before calculating spans and such, you lose all custom I strongly recommend against treating |
The first will have a point at t = 1 Jan 0000, 03:00, and the second will have a point there and at 2018. I don't consider that so bad, though I agree the user shouldn't mix the vectors in this case, but, garbage in, garbage out.
That would be great, but thats not how unit support works right now, and would require a lot of engineering to make it work that way. @dopplershift has been thinking about these sorts of issues, and its on our radar to fix units in a "better" way. Though to be honest, the idea of assuming variable 'x' in
Any major changes to the units machinery is likely to not be backwards compatible anyways. Given that, and the fact such an overhaul is not being actively worked on right now, I think making the current system work better is the less of two evils. I certainly agree with your main point, but practically I still think this PR is specifically helpful. BTW, I think folks would be willing to have someone dive deep on units if you are interested, but it'd require a MEP first to make sure we all think it is heading in the right direction. |
Both of these are decidedly the wrong thing. The first one isn't "garbage in", it's a very reasonable thing to do, actually. There are many experiments where the "time" axis is "time elapsed" not "absolute time" (including nearly everything I've ever published). Real support for that in the form of a "TimeDeltaLocator" or "ElapsedTimeLocator" would be great. The second one is nonsense and should either throw an exception or plot some garbage, depending on how it's usually handled when you inappropriately mix units in a single axis.
I think it's fine to make incremental improvements, but this is the wrong way to do it. I don't have time to dig into this right now, but I think you're in a hole at the moment and the correct thing to do is to stop digging. If you're looking to fix things like #11290 and #12862, you shouldn't do it in such a way that also makes fixing #8869 correctly a backwards-incompatible change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagree with this approach, per my comments in the thread.
lib/matplotlib/dates.py
Outdated
@@ -222,6 +222,10 @@ def _to_ordinalf(dt): | |||
dt = dt.astimezone(UTC) | |||
tzi = UTC | |||
|
|||
if isinstance(dt, datetime.timedelta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this goes through, you need to update the docstring for _to_ordinalf
to include timedelta
.
I haven't understood all the nuances here, but the problem seems to stem from the fact that Is there a particular reason why we're favoring the latter use case over the former? As far as I can tell, we're not successfully solving both use cases here. |
OK, I hadn't checked, but, because we aren't registering timedeltas of either flavours the date converter does not get called if a If I add Overall, its not clear to me that this PR precludes properly handling full import matplotlib.pyplot as plt
from datetime import datetime, timedelta, timezone
plt.scatter([timedelta(hours=3), timedelta(hours=4), timedelta(hours=5)], [1, 2, 3]);
plt.show() gives
plt.plot([1, 2], [datetime(2018, 1, 1), timedelta(hours=3)]) gives
(Actually, I'm not 100% sure why this happens) Switching order yields: plt.plot([1, 2], [timedelta(hours=3), datetime(2018, 1, 1)])
Which is all to say that there is still scope for backwards compatible changes... |
@dopplershift sorry, cross post. But you are correct, we are not solving the vector-of- |
I feel nervous given how much code is floating around special casing all of the various date/time things in the ecosystem that we can make this statement for certain, given that we don't know how to solve #8869. I'm not saying this is right or wrong, just that we should exercise caution when adding more duct tape to our mess here. |
I think the right way to solve #8869 and the various "datetime axes can't have a span" issues floating around is clear. It's just not the approach anyone has taken, and if the various people working on it are to be believed (a qualification I only put in place because I personally have not been able to dig into this, I see no reason to doubt them), it's hard to do. I would be a lot more comfortable with a refactoring based on dispatch. Also, throwing this out there, but how about something like this as a temporary measure (using @dataclass
class DeltaWrapper(float):
_value : float
def __add__(self, other):
return self._value + other Doesn't have to be @dataclass
class DeltaWrapper(float):
_delta: object # The pre-conversion object
_value : float # The post-conversion object
def __add__(self, other):
try:
return self._delta + other
except Exception:
return self._value + other Then you can incrementally convert over to the unit framework performing arithmetic pre-conversion without it throwing errors. Once everything is changed over, you can drop |
Perhaps an easy idea is to add a |
This new commit is a different approach as described above... The converter gets a new I've not changed broken_barv, or the other axes methods where we expect the second element in a tuple etc to be a delta, but this gets the concept across as a proof of concept. |
Ah, OK, I see what the fundamental problem w/ The issue is that New PR on the way for the original issue. Closing this, but I'll open a new one based on this branch, since it does 75% of the work needed for #8869. Thanks a lot @pganssle for your comments, which were largely correct. |
I may have stumbled across an issue with the timedelta64 handling. The following produces a numpy SystemError:
Looking into the error, it seems that
This may only be a symptom rather than the root cause, and perhaps DateFormatter is not passing the expected value? If it is the expected value being passed in this case, then the following would not produce the error...
... noting that |
PR Summary
Allows timedeltas to be returned in untis of floating point days in
dates._to_ordinalf
so thatmdates.date2num(timedelta(days=1, hours=2))
returns1.0833
.mdates.date2num(np.timedelta64(26, 'h'))
returns1.0833
.Closes #12862
Needs tests and docs....Needs to see if something similar can be done for np.datetime64 objects?Note, supercedes #9120, and I think its in the right place (rather than on its own)...
PR Checklist