FIX: translate timedeltas in _to_ordinalf #12863

Closed
jklymak wants to merge 2 commits from the fix-datetime-timedelta branch

Conversation

@jklymak (Member) commented Nov 21, 2018

PR Summary

Allows timedeltas to be converted to floating-point days in dates._to_ordinalf, so that mdates.date2num(timedelta(days=1, hours=2)) returns 1.0833.

Likewise, mdates.date2num(np.timedelta64(26, 'h')) returns 1.0833.
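
For reference, the same numbers fall out of plain timedelta arithmetic (a sketch of the intended conversion, not matplotlib code):

from datetime import timedelta
import numpy as np

timedelta(days=1, hours=2).total_seconds() / 86400   # 1.0833... (26 hours as a fraction of a day)
np.timedelta64(26, 'h') / np.timedelta64(1, 'D')     # 1.0833... (the numpy equivalent)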

Closes #12862

Needs tests and docs....

Need to check whether something similar can be done for np.datetime64 objects.

Note: this supersedes #9120, and I think it's in the right place (rather than standing on its own)...

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak (Member, Author) commented Nov 21, 2018

#9120 (comment)

I haven't had a chance to really look at and grok how the units framework works. I suppose it's possible that units are inferred from type and that everything gets translated into a number under the hood and plotted on the same plot,

That's indeed what happens; datetimes are translated to an ordinal day (days since 0001-01-01 UTC, plus one). So the proposed behaviour for timedeltas in bar plots (for instance) is to translate them to days so they can be relative to an actual datetime value that anchors the bars.
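
For concreteness, a rough standard-library sketch of that mapping (the helper name is made up; the real _to_ordinalf also handles timezones and microseconds more carefully):

from datetime import datetime, timedelta

def ordinal_days(d):
    # whole days come from toordinal() (0001-01-01 -> 1); the time of day
    # becomes the fractional part, and a bare delta is just scaled to days
    if isinstance(d, timedelta):
        return d.total_seconds() / 86400
    midnight = d.replace(hour=0, minute=0, second=0, microsecond=0)
    return d.toordinal() + (d - midnight).total_seconds() / 86400

print(ordinal_days(datetime(2000, 1, 1)))         # 730120.0
print(ordinal_days(timedelta(days=1, hours=2)))   # 1.0833...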

I don't think the approach in #9120 will help much; users are going to supply lists that mix datetime and timedelta objects, so this PR just decides on an element-by-element basis what to do with each object.

@jklymak force-pushed the fix-datetime-timedelta branch 6 times, most recently from c569ca7 to bec6488 on November 22, 2018 at 22:06

@jklymak (Member, Author) commented Nov 22, 2018

This is ready for review


def test_timedelta():
    """
    test that timedelta objects are properly translated into days

Member (reviewer):

Suggested change:
    Test that timedelta objects are properly translated into days.

    # check that mixed lists work....
    assert mdates.date2num(dt)[0] == 730120.0
    assert mdates.date2num(dt)[1] == 1 + 2 / 24
    dt = (np.datetime64('2000-01-01'),

Member (reviewer):

Add a blank line since a new part of the test starts here.


    elif (isinstance(d, np.ndarray) and
          (np.issubdtype(d.dtype, np.datetime64) or
           np.issubdtype(d.dtype, np.timedelta64))):

Member (reviewer) suggested an indentation/alignment tweak to this line.

        return _to_ordinalf_np_vectorized(d)
    elif hasattr(d, 'size') and not d.size:
        # this is an empty

Member (reviewer):

an empty what? 😄

@jklymak (Member, Author):

Yeah, I'm not sure. If I remove that test (which I didn't write), all the tests pass, so it's not tested. On the other hand, who knows if someone needs it? I left a comment admitting my ignorance, and maybe someone will remove it.

    dnew = np.zeros(len(d))
    for nn, dd in enumerate(d):
        dnew[nn] = _dt64_to_ordinalf(dd)
    return(dnew)

Member (reviewer):

Does this work as well:

return np.array(_dt64_to_ordinalf(dd) for dd in d)

@jklymak (Member, Author):

That fails some tests and returns <generator object _dt64_to_ordinalf_iterable.<locals>.<genexpr>> :-(

Member (reviewer):

OK, maybe this is numpy-version dependent. Anyway, there has to be a way to avoid adding each element one by one at the Python level, which is slow. My next guess would be:

return np.fromiter((_dt64_to_ordinalf(dd) for dd in d), float, count=len(d))

and if all else fails, we can still create a list comprehension and turn that into an array:

return np.array([_dt64_to_ordinalf(dd) for dd in d])

@jklymak (Member, Author):

np.fromiter works. Not at all sure I understand why...

@jklymak force-pushed the fix-datetime-timedelta branch from bec6488 to e277333 on November 25, 2018 at 23:36
FIX: translate timedelta64 in _dt64_to_ordinalf
@jklymak force-pushed the fix-datetime-timedelta branch from e277333 to a79a42d on November 26, 2018 at 03:43

@pganssle (Member) commented:

I am not sure how much this addresses my main concerns in #9120. My biggest issue is that timedelta and datetime are fundamentally different creatures, despite the fact that they both have to do with time.

Consider:

  1. It doesn't make sense to plot datetime and timedelta objects on the same axis, since one is an absolute position and the other is a delta.
  2. The default axis for timedelta and for datetime should be completely different. For timedelta, you expect an elapsed-time axis denoted in minutes, seconds, hours, days, etc. For datetime you expect absolute dates.
  3. Basically all the special time-handling stuff is datetime specific. It has to handle weird calendrical arithmetic, time zones and a number of other funky things. timedelta can be losslessly converted to a float with total_seconds() and converted back to timedelta with timedelta(seconds=x).

I think there are complications that you'll run into, particularly having to do with units (and compatibility with np.timedelta64), but they're completely different from the kinds of complications you expect with datetime, which is further evidence for not conflating the two abstractions into one.
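
To illustrate the lossless round trip mentioned in point 3 (standard library only):

from datetime import timedelta

td = timedelta(days=1, hours=2, seconds=30)
x = td.total_seconds()               # 93630.0 -- a plain float, no calendar rules involved
assert timedelta(seconds=x) == td    # converts back to the same delta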

@jklymak (Member, Author) commented Nov 26, 2018

@pganssle Unfortunately, we have lots of API functions that require x and delta-x to be specified in the same tuple or array, and the units machinery is just called on the whole tuple or array. That works fine for most data (i.e., 30 kg +/- 2 kg are in the same units...), but time is time, and in order to support some functions we need to be able to process both t and dt in the same call to date2num((t, dt)). Refactoring those functions to make separate calls to the units machinery seems like a far heavier lift than detecting the types here and doing the right thing.
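
For context, this is the shape of the use case behind #12862: broken_barh takes (start, width) pairs, so a datetime start and a timedelta width reach the units machinery in the same container (whether this renders or raises depends on the conversion being discussed here):

import datetime
import matplotlib.pyplot as plt

start = datetime.datetime(2018, 1, 1, 12)
width = datetime.timedelta(hours=3)

fig, ax = plt.subplots()
ax.broken_barh([(start, width)], (0, 1))   # each x-range is a (start, width) pair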

@timhoffm previously approved these changes Nov 26, 2018

@pganssle (Member) commented:

@jklymak What do you mean by "the right thing"?

If I do plt.plot([1], [datetime.timedelta(3)]), what would be plotted? What axis would be used?

What happens if I do this:

plt.plot([1, 2], [datetime.timedelta(hours=3),
                  datetime.datetime(2018, 1, 1)])

Will it throw an exception?

I realize that this is not a simple problem to solve, but if you merge something that just forces datetime and timedelta into the same units automatically, you are creating a situation that is hard to reverse without breaking backwards compatibility. It is worth taking the time to get this right.

(i.e., 30 kg +/- 2 kg are in the same units...)

My suggestion for handling spans and other situations like this is the same as it was in #9120. I believe these problems would go away if you did unit conversions after any arithmetical operations. By converting everything to float before calculating spans and such, you lose all custom __add__ behavior and the like, which means you can't have a span that is a relativedelta or any other interesting custom behavior.
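
A tiny illustration of "arithmetic first, conversion second", using only the standard library:

from datetime import datetime, timedelta

start = datetime(2018, 1, 1)
span = timedelta(hours=3)
end = start + span   # datetime + timedelta -> datetime; no floats involved yet
# only the resulting endpoints (start, end) would then be handed to the unit converter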

I strongly recommend against treating timedelta and its ilk like a datetime for the short-term expedience of being able to ignore the unit analysis.

@jklymak (Member, Author) commented Nov 26, 2018

The first will have a point at t = 1 Jan 0000, 03:00, and the second will have a point there and at 2018. I don't consider that so bad; I agree the user shouldn't mix the vectors in this case, but garbage in, garbage out.

My suggestion for handling spans and other situations like this is the same as it was in #9120. I believe these problems would go away if you did unit conversions after any arithmetical operations.

That would be great, but that's not how unit support works right now, and it would require a lot of engineering to make it work that way. @dopplershift has been thinking about these sorts of issues, and it's on our radar to fix units in a "better" way. Though to be honest, the idea of assuming that the variable x passed to plot has an __add__ method might be a heavy ask.

I strongly recommend against treating timedelta and its ilk like a datetime for the short-term expedience of being able to ignore the unit analysis.

Any major change to the units machinery is likely to not be backwards compatible anyway. Given that, and the fact that such an overhaul is not being actively worked on right now, I think making the current system work better is the lesser of two evils. I certainly agree with your main point, but practically I still think this PR is specifically helpful.

BTW, I think folks would be willing to have someone dive deep on units if you are interested, but it'd require a MEP first to make sure we all think it is heading in the right direction.

@pganssle (Member) commented:

The first will have a point at t = 1 Jan 0000, 03:00, and the second will have a point there and at 2018. I don't consider that so bad; I agree the user shouldn't mix the vectors in this case, but garbage in, garbage out.

Both of these are decidedly the wrong thing. The first one isn't "garbage in", it's a very reasonable thing to do, actually. There are many experiments where the "time" axis is "time elapsed" not "absolute time" (including nearly everything I've ever published). Real support for that in the form of a "TimeDeltaLocator" or "ElapsedTimeLocator" would be great.

The second one is nonsense and should either throw an exception or plot some garbage, depending on how it's usually handled when you inappropriately mix units in a single axis.

Any major change to the units machinery is likely to not be backwards compatible anyway. Given that, and the fact that such an overhaul is not being actively worked on right now, I think making the current system work better is the lesser of two evils. I certainly agree with your main point, but practically I still think this PR is specifically helpful.

I think it's fine to make incremental improvements, but this is the wrong way to do it. I don't have time to dig into this right now, but I think you're in a hole at the moment and the correct thing to do is to stop digging.

If you're looking to fix things like #11290 and #12862, you shouldn't do it in such a way that also makes fixing #8869 correctly a backwards-incompatible change.

@pganssle (Member) left a review comment:

Disagree with this approach, per my comments in the thread.

@@ -222,6 +222,10 @@ def _to_ordinalf(dt):
        dt = dt.astimezone(UTC)
        tzi = UTC

    if isinstance(dt, datetime.timedelta):

Contributor (reviewer):

If this goes through, you need to update the docstring for _to_ordinalf to include timedelta.

@dopplershift (Contributor) commented:

I haven't understood all the nuances here, but the problem seems to stem from the fact that timedelta has valid standalone uses (e.g., it should just be another registered type for the unit-conversion machinery) and is also one of the two classes used arithmetically with dates and times.

Is there a particular reason why we're favoring the latter use case over the former? As far as I can tell, we're not successfully solving both use cases here.

@jklymak (Member, Author) commented Nov 26, 2018

OK, I hadn't checked, but because we aren't registering timedeltas of either flavour, the date converter does not get called if a timedelta is the first element of the array. So, as is, the PR doesn't block whatever needs to be done to make #8869 work.

If I add timedelta to the registry, then there are still errors, but since this is not meant to make #8869 work, I'm not planning to work on this.

Overall, it's not clear to me that this PR precludes properly handling full timedelta arrays, or adding a new Locator (though is that necessary?) and Formatter (probably more necessary) for that kind of data.

import matplotlib.pyplot as plt
from datetime import datetime, timedelta, timezone

plt.scatter([timedelta(hours=3), timedelta(hours=4), timedelta(hours=5)], [1, 2, 3])
plt.show()

gives

TypeError: float() argument must be a string or a number, not 'datetime.timedelta'

plt.plot([1, 2], [datetime(2018, 1, 1), timedelta(hours=3)])

gives

ValueError: view limit minimum -36834.61875 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units

(Actually, I'm not 100% sure why this happens.)

Switching the order yields:

plt.plot([1, 2], [timedelta(hours=3), datetime(2018, 1, 1)])

gives

TypeError: float() argument must be a string or a number, not 'datetime.datetime'

Which is all to say that there is still scope for backwards-compatible changes...

@jklymak (Member, Author) commented Nov 26, 2018

@dopplershift sorry, cross post. But you are correct, we are not solving the vector-of-timedelta case. But we are not precluding it from working here either.

@dopplershift (Contributor) commented:

But we are not precluding it from working here either.

Given how much code is floating around special-casing all of the various date/time things in the ecosystem, and given that we don't know how to solve #8869, I feel nervous about making this statement with certainty.

I'm not saying this is right or wrong, just that we should exercise caution when adding more duct tape to our mess here.

@pganssle (Member) commented:

Given how much code is floating around special-casing all of the various date/time things in the ecosystem, and given that we don't know how to solve #8869, I feel nervous about making this statement with certainty.

I think the right way to solve #8869 and the various "datetime axes can't have a span" issues floating around is clear. It's just not the approach anyone has taken, and if the various people working on it are to be believed (a qualification I only put in place because I personally have not been able to dig into this, I see no reason to doubt them), it's hard to do.

I would be a lot more comfortable with a refactoring based on dispatch. Also, throwing this out there, but how about something like this as a temporary measure (using @dataclass because it makes example code shorter):

from dataclasses import dataclass

@dataclass
class DeltaWrapper(float):
    _value: float

    def __add__(self, other):
        return self._value + other

Doesn't have to be float and it probably needs to be somewhat more complicated than this, but the idea would be that in the plotting code you can detect the presence of a DeltaWrapper and raise an exception if you try to plot it. In span code it will work as expected. If necessary, this could become a stepping stone to a version that looks like this:

@dataclass
class DeltaWrapper(float):
    _delta: object     # The pre-conversion object
    _value: float      # The post-conversion object

    def __add__(self, other):
        try:
            return self._delta + other
        except Exception:
            return self._value + other

Then you could incrementally convert the unit framework over to performing arithmetic pre-conversion without it throwing errors. Once everything is changed over, you can drop DeltaWrapper as vestigial.
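
A hypothetical sketch of the detection step described above, building on the DeltaWrapper sketch (no such hook exists in matplotlib; the function name is made up):

def reject_bare_deltas(values):
    # plotting code could refuse wrapped deltas used as absolute positions
    if any(isinstance(v, DeltaWrapper) for v in values):
        raise TypeError("a wrapped delta cannot be plotted as an absolute position")
    return values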

@jklymak (Member, Author) commented Nov 27, 2018

Perhaps an easy idea is to add a convert_units_delta function in parallel to convert_units that does the right thing for deltas. By default, it could just call convert_units. The task would then be finding the places where a delta is expected and calling convert_units_delta instead of convert_units.

@jklymak (Member, Author) commented Nov 27, 2018

This new commit takes the different approach described above... The converter gets a new convert_delta(value, units, axis) method, which Axis.convert_units_delta(self, dx) tries to call if it exists. broken_barh then simply goes through its x values and converts the first element of each tuple differently from the second (delta) value.

I've not changed broken_barv, or the other axes methods where we expect the second element of a tuple to be a delta, but this serves as a proof of concept.
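
Roughly, the shape of the hook described above (names taken from this comment; the structure is a hypothetical sketch, not the code in the PR):

from datetime import timedelta

class SketchDateConverter:
    @staticmethod
    def convert(value, unit, axis):
        ...  # the usual absolute-datetime conversion to ordinal days

    @staticmethod
    def convert_delta(value, unit, axis):
        # a delta only needs a scale change (seconds -> days), never an epoch shift
        return value.total_seconds() / 86400

def convert_units_delta(axis, dx, converter):
    # what Axis.convert_units_delta would do: prefer the delta path, else fall back
    if hasattr(converter, 'convert_delta'):
        return converter.convert_delta(dx, None, axis)
    return axis.convert_units(dx)

convert_units_delta(None, timedelta(hours=6), SketchDateConverter)   # 0.25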

@jklymak (Member, Author) commented Nov 27, 2018

Ah, OK, I see what the fundamental problem with broken_barh was, but it's not actually a problem for the other functions that take a delta-x or delta-y (that I've seen).

The issue is that broken_barh was doing the conversion on dx, whereas other functions do the conversion on x + dx (e.g. errorbar or fill_between). So, as @pganssle said above, we should do the math first and then convert. I mistakenly thought this issue was everywhere, but it's not.
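
In other words, a minimal sketch of the distinction (the commented-out call is the one that needed special handling):

from datetime import datetime, timedelta
import matplotlib.dates as mdates

x, dx = datetime(2018, 1, 1), timedelta(hours=6)
mdates.date2num(x + dx)     # fine: the sum is an ordinary datetime
# mdates.date2num(dx)       # the bare-delta conversion that tripped up broken_barh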

New PR on the way for the original issue. Closing this, but I'll open a new one based on this branch, since it does 75% of the work needed for #8869.

Thanks a lot @pganssle for your comments, which were largely correct.

@wheeled commented Dec 22, 2021

I may have stumbled across an issue with the timedelta64 handling. The following produces a numpy SystemError:

import matplotlib.pyplot as plt
import matplotlib.dates as plt_dt
import pandas as pd
from datetime import timedelta

timedelta_series = [timedelta(minutes=n) for n in range(0, 25, 5)]
td_df = pd.DataFrame([1, 2, 3, 4, 5], timedelta_series, columns=["Val"])
plt.plot(td_df.index, td_df.Val)
ax = plt.gca()
ax.xaxis.set_major_formatter(plt_dt.DateFormatter('%H:%M'))
plt.show()

Looking into the error, it seems that dates._from_ordinalf is receiving x as a timedelta64[ns] integer, which then produces a meaningless result when:

    dt = (np.datetime64(get_epoch()) +
          np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))

This may only be a symptom rather than the root cause, and perhaps DateFormatter is not passing the expected value?

If it is the expected value being passed in this case, then the following would not produce the error...

    dt = (np.datetime64(get_epoch()) +
          np.timedelta64(int(np.round(x)), 'ns'))

... noting that x is already in ns (rather than 'us').
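
A quick sanity check of that ns-based arithmetic (assuming the default 1970-01-01 epoch):

import numpy as np

x = np.timedelta64(90, 'm') / np.timedelta64(1, 'ns')   # 90 minutes as a count of nanoseconds
np.datetime64('1970-01-01') + np.timedelta64(int(np.round(x)), 'ns')
# -> numpy.datetime64('1970-01-01T01:30:00.000000000')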


Successfully merging this pull request may close these issues.

broken_barh appears not to work with datetime/timedelta objects
5 participants