FIX: translate timedeltas in _to_ordinalf #12863

Closed
jklymak wants to merge 2 commits from the fix-datetime-timedelta branch

Conversation

@jklymak (Member) commented Nov 21, 2018

PR Summary

Allows timedeltas to be converted to floating-point days in dates._to_ordinalf, so that mdates.date2num(timedelta(days=1, hours=2)) returns 1.0833.

Likewise, mdates.date2num(np.timedelta64(26, 'h')) returns 1.0833.
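
For reference, the same numbers fall out of plain timedelta arithmetic (a sketch of the intended conversion, not matplotlib code):

from datetime import timedelta
import numpy as np

timedelta(days=1, hours=2).total_seconds() / 86400   # 1.0833... (26 hours as a fraction of a day)
np.timedelta64(26, 'h') / np.timedelta64(1, 'D')     # 1.0833... (the numpy equivalent)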

Closes #12862

Needs tests and docs....

Need to check whether something similar can be done for np.datetime64 objects.

Note: this supersedes #9120, and I think it's in the right place (rather than standing on its own)...

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak (Member, Author) commented Nov 21, 2018

#9120 (comment)

I haven't had a chance to really look at and grok how the units framework works. I suppose it's possible that units are inferred from type and that everything gets translated into a number under the hood and plotted on the same plot,

That's indeed what happens; datetimes are translated to an ordinal day (days since 0001-01-01 UTC, plus one). So the proposed behaviour for timedeltas in bar plots (for instance) is to translate them to days so they can be relative to an actual datetime value that anchors the bars.
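
For concreteness, a rough standard-library sketch of that mapping (the helper name is made up; the real _to_ordinalf also handles timezones and microseconds more carefully):

from datetime import datetime, timedelta

def ordinal_days(d):
    # whole days come from toordinal() (0001-01-01 -> 1); the time of day
    # becomes the fractional part, and a bare delta is just scaled to days
    if isinstance(d, timedelta):
        return d.total_seconds() / 86400
    midnight = d.replace(hour=0, minute=0, second=0, microsecond=0)
    return d.toordinal() + (d - midnight).total_seconds() / 86400

print(ordinal_days(datetime(2000, 1, 1)))         # 730120.0
print(ordinal_days(timedelta(days=1, hours=2)))   # 1.0833...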

I don't think the approach in #9120 will help much; users are going to supply lists that mix datetime and timedelta objects, so this PR just decides on an element-by-element basis what to do with each object.

@jklymak force-pushed the fix-datetime-timedelta branch 6 times, most recently from c569ca7 to bec6488 on November 22, 2018 at 22:06

@jklymak (Member, Author) commented Nov 22, 2018

This is ready for review


def test_timedelta():
    """
    test that timedelta objects are properly translated into days

Member (reviewer):

Suggested change:
    Test that timedelta objects are properly translated into days.

    # check that mixed lists work....
    assert mdates.date2num(dt)[0] == 730120.0
    assert mdates.date2num(dt)[1] == 1 + 2 / 24
    dt = (np.datetime64('2000-01-01'),

Member (reviewer):

Add a blank line since a new part of the test starts here.


    elif (isinstance(d, np.ndarray) and
          (np.issubdtype(d.dtype, np.datetime64) or
           np.issubdtype(d.dtype, np.timedelta64))):

Member (reviewer) suggested an indentation/alignment tweak to this line.

        return _to_ordinalf_np_vectorized(d)
    elif hasattr(d, 'size') and not d.size:
        # this is an empty

Member (reviewer):

an empty what? 😄

@jklymak (Member, Author):

Yeah, I'm not sure. If I remove that test (which I didn't write), all the tests pass, so it's not tested. On the other hand, who knows if someone needs it? I left a comment admitting my ignorance, and maybe someone will remove it.

    dnew = np.zeros(len(d))
    for nn, dd in enumerate(d):
        dnew[nn] = _dt64_to_ordinalf(dd)
    return(dnew)

Member (reviewer):

Does this work as well:

return np.array(_dt64_to_ordinalf(dd) for dd in d)

@jklymak (Member, Author):

That fails some tests and returns <generator object _dt64_to_ordinalf_iterable.<locals>.<genexpr>> :-(

Member (reviewer):

OK, maybe this is numpy-version dependent. Anyway, there has to be a way to avoid adding each element one by one at the Python level, which is slow. My next guess would be:

return np.fromiter((_dt64_to_ordinalf(dd) for dd in d), float, count=len(d))

and if all else fails, we can still create a list comprehension and turn that into an array:

return np.array([_dt64_to_ordinalf(dd) for dd in d])

@jklymak (Member, Author):

np.fromiter works. Not at all sure I understand why...

@jklymak force-pushed the fix-datetime-timedelta branch from bec6488 to e277333 on November 25, 2018 at 23:36
FIX: translate timedelta64 in _dt64_to_ordinalf
@jklymak force-pushed the fix-datetime-timedelta branch from e277333 to a79a42d on November 26, 2018 at 03:43

@pganssle (Member) commented:

I am not sure how much this addresses my main concerns in #9120. My biggest issue is that timedelta and datetime are fundamentally different creatures, despite the fact that they both have to do with time.

Consider:

  1. It doesn't make sense to plot datetime and timedelta objects on the same axis, since one is an absolute position and the other is a delta.
  2. The default axis for timedelta and for datetime should be completely different. For timedelta, you expect an elapsed-time axis denoted in minutes, seconds, hours, days, etc. For datetime you expect absolute dates.
  3. Basically all the special time-handling stuff is datetime specific. It has to handle weird calendrical arithmetic, time zones and a number of other funky things. timedelta can be losslessly converted to a float with total_seconds() and converted back to timedelta with timedelta(seconds=x).

I think there are complications that you'll run into, particularly having to do with units (and compatibility with np.timedelta64), but they're completely different from the kinds of complications you expect with datetime, which is further evidence for not conflating the two abstractions into one.
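
To illustrate the lossless round trip mentioned in point 3 (standard library only):

from datetime import timedelta

td = timedelta(days=1, hours=2, seconds=30)
x = td.total_seconds()               # 93630.0 -- a plain float, no calendar rules involved
assert timedelta(seconds=x) == td    # converts back to the same delta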

@jklymak (Member, Author) commented Nov 26, 2018

@pganssle Unfortunately, we have lots of API functions that require x and delta-x to be specified in the same tuple or array, and the units machinery is just called on the whole tuple or array. That works fine for most data (i.e., 30 kg +/- 2 kg are in the same units...), but time is time, and in order to support some functions we need to be able to process both t and dt in the same call to date2num((t, dt)). Refactoring those functions to make separate calls to the units machinery seems like a far heavier lift than detecting the types here and doing the right thing.
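
For context, this is the shape of the use case behind #12862: broken_barh takes (start, width) pairs, so a datetime start and a timedelta width reach the units machinery in the same container (whether this renders or raises depends on the conversion being discussed here):

import datetime
import matplotlib.pyplot as plt

start = datetime.datetime(2018, 1, 1, 12)
width = datetime.timedelta(hours=3)

fig, ax = plt.subplots()
ax.broken_barh([(start, width)], (0, 1))   # each x-range is a (start, width) pair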

@timhoffm previously approved these changes Nov 26, 2018

@pganssle (Member) commented:

@jklymak What do you mean by "the right thing"?

If I do plt.plot([1], [datetime.timedelta(3)]), what would be plotted? What axis would be used?

What happens if I do this:

plt.plot([1, 2], [datetime.timedelta(hours=3),
                  datetime.datetime(2018, 1, 1)])

Will it throw an exception?

I realize that this is not a simple problem to solve, but if you merge something that just forces datetime and timedelta into the same units automatically, you are creating a situation that is hard to reverse without breaking backwards compatibility. It is worth taking the time to get this right.

(i.e., 30 kg +/- 2 kg are in the same units...)

My suggestion for handling spans and other situations like this is the same as it was in #9120. I believe these problems would go away if you did unit conversions after any arithmetical operations. By converting everything to float before calculating spans and such, you lose all custom __add__ behavior and the like, which means you can't have a span that is a relativedelta or any other interesting custom behavior.
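
A tiny illustration of "arithmetic first, conversion second", using only the standard library:

from datetime import datetime, timedelta

start = datetime(2018, 1, 1)
span = timedelta(hours=3)
end = start + span   # datetime + timedelta -> datetime; no floats involved yet
# only the resulting endpoints (start, end) would then be handed to the unit converter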

I strongly recommend against treating timedelta and its ilk like a datetime for the short-term expedience of being able to ignore the unit analysis.

@jklymak (Member, Author) commented Nov 26, 2018

The first will have a point at t = 1 Jan 0000, 03:00, and the second will have a point there and at 2018. I don't consider that so bad; I agree the user shouldn't mix the vectors in this case, but garbage in, garbage out.

My suggestion for handling spans and other situations like this is the same as it was in #9120. I believe these problems would go away if you did unit conversions after any arithmetical operations.

That would be great, but that's not how unit support works right now, and it would require a lot of engineering to make it work that way. @dopplershift has been thinking about these sorts of issues, and it's on our radar to fix units in a "better" way. Though to be honest, the idea of assuming that the variable x passed to plot has an __add__ method might be a heavy ask.

I strongly recommend against treating timedelta and its ilk like a datetime for the short-term expedience of being able to ignore the unit analysis.

Any major change to the units machinery is likely to not be backwards compatible anyway. Given that, and the fact that such an overhaul is not being actively worked on right now, I think making the current system work better is the lesser of two evils. I certainly agree with your main point, but practically I still think this PR is specifically helpful.

BTW, I think folks would be willing to have someone dive deep on units if you are interested, but it'd require a MEP first to make sure we all think it is heading in the right direction.

@pganssle (Member) commented:

The first will have a point at t = 1 Jan 0000, 03:00, and the second will have a point there and at 2018. I don't consider that so bad; I agree the user shouldn't mix the vectors in this case, but garbage in, garbage out.

Both of these are decidedly the wrong thing. The first one isn't "garbage in", it's a very reasonable thing to do, actually. There are many experiments where the "time" axis is "time elapsed" not "absolute time" (including nearly everything I've ever published). Real support for that in the form of a "TimeDeltaLocator" or "ElapsedTimeLocator" would be great.

The second one is nonsense and should either throw an exception or plot some garbage, depending on how it's usually handled when you inappropriately mix units in a single axis.

Any major change to the units machinery is likely to not be backwards compatible anyway. Given that, and the fact that such an overhaul is not being actively worked on right now, I think making the current system work better is the lesser of two evils. I certainly agree with your main point, but practically I still think this PR is specifically helpful.

I think it's fine to make incremental improvements, but this is the wrong way to do it. I don't have time to dig into this right now, but I think you're in a hole at the moment and the correct thing to do is to stop digging.

If you're looking to fix things like #11290 and #12862, you shouldn't do it in such a way that also makes fixing #8869 correctly a backwards-incompatible change.

@pganssle (Member) left a review comment:

Disagree with this approach, per my comments in the thread.

@@ -222,6 +222,10 @@ def _to_ordinalf(dt):
        dt = dt.astimezone(UTC)
        tzi = UTC

    if isinstance(dt, datetime.timedelta):

Contributor (reviewer):

If this goes through, you need to update the docstring for _to_ordinalf to include timedelta.

@dopplershift (Contributor) commented:

I haven't understood all the nuances here, but the problem seems to stem from the fact that timedelta has valid standalone uses (e.g., it should just be another registered type for the unit-conversion machinery) and is also one of the two classes used arithmetically with dates and times.

Is there a particular reason why we're favoring the latter use case over the former? As far as I can tell, we're not successfully solving both use cases here.

@jklymak (Member, Author) commented Nov 26, 2018

OK, I hadn't checked, but because we aren't registering timedeltas of either flavour, the date converter does not get called if a timedelta is the first element of the array. So, as is, the PR doesn't block whatever needs to be done to make #8869 work.

If I add timedelta to the registry, then there are still errors, but since this is not meant to make #8869 work, I'm not planning to work on this.

Overall, it's not clear to me that this PR precludes properly handling full timedelta arrays, or adding a new Locator (though is that necessary?) and Formatter (probably more necessary) for that kind of data.

import matplotlib.pyplot as plt
from datetime import datetime, timedelta, timezone

plt.scatter([timedelta(hours=3), timedelta(hours=4), timedelta(hours=5)], [1, 2, 3])
plt.show()

gives

TypeError: float() argument must be a string or a number, not 'datetime.timedelta'

plt.plot([1, 2], [datetime(2018, 1, 1), timedelta(hours=3)])

gives

ValueError: view limit minimum -36834.61875 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units

(Actually, I'm not 100% sure why this happens.)

Switching the order yields:

plt.plot([1, 2], [timedelta(hours=3), datetime(2018, 1, 1)])

gives

TypeError: float() argument must be a string or a number, not 'datetime.datetime'

Which is all to say that there is still scope for backwards-compatible changes...

@jklymak (Member, Author) commented Nov 26, 2018

@dopplershift sorry, cross post. But you are correct, we are not solving the vector-of-timedelta case. But we are not precluding it from working here either.

@dopplershift (Contributor) commented:

But we are not precluding it from working here either.

Given how much code is floating around special-casing all of the various date/time things in the ecosystem, and given that we don't know how to solve #8869, I feel nervous about making this statement with certainty.

I'm not saying this is right or wrong, just that we should exercise caution when adding more duct tape to our mess here.

@pganssle (Member) commented:

Given how much code is floating around special-casing all of the various date/time things in the ecosystem, and given that we don't know how to solve #8869, I feel nervous about making this statement with certainty.

I think the right way to solve #8869 and the various "datetime axes can't have a span" issues floating around is clear. It's just not the approach anyone has taken, and if the various people working on it are to be believed (a qualification I only put in place because I personally have not been able to dig into this, I see no reason to doubt them), it's hard to do.

I would be a lot more comfortable with a refactoring based on dispatch. Also, throwing this out there, but how about something like this as a temporary measure (using @dataclass because it makes example code shorter):

from dataclasses import dataclass

@dataclass
class DeltaWrapper(float):
    _value: float

    def __add__(self, other):
        return self._value + other

Doesn't have to be float and it probably needs to be somewhat more complicated than this, but the idea would be that in the plotting code you can detect the presence of a DeltaWrapper and raise an exception if you try to plot it. In span code it will work as expected. If necessary, this could become a stepping stone to a version that looks like this:

@dataclass
class DeltaWrapper(float):
    _delta: object     # The pre-conversion object
    _value: float      # The post-conversion object

    def __add__(self, other):
        try:
            return self._delta + other
        except Exception:
            return self._value + other

Then you could incrementally convert the unit framework over to performing arithmetic pre-conversion without it throwing errors. Once everything is changed over, you can drop DeltaWrapper as vestigial.
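
A hypothetical sketch of the detection step described above, building on the DeltaWrapper sketch (no such hook exists in matplotlib; the function name is made up):

def reject_bare_deltas(values):
    # plotting code could refuse wrapped deltas used as absolute positions
    if any(isinstance(v, DeltaWrapper) for v in values):
        raise TypeError("a wrapped delta cannot be plotted as an absolute position")
    return values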

@jklymak (Member, Author) commented Nov 27, 2018

Perhaps an easy idea is to add a convert_units_delta function in parallel to convert_units that does the right thing for deltas. By default, it could just call convert_units. The task would then be finding the places where a delta is expected and calling convert_units_delta instead of convert_units.

@jklymak (Member, Author) commented Nov 27, 2018

This new commit takes the different approach described above... The converter gets a new convert_delta(value, units, axis) method, which Axis.convert_units_delta(self, dx) tries to call if it exists. broken_barh then simply goes through its x values and converts the first element of each tuple differently from the second (delta) value.

I've not changed broken_barv, or the other axes methods where we expect the second element of a tuple to be a delta, but this serves as a proof of concept.
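
Roughly, the shape of the hook described above (names taken from this comment; the structure is a hypothetical sketch, not the code in the PR):

from datetime import timedelta

class SketchDateConverter:
    @staticmethod
    def convert(value, unit, axis):
        ...  # the usual absolute-datetime conversion to ordinal days

    @staticmethod
    def convert_delta(value, unit, axis):
        # a delta only needs a scale change (seconds -> days), never an epoch shift
        return value.total_seconds() / 86400

def convert_units_delta(axis, dx, converter):
    # what Axis.convert_units_delta would do: prefer the delta path, else fall back
    if hasattr(converter, 'convert_delta'):
        return converter.convert_delta(dx, None, axis)
    return axis.convert_units(dx)

convert_units_delta(None, timedelta(hours=6), SketchDateConverter)   # 0.25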

@jklymak (Member, Author) commented Nov 27, 2018

Ah, OK, I see what the fundamental problem with broken_barh was, but it's not actually a problem for the other functions that take a delta-x or delta-y (that I've seen).

The issue is that broken_barh was doing the conversion on dx, whereas other functions do the conversion on x + dx (e.g. errorbar or fill_between). So, as @pganssle said above, we should do the math first and then convert. I mistakenly thought this issue was everywhere, but it's not.
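
In other words, a minimal sketch of the distinction (the commented-out call is the one that needed special handling):

from datetime import datetime, timedelta
import matplotlib.dates as mdates

x, dx = datetime(2018, 1, 1), timedelta(hours=6)
mdates.date2num(x + dx)     # fine: the sum is an ordinary datetime
# mdates.date2num(dx)       # the bare-delta conversion that tripped up broken_barh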

New PR on the way for the original issue. Closing this, but I'll open a new one based on this branch, since it does 75% of the work needed for #8869.

Thanks a lot @pganssle for your comments, which were largely correct.

@wheeled commented Dec 22, 2021

I may have stumbled across an issue with the timedelta64 handling. The following produces a numpy SystemError:

import matplotlib.pyplot as plt
import matplotlib.dates as plt_dt
import pandas as pd
from datetime import timedelta

timedelta_series = [timedelta(minutes=n) for n in range(0, 25, 5)]
td_df = pd.DataFrame([1, 2, 3, 4, 5], timedelta_series, columns=["Val"])
plt.plot(td_df.index, td_df.Val)
ax = plt.gca()
ax.xaxis.set_major_formatter(plt_dt.DateFormatter('%H:%M'))
plt.show()

Looking into the error, it seems that dates._from_ordinalf is receiving x as a timedelta64[ns] integer, which then produces a meaningless result when:

    dt = (np.datetime64(get_epoch()) +
          np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))

This may only be a symptom rather than the root cause, and perhaps DateFormatter is not passing the expected value?

If it is the expected value being passed in this case, then the following would not produce the error...

    dt = (np.datetime64(get_epoch()) +
          np.timedelta64(int(np.round(x)), 'ns'))

... noting that x is already in ns (rather than 'us').
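
A quick sanity check of that ns-based arithmetic (assuming the default 1970-01-01 epoch):

import numpy as np

x = np.timedelta64(90, 'm') / np.timedelta64(1, 'ns')   # 90 minutes as a count of nanoseconds
np.datetime64('1970-01-01') + np.timedelta64(int(np.round(x)), 'ns')
# -> numpy.datetime64('1970-01-01T01:30:00.000000000')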


Successfully merging this pull request may close these issues.

broken_barh appears not to work with datetime/timedelta objects
5 participants