Skip to content

WIP: ENH: autodecode pandas timestamps #10638

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jklymak
Copy link
Member

@jklymak jklymak commented Mar 1, 2018

PR Summary

attempt at #10533 after @efiring 's suggestion.

The following now works. (needs a recent pandas for the deregister step to work).

ping @jorisvandenbossche - if you have a more complete or canonical PR, feel free to push instead of this....

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

pd.plotting.deregister_matplotlib_converters()

xticks = pd.date_range(start="10/10/2017", end="14/10/2017",
                       freq='D')
print(type(xticks))
print(xticks.to_pydatetime())
plt.plot(xticks, np.arange(len(xticks)))
plt.xticks(xticks, xticks.strftime("%a %m-%d"))
plt.show()

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@@ -430,6 +430,9 @@ def date2num(d):
For details see the module docstring.
"""

if hasattr(d, "to_pydatetime"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment that this is to give pandas a special treatment here? (and below)

@@ -430,6 +430,9 @@ def date2num(d):
For details see the module docstring.
"""

if hasattr(d, "to_pydatetime"):
d = d.to_pydatetime()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are unnecessarily losing efficiency here; to avoid that, use the same 2-part conditional here as you use in get_converter. That way, if the input is basically datetime64 it can be handled by your nice, efficient, fully vectorized converter instead of making a round-trip through the clunky datetime.datetime machinery.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can pandas go to datetime64? Ity my understanding that the underlying data type is datetime64[ns], but I don't know what attribute to test for to get that to come out...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, duh, I see...

@jklymak jklymak force-pushed the enh-autodecode-pandastimestamp branch from caa6de5 to 333b2ea Compare March 1, 2018 18:37
Copy link

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!
I don't have a PR or branch yet (was planning to do that this evening). I was thinking to go the way to check for the dtype, but anyway, I think there are multiple options that are not necessarily better or worse.

if hasattr(d, "values"):
d = d.values
if hasattr(d, "to_pydatetime"):
d = d.to_pydatetime()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this one is needed, as date2num already handles Timestamp objects (them being datetime.datetime subclasses):

In [82]: date2num(pd.Timestamp('2012-01-01'))
Out[82]: 734503.0

In [85]: date2num(datetime.datetime(2012, 1, 1))
Out[85]: 734503.0

@jklymak jklymak force-pushed the enh-autodecode-pandastimestamp branch 2 times, most recently from 752e403 to 16ac8b7 Compare March 1, 2018 20:26
@@ -154,6 +154,10 @@ def get_converter(self, x):
# DISABLED cached = self._cached.get(idx)
# DISABLED if cached is not None: return cached

if hasattr(x, "values"):
# this unpacks pandas datetime objects...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually much more general than that; it is returning the underlying ndarray from a pandas Series or DataFrame.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@jklymak jklymak force-pushed the enh-autodecode-pandastimestamp branch from 16ac8b7 to cecf1f5 Compare March 1, 2018 22:29
Copy link
Member

@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's give it a try.

@jklymak
Copy link
Member Author

jklymak commented Mar 1, 2018

I do wonder if this should be in 2.2, but I guess pandas is eager to drop matplotlib as an unconditional dependency?

@efiring
Copy link
Member

efiring commented Mar 1, 2018

It changes behavior for base mpl only in the case where an object with a ".values" attribute is supplied. The only danger I see there is that someone might have an ndarray subclass with a ".values" attribute that does something other than return the ndarray. So, it's fairly safe for mpl. It could be made even safer by checking to ensure .values really has returned an ndarray, but I doubt this will be necessary, at least for now.

I can imagine that it might have an undesired effect when pandas is in use, by always calling .values instead of simply passing the pandas object on, possibly to be handled by a registered converter that would be looking for the original object. It would be good to have a pandas person look at it from that standpoint. I don't know whether pandas does, or would, register such a converter.

@jklymak jklymak force-pushed the enh-autodecode-pandastimestamp branch from cecf1f5 to 873addf Compare March 1, 2018 23:14
@jklymak
Copy link
Member Author

jklymak commented Mar 1, 2018

Oooops, yes, the old version will bypass the pandas converter, beacuse I put the check in the wrong spot (not where you told me to). I think that was a mistake born of another mistake.

This newly pushed version still works w/ the above example, and has the converter after pandas should have chosen a converter...

@jklymak jklymak force-pushed the enh-autodecode-pandastimestamp branch from 873addf to 59bbccb Compare March 1, 2018 23:17
@jorisvandenbossche
Copy link

I will take a closer look tomorrow from the pandas side. But, if pandas registers converters, it registers the same converters for both pandas.Timestamp as datetime.datetime and np.datetime64. So I don't think we have to worry about bypassing the pandas converter, if it is registered.

It might be good to add a few tests for this (I suppose you can directly use pandas there to actually test it?)

@jklymak
Copy link
Member Author

jklymak commented Mar 1, 2018

Yes, we can use pandas, though test suggestions welcome.

@tacaswell tacaswell merged commit ea02861 into matplotlib:master Mar 3, 2018
@jklymak
Copy link
Member Author

jklymak commented Mar 4, 2018

@jorisvandenbossche Though this has been merged, it really should get some tests. happy to help with a PR to get those in.

@jklymak jklymak deleted the enh-autodecode-pandastimestamp branch July 16, 2018 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants