Skip to content

Recognize pandas Timestamp objects for DateConverter? #10533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Feb 19, 2018 · 24 comments
Closed

Recognize pandas Timestamp objects for DateConverter? #10533

jorisvandenbossche opened this issue Feb 19, 2018 · 24 comments

Comments

@jorisvandenbossche
Copy link

I am testing matplotlib 2.2rc from a point of view of "what if pandas no longer registers its unit converters", and one of the issues raised when we temporarily did that for the 0.21.0 release was the following (pandas-dev/pandas#18283):

Code for reproduction

import pandas as pd
import matplotlib.pyplot as plt

pd.plotting.deregister_matplotlib_converters()

xticks = pd.date_range(start="10/10/2017", end="11/11/2017",
                       freq='D')
plt.xticks(xticks, xticks.strftime("%a %m-%d"))

Actual outcome

The above raises TypeError: Cannot compare type 'Timestamp' with type 'float', because Timestamps are not recognized to use the DateConverter in the units framework.

Would it be possible to consider adding pandas Timestamps to the unit registry? It are subclasses of datetime.datetime, so code-wise it should not be hard to add support.

Matplotlib version

  • Operating system: Linux
  • Matplotlib version: '2.2.0rc1'
  • Python version: 3.5
@tacaswell
Copy link
Member

Is there a way to do that without importing pandas? I don't think we want to unconditionally try to import pandas (for the same reasons pandas does not want to unconditionally try to import matplotlib).

Could both mpl and pandas do

if other_one in sys.modules:
    pd.plotting.register_matplotlib_converters()

I think that would let both be lazy and not import the other and get around import-order issues.

@tacaswell tacaswell added this to the v2.2.0 milestone Feb 19, 2018
@jorisvandenbossche
Copy link
Author

Hmm, yes, I didn't think of that problem ..

I think that would let both be lazy and not import the other and get around import-order issues.

Yes, if we both tried to register it lazily, that sounds like a possible solution.
(we only need to register something different as pd.plotting.register_matplotlib_converters(), because that one is registering our own, custom converters, while in this case we want to register matplotlib's DateConverter)

@WeatherGod
Copy link
Member

WeatherGod commented Feb 19, 2018 via email

@tacaswell
Copy link
Member

Pushing this to 3 as I am not sure there is a quick fix.

@tacaswell tacaswell modified the milestones: v2.2.0, v3.0 Feb 19, 2018
@jorisvandenbossche
Copy link
Author

Pushing this to 3 as I am not sure there is a quick fix.

Your idea above would not be accepted if I did a PR?

Just spitballin' here, but have we considered maybe utilizing pluggy for things like this, and let other packages register conerters that way?

@WeatherGod it's not really the problem that it is difficult to register converters (pandas does now register converters by adding to the units.registry). Rather, what we want to fix, is that importing pandas does this implicitly.

@tacaswell
Copy link
Member

My suggested above would be, pluggy as @WeatherGod suggested would not.

@jorisvandenbossche
Copy link
Author

jorisvandenbossche commented Feb 19, 2018

Another problem is DatetimeIndex not being recognized as datetime64 data. Many of the bug reports we got after we removed the automatic registering of the converters (which is now temporarily reverted) was about people doing like ax.plot(s.index, s).
That still doesn't work on matplotlib 2.2, although ax.plot(s) does actually work (so in some way matplotlib is dealing with pandas objects in a specific way, not sure how the unit detection happens correctly in this code path)

I am making an overview of all reported issues and checking if they run now on matplotlib 2.2 (without pandas' converters): http://nbviewer.jupyter.org/gist/jorisvandenbossche/926234900d5f54ef4e1016d77098ff46

I think we could also solve this in a more general way that does not depend on checking the presence of pandas.
For example, for DatetimeIndex, we could either check the dtype (or dtype.type) which is just numpy datetime64, which is registered, or either do a np.asarray on then try to infer the units again (although this is more costly).
For Timestamp, we could also check that it is a subclass of one the registered classes (since it is a subclass of datetime.datetime) instead of only checking exact class equality.

But doing the lazy check if pandas is available as discussed above would also work. Not sure which approach would be preferential.

@jorisvandenbossche
Copy link
Author

Or, another implicit method without importing pandas is to also try to infer the converter for the .values of the passed argument. It is similar to what already happens in index_of utility:

return y.index.values, y.values

that is used to get the x and y data in case of plt.plot(df) (which is the reason that plt.plot(df) where df has a DatetimeIndex actually works, but plt.plot(df.index, df.values) does not work).

@jorisvandenbossche
Copy link
Author

cc @TomAugspurger

@WeatherGod
Copy link
Member

WeatherGod commented Feb 20, 2018 via email

@anntzer
Copy link
Contributor

anntzer commented Feb 20, 2018

fwiw I think setuptools entry points may (possibly) be a simpler solution for plugin registration.

@jorisvandenbossche
Copy link
Author

Maybe someone can open a separate issue for dicussing plugin registration? I would like to keep this one focused on matplotlib actually being able to handle pandas objects properly.

@tacaswell tacaswell modified the milestones: v3.0, v2.2.0 Feb 20, 2018
@jorisvandenbossche
Copy link
Author

@tacaswell do you have any preference in what way to solve this?

  1. actually registering pandas types (with checking if pandas is already imported, and then pandas needs to do the same, so the library that is first imported registers them)
  2. implicitly ensuring that the correct converter is recognized for those types, without importing pandas (checking that it is a subclass for Timestamp, checking the .dtype or the .values for DatetimeIndex)

My personal preference would be 2, but I can do a PR for whathever matplotlib prefers.

@tacaswell
Copy link
Member

I have a preference for 1 because it is simpler and (I hope) non-controversial and should work universally on plotting method (so far as the unit handling works universally).

I am concerned 2 would rapidly escalate in complexity (but I would be happy to be wrong).

@jklymak
Copy link
Member

jklymak commented Feb 21, 2018

I mildly disagree that Matplotlib should be responsible for registering converters provided by downstream libraries (following arguments @efiring made when this first came up).

I'm fine w/ native types, numpy types, and datetime types being registered, primarily because we maintain those converters. But if we register downstream libraries on our end, where does it stop? Should Matplotlib register astropy, yt and xarray converters too? I appreciate pandas is used by a lot of people, but I think its a slippery slope to give it preferential treatment in the code.

@tacaswell
Copy link
Member

@jklymak I hear you and @efiring on this, but I am still 👍 on giving pandas special treatment (we already do ;) ).

Given @WeatherGod 's comments above we should investigate doing a proper plugin system to solve the lazy-import-order issues to handle the general case.

@jklymak
Copy link
Member

jklymak commented Feb 28, 2018

OK, I looked at this quickly. import pandas as pd calls import matplotlib (via pandas.plotting?), so its a little hard to set up pandas as a conditional dependency the way things are in the code right now given that it loads the module automatically. I appreciate thats what we are trying to get away from, but I don't understand the pandas call stack well enough to understand what they want done. Suggest this not be a blocker to 2.2.0....

@jorisvandenbossche
Copy link
Author

OK, I looked at this quickly. import pandas as pd calls import matplotlib (via pandas.plotting?), so its a little hard to set up pandas as a conditional dependency the way things are in the code right now given that it loads the module automatically.

I think you can still check if pandas is already imported or not. And if not, but is importable, register the types.

However, I am personally more in favor of the other approach (approach 2):

I have a preference for 1 because it is simpler and (I hope) non-controversial and should work universally on plotting method (so far as the unit handling works universally).
I am concerned 2 would rapidly escalate in complexity (but I would be happy to be wrong).

I personally think that option 2 is cleaner, and can be done in a generic way without being to explicitly tied to pandas (checking for the dtype of arrays is rather generic).
But, I will try to do put this tomorrow or the day after in some actual code, and then it will be much easier to evaluate how complex it would be or not.

@efiring
Copy link
Member

efiring commented Feb 28, 2018

I think we should move in the direction of (2); as has been pointed out above, we are already part of the way along that path, and I don't think that going the rest of the way need be very difficult. It can be done incrementally, if necessary. (#10631 is actually one such incremental step.) It is a clean solution--so long as pandas doesn't change their data structures too much.

@jklymak
Copy link
Member

jklymak commented Feb 28, 2018

Units now work via a registry, whereby you need to register data types and then they get sent to converters. I don't see how that'll work cleanly with implicit data types without some surgery in units.py (i.e. units.Registry.get_converter). But I'm not very adept at type manipulation, so maybe it can be made to work.

@efiring
Copy link
Member

efiring commented Mar 1, 2018

Just a little surgery. Something like this: inside get_converter, in front of

        # If x is an array, look inside the array for data with units
        if isinstance(x, np.ndarray) and x.size:
            xravel = x.ravel()

put

if hasattr(x, "values"):
    x = x.values
elif hasattr(x, "to_pydatetime"):
    x = x.to_pydatetime()

That's the duck-typing approach. Maybe not squeaky-clean, but I think it would work. Easy to say before testing, of course.

@tacaswell tacaswell modified the milestones: v2.2.0, v2.2.1 Mar 4, 2018
@tacaswell
Copy link
Member

We have #10638 in which I think addresses this problem, but keeping this open and moving to 2.2.1 just in case.

@efiring
Copy link
Member

efiring commented Mar 12, 2018

@jorisvandenbossche Can this be closed now? Or is there more to do?

@jklymak jklymak modified the milestones: v2.2.1, v3.0 Mar 12, 2018
@jklymak
Copy link
Member

jklymak commented Mar 24, 2018

I'll assume this is closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants