Use data argument for scatter plotting timestamps from pandas #11391

Closed
ImportanceOfBeingErnest opened this issue Jun 6, 2018 · 15 comments

Comments

@ImportanceOfBeingErnest
Member

ImportanceOfBeingErnest commented Jun 6, 2018

Bug report

Bug summary

This may be either a bug report or a feature request, depending on how you view things. The issue is that you cannot use the data argument of scatter to plot a Timestamp column of a pandas DataFrame.

Code for reproduction

Consider the following data

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(42)

df = pd.DataFrame(np.random.rand(10,2), columns=["x","y"])
df["time"] = pd.date_range("2018-01-01", periods=10, freq="D")

Using the data argument, you may plot

plt.plot("x", "y", data=df)

You may equally plot

plt.plot("time", "y", data=df)

You may also scatter

plt.scatter(x="x", y="y", data=df)

However, you may not scatter the timestamps

plt.scatter(x="time", y="y", data=df)

This fails with a TypeError: invalid type promotion

Traceback (most recent call last):
  File "D:/.../test/GI_scatterdataarg.py", line 40, in <module>
    plt.scatter(x="time", y="y", data=df)
  File "d:\...\matplotlib\lib\matplotlib\pyplot.py", line 2623, in scatter
    verts=verts, edgecolors=edgecolors, data=data, **kwargs)
  File "d:\...\matplotlib\lib\matplotlib\__init__.py", line 1775, in inner
    return func(ax, *args, **kwargs)
  File "d:\...\matplotlib\lib\matplotlib\axes\_axes.py", line 3982, in scatter
    offsets = np.column_stack([x, y])
  File "D:\...\matplotlib\lib\site-packages\numpy\lib\shape_base.py", line 369, in column_stack
    return _nx.concatenate(arrays, 1)
TypeError: invalid type promotion

This is a bit surprising since you may well plot timestamps through scatter

plt.scatter(x=df.time.values, y=df.x.values)

just not through the data argument.
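The traceback ends in np.column_stack, so the failure can be illustrated with NumPy alone. The following is a minimal sketch, not matplotlib's code path: the helper name stacks_cleanly and the "float days since 1970-01-01" representation are assumptions made for illustration. It shows that a datetime64 column and a float64 column have no common dtype, while a numeric stand-in for the dates stacks without complaint.

```python
import numpy as np

# Ten daily timestamps and ten floats, mirroring the shapes in the report.
x = np.arange("2018-01-01", "2018-01-11", dtype="datetime64[D]")
y = np.linspace(0.0, 1.0, x.size)


def stacks_cleanly(a, b):
    """Return True if np.column_stack can combine the two 1-D arrays."""
    try:
        np.column_stack([a, b])
    except TypeError:
        # datetime64 and float64 cannot be promoted to a common dtype.
        return False
    return True


print(stacks_cleanly(x, y))  # False: datetime64 next to float64

# Illustrative assumption: express the dates as float days since 1970-01-01.
x_days = x.astype("datetime64[s]").astype("int64") / 86400.0
print(stacks_cleanly(x_days, y))  # True: both columns are float64
```

The exact wording of the TypeError depends on the NumPy version; older releases report "invalid type promotion", as in the traceback above.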

Expected outcome

Given that one may plot the timestamps correctly with plot and timestamps with scatter by providing the values, it would be nice to be able to use the data argument with scatter as well.

... and ideally with bar, fill_between etc. ;-)

Matplotlib version

  • Operating system: Win 8.1
  • Matplotlib version: master 2.2.2.post1270+g6665bc739
  • Matplotlib backend: Qt5Agg
  • Python version: 3.6
  • Jupyter version (if applicable):
  • Other libraries: Pandas 0.22.0
@Sdoof

Sdoof commented Jun 13, 2018

Fully agree, some consistency would be nice.

@jklymak
Member

jklymak commented Jun 21, 2018

This seems to be a bug in that the data isn’t being converted to the right form early enough.

@jklymak
Member

jklymak commented Jul 8, 2018

I actually think this is likely downstream in pandas. The units converter called is pandas.plotting._converter.DatetimeConverter, and it doesn't convert to matplotlib datenums. If you call ax.scatter(x=df.time.values, y=df.x.values) then it does convert the values to matplotlib datenums.

I'm going to close as needing a downstream fix, but I may be just passing the buck and not understanding something.
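For readers unfamiliar with "matplotlib datenums": they are plain floats counting days from an epoch. A small sketch using the public matplotlib.dates.date2num function, which accepts datetime64 arrays, shows what such a conversion produces. This only illustrates the target representation; it does not claim to be the exact call made inside scatter.

```python
import numpy as np
import matplotlib.dates as mdates

# Ten consecutive days as a NumPy datetime64 array.
days = np.arange("2018-01-01", "2018-01-11", dtype="datetime64[D]")

# date2num turns datetime-like values into float day counts.
datenums = mdates.date2num(days)

print(datenums.dtype)     # a floating-point dtype
print(np.diff(datenums))  # consecutive days differ by one unit
```

The absolute values depend on matplotlib's epoch setting, which has changed between releases, so only the spacing is meaningful here.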

@ImportanceOfBeingErnest
Member Author

Yep, I guess the problem is more that ax.scatter(x=df.time, y=df.x) doesn't work either.

@tacaswell tacaswell reopened this Jul 10, 2018
@tacaswell tacaswell modified the milestones: v3.0, v3.1 Jul 10, 2018
@tacaswell
Member

What is the type that is failing to be promoted?

I suspect that this is going to boil down to a datetime64 issue...

@jklymak
Member

jklymak commented Jul 10, 2018

Again, if you call ax.scatter(x=df.time, y=df.x) you get the Pandas datetime converter: pandas.plotting._converter.DatetimeConverter.

As far as I can tell, the DatetimeConverter doesn't do any conversion on the Pandas series object:

Adding the following print statements to convert_units in axis.py

        ret = self.converter.convert(x, self.units, self)
        print(self.converter)
        print('x', x)
        print('\n x type', type(x))
        print('\nret', ret)
        print('\nret type', type(ret))

gives:

<pandas.plotting._converter.DatetimeConverter object at 0x1147f7d68>
x 0   2018-01-01
1   2018-01-02
2   2018-01-03
3   2018-01-04
4   2018-01-05
5   2018-01-06
6   2018-01-07
7   2018-01-08
8   2018-01-09
9   2018-01-10
Name: time, dtype: datetime64[ns]

 x type <class 'pandas.core.series.Series'>


ret 0   2018-01-01
1   2018-01-02
2   2018-01-03
3   2018-01-04
4   2018-01-05
5   2018-01-06
6   2018-01-07
7   2018-01-08
8   2018-01-09
9   2018-01-10
Name: time, dtype: datetime64[ns]

ret type <class 'pandas.core.series.Series'>

As you can see, the Pandas unit conversion doesn't do anything to the series.

@jklymak
Member

jklymak commented Jul 15, 2018

Just to reiterate that this is in pandas' court: if we deregister the pandas converter, this all works fine due to #10638. We did #10638 so that pandas didn't have to register their converter by default. I think a todo is to run a bunch of tests....

ping @TomAugspurger.

import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import pandas as pd

pd.plotting.deregister_matplotlib_converters()

dates = [datetime(2018,7,i) for i in range(1, 5)]
values = np.cumsum(np.random.rand(len(dates)))

df = pd.DataFrame({"dates":dates, "values" : values})

plt.plot(df["dates"],  df["values"])
plt.scatter(df["dates"],  df["values"])

plt.show()

@jklymak
Member

jklymak commented Jul 15, 2018

import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import pandas as pd

pd.plotting.deregister_matplotlib_converters()

dates = [datetime(2018,7,i) for i in range(1, 5)]
values = np.cumsum(np.random.rand(len(dates)))

df = pd.DataFrame({"dates":dates, "values" : values})

plt.plot(df["dates"],  df["values"])
plt.scatter(df["dates"],  df["values"])
plt.plot("dates", "values", data=df)
plt.scatter(x="dates", y="values", data=df)

plt.show()

all works...

@TomAugspurger
Contributor

Sorry for the delay. Yeah, this looks to be a pandas issue

diff --git a/pandas/plotting/_converter.py b/pandas/plotting/_converter.py
index beebf84b8..cc835cd95 100644
--- a/pandas/plotting/_converter.py
+++ b/pandas/plotting/_converter.py
@@ -320,7 +320,11 @@ class DatetimeConverter(dates.DateConverter):
             return values
         elif isinstance(values, compat.string_types):
             return try_parse(values)
-        elif isinstance(values, (list, tuple, np.ndarray, Index)):
+        elif isinstance(values, (list, tuple, np.ndarray, Index, ABCSeries)):
+            if isinstance(values, ABCSeries):
+                # https://github.com/matplotlib/matplotlib/issues/11391
+                # Series was skipped. Convert to DatetimeIndex to get asi8
+                values = Index(values)
             if isinstance(values, Index):
                 values = values.values
             if not isinstance(values, np.ndarray):

I'll put that up as a pandas PR.
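The idea behind the patch, letting a Series take the same path as an Index, can be sketched outside pandas' internals. The helper to_epoch_days below is hypothetical and is not pandas' DatetimeConverter; it assumes float days since 1970-01-01 as the numeric target purely for illustration.

```python
import numpy as np
import pandas as pd


def to_epoch_days(values):
    """Hypothetical helper: datetime-likes -> float days since 1970-01-01."""
    if isinstance(values, pd.Series):
        # The step the patch adds: wrap the Series so it follows the Index path.
        values = pd.Index(values)
    if isinstance(values, pd.Index):
        values = values.values
    # Normalise lists, tuples and arrays to nanosecond-resolution datetime64.
    arr = np.asarray(values, dtype="datetime64[ns]")
    return arr.astype("int64") / 8.64e13  # nanoseconds per day


s = pd.Series(pd.date_range("2018-01-01", periods=3, freq="D"))
print(to_epoch_days(s))
```

Without the Series branch, a converter written this way would fall through and hand the Series back unchanged, which matches the behaviour shown in the print output earlier in the thread.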

@jklymak
Member

jklymak commented Jul 24, 2018

Awesome! I still think #11664 is the right thing to do from matplotlib's point of view, assuming pandas is still planning to not automatically register the pandas converters at import pandas. #11664 natively unpacks the pandas data type using np.asarray.
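As a rough illustration of what "unpacking with np.asarray" means (this is not the code from #11664, only a sketch of the NumPy call it refers to), a datetime Series becomes a plain datetime64 ndarray with no pandas wrapper left:

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2018-01-01", periods=10, freq="D"), name="time")

# np.asarray drops the Series wrapper and returns the underlying values.
arr = np.asarray(s)

print(type(arr).__name__)  # ndarray
print(arr.dtype.kind)      # "M", NumPy's kind code for datetime64
```

Once the data is a datetime64 ndarray, matplotlib's own date handling can take over without any pandas converter being registered.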

@jklymak
Member

jklymak commented Jan 27, 2019

@TomAugspurger: What is pandas' long-term plan vis-a-vis registering the matplotlib converter when pandas is imported?

I'm going to close this as fixed on pandas' end....

@jklymak jklymak closed this as completed Jan 27, 2019
@TomAugspurger
Contributor

Long-term, we'd like to not have import pandas also import matplotlib and register our converters. People will need to either make a plot with DataFrame.plot or Series.plot (which will lazily load matplotlib and register the converter), or explicitly register the converter with pandas.plotting.register_matplotlib_converters().

Right now we're supposed to have a warning if people are implicitly relying on our converters, but that's apparently not working...

@jklymak
Member

jklymak commented Jan 27, 2019

Thanks! Ideally, I think matplotlib should be able to natively unpack pandas' frames w/o pandas' converters, and I have such a PR submitted (#11664), but I wanted to make sure this was still pandas' long-term goal before pushing that PR forward. (i.e. there is not much point in making things work with pandas' converters deregistered, if they are always going to be registered by import pandas.)

@jklymak jklymak mentioned this issue Jan 28, 2019
6 tasks
@ImportanceOfBeingErnest
Member Author

It seems this has now become Schrödinger's converter.
The code from the original post works now if you either register the converter manually or deregister the converter manually. Meaning either of

pd.plotting.deregister_matplotlib_converters()

or

pd.plotting.register_matplotlib_converters()

makes this work. It will throw a warning if you do neither of the above. Is this a semantic problem? I guess one wouldn't expect to get the correct output by performing a registration or a deregistration, while leaving it in a transient state if doing neither?
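A sketch of that observation, assuming reasonably recent pandas and matplotlib releases (behaviour in the versions current at the time of this comment may differ, including the warning mentioned above). The helper scatter_point_count is a name made up for this example; it scatters the time column through the data argument and reports how many points were drawn, once with the pandas converter registered and once with it deregistered.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2), columns=["x", "y"])
df["time"] = pd.date_range("2018-01-01", periods=10, freq="D")


def scatter_point_count(frame):
    """Scatter "time" against "y" via the data argument; return the point count."""
    fig, ax = plt.subplots()
    points = ax.scatter(x="time", y="y", data=frame)
    count = len(points.get_offsets())
    plt.close(fig)
    return count


# State 1: pandas' converters explicitly registered.
pd.plotting.register_matplotlib_converters()
with_pandas_converter = scatter_point_count(df)

# State 2: pandas' converters explicitly removed, matplotlib's own are used.
pd.plotting.deregister_matplotlib_converters()
with_matplotlib_converter = scatter_point_count(df)

print(with_pandas_converter, with_matplotlib_converter)
```

A fresh figure is used for each state because an axis keeps whichever converter it picked up on first use.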

@jklymak
Member

jklymak commented Feb 15, 2019

Last I checked, importing pandas still registers their converter. If you unregister it, you are using matplotlib's converter.


7 participants