axes.bar fails when x is int-indexed pandas.Series #15162

Closed
infrub opened this issue Aug 31, 2019 · 9 comments · Fixed by #15166
Comments

@infrub

infrub commented Aug 31, 2019

Bug report

Bug summary

The following code fails in matplotlib 3.1.1 but works properly in 3.0.3.

case 1

Code for reproduction

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({"x":[1,2,3],"width":[.2,.4,.6]},index=[1,2,3])

plt.figure()
plt.bar(df.x, 1, width=df.width)
plt.show()

Actual outcome

Traceback (most recent call last):
  File "/Desktop/fail_example.py", line 7, in <module>
    plt.bar(df.x, 1, width=df.width)
  File "/.pyenv/versions/3.7.3/lib/python3.7/site-packages/matplotlib/pyplot.py", line 2440, in bar
    **({"data": data} if data is not None else {}), **kwargs)
  File "/.pyenv/versions/3.7.3/lib/python3.7/site-packages/matplotlib/__init__.py", line 1601, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/.pyenv/versions/3.7.3/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 2430, in bar
    label='_nolegend_',
  File "/.pyenv/versions/3.7.3/lib/python3.7/site-packages/matplotlib/patches.py", line 707, in __init__
    Patch.__init__(self, **kwargs)
  File "/.pyenv/versions/3.7.3/lib/python3.7/site-packages/matplotlib/patches.py", line 89, in __init__
    self.set_linewidth(linewidth)
  File "/.pyenv/versions/3.7.3/lib/python3.7/site-packages/matplotlib/patches.py", line 368, in set_linewidth
    self._linewidth = float(w)

Expected outcome
Outcome in 3.0.3:
Figure_1

case 2

Code for reproduction

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({"x":[1,3,10]},index=[1000,2000,3000])

plt.figure()
plt.bar(df.x, 1, width=.2)
plt.show()

Actual outcome
Figure_2

Expected outcome
Outcome in 3.0.3:
Figure_3

Matplotlib version

  • Operating system: MacOSX
  • Matplotlib version: 3.1.1 (installed from pip)
  • Matplotlib backend (print(matplotlib.get_backend())): MacOSX
  • Python version: 3.7.3
  • Jupyter version (if applicable):
  • Other libraries: pandas 0.24.2

Cause of the bug

_axes.py, line 2363 in bar:
            x0 = x
            x = np.asarray(self.convert_xunits(x))
            width = self._convert_dx(width, x0, x, self.convert_xunits)
_axes.py, line 2166 in _convert_dx: 
            try:
                x0 = x0[0]
            except (TypeError, IndexError, KeyError):
                x0 = x0

I guess x0 is expected to be a scalar after this line. However, when x0 is an int-indexed pandas.Series whose index does not contain 0, x0[0] raises KeyError, so x0 remains a pandas.Series and triggers a chain of failures that ends in the error above.
As evidence, both cases worked properly when the code was rewritten as (ugly!):

            try:
                x0 = x0[0]
            except (TypeError, IndexError, KeyError):
                try:
                    x0 = x0.iat[0]
                except (AttributeError, IndexError):
                    x0 = x0
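
The fall-through described above can be reproduced outside matplotlib. The following is a minimal sketch, assuming pandas is installed; `first_or_unchanged` is a hypothetical helper that only mirrors the quoted try/except and is not matplotlib's actual function.

```python
import pandas as pd


def first_or_unchanged(x0):
    """Mirror the try/except quoted above (hypothetical helper, not matplotlib API)."""
    try:
        return x0[0]  # on a pandas.Series this is a label lookup, not a positional one
    except (TypeError, IndexError, KeyError):
        return x0  # a KeyError lands here, so x0 is returned unchanged


default_indexed = pd.Series([1, 2, 3])               # labels 0, 1, 2
int_indexed = pd.Series([1, 2, 3], index=[1, 2, 3])  # no label 0

# With the default index, label 0 exists, so a scalar comes back.
scalar = first_or_unchanged(default_indexed)

# Without a label 0, the KeyError is swallowed and the whole Series comes back.
still_series = first_or_unchanged(int_indexed)

# .iat is positional, so it returns the first value regardless of the labels.
first_value = int_indexed.iat[0]

print(type(scalar).__name__, type(still_series).__name__, first_value)
```
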
@infrub changed the title from "Axes._convert_dx(dx,x0,x,convert) fails when x0 is an int-indexed pandas.Series" to "bar(Axes._convert_dx(dx,x0,x,convert) fails when x0 is an int-indexed pandas.Series" on Aug 31, 2019
@infrub changed the title from "bar(Axes._convert_dx(dx,x0,x,convert) fails when x0 is an int-indexed pandas.Series" to "axes.bar fails when x and width are int-indexed pandas.Series" on Aug 31, 2019
@infrub changed the title from "axes.bar fails when x and width are int-indexed pandas.Series" to "axes.bar fails when x is int-indexed pandas.Series" on Aug 31, 2019
@ImportanceOfBeingErnest
Member

ImportanceOfBeingErnest commented Aug 31, 2019

This came in with #12903.


Worth noting that plt.bar(df.x, 1, width=.1) also produces wrong output (though there is no error in that case).

@jklymak
Member

jklymak commented Aug 31, 2019

Sorry for the bug. I guess we could just put the old messy code back in, but it'd be nice if someone who understood pandas data types and inheritance figured out the proper solution. Obviously this works if you remove the index argument or change the index to be zero-based. Is iat the way to get the first element in the series regardless of how it's been indexed? It's inelegant of us to have special pandas magic all over our code, so it would be nice if something turned x into a proper numpy array earlier.

@ImportanceOfBeingErnest
Member

Apparently .iat is preferred over .iloc.

@jklymak
Member

jklymak commented Aug 31, 2019

Yeah but why doesn’t df.x[0] return the first element?

@ImportanceOfBeingErnest
Member

For the same reason {1 : True}[0] raises an error I suppose. The confusing bit is just that df.index[0] actually works, because an index behaves differently than a series.

@jklymak
Member

jklymak commented Aug 31, 2019

OK, well, this comes back to the units parsing. We can't just np.asarray because the way some packages implement units, this kills the units information, and we have no good way to get the units out to save for later. But, if we don't do this, then pandas doesn't work if someone manually sets the indices because x[0] no longer works. Yes, we could keep special casing every possible way data gets passed to functions, or we could come up with a few rules that we always apply to everything that comes in.

The obvious stopgap is to just put in the x.iat[0], but again, I really think that it's bad engineering to have triple-nested try/except blocks in the code as we try to guess what kind of data we have been passed. So I'll keep this open to hopefully engender discussion.

@ImportanceOfBeingErnest
Member

ImportanceOfBeingErnest commented Aug 31, 2019

Is there any alternative to finding the type of the first element? The culprit seems to be the unit package that subclasses numpy.ndarray, right? So unless that package has some other means to know what the array contains, one would have no choice but to stick to the current solution. And in that case, special-casing pandas series is the only way out. Instead of try..except one could also use hasattr(x, "iat"), or "series" in str(type(x)).split("."), or so.

@jklymak
Member

jklymak commented Aug 31, 2019

I think I have a solution that squares all the constraints, but avoids using a library-specific method like iat. I’ll post later tonight.
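
For illustration only, here is a minimal sketch of one library-agnostic way to get a first element: take it by iteration order instead of by `[0]`. This is not necessarily the approach taken in the linked fix. `first_element` and `LabelIndexed` are hypothetical names; `LabelIndexed` is a pure-Python stand-in for an int-indexed pandas.Series, on the assumption that iterating a Series yields its values in positional order.

```python
def first_element(obj):
    """Return the first item in iteration order (hypothetical helper).

    Raises StopIteration for an empty input and TypeError for a non-iterable one.
    """
    return next(iter(obj))


class LabelIndexed:
    """Hypothetical stand-in for an int-indexed Series.

    [] looks values up by label, while iteration yields values in order.
    """

    def __init__(self, values, labels):
        self._by_label = dict(zip(labels, values))

    def __getitem__(self, label):
        return self._by_label[label]

    def __iter__(self):
        return iter(self._by_label.values())


x0 = LabelIndexed([1, 3, 10], labels=[1000, 2000, 3000])

# Label lookup with 0 fails because no label 0 exists.
try:
    x0[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Iteration does not consult the labels, so the first value is still reachable.
first = first_element(x0)
```

The same helper works unchanged on lists and tuples, which is the appeal of not relying on a pandas-specific accessor.
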

@jklymak jklymak mentioned this issue Aug 31, 2019
6 tasks
@tacaswell
Member

Yeah but why doesn’t df.x[0] return the first element?

It is better to think of data frames as "dicts of Series with a bunch of helper methods" and Series as "ordered dicts of values with an optional index based lookup and helper methods".

I have lobbied Jeff Reback to get the behavior of [] changed to be positional (aka integer) by default, but a) that would cause an insane amount of code breakage and b) there is a big constituency of pandas users who have non-trivial indexes and benefit from s.loc[k] / s.at[k] (the "label"-based ones, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) being fast-pathed rather than s.iloc[k] / s.iat[k]. The extra tricky thing is that if your labels happen to be integers, it looks like they behave the same. Everyone has questionable early design decisions that they cannot back out of 🤷‍♀️ .
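
A short sketch of the label-based versus positional accessors mentioned above, assuming pandas is installed. The Series and their values are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 3, 10], index=[1000, 2000, 3000])

# Label-based accessors look up the index label 1000.
by_label = (s.loc[1000], s.at[1000])

# Positional accessors ignore the labels and take the element at position 0.
by_position = (s.iloc[0], s.iat[0])

# With labels 0, 1, 2 the two kinds of lookup happen to coincide.
t = pd.Series([1, 3, 10], index=[0, 1, 2])
coincide = t[0] == t.loc[0] == t.iloc[0]

# With labels 1, 2, 3 they silently diverge: [] uses the label, iloc the position.
u = pd.Series([1, 3, 10], index=[1, 2, 3])
label_one = u[1]          # value stored under label 1
position_one = u.iloc[1]  # value at position 1
```

The last case is the tricky one: no error is raised, yet `u[1]` and `u.iloc[1]` return different values.
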

4 participants