Skip to content

Fix TypeError when plotting stacked bar chart with decimal #13738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
May 12, 2019

Conversation

sasoripathos
Copy link
Contributor

@sasoripathos sasoripathos commented Mar 22, 2019

PR Summary

Hi, I'm opening this as part of coursework from University of Toronto Scarborough's CSCD01 taught by Dr. Anya Tafliovich @atafliovich. (https://www.utsc.utoronto.ca/~atafliovich/cscd01/index.html)

Fixes #10788

Bug Analysis and Technical Commentary:

When try to plot a bar graph with, a TypeError exception will be raised if values for bar bases are of type Decimal.

The problem is the width and height used to calculate the bar location are always of type float or int. There is no operation defined between float/int and Decimal.

The fix is adding a type conversion right before starting calculate the bar location according to align and orientation. If the input values for bar bases are of type Decimal, then convert all elements in the width/height array into Decimal. With further investigation, we cannot postpone the type conversion until the calculation time since the height/width array will later be used to create Patches (Artist).

Acceptance Test:

create a bar chart with Decimal values as input
using plt.show() to view the generaed chart
Result:
A bar chart is successfully rendered without TypeError

Example code:

from decimal import Decimal
import matplotlib.pyplot as plt

if __name__ == "__main__":
    plt.subplot(111)
    x = [Decimal(x) for x in range(10)]
    y = [Decimal(x) for x in range(10)]
    plt.bar(x, y)
    plt.show()

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for looking at this. This is a really hard problem you have chosen because it comes down to converting numbers in matplotlib, and that is messy. I suspect that axis.convert should be doing the conversion here, and I think it should always return a floating point array. But there could be knock-on effects. In particular you will need to be careful of masked arrays.

Instead of adding type conversion right before alignment calculation, move Decimal-float conversion into the existing conversion block.
Create a helper function _decimal_convert(data) to force conversion, which support masked array.

Update the related test cases to using check_figures_equal decorator instead of image comparation.
def test_bar_decimal_center(fig_test, fig_ref):
# Test vertical, align-center bar chart with Decimal() input
# No exception should be raised
ax = fig_test.subplots()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for symmetry, please move this down to # Test image-

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @timhoffm for advice. Have moved it down.

def test_barh_decimal_center(fig_test, fig_ref):
# Test horizontal, align-center bar chart with Decimal() input
# No exception should be raised
ax = fig_test.subplots()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move down as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have moved it down.

x = np.asarray(self.convert_xunits(x))
# self.convert_xunits doesn't convert Decimal input, use
# _decimal_convert to force conversion
x = self._decimal_convert(self.convert_xunits(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the design here. On the one hand, needing a converter around the converter feels wrong. On the other hand, do we want to change the return type of convert_xunits()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the design seems(?) problematic (is this converter going to end up being needed everywhere?).

Copy link
Contributor Author

@sasoripathos sasoripathos Apr 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for advice @timhoffm @anntzer . But I am not sure whether I should update convert_units(self, x) method in Axis directly.

convert_xunits mainly consumes convert_units(self, x) in Axis class, whose first line is:

# If x is already a number, doesn't need converting
if munits.ConversionInterface.is_numlike(x):
      return x

Since Decimal is actually a Number, this method doesn't change the input x if x is a list of Decimal.

One possible solution is like:

# If x is already a number, doesn't need converting
if munits.ConversionInterface.is_numlike(x):
      return np.asarray(x, dtype=np.float)

But I am afraid this way may cause side effect when input x is a list of integer or other number type.

If we check whether x contains Decimal to decide whether need to convert x, then I feel the is_numlike(x) function in ConversionInterface doesn't quite make sense here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, I am curious about the following code in is_numlike(x):

if iterable(x):
    for thisx in x:
        return isinstance(thisx, Number)

This only check the first element in a list. Is this what it supposed to do when input x is a list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps is_numlike shouldbe changed to isinstance(x, Number) and not isinstance(x, Decimal) then (dunno).
The unit system assumes that lists are homogeneous. Yes, it's a pretty awkward assumption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt the precision matters as everything ultimately gets converted to float at some point.

The default converter would be implemented in units.py but I certainly haven't thought about the specifics...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't spent a lot of time to think about it, but perhaps the correct approach may be to remove the if is_numlike(x): return x in convert_units() and instead always go through the unit conversion system, and then add a converter for Number (which will thus be used both for e.g. floats and for Decimal) which casts the input to float.

On first thought, it sounds reasonable to not special-case numbers but just handle them as a united quantity. However, I'm not sure if that would work:

    def convert_units(self, x):
        # If x is already a number, doesn't need converting
        if munits.ConversionInterface.is_numlike(x):
            return x

        if self.converter is None:
            self.converter = munits.registry.get_converter(x)

If self.converter is not None, you wouldn't get the proposed NumberFormatter; and therefore all converters would be required to pass numbers through`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sasoripathos Sorry that we're quite slow and vague with responses here. You've chosen a tricky issue; and as you can see from the discussion, we do not have a clear idea what the appropriate fix should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timhoffm Thanks for reply! That's also a place I am confused about. I still haven't figure out what should be the return value of axisinfo(unit, axis) and default_units(x, axis) if i implemented the proposed NumberFormatter. Please pardon my beginner knowledge about matplotlib. I am still looking into the proposed approach and will give a try next week. I also think my current implementation is not that elegent since it (kind of) "breaks" the Number type, but i think it may be (hopefully) a working walk around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timhoffm @anntzer @jklymak Hi! I just pushed a DecimalConverter I implemented. Since originally no convertion will be done if x is a number-like input, I just make axisinfo(unit, axis) return empty AxisInfo and default_units(x, axis) return None. Besides, I use ConversionInterface.is_numlike(x) to avoid requiring all existing converters pass Numbers.

Could you please have a review and give me some feedback? Thanks very much!

Instead of creating a Decimal converter function in Axes, update the convert_units in Axis to convert x when x is a Decimal or a list of Decimal.
Update the comments in test_axes.py for symmetry.
In axis.py, instead of first checking whether input is number like, direct handle them as a united quantity.
Since Decimal is a kind of Number, so DecimalConvert doesn't need to provide extra AxisInfo or default unit.
ax.bar(x0, y0, align='center')


@check_figures_equal(extensions=["png"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think testing both bar and barh is particularly necessary and would just keep one of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if width (or height) is a Decimal?

Copy link
Contributor Author

@sasoripathos sasoripathos May 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there will be no problem since width will also be converted. In the following code, if width is of type Decimal, it will be converted to floats by convert_xunits in _convert_dx eventually.

 if self.xaxis is not None:
    x0 = x
    x = np.asarray(self.convert_xunits(x))
    width = self._convert_dx(width, x0, x, self.convert_xunits)
    ......

Similarly for height.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can generate plots with float x/y but Decimal width/height as this:
image

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment re: testing left, but other than that it looks good.
Congrats for tackling a fairly hairy problem :)

value : decimal.Decimal or iterable
Decimal or list of Decimal need to be converted
"""
# If value is a Decimal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be left out. The following code already states exactly that.

# If value is a Decimal
if isinstance(value, Decimal):
return np.float(value)
# else x is a list of Decimal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this to

else:
    # assume x is a list of Decimal

@@ -1571,10 +1571,6 @@ def have_units(self):
return self.converter is not None or self.units is not None

def convert_units(self, x):
# If x is already a number, doesn't need converting
if munits.ConversionInterface.is_numlike(x):
Copy link
Member

@timhoffm timhoffm Apr 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary this way? The purpose of converters is to convert to something number-like matplotlib can work with. So the original logic "if is number return else convert" seems more clear than "convert; if that fails, did we already have a number? Ok then we just use it".

Maybe munits.ConversionInterface.is_numlike() is just not the right function/name. I'd prefer this to stay and be

if not munits.ConversionInterface.needs_conversion(x):
    return x

That will allow to cleanly pass through Decimals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not sure on the sign of the logic is_natively_supported_numlike(x) may be better than not needs_conversion(x).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original cause of this problem is the operations like plus between Decimal and float is not supported by matplotlib. If there is not such operations in the logic, Decimal input will not cause any error. As we discussed before, it doesn't quite make sense to distinguish Decimal and other types of number since Decimal is a type of Number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I'm saying "is_numlike" is not the right check. Semantically, we want

if matplotlib_can_natively_work_with(x):
     return x
# try to convert x

It's not about number or not, it's about "do we need a converter". So, if matplotlib cannot natively handle Decimal that can fall through to the converter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we define natively handle? For Decimal input, matplotlib supports it if there is no operation between Decimal and float. For example:

x = [1, 2, 3, 4]
y = [Decimal(1.5), Decimal(2.5), Decimal(3.5), Decimal(4.5)]
ax.bar(x, y, align='center')

This piece of code works fine. I am not sure about other kinds of plot, but based on this, I think matplotlib does not totally support/unsupport Decimal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, operationally, "can natively handle" is currently is_numlike(x) and not isinstance(x, Decimal). I mean, that's what your proposed code does as well.

It's a bit clumsy to define is that way, but that's the current logic. And I prefer to have that logic explicitly written down in a single place in a function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see currently is_numlike is used in some places like StrCategoryConverter.convert() in category.py. And from the code (as below), it looks like the Decimal input will not cause any problems here (since Decimal will be converted to floats).

# pass through sequence of non binary numbers
if all((units.ConversionInterface.is_numlike(v) and
        not isinstance(v, (str, bytes))) for v in values):
    return np.asarray(values, dtype=float)

So I think maybe instead of change is_numlike(x), which cause API changes, we can add a new function is_natively_supported_numlike(x) like proposed in ConversionInterface. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to leave is_numlike(x) as is and introduce and use ConversionInterface.is_natively_supported_numlike(x)

Naming: Not sure if we need the _numlike suffix in the name. On the one hand, this is exactly meant as is_natively_supported - no artificial restriction to numlikes necessary (even though it's unlikely that we'll natively support non-numlikes). On the other hand, the _numlike may give additional context to the reader, but I would see this more as a job of the is_natively_supported* docstring.

is_numlike(x) may become deprecated in the future, but we can handle this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a function named is_natively_supported() in ConversionInterface. Since the current unit system assume lists are homogeneous, I followed the same style as is_numlik(). Now I only consider all numbers except Decimal are natively supported.

@tacaswell tacaswell added this to the v3.2.0 milestone May 11, 2019
@jklymak jklymak dismissed their stale review May 12, 2019 01:37

stale

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK, though I'm somewhat leery of running this through a converter like this.

Can you check width and length of the bars because we have issues with dx and dy-like...

ax.bar(x0, y0, align='center')


@check_figures_equal(extensions=["png"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if width (or height) is a Decimal?

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should get a few more tests of different drawing methods. Maybe just scatter and plot are good enough.

But otherwise I think it looks good.

Add one test case for plot with Decimal input
Add one test case for scatter plot with Decimal input
Add two test cases for bar plot with Decimal height/width
@sasoripathos
Copy link
Contributor Author

@jklymak I added a few more test cases. Please have a check!

@jklymak jklymak merged commit 35b875c into matplotlib:master May 12, 2019
@jklymak
Copy link
Member

jklymak commented May 12, 2019

Thanks a lot @sasoripathos ; nice contribution on a tricky problem. Your persistence is much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError when plotting stacked bar chart with decimal
5 participants