Skip to content

Fix unit support with plot and pint #4803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 14, 2015

Conversation

dopplershift
Copy link
Contributor

pint is a unit support library that wraps (almost) anything with mathematical-aware unit operations. Since it tries to quack like a numpy array, but is not actually an ndarray subclass, this creates problems when thrown into functions like atleast_1d. atleast_1d converts all arguments, regardless of whether they are properly dimensioned, into numpy arrays (or subclasses).

Currently, pint can't even trigger matplotlib's unit infrastructure, as evidenced by:

import pint
import numpy as np
import matplotlib.units as munits

units = pint.UnitRegistry()

# Implementation of matplotlib support for unit conversion using pint
class PintConverter(object):
    @staticmethod
    def convert(value, unit, axis):
        print('converting', value, 'to', unit, 'axis units: ', axis.get_units())
        if hasattr(value, 'units'):
            return value.to(unit)
        else:
            return units.Quantity(value, axis.get_units()).to(unit).magnitude

    @staticmethod
    def axisinfo(unit, axis):
        print('axisinfo')
        return None

    @staticmethod
    def default_units(x, axis):
        print('defaultunits')
        return x.units

# Register the class
munits.registry[units.Quantity] = PintConverter()

import matplotlib.pyplot as plt
t = np.linspace(0, 10) * units.sec
v = 30 * units('m/s')
d = t * v

fig,ax = plt.subplots(1,1)
l, = plt.plot(t, d)
ax.yaxis.set_units(units.inch)
ax.autoscale_view()
plt.show()

This example triggers no print statements on master. With these changes, unit support works (I think) properly with pint.

@mdboom
Copy link
Member

mdboom commented Jul 28, 2015

👍

Just an aside, I've been working on astropy.units support for matplotlib over here: astropy/astropy#3981.

@tacaswell
Copy link
Member

I am poking the same section of code in #4787 to pull out the index if it exists.

@shoyer
Copy link

shoyer commented Jul 29, 2015

This is nice idea, but it would almost certainly break existing code using plt.plot. The problem is that it's not safe to assume that all duck-array types (with a shape) can be indexed with np.newaxis.

To my surprise, this actually does work for pandas Index and Series objects:

>>> pd.Series(range(5))[:, None]
array([[0],
       [1],
       [2],
       [3],
       [4]])

But there are other duck array types that can't handle this operation, even though they implement __array__ to support conversion to numpy arrays. For example:

>>> xray.DataArray(range(5))[:, None]
IndexError: too many indices

h5py datasets are another example, off the top of my head.

To be fair, this tradeoff might be worthwhile. In many cases, plotting duck arrays already fails, and this may not be a feature that matplotlib ever intended to support. But it's certainly worth considering the broader question of what classes of duck array types matplotlib intends to support as input to plotting functions. In many cases, calling np.asarray is the appropriate response to unexpected input.

After all that discussion, it occurs to me that a better approach here would be simply to explicitly look for types that have been registered with matplotlib's unit system rather than to whitelist all objects with a shape attribute.

@tacaswell tacaswell added this to the proposed next point release milestone Jul 29, 2015
@dopplershift
Copy link
Contributor Author

Well _check_1d is the simplest possible implementation that accomplished the use cases I wanted. It's simple enough to enhance the duck typing to check the operations we actually need. I prefer this approach because, technically, this PR is orthogonal to the units support--it's fixing the conversion to array by atleast_1d when it's not needed.

@dopplershift
Copy link
Contributor Author

I've updated this to have more thorough checks. The bonus here is that it seems to eliminate any changes to the examples, though I won't completely trust that until Travis finishes...

'''
Converts a sequence of less than 1 dimension, to an array of 1
dimension; leaves everything else untouched.
'''
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about instead using something like the following?

def asarray_units_okay(x):
    if type(x) in units.registry:
        return x
    else:
        return np.asarray(x)

def atleast_1d_units_okay(x);
    x = asarray_units_okay(x)
    if x.ndim == 0:
        x = x[np.newaxis]
    return x

I am nervous about preemptively checking all allowed operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks anything using units that relies upon having atleast_1d called; this includes the "simple" unit framework in the examples.

Otherwise, the decomposition seems fine.

@dopplershift
Copy link
Contributor Author

@tacaswell Any chance we can sneak this in for 1.5?

@tacaswell
Copy link
Member

Yes, but this is going to conflict with the logic added in https://github.com/matplotlib/matplotlib/pull/4829/files#diff-a7f4f21aa80e975d813bc436acb68d8cL280
only question is how manage that.

I am sure how the discussion with @shoyer ended, it looks like the comment should have been folded as addressed, but is pinned to the documentation?

@dopplershift
Copy link
Contributor Author

I'm willing to rebase/rework this once #4829 lands.

As far as the discussion goes, @shoyer and I disagree about the approach. I don't think we should rely upon the units registry, and that pre-emptively checking operations should be fine. Your thoughts here would be most welcome.

@tacaswell
Copy link
Member

I like the preemptive checking better as I think it plays better with some of the label/index related checking else where. Relying on the unit registry also adds a bit more order dependence then is strictly necessary as in most cases the conversion to float for plotting is deferred to draw time where as the proposed alternative method would possibly strip some of the data earlier.

@jrevans might have useful thoughts on this as well.

@tacaswell tacaswell modified the milestones: next point release, proposed next point release Aug 14, 2015
@tacaswell tacaswell added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Sep 2, 2015
@tacaswell tacaswell mentioned this pull request Sep 2, 2015
17 tasks
@tacaswell
Copy link
Member

@dopplershift This needs a rebase now

@jkseppan
Copy link
Member

There's a rebase on my branch fix-plot-units, but since this PR adds no tests, it's hard to say if the rebase still fixes the original issue. I would recommend adding a test case.

See jkseppan/matplotlib@ab04ebabcd

@dopplershift
Copy link
Contributor Author

Rebase done. I agree about the tests, but my problem is that I couldn't readily come up with a good test that didn't involve depending/ingesting pint.

@dopplershift
Copy link
Contributor Author

I can confirm that the "test" I posted at the top works with the rebased branch. I'll take a quick look at a stubbed out "pint" that works for testing.

@dopplershift
Copy link
Contributor Author

Alright, simple test added (and now I know mock!). I verified that this test fails before my changes, and succeeds after.

@tacaswell
Copy link
Member

remember to white-list the new test file __init__.py

@dopplershift
Copy link
Contributor Author

@tacaswell huh?

@tacaswell
Copy link
Member

This is the list of tests that are run by default https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/__init__.py#L1416 (and this is what travis runs)

@dopplershift
Copy link
Contributor Author

I don't understand the failures (or at least how they can only fail sometimes):

AttributeError: 'module' object has no attribute 'test_units'

@tacaswell
Copy link
Member

That helpful exception means that there is an import error in test_units (knowing that cost me too many hours of my life).

@dopplershift
Copy link
Contributor Author

Thanks, turns out I didn't understand about importing mock.

This is catching the problems with pint-like libraries who wrap numpy
arrays.
Even if the argument to atleast_1d is array-compatible and has
sufficient dimensions, it is still converted from its original type to a
numpy array. This is overly aggressive for our purposes (especially
unit support), so replace the direct call with a helper function that
only calls atleast_1d if it's actually necessary.
tacaswell added a commit that referenced this pull request Sep 14, 2015
FIX: unit support with `plot` and `pint`
@tacaswell tacaswell merged commit c27970c into matplotlib:master Sep 14, 2015
@dopplershift dopplershift deleted the fix-plot-units branch September 14, 2015 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. topic: units and array ducktypes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants