Try to get a units converter for a type before falling back to built-in behavior #20502

Open
ryan-gunderson opened this issue Jun 24, 2021 · 7 comments

@ryan-gunderson

ryan-gunderson commented Jun 24, 2021

When given an object to plot, should matplotlib attempt to use a registered units.ConversionInterface for that object's type (and mro) before falling back to other built-in defaults?

This question/suggestion prompted by the following example:

Say one has a class

class DateTimeNano(np.ndarray):
    ...

subclassing np.ndarray, with the convention that DateTimeNano stores nanoseconds since the epoch. When plotting a DateTimeNano, one may want to use the built-in utilities for date tick location and formatting. If one writes a units.ConversionInterface like

from matplotlib import dates, units

class DateTimeNanoConverter(units.ConversionInterface):
    NANOS_PER_DAY = 86_400 * 1_000_000_000

    @staticmethod
    def convert(value, unit, axis):
        # Values should be in floating point days since epoch to play nice with mpl
        day_floats = value / DateTimeNanoConverter.NANOS_PER_DAY
        return day_floats

    @staticmethod
    def axisinfo(unit, axis):
        majloc = dates.AutoDateLocator()
        majfmt = dates.AutoDateFormatter(majloc)
        return units.AxisInfo(majloc=majloc, majfmt=majfmt, label='date time')

    @staticmethod
    def default_units(x, axis):
        return 'date_time'

registers it with

units.registry[DateTimeNano] = DateTimeNanoConverter

and then calls

plt.plot(my_date_time_nano, my_y_values)

only DateTimeNanoConverter.default_units and DateTimeNanoConverter.axisinfo are used. DateTimeNanoConverter.convert is skipped entirely: units._is_natively_supported recognizes that the object is an iterable, so a converter for the type DateTimeNano is never searched for. (It also appears that the lookup would be short-circuited in units.Registry.get_converter.)

If the search for a converter over the type's mro were moved earlier in the sequence of attempts to get values out of an object, this problem would not occur. That also seems like the preferable behavior overall: the search for a way to get an array of values out (via a converter or a built-in code path) should start with the most specific implementation and work outwards.
Currently this does not happen when a type has a particular inheritance structure or set of attributes.
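
To make the ordering argument concrete, here is a minimal toy sketch in plain Python. It is not matplotlib's code: ToyRegistry, looks_native, NanoStamps and the string standing in for a converter are all hypothetical names invented for illustration, and a list subclass stands in for the ndarray subclass. It only contrasts a "generic check first" lookup with an "mro first" lookup.

```python
from numbers import Number


def looks_native(x):
    """Hypothetical stand-in for a generic 'already plottable' check."""
    try:
        return all(isinstance(v, Number) for v in x)
    except TypeError:  # x is not iterable
        return isinstance(x, Number)


class ToyRegistry(dict):
    """Toy mapping from a type to a converter (not matplotlib's Registry)."""

    def lookup_by_mro(self, x):
        # Walk from the most specific class outwards.
        for cls in type(x).__mro__:
            if cls in self:
                return self[cls]
        return None

    def native_first(self, x):
        # Generic check runs first, so a registered type can be bypassed.
        if looks_native(x):
            return None
        return self.lookup_by_mro(x)

    def mro_first(self, x):
        # Registered types win; None means "use the built-in handling".
        return self.lookup_by_mro(x)


class NanoStamps(list):
    """Stand-in for a container type that stores nanoseconds since epoch."""


registry = ToyRegistry()
registry[NanoStamps] = "nano-converter"  # placeholder for a converter object

stamps = NanoStamps([0, 86_400 * 10**9])
print(registry.native_first(stamps))  # None: the generic check short-circuits
print(registry.mro_first(stamps))     # nano-converter
print(registry.mro_first([1.0, 2.0])) # None: plain lists are unaffected
```

In this toy model, moving the mro walk to the front changes the result only for types that someone has explicitly registered; unregistered containers still fall through to the generic path.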

@jklymak
Member

jklymak commented Jun 24, 2021

My understanding is that the units registry works on the data type of the first element of the iterable, not on the iterable itself, i.e. on my_date_time_nano[0]. So if the elements of your iterable are floats, matplotlib just assumes they are native. I don't think there is a way to get the units machinery to check the class of the container with the current registry, but it is something to consider for the future.

@ryan-gunderson
Author

@jklymak That is the current behavior for iterables, yes. For objects not caught by the handful of built-in cases, the converter is used on the object. For example:

class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
        return

class FooConverter(units.ConversionInterface):
    @staticmethod
    def convert(value, unit, axis):
        xs = [value.a, value.b, value.c]
        return xs

units.registry[Foo] = FooConverter

During a call to plt.scatter(my_foo, my_other_foo), FooConverter.convert is called, which results in three points being plotted. (This does not work for plt.plot, however, because the value argument passed to convert is an ndarray in that case; I'm not familiar with why.)

Perhaps this is an abuse of the units system and so it may not make sense as part of the units module, but it seems reasonable to be able to, given an arbitrary object, register how it would like to be interpreted for plotting.

@jklymak
Member

jklymak commented Jun 24, 2021

I think you may want to create a custom dtype. https://stackoverflow.com/questions/2350072/custom-data-types-in-numpy-arrays

@jklymak
Member

jklymak commented Jun 24, 2021

More to your immediate concern, matplotlib can already handle nanoseconds from an epoch using datetime64. Right now our default epoch is 1970-01-01, and nanoseconds in datetime64 have +/-292y of usable range. So unless you plan to do nanoseconds for more than 584 years, something like datetime = nano_seconds.astype('datetime64[ns]') should work with the default locators and formatters.
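
As a quick sanity check on the range quoted above, the figure can be reproduced with plain arithmetic. This is only a back-of-the-envelope sketch: it assumes the nanosecond count is a signed 64-bit integer and uses a 365.25-day year, and the variable names are made up for illustration.

```python
# Assumption: nanosecond timestamps are stored as signed 64-bit integers.
MAX_NANOS = 2**63 - 1

NANOS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 86_400  # assumption: Julian year

# Years representable on each side of the epoch.
years_each_side = MAX_NANOS / (NANOS_PER_SECOND * SECONDS_PER_YEAR)
total_years = 2 * years_each_side

print(f"about {years_each_side:.0f} years either side of the epoch")
print(f"about {total_years:.0f} years in total")
```

The result is a little over 292 years in each direction, which matches the roughly 584-year total span mentioned in the comment.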

@ryan-gunderson
Author

I agree that casting to datetime64 can solve that particular example. It seems that the units module is not intended to solve the problem of making arbitrary things plot-able. So a separate issue would be appropriate for discussing that. Thanks!

@tacaswell
Member

It looks like the behavior of looking at _is_natively_supported was added in 3.2 via #13738, which means that we never get to

for cls in type(x).__mro__:  # Look up in the cache.
    try:
        return self[cls]
    except KeyError:
        pass

I think the changes in #13738 inadvertently broke the functionality that @ryan-gunderson is looking for.

@ryan-gunderson
Author

Perhaps moving the traversal of the mro to the front would make sense? But I am still unsure whether the units module is intended to handle coercing arbitrary objects into array-likes for plotting, or just coercing unrecognized scalars to numbers.
