Skip to content

ENH: Show subclass type in dtype repr and str of structured arrays #5483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 24, 2015

Conversation

ahaldane
Copy link
Member

Hello, this is an enhancement that clarifies (or solves?) #3581.

I discussed it on the mailing list in a message "Re: structured arrays, recarrays, and record arrays" on Jan 19 2015. I didn't get any replies, but hopefully that merely means "no opinion" rather than "bad idea". I though maybe implementing it would draw more interest.

What it does: For structured arrays, if the dtype is not np.void then print the dtype as (base_dtype, dtype). This is the standard form for subclassed dtypes as documented in reference/arrays.dtypes, although that page says this form is discouraged. (Perhaps that discouragement should also be removed?).

This solves the problem that array objects may "look" identical even though their dtypes differ, as happened in issue 3581.

Old Behavior:

>>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
>>> np.rec.array(a)
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')])
>>> a.view(np.recarray)
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')])

New Behavior:

>>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
>>> np.rec.array(a)
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=(record, [('foo', '<i8'), ('bar', 'S4')]))
>>> a.view(np.recarray)
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')])

Now, probably instead of saying simply "record" in these examples, it should say something like "numpy.record" or "np.record". Any preference?

if(dtype->type_num == NPY_VOID && dtype->typeobj != &PyVoidArrType_Type) {
PyObject *typestr = PyUString_FromString(dtype->typeobj->tp_name);

PyObject *ret = PyUString_FromString("(");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|They all return NULL or -1 if an exception occurs.

I wonder if ret should be checked for NULL? It's possible an error could occur and we wouldn't know it.

@charris
Copy link
Member

charris commented Jan 22, 2015

One thing that is likely a problem is that the dtype string needs to be parseable by the dtype constructor. That would affect pickling and other things.

@charris
Copy link
Member

charris commented Jan 22, 2015

And I think that is why the tests are failing.

@ahaldane
Copy link
Member Author

Yes, sorry, I completely forgot to run the tests!

Indeed, the test fails because "NameError: name 'record' is not defined". I think having it listed as numpy.record will fix this. I will figure out how to do this later.

@ahaldane
Copy link
Member Author

Whew, that was trickier than I expected.

The problem was that dtype->typeobj->tp_name does not reliably give you the name of the class in module.name form (If it did, my code before would have worked). I expected tp_name to be "numpy.core.records.record", but instead it was just "record".

As documented here, tp_name will be of form "module.typename" for (properly created) "statically allocated" types (ie, all the builtin numpy types). In contrast, for "dynamically allocated" types (ie, created in python code), tp_name is simply the typename (no "module.").

As far as I can tell, the reliable way to get a string of form "module.typename" is to get the __module__ and __name__ strings from the object's attributes and paste them together.

I'm pretty sure all the other uses of tp_name in descriptor.c are OK, assuming that user-defined types are always defined in C and are therefore statically allocated.

The next complication is that __module__ and __name__ do not make sense by default for numpy.record and numpy.recarray, so I set them manually in records.py. This also makes the type string look nicer:

Setup:

>>> arr = np.array([(1,'a'), (2,'b')], dtype=[('one', int), ('two', 'S1')])
>>> rec = np.rec.array(arr)

Old:

>>> type(arr), type(rec)
(numpy.ndarray, numpy.core.records.recarray)

New:

>>> type(arr), type(rec)
(numpy.ndarray, numpy.recarray)

Also, I think it's possible that this pull might be a bug fix rather than just an enhancement, because the repr of a dtype does not give back the same dtype without this fix. For example,

Old:

>>> rec.dtype.type
numpy.core.records.record
>>> eval(repr(rec.dtype)).type
numpy.void

New:

>>> rec.dtype.type
numpy.record
>>> eval(repr(rec.dtype)).type
numpy.record

I though this might screw up pickling in the old code, and that pickling+unpickling would cause the numpy.core.records.record type to be lost. But actually pickling retains the type attribute properly, so that's not a problem. I haven't yet found an existing clear case where the missing base_type causes a problem.

@ahaldane ahaldane force-pushed the dtype_showtype branch 4 times, most recently from bae39b0 to 745feb7 Compare January 23, 2015 22:08
@ahaldane
Copy link
Member Author

Thanks @charris for your comments. I've made changes you suggested.

I don't see your comments on this page anymore (I don't know why), but I replied to them. You can read my replies here: ahaldane@d2bf24b

@@ -215,6 +215,10 @@ def _createdescr(self, byteorder):
class record(nt.void):
"""A data-type scalar that allows field access as attribute lookup.
"""

__name__ = 'record'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why this is done. Same below.

@charris
Copy link
Member

charris commented Jan 24, 2015

LGTM modulo comment nitpick.

This is a modification to the dtype str and repr functions what helps
solve numpy#3581.

I discussed it on the mailing list in a message "Re: structured arrays,
recarrays, and record arrays" on Jan 19 2015. I didn't get any replies,
but hopefully that merely means "no opinion" rather than "bad idea".

What it does: For structured arrays, if the dtype is not np.void then
print the dtype as `(base_dtype, dtype)`.

New Behavior:

 >>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
 >>> np.rec.array(a)
 rec.array([(1, 'ABC'), (2, 'DEF')],
       dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))
 >>> a.view(np.recarray)
 rec.array([(1, 'ABC'), (2, 'DEF')],
       dtype=[('foo', '<i8'), ('bar', 'S4')])
@ahaldane
Copy link
Member Author

Done.

Actually changing the name and module isn't absolutely necessary now that the C code is right, but it's still nicer with them simplified this way.

@charris
Copy link
Member

charris commented Jan 24, 2015

Merged, thanks @ahaldane.

charris added a commit that referenced this pull request Jan 24, 2015
ENH: Show subclass type in dtype repr and str of structured arrays
@charris charris merged commit 7ce93ba into numpy:master Jan 24, 2015
@ahaldane ahaldane deleted the dtype_showtype branch January 25, 2015 19:15
ahaldane added a commit to ahaldane/numpy that referenced this pull request Jan 30, 2015
In numpy#5483, I solved the problem that a
"recarray" and a "record array" (nomenclature defined in
numpy#5482) looked identical by making
sure that a type's subclass was listed in the repr. However, recarrays
are still represented using the function 'rec.array' even though this
function technically creates record arrays, not recarrays.

So I have updated recarray.__repr__.

Setup:

    >>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
    >>> recordarr = np.rec.array(a)
    >>> recarr = a.view(np.recarray)

Behavior after numpy#5483:

    >>> recordarr
    rec.array([(1, 'ABC'), (2, 'DEF')],
          dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))
    >>> recarr
    rec.array([(1, 'ABC'), (2, 'DEF')],
          dtype=[('foo', '<i8'), ('bar', 'S4')])

New Behavior:

    >>> recordarr
    rec.array([(1, 'ABC'), (2, 'DEF')],
          dtype=[('foo', '<i8'), ('bar', '|S4')])
    >>> recarr
    array([(1, 'ABC'), (2, 'DEF')],
          dtype=[('foo', '<i8'), ('bar', 'S4')]).view(numpy.recarray)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants