-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Show subclass type in dtype repr and str of structured arrays #5483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if(dtype->type_num == NPY_VOID && dtype->typeobj != &PyVoidArrType_Type) { | ||
PyObject *typestr = PyUString_FromString(dtype->typeobj->tp_name); | ||
|
||
PyObject *ret = PyUString_FromString("("); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|They all return NULL or -1 if an exception occurs.
I wonder if ret
should be checked for NULL
? It's possible an error could occur and we wouldn't know it.
One thing that is likely a problem is that the dtype string needs to be parseable by the dtype constructor. That would affect pickling and other things. |
And I think that is why the tests are failing. |
Yes, sorry, I completely forgot to run the tests! Indeed, the test fails because "NameError: name 'record' is not defined". I think having it listed as numpy.record will fix this. I will figure out how to do this later. |
df3db29
to
1d6c3a0
Compare
Whew, that was trickier than I expected. The problem was that As documented here, As far as I can tell, the reliable way to get a string of form "module.typename" is to get the I'm pretty sure all the other uses of The next complication is that Setup:
Old:
New:
Also, I think it's possible that this pull might be a bug fix rather than just an enhancement, because the repr of a dtype does not give back the same dtype without this fix. For example, Old:
New:
I though this might screw up pickling in the old code, and that pickling+unpickling would cause the numpy.core.records.record type to be lost. But actually pickling retains the type attribute properly, so that's not a problem. I haven't yet found an existing clear case where the missing base_type causes a problem. |
bae39b0
to
745feb7
Compare
Thanks @charris for your comments. I've made changes you suggested. I don't see your comments on this page anymore (I don't know why), but I replied to them. You can read my replies here: ahaldane@d2bf24b |
@@ -215,6 +215,10 @@ def _createdescr(self, byteorder): | |||
class record(nt.void): | |||
"""A data-type scalar that allows field access as attribute lookup. | |||
""" | |||
|
|||
__name__ = 'record' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment explaining why this is done. Same below.
LGTM modulo comment nitpick. |
This is a modification to the dtype str and repr functions what helps solve numpy#3581. I discussed it on the mailing list in a message "Re: structured arrays, recarrays, and record arrays" on Jan 19 2015. I didn't get any replies, but hopefully that merely means "no opinion" rather than "bad idea". What it does: For structured arrays, if the dtype is not np.void then print the dtype as `(base_dtype, dtype)`. New Behavior: >>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')]) >>> np.rec.array(a) rec.array([(1, 'ABC'), (2, 'DEF')], dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')])) >>> a.view(np.recarray) rec.array([(1, 'ABC'), (2, 'DEF')], dtype=[('foo', '<i8'), ('bar', 'S4')])
745feb7
to
86efa7d
Compare
Done. Actually changing the name and module isn't absolutely necessary now that the C code is right, but it's still nicer with them simplified this way. |
Merged, thanks @ahaldane. |
ENH: Show subclass type in dtype repr and str of structured arrays
In numpy#5483, I solved the problem that a "recarray" and a "record array" (nomenclature defined in numpy#5482) looked identical by making sure that a type's subclass was listed in the repr. However, recarrays are still represented using the function 'rec.array' even though this function technically creates record arrays, not recarrays. So I have updated recarray.__repr__. Setup: >>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')]) >>> recordarr = np.rec.array(a) >>> recarr = a.view(np.recarray) Behavior after numpy#5483: >>> recordarr rec.array([(1, 'ABC'), (2, 'DEF')], dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')])) >>> recarr rec.array([(1, 'ABC'), (2, 'DEF')], dtype=[('foo', '<i8'), ('bar', 'S4')]) New Behavior: >>> recordarr rec.array([(1, 'ABC'), (2, 'DEF')], dtype=[('foo', '<i8'), ('bar', '|S4')]) >>> recarr array([(1, 'ABC'), (2, 'DEF')], dtype=[('foo', '<i8'), ('bar', 'S4')]).view(numpy.recarray)
Hello, this is an enhancement that clarifies (or solves?) #3581.
I discussed it on the mailing list in a message "Re: structured arrays, recarrays, and record arrays" on Jan 19 2015. I didn't get any replies, but hopefully that merely means "no opinion" rather than "bad idea". I though maybe implementing it would draw more interest.
What it does: For structured arrays, if the dtype is not np.void then print the dtype as
(base_dtype, dtype)
. This is the standard form for subclassed dtypes as documented in reference/arrays.dtypes, although that page says this form is discouraged. (Perhaps that discouragement should also be removed?).This solves the problem that array objects may "look" identical even though their dtypes differ, as happened in issue 3581.
Old Behavior:
New Behavior:
Now, probably instead of saying simply "record" in these examples, it should say something like "numpy.record" or "np.record". Any preference?