-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: implement voidtype_repr and voidtype_str #8981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This needs some kind of prefix/suffix to make non-alphabetic hex-strings distinguishable from ints. But hex seems like the right output to me |
Maybe like |
It would also be nice if the
or just taking advantage of the existing bytes->void conversion, at the cost of verbosity (using uppercase for readability)
Of course, |
Hmm that's pretty good. Let me try that. |
While we're at it, |
if its hex you should prefix it with |
if (PyDataType_HASFIELDS(s->descr)) { | ||
return gentype_repr(self); | ||
} | ||
return _Py_strhex(s->obval, s->descr->elsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this is private api, does pypy have it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not.. I was going to update it once we dicided what format we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could fall back on the "invoke something in _internal.py
" strategy again - repr doesn't need to be incredibly fast
@juliantaylor: I'm not a fan of So then we're left comparing:
The first option is surprising, because passing bytes to void (py2) should give a buffer. So we're faced with the last two, and the first is just shorter |
hm yes as void can represent arbitrary bytes I like the |
I am actually trying out your idea of
I like that because it gives back the original array. |
if (PyDataType_HASFIELDS(s->descr)) { | ||
return gentype_repr(self); | ||
} | ||
bytes = gentype_generic_method(self, NULL, NULL, "tobytes"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just call PyBytes_FromStringAndSize
on obval here (it is defined in our compat headers to work in python2 too)
if (bytes == NULL) { | ||
return NULL; | ||
} | ||
str = PyObject_Str(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably the same in this case, but shouldn't this be PyObject_Repr
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, already changed on my side
03408ec
to
fa50ebd
Compare
I still need to figure out what is going on with 0-d arrays. Apparently they are printed totally differently here which I want to try to fix. >>> a = np.zeros(4, dtype='V4')
>>> a
array([b'\x00\x00\x00\x00', b'\x00\x00\x00\x00', b'\x00\x00\x00\x00',
b'\x00\x00\x00\x00'],
dtype='|V4')
>>> np.array(a[0])
array(array([0, 0, 0, 0], dtype=int8),
dtype='|V4') |
if (bytes == NULL) { | ||
return NULL; | ||
} | ||
repr = PyObject_Repr(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't enough - I think that this should be our output:
>>> np.array("Hello world").view(np.void)
array(b'\x68\x65\x6C\x6C\x6F\x20\x77\x6F\x72\x6C\x64', dtype=void)
Showing ascii characters as text isn't useful - we already have the bytes_
dtype for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which you can produce with:
"b'{}'".format(''.join(map(r'\x{:02X}'.format, bytearray( voiditem.tobytes() ))))
I think that's a result of the |
Yes, that's right, since That code block (for |
Missed your link there |
|
||
j = 0; | ||
#if defined(NPY_PY3K) | ||
retbuf[j++] = 'b'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe just do this always - 2.7 supports the syntax, and that saves confusion when from __future__ import unicode_literals
is in place. extrachars
would go too, as a result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, done
c = (argbuf[i] >> 4) & 0xf; | ||
retbuf[j++] = Py_hexdigits[c]; | ||
c = argbuf[i] & 0xf; | ||
retbuf[j++] = Py_hexdigits[c]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is lowercase - which makes separating the hex chars from the x
a little harder on the eye
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowercase seems to be the python standard though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nothing else, I think this function should have a comment pointing out that the string it produces is lowercase.
Uppercase is pretty typical of hex memory viewers, isn't it? If anything, being inconsistent with bytes.__repr__
makes it more obvious to the user that they're looking at a void
if (PyDataType_HASFIELDS(s->descr)) { | ||
return gentype_str(self); | ||
} | ||
return _void_to_hex(s->obval, s->descr->elsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I feel about void.__str__
looking like byte.__repr__
. Seems that here printing out raw hex might be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the form void(hex='1b5b324b07410a08')
, are you suggesting we modify the void constructor to add a hex
argument? If not, I don't like the fact that the repr couldn''t actually be converted to an instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was indeed suggesting we allow it to take a kwarg. But that was an alternative to the bytes solution you settled for, and not my point here.
Here I'm suggesting something like:
>>> print(repr(v))
'\x12\x34\xAB`
>>> print(str(v))
12 34 AB
Which is sort of consistent with
>>> s = 'hello world'
>>> print(repr(v))
'hello world'
>>> print(str(v))
hello world
After looking at fixing the 0d array problem, it looks like a more general and more difficult problem better fixed in a future other PR. We seem to have double implementations of many of the dtype reprs: once in |
Wonderful. I agree, that can go in another PR |
25438a2
to
7e39112
Compare
if (PyDataType_HASFIELDS(s->descr)) { | ||
return gentype_str(self); | ||
} | ||
return _void_to_hex(s->obval, s->descr->elsize); | ||
return _void_to_hex(s->obval, s->descr->elsize, "0x", "'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mismatching quotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh oops
Saw this as it had drifted up in the sort-by-update list. Nice! I briefly wondered about documentation beyond the release notes, but it seems there are no examples where void arrays are shown. |
466d620
to
3248ed9
Compare
Release notes updated |
doc/release/1.14.0-notes.rst
Outdated
printing style for ``void`` datatypes is now customizable | ||
--------------------------------------------------------- | ||
The printing style of ``np.void`` arrays is customizable using the | ||
``formatter`` argument to ``np.set_printoptions``, using the ``'void'`` key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the void type was always customizable, but before it was done through the (still) mis-documented 'numpystr' key
r"array([b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'," "\n" | ||
r" b'\x1B\x5B\x33\x31\x6D\x52\x65\x64']," "\n" | ||
r" dtype='|V8')") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an eval(repr(x), vars(np)) == x
test, like we have elsewhere, for both scalars and arrays? That's one of the features of this patch too
Good idea, done. |
Needs rebase. |
Sounds like this is ready. Needs a rebase. |
Rebased |
Merged, thanks Allan. |
@@ -365,93 +365,78 @@ def stack(arrays, axis=0, out=None): | |||
return _nx.concatenate(expanded_arrays, axis=axis, out=out) | |||
|
|||
|
|||
def _block_check_depths_match(arrays, parent_index=[]): | |||
class _Recurser(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh oh - bad merge / rebase - this just reverted #9667
This restores the changes in numpygh-9667 that were overwritten.
This PR implements
voidtype_repr
andvoidtype_str
to output something more sensible.Currently, unstructured void types print their raw values directly to output, allowing you to do some funny things:
(if you paste this into an ANSI terminal the output willl be red colored and the terminal will beep)
In this PR I've implemented that the hex representation of the byte-string will be printed:
I also toyed with printing something like
<memory>
,<V8>
,<memory at 123456>
for the elements. Better suggestions are welcome.