Skip to content

ENH: implement voidtype_repr and voidtype_str #8981

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 12, 2017

Conversation

ahaldane
Copy link
Member

This PR implements voidtype_repr and voidtype_str to output something more sensible.

Currently, unstructured void types print their raw values directly to output, allowing you to do some funny things:

>>> np.array([27, 91, 50, 75,  7, 65, 10, 8, 
              27, 91, 51, 49,109, 82,101,100], dtype='u1').view('V8')
       A
, Red], 
      dtype='|V8')

(if you paste this into an ANSI terminal the output willl be red colored and the terminal will beep)

In this PR I've implemented that the hex representation of the byte-string will be printed:

>>> np.array([27, 91, 50, 75,  7, 65, 10, 8, 
              27, 91, 51, 49,109, 82,101,100], dtype='u1').view('V8')
array([1b5b324b07410a08, 1b5b33316d526564],
      dtype='|V8')

I also toyed with printing something like <memory>, <V8>, <memory at 123456> for the elements. Better suggestions are welcome.

@eric-wieser
Copy link
Member

eric-wieser commented Apr 23, 2017

This needs some kind of prefix/suffix to make non-alphabetic hex-strings distinguishable from ints.

But hex seems like the right output to me

@ahaldane
Copy link
Member Author

Maybe like <1b5b324b07410a08> ?

@eric-wieser
Copy link
Member

eric-wieser commented Apr 23, 2017

It would also be nice if the repr could be used in eval with suitable globals. So I'd be tentatively in favor of either adding explicit function calls, with a new void(hex=) constructor:

array([void(hex='1b5b324b07410a08'), void(hex='1b5b33316d526564')],
      dtype='|V8')

or just taking advantage of the existing bytes->void conversion, at the cost of verbosity (using uppercase for readability)

array(['\x1B\x5B\x32\x4B\x07\x41\x0A\x08', '\x1B\x5B\x33\x31\x6D\x52\x65\x64' ],
      dtype='|V8')

Of course, str could still be <1b5b324b07410a08>

@ahaldane
Copy link
Member Author

Hmm that's pretty good. Let me try that.

@eric-wieser
Copy link
Member

eric-wieser commented Apr 23, 2017

While we're at it, void(hex='1b5b324b07410a08') would be a lot more readable as void(hex='1b5b324b_07410a08') - some heuristics for inserting underscores or spaces every 4/8/16 chars would help -

@juliantaylor
Copy link
Contributor

if its hex you should prefix it with 0x

if (PyDataType_HASFIELDS(s->descr)) {
return gentype_repr(self);
}
return _Py_strhex(s->obval, s->descr->elsize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this is private api, does pypy have it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not.. I was going to update it once we dicided what format we want.

Copy link
Member

@eric-wieser eric-wieser Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could fall back on the "invoke something in _internal.py" strategy again - repr doesn't need to be incredibly fast

@eric-wieser
Copy link
Member

eric-wieser commented Apr 23, 2017

@juliantaylor: I'm not a fan of void(0x12345678). I think we have to show the bytes with the lowest-index at the leftmost of the string. But then np.array(0x12345678).view('V4') would mean something different on little-endian platforms.

So then we're left comparing:

  • void('0x1b5b324b07410a08')
  • void(hex='1b5b324b07410a08')
  • void(hex='0x1b5b324b07410a08')

The first option is surprising, because passing bytes to void (py2) should give a buffer. So we're faced with the last two, and the first is just shorter

@juliantaylor
Copy link
Contributor

hm yes as void can represent arbitrary bytes I like the \x representation like for python bytes most.

@ahaldane
Copy link
Member Author

I am actually trying out your idea of

array(['\x1b\x5b\x32\x4b\x07\x41\x0a\x08', '\x1b\x5b\x33\x31\x6d\x52\x65\x64' ],
 ...:       dtype='|V8')

I like that because it gives back the original array.

if (PyDataType_HASFIELDS(s->descr)) {
return gentype_repr(self);
}
bytes = gentype_generic_method(self, NULL, NULL, "tobytes");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just call PyBytes_FromStringAndSize on obval here (it is defined in our compat headers to work in python2 too)

if (bytes == NULL) {
return NULL;
}
str = PyObject_Str(bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably the same in this case, but shouldn't this be PyObject_Repr here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, already changed on my side

@ahaldane ahaldane force-pushed the void_repr branch 2 times, most recently from 03408ec to fa50ebd Compare April 23, 2017 19:08
@ahaldane
Copy link
Member Author

I still need to figure out what is going on with 0-d arrays. Apparently they are printed totally differently here which I want to try to fix.

>>> a = np.zeros(4, dtype='V4')
>>> a
array([b'\x00\x00\x00\x00', b'\x00\x00\x00\x00', b'\x00\x00\x00\x00',
       b'\x00\x00\x00\x00'],
      dtype='|V4')
>>> np.array(a[0])
array(array([0, 0, 0, 0], dtype=int8),
      dtype='|V4')

if (bytes == NULL) {
return NULL;
}
repr = PyObject_Repr(bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't enough - I think that this should be our output:

>>> np.array("Hello world").view(np.void)
array(b'\x68\x65\x6C\x6C\x6F\x20\x77\x6F\x72\x6C\x64', dtype=void)

Showing ascii characters as text isn't useful - we already have the bytes_ dtype for that

Copy link
Member

@eric-wieser eric-wieser Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which you can produce with:

"b'{}'".format(''.join(map(r'\x{:02X}'.format, bytearray( voiditem.tobytes() ))))

@eric-wieser
Copy link
Member

eric-wieser commented Apr 23, 2017

Apparently they are printed totally differently here

I think that's a result of the void_getitem code - it definitely appears to construct an ndarray in some cases

@ahaldane
Copy link
Member Author

I think that's a result of the void_getitem code

Yes, that's right, since item is called in the line I linked above. But why does that line exist?

That code block (for a.shape == ()) seems to exist to avoid a circular import problem involving np.matrix which occurs if it is removed. I'm still working it out.

@eric-wieser
Copy link
Member

Missed your link there


j = 0;
#if defined(NPY_PY3K)
retbuf[j++] = 'b';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe just do this always - 2.7 supports the syntax, and that saves confusion when from __future__ import unicode_literals is in place. extrachars would go too, as a result

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, done

c = (argbuf[i] >> 4) & 0xf;
retbuf[j++] = Py_hexdigits[c];
c = argbuf[i] & 0xf;
retbuf[j++] = Py_hexdigits[c];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lowercase - which makes separating the hex chars from the x a little harder on the eye

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowercase seems to be the python standard though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nothing else, I think this function should have a comment pointing out that the string it produces is lowercase.

Uppercase is pretty typical of hex memory viewers, isn't it? If anything, being inconsistent with bytes.__repr__ makes it more obvious to the user that they're looking at a void

if (PyDataType_HASFIELDS(s->descr)) {
return gentype_str(self);
}
return _void_to_hex(s->obval, s->descr->elsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I feel about void.__str__ looking like byte.__repr__. Seems that here printing out raw hex might be better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the form void(hex='1b5b324b07410a08'), are you suggesting we modify the void constructor to add a hex argument? If not, I don't like the fact that the repr couldn''t actually be converted to an instance.

Copy link
Member

@eric-wieser eric-wieser Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was indeed suggesting we allow it to take a kwarg. But that was an alternative to the bytes solution you settled for, and not my point here.

Here I'm suggesting something like:

>>> print(repr(v))
'\x12\x34\xAB`
>>> print(str(v))
12 34 AB

Which is sort of consistent with

>>> s = 'hello world'
>>> print(repr(v))
'hello world'
>>> print(str(v))
hello world

@ahaldane
Copy link
Member Author

ahaldane commented Apr 23, 2017

After looking at fixing the 0d array problem, it looks like a more general and more difficult problem better fixed in a future other PR.

We seem to have double implementations of many of the dtype reprs: once in scalartypes.c.src and again in arrayprint.py, and missing implementations in one file try to fall back to the other file, which causes infinite recursions if we're not careful.

@eric-wieser
Copy link
Member

Wonderful. I agree, that can go in another PR

@ahaldane ahaldane force-pushed the void_repr branch 3 times, most recently from 25438a2 to 7e39112 Compare April 23, 2017 21:45
if (PyDataType_HASFIELDS(s->descr)) {
return gentype_str(self);
}
return _void_to_hex(s->obval, s->descr->elsize);
return _void_to_hex(s->obval, s->descr->elsize, "0x", "'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mismatching quotes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh oops

@mhvk
Copy link
Contributor

mhvk commented May 18, 2017

Saw this as it had drifted up in the sort-by-update list. Nice! I briefly wondered about documentation beyond the release notes, but it seems there are no examples where void arrays are shown.

@ahaldane ahaldane force-pushed the void_repr branch 3 times, most recently from 466d620 to 3248ed9 Compare May 18, 2017 17:05
@ahaldane
Copy link
Member Author

Release notes updated

printing style for ``void`` datatypes is now customizable
---------------------------------------------------------
The printing style of ``np.void`` arrays is customizable using the
``formatter`` argument to ``np.set_printoptions``, using the ``'void'`` key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the void type was always customizable, but before it was done through the (still) mis-documented 'numpystr' key

eric-wieser
eric-wieser previously approved these changes May 18, 2017
r"array([b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'," "\n"
r" b'\x1B\x5B\x33\x31\x6D\x52\x65\x64']," "\n"
r" dtype='|V8')")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an eval(repr(x), vars(np)) == x test, like we have elsewhere, for both scalars and arrays? That's one of the features of this patch too

@ahaldane
Copy link
Member Author

Good idea, done.

@charris
Copy link
Member

charris commented Nov 11, 2017

Needs rebase.

@charris
Copy link
Member

charris commented Nov 12, 2017

Sounds like this is ready. Needs a rebase.

@ahaldane
Copy link
Member Author

Rebased

@charris charris merged commit 1204a35 into numpy:master Nov 12, 2017
@charris
Copy link
Member

charris commented Nov 12, 2017

Merged, thanks Allan.

@@ -365,93 +365,78 @@ def stack(arrays, axis=0, out=None):
return _nx.concatenate(expanded_arrays, axis=axis, out=out)


def _block_check_depths_match(arrays, parent_index=[]):
class _Recurser(object):
Copy link
Member

@eric-wieser eric-wieser Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh oh - bad merge / rebase - this just reverted #9667

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 13, 2017
This restores the changes in numpygh-9667 that were overwritten.
charris added a commit that referenced this pull request Nov 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants