ENH: implement voidtype_repr and voidtype_str #8981

ahaldane · 2017-04-23T17:57:47Z

This PR implements voidtype_repr and voidtype_str to output something more sensible.

Currently, unstructured void types print their raw values directly to output, allowing you to do some funny things:

>>> np.array([27, 91, 50, 75,  7, 65, 10, 8, 
              27, 91, 51, 49,109, 82,101,100], dtype='u1').view('V8')
       A
, Red], 
      dtype='|V8')

(If you paste this into an ANSI terminal, the output will be colored red and the terminal will beep.)

In this PR, the hex representation of the byte string is printed instead:

>>> np.array([27, 91, 50, 75,  7, 65, 10, 8, 
              27, 91, 51, 49,109, 82,101,100], dtype='u1').view('V8')
array([1b5b324b07410a08, 1b5b33316d526564],
      dtype='|V8')

I also toyed with printing something like <memory>, <V8>, <memory at 123456> for the elements. Better suggestions are welcome.
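
For reference, the hex strings in the proposed output can be reproduced from the raw bytes with the standard library alone. This is only a sketch of the idea, not the C code in this PR; `raw`, `items`, and `void_hex` are illustrative names that do not appear in the patch.

```python
import binascii

# The 16 byte values from the example above, as two 8-byte items.
raw = bytes([27, 91, 50, 75, 7, 65, 10, 8,
             27, 91, 51, 49, 109, 82, 101, 100])
items = [raw[i:i + 8] for i in range(0, len(raw), 8)]


def void_hex(item):
    """Return the lowercase hex digits of a bytes object (hypothetical helper)."""
    return binascii.hexlify(item).decode('ascii')


print([void_hex(item) for item in items])
# ['1b5b324b07410a08', '1b5b33316d526564']
```

Because every byte becomes two printable hex digits, control bytes such as 0x1b (ESC) and 0x07 (BEL) never reach the terminal.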

eric-wieser · 2017-04-23T17:59:39Z

This needs some kind of prefix/suffix to make non-alphabetic hex-strings distinguishable from ints.

But hex seems like the right output to me

ahaldane · 2017-04-23T18:04:36Z

Maybe like <1b5b324b07410a08> ?

eric-wieser · 2017-04-23T18:04:47Z

It would also be nice if the repr could be used in eval with suitable globals. So I'd be tentatively in favor of either adding explicit function calls, with a new void(hex=) constructor:

array([void(hex='1b5b324b07410a08'), void(hex='1b5b33316d526564')],
      dtype='|V8')

or just taking advantage of the existing bytes->void conversion, at the cost of verbosity (using uppercase for readability)

array(['\x1B\x5B\x32\x4B\x07\x41\x0A\x08', '\x1B\x5B\x33\x31\x6D\x52\x65\x64' ],
      dtype='|V8')

Of course, str could still be <1b5b324b07410a08>

ahaldane · 2017-04-23T18:06:06Z

Hmm that's pretty good. Let me try that.

eric-wieser · 2017-04-23T18:07:54Z

While we're at it, void(hex='1b5b324b07410a08') would be a lot more readable as void(hex='1b5b324b_07410a08'). Some heuristic for inserting underscores or spaces every 4/8/16 characters would help.
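
One possible shape for such a grouping heuristic, sketched in plain Python. The helper name `group_hex` and its `width` and `sep` parameters are assumptions made for illustration; nothing like this is part of the PR.

```python
def group_hex(hex_digits, width=8, sep='_'):
    """Join consecutive `width`-character chunks with `sep` (hypothetical helper)."""
    chunks = [hex_digits[i:i + width] for i in range(0, len(hex_digits), width)]
    return sep.join(chunks)


print(group_hex('1b5b324b07410a08'))           # 1b5b324b_07410a08
print(group_hex('1b5b324b07410a08', width=4))  # 1b5b_324b_0741_0a08
```

A fixed width keeps the sketch simple; choosing between 4, 8, or 16 based on the item size would be a separate design decision.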

juliantaylor · 2017-04-23T18:10:47Z

If it's hex, you should prefix it with 0x.

juliantaylor · 2017-04-23T18:12:17Z

numpy/core/src/multiarray/scalartypes.c.src

+    if (PyDataType_HASFIELDS(s->descr)) {
+        return gentype_repr(self);
+    }
+    return _Py_strhex(s->obval, s->descr->elsize);


as this is private api, does pypy have it?

Probably not. I was going to update it once we decided what format we want.

Could fall back on the "invoke something in _internal.py" strategy again - repr doesn't need to be incredibly fast

eric-wieser · 2017-04-23T18:15:56Z

@juliantaylor: I'm not a fan of void(0x12345678). I think we have to show the bytes with the lowest-index at the leftmost of the string. But then np.array(0x12345678).view('V4') would mean something different on little-endian platforms.

So then we're left comparing:

void('0x1b5b324b07410a08')
void(hex='1b5b324b07410a08')
void(hex='0x1b5b324b07410a08')

The first option is surprising, because passing bytes to void (py2) should give a buffer. That leaves the last two, and of those void(hex='1b5b324b07410a08') is simply shorter.
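
The endianness concern can be illustrated without NumPy. The sketch below assumes a 4-byte integer and uses `int.to_bytes` as a stand-in for the memory layout on little-endian and big-endian platforms; the variable names are illustrative only.

```python
value = 0x12345678

# Byte layout of the same 4-byte integer under each byte order.
little = value.to_bytes(4, byteorder='little')
big = value.to_bytes(4, byteorder='big')

# Listing bytes with the lowest index leftmost, as proposed for void:
print(little.hex())  # 78563412
print(big.hex())     # 12345678
```

Only the big-endian layout reads like the integer literal, which is why an integer-style form such as void(0x12345678) would be ambiguous.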

juliantaylor · 2017-04-23T18:18:46Z

hm yes as void can represent arbitrary bytes I like the \x representation like for python bytes most.

ahaldane · 2017-04-23T18:19:44Z

I am actually trying out your idea of

array(['\x1b\x5b\x32\x4b\x07\x41\x0a\x08', '\x1b\x5b\x33\x31\x6d\x52\x65\x64' ],
      dtype='|V8')

I like that because it gives back the original array.

juliantaylor · 2017-04-23T18:48:39Z

numpy/core/src/multiarray/scalartypes.c.src

+    if (PyDataType_HASFIELDS(s->descr)) {
+        return gentype_repr(self);
+    }
+    bytes = gentype_generic_method(self, NULL, NULL, "tobytes");


you can just call PyBytes_FromStringAndSize on obval here (it is defined in our compat headers to work in python2 too)

juliantaylor · 2017-04-23T18:52:13Z

numpy/core/src/multiarray/scalartypes.c.src

+    if (bytes == NULL) {
+        return NULL;
+    }
+    str = PyObject_Str(bytes);


probably the same in this case, but shouldn't this be PyObject_Repr here?

yes, already changed on my side

ahaldane · 2017-04-23T19:12:16Z

I still need to figure out what is going on with 0-d arrays. Apparently they are printed totally differently here which I want to try to fix.

>>> a = np.zeros(4, dtype='V4')
>>> a
array([b'\x00\x00\x00\x00', b'\x00\x00\x00\x00', b'\x00\x00\x00\x00',
       b'\x00\x00\x00\x00'],
      dtype='|V4')
>>> np.array(a[0])
array(array([0, 0, 0, 0], dtype=int8),
      dtype='|V4')

eric-wieser · 2017-04-23T19:14:24Z

numpy/core/src/multiarray/scalartypes.c.src

+    if (bytes == NULL) {
+        return NULL;
+    }
+    repr = PyObject_Repr(bytes);


This isn't enough - I think that this should be our output:

>>> np.array("Hello world").view(np.void)
array(b'\x48\x65\x6C\x6C\x6F\x20\x77\x6F\x72\x6C\x64', dtype=void)

Showing ascii characters as text isn't useful - we already have the bytes_ dtype for that

Which you can produce with:

"b'{}'".format(''.join(map(r'\x{:02X}'.format, bytearray( voiditem.tobytes() ))))

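The one-liner above can be unpacked into a small standalone function. This sketch takes a plain bytes object rather than a void item (an assumption made so it runs without NumPy), and the name `void_bytes_repr` is hypothetical.

```python
def void_bytes_repr(item):
    """Format every byte as an uppercase \\xNN escape inside a b'...' literal."""
    escapes = ''.join(r'\x{:02X}'.format(byte) for byte in bytearray(item))
    return "b'{}'".format(escapes)


raw = b'Hello world'
text = void_bytes_repr(raw)
print(text)  # b'\x48\x65\x6C\x6C\x6F\x20\x77\x6F\x72\x6C\x64'

# The result is a valid bytes literal, so evaluating it gives back the input.
roundtrip = eval(text)
print(roundtrip == raw)  # True
```

Escaping every byte, including printable ASCII, is what distinguishes this form from the built-in bytes repr.
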
eric-wieser · 2017-04-23T19:15:17Z

Apparently they are printed totally differently here

I think that's a result of the void_getitem code - it definitely appears to construct an ndarray in some cases

ahaldane · 2017-04-23T19:18:02Z

I think that's a result of the void_getitem code

Yes, that's right, since item is called in the line I linked above. But why does that line exist?

That code block (for a.shape == ()) seems to exist to avoid a circular import problem involving np.matrix which occurs if it is removed. I'm still working it out.

eric-wieser · 2017-04-23T19:18:53Z

Missed your link there

eric-wieser · 2017-04-23T21:01:26Z

numpy/core/src/multiarray/scalartypes.c.src

+
+    j = 0;
+#if defined(NPY_PY3K)
+    retbuf[j++] = 'b';


I think maybe just do this always - 2.7 supports the syntax, and that saves confusion when from __future__ import unicode_literals is in place. extrachars would go too, as a result

good point, done

eric-wieser · 2017-04-23T21:02:04Z

numpy/core/src/multiarray/scalartypes.c.src

+        c = (argbuf[i] >> 4) & 0xf;
+        retbuf[j++] = Py_hexdigits[c];
+        c = argbuf[i] & 0xf;
+        retbuf[j++] = Py_hexdigits[c];


This is lowercase - which makes separating the hex chars from the x a little harder on the eye

lowercase seems to be the python standard though

If nothing else, I think this function should have a comment pointing out that the string it produces is lowercase.

Uppercase is pretty typical of hex memory viewers, isn't it? If anything, being inconsistent with bytes.__repr__ makes it more obvious to the user that they're looking at a void
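
The nibble lookup in the diff above, and the lowercase versus uppercase choice being discussed, can be sketched in Python as follows. `LOWER_DIGITS` mirrors the lowercase table the C code indexes; `UPPER_DIGITS` and `escape_bytes` are hypothetical names introduced only for this illustration.

```python
LOWER_DIGITS = '0123456789abcdef'  # lowercase table, as in the C snippet
UPPER_DIGITS = '0123456789ABCDEF'  # alternative argued for in this thread


def escape_bytes(data, digits=LOWER_DIGITS):
    """Emit \\xNN for every byte, looking up each nibble in `digits`."""
    out = []
    for byte in bytearray(data):
        out.append('\\x')
        out.append(digits[(byte >> 4) & 0xF])  # high nibble
        out.append(digits[byte & 0xF])         # low nibble
    return ''.join(out)


print(escape_bytes(b'\x1b\x5b\x32\x4b'))                # \x1b\x5b\x32\x4b
print(escape_bytes(b'\x1b\x5b\x32\x4b', UPPER_DIGITS))  # \x1B\x5B\x32\x4B
```

Swapping the digit table is the only change needed to move between the two styles.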

eric-wieser · 2017-04-23T21:03:37Z

numpy/core/src/multiarray/scalartypes.c.src

+    if (PyDataType_HASFIELDS(s->descr)) {
+        return gentype_str(self);
+    }
+    return _void_to_hex(s->obval, s->descr->elsize);


Not sure how I feel about void.__str__ looking like bytes.__repr__. Seems that here printing out raw hex might be better

In the form void(hex='1b5b324b07410a08'), are you suggesting we modify the void constructor to add a hex argument? If not, I don't like the fact that the repr couldn't actually be converted to an instance.

Yes, I was indeed suggesting we allow it to take a kwarg. But that was an alternative to the bytes solution you settled for, and not my point here.

Here I'm suggesting something like:

>>> print(repr(v))
'\x12\x34\xAB'
>>> print(str(v))
12 34 AB

Which is sort of consistent with

>>> s = 'hello world'
>>> print(repr(s))
'hello world'
>>> print(str(s))
hello world
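
A minimal sketch of the space-separated str form suggested here, operating on a plain bytes object. The helper name `spaced_hex` is hypothetical, and this is not what the PR's C code does.

```python
def spaced_hex(item):
    """Uppercase hex pairs separated by spaces (hypothetical str-style helper)."""
    return ' '.join('{:02X}'.format(byte) for byte in bytearray(item))


v = b'\x12\x34\xab'
print(spaced_hex(v))  # 12 34 AB
```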

ahaldane · 2017-04-23T21:05:46Z

After looking at fixing the 0d array problem, it looks like a more general and more difficult problem, better fixed in a separate future PR.

We seem to have double implementations of many of the dtype reprs: once in scalartypes.c.src and again in arrayprint.py, and missing implementations in one file try to fall back to the other file, which causes infinite recursions if we're not careful.

eric-wieser · 2017-04-23T21:06:32Z

Wonderful. I agree, that can go in another PR

eric-wieser · 2017-04-23T22:08:11Z

numpy/core/src/multiarray/scalartypes.c.src

    if (PyDataType_HASFIELDS(s->descr)) {
        return gentype_str(self);
    }
-    return _void_to_hex(s->obval, s->descr->elsize);
+    return _void_to_hex(s->obval, s->descr->elsize, "0x", "'");


Mismatching quotes?

mhvk · 2017-05-18T14:57:03Z

Saw this as it had drifted up in the sort-by-update list. Nice! I briefly wondered about documentation beyond the release notes, but it seems there are no examples where void arrays are shown.

ahaldane · 2017-05-18T17:06:08Z

Release notes updated

eric-wieser · 2017-05-18T18:12:03Z

doc/release/1.14.0-notes.rst

+printing style for ``void`` datatypes is now customizable
+---------------------------------------------------------
+The printing style of ``np.void`` arrays is customizable using the
+``formatter`` argument to ``np.set_printoptions``, using the ``'void'`` key.


the void type was always customizable, but before it was done through the (still) mis-documented 'numpystr' key

eric-wieser · 2017-05-18T18:13:36Z

numpy/core/tests/test_arrayprint.py

+            r"array([b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'," "\n"
+            r"       b'\x1B\x5B\x33\x31\x6D\x52\x65\x64']," "\n"
+            r"      dtype='|V8')")
+


Can we add an eval(repr(x), vars(np)) == x test, like we have elsewhere, for both scalars and arrays? That's one of the features of this patch too

Misclick

ahaldane · 2017-05-18T20:20:27Z

Good idea, done.

charris · 2017-11-11T23:25:21Z

Needs rebase.

charris · 2017-11-12T15:15:52Z

Sounds like this is ready. Needs a rebase.

ahaldane · 2017-11-12T17:44:58Z

Rebased

charris · 2017-11-12T18:03:30Z

Merged, thanks Allan.

eric-wieser · 2017-11-13T06:52:51Z

numpy/core/shape_base.py

@@ -365,93 +365,78 @@ def stack(arrays, axis=0, out=None):
    return _nx.concatenate(expanded_arrays, axis=axis, out=out)


-def _block_check_depths_match(arrays, parent_index=[]):
+class _Recurser(object):


Uh oh - bad merge / rebase - this just reverted #9667

This restores the changes in numpygh-9667 that were overwritten.

REV: Undo bad rebase in 7fdfdd6 (#8981)

eric-wieser added 01 - Enhancement component: numpy._core labels Apr 23, 2017

juliantaylor reviewed Apr 23, 2017

ahaldane force-pushed the void_repr branch from 4aff7e2 to 43552ab Compare April 23, 2017 18:30

juliantaylor reviewed Apr 23, 2017

ahaldane force-pushed the void_repr branch 2 times, most recently from 03408ec to fa50ebd Compare April 23, 2017 19:08

eric-wieser reviewed Apr 23, 2017

ahaldane force-pushed the void_repr branch from 5b5825b to 49c92bb Compare April 23, 2017 21:00

eric-wieser reviewed Apr 23, 2017

ahaldane force-pushed the void_repr branch 3 times, most recently from 25438a2 to 7e39112 Compare April 23, 2017 21:45

eric-wieser reviewed Apr 23, 2017

ahaldane force-pushed the void_repr branch 3 times, most recently from 466d620 to 3248ed9 Compare May 18, 2017 17:05

eric-wieser reviewed May 18, 2017

eric-wieser previously approved these changes May 18, 2017

ahaldane force-pushed the void_repr branch from 3248ed9 to 7f0cfb1 Compare May 18, 2017 20:18

ahaldane force-pushed the void_repr branch from 7f0cfb1 to 5a4f157 Compare May 18, 2017 20:35

ahaldane mentioned this pull request May 19, 2017

WIP: MAINT: print 0d arrays using scalar str/repr #9143

Closed

shoyer mentioned this pull request May 24, 2017

Adding arbitrary object serialization pydata/xarray#1421

Open

4 tasks

ahaldane mentioned this pull request Jun 1, 2017

ENH: remove unneeded spaces in float/bool reprs, fixes 0d str #9139

Merged

ahaldane mentioned this pull request Jul 1, 2017

BUG: np.void(b'test') enters recursion loop in repr #9345

Closed

This was referenced Sep 25, 2017

ENH: fix 0d array printing using str or formatter. #9332

Merged

BUG: void .item() doesn't hold reference to original array #8157

Merged

ahaldane added this to the 1.14.0 release milestone Nov 9, 2017

ENH: print void repr/str using hex notation

7fdfdd6

ahaldane force-pushed the void_repr branch from 5a4f157 to 7fdfdd6 Compare November 12, 2017 16:29

charris merged commit 1204a35 into numpy:master Nov 12, 2017

eric-wieser reviewed Nov 13, 2017

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 13, 2017

REV: Undo bad rebase in numpygh-8981 (7fdfdd6)

ae338e4

This restores the changes in numpygh-9667 that were overwritten.

charris added a commit that referenced this pull request Nov 13, 2017

Merge pull request #10017 from eric-wieser/revert-8981-rebase

e06ec34

REV: Undo bad rebase in 7fdfdd6 (#8981)

tylerjereddy mentioned this pull request Aug 28, 2018

MAINT: reduce void type repr code duplication #11830

Merged
