ENH/API: Change flexible types to indicate resizability with elsize == -1 #8970

eric-wieser · 2017-04-21T11:41:27Z

This means that zero-size strings/unicode/void are now properly supported - in that S0, V0, and U0 always mean zero-size, not resizable. Follows from #6430, cc @embray

Some changes resulting from this:

np.dtype('S') and np.dtype('S0') now mean different things - "resizable" and "empty"
np.dtype(x).itemsize is None when x is: str, bytes, unicode, np.void, 'S', 'U', 'V'
Previously, it was 0
np.dtype([('a', float), ('b', str)]) is now a ValueError
Previously, it implied np.dtype([('a', float), ('b', str, 0)]), with the last field as size 0, likely masking bugs. This explicit form works without errors
~~np.empty(10, str) now has dtype S0, not S1.~~ - removed since this change gains nothing
We could be a little stricter here, and make it a ValueError as above

For the last two, we could take a more gradual deprecation path if necessary, keeping the existing behaviour but with a FutureWarning

~~It's probably worth adding a #define PyDescr_Resizable -1 somewhere, and using that everywhere - but I'm sure of the best name, nor the best file to put it in~~

embray · 2017-04-21T11:59:04Z

I think this all makes sense to me. I never paid as close attention to the cases of using str and bytes in constructing dtypes since it's a bit too vague and inexplicit. But as long as that's to be supported I think these semantics make sense.

juliantaylor · 2017-04-21T12:59:12Z

numpy/core/src/multiarray/ctors.c

+            return NULL;
+        }
+        PyTuple_SET_ITEM(tuple, 0, type);
+        PyTuple_SET_ITEM(tuple, 1, one);


this should be just Py_BuildValue('(Oi)', type, 1)

Yeah, I did it this way for speed - perhaps premature optimization

type needs casting to PyObject then.
The error checking above, while it won#t happen in practice may throw errors with static reference checker

also for speed PyTuple_Pack is usually good

Good catch. Think I might go for Py_BuildValue anyway here, since creating zeros of string type doesn't sound like something that should be a bottleneck

I'd be surprised if what I have here is slower than PyTuple_Pack, since that at best does a loop with these lines in

PyTuple_Pack is (marginally) slower than directly setting a few entries but faster than Py_BuildValue if you already have objects.
Here it would be useful to save some lines of code, this doesn't seem to be a performance relevant path to me.

Alright, changed to use Py_BuildValue("Ni", (PyObject*)type, 1)

juliantaylor · 2017-04-21T13:02:11Z

doc/release/1.13.0-notes.rst

+``PyArray_Descr.elsize`` is now ``-1`` for unsized flexible dtypes
+------------------------------------------------------------------
+Previously it was ``0`` - but that made it impossible to distinguish unsized
+from sized-to-0.


adding and example and dtypes is better, e.g. unsized dtypes (``S``) from sized-to-zero dtypes (``S0``)

it should also mention that the python-side returns None for unsized dtypes.

Is this too breaking a change? Any C code that constructs flexible dtypes manually is in for a bad time - but hopefully nothing ever should, since PyArray_DescrFromTypeObject(Py_String) etc already do the right things

yes it previously returned a number now it doesn't. Likely not a big deal in practice but it should be mentioned.

Sorry, I mean from the C api perspective. But yes, I agree it at the very least needs to be in the release notes

Yes it has the potential to break others code that assumes itemsize is always positive.
But at least it should be an easy to detect error as it will probably crash by trying to allocate 2GiB of memory.

I was thinking code that assumed setting elsize to 0 would allow it to resize would be surprised when it no longer does

juliantaylor · 2017-04-21T13:04:21Z

numpy/core/src/multiarray/ctors.c

+        PyTuple_SET_ITEM(tuple, 0, type);
+        PyTuple_SET_ITEM(tuple, 1, one);
+
+        status = PyArray_DescrConverter(tuple, &type);


does this work with overlapping input and output?

These don't overlap - tuple contains a copy of the value in type, not a pointer to the variable &type

really? I though setitem will just increase the reference count of type by 1 not actually make a copy of it.

SET_ITEM actually doesn't touch the reference count at all - it steals the reference.

But SetItem copies type but increases the reference count of *type. We would be free to set type = NULL after adding it to the tuple, and the tuple would still contain the right thing

yes but PyArray_DescrConverter would still be working inplace, its output type is the same object as its input type. It probably works fine but that should be checked.

No it isn't, because PyArray_DescrConverter does essentially *(&type) = CreateNewDescr. It never touches the original object - because normally that original object is NULL

if it does this it doesn't work because creating a new type into type overwrites the input before it is read.

oh it does work as its &type not type never mind then

No it doesn't, because the input is no longer the type variable - it's the copy of the pointer the type variable held, which is stored inside tuple. So for example

type = (PyObject *) 0x00001230 Tuple_SETITEM(tuple, 0, type); Tuple_GETITEM(tuple, 0); // returns (PyObject *) 0x00001230 type = NULL Tuple_GETITEM(tuple, 0); // still returns (PyObject *) 0x00001230 - we didn't store a pointer to type

juliantaylor

looks good with some minor comments

eric-wieser · 2017-04-21T13:12:00Z

Any suggestions for what to call PyDescr_Resizable, and where to put it?

juliantaylor · 2017-04-21T13:19:56Z

hm it seems to crash astropys testsuite which would make it a too invasive change for now.

eric-wieser · 2017-04-21T13:20:52Z

Which part causes the crash? It might be that we can just warn and keep the old behaviour in those places

juliantaylor · 2017-04-21T13:23:21Z

no clue, astropys test system is not exactly easy to use

eric-wieser · 2017-04-21T13:28:13Z

Comments addressed, hopefully.

@mhvk: Mind taking a look at the astropy failure, and seeing whether it can be worked around here?

juliantaylor · 2017-04-21T13:30:05Z

seems to be division by 0 in numpy/core/src/multiarray/methods.c:1672

eric-wieser · 2017-04-21T13:30:50Z

Nice catch! There are more of those elsewhere too...

MSeifert04 · 2017-04-21T13:39:58Z

no clue, astropys test system is not exactly easy to use

If you feel it's too complicated please open an issue on the astropy tracker or on the astropy-dev mailing list. It would be definetly helpful to get this kind of feedback - at least I would like to know what makes it "not easy to use". :)

Sorry for being off-topic here.

juliantaylor · 2017-04-21T13:43:36Z

@MSeifert04
pytests just eats up crashes so you can't debug (probably via forking so you could via inferiors but that is annoying)
python astropy/io/ascii/tests/test_c_reader.py and submodule tests python -c "import astropy.io; astropy.io.test()" don't work

mhvk · 2017-04-21T14:23:07Z

@eric-wieser - I cannot seem to install this branch:

numpy/core/src/multiarray/ctors.c: In function ‘PyArray_Zeros’:
numpy/core/src/multiarray/ctors.c:2852:41: warning: multi-character character constant [-Wmultichar]
         PyObject *tuple = Py_BuildValue('Ni', (PyObject *)type, 1);
                                         ^~~~
numpy/core/src/multiarray/ctors.c:2852:41: warning: passing argument 1 of ‘_Py_BuildValue_SizeT’ makes pointer from integer without a cast [-Wint-conversion]
In file included from /usr/include/python3.5m/Python.h:114:0,
                 from numpy/core/src/multiarray/ctors.c:2:
/usr/include/python3.5m/modsupport.h:35:24: note: expected ‘const char *’ but argument is of type ‘int’
 PyAPI_FUNC(PyObject *) _Py_BuildValue_SizeT(const char *, ...);
                        ^~~~~~~~~~~~~~~~~~~~
numpy/core/src/multiarray/ctors.c:2856:9: error: ‘status’ undeclared (first use in this function)
         status = PyArray_DescrConverter(tuple, &type);
         ^~~~~~
numpy/core/src/multiarray/ctors.c:2856:9: note: each undeclared identifier is reported only once for each function it appears in

mhvk · 2017-04-21T14:23:29Z

@juliantaylor - astropy uses pytest, calling parts of tests is done differently:

python3 setup.py test -P io
python3 setup.py test -t astropy/io/ascii/tests/test_c_reader.py

eric-wieser · 2017-04-21T14:25:01Z

@mhvk: fixed. Note that this last commit is fixing bugs that were already present, just really hard to come across before this changeset

juliantaylor · 2017-04-21T14:41:32Z

numpy/core/src/multiarray/ctors.c

            /*
              Grow PyArray_DATA(ret):
              this is similar for the strategy for PyListObject, but we use
              50% overallocation => 0, 4, 8, 14, 23, 36, 56, 86 ...
            */
            elcount = (i >> 1) + (i < 4 ? 4 : 2) + i;
-            if (elcount <= NPY_MAX_INTP/elsize) {
-                new_data = PyDataMem_RENEW(PyArray_DATA(ret), elcount * elsize);
+            if (!npy_mul_with_overflow_intp(elcount, elsize, &nbytes)) {


your mul_with_overflow are wrong, its *result, a, b

Alarming that runtest.py -b on windows does not warn me about that

our travis warning check catches it

Indeed, but that's pretty late in the pipeline, and it runs a bunch of other tests first

eric-wieser · 2017-04-23T10:17:29Z

(rebased on top of #8977)

homu · 2017-04-23T15:19:15Z

☔ The latest upstream changes (presumably #8971) made this pull request unmergeable. Please resolve the merge conflicts.

This allows us to change how flexible types with no length are represented in future, to allow zero-size dtypes (numpy#8970).

eric-wieser · 2017-11-03T09:17:15Z

(Rebased on top of #9953)

mattip · 2018-08-07T16:34:26Z

additional rebase needed

This allows empty strings to be unambigiously specified. Unsized strings continue to promote to single-character strings.

This argument is no longer used

Fixes numpy#8969

eric-wieser added the component: numpy._core label Apr 21, 2017

eric-wieser requested a review from juliantaylor April 21, 2017 11:42

eric-wieser mentioned this pull request Apr 21, 2017

Fix issues with zero-width string fields #6430

Merged

eric-wieser mentioned this pull request Apr 21, 2017

BUG: do not change size 0 description when viewing data #8971

Merged

eric-wieser force-pushed the dtype-S0 branch 3 times, most recently from 977268a to 0d96866 Compare April 21, 2017 12:44

juliantaylor reviewed Apr 21, 2017

View reviewed changes

juliantaylor previously approved these changes Apr 21, 2017

View reviewed changes

eric-wieser force-pushed the dtype-S0 branch from 0d96866 to c8b5586 Compare April 21, 2017 13:26

eric-wieser force-pushed the dtype-S0 branch from c8b5586 to 8384a96 Compare April 21, 2017 14:24

juliantaylor reviewed Apr 21, 2017

View reviewed changes

eric-wieser force-pushed the dtype-S0 branch 3 times, most recently from 14e5c4e to 9f746e6 Compare April 21, 2017 15:28

eric-wieser force-pushed the dtype-S0 branch from c55def7 to 0acdbd6 Compare April 23, 2017 10:16

mhvk mentioned this pull request May 2, 2017

Emit string truncation warnings for MaskedColumn astropy/astropy#5819

Merged

eric-wieser mentioned this pull request Nov 3, 2017

MAINT: Add a PyDataType_ISUNSIZED macro #9953

Merged

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 3, 2017

MAINT: Add a PyDataType_ISUNSIZED macro

7cfbaf6

This allows us to change how flexible types with no length are represented in future, to allow zero-size dtypes (numpy#8970).

eric-wieser force-pushed the dtype-S0 branch from 0acdbd6 to d9b6dba Compare November 3, 2017 09:16

eric-wieser mentioned this pull request Jan 4, 2019

BUG: Fix reference counting for subarrays containing objects #12650

Merged

eric-wieser added 4 commits October 10, 2020 14:49

API: Use elsize == -1 to indicate an unsized flexible dtype

c9cd8f8

This allows empty strings to be unambigiously specified. Unsized strings continue to promote to single-character strings.

ENH: Make .itemsize return none when .elsize == -1

28858d5

MAINT: Remove allow_emptystring from PyArray_NewFromDescr_int

67f8fe7

This argument is no longer used

BUG: Disallow flexible dtypes in compound types

083bc1a

Fixes numpy#8969

eric-wieser force-pushed the dtype-S0 branch 2 times, most recently from d9b2d23 to bc393a8 Compare October 10, 2020 14:09

eric-wieser added 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. 63 - C API Changes or additions to the C API. Mailing list should usually be notified. component: numpy.dtype and removed 55 - Needs work labels Oct 10, 2020

Merge remote-tracking branch 'upstream/master' into dtype-S0

7a705cb

eric-wieser force-pushed the dtype-S0 branch from bc393a8 to 7a705cb Compare October 11, 2020 12:18

Base automatically changed from master to main March 4, 2021 02:03

charris added the 52 - Inactive Pending author response label Apr 6, 2022

charris closed this Apr 6, 2022

Uh oh!

ENH/API: Change flexible types to indicate resizability with elsize == -1 #8970

ENH/API: Change flexible types to indicate resizability with elsize == -1 #8970

Uh oh!

Conversation

eric-wieser commented Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

embray commented Apr 21, 2017

Uh oh!

juliantaylor Apr 21, 2017 • edited by eric-wieser Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Apr 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

juliantaylor left a comment

Choose a reason for hiding this comment

Uh oh!

eric-wieser commented Apr 21, 2017

Uh oh!

juliantaylor commented Apr 21, 2017

Uh oh!

eric-wieser commented Apr 21, 2017

eric-wieser commented Apr 21, 2017 •

edited

Loading

juliantaylor Apr 21, 2017 •

edited by eric-wieser

Loading

eric-wieser Apr 21, 2017 •

edited

Loading

eric-wieser Apr 21, 2017 •

edited

Loading

eric-wieser Apr 21, 2017 •

edited

Loading

eric-wieser Apr 21, 2017 •

edited

Loading

eric-wieser Apr 21, 2017 •

edited

Loading

eric-wieser Apr 21, 2017 •

edited

Loading

eric-wieser Apr 21, 2017 •

edited

Loading

eric-wieser commented Apr 21, 2017 •

edited

Loading

juliantaylor commented Apr 21, 2017 •

edited

Loading

eric-wieser commented Apr 21, 2017 •

edited

Loading