-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
NEP: issue deprecation warning when creating ragged array (NEP 34) #15119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c6083ad
to
bc04f1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some nitpicks. Others may have opinions about using context
for this purpose, but it is convenient. Mainly waiting for a clear decision the whole thing I guess.
waiting on gh-15083 to resolve the name and need for the singleton |
It turns out downstream libraries (I checked pandas, scipy) adapted to NEP 34 even without the sentinel. Once this is ready and reviewed, I would like to try again to see if it causes less noise this time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (minor things). IIRC this is pretty much what we had before, right? We probably still need to (re-) add the release notes?
numpy/core/src/multiarray/ctors.c
Outdated
if (is_object) { | ||
if (is_object != DISCOVERED_OK) { | ||
if (is_object == DISCOVERED_RAGGED && requested_dtype == NULL) { | ||
/* NumPy 1.18, 2019-11-01 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 19 now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this was missed? I can fix it up though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for dragging this out, but I am considering if we should try to spell out/target the error message for the unexperienced user who likely just got it by accident. We could even add that other tools may exist to handle such data, but not sure about that?
numpy/core/src/multiarray/ctors.c
Outdated
if (is_object) { | ||
if (is_object != DISCOVERED_OK) { | ||
if (is_object == DISCOVERED_RAGGED && requested_dtype == NULL) { | ||
/* NumPy 1.18, 2019-11-01 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this was missed? I can fix it up though.
numpy/core/src/multiarray/ctors.c
Outdated
/* NumPy 1.18, 2019-11-01 */ | ||
if (DEPRECATE("Creating an ndarray from ragged nested sequences " | ||
"and automatic object dtype detection is deprecated. Use " | ||
"dtype=object or an exact dtype") < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that Eric pointed out that we may want to include "ragged", I am thinking we should actually bloat this up a lot more. Maybe even create a VisibleDeprecationWarning
(although that is a bit annoying).
The reason is that we are currently aiming the error at users who want to keep the old behaviour, when actually most users who see it are likely people who simply mistyped and thus have a bug. For them "ragged" is probably not helpful except as a google term.
It is a bit long I know, but am considering something like:
You are attempting to create an array, or doing an array operation, with uneven shape (ragged). This is not supported and may be an input error such as
np.array([[1], [1, 2]])
where the first list is shorter than the second. If you wish to create an array containing sequences of different length you must create an array withdtype=object
in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a VisibleDeprecationWarning
, extended the message, rebased off master, and fixed the deprecation version in e11d57a.
Although in my mind the difference between a VisibleDeprecationWarning and an Error is negligible: emitting a VisibleDeprecationWarning is so annoying it means there is no real deprecation period, users will, in practice, have to fix their code immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Searching for "ragged nested sequences numpy" the first hit is NEP 34, so I think this text is good enough.
Should be squashed before merging so we can revert a single changeset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about the googlebility of "ragged nested sequences". I am OK with this, just thoughts. About the VisibleDeprecationWarning
if anyone else has an opinion, please just note. I think I am still a little bit in favor: The assumption is that in the vast majority of (non-library) use cases, it currently is actually a silent error. If anyone thinks that is not true, we should first do the normal deprecation.
if (is_object) { | ||
if (is_object != DISCOVERED_OK) { | ||
static PyObject * visibleDeprecationWarning = NULL; | ||
if (visibleDeprecationWarning == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npy_cache_import
includes this if (and is guaranteed to be inlined)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing
numpy/core/src/multiarray/ctors.c
Outdated
if (is_object == DISCOVERED_RAGGED && requested_dtype == NULL) { | ||
/* NumPy 1.19, 2019-11-01 */ | ||
if (PyErr_WarnEx(visibleDeprecationWarning, "Creating an " | ||
" ndarray from ragged nested sequences (which is a " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" ndarray from ragged nested sequences (which is a " | |
"ndarray from ragged nested sequences (which is a " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, forgot to press submit: Maybe rather than the bracketed explanation, put in the example: such as '[[1], [2, 3]]' where '[2, 3]' is longer than '[1]'
? I would hope that is more useful.
Not sure though, and I am fine with it as is (we can always revise as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing
Squashed to one commit |
Thanks matti, let me put this in. I feel the error message could be nicer, but do not have a good idea right now and its easy to fix up. (Will squash some whitespace ony changes, so not waiting for it). |
…cation dimension=None makes no forced dimension conversion handle numpy.VisibleDeprecationWarning and future ValueError produced by creating a numpy.ndarray with elements of varying shape or type introduced by numpy/numpy#15119 Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
…cation dimension=None makes no forced dimension conversion handle numpy.VisibleDeprecationWarning and future ValueError produced by creating a numpy.ndarray with elements of varying shape or type introduced by numpy/numpy#15119 Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
…cation dimension=None makes no forced dimension conversion handle numpy.VisibleDeprecationWarning and future ValueError produced by creating a numpy.ndarray with elements of varying shape or type introduced by numpy/numpy#15119 Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
…cation dimension=None makes no forced dimension conversion handle numpy.VisibleDeprecationWarning and future ValueError produced by creating a numpy.ndarray with elements of varying shape or type introduced by numpy/numpy#15119 Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
Thanks for this fix! Just pointing out to whoever googled their way here later:
|
Redo gh-14794
but add anp.allow_object
sentinel to retain legacy behaviour.Also see gh-15118 which is tangentially connected to this PR, where the
context
argument now serves dual purposes.