Skip to content

NEP: issue deprecation warning when creating ragged array (NEP 34) #15119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 21, 2020

Conversation

mattip
Copy link
Member

@mattip mattip commented Dec 16, 2019

Redo gh-14794 but add a np.allow_object sentinel to retain legacy behaviour.

Also see gh-15118 which is tangentially connected to this PR, where the context argument now serves dual purposes.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some nitpicks. Others may have opinions about using context for this purpose, but it is convenient. Mainly waiting for a clear decision the whole thing I guess.

@mattip
Copy link
Member Author

mattip commented Dec 30, 2019

waiting on gh-15083 to resolve the name and need for the singleton

@mattip
Copy link
Member Author

mattip commented Jan 13, 2020

It turns out downstream libraries (I checked pandas, scipy) adapted to NEP 34 even without the sentinel. Once this is ready and reviewed, I would like to try again to see if it causes less noise this time

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (minor things). IIRC this is pretty much what we had before, right? We probably still need to (re-) add the release notes?

@seberg seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jan 14, 2020
if (is_object) {
if (is_object != DISCOVERED_OK) {
if (is_object == DISCOVERED_RAGGED && requested_dtype == NULL) {
/* NumPy 1.18, 2019-11-01 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be 19 now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was missed? I can fix it up though.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for dragging this out, but I am considering if we should try to spell out/target the error message for the unexperienced user who likely just got it by accident. We could even add that other tools may exist to handle such data, but not sure about that?

if (is_object) {
if (is_object != DISCOVERED_OK) {
if (is_object == DISCOVERED_RAGGED && requested_dtype == NULL) {
/* NumPy 1.18, 2019-11-01 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was missed? I can fix it up though.

/* NumPy 1.18, 2019-11-01 */
if (DEPRECATE("Creating an ndarray from ragged nested sequences "
"and automatic object dtype detection is deprecated. Use "
"dtype=object or an exact dtype") < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that Eric pointed out that we may want to include "ragged", I am thinking we should actually bloat this up a lot more. Maybe even create a VisibleDeprecationWarning (although that is a bit annoying).

The reason is that we are currently aiming the error at users who want to keep the old behaviour, when actually most users who see it are likely people who simply mistyped and thus have a bug. For them "ragged" is probably not helpful except as a google term.

It is a bit long I know, but am considering something like:

You are attempting to create an array, or doing an array operation, with uneven shape (ragged). This is not supported and may be an input error such as np.array([[1], [1, 2]]) where the first list is shorter than the second. If you wish to create an array containing sequences of different length you must create an array with dtype=object in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a VisibleDeprecationWarning, extended the message, rebased off master, and fixed the deprecation version in e11d57a.

Although in my mind the difference between a VisibleDeprecationWarning and an Error is negligible: emitting a VisibleDeprecationWarning is so annoying it means there is no real deprecation period, users will, in practice, have to fix their code immediately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching for "ragged nested sequences numpy" the first hit is NEP 34, so I think this text is good enough.

@mattip
Copy link
Member Author

mattip commented Jan 16, 2020

Should be squashed before merging so we can revert a single changeset.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the googlebility of "ragged nested sequences". I am OK with this, just thoughts. About the VisibleDeprecationWarning if anyone else has an opinion, please just note. I think I am still a little bit in favor: The assumption is that in the vast majority of (non-library) use cases, it currently is actually a silent error. If anyone thinks that is not true, we should first do the normal deprecation.

if (is_object) {
if (is_object != DISCOVERED_OK) {
static PyObject * visibleDeprecationWarning = NULL;
if (visibleDeprecationWarning == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

npy_cache_import includes this if (and is guaranteed to be inlined)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing

if (is_object == DISCOVERED_RAGGED && requested_dtype == NULL) {
/* NumPy 1.19, 2019-11-01 */
if (PyErr_WarnEx(visibleDeprecationWarning, "Creating an "
" ndarray from ragged nested sequences (which is a "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" ndarray from ragged nested sequences (which is a "
"ndarray from ragged nested sequences (which is a "

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, forgot to press submit: Maybe rather than the bracketed explanation, put in the example: such as '[[1], [2, 3]]' where '[2, 3]' is longer than '[1]'? I would hope that is more useful.

Not sure though, and I am fine with it as is (we can always revise as well)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing

@mattip
Copy link
Member Author

mattip commented Jan 16, 2020

Squashed to one commit

@mattip mattip removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jan 19, 2020
@mattip mattip requested a review from seberg January 19, 2020 23:32
@seberg
Copy link
Member

seberg commented Jan 21, 2020

Thanks matti, let me put this in. I feel the error message could be nicer, but do not have a good idea right now and its easy to fix up. (Will squash some whitespace ony changes, so not waiting for it).

@seberg seberg merged commit 287a9d0 into numpy:master Jan 21, 2020
@seberg seberg changed the title NEP, ENH: implement NEP34 (deprecate ragged object array creation) NEP: issue deprecation warning when creating ragged array (NEP 34) Jan 21, 2020
kmantel added a commit to kmantel/PsyNeuLink that referenced this pull request Jun 24, 2020
…cation

dimension=None makes no forced dimension conversion

handle numpy.VisibleDeprecationWarning and future ValueError produced by
creating a numpy.ndarray with elements of varying shape or type introduced by
numpy/numpy#15119

Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
kmantel added a commit to kmantel/PsyNeuLink that referenced this pull request Jun 30, 2020
…cation

dimension=None makes no forced dimension conversion

handle numpy.VisibleDeprecationWarning and future ValueError produced by
creating a numpy.ndarray with elements of varying shape or type introduced by
numpy/numpy#15119

Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
kmantel added a commit to kmantel/PsyNeuLink that referenced this pull request Jun 30, 2020
…cation

dimension=None makes no forced dimension conversion

handle numpy.VisibleDeprecationWarning and future ValueError produced by
creating a numpy.ndarray with elements of varying shape or type introduced by
numpy/numpy#15119

Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
kmantel added a commit to kmantel/PsyNeuLink that referenced this pull request Jun 30, 2020
…cation

dimension=None makes no forced dimension conversion

handle numpy.VisibleDeprecationWarning and future ValueError produced by
creating a numpy.ndarray with elements of varying shape or type introduced by
numpy/numpy#15119

Also see https://numpy.org/neps/nep-0034-infer-dtype-is-object.html
@mattip mattip deleted the nep-0034-impl branch November 2, 2020 08:29
@idanmiara
Copy link

Thanks for this fix! Just pointing out to whoever googled their way here later:
The VisibleDeprecationWarning was replaced by ValueError in numpy v1.24:
https://numpy.org/devdocs/release/1.24.0-notes.html

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants