ENH: Add ndmax parameter to np.array to control recursion depth #29569
Conversation
@kibitzing - thanks for the PR. I think this is rather nice, especially that you found all it really takes is exposing existing functionality!

However, to me at least there is one annoyance: it would seem to me that there is no reason not to support np.array(..., ndmax=0) as meaning that the user wants an array with ndim=0 (i.e., a scalar array). Unfortunately, this clashes with the numpy C API, which has PyArray_FromAny and friends interpret max_depth=0 as arbitrary (it is actually not documented as such, but since it is used like that throughout the code base, it surely is used outside of numpy as well).

But I think we can make it work by adjusting PyArray_FromAny to change 0 to NPY_MAXDIMS and then have PyArray_FromAny_int not do any checks. See in-line comments. What do you think?
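To illustrate the distinction being discussed, here is a minimal, hypothetical sketch of the proposed Python-level semantics; the ndmax=0 line assumes a NumPy build that includes this PR with the behavior suggested in this thread:

import numpy as np

# A scalar input already produces a 0-D array today.
print(np.array(5).ndim)            # 0

# Proposed: ndmax=0 explicitly requests a 0-D result, rather than
# meaning "no depth limit" as max_depth=0 does in the C API.
# (Hypothetical -- requires a NumPy build that includes this change.)
print(np.array(5, ndmax=0).ndim)   # expected: 0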
@@ -1747,8 +1748,19 @@ array_array(PyObject *NPY_UNUSED(ignored),
        op = args[0];
    }

    if (ndmax > NPY_MAXDIMS || ndmax <= 0) {
So here I'd also check ndmax < 0 rather than <= 0.
Thank you for the suggestion; I've changed it as you suggested.
numpy/_core/src/multiarray/ctors.c (Outdated)
@@ -1545,6 +1545,10 @@ PyArray_FromAny_int(PyObject *op, PyArray_Descr *in_descr,
    int ndim = 0;
    npy_intp dims[NPY_MAXDIMS];

    if (max_depth == 0 || max_depth > NPY_MAXDIMS) {
To support actually using max_depth = 0 as a real option, the check here would need to be removed. But that means adjusting code that calls it. The main user is PyArray_FromAny (in ctors.c as well), which would then need this if statement (indeed, perhaps the whole statement should be moved there; it is not unreasonable to assume that for internal calls the values are known to be good). One other user is PyArray_CheckFromAny_int (also in ctors.c), though logically there also perhaps PyArray_CheckFromAny is the one that should be adjusted. The final user seems to be scalartypes.c.src -- there one could just replace the 0 with NPY_MAXDIMS.
Thank you for the clear guidance.
As you suggested, I've refactored the logic so PyArray_FromAny_int can treat max_depth=0 as a request for a 0-D array. The responsibility for handling the "no limit" case (by converting 0 to NPY_MAXDIMS) has been moved to its callers:
- The check was added to PyArray_FromAny and PyArray_CheckFromAny.
- Call sites in scalartypes.c.src and array_converter_new were updated to use NPY_MAXDIMS explicitly.
Hello @mhvk, I also considered supporting ndmax=0. However, the detailed guidance you've provided makes the path forward very clear, so I'm happy to get to work on implementing it.
@kibitzing - this looks great. I was about to approve, but then I realized we should make sure this is actually documented. So, could you add a description in _add_newdocs.py (search for 'array' - l.806; you'll need a .. versionadded:: ... in there, see examples in that file).

Furthermore, personally, I feel this is worth a little what's-new entry - there are many who have run into this issue! (We've got some code in astropy that we can now simplify!) For this you need to add a fragment in doc/release/upcoming_changes (see the README.rst in that directory).

Since that means another CI run anyway, I also put two absolutely nitpicky comments in-line, which you might as well do too... (but feel free to disagree and ignore).

Let me also ping @mattip and @rkern, in case they want to have a look as well (though in the end this is so much a case of just exposing what was effectively in place already that I'm quite happy to just get it in myself too).
@@ -1747,8 +1748,19 @@ array_array(PyObject *NPY_UNUSED(ignored),
        op = args[0];
    }

    if (ndmax > NPY_MAXDIMS || ndmax < 0) {
        if (ndmax > NPY_MAXDIMS) {
            PyErr_Format(PyExc_ValueError, "ndmax must be <= NPY_MAXDIMS (=%d)", NPY_MAXDIMS);
Again, I wouldn't ask if there wasn't another reason to push a further commit, but to have error path code interrupt the flow as little as possible, I'd tend to have just one PyErr_Format, with (e.g.) "must have 0 <= ndmax <= NPY_MAXDIMS (=%d)".
Thank you for the feedback!
Updated to use a single PyErr_Format with the combined bounds check message.
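For context, a hedged sketch of how the combined bounds check would surface at the Python level (assuming a NumPy build that includes this PR; the exact message text and the NPY_MAXDIMS value may differ):

import numpy as np

try:
    # An ndmax far above NPY_MAXDIMS should be rejected with a single ValueError.
    np.array([1, 2, 3], ndmax=1000)
except ValueError as exc:
    print(exc)   # e.g. "must have 0 <= ndmax <= NPY_MAXDIMS (=...)"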
Hello @mhvk,
Please let me know if there's anything else you'd like me to adjust!
Super, this now looks great! Let's get it in. Thanks very much for solving a long-standing annoyance in an elegant way!
Hello @mhvk,
Indeed, a nice collaboration!
This follows numpy#29569, and also fills in the missing parameter defaults, towards numpy#28428.
Implementation of #29499
Description

This PR introduces a new parameter, ndmax, to the np.array function. This parameter allows users to explicitly limit the maximum number of dimensions that NumPy will create when converting nested sequences (like lists of lists) into an array.

To ensure backward compatibility, the default behavior remains unchanged. When ndmax is not specified, the function falls back to its current heuristic. This means it will create as many dimensions as possible from nested sequences, up to the compile-time limit defined by NPY_MAXDIMS.
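As an illustration, here is a hypothetical usage sketch; the default line is current NumPy behavior, while the ndmax line assumes a NumPy build that includes this PR:

import numpy as np

nested = [[[1, 2]], [[3, 4]]]          # three levels of nesting

# Default (unchanged): dimensions are discovered as deep as possible.
print(np.array(nested).ndim)           # 3

# With ndmax=2, recursion is expected to stop after two levels,
# leaving the innermost lists as object elements.
print(np.array(nested, dtype=object, ndmax=2).ndim)   # expected: 2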
Motivation and Context

As discussed in issue #29499, the current behavior of np.array can lead to unexpected results when dtype=object is specified. For example (see the sketch at the end of this section):
- Non-uniform (ragged) nested lists produce a 1D array of list objects.
- Uniformly nested lists are converted into a multi-dimensional ndarray, even with dtype=object.

This inconsistency arises because, as @rkern explained, NumPy has a strong heuristic that assumes uniformly nested sequences are intended to be multi-dimensional arrays. While this is often the desired outcome, it overrides the user's explicit intent in cases where a 1D array of list objects is needed, regardless of their shape. This can cause subtle bugs, particularly in data processing pipelines for machine learning.

The existing workarounds, such as pre-allocating with np.empty and then assigning, or using np.fromiter, are effective but less intuitive than a direct parameter. The addition of ndmax provides a clear and explicit way to control this behavior directly within the most commonly used array creation function.
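A short demonstration of the inconsistency described above (this is current NumPy behavior and does not require this PR):

import numpy as np

# Uniformly nested lists: the heuristic builds a (2, 2) array,
# even though dtype=object was requested.
uniform = np.array([[1, 2], [3, 4]], dtype=object)
print(uniform.shape)        # (2, 2)

# Ragged nesting: NumPy falls back to a 1-D array of list objects.
ragged = np.array([[1, 2], [3, 4, 5]], dtype=object)
print(ragged.shape)         # (2,)
print(type(ragged[0]))      # <class 'list'>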
How to use ndmax

The ndmax parameter provides fine-grained control over the array creation process.

ndmax=1 (Desired Outcome): as sketched below, this aligns the behavior for uniform lists with that of non-uniform lists when the user's intent is to create an array of objects.
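A hypothetical usage sketch of the desired outcome (the ndmax lines assume a NumPy build that includes this PR's ndmax parameter):

import numpy as np

nested = [[1, 2], [3, 4]]

# Today: uniform nesting yields a (2, 2) object array.
print(np.array(nested, dtype=object).shape)        # (2, 2)

# With ndmax=1: recursion stops after one level, giving a 1-D
# array whose elements are the original list objects.
arr = np.array(nested, dtype=object, ndmax=1)
print(arr.shape)       # expected: (2,)
print(type(arr[0]))    # expected: <class 'list'>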
Implementation Details
The implementation was made straightforward by leveraging the pre-existing max_depth parameter in the core C-API function, PyArray_FromAny_int. The new Python-level ndmax argument is passed directly to this internal parameter.

This approach ensures that the new feature is built upon NumPy's established and tested array creation logic, minimizing risk and maintaining internal consistency.