ENH: Add ndmax parameter to np.array to control recursion depth #29569
Conversation
@kibitzing - thanks for the PR. I think this is rather nice, especially that you found all it really takes is exposing existing functionality!

However, to me at least there is one annoyance: it would seem to me that there is no reason not to support np.array(..., ndmax=0) as meaning that the user wants an array with ndim=0 (i.e., a scalar array). Unfortunately, this clashes with the numpy C API, which has PyArray_FromAny and friends interpret max_depth=0 as arbitrary (it is actually not documented as such, but since it is used like that throughout the code base, it surely is used outside of numpy as well).

But I think we can make it work by adjusting PyArray_FromAny to change 0 to NPY_MAXDIMS and then have PyArray_FromAny_int not do any checks. See in-line comments. What do you think?
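To illustrate the distinction being discussed, here is a minimal, hypothetical sketch of the proposed Python-level semantics; the ndmax=0 line assumes a NumPy build that includes this PR with the behavior suggested in this thread:

import numpy as np

# A scalar input already produces a 0-D array today.
print(np.array(5).ndim)            # 0

# Proposed: ndmax=0 explicitly requests a 0-D result, rather than
# meaning "no depth limit" as max_depth=0 does in the C API.
# (Hypothetical -- requires a NumPy build that includes this change.)
print(np.array(5, ndmax=0).ndim)   # expected: 0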
@@ -1747,8 +1748,19 @@ array_array(PyObject *NPY_UNUSED(ignored),
        op = args[0];
    }

    if (ndmax > NPY_MAXDIMS || ndmax <= 0) {
So here I'd also check ndmax < 0 rather than <= 0.
Thank you for the suggestion; I've changed it as you suggested.
numpy/_core/src/multiarray/ctors.c (Outdated)
@@ -1545,6 +1545,10 @@ PyArray_FromAny_int(PyObject *op, PyArray_Descr *in_descr,
    int ndim = 0;
    npy_intp dims[NPY_MAXDIMS];

    if (max_depth == 0 || max_depth > NPY_MAXDIMS) {
To support actually using max_depth = 0 as a real option, the check here would need to be removed. But that means adjusting code that calls it. The main user is PyArray_FromAny (in ctors.c as well), which would then need this if statement (indeed, perhaps the whole statement should be moved there; it is not unreasonable to assume that for internal calls the values are known to be good). One other user is PyArray_CheckFromAny_int (also in ctors.c), though logically there also perhaps PyArray_CheckFromAny is the one that should be adjusted. The final user seems to be scalartypes.c.src -- there one could just replace the 0 with NPY_MAXDIMS.
Thank you for the clear guidance.
As you suggested, I've refactored the logic so PyArray_FromAny_int can treat max_depth=0 as a request for a 0-D array. The responsibility for handling the "no limit" case (by converting 0 to NPY_MAXDIMS) has been moved to its callers:
- The check was added to PyArray_FromAny and PyArray_CheckFromAny.
- Call sites in scalartypes.c.src and array_converter_new were updated to use NPY_MAXDIMS explicitly.
Hello @mhvk, I also considered supporting ndmax=0. However, the detailed guidance you've provided makes the path forward very clear, so I'm happy to get to work on implementing it.
@kibitzing - this looks great. I was about to approve, but then I realized we should make sure this is actually documented. So, could you add a description in _add_newdocs.py (search for 'array' - l.806; you'll need a .. versionadded:: ... in there, see examples in that file).

Furthermore, personally, I feel this is worth a little what's-new entry - there are many who have run into this issue! (We've got some code in astropy that we can now simplify!) For this you need to add a fragment in doc/release/upcoming_changes (see the README.rst in that directory).

Since that means another CI run anyway, I also put two absolutely nitpicky comments in-line, which you might as well do too... (but feel free to disagree and ignore).

Let me also ping @mattip and @rkern, in case they want to have a look as well (though in the end this is so much a case of just exposing what was effectively in place already that I'm quite happy to just get it in myself too).
@@ -1747,8 +1748,19 @@ array_array(PyObject *NPY_UNUSED(ignored),
        op = args[0];
    }

    if (ndmax > NPY_MAXDIMS || ndmax < 0) {
        if (ndmax > NPY_MAXDIMS) {
            PyErr_Format(PyExc_ValueError, "ndmax must be <= NPY_MAXDIMS (=%d)", NPY_MAXDIMS);
Again, I wouldn't ask if there wasn't another reason to push a further commit, but to have error path code interrupt the flow as little as possible, I'd tend to have just one PyErr_Format, with (e.g.) "must have 0 <= ndmax <= NPY_MAXDIMS (=%d)".
Thank you for the feedback!
Updated to use a single PyErr_Format with the combined bounds check message.
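For context, a hedged sketch of how the combined bounds check would surface at the Python level (assuming a NumPy build that includes this PR; the exact message text and the NPY_MAXDIMS value may differ):

import numpy as np

try:
    # An ndmax far above NPY_MAXDIMS should be rejected with a single ValueError.
    np.array([1, 2, 3], ndmax=1000)
except ValueError as exc:
    print(exc)   # e.g. "must have 0 <= ndmax <= NPY_MAXDIMS (=...)"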
Hello @mhvk,
Please let me know if there's anything else you'd like me to adjust!
Super, this now looks great! Let's get it in. Thanks very much for solving a long-standing annoyance in an elegant way!
Hello @mhvk,
Indeed, a nice collaboration!
This follows numpy#29569, and also fills in the missing parameter defaults, towards numpy#28428.
Implementation of #29499
Description

This PR introduces a new parameter, ndmax, to the np.array function. This parameter allows users to explicitly limit the maximum number of dimensions that NumPy will create when converting nested sequences (like lists of lists) into an array.

To ensure backward compatibility, the default behavior remains unchanged. When ndmax is not specified, the function falls back to its current heuristic. This means it will create as many dimensions as possible from nested sequences, up to the compile-time limit defined by NPY_MAXDIMS.
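As an illustration, here is a hypothetical usage sketch; the default line is current NumPy behavior, while the ndmax line assumes a NumPy build that includes this PR:

import numpy as np

nested = [[[1, 2]], [[3, 4]]]          # three levels of nesting

# Default (unchanged): dimensions are discovered as deep as possible.
print(np.array(nested).ndim)           # 3

# With ndmax=2, recursion is expected to stop after two levels,
# leaving the innermost lists as object elements.
print(np.array(nested, dtype=object, ndmax=2).ndim)   # expected: 2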
Motivation and Context

As discussed in issue #29499, the current behavior of np.array can lead to unexpected results when dtype=object is specified. For example (see the sketch at the end of this section):
- Non-uniform (ragged) nested lists produce a 1D array of list objects.
- Uniformly nested lists are converted into a multi-dimensional ndarray, even with dtype=object.

This inconsistency arises because, as @rkern explained, NumPy has a strong heuristic that assumes uniformly nested sequences are intended to be multi-dimensional arrays. While this is often the desired outcome, it overrides the user's explicit intent in cases where a 1D array of list objects is needed, regardless of their shape. This can cause subtle bugs, particularly in data processing pipelines for machine learning.

The existing workarounds, such as pre-allocating with np.empty and then assigning, or using np.fromiter, are effective but less intuitive than a direct parameter. The addition of ndmax provides a clear and explicit way to control this behavior directly within the most commonly used array creation function.
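A short demonstration of the inconsistency described above (this is current NumPy behavior and does not require this PR):

import numpy as np

# Uniformly nested lists: the heuristic builds a (2, 2) array,
# even though dtype=object was requested.
uniform = np.array([[1, 2], [3, 4]], dtype=object)
print(uniform.shape)        # (2, 2)

# Ragged nesting: NumPy falls back to a 1-D array of list objects.
ragged = np.array([[1, 2], [3, 4, 5]], dtype=object)
print(ragged.shape)         # (2,)
print(type(ragged[0]))      # <class 'list'>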
How to use ndmax

The ndmax parameter provides fine-grained control over the array creation process.

ndmax=1 (Desired Outcome): as sketched below, this aligns the behavior for uniform lists with that of non-uniform lists when the user's intent is to create an array of objects.
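A hypothetical usage sketch of the desired outcome (the ndmax lines assume a NumPy build that includes this PR's ndmax parameter):

import numpy as np

nested = [[1, 2], [3, 4]]

# Today: uniform nesting yields a (2, 2) object array.
print(np.array(nested, dtype=object).shape)        # (2, 2)

# With ndmax=1: recursion stops after one level, giving a 1-D
# array whose elements are the original list objects.
arr = np.array(nested, dtype=object, ndmax=1)
print(arr.shape)       # expected: (2,)
print(type(arr[0]))    # expected: <class 'list'>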
Implementation Details
The implementation was made straightforward by leveraging the pre-existing max_depth parameter in the core C-API function, PyArray_FromAny_int. The new Python-level ndmax argument is passed directly to this internal parameter.

This approach ensures that the new feature is built upon NumPy's established and tested array creation logic, minimizing risk and maintaining internal consistency.