Skip to content

DEP: Deprecate setting the strides and dtype of a numpy array #28901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

eendebakpt
Copy link
Contributor

See #28800

@eendebakpt eendebakpt marked this pull request as draft May 4, 2025 21:37
@eendebakpt eendebakpt changed the title Draft: DEP: Deprecate setting the strides and dtype of a numpy array DEP: Deprecate setting the strides and dtype of a numpy array May 4, 2025
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error paths are both incorrect and untested, so that is definitely required.

Other than that, should have a release note as well. But I guess you had it as draft for a reason :).

@@ -124,6 +124,11 @@ array_strides_set(PyArrayObject *self, PyObject *obj, void *NPY_UNUSED(ignored))
npy_intp upper_offset = 0;
Py_buffer view;

/* DEPRECATED 2025-05-04, NumPy 2.3 */
PyErr_WarnEx(PyExc_DeprecationWarning,
"Setting the strides on a Numpy array has been deprecated in Numpy 2.3.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could point to strided_window_view and stride_tricks.as_strided. Although, not sure it is needed for this one.

msg = "Setting the strides on a Numpy array has been deprecated"
arr = np.arange(48)
with pytest.warns(DeprecationWarning, match=msg):
arr.strides = arr.strides
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the tests to test_deprecations.py. They should test the error path as well (which test_deprecations machinery does for you, although you can do it explicitly also.).

@jorenham

This comment was marked as resolved.

@eendebakpt
Copy link
Contributor Author

eendebakpt commented May 5, 2025

The .view() method sets the dtype and this is a bit hard to handle. In PyArray_View (see convert.c) first a new array is constructed with PyArray_NewFromDescr_int and then the dtype is updated with PyObject_SetAttrString. I replaced the call to PyObject_SetAttrString with a direct call to the code updating the dtype, bypassing the deprecation warning. This works fine for exact arrays, but gives issues for subclasses, in particular the masked array.

We could use PyObject_SetAttrString inside PyArray_View and catch the generated deprecation warning. This would hide changing the dtype in PyArray_View (but it is ok, there is only a single reference to the array), but users changing the dtype from python will get the deprecation warning.

Any other ideas on how to approach this?

Update: refactored the PR to use unique references for warning. This will exclude the calls from inside PyArray_View.

@eendebakpt eendebakpt force-pushed the deprecate_array_stride_set branch from df42838 to 2fa5086 Compare May 5, 2025 21:22
@seberg
Copy link
Member

seberg commented May 6, 2025

Any other ideas on how to approach this?
Update: refactored the PR to use unique references for warning. This will exclude the calls from inside PyArray_View.

Unique reference cheeck seems OK to me, also just to be nice for a bit. Overall, I think there is no reason not to pass the dtype already while creating the initial view (that currently has the same dtype), but not quite sure how that pans out.
My first guess would be that the only reason for this organization is that arr.dtype = includes the necessary checks for whether a view is OK.

As for some newer code changes here, let's avoid any warning filtering (yes I guess on new Python it's threadsafe at least, but...). Either, this still works fine now that you did the refcount check, or we should just use a arr.view() which could be guarded by dtype != dtype.

(I am not certain the refcount checks will work on PyPy as is, so there might be a problem with using it for avoiding th view() change.)

@eendebakpt eendebakpt force-pushed the deprecate_array_stride_set branch from af05298 to c27f61a Compare May 6, 2025 09:46
@eendebakpt
Copy link
Contributor Author

As for some newer code changes here, let's avoid any warning filtering (yes I guess on new Python it's threadsafe at least, but...). Either, this still works fine now that you did the refcount check, or we should just use a arr.view() which could be guarded by dtype != dtype.

The warning filtering is indeed not nice. But using a view won't work directly. For example for the recarray the dtype is updated in the array finalizer.

def __array_finalize__(self, obj):
if self.dtype.type is not record and self.dtype.names is not None:
# if self.dtype is not np.record, invoke __setattr__ which will
# convert it to a record if it is a void dtype.
with warnings.catch_warnings():
# gh-28901
warnings.filterwarnings("ignore", category=DeprecationWarning)
self.dtype = self.dtype

I can create a view with the new dtype (maybe this will create an infinite recursion, I will have to try), but I cannot replace self with the new view.

@seberg
Copy link
Member

seberg commented May 6, 2025

but I cannot replace self with the new view.

Right, this was maybe the reason for why it wasn't deprecated before. I think we need a solution for it, though. And as much as I dislike this record array stuff and we should maybe discourage its use.

My first thought is one or both of:

  • Allow this specific change for record-arrays, where the new dtype is known to be fully equivalent and behave exactly the same.
  • Just add a by-pass, with something _dtype = setting, or a (semi?)private function that forces it. (I slightly fear that _dtype might be used, so maybe something more awkward.)

Unless the "unique reference" path can fix this (but I guess it didn't).

The point is, if our code needs a warning manager, then we are just kicking the real solution down the road, since eventually that should be an error and the warning context manager will stop working anyway.

@eendebakpt eendebakpt force-pushed the deprecate_array_stride_set branch from b933d58 to 11f6569 Compare May 6, 2025 16:41
@eendebakpt
Copy link
Contributor Author

but I cannot replace self with the new view.

Right, this was maybe the reason for why it wasn't deprecated before. I think we need a solution for it, though. And as much as I dislike this record array stuff and we should maybe discourage its use.

My first thought is one or both of:

  • Allow this specific change for record-arrays, where the new dtype is known to be fully equivalent and behave exactly the same.

Do you means checking inside the dtype setter whether the object is a subclass of masked array or recarray? That is possible (a bit inconvenient as the masked array and rec array are defined in python).

  • Just add a by-pass, with something _dtype = setting, or a (semi?)private function that forces it. (I slightly fear that _dtype might be used, so maybe something more awkward.)

This seems a reasonable approach. Any user making use of _dtype would be aware this is an internal method and best avoided.

It would be much nicer if we could rewrite the code for masked array and recarray to not update the dtype at all. If we are not able to do this, then perhaps there are users with their own subclasses that will face the same issues. For masked array the most changes are introduced in #10314 and #5943. @ahaldane @mhvk do you know whether avoiding setting the dtype in the finalizer would be possible?

Unless the "unique reference" path can fix this (but I guess it didn't).

The unique reference path does not help here, at the point where the recarray or masked array sets the dtype, we already have 5 references to the object.

The point is, if our code needs a warning manager, then we are just kicking the real solution down the road, since eventually that should be an error and the warning context manager will stop working anyway.

Kicking down the solution might be ok at first. This PR is to signal users we are deprecating setting the dtype and strides. We can adapt our own code in a followup PR.

@seberg
Copy link
Member

seberg commented May 7, 2025

Do you means checking inside the dtype setter whether the object is a subclass of masked array or recarray?

I was thinking for record dtypes, you could check that the only difference in the dtype is indeed that it's a "record" one. (I don't quite recall what that meant, it's an annoying concept that exists solely to return different scalars, IIRC.)

It would be nice to just not need any of this and a great improvement to try and do something about it. But it might be a rather deep rabbit hole...

Kicking down the solution might be ok at first. This PR is to signal users we are deprecating setting the dtype and strides. We can adapt our own code in a followup PR.

Yeah it may be fine. It would be nice not to if it isn't hard, for three reasons (none of which are quite blocking maybe):

  1. Doing it know gives us full confidence we can go through with it.
  2. Whatever work-around we add may be useful/needed by downstream. Which makes life easier for them (worst case they need one version dependent set of code, not update that again).
  3. Warning context managers are not very fast and not thread safe on older Python versions.

It would be much nicer if we could rewrite the code for masked array and recarray to not update the dtype at all.

Yes, but especially for record arrays it may be a rather deep rabbit hole as mentioned above.

@jorenham jorenham self-requested a review May 8, 2025 00:36
Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squigglies appear where they should (at least with Pylance on VSCode):
image

So typing-wise there's nothing I could possibly complain about now 👌🏻

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is nice! Some mostly small comments in-line. My main comment echoes the discussion -- ideally, we avoid this whole _set_dtype. Will try to have a bit of a look at least for MaskedArray...

@@ -124,6 +124,14 @@ array_strides_set(PyArrayObject *self, PyObject *obj, void *NPY_UNUSED(ignored))
npy_intp upper_offset = 0;
Py_buffer view;

/* DEPRECATED 2025-05-04, NumPy 2.3 */
int ret = PyErr_WarnEx(PyExc_DeprecationWarning,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the DEPRECATE() macro? As e.g. done at

if (DEPRECATE("Data type alias 'a' was deprecated in NumPy 2.0. "
"Use the 'S' alias instead.") < 0) {
return NULL;
}

array_descr_set(PyArrayObject *self, PyObject *arg, void *NPY_UNUSED(ignored))
{
// to be replaced with PyUnstable_Object_IsUniquelyReferenced https://github.com/python/cpython/pull/133144
int unique_reference = (Py_REFCNT(self) == 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we probably do not care too much about a missed deprecation warning in some cases on python 3.14, but we do have

check_unique_temporary(PyObject *lhs)
, and using it could make sure we don't miss updating this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arr isn't a temporary though, so it probably doesn't apply but the "unique" part of that function should apply.


def set_dtype(x):
x.dtype = int
self.assert_deprecated(lambda: set_dtype(x))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe simpler and clearer as self.assert_deprecated(setattr, x, "dtype", int)?

s = x.strides
x.strides = s

self.assert_deprecated(lambda: set_strides(x))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

@@ -1215,7 +1216,9 @@ def test_zero_stride(self):
arr = np.broadcast_to(arr, 10)
assert arr.strides == (0,)
with pytest.raises(ValueError):
arr.dtype = "i1"
with warnings.catch_warnings(): # gh-28901
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well test that the deprecation warning happens here?

with pytest.warns(DeprecationWarning, match="Setting the dtype"):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some cases where we sometimes get a warning, and sometimes not (depending on optimizations perhaps? or pypy? refcounting is funny stuff). For that reason I picked warnings.catch_warnings() which works for either a warning or no warning.

I will check once more whether we can use a pytest.warns here.

@@ -366,7 +368,11 @@ def make_array(size, offset, strides):
offset=offset * x.itemsize)
except Exception as e:
raise RuntimeError(e)
r.strides = strides = strides * x.itemsize

with warnings.catch_warnings(): # gh-28901
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think one might as well use pytest.warns (and below too)


# test 0d
arr_0d = np.array(0)
arr_0d.strides = ()
arr_0d = np.lib.stride_tricks.as_strided(arr_0d, strides = ())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove spaces around = for arguments to functions (PEP8) -- also in the other cases below.

@@ -101,7 +102,10 @@ def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
array = np.asarray(DummyArray(interface, base=x))
# The route via `__interface__` does not preserve structured
# dtypes. Since dtype should remain unchanged, we set it explicitly.
array.dtype = x.dtype
with warnings.catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to have _set_dtype at all, but if we keep it, then this seems a good place to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the idea of having this work around, is exactly the ability to avoid any warning context managers both in NumPy (and potentially outside of NumPy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I wanted to be on the safe side since there is a difference between .dtype = ... and ._set_dtype(...) for subclasses that override setting attributes. Here array is created by np.asarray so it is fine and I will update the code.

@@ -3492,7 +3492,8 @@ def dtype(self):

@dtype.setter
def dtype(self, dtype):
super(MaskedArray, type(self)).dtype.__set__(self, dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused - isn't it OK to have deprecation warnings also for MaskedArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a valid code path from numpy where the dtype of a masked array is updated (I think it was the PyArray_View) and the refcount is already larger than 1. In that case we do not want to generate a warning, hence the use of _set_dtype.
The downside of this approach is that if a user changes the dtype, we will not get a deprecation warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I was suggesting to have the warning only for ndarray proper -- then we can just not worry about MaskedArray and recarray yet.

@@ -2126,7 +2126,11 @@ def test_assign_dtype(self):
a = np.zeros(4, dtype='f4,i4')

m = np.ma.array(a)
m.dtype = np.dtype('f4')
with warnings.catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here for sure use pytest.warns - we do want to check warnings are emitted also for MaskedArray (and also below of course)

@mhvk
Copy link
Contributor

mhvk commented May 8, 2025

Darn, what a rabbit hole indeed! One possible solution is to start giving the warning only on ndarray itself instead of also on subclasses - then we do not need any changes to masked and record arrays. I.e.,

--- a/numpy/_core/src/multiarray/getset.c
+++ b/numpy/_core/src/multiarray/getset.c
@@ -527,9 +527,7 @@ static int
 array_descr_set(PyArrayObject *self, PyObject *arg, void *NPY_UNUSED(ignored))
 {
     // to be replaced with PyUnstable_Object_IsUniquelyReferenced https://github.com/python/cpython/pull/133144
-    int unique_reference = (Py_REFCNT(self) == 1);
-
-    if (!unique_reference) {
+    if (PyArray_CheckExact(self) && (Py_REFCNT(self) != 1)) {
         // this will not emit deprecation warnings for all cases, but for most it will
         /* DEPRECATED 2025-05-04, NumPy 2.3 */
         int ret = PyErr_WarnEx(PyExc_DeprecationWarning,

Not ideal, but at least it makes a start, and we don't need the new _set_dtype method...

@seberg
Copy link
Member

seberg commented May 8, 2025

Not ideal, but at least it makes a start, and we don't need the new _set_dtype method...

I guess I am not too worried about a _set_dtype if that is the a way make broader progress without larger refactors (including calling it semi-public, but discouraging its use).
However, 2.3 branching is around the corner (<~2 weeks) so especially if you want to push for that and the alternatives are just unwieldy, then omitting subclasses seems like a good way to make a very good start here.

@mhvk
Copy link
Contributor

mhvk commented May 8, 2025

Good point about 2.3 being around the corner. I think following my suggestion is easiest in the sense that it introduces no new API, and we can then think about how to do PyArray_View better. E.g., one could envisage only calling the dtype setter if it is different from the standard one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants