Skip to content

MAINT,API: Make metadata, c_metadata, fields, and names only exist on old-style dtypes #25802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Mar 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
439bc0f
MAINT: Internally split some Descriptor fields into LegacyDescriptor
seberg Feb 10, 2024
7329d7e
MAINT: Temporary fix for cython to build random (not super bad though)
seberg Feb 11, 2024
c82245a
MAINT: Massive fixup to cast to _PyArray_LegacyDescr or use macro access
seberg Feb 10, 2024
ba10883
MAINT: Further places where to prefer casts (and rename in cast prep)
seberg Feb 11, 2024
873d3ec
MAINT: Further preferences for casts instead of macro with check
seberg Feb 11, 2024
87d9d95
MAINT: Further use of direct access on LegacyDescr (and one related fix)
seberg Feb 11, 2024
8e7cc23
MAINT: Add missing Legacydescr cast
seberg Feb 11, 2024
b58c790
Revert "MAINT: Temporary fix for cython to build random (not super ba…
seberg Feb 12, 2024
03b4277
MAINT: Use static inline functions rather than macros as per review
seberg Feb 12, 2024
70409c4
MAINT: Use direct struct access again in more places
seberg Feb 12, 2024
0910b58
DOC: Add release note snippet for descr field access indirection
seberg Feb 13, 2024
a8f4618
MAINT: Delete ``PyDataType_GetDatetimeMetaData`` completely
seberg Feb 18, 2024
5a3eafa
MAINT: Keep ``->metadata`` as direct access internally
seberg Feb 18, 2024
80beb25
MAINT: Fix merge/rebase conflict
seberg Feb 18, 2024
80b6d6e
Merge branch 'main' into create-legacy-descr
seberg Mar 4, 2024
b37cbf7
MAINT: Add a few casts back to normal descriptor for flag checking
seberg Mar 4, 2024
82ac0ba
MAINT: Add explicitly casts for functions rather than void *
seberg Mar 4, 2024
56194f0
DOC: Add brief documentation on new accessors and move release notes
seberg Mar 4, 2024
31e5d47
MAINT,DOC: Fixups based on review
seberg Mar 4, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions doc/source/reference/c-api/array.rst
Original file line number Diff line number Diff line change
Expand Up @@ -753,6 +753,34 @@ General check of Python Type
:c:data:`PyArray_Type` whose dimensionality is 0.


Data-type accessors
~~~~~~~~~~~~~~~~~~~

Some of the descriptor attributes may not always be defined and should or
cannot not be accessed directly.

.. versionchanged:: 2.0
Prior to NumPy 2.0 the ABI was different but unnecessary large for user
DTypes. These accessors were all added in 2.0.

.. c:function:: PyObject *PyDataType_METADATA(PyArray_Descr *descr)
The Metadata attached to a dtype, either ``NULL`` or a dictionary.

.. c:function:: PyObject *PyDataType_NAMES(PyArray_Descr *descr)
``NULL`` or a list of structured field names attached to a dtype,
this list should not be mutated, NumPy may change the way fields are
stored in the future.

.. c:function:: PyObject *PyDataType_FIELDS(PyArray_Descr *descr)
``NULL``, ``None``, or a dict of structured dtype fields, this dict must
not be mutated, NumPy may change the way fields are stored in the future.

.. c:function:: NpyAuxData *PyDataType_C_METADATA(PyArray_Descr *descr)
C-metadata object attached to a descriptor. This accessor should not
be needed usually. The C-Metadata field does provide access to the
datetime/timedelta time unit information.


Data-type checking
~~~~~~~~~~~~~~~~~~

Expand Down
17 changes: 17 additions & 0 deletions doc/source/release/2.0.0-notes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,17 @@ to adapt your code and achieve compatibility with both 1.x and 2.x.
(`gh-25792 <https://github.com/numpy/numpy/pull/25792>`__)


Structured dtype information access through functions
-----------------------------------------------------
The dtype structures fields `c_metadata``, ``names``,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing backtick on c_metadata

``fields``, and ``subarray`` must now be accessed through new
functions following the same names, such as ``PyDataType_NAMES``.
Direct access of the fields is not valid as they do not exist for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, you write PyDataType_NAMES but in the .rst above you call it PyTypeNum_NAMES - I think the former is better but it should be consistent.

all ``PyArray_Descr`` instances.
The ``metadata`` field is kept, but the macro version should also be preferred.

(`gh-25802 <https://github.com/numpy/numpy/pull/25802>`__)

NumPy 2.0 C API removals
========================

Expand Down Expand Up @@ -487,6 +498,12 @@ NumPy 2.0 C API removals

(`gh-25292 <https://github.com/numpy/numpy/pull/25292>`__)


* ``PyDataType_GetDatetimeMetaData`` has been removed, it did not actually
do anything since at least NumPy 1.7.

(`gh-25802 <https://github.com/numpy/numpy/pull/25802>`__)

``PyArray_GetCastFunc`` was removed
-----------------------------------
Note that custom legacy user dtypes can still provide a castfunc
Expand Down
5 changes: 5 additions & 0 deletions numpy/_core/include/numpy/arrayscalars.h
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,12 @@ typedef struct {
typedef struct {
PyObject_VAR_HEAD
char *obval;
#if defined(NPY_INTERNAL_BUILD) && NPY_INTERNAL_BUILD
/* Internally use the subclass to allow accessing names/fields */
_PyArray_LegacyDescr *descr;
#else
PyArray_Descr *descr;
#endif
int flags;
PyObject *base;
#if NPY_FEATURE_VERSION >= NPY_1_20_API_VERSION
Expand Down
85 changes: 83 additions & 2 deletions numpy/_core/include/numpy/ndarraytypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -573,6 +573,10 @@ typedef struct {
NPY_ITEM_IS_POINTER | NPY_ITEM_REFCOUNT | \
NPY_NEEDS_INIT | NPY_NEEDS_PYAPI)

#if !(defined(NPY_INTERNAL_BUILD) && NPY_INTERNAL_BUILD)
/*
* Public version of the Descriptor struct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused - why does the public version still have those things like names. Isn't the point that one should use the accessors? Or am I misreading the above if?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are not: The point is that I am splitting this into multiple PRs and this one doesn't actually change anything. The only reason I added the release note here, was because Nathan asked for it, and I agreed it might be easier to reason about it here than in a PR that breaks more things.

*/
typedef struct _PyArray_Descr {
PyObject_HEAD
/*
Expand Down Expand Up @@ -633,6 +637,55 @@ typedef struct _PyArray_Descr {

} PyArray_Descr;

#else /* internal build */

// TODO: This split definition only exists for piece-meal transitioning
// as it allows change internal use without worrying about public API.
typedef struct _PyArray_Descr {
PyObject_HEAD
PyTypeObject *typeobj;
char kind;
char type;
char byteorder;
char flags;
int type_num;
int elsize;
int alignment;
/* except hash, the below fields will be legacy descriptor specific */
struct _arr_descr *unreachable_subarray;
PyObject *unreachable_fields;
PyObject *unreachable_names;
PyArray_ArrFuncs *_former_f;
PyObject *metadata;
NpyAuxData *unreachable_c_metadata;
npy_hash_t hash;
} PyArray_Descr;

#endif /* internal build */


/*
* Semi-private struct with additional field of legacy descriptors (must
* check NPY_DT_is_legacy before casting/accessing).
*/
typedef struct {
PyObject_HEAD
PyTypeObject *typeobj;
char kind;
char type;
char byteorder;
char flags;
int type_num;
int elsize;
int alignment;
struct _arr_descr *subarray;
PyObject *fields;
PyObject *names;
PyArray_ArrFuncs *_former_f;
PyObject *metadata;
NpyAuxData *c_metadata;
npy_hash_t hash;
} _PyArray_LegacyDescr;


/*
Expand Down Expand Up @@ -1620,6 +1673,7 @@ PyArray_CLEARFLAGS(PyArrayObject *arr, int flags)
#define PyTypeNum_ISOBJECT(type) ((type) == NPY_OBJECT)


#define PyDataType_ISLEGACY(dtype) ((dtype)->type_num < NPY_VSTRING && ((dtype)->type_num >= 0))
#define PyDataType_ISBOOL(obj) PyTypeNum_ISBOOL(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISUNSIGNED(obj) PyTypeNum_ISUNSIGNED(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISSIGNED(obj) PyTypeNum_ISSIGNED(((PyArray_Descr*)(obj))->type_num)
Expand All @@ -1633,8 +1687,8 @@ PyArray_CLEARFLAGS(PyArrayObject *arr, int flags)
#define PyDataType_ISUSERDEF(obj) PyTypeNum_ISUSERDEF(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISEXTENDED(obj) PyTypeNum_ISEXTENDED(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISOBJECT(obj) PyTypeNum_ISOBJECT(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_HASFIELDS(obj) (((PyArray_Descr *)(obj))->names != NULL)
#define PyDataType_HASSUBARRAY(dtype) ((dtype)->subarray != NULL)
#define PyDataType_HASFIELDS(obj) (PyDataType_ISLEGACY((PyArray_Descr*)(obj)) && ((_PyArray_LegacyDescr *)(obj))->names != NULL)
#define PyDataType_HASSUBARRAY(dtype) (PyDataType_ISLEGACY(dtype) && ((_PyArray_LegacyDescr *)dtype)->subarray != NULL)
#define PyDataType_ISUNSIZED(dtype) ((dtype)->elsize == 0 && \
!PyDataType_HASFIELDS(dtype))
#define PyDataType_MAKEUNSIZED(dtype) ((dtype)->elsize = 0)
Expand All @@ -1643,6 +1697,33 @@ PyArray_CLEARFLAGS(PyArrayObject *arr, int flags)
* npy_2_compat.h and are not defined here.
*/

/*
* Access inline functions for legacy fields. Except metadata these fields are
* specific to structured arrays (names, fields) or datetime (c_metadata).
* Although technically they may be used (but normally ignored) on non-struct
* dtypes as well.
* For structured dtypes, new ways to define and access fields make sense.
*/
static inline PyArray_ArrayDescr *
PyDataType_SUBARRAY(PyArray_Descr *dtype) {
return !PyDataType_ISLEGACY(dtype) ? NULL : ((_PyArray_LegacyDescr *)dtype)->subarray;
}

static inline PyObject *
PyDataType_NAMES(PyArray_Descr *dtype) {
return !PyDataType_ISLEGACY(dtype) ? NULL : ((_PyArray_LegacyDescr *)dtype)->names;
}

static inline PyObject *
PyDataType_FIELDS(PyArray_Descr *dtype) {
return !PyDataType_ISLEGACY(dtype) ? NULL : ((_PyArray_LegacyDescr *)dtype)->fields;
}

static inline NpyAuxData *
PyDataType_C_METADATA(PyArray_Descr *dtype) {
return !PyDataType_ISLEGACY(dtype) ? NULL : ((_PyArray_LegacyDescr *)dtype)->c_metadata;
}

#define PyArray_ISBOOL(obj) PyTypeNum_ISBOOL(PyArray_TYPE(obj))
#define PyArray_ISUNSIGNED(obj) PyTypeNum_ISUNSIGNED(PyArray_TYPE(obj))
#define PyArray_ISSIGNED(obj) PyTypeNum_ISSIGNED(PyArray_TYPE(obj))
Expand Down
7 changes: 0 additions & 7 deletions numpy/_core/include/numpy/npy_1_7_deprecated_api.h
Original file line number Diff line number Diff line change
Expand Up @@ -66,13 +66,6 @@
/* This way of accessing the default type is deprecated as of NumPy 1.7 */
#define PyArray_DEFAULT NPY_DEFAULT_TYPE

/* These DATETIME bits aren't used internally */
#define PyDataType_GetDatetimeMetaData(descr) \
((descr->metadata == NULL) ? NULL : \
((PyArray_DatetimeMetaData *)(PyCapsule_GetPointer( \
PyDict_GetItemString( \
descr->metadata, NPY_METADATA_DTSTR), NULL))))

/*
* Deprecated as of NumPy 1.7, this kind of shortcut doesn't
* belong in the public API.
Expand Down
2 changes: 2 additions & 0 deletions numpy/_core/include/numpy/npy_2_compat.h
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,8 @@ PyArray_ImportNumPyAPI()
/* Aliases of 2.x names to 1.x only equivalent names */
#define NPY_NTYPES NPY_NTYPES_LEGACY
#define PyArray_DescrProto PyArray_Descr
/* NumPy 2 definition always works, but add it for 1.x only */
#define PyDataType_ISLEGACY(dtype) (1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically not true because of the experimental dtype flag but it's probably a weird thing to do that across the numpy 1.0/2.0 boundary so this is fine IMO.

#else
#define NPY_DEFAULT_INT \
(PyArray_RUNTIME_VERSION >= NPY_2_0_API_VERSION ? NPY_INTP : NPY_LONG)
Expand Down
6 changes: 3 additions & 3 deletions numpy/_core/src/multiarray/_multiarray_tests.c.src
Original file line number Diff line number Diff line change
Expand Up @@ -496,7 +496,7 @@ static PyObject *
create_custom_field_dtype(PyObject *NPY_UNUSED(mod), PyObject *args)
{
PyArray_DescrProto proto;
PyArray_Descr *dtype;
_PyArray_LegacyDescr *dtype; /* Is checked for void, so legacy is OK */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd tend to go the other way around, and just recast or define a new variable when you need the legacy one. But this is fine as is too.

PyTypeObject *scalar_type;
int error_path;

Expand All @@ -510,7 +510,7 @@ create_custom_field_dtype(PyObject *NPY_UNUSED(mod), PyObject *args)
if (dtype->type_num != NPY_VOID || dtype->fields == NULL ||
!PyDict_CheckExact(dtype->fields) ||
PyTuple_Size(dtype->names) != 1 ||
!PyDataType_REFCHK(dtype) ||
!PyDataType_REFCHK((PyArray_Descr *)dtype) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically feel one should avoid recasting a legacy type into a "real" one (but really am nitpicking here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That gets annoying since you cannot assign using the macros. Having both around is reasonable.
Also, what you don't see is that I removed a lot of changed lines of code again by going the cast route.

It would be nice to just define some of these internally in a way that supports the subclass LegacyDescr *, but it was few enough instances where this was needed that I didn't think it was very important.

dtype->elsize != sizeof(PyObject *)) {
PyErr_SetString(PyExc_ValueError,
"Bad dtype passed to test function, must be an object "
Expand All @@ -531,7 +531,7 @@ create_custom_field_dtype(PyObject *NPY_UNUSED(mod), PyObject *args)
proto.subarray = dtype->subarray;
proto.fields = dtype->fields;
proto.names = dtype->names;
proto.f = PyDataType_GetArrFuncs(dtype);
proto.f = PyDataType_GetArrFuncs((PyArray_Descr *)dtype);
proto.metadata = dtype->metadata;
proto.c_metadata = dtype->c_metadata;

Expand Down
3 changes: 2 additions & 1 deletion numpy/_core/src/multiarray/array_coercion.c
Original file line number Diff line number Diff line change
Expand Up @@ -1267,7 +1267,8 @@ PyArray_DiscoverDTypeAndShape(
flags |= DISCOVER_STRINGS_AS_SEQUENCES;
}
else if (requested_descr->type_num == NPY_VOID &&
(requested_descr->names || requested_descr->subarray)) {
(((_PyArray_LegacyDescr *)requested_descr)->names
|| ((_PyArray_LegacyDescr *)requested_descr)->subarray)) {
/* Void is a chimera, in that it may or may not be structured... */
flags |= DISCOVER_TUPLES_AS_ELEMENTS;
}
Expand Down
9 changes: 5 additions & 4 deletions numpy/_core/src/multiarray/arrayobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -598,11 +598,9 @@ _void_compare(PyArrayObject *self, PyArrayObject *other, int cmp_op)
return NULL;
}
if (PyArray_HASFIELDS(self) && PyArray_HASFIELDS(other)) {
PyArray_Descr *self_descr = PyArray_DESCR(self);
PyArray_Descr *other_descr = PyArray_DESCR(other);

/* Use promotion to decide whether the comparison is valid */
PyArray_Descr *promoted = PyArray_PromoteTypes(self_descr, other_descr);
PyArray_Descr *promoted = PyArray_PromoteTypes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do this promotion check before even defining self_descr and other_descr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably a bit nicer, changed it.

PyArray_DESCR(self), PyArray_DESCR(other));
if (promoted == NULL) {
PyErr_SetString(PyExc_TypeError,
"Cannot compare structured arrays unless they have a "
Expand All @@ -612,6 +610,9 @@ _void_compare(PyArrayObject *self, PyArrayObject *other, int cmp_op)
}
Py_DECREF(promoted);

_PyArray_LegacyDescr *self_descr = (_PyArray_LegacyDescr *)PyArray_DESCR(self);
_PyArray_LegacyDescr *other_descr = (_PyArray_LegacyDescr *)PyArray_DESCR(other);

npy_intp result_ndim = PyArray_NDIM(self) > PyArray_NDIM(other) ?
PyArray_NDIM(self) : PyArray_NDIM(other);

Expand Down
Loading