Missingdata - Finish masked iteration, fix ufunc 'where=' with buffered output, add documentation #108

Closed
wants to merge 7 commits into from
45 changes: 44 additions & 1 deletion doc/source/reference/c-api.array.rst
@@ -2127,6 +2127,49 @@ an element copier function as a primitive.::
A macro which calls the auxdata's clone function appropriately,
returning a deep copy of the auxiliary data.

Masks for Selecting Elements to Modify
--------------------------------------

.. versionadded:: 1.7.0

The array iterator, :ctype:`NpyIter`, has some new flags which
allow control over which elements are intended to be modified,
providing the ability to do masking even when doing casts to a buffer
of a different type. Some inline functions have been added
to facilitate consistent usage of these masks.

A mask dtype can be one of three different possibilities. It can
be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
fields are all mask dtypes.

A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
value 1, for an element that is exposed, and False, with underlying
value 0, for an element that is hidden.

A mask of :cdata:`NPY_MASK` can additionally carry a payload which
is a value from 0 to 127. This allows for missing data implementations
based on such masks to support multiple reasons for data being missing.

A mask of a struct dtype can only pair up with another struct dtype
with the same field names. In this way, each field of the mask controls
the masking for the corresponding field in the associated data array.

Inline functions to work with masks are as follows.

.. cfunction:: npy_bool NpyMask_IsExposed(npy_mask mask)

Returns true if the data element corresponding to the mask element
can be modified, false if not.

.. cfunction:: npy_uint8 NpyMask_GetPayload(npy_mask mask)

Returns the payload contained in the mask. The return value
is between 0 and 127.

.. cfunction:: npy_mask NpyMask_Create(npy_bool exposed, npy_uint8 payload)

Creates a mask from a flag indicating whether the element is exposed
or not and a payload value.
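
As a rough illustration of how the three helpers fit together, the sketch below re-declares them with stand-in typedefs so that it compiles without the NumPy headers. The reason codes and the ``hide_with_reason`` helper are hypothetical and are not part of NumPy; they only show one way a missing-data implementation might use the 7-bit payload.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the NumPy typedefs, so the sketch compiles on its own. */
typedef uint8_t npy_uint8;
typedef unsigned char npy_bool;
typedef npy_uint8 npy_mask;

/* Same logic as the inline helpers this change adds to ndarraytypes.h. */
static inline npy_bool
NpyMask_IsExposed(npy_mask mask)
{
    return (mask & 0x01) != 0;          /* bit 0: exposed or hidden */
}

static inline npy_uint8
NpyMask_GetPayload(npy_mask mask)
{
    return (npy_uint8)(mask >> 1);      /* bits 1 through 7: payload */
}

static inline npy_mask
NpyMask_Create(npy_bool exposed, npy_uint8 payload)
{
    return (npy_mask)((exposed != 0) | (payload << 1));
}

/* Hypothetical reason codes; NumPy does not define these. */
enum {
    REASON_NONE = 0,
    REASON_SENSOR_FAULT = 1,
    REASON_OUT_OF_RANGE = 2
};

/* Hypothetical helper: hide an element and record why it is missing. */
static npy_mask
hide_with_reason(npy_uint8 reason)
{
    assert(reason <= 127);              /* the payload has only 7 bits */
    return NpyMask_Create(0, reason);
}
```

Calling ``hide_with_reason(REASON_OUT_OF_RANGE)`` yields a mask for which ``NpyMask_IsExposed`` is false and ``NpyMask_GetPayload`` returns the reason code, while a plain boolean True (value 1) reads as exposed with payload 0.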

Array Iterators
---------------
@@ -2997,7 +3040,7 @@ Group 2
Priority
^^^^^^^^

.. cvar:: NPY_PRIOIRTY
.. cvar:: NPY_PRIORITY

Default priority for arrays.

174 changes: 165 additions & 9 deletions doc/source/reference/c-api.dtype.rst
@@ -28,15 +28,171 @@ Enumerated Types
There is a list of enumerated types defined providing the basic 24
data types plus some useful generic names. Whenever the code requires
a type number, one of these enumerated types is requested. The types
are all called :cdata:`NPY_{NAME}` where ``{NAME}`` can be
are all called :cdata:`NPY_{NAME}`:

**BOOL**, **BYTE**, **UBYTE**, **SHORT**, **USHORT**, **INT**,
**UINT**, **LONG**, **ULONG**, **LONGLONG**, **ULONGLONG**,
**HALF**, **FLOAT**, **DOUBLE**, **LONGDOUBLE**, **CFLOAT**,
**CDOUBLE**, **CLONGDOUBLE**, **DATETIME**, **TIMEDELTA**,
**OBJECT**, **STRING**, **UNICODE**, **VOID**
.. cvar:: NPY_BOOL

The enumeration value for the boolean type, stored as one byte.
It may only be set to the values 0 and 1.

.. cvar:: NPY_BYTE
.. cvar:: NPY_INT8

The enumeration value for an 8-bit/1-byte signed integer.

.. cvar:: NPY_SHORT
.. cvar:: NPY_INT16

The enumeration value for a 16-bit/2-byte signed integer.

.. cvar:: NPY_INT
.. cvar:: NPY_INT32

The enumeration value for a 32-bit/4-byte signed integer.

.. cvar:: NPY_LONG

Equivalent to either NPY_INT or NPY_LONGLONG, depending on the
platform.

.. cvar:: NPY_LONGLONG
.. cvar:: NPY_INT64

The enumeration value for a 64-bit/8-byte signed integer.

.. cvar:: NPY_UBYTE
.. cvar:: NPY_UINT8

The enumeration value for an 8-bit/1-byte unsigned integer.

.. cvar:: NPY_USHORT
.. cvar:: NPY_UINT16

The enumeration value for a 16-bit/2-byte unsigned integer.

.. cvar:: NPY_UINT
.. cvar:: NPY_UINT32

The enumeration value for a 32-bit/4-byte unsigned integer.

.. cvar:: NPY_ULONG

Equivalent to either NPY_UINT or NPY_ULONGLONG, depending on the
platform.

.. cvar:: NPY_ULONGLONG
.. cvar:: NPY_UINT64

The enumeration value for a 64-bit/8-byte unsigned integer.

.. cvar:: NPY_HALF
.. cvar:: NPY_FLOAT16

The enumeration value for a 16-bit/2-byte IEEE 754-2008 compatible floating
point type.

.. cvar:: NPY_FLOAT
.. cvar:: NPY_FLOAT32

The enumeration value for a 32-bit/4-byte IEEE 754 compatible floating
point type.

.. cvar:: NPY_DOUBLE
.. cvar:: NPY_FLOAT64

The enumeration value for a 64-bit/8-byte IEEE 754 compatible floating
point type.

.. cvar:: NPY_LONGDOUBLE

The enumeration value for a platform-specific floating point type which is
at least as large as NPY_DOUBLE, but larger on many platforms.

.. cvar:: NPY_CFLOAT
.. cvar:: NPY_COMPLEX64

The enumeration value for a 64-bit/8-byte complex type made up of
two NPY_FLOAT values.

.. cvar:: NPY_CDOUBLE
.. cvar:: NPY_COMPLEX128

The enumeration value for a 128-bit/16-byte complex type made up of
two NPY_DOUBLE values.

.. cvar:: NPY_CLONGDOUBLE

The enumeration value for a platform-specific complex floating point
type which is made up of two NPY_LONGDOUBLE values.

.. cvar:: NPY_DATETIME

The enumeration value for a data type which holds dates or datetimes with
a precision based on selectable date or time units.

.. cvar:: NPY_TIMEDELTA

The enumeration value for a data type which holds lengths of times in
integers of selectable date or time units.

.. cvar:: NPY_STRING

The enumeration value for ASCII strings of a selectable size. The
strings have a fixed maximum size within a given array.

.. cvar:: NPY_UNICODE

The enumeration value for UCS4 strings of a selectable size. The
strings have a fixed maximum size within a given array.

.. cvar:: NPY_OBJECT

The enumeration value for references to arbitrary Python objects.

.. cvar:: NPY_VOID

Primarily used to hold struct dtypes, but can contain arbitrary
binary data.

Some useful aliases of the above types are

.. cvar:: NPY_INTP

The enumeration value for a signed integer type which is the same
size as a (void \*) pointer. This is the type used by all
arrays of indices.

.. cvar:: NPY_UINTP

The enumeration value for an unsigned integer type which is the
same size as a (void \*) pointer.

.. cvar:: NPY_MASK

The enumeration value of the type used for masks, such as with
the :cdata:`NPY_ITER_ARRAYMASK` iterator flag. This is equivalent
to :cdata:`NPY_UINT8`.

.. cvar:: NPY_DEFAULT_TYPE

The default type to use when no dtype is explicitly specified, for
example when calling np.zeros(shape). This is equivalent to
:cdata:`NPY_DOUBLE`.

Other useful related constants are

.. cvar:: NPY_NTYPES

The total number of built-in NumPy types. The enumeration covers
the range from 0 to NPY_NTYPES-1.

.. cvar:: NPY_NOTYPE

A signal value guaranteed not to be a valid type enumeration number.

.. cvar:: NPY_USERDEF

**NTYPES**, **NOTYPE**, **USERDEF**, **DEFAULT_TYPE**
The start of type numbers used for Custom Data types.

The various character codes indicating certain types are also part of
an enumerated list. References to type characters (should they be
@@ -116,9 +272,9 @@ types are available.
Integer that can hold a pointer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The constants **PyArray_INTP** and **PyArray_UINTP** refer to an
The constants **NPY_INTP** and **NPY_UINTP** refer to an
enumerated integer type that is large enough to hold a pointer on the
platform. Index arrays should always be converted to **PyArray_INTP**
platform. Index arrays should always be converted to **NPY_INTP**,
because the dimension of the array is of type npy_intp.


40 changes: 40 additions & 0 deletions doc/source/reference/c-api.iterator.rst
@@ -579,6 +579,46 @@ Construction and Destruction
Ensures that the input or output matches the iteration
dimensions exactly.

.. cvar:: NPY_ITER_ARRAYMASK

Indicates that this operand is the mask to use for
selecting elements when writing to operands which have
the :cdata:`NPY_ITER_WRITEMASKED` flag applied to them.
Only one operand may have the :cdata:`NPY_ITER_ARRAYMASK` flag
applied to it.

The data type of an operand with this flag should be either
:cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype
whose fields are all valid mask dtypes. In the latter case,
it must match up with a struct operand being WRITEMASKED,
as it is specifying a mask for each field of that array.

This flag only affects writing from the buffer back to
the array. This means that if the operand is also
:cdata:`NPY_ITER_READWRITE` or :cdata:`NPY_ITER_WRITEONLY`,
code doing iteration can write to this operand to
control which elements will be untouched and which ones will be
modified. This is useful when the mask should be a combination
of input masks, for example. Mask values can be created
with the :cfunc:`NpyMask_Create` function.

.. cvar:: NPY_ITER_WRITEMASKED

Indicates that the iteration is intended to modify only those
elements of this operand which the operand with the ARRAYMASK
flag marks as exposed. In general, the iterator does not enforce
this; it is up to the code doing the iteration to keep
that promise. Code can use the :cfunc:`NpyMask_IsExposed`
inline function to test whether the mask at a particular
element allows writing.

When this flag is used, and this operand is buffered, this
changes how data is copied from the buffer into the array.
A masked copying routine is used, which only copies the
elements in the buffer for which :cfunc:`NpyMask_IsExposed`
returns true from the corresponding element in the ARRAYMASK
operand.
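
The interaction between the two flags can be sketched in plain C, outside the iterator. The block below is only an illustration under stated assumptions: ``combine_masks`` and ``masked_copy_double`` are hypothetical helpers, not NumPy functions, the typedefs are stand-ins so the code compiles without the NumPy headers, and the rule "exposed only where both inputs are exposed" is just one possible way to combine input masks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the NumPy typedefs, so the sketch compiles on its own. */
typedef uint8_t npy_uint8;
typedef unsigned char npy_bool;
typedef npy_uint8 npy_mask;

/* Same logic as the inline helpers declared in ndarraytypes.h. */
static inline npy_bool
NpyMask_IsExposed(npy_mask mask)
{
    return (mask & 0x01) != 0;
}

static inline npy_mask
NpyMask_Create(npy_bool exposed, npy_uint8 payload)
{
    return (npy_mask)((exposed != 0) | (payload << 1));
}

/*
 * Hypothetical helper: what iteration code might write into a
 * writeable ARRAYMASK operand. An element is exposed only where
 * both input masks expose it; the payload is left at 0.
 */
static void
combine_masks(const npy_mask *a, const npy_mask *b,
              npy_mask *out, size_t n)
{
    size_t i;

    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (i = 0; i < n; ++i) {
        npy_bool exposed = NpyMask_IsExposed(a[i]) &&
                           NpyMask_IsExposed(b[i]);
        out[i] = NpyMask_Create(exposed, 0);
    }
}

/*
 * Hypothetical stand-in for the masked copy from the buffer back to
 * a WRITEMASKED operand: hidden elements of dst are left untouched.
 */
static void
masked_copy_double(double *dst, const double *buffer,
                   const npy_mask *mask, size_t n)
{
    size_t i;

    assert(n == 0 || (dst != NULL && buffer != NULL && mask != NULL));
    for (i = 0; i < n; ++i) {
        if (NpyMask_IsExposed(mask[i])) {
            dst[i] = buffer[i];
        }
    }
}
```

In this sketch the mask is computed first and the copy-back consults it afterwards, which matches the description above that the mask only affects writing from the buffer back to the array.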

.. cfunction:: NpyIter* NpyIter_AdvancedNew(npy_intp nop, PyArrayObject** op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, npy_uint32* op_flags, PyArray_Descr** op_dtypes, int oa_ndim, int** op_axes, npy_intp* itershape, npy_intp buffersize)

Extends :cfunc:`NpyIter_MultiNew` with several advanced options providing
34 changes: 34 additions & 0 deletions numpy/core/include/numpy/ndarraytypes.h
@@ -1440,6 +1440,40 @@ struct NpyAuxData_tag {
#define NPY_AUXDATA_CLONE(auxdata) \
((auxdata)->clone(auxdata))

/*********************************************************************
* NumPy functions for dealing with masks, such as in masked iteration
*********************************************************************/

typedef npy_uint8 npy_mask;
#define NPY_MASK NPY_UINT8

/*
* Bit 0 of the mask indicates whether a value is exposed
* or hidden. This is compatible with a 'where=' boolean
* mask, because NumPy booleans are 1 byte, and contain
* either the value 0 or 1.
*/
static NPY_INLINE npy_bool
NpyMask_IsExposed(npy_mask mask)
{
return (mask&0x01) != 0;
Review comment (Member): Space around '&'.

}

/*
* Bits 1 through 7 of the mask contain the payload.
*/
static NPY_INLINE npy_uint8
NpyMask_GetPayload(npy_mask mask)
{
return ((npy_uint8)mask) >> 1;
}

static NPY_INLINE npy_mask
NpyMask_Create(npy_bool exposed, npy_uint8 payload)
{
return (npy_mask)(exposed != 0) | (npy_mask)(payload << 1);
}

/*
* This is the form of the struct that's returned pointed by the
* PyCObject attribute of an array __array_struct__. See