Missingdata - Reorganize nditer implementation into a few files #104

Closed
wants to merge 8 commits into from
98 changes: 89 additions & 9 deletions doc/neps/missing-data.rst
@@ -1,5 +1,7 @@
:Title: Missing Data Functionality in NumPy
:Author: Mark Wiebe <mwwiebe@gmail.com>
:Copyright: Copyright 2011 by Enthought, Inc
:License: CC By-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)
:Date: 2011-06-23

*****************
@@ -167,14 +169,14 @@ Because the above discussions of the different concepts and their
relationships are tricky to understand, here are more succinct
definitions of the terms used in this NEP.

-NA (Not Available)
+NA (Not Available/Propagate)
A placeholder for a value which is unknown to computations. That
value may be temporarily hidden with a mask, may have been lost
due to hard drive corruption, or gone for any number of reasons.
For sums and products this means to produce NA if any of the inputs
-are NA. This is the same as NA in the R project.
+are NA. This is the same as NA in the R project.

-IGNORE (Skip/Ignore)
+IGNORE (Ignore/Skip)
A placeholder which should be treated by computations as if no value does
or could exist there. For sums, this means act as if the value
were zero, and for products, this means act as if the value were one.
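As an illustration of the two definitions above (not part of the patch), the following pure-Python sketch shows how a sum or product reduction could treat each placeholder. The names `NA`, `IGNORE`, `na_sum`, and `na_prod` are hypothetical stand-ins chosen for this example and are not NumPy API.

```python
import operator


class _Placeholder:
    """Named sentinel used only by this sketch."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


NA = _Placeholder("NA")          # unknown value: propagates through reductions
IGNORE = _Placeholder("IGNORE")  # no value exists: skipped by reductions


def _reduce(values, op, identity):
    """Fold values with op, propagating NA and skipping IGNORE."""
    result = identity
    for value in values:
        if value is NA:
            return NA        # any NA input makes the whole result NA
        if value is IGNORE:
            continue         # same effect as substituting the identity
        result = op(result, value)
    return result


def na_sum(values):
    # IGNORE acts like zero for sums.
    return _reduce(values, operator.add, 0)


def na_prod(values):
    # IGNORE acts like one for products.
    return _reduce(values, operator.mul, 1)


print(na_sum([1, NA, 3]))       # NA
print(na_sum([1, IGNORE, 3]))   # 4
print(na_prod([2, IGNORE, 5]))  # 10
```

Skipping an IGNORE element gives the same result as substituting the identity of the operation, which is how the definition describes it (zero for sums, one for products).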
@@ -690,13 +692,91 @@ There are 2 (or 3) flags which must be added to the array flags::
/* To possibly add in a later revision */
NPY_ARRAY_HARDNAMASK

-******************************
-C API Access: Masked Iteration
-******************************
To allow the easy detection of NA support, and whether an array
has any missing values, we add the following functions:

-TODO: Describe details about how the nditer will be extended to allow
-functions to do masked iteration, transparently working with both
-NA dtypes or masked arrays in one implementation.
PyDataType_HasNASupport(PyArray_Descr* dtype)
Returns true if this is an NA dtype, or a struct
dtype where every field has NA support.

PyArray_HasNASupport(PyArrayObject* obj)
Returns true if the array dtype has NA support, or
the array has an NA mask.

PyArray_ContainsNA(PyArrayObject* obj)
Returns false if the array has no NA support. Returns
true if the array has NA support AND there is an
NA anywhere in the array.
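
The relationship between the three predicates can be sketched in pure Python. This is a toy model and not the C implementation: the `DType` and `Array` classes, the `NA` sentinel, and the mask convention (True marks an NA element) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NA = object()  # hypothetical sentinel standing in for an NA bit pattern


@dataclass
class DType:
    name: str
    is_na: bool = False                       # True for an NA dtype
    fields: List["DType"] = field(default_factory=list)  # non-empty for structs


@dataclass
class Array:
    dtype: DType
    data: list
    na_mask: Optional[List[bool]] = None      # sketch convention: True means NA


def dtype_has_na_support(dtype):
    """Model of PyDataType_HasNASupport."""
    if dtype.fields:
        # A struct dtype qualifies only if every field has NA support.
        return all(dtype_has_na_support(f) for f in dtype.fields)
    return dtype.is_na


def array_has_na_support(arr):
    """Model of PyArray_HasNASupport."""
    return dtype_has_na_support(arr.dtype) or arr.na_mask is not None


def array_contains_na(arr):
    """Model of PyArray_ContainsNA."""
    if not array_has_na_support(arr):
        return False
    masked = arr.na_mask is not None and any(arr.na_mask)
    return masked or any(value is NA for value in arr.data)


plain = Array(DType("f8"), [1.0, 2.0])
masked = Array(DType("f8"), [1.0, 2.0], na_mask=[False, True])
print(array_has_na_support(plain), array_contains_na(plain))    # False False
print(array_has_na_support(masked), array_contains_na(masked))  # True True
```

In this model, `array_contains_na` can only be true when `array_has_na_support` is true, matching the descriptions above.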

********************************************
C Iterator API Changes: Iteration With Masks
********************************************

For iteration and computation with masks, both in the context of missing
values and when the mask is used like the 'where=' parameter in ufuncs,
extending the nditer is the most natural way to expose this functionality.

Masked operations need to work with casting, alignment, and anything else
which causes values to be copied into a temporary buffer, something which
is handled nicely by the nditer but difficult to do outside that context.

First we describe iteration designed for use of masks outside the
context of missing values, then the features which include missing
value support.

Iterator Mask Features
======================

We add several new per-operand flags:

NPY_ITER_WRITEMASKED
Indicates that any copies done from a buffer to the array are
masked. This is necessary because READWRITE mode could destroy
data if a float array was being treated like an int array, so
copying to the buffer and back would truncate to integers. No
similar flag is provided for reading, because it may not be possible
to know the mask ahead of time, and copying everything into
the buffer will never destroy data.

NPY_ITER_ARRAYMASK
Indicates that this array is a boolean mask to use when copying
any WRITEMASKED argument from a buffer back to the array. There
can be only one such mask, and there cannot also be a virtual
mask.

As a special case, if the flag NPY_ITER_USE_NAMASK is specified
at the same time, the mask for the operand is used instead
of the operand itself. If the operand has no mask but is
based on an NA dtype, that mask exposed by the iterator converts
into the NA bitpattern when copying from the buffer to the
array.

NPY_ITER_VIRTUALMASK
Indicates that the mask is not an array, but rather created on
the fly by the inner iteration code. This allocates enough buffer
space for the code to write the mask into, but does not have
an actual array backing the data. There can only be one such
mask, and there cannot also be an array mask.
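
To make the motivation for WRITEMASKED concrete, here is a small pure-Python sketch of the hazard it describes: a float array processed through an integer buffer. It is only a model of the idea; `buffered_update` and its parameters are hypothetical names and do not correspond to any nditer function.

```python
def buffered_update(array, mask, func, write_masked):
    """Apply func to the selected elements of a float list via an int buffer.

    array        -- list of floats, modified in place
    mask         -- list of bools selecting the elements to update
    write_masked -- if True, copy back only the selected elements
    """
    # Copy in: the cast to int truncates every element, selected or not.
    buffer = [int(value) for value in array]

    # Inner loop: compute only where the mask selects an element.
    for i, selected in enumerate(mask):
        if selected:
            buffer[i] = func(buffer[i])

    # Copy out: an unmasked copy overwrites unselected elements with their
    # truncated values, while a masked copy leaves them untouched.
    for i, value in enumerate(buffer):
        if mask[i] or not write_masked:
            array[i] = float(value)
    return array


mask = [False, True, False]
print(buffered_update([1.5, 2.5, 3.5], mask, lambda v: v + 10, write_masked=False))
# [1.0, 12.0, 3.0]  (unselected elements were truncated)
print(buffered_update([1.5, 2.5, 3.5], mask, lambda v: v + 10, write_masked=True))
# [1.5, 12.0, 3.5]  (unselected elements kept their original values)
```

The copy into the buffer is unmasked in both cases, consistent with the note above that copying everything into the buffer never destroys data.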

Iterator NA-array Features
==========================

We add several new per-operand flags:

NPY_ITER_USE_NAMASK
If the operand has an NA dtype, an NA mask, or both, this adds a new
virtual operand to the end of the operand list which iterates
over the mask of the particular operand.

NPY_ITER_IGNORE_NAMASK
If an operand has an NA mask, by default the iterator will raise
an exception unless NPY_ITER_USE_NAMASK is specified. This flag
disables that check, and is intended for cases where one has first
checked that all the elements in the array are not NA using the
PyArray_ContainsNA function.

If the dtype is an NA dtype, this also strips the NA-ness from the
dtype, showing a dtype that does not support NA.
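
The interaction of the two flags can be modeled with a short pure-Python sketch. It covers only operands with an NA mask (NA dtypes are left out), and it assumes that USE_NAMASK on an operand without a mask adds nothing, which the text above does not specify. `OpFlag` and `plan_operands` are hypothetical names for this example.

```python
import enum


class OpFlag(enum.Flag):
    """Stand-ins for the per-operand flags; not the real NPY_ITER_* values."""
    NONE = 0
    USE_NAMASK = enum.auto()
    IGNORE_NAMASK = enum.auto()


def plan_operands(operands):
    """Return the names of the operands the iterator would walk.

    operands -- list of (name, has_na_mask, flags) tuples
    """
    planned = [name for name, _, _ in operands]
    for name, has_na_mask, flags in operands:
        if OpFlag.USE_NAMASK in flags:
            if has_na_mask:
                # The mask becomes a virtual operand at the end of the list.
                planned.append(name + ".mask")
        elif has_na_mask and OpFlag.IGNORE_NAMASK not in flags:
            # Default behaviour: refuse to silently drop the mask.
            raise ValueError(
                "operand %r has an NA mask; pass USE_NAMASK or IGNORE_NAMASK" % name
            )
    return planned


print(plan_operands([("a", True, OpFlag.USE_NAMASK), ("b", False, OpFlag.NONE)]))
# ['a', 'b', 'a.mask']
print(plan_operands([("a", True, OpFlag.IGNORE_NAMASK)]))
# ['a']
```

A caller choosing IGNORE_NAMASK is expected to have already checked for missing values, for example with PyArray_ContainsNA as described above.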

********************
Rejected Alternative
4 changes: 3 additions & 1 deletion numpy/core/SConscript
@@ -380,7 +380,7 @@ umath_loops_src = env.GenerateFromTemplate(pjoin('src', 'umath', 'loops.c.src'))
arraytypes_src = env.GenerateFromTemplate(
pjoin('src', 'multiarray', 'arraytypes.c.src'))
nditer_src = env.GenerateFromTemplate(
-pjoin('src', 'multiarray', 'nditer.c.src'))
+pjoin('src', 'multiarray', 'nditer_templ.c.src'))
lowlevel_strided_loops_src = env.GenerateFromTemplate(
pjoin('src', 'multiarray', 'lowlevel_strided_loops.c.src'))
einsum_src = env.GenerateFromTemplate(pjoin('src', 'multiarray', 'einsum.c.src'))
@@ -469,6 +469,8 @@ if ENABLE_SEPARATE_COMPILATION:
pjoin('src', 'multiarray', 'buffer.c'),
pjoin('src', 'multiarray', 'numpymemoryview.c'),
pjoin('src', 'multiarray', 'scalarapi.c'),
pjoin('src', 'multiarray', 'nditer_api.c'),
pjoin('src', 'multiarray', 'nditer_constr.c'),
pjoin('src', 'multiarray', 'nditer_pywrap.c'),
pjoin('src', 'multiarray', 'dtype_transfer.c')]
multiarray_src.extend(arraytypes_src)
4 changes: 3 additions & 1 deletion numpy/core/code_generators/genapi.py
@@ -47,8 +47,10 @@
join('multiarray', 'datetime_strings.c'),
join('multiarray', 'datetime_busday.c'),
join('multiarray', 'datetime_busdaycal.c'),
-join('multiarray', 'nditer.c.src'),
+join('multiarray', 'nditer_api.c'),
+join('multiarray', 'nditer_constr.c'),
join('multiarray', 'nditer_pywrap.c'),
+join('multiarray', 'nditer_templ.c.src'),
join('multiarray', 'einsum.c.src'),
join('umath', 'ufunc_object.c'),
join('umath', 'ufunc_type_resolution.c'),
72 changes: 48 additions & 24 deletions numpy/core/setup.py
@@ -686,7 +686,7 @@ def generate_multiarray_templated_sources(ext, build_dir):
subpath = join('src', 'multiarray')
sources = [join(local_dir, subpath, 'scalartypes.c.src'),
join(local_dir, subpath, 'arraytypes.c.src'),
-join(local_dir, subpath, 'nditer.c.src'),
+join(local_dir, subpath, 'nditer_templ.c.src'),
join(local_dir, subpath, 'lowlevel_strided_loops.c.src'),
join(local_dir, subpath, 'einsum.c.src')]

@@ -714,6 +714,7 @@ def generate_multiarray_templated_sources(ext, build_dir):
join('src', 'multiarray', 'mapping.h'),
join('src', 'multiarray', 'methods.h'),
join('src', 'multiarray', 'multiarraymodule.h'),
join('src', 'multiarray', 'nditer_impl.h'),
join('src', 'multiarray', 'numpymemoryview.h'),
join('src', 'multiarray', 'number.h'),
join('src', 'multiarray', 'numpyos.h'),
@@ -723,45 +724,68 @@ def generate_multiarray_templated_sources(ext, build_dir):
join('src', 'multiarray', 'shape.h'),
join('src', 'multiarray', 'ucsnarrow.h'),
join('src', 'multiarray', 'usertypes.h'),
join('src', 'private', 'lowlevel_strided_loops.h')]
join('src', 'private', 'lowlevel_strided_loops.h'),
join('include', 'numpy', 'arrayobject.h'),
join('include', 'numpy', '_neighborhood_iterator_imp.h'),
join('include', 'numpy', 'npy_endian.h'),
join('include', 'numpy', 'old_defines.h'),
join('include', 'numpy', 'arrayscalars.h'),
join('include', 'numpy', 'noprefix.h'),
join('include', 'numpy', 'npy_interrupt.h'),
join('include', 'numpy', 'oldnumeric.h'),
join('include', 'numpy', 'npy_3kcompat.h'),
join('include', 'numpy', 'npy_math.h'),
join('include', 'numpy', 'halffloat.h'),
join('include', 'numpy', 'npy_common.h'),
join('include', 'numpy', 'npy_os.h'),
join('include', 'numpy', 'utils.h'),
join('include', 'numpy', 'ndarrayobject.h'),
join('include', 'numpy', 'npy_cpu.h'),
join('include', 'numpy', 'numpyconfig.h'),
join('include', 'numpy', 'ndarraytypes.h'),
join('include', 'numpy', 'npy_deprecated_api.h'),
join('include', 'numpy', '_numpyconfig.h.in'),
]

multiarray_src = [
join('src', 'multiarray', 'multiarraymodule.c'),
join('src', 'multiarray', 'hashdescr.c'),
join('src', 'multiarray', 'arrayobject.c'),
join('src', 'multiarray', 'numpymemoryview.c'),
join('src', 'multiarray', 'arraytypes.c.src'),
join('src', 'multiarray', 'buffer.c'),
join('src', 'multiarray', 'calculation.c'),
join('src', 'multiarray', 'common.c'),
join('src', 'multiarray', 'convert.c'),
join('src', 'multiarray', 'convert_datatype.c'),
join('src', 'multiarray', 'conversion_utils.c'),
join('src', 'multiarray', 'ctors.c'),
join('src', 'multiarray', 'datetime.c'),
join('src', 'multiarray', 'datetime_strings.c'),
join('src', 'multiarray', 'datetime_busday.c'),
join('src', 'multiarray', 'datetime_busdaycal.c'),
join('src', 'multiarray', 'numpyos.c'),
join('src', 'multiarray', 'conversion_utils.c'),
join('src', 'multiarray', 'flagsobject.c'),
join('src', 'multiarray', 'descriptor.c'),
join('src', 'multiarray', 'dtype_transfer.c'),
join('src', 'multiarray', 'einsum.c.src'),
join('src', 'multiarray', 'flagsobject.c'),
join('src', 'multiarray', 'getset.c'),
join('src', 'multiarray', 'hashdescr.c'),
join('src', 'multiarray', 'item_selection.c'),
join('src', 'multiarray', 'iterators.c'),
join('src', 'multiarray', 'lowlevel_strided_loops.c.src'),
join('src', 'multiarray', 'mapping.c'),
join('src', 'multiarray', 'methods.c'),
join('src', 'multiarray', 'multiarraymodule.c'),
join('src', 'multiarray', 'nditer_templ.c.src'),
join('src', 'multiarray', 'nditer_api.c'),
join('src', 'multiarray', 'nditer_constr.c'),
join('src', 'multiarray', 'nditer_pywrap.c'),
join('src', 'multiarray', 'number.c'),
join('src', 'multiarray', 'getset.c'),
join('src', 'multiarray', 'numpymemoryview.c'),
join('src', 'multiarray', 'numpyos.c'),
join('src', 'multiarray', 'refcount.c'),
join('src', 'multiarray', 'sequence.c'),
join('src', 'multiarray', 'methods.c'),
join('src', 'multiarray', 'ctors.c'),
join('src', 'multiarray', 'convert_datatype.c'),
join('src', 'multiarray', 'convert.c'),
join('src', 'multiarray', 'shape.c'),
join('src', 'multiarray', 'item_selection.c'),
join('src', 'multiarray', 'calculation.c'),
join('src', 'multiarray', 'common.c'),
join('src', 'multiarray', 'usertypes.c'),
join('src', 'multiarray', 'scalarapi.c'),
join('src', 'multiarray', 'refcount.c'),
join('src', 'multiarray', 'arraytypes.c.src'),
join('src', 'multiarray', 'scalartypes.c.src'),
join('src', 'multiarray', 'nditer.c.src'),
join('src', 'multiarray', 'lowlevel_strided_loops.c.src'),
join('src', 'multiarray', 'dtype_transfer.c'),
join('src', 'multiarray', 'nditer_pywrap.c'),
join('src', 'multiarray', 'einsum.c.src')]
join('src', 'multiarray', 'usertypes.c')]

if PYTHON_HAS_UNICODE_WIDE:
multiarray_src.append(join('src', 'multiarray', 'ucsnarrow.c'))
6 changes: 3 additions & 3 deletions numpy/core/src/multiarray/arraytypes.c.src
@@ -627,9 +627,9 @@ VOID_getitem(char *ip, PyArrayObject *ap)
strides[0] = 1;
descr = PyArray_DescrNewFromType(PyArray_BYTE);
u = PyArray_NewFromDescr(&PyArray_Type, descr, 1, dims, strides,
-ip,
-PyArray_ISWRITEABLE(ap) ? NPY_WRITEABLE : 0,
-NULL);
+ip,
+PyArray_ISWRITEABLE(ap) ? NPY_ARRAY_WRITEABLE : 0,
+NULL);
((PyArrayObject*)u)->base = ap;
Py_INCREF(ap);
}
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/getset.c
@@ -281,7 +281,7 @@ static PyObject *
array_data_get(PyArrayObject *self)
{
#if defined(NPY_PY3K)
-return PyMemoryView_FromObject(self);
+return PyMemoryView_FromObject((PyObject *)self);
#else
intp nbytes;
if (!(PyArray_ISONESEGMENT(self))) {