
Missingdata - add NA mask, multitude of other improvements #141


Closed: wants to merge 167 commits.
ac21b5c
ENH: missingdata: Add the NA mask members to PyArrayObject
Jul 19, 2011
b1de3d3
ENH: Create NA singleton file
Jul 21, 2011
7ed10a2
ENH: missingdata: Add boilerplate for NA singleton type
Jul 22, 2011
1a428dc
ENH: missingdata: Write NA repr function
Jul 22, 2011
2330869
ENH: missingdata: Make comparisons with NA return NA, raise on 'if np…
Jul 22, 2011
55787fa
ENH: missingdata: Define the standard Python arithmetic operations fo…
Jul 22, 2011
3a5729b
ENH: missingdata: Use NPY_NA_NOPAYLOAD instead of constant 255 everyw…
Jul 22, 2011
baa91f1
ENH: missingdata: Add (untested) functions for creating the NA mask
Jul 25, 2011
43625c1
ENH: missingdata: Have some basic assignment and indexing with NA wor…
Jul 25, 2011
e852eea
ENH: missingdata: Really simple printing with NA works in some cases …
Jul 25, 2011
2be9e2f
ENH: missingdata: Clean up object dtype detection to work with NAs pr…
Jul 26, 2011
e4c83b2
ENH: missingdata: Get printing of NAs to work a little bit better
Jul 26, 2011
08d2ad2
BLD: missingdata: Signature of arraydescr_short_construction_repr cha…
Jul 26, 2011
6920121
ENH: missingdata: Part way through supporting the NPY_ITER_USE_MASKNA…
Jul 26, 2011
3b81de2
ENH: missingdata: More progress towards NPY_ITER_USE_MASKNA flag support
Jul 26, 2011
fcc4aa5
ENH: missingdata: In progress exposing USE_MASKNA to Python numpy.nditer
Jul 27, 2011
62a9330
ENH: missingdata: Finish the initial implementation of numpy.isna
Jul 27, 2011
1e73500
ENH: missingdata: In progress making slicing work with NA masks
Jul 27, 2011
22a5433
ENH: missingdata: Get the NA mask working with slice indexing
Jul 28, 2011
2118213
ENH: missingdata: Rewrote boolean indexing to support NA masks
Jul 28, 2011
303976b
ENH: missingdata: Implemented boolean assignment, working with NA masks
Jul 28, 2011
e24070d
BUG: ma: Fix a bug in numpy.ma hardmasks, exposed by the boolean inde…
Jul 28, 2011
80cd93c
TST: core: Regression test of boolean indexing was invalid, fixed it
Jul 28, 2011
e258b15
ENH: core: Revert optimistic use of PyNumber_Index for indexing
Jul 29, 2011
4e0c34a
ENH: missingdata: Write NA mask support into PyArray_CopyInto
Jul 29, 2011
c2cc89d
ENH: missingdata: Fill in view function, add many tests for NA view f…
Jul 29, 2011
9fd0930
BUG: missingdata: Add deallocation of the NA mask
Jul 29, 2011
c1e9321
TST: missingdata: Cover some more unmasking assignment cases
Jul 29, 2011
8d23e09
TST: missingdata: Move maskna array tests to their own file
Jul 29, 2011
b2df767
ENH: missingdata: Fill in buffered NAMASK nditer, add maskna= to zero…
Jul 29, 2011
c03dfa2
ENH: missingdata: Some tests for MASKNA iteration in Python
Jul 29, 2011
c43b83d
TST: missingdata: Add test for buffered MASKNA iteration
Jul 30, 2011
e8133b2
ENH: missingdata: Make the nditer USE_MASKNA mode work with buffering
Jul 30, 2011
a4eb4e3
ENH: nditer: Allow a virtual mask of ones for non-masked USE_MASKNA o…
mwiebe Jul 30, 2011
b8ff043
ENH: missingdata: Got masked element-wise ufuncs working in prelimina…
mwiebe Jul 31, 2011
025cf09
ENH: missingdata: Progress towards supporting ufunc reduce with NA masks
mwiebe Jul 31, 2011
e351662
ENH: missingdata: Improve some raw iteration methods
mwiebe Aug 1, 2011
a19d7b2
ENH: missingdata: Write function for reducing the NA mask array
Aug 1, 2011
38b1501
ENH: ufunc: Split Reduce and Accumulate apart so that adding NA suppo…
Aug 1, 2011
88ef487
API: nditer: Rename NpyIter_GetMaskNAIndices to NpyIter_GetMaskNAInde…
Aug 2, 2011
b8bc5ba
ENH: ufunc: Rewrite PyUFunc_Reduce to be more general and easier to a…
Aug 2, 2011
e03ee65
TST: ufunc: Tweak ma test to have a good output parameter, disable cr…
Aug 2, 2011
9c1e319
ENH: umath: Make sum, prod, any, or functions use the .reduce method …
Aug 2, 2011
4c30971
ENH: ufunc: Add tests for zero-sized array reductions
Aug 2, 2011
35f4aad
DOC: ufunc: Document the 'axis=' improvments to reduction functions
Aug 2, 2011
7d66f42
ENH: Work in progress on arr.reshape, other misc changes
Aug 3, 2011
971c0fe
ENH: missingdata: Fix up more NA mask indexing, make things work in 2D
mwiebe Aug 3, 2011
ec558a0
ENH: missingdata: Change boolean indexing to broadcast to the left ma…
Aug 3, 2011
0dce68d
ENH: missingdata: Change things to help scipy pass its tests
Aug 3, 2011
4932736
ENH: missingdata: Fix the remaining scipy errors
Aug 3, 2011
c6e2035
ENH: missingdata: Fix an iterator MASKNA bug, fill in more missing stuff
Aug 3, 2011
32614bf
ENH: missingdata: Get ufunc reduce skipna=False working better
Aug 3, 2011
14fb150
ENH: missingdata: Some tests/fixes for reduce method with skipna=True
Aug 4, 2011
2de3c5e
ENH: missingdata: Make reduction with skipna=True work better in some…
Aug 4, 2011
265fd05
ENH: missingdata: Fix construction of maskna array with all NAs
Aug 4, 2011
6127547
ENH: missingdata: Make the ndarray.take function work with NA masks
Aug 4, 2011
fd8b4cf
ENH: missingdata: Try to get basic NA printing to be ok
Aug 4, 2011
d023467
ENH: missingdata: Another NA array formatting tweak
Aug 4, 2011
1ab340c
ENH: missingdata: Add skipna parameters to sum, prod, etc
Aug 4, 2011
18a00c5
BUG: missingdata: Was adding MASKNA even when not desired currently
Aug 4, 2011
5968d49
ENH: missingdata: Make NA ufuncs work with output parameters properly
Aug 4, 2011
bdaad3f
BUG: missingdata: Negative strides bug in USE_MASKNA nditer mode
Aug 5, 2011
d684260
ENH: missingdata: Adding NA support to various methods
mwiebe Aug 5, 2011
3a9cb0b
WIP: fixing reduce NA bug
mwiebe Aug 5, 2011
612269c
ENH: missingdata: Writing some low level general array assignment rou…
mwiebe Aug 6, 2011
0d154a0
ENH: missingdata: Implement wheremask scalar assignment
mwiebe Aug 6, 2011
0c8ed1e
ENH: missingata: Move the alignment check out of the assignment funct…
mwiebe Aug 6, 2011
31d39c4
ENH: missingdata: Use a static buffer in scalar assignment for small …
mwiebe Aug 6, 2011
1ada942
ENH: missingdata: Implement wheremasked scalar assignment with overwr…
mwiebe Aug 7, 2011
723d330
ENH: missingdata: Finish implementation of scalar to array assignment
mwiebe Aug 7, 2011
49860a6
ENH: missingdata: Fix remaining issues in scalar -> array assignment …
mwiebe Aug 7, 2011
5467312
ENH: missingdata: Change FillWithZero/One to AssignZero/One, add para…
mwiebe Aug 7, 2011
c4531f5
ENH: core: Rewrite PyArray_FillWithScalar to use array_assign_scalar
mwiebe Aug 7, 2011
3cece15
ENH: core: Make 1-byte low level loops use memset
mwiebe Aug 7, 2011
7bdbc84
ENH: missingdata: Implement routine for array to array assignment
mwiebe Aug 8, 2011
9b70c5a
ENH: core: Make the array assignment routine handle overlapping arrays
mwiebe Aug 8, 2011
9934d62
ENH: core: Make PyArray_CopyInto and PyArray_MoveInto be calls to arr…
mwiebe Aug 8, 2011
3b0ae1d
ENH: core: Add tests for copyto function with new array_assign_array …
mwiebe Aug 8, 2011
4322278
ENH: umath: Fix reduce with NAs for ufuncs that have no unit
mwiebe Aug 8, 2011
f27cb2b
BUG: missingdata: Fleshing things out, tracking down a memory corruption
mwiebe Aug 12, 2011
69c8ae3
DOC: missingdata: Add some NA mask info to the documentation
mwiebe Aug 15, 2011
ce491d1
ENH: missingdata: Make arr.item() and arr.itemset() work with NA masks
mwiebe Aug 16, 2011
d9582a0
ENH: missingdata: Move the new MultiIndex Get/Set Item functions into…
mwiebe Aug 16, 2011
755a017
BUG: missingdata: Fix src_itemsize in USE_MASKNA copy to buffer
mwiebe Aug 16, 2011
4d8f9b1
ENH: ufunc: Move the default SAME_KIND casting rule out of the ufunc …
mwiebe Aug 16, 2011
4f17ef7
ENH: missingdata: Add NA support to np.diagonal, change np.diagonal t…
mwiebe Aug 16, 2011
106fbbb
ENH: missingdata: Rewrite PyArray_Concatenate to work with NA masks
mwiebe Aug 16, 2011
9e5e58f
BUG: missingdata: The ndmin parameter to np.array wasn't respecting N…
mwiebe Aug 17, 2011
15cd288
BUG: missingdata: Fix mask usage in PyArray_TakeFrom, add tests for it
mwiebe Aug 17, 2011
381a3dc
ENH: missingdata: Add nastr= parameter to np.set_printoptions()
mwiebe Aug 17, 2011
62326cb
ENH: missingdata: trying some more functions to see how they treat NAs
mwiebe Aug 17, 2011
428712c
ENH: missingdata: Change default to create NA-mask when NAs are in lists
mwiebe Aug 17, 2011
b21a148
ENH: missingdata: Move some of the refactored reduction code into the…
mwiebe Aug 17, 2011
fca113a
ENH: missingdata: Towards making count_nonzero a full-featured reduct…
mwiebe Aug 17, 2011
1588375
ENH: missingdata: Finish count_nonzero as a full-fledged reduction op…
mwiebe Aug 17, 2011
a6067cd
ENH: missingdata: Move the Reduce boilerplate into a function PyArray…
mwiebe Aug 17, 2011
9e30d2c
ENH: missingdata: Create count_reduce_items function
mwiebe Aug 18, 2011
3bb23d6
ENH: missingdata: Support 'skipna=' parameter in np.mean
mwiebe Aug 18, 2011
bb11a53
ENH: missingdata: Implement skipna= support for np.std and np.var
mwiebe Aug 18, 2011
d5490ff
ENH: missingdata: Implement tests for np.std, add skipna= and keepdim…
mwiebe Aug 18, 2011
da401c7
BUG: nditer: NA masks in arrays with leading 1 dimensions had an issue
mwiebe Aug 18, 2011
b2cfb1c
TST: missingdata: Finish up NA mask tests for np.std and np.var
mwiebe Aug 18, 2011
71a4091
BUG: Some bugs in squeeze and concatenate found by testing SciPy
mwiebe Aug 18, 2011
79ea9a3
DOC: Small tweak to release notes
mwiebe Aug 18, 2011
200564e
BUG: nditer: The nditer was reusing the reduce loop inappropriately (…
mwiebe Aug 18, 2011
b836ba0
BLD: Failure in single file build mode because of a static function i…
mwiebe Aug 19, 2011
3c27a5e
TST: datetime: Change pytz warning message about skipping pytz tests
mwiebe Aug 19, 2011
e3ad278
BLD: missingdata: Fixes for Python 3
mwiebe Aug 19, 2011
70539af
DOC: Tweak to the release notes
mwiebe Aug 19, 2011
b4075c3
BLD: keep VC compiler happy by moving the threading macros after vari…
87 Aug 19, 2011
bacbf75
BLD: keep VC happy by moving inlined variable definitions to top
87 Aug 19, 2011
7f2f414
BUG: fix crash in test_datetime_as_string
87 Aug 19, 2011
ad4635f
BUG: ufunc: Fix bug in multi-dimensional reduction without a unit
mwiebe Aug 19, 2011
3cfccf7
DOC: nditer: Improve the comment doc about the new NpyIter_IsFirstVis…
mwiebe Aug 19, 2011
e5ba9f4
NEP: missingdata: Some fixes and updates to the NEP
mwiebe Aug 19, 2011
8e8d598
BUG: ufunc: Missed a small update for the unitless reduction case
mwiebe Aug 19, 2011
5541192
STY: core: Move some misc converters into conversion_utils.c to clean…
mwiebe Aug 20, 2011
0a22bfd
ENH: missingdata: Add the maskna= parameter to np.ones and np.ones_like
mwiebe Aug 20, 2011
ddc7e6e
ENH: missingdata: Add maskna= parameter to np.linspace and np.logspace
mwiebe Aug 20, 2011
a702e7e
DOC: nditer: Document NpyIter_IsFirstVisit function
mwiebe Aug 20, 2011
9d9f549
ENH: missingdata: Change NA repr to use strings like the dtype repr does
mwiebe Aug 20, 2011
4a0b15b
BUG: repr: Make NA line up in the float array repr like inf and nan
mwiebe Aug 20, 2011
92ffa03
ENH: missingdata: Improve error message when assigning NA to non-NA-m…
mwiebe Aug 20, 2011
0e4c96c
BUG: missingdata: np.isna function wasn't accepting object arrays
mwiebe Aug 20, 2011
890c9e4
BUG: missingdata: Fix long double printing of NAs
mwiebe Aug 20, 2011
00e150e
ENH: ufunc: Separate type resolution from loop selection
mwiebe Aug 21, 2011
3ad0cd8
ENH: umath: Add checking for reorderable ufuncs, add PyArray_ReduceWr…
mwiebe Aug 21, 2011
9b4ff64
ENH: umath: Switch PyUFunc_Reduce to call PyArray_ReduceWrapper to si…
mwiebe Aug 22, 2011
dbabb8e
ENH: ufunc: Add a mask dtype parameter to the masked ufunc loop selector
mwiebe Aug 22, 2011
ce2f51d
DOC: Add info to the release notes about the full boolean indexing ch…
mwiebe Aug 22, 2011
bede98e
ENH: missingdata: Add maskna= parameter to np.copy and ndarray.copy
mwiebe Aug 22, 2011
0f0bce9
ENH: missingdata: Make ndarray.view validate the maskna= parameter be…
mwiebe Aug 22, 2011
9e9df32
BUG: dtype: Mitigate crash bug for some kinds of dtype inputs
mwiebe Aug 22, 2011
023573c
ENH: ufunc: Remove CreateReduceResult and InitializeReduceResult from…
mwiebe Aug 22, 2011
616a0af
ENH: missingdata: Add wheremask to PyArray_ContainsNA
mwiebe Aug 23, 2011
934e50b
ENH: missingdata: Move ReduceMaskNAArray out of the public API
mwiebe Aug 23, 2011
fe720c2
ENH: missingdata: Future-proof AssignNA and AssignMaskNA for later mu…
mwiebe Aug 23, 2011
3ba3937
ENH: missingdata: Add support for identity-less skipna reductions wit…
mwiebe Aug 23, 2011
dfd12cf
ENH: missingdata: Add skipna=, keepdims= parameters to methods
mwiebe Aug 23, 2011
d7fadab
ENH: core: Add static caching of the callables for C to core._method …
mwiebe Aug 23, 2011
65e20d8
BUG: umath: Make the ufunc follow the actual priority for __r<op>__
mwiebe Aug 24, 2011
e0e0abb
ENH: missingdata: Add maskna= flag to np.eye constructor
mwiebe Aug 24, 2011
8abcd4d
ENH: missingdata: Add maskna= flag to np.identity constructor
mwiebe Aug 24, 2011
59bdcd2
ENH: missingdata: Add maskna= and ownmaskna= parameters to np.asarray…
mwiebe Aug 24, 2011
53f831a
BUG: missingdata: Add support for NA masks to the order='K' case of n…
mwiebe Aug 24, 2011
8a5a168
ENH: missingdata: Rename na_singleton.[ch] to na_object.[ch]
mwiebe Aug 24, 2011
b7b4be1
ENH: missingdata: Expose Npy_NA as a global singleton, same as Py_None
mwiebe Aug 24, 2011
3e78681
ENH: missingdata: Finish adding C-API access to the NpyNA object
mwiebe Aug 24, 2011
46551bc
DOC: missingdata: Documenting C API for NA-masked arrays
mwiebe Aug 24, 2011
3db1939
DOC: missingdata: Add example of a C-API function supporting NA masks
mwiebe Aug 24, 2011
6b71702
DOC: missingdata: Some tweaks to the NA mask documentation
mwiebe Aug 25, 2011
6891fb7
ENH: core: Rename PyArrayObject_fieldaccess to PyArrayObject_fields
mwiebe Aug 25, 2011
502acec
DOC: missingdata: Add introductory documentation for NA-masked arrays
mwiebe Aug 25, 2011
95595a0
DOC: missingdata: Also show what assigning a non-NA value does in eac…
mwiebe Aug 25, 2011
6767798
ENH: missingdata: Make numpy.all follow the NA || True == True rule
mwiebe Aug 26, 2011
b276b34
ENH: missingdata: Make numpy.all follow the NA && False == False rule
mwiebe Aug 26, 2011
6b7bdc9
TST: missingdata: Write some tests for the np.any and np.all NA behavior
mwiebe Aug 26, 2011
3981e93
STY: Remove trailing whitespace
mwiebe Aug 26, 2011
66d289a
TST: dtype: Adjust void dtype test to pass without raising a zero-siz…
mwiebe Aug 26, 2011
5d31409
DOC: Mention the update to np.all and np.any in the release notes
mwiebe Aug 26, 2011
5874e97
BLD: core: onefile build fix and Python3 compatibility change
mwiebe Aug 26, 2011
0fdad1c
ENH: missingdata: Make comparisons with NA return NA(dtype='bool')
mwiebe Aug 26, 2011
d76dda2
ENH: nditer: Change the Python nditer exposure to automatically add N…
mwiebe Aug 26, 2011
66f97de
DOC: missingdata: Updates based on pull request feedback
mwiebe Aug 26, 2011
64554e6
DOC: missingdata: Updates from pull request feedback
mwiebe Aug 26, 2011
82a1b69
DOC: missingdata: Add a mention of the design NEP, and masks vs bitpa…
mwiebe Aug 26, 2011
61af4b9
ENH: missingdata: Make PyArray_Converter and PyArray_OutputConverter …
mwiebe Aug 26, 2011
206 changes: 145 additions & 61 deletions doc/neps/missing-data.rst
@@ -237,54 +237,41 @@ mask [Exposed, Exposed, Hidden, Exposed], and
values [1.0, 2.0, <NA bitpattern>, 7.0] for the masked and
NA dtype versions respectively.

The np.NA singleton may accept a dtype= keyword parameter, indicating
that it should be treated as an NA of a particular data type. This is also
a mechanism for preserving the dtype in a NumPy scalar-like fashion.
Here's what this looks like::

    >>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], maskna=True))
    NA(dtype='<f8')
    >>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]'))
    NA(dtype='NA[<f8]')
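Here is a minimal pure-Python sketch of a dtype-carrying NA. It assumes only what the text above describes: calling the singleton with dtype= yields an NA tagged with that type, and its repr takes the form shown. The class name NAType and the string dtype labels are hypothetical. This is not NumPy's implementation.

```python
class NAType:
    """Hypothetical model of an NA singleton that can carry a dtype."""

    def __init__(self, dtype=None):
        # dtype is a plain string label here, e.g. '<f8' or 'NA[<f8]'
        self.dtype = dtype

    def __call__(self, dtype=None):
        # NA(dtype=...) returns a new NA tagged with that data type
        return NAType(dtype)

    def __repr__(self):
        if self.dtype is None:
            return "NA"
        return "NA(dtype=%r)" % (self.dtype,)


NA = NAType()  # the untyped singleton

print(repr(NA))                   # NA
print(repr(NA(dtype='<f8')))      # NA(dtype='<f8')
print(repr(NA(dtype='NA[<f8]')))  # NA(dtype='NA[<f8]')
```

Because the typed NA keeps its dtype, a reduction that encounters a missing value can still report which type of result it would have produced.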

Assigning a value to an array element always makes that element
available (not NA), transparently unmasking it if necessary. Assigning
numpy.NA to an element masks it, or assigns the NA bitpattern for the
particular dtype. In the mask-based implementation, the storage behind a
missing value may never be accessed in any way, other than to unmask it
by assigning a value to it.

To test if a value is missing, the function 'np.isna' will be provided,
as in "np.isna(arr[0])". One of the key reasons for the NumPy scalars is
to allow their values to be used as dictionary keys.

No operation that writes to a masked array affects the stored value of
a masked element unless it also unmasks that element. This means the
storage behind masked elements can still be relied on when it is
accessible from another view in which those elements are not masked.
For example, the following was run on the missingdata work-in-progress
branch::

    >>> a = np.array([1,2])
    >>> b = a.view(maskna=True)
    >>> b
    array([1, 2], maskna=True)
    >>> b[0] = np.NA
    >>> b
    array([NA, 2], maskna=True)
    >>> a
    array([1, 2])
    >>> # The underlying number 1 value in 'a[0]' was untouched
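The view example above can be modelled in plain Python to show why the underlying value survives. The view shares the data buffer with 'a' but keeps its own mask, so assigning NA only flips a mask entry. MaskedView and NA_MARKER are hypothetical names for this sketch, and "True means exposed" is an assumed mask convention.

```python
NA_MARKER = object()  # hypothetical stand-in for np.NA


class MaskedView:
    """Hypothetical view: shares a data list, owns a separate NA mask."""

    def __init__(self, data):
        self.data = data                # same list object as the base array
        self.mask = [True] * len(data)  # True = exposed, False = NA

    def __setitem__(self, i, value):
        if value is NA_MARKER:
            # Masking an element never writes to the shared buffer
            self.mask[i] = False
        else:
            # Writing a real value stores it and unmasks the element
            self.data[i] = value
            self.mask[i] = True

    def tolist(self):
        return [v if exposed else "NA" for v, exposed in zip(self.data, self.mask)]


a = [1, 2]
b = MaskedView(a)
b[0] = NA_MARKER
print(b.tolist())  # ['NA', 2]
print(a)           # [1, 2]
```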

Copying values between the mask-based implementation and the
@@ -308,9 +295,16 @@ performance in unexpected ways.

By default, the string "NA" will be used to represent missing values
in str and repr outputs. A global configuration will allow this to be
changed, extending the mechanism already used for nan and inf (the
nanstr= and infstr= print options). The following works in the current
draft implementation::

    >>> a = np.arange(6, maskna=True)
    >>> a[3] = np.NA
    >>> a
    array([0, 1, 2, NA, 4, 5], maskna=True)
    >>> np.set_printoptions(nastr='blah')
    >>> a
    array([0, 1, 2, blah, 4, 5], maskna=True)
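Here is a small sketch of how a configurable NA string could plug into array formatting. The dictionary _print_options and the functions set_nastr and format_masked are hypothetical stand-ins that only mirror the behaviour shown above. They are not NumPy functions.

```python
_print_options = {"nastr": "NA"}  # hypothetical global print configuration


def set_nastr(nastr):
    """Hypothetical analogue of np.set_printoptions(nastr=...)."""
    _print_options["nastr"] = nastr


def format_masked(values, mask):
    # mask[i] is True when values[i] is exposed, False when it is NA;
    # the stored value behind an NA is never printed
    items = [str(v) if exposed else _print_options["nastr"]
             for v, exposed in zip(values, mask)]
    return "[" + ", ".join(items) + "]"


values = [0, 1, 2, 3, 4, 5]
mask = [True, True, True, False, True, True]
print(format_masked(values, mask))  # [0, 1, 2, NA, 4, 5]
set_nastr("blah")
print(format_masked(values, mask))  # [0, 1, 2, blah, 4, 5]
```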

For floating point numbers, Inf and NaN are separate concepts from
missing values. If a division by zero occurs in an array with default
@@ -322,14 +316,19 @@ these semantics without the extra manipulation.

A manual loop through a masked array like::

    >>> a = np.arange(5., maskna=True)
    >>> a[3] = np.NA
    >>> a
    array([ 0., 1., 2., NA, 4.], maskna=True)
    >>> for i in xrange(len(a)):
    ...     a[i] = np.log(a[i])
    ...
    __main__:2: RuntimeWarning: divide by zero encountered in log
    >>> a
    array([ -inf, 0. , 0.69314718, NA, 1.38629436], maskna=True)

works even with masked values, because for a masked element 'a[i]'
returns an NA object that carries the array's data type, which the
ufuncs can then handle properly.
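The reason the loop works can be sketched without NumPy. If reading a missing element yields an NA that remembers its dtype, an element-wise function can return a missing result of the right type instead of failing. TypedNA and na_log are hypothetical names. math.log stands in for np.log, so unlike np.log it raises for zero or negative inputs.

```python
import math


class TypedNA:
    """Hypothetical NA value that remembers the dtype it was read from."""

    def __init__(self, dtype):
        self.dtype = dtype

    def __repr__(self):
        return "NA(dtype=%r)" % (self.dtype,)


def na_log(x):
    # A missing input propagates as a missing output of the same dtype,
    # so the caller's loop needs no special case for masked elements.
    if isinstance(x, TypedNA):
        return TypedNA(x.dtype)
    return math.log(x)


a = [1.0, 2.0, TypedNA("<f8"), 4.0]
for i in range(len(a)):
    a[i] = na_log(a[i])
print(a)
```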

Accessing a Boolean Mask
========================
@@ -348,20 +347,42 @@ instead of masked and unmasked values. The functions are
'np.isna' and 'np.isavail', which test for NA or available values
respectively.

Creating NA-Masked Arrays
=========================

The usual way to create an array with an NA mask is to pass the keyword
parameter maskna=True to one of the constructors. Most functions that
create a new array take this parameter, and produce an NA-masked
array with all its elements exposed when the parameter is set to True.

There are also two flags which indicate and control the nature of the mask
used in masked arrays. These flags can be used to add a mask, or to ensure
the mask isn't a view into another array's mask.

First is 'arr.flags.maskna', which is True for all masked arrays and
may be set to True to add a mask to an array which does not have one.

Second is 'arr.flags.ownmaskna', which is True if the array owns the
memory of its mask, and False if the array has no mask or has a view
into the mask of another array. If this is set to True in a masked
array, the array will create a copy of the mask so that further modifications
to the mask will not affect the original mask from which the view was taken.
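The difference between a view that shares its mask and one that owns a copy can be sketched in plain Python. The class MaskNAArray and its view(ownmaskna=...) method are hypothetical. They model only the flag semantics described above, with True marking an exposed element.

```python
class MaskNAArray:
    """Hypothetical model of the ownmaskna semantics (not NumPy)."""

    def __init__(self, data, mask=None, ownmaskna=True):
        self.data = data
        # True marks an exposed element; a fresh all-exposed mask is
        # allocated when none is passed in
        self.mask = mask if mask is not None else [True] * len(data)
        self.ownmaskna = ownmaskna

    def view(self, ownmaskna=False):
        if ownmaskna:
            # Copy the mask so later edits cannot reach the original mask
            return MaskNAArray(self.data, list(self.mask), ownmaskna=True)
        # Reuse the same mask list: this view does not own its mask
        return MaskNAArray(self.data, self.mask, ownmaskna=False)


base = MaskNAArray([1, 2, 3])
shared = base.view()
owned = base.view(ownmaskna=True)
shared.mask[0] = False  # seen by base, because the mask is shared
owned.mask[1] = False   # private to 'owned'
print(base.mask)   # [False, True, True]
print(owned.mask)  # [True, False, True]
```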

NA-Masks When Constructing From Lists
=====================================

The initial design of NA-mask construction was to make all construction
fully explicit. This turns out to be unwieldy when working interactively
with NA-masked arrays, and getting an object array instead of an
NA-masked array can be very surprising.

Because of this, the design has been changed to enable an NA-mask whenever
an array is created from lists which contain an NA object. There could
be some debate about whether one should create NA-masks or NA-bitpatterns
by default, but due to time constraints it was only feasible to tackle
NA-masks, and extending the NA-mask support more fully throughout NumPy seems
much more reasonable than starting another system and ending up with two
incomplete systems.
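Here is a sketch of the construction rule above: an NA mask is added only when the input list actually contains NA. The function array_from_list and its (data, mask) return pair are hypothetical. The placeholder stored behind a masked element is an arbitrary 0, since the text says that storage is never read while masked.

```python
NA = object()  # hypothetical stand-in for the np.NA singleton


def array_from_list(values):
    """Hypothetical sketch: return (data, mask), with mask None if no NA.

    When a mask is created, True marks an exposed element and False a
    missing one.
    """
    if not any(v is NA for v in values):
        return list(values), None  # plain array, no NA mask needed
    # Arbitrary placeholder behind each masked element; it is never read
    data = [0 if v is NA else v for v in values]
    mask = [v is not NA for v in values]
    return data, mask


print(array_from_list([1, 2, 3]))       # ([1, 2, 3], None)
print(array_from_list([1.0, NA, 7.0]))  # ([1.0, 0, 7.0], [True, False, True])
```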

Mask Implementation Details
===========================
@@ -388,7 +409,7 @@ New ndarray Methods

New functions added to the numpy namespace are::

    np.isna(arr) [IMPLEMENTED]
        Returns a boolean array with True wherever the array is masked
        or matches the NA bitpattern, and False elsewhere

@@ -398,22 +419,34 @@ New functions added to the numpy namespace are::

New methods added to the ndarray are::

    arr.copy(..., replacena=np.NA)
        Modification to the copy function which replaces NA values,
        either masked or with the NA bitpattern, with the 'replacena='
        parameter supplied. When 'replacena' isn't NA, the copied
        array is unmasked and has the 'NA' part stripped from the
        parameterized dtype ('NA[f8]' becomes just 'f8').

        The default for replacena is chosen to be np.NA instead of None,
        because it may be desirable to replace NA with None in an
        NA-masked object array.

        For future multi-NA support, 'replacena' could accept a dictionary
        mapping the NA payload to the value to substitute for that
        particular NA. NAs with payloads not appearing in the dictionary
        would remain as NA unless a 'default' key was also supplied.

        Both the value passed as replacena and the values in the dictionary
        can be either scalars or arrays, which are broadcast onto 'arr'.

    arr.view(maskna=True) [IMPLEMENTED]
        This is a shortcut for
        >>> a = arr.view()
        >>> a.flags.maskna = True

    arr.view(ownmaskna=True) [IMPLEMENTED]
        This is a shortcut for
        >>> a = arr.view()
        >>> a.flags.maskna = True
        >>> a.flags.ownmaskna = True
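The replacena= semantics, including the proposed payload dictionary with a 'default' key, can be sketched on plain lists. The function copy_replacena, the per-element payload list (None for an exposed element) and the NA stand-in are all hypothetical. Broadcasting of array-valued replacements is not modelled.

```python
NA = object()  # hypothetical stand-in for np.NA


def copy_replacena(data, na_payloads, replacena=NA):
    """Hypothetical model of arr.copy(replacena=...) on plain lists.

    data        : stored values (the storage behind an NA is ignored)
    na_payloads : per-element NA payload, or None where the element is exposed
    replacena   : NA to keep NAs, a scalar to replace every NA, or a dict
                  mapping payload -> replacement, optionally with 'default'
    Returns (new_data, new_payloads).
    """
    out_data = list(data)
    out_payloads = list(na_payloads)
    if replacena is NA:
        return out_data, out_payloads  # default: NAs are preserved
    for i, payload in enumerate(na_payloads):
        if payload is None:
            continue  # exposed element, nothing to replace
        if isinstance(replacena, dict):
            if payload in replacena:
                value = replacena[payload]
            elif "default" in replacena:
                value = replacena["default"]
            else:
                continue  # unmatched payload stays NA
        else:
            value = replacena  # one scalar for every NA, even None
        out_data[i] = value
        out_payloads[i] = None
    return out_data, out_payloads


data = [1.0, 0.0, 3.0, 0.0]
payloads = [None, 1, None, 2]  # two NAs, with payloads 1 and 2
print(copy_replacena(data, payloads, replacena={1: -1.0}))
# ([1.0, -1.0, 3.0, 0.0], [None, None, None, 2])
print(copy_replacena(data, payloads, replacena=None))
# ([1.0, None, 3.0, None], [None, None, None, None])
```

Passing replacena=None replaces each NA with None. This is why np.NA, not None, has to be the "keep NAs" default.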

Element-wise UFuncs With Missing Values
@@ -478,7 +511,7 @@ Some examples::
    >>> np.sum(a, skipna=True)
    11.0
    >>> np.mean(a)
    NA(dtype='<f8')
    >>> np.mean(a, skipna=True)
    3.6666666666666665

@@ -488,7 +521,7 @@ Some examples::
    >>> np.max(a, skipna=True)
    array(NA, dtype='<f8', maskna=True)
    >>> np.mean(a)
    NA(dtype='<f8')
    >>> np.mean(a, skipna=True)
    /home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2374: RuntimeWarning: invalid value encountered in double_scalars
      return mean(axis, dtype, out)
@@ -743,6 +776,38 @@ to be consistent with the result of np.sum([])::
    >>> np.sum([])
    0.0
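Here is a plain-Python sketch of the skipna= semantics in these examples:

- Without skipna, a single NA makes the result NA.
- With skipna, the NAs are dropped.
- An all-NA input then reduces like an empty one: 0.0 for a sum, consistent with np.sum([]), and nan for a mean.

The functions na_sum and na_mean and the NA stand-in are hypothetical names.

```python
NA = object()  # hypothetical stand-in for np.NA


def na_sum(values, skipna=False):
    """Hypothetical sum() following the skipna= rules described above."""
    if not skipna and any(v is NA for v in values):
        return NA  # a single NA makes the total unknown
    # With skipna=True the NAs are dropped; an all-NA input then sums
    # like an empty one and returns the identity 0.0
    return sum((v for v in values if v is not NA), 0.0)


def na_mean(values, skipna=False):
    """Hypothetical mean() with the same skipna= handling."""
    if not skipna and any(v is NA for v in values):
        return NA
    present = [v for v in values if v is not NA]
    if not present:
        return float("nan")  # nothing left to average
    return sum(present) / len(present)


a = [1.0, 3.0, NA, 7.0]
print(na_sum(a) is NA)                # True
print(na_sum(a, skipna=True))         # 11.0
print(na_mean(a, skipna=True))        # 3.6666666666666665
print(na_sum([NA, NA], skipna=True))  # 0.0
```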

Boolean Indexing
================

Indexing using a boolean array containing NAs does not have a consistent
interpretation according to the NA abstraction. For example::

    >>> a = np.array([1, 2])
    >>> mask = np.array([np.NA, True], maskna=True)
    >>> a[mask]
    What should happen here?

Since the NA represents a valid but unknown value, and it is a boolean,
it has two possible underlying values::

    >>> a[np.array([True, True])]
    array([1, 2])
    >>> a[np.array([False, True])]
    array([2])

What differs between these two outcomes is the length of the output
array, which is not something that can itself be replaced by an NA.
For this reason, at least initially, NumPy will raise an exception
for this case.

Another possibility is to accept an inconsistency and follow the
approach R uses, which would produce the following::

    >>> a[mask]
    array([NA, 2], maskna=True)

If user testing finds this necessary for pragmatic reasons, the
feature should be added even though it is inconsistent.
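Both options discussed above can be sketched side by side. The function bool_index, its r_style flag and the NA stand-in are hypothetical. By default the sketch raises, matching the initial plan. With r_style=True it places an NA in the output, as R does.

```python
NA = object()  # hypothetical stand-in for a boolean NA


def bool_index(values, mask, r_style=False):
    """Hypothetical boolean indexing where the index may contain NA.

    By default an NA in the index raises, because whether that element is
    selected changes the length of the result. With r_style=True an NA
    selects a missing element instead, mimicking R.
    """
    if len(values) != len(mask):
        raise IndexError("boolean index length does not match")
    out = []
    for v, m in zip(values, mask):
        if m is NA:
            if not r_style:
                raise ValueError("cannot index with a boolean array containing NA")
            out.append(NA)
        elif m:
            out.append(v)
    return out


print(bool_index([1, 2], [True, True]))   # [1, 2]
print(bool_index([1, 2], [False, True]))  # [2]
print(bool_index([1, 2], [NA, True], r_style=True) == [NA, 2])  # True
```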

PEP 3118
========

@@ -823,7 +888,7 @@ This gives us the following additions to the PyArrayObject::
     * If no mask: NULL
     * If mask : bool/uint8/structured dtype of mask dtypes
     */
    PyArray_Descr *maskna_dtype;
    /*
     * Raw data buffer for mask. If the array has the flag
     * NPY_ARRAY_OWNMASKNA enabled, it owns this memory and
@@ -837,9 +902,24 @@ This gives us the following additions to the PyArrayObject::
     */
    npy_intp *maskna_strides;

These fields can be accessed through the inline functions::

    PyArray_Descr *
    PyArray_MASKNA_DTYPE(PyArrayObject *arr);

    npy_mask *
    PyArray_MASKNA_DATA(PyArrayObject *arr);

    npy_intp *
    PyArray_MASKNA_STRIDES(PyArrayObject *arr);

    npy_bool
    PyArray_HASMASKNA(PyArrayObject *arr);

There are 2 or 3 flags which must be added to the array flags, both
for requesting NA masks and for testing for them::

    NPY_ARRAY_MASKNA
    NPY_ARRAY_OWNMASKNA
    /* To possibly add in a later revision */
    NPY_ARRAY_HARDMASKNA
@@ -860,6 +940,10 @@ PyArray_ContainsNA(PyArrayObject* obj)
    true if the array has NA support AND there is an
    NA anywhere in the array.

int PyArray_AllocateMaskNA(PyArrayObject* arr, npy_bool ownmaskna, npy_bool multina)
    Allocates an NA mask for the array, ensuring ownership if requested
    and using NPY_MASK instead of NPY_BOOL for the dtype if multina is True.

Mask Binary Format
==================

@@ -955,7 +1039,7 @@ We add several new per-operand flags:
NPY_ITER_USE_MASKNA
    If the operand has an NA dtype, an NA mask, or both, this adds a new
    virtual operand to the end of the operand list which iterates
    over the mask for the particular operand.

NPY_ITER_IGNORE_MASKNA
    If an operand has an NA mask, by default the iterator will raise
@@ -989,7 +1073,7 @@ to 12.5% overhead for a separately kept mask.
Acknowledgments
***************

In addition to feedback from Travis Oliphant and others at Enthought,
this NEP has been revised based on a great deal of feedback from
the NumPy-Discussion mailing list. The people participating in
the discussion are::