Missingdata - add NA mask, multitude of other improvements #141

mwiebe · 2011-08-19T04:47:39Z

Normally I would have preferred to do lots of smaller merges as I was developing this, but after the extensive design discussion phase, I thought it would be better to first write something that's in good, testable shape. Read the 2.0.0 release notes file for some details about the changes.

I believe this set of patches is in good shape to get merged, so needs the requisite testing and review to ensure things work well on a wide variety of platforms and with other dependent software.

charris · 2011-08-19T05:19:49Z

Fails to build here:

numpy/core/src/multiarray/nditer_constr.c:2373:1: error: redefinition of ‘intp_abs’

mwiebe · 2011-08-19T05:20:52Z

Ah yes, single file build mode strikes again.

mwiebe · 2011-08-19T05:32:03Z

That should fix it.

rgommers · 2011-08-19T07:13:15Z

Description of the new functionality sounds promising!

I noticed one problem after a first try: some test files have the executable bit set and are therefore not run. With np.test(extra_argv=['--exe']) I get 3541 tests run vs. 3072 for np.test(). The former number still seems to be low, since in 1.6.1 there are 3542.

rgommers · 2011-08-19T07:19:48Z

Thanks for fixing all the test noise (deprecation warnings) as well. You also masked the message about putmask however, I thought that deprecation would be undone?

>>>     >>> x = np.arange(6).reshape(2, 3)
>>>     >>> np.putmask(x, x>2, x**2)
/Library/Frameworks/Python.framework/Versions/2.6/bin/ipython:1: DeprecationWarning: 
putmask has been deprecated. Use copyto with 'where' as the mask instead

charris · 2011-08-19T14:26:45Z

Python 3 build still fails:

In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0:
numpy/core/src/multiarray/na_singleton.c: At top level:
numpy/core/src/multiarray/na_singleton.c:708:25: error: ‘Py_TPFLAGS_CHECKTYPES’ undeclared here (not in a function)

Looks like the python 3 change in flags.

charris · 2011-08-19T14:34:13Z

Ralph, I see the same number of tests both ways. I applied Mark's branch as patches, so I expect previous permissions to remain the same. How did you get the branch? The patch approach is nice as it applies on top of the current master and removes trailing whitespace.

rgommers · 2011-08-19T14:49:33Z

I just fetched and checked it out from Mark's git repo. Then

$ git clean -xdf
$ python setup.py build_ext -i

I just rebased it on master and tried again, same result. What number of tests do you get?

rgommers · 2011-08-19T14:50:34Z

Never mind my putmask comment; your branch is just older than the putmask deprecation.

charris · 2011-08-19T15:40:52Z

Ralph, I see

Ran 3420 tests in 21.679s

OK (KNOWNFAIL=3, SKIP=1)

The way I handle pull requests is like:

 1002  git co -b pull-141
 1003  curl https://github.com/numpy/numpy/pull/141.patch | git am --whitespace=strip

And someday I'll even figure out how to make a git macro for that ;)

mwiebe · 2011-08-19T15:46:01Z

There is trickiness with the number of tests.

There are points where I've renamed the test .py file (test_iterator.py to test_nditer.py), and if you don't wipe out the destination install path, the installer doesn't remove test_iterator.py, giving an inflated number of tests. That would increase the number of tests, though.

Whenever I've encountered yield-based tests, I've perhaps cursed a little bit, then changed them to not use yield so it's actually possible to track down where the failure is happening and what went wrong. This typically reduces the number of tests.

I don't have an accounting for the number of tests, but it probably needs a closer look to see what precisely is causing the discrepencies.

rgommers · 2011-08-19T16:11:35Z

I've cleaned out my site-packages and reinstalled directly from the released installer, to be sure I don't have leftover files hanging around. The correct number for 1.6.1 is 3541 tests (on OS X).

The ~500 tests missing is due to the executable bit being set. After that's fixed we can look for smaller discrepancies if necessary.

mwiebe · 2011-08-19T16:19:20Z

I've just done the following to try and find the files you refer to:

$ find . -iname test_*.py | xargs ls -l > test_files.txt

and all 201 of them are '-rw-rw-r--', no execute bit set anywhere. I also get 3064 tests on my machines (Fedora 15 64-bit)

charris · 2011-08-19T16:33:14Z

Hmm, I'm also running Fedora 15 64 bit and git status doesn't show any untracked test files. Might be because the missing data branch is rebased to current master.

I don't see any executable test files either.

charris · 2011-08-19T16:54:59Z

If I run numpy.test(label='full') I get

Ran 3439 tests in 40.484s

FAILED (KNOWNFAIL=3, SKIP=1, failures=1)

The failure is one of the f2py tests.

charris · 2011-08-19T17:05:00Z

Just in case: nose-1.0.0-py2.7.egg

rgommers · 2011-08-19T18:11:21Z

It seems to be even worse on some of the buildbots: http://buildbot.scipy.org/builders/Windows_XP_x86/builds/1109/steps/shell_1/logs/stdio

I've got 3619 tests run for commit c649139 (before the start of the deprecate_array_access_fields branch) and 3373 when that branch was merged in again. So 250 tests got lost on the way. I'll open a ticket and test this more thoroughly in some automated way. The problem is not (just) the missingdata branch anyway.

mwiebe · 2011-08-19T18:29:57Z

I have a feeling most or all of this is accounted for where I got rid of 'yield' in some of the tests. When a 'yield'-based test fails, its virtually impossible to track down where the failure is coming from, so in order to debug things I converted such tests to use asserts instead. Each 'yield' counts as a test, whereas each assert does not.

rgommers · 2011-08-19T20:02:57Z

You could be right. Still, this isn't the first time that the test count is inconsistent. We've had bugs with executable files before, and with yield tests in classes, etc. It's not hard to write a script that goes back in time and checks per X commits what the number of executed tests is.

charris · 2011-08-19T23:08:40Z

The NEP still has references to 'hasmaskna' the need to be updated to 'maskna', so you might want to update that separately from the rest of this branch.

I think the common methods of array creation -- ones, zeros, ones_like, etc. -- need to get the 'maskna' keyword. That will make it much easier for folks to play and shouldn't be a lot of work.

It would also help rationalize expectations to note that fancy indexing doesn't work for np.NA assignment at this point, boolean indexing only.

So far I haven't run into any problems with scipy or ipython .11.dev using the missing data branch.

mwiebe · 2011-08-20T00:37:35Z

I've updated the NEP to remove 'hasmaskna'. I've added ones and ones_like. I guess that commit is still in the branch though...

empty and zeros were done, I just postponed ones_like because it was a ufunc, and adding maskna as a parameter to ufuncs didn't seem good. I've renamed the ones_like ufunc to _ones_like, and made a function matching the others. I'll take a look at some others, like random.rand, etc.

I mentioned the fancy indexing thing in the release notes, but I guess I could point it out in a future email as well.

Great to hear it's working for you!

charris · 2011-08-20T16:45:52Z

Re: missing tests, I updated to current master and lost about 350 tests. Git bisect fingers 4386275 as the culprit.

charris · 2011-08-20T17:17:13Z

Reverting the commit fixes the problem, but I don't immediately see why. I assume something is failing during the build without derailing the whole thing.

87 · 2011-08-22T08:18:47Z

Hi, I noticed a subtle difference between numpy 1.6.1 and the missingdata branch.

a = np.arange(10)
b = np.arange(3).astype(bool)

1.6.1:
a[b] --> [0,1,2]

2.0.0.dev
a[b] --> ValueError: operands could not be broadcast together with shapes (10) (3)

The latest behaviour seems more consistent, but I've found at least one piece of code that relied on the former..

mwiebe · 2011-08-22T21:29:13Z

There's an email thread I started about the boolean broadcasting here:

http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057870.html

but it didn't generated any discussion on the issue. I've added the specific point you bring up to the release notes. If it's possible, I'd like to keep the change in, but if it's really necessary based on discussion during the alpha/early beta release cycle, a hack can be introduced to make the behavior match.

What's the code you've found that relies on the former behavior?

87 · 2011-08-23T06:08:30Z

Ah, a specific note on the issue would be the best option, I think, as it adds more consistent behavior after all.

This line generated the error in PyTables, but it is easily fixable..:
https://github.com/PyTables/PyTables/blob/master/tables/tableExtension.pyx#L922

charris · 2011-08-26T03:04:33Z

The github patch no longer applies cleanly to master. I'm beginning to have doubts about github patches... Anyway, it might be worth rebasing this.

…e exception

mwiebe · 2011-08-26T06:27:07Z

I've rebased against master - the conflict was about the void dtype crash that worked in this branch and then got fixed in master later.

charris · 2011-08-26T13:19:49Z

The file boolean_ops.c needs to be included in numpy/core/src/multiarray/multiarraymodule_onefile.c.

charris · 2011-08-26T13:48:14Z

The imports in numpy/core/_methods.py need fixes for python3

from numpy.core import multiarray as mu
from numpy.core import umath as um
from numpy.core.numeric import asanyarray

charris · 2011-08-26T17:06:42Z

Curiously, I needed to reinstall matplotlib on account of the import changes. I'm not sure what's up with that, so this note is just a heads up in case someone else runs into that.

…PY_ITER_USE_MASKNA

From github users xscript and 87.

From Chris Jordan-Squire

…tterns

…safer for legacy code These functions now reject inputs with NA, and there are alternative functions PyArray_AllowNAConverter and PyArray_OutputAllowNAConverter that functions should use when they intend to support NA.

charris · 2011-08-27T13:59:36Z

It looks about time to merge this so it gets further testing and as preparation for a long release cycle. Once this is in I'll take a look at supporting NA in the various numpy fitting routines.

mwiebe · 2011-08-27T16:12:44Z

Sounds good, is there anything you guys would like me to do/check before this get merged?

charris · 2011-08-27T16:18:57Z

It would be useful to have your thoughts on the way forward following the merge/release, but that is separate from the merge.

mwiebe · 2011-08-27T18:24:35Z

My basic thoughts are:

Similar to the nditer change in 1.6, the datetime + NA masks constitute a significant update to core-level functionality of NumPy. We need to be just as cautious in the release schedule for this version as we were for 1.6.
I'd suggest going with 1.7 as the release version, because the ABI is still compatible. I'm testing it with the SciPy 0.9.0 in Fedora 15 built against 1.5.1, for example. Just as in 1.6, compatibility tests are in order. I'm a fan of announcing that 1.7 will be the last version to support Python 2.4.
A lesson of the post-1.6 release period is that some of the changes we discussed a fair bit on the list didn't get proper mention in the release notes. We need to ensure that we explicitly mention these things. The feeling I got about switching to 'same_kind' default casting for ufuncs was that people generally like a change like this which will help catch programming errors, but want to be explicitly told about it.
I'm inclined to also change default assignment casting to 'same_kind' as well, what do people think about giving that a try? Some of the changes, like introducing new PyArray_Assign* functions, should make trying out such a change relatively easy. If we do this, it should probably be some time after the merge of this branch, so as to isolate testing of it.
After a 1.7 release, I think we want to go for a 1.8 to finish up an ABI/API deprecation phase. I did a big hunk of work creating a mechanism to allow hiding of the PyArrayObject details from the ABI, and the same thing needs to happen to at least PyArray_Descr and PyUFuncObject in the 1.x series before a 2.0 can be done which would throw out the deprecated access and fully hide the field details. (Of course, for people who want faster access, a NPY_DISABLE_ABI_COMPATIBILITY macro could be supported which would only allow loading of modules with that exact build of NumPy, i.e. it would enforce both the major and minor API versions)
There are still a bunch of NumPy features not supporting NA masks yet. Hopefully the C-API NA mask documentation is good enough to help more people than just me work on these things! I don't think these things should preclude a release, because the functionality is already useful, but it would be nice to have a continued trajectory towards full integration and support of the NA mask.
The next big thing for NA masks is supporting struct dtypes. I don't have any immediate plans to do that myself, but if someone skilled in C steps up to do it I'll be glad to assist them with it. After that, probably multi-NA, then the bitpattern NA dtypes.

This list looks pretty long to me, and I think the topics deserve multiple discussions on the mailing list over time. These are just my thoughts at the present, after all. ;)

charris · 2011-08-27T18:51:12Z

That's a pretty long list ;) I think you should post it on the mailing list.

If no one complains, I'll merge this this evening.

mwiebe · 2011-08-27T19:21:09Z

I'll turn that into a "roadmap" email in the next few days.

charris · 2011-08-28T04:03:46Z

Merged aa55ba7..9ecd91b.

feat: Add v[max|min]nm_f32

Mark Wiebe added 3 commits August 25, 2011 23:00

ENH: missingdata: Add the NA mask members to PyArrayObject

ac21b5c

ENH: Create NA singleton file

b1de3d3

ENH: missingdata: Add boilerplate for NA singleton type

7ed10a2

mwiebe added 6 commits August 25, 2011 23:11

ENH: missingdata: Make numpy.all follow the NA || True == True rule

6767798

ENH: missingdata: Make numpy.all follow the NA && False == False rule

b276b34

TST: missingdata: Write some tests for the np.any and np.all NA behavior

6b7bdc9

STY: Remove trailing whitespace

3981e93

TST: dtype: Adjust void dtype test to pass without raising a zero-siz…

66d289a

…e exception

DOC: Mention the update to np.all and np.any in the release notes

5d31409

mwiebe added 2 commits August 26, 2011 09:02

BLD: core: onefile build fix and Python3 compatibility change

5874e97

ENH: missingdata: Make comparisons with NA return NA(dtype='bool')

0fdad1c

mwiebe added 5 commits August 26, 2011 10:22

ENH: nditer: Change the Python nditer exposure to automatically add N…

d76dda2

…PY_ITER_USE_MASKNA

DOC: missingdata: Updates based on pull request feedback

66f97de

From github users xscript and 87.

DOC: missingdata: Updates from pull request feedback

64554e6

From Chris Jordan-Squire

DOC: missingdata: Add a mention of the design NEP, and masks vs bitpa…

82a1b69

…tterns

charris closed this Aug 28, 2011

njsmith mentioned this pull request May 2, 2016

Segmentation fault in PyArray_Item_INCREF #7595

Closed

charris mentioned this pull request Nov 22, 2018

ENH: Add pocketfft sources to numpy for testing, benchmarks, etc. #11888

Merged

roachlord mentioned this pull request Feb 18, 2019

Passing MagicMock to np.dtype causes a segmentation fault #12982

Closed

luyahan pushed a commit to plctlab/numpy that referenced this pull request Apr 25, 2024

Merge pull request numpy#141 from howjmay/fmaxmin

79c6d14

feat: Add v[max|min]nm_f32

Uh oh!

Missingdata - add NA mask, multitude of other improvements #141

Missingdata - add NA mask, multitude of other improvements #141

Uh oh!

Conversation

mwiebe commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

mwiebe commented Aug 19, 2011

Uh oh!

mwiebe commented Aug 19, 2011

Uh oh!

rgommers commented Aug 19, 2011

Uh oh!

rgommers commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

rgommers commented Aug 19, 2011

Uh oh!

rgommers commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

mwiebe commented Aug 19, 2011

Uh oh!

rgommers commented Aug 19, 2011

Uh oh!

mwiebe commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

rgommers commented Aug 19, 2011

Uh oh!

mwiebe commented Aug 19, 2011

Uh oh!

rgommers commented Aug 19, 2011

Uh oh!

charris commented Aug 19, 2011

Uh oh!

mwiebe commented Aug 20, 2011

Uh oh!

charris commented Aug 20, 2011

Uh oh!

charris commented Aug 20, 2011

Uh oh!

87 commented Aug 22, 2011

Uh oh!

mwiebe commented Aug 22, 2011

Uh oh!

87 commented Aug 23, 2011

Uh oh!

charris commented Aug 26, 2011

Uh oh!

mwiebe commented Aug 26, 2011

Uh oh!

charris commented Aug 26, 2011

Uh oh!

charris commented Aug 26, 2011

Uh oh!

charris commented Aug 26, 2011

Uh oh!

charris commented Aug 27, 2011

Uh oh!

mwiebe commented Aug 27, 2011

Uh oh!

charris commented Aug 27, 2011

Uh oh!

mwiebe commented Aug 27, 2011

Uh oh!

charris commented Aug 27, 2011

Uh oh!

mwiebe commented Aug 27, 2011