-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Missingdata - add NA mask, multitude of other improvements #141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fails to build here:
|
Ah yes, single file build mode strikes again. |
That should fix it. |
Description of the new functionality sounds promising! I noticed one problem after a first try: some test files have the executable bit set and are therefore not run. With |
Thanks for fixing all the test noise (deprecation warnings) as well. You also masked the message about
|
Python 3 build still fails:
Looks like the python 3 change in flags. |
Ralph, I see the same number of tests both ways. I applied Mark's branch as patches, so I expect previous permissions to remain the same. How did you get the branch? The patch approach is nice as it applies on top of the current master and removes trailing whitespace. |
I just fetched and checked it out from Mark's git repo. Then
I just rebased it on master and tried again, same result. What number of tests do you get? |
Never mind my |
Ralph, I see
The way I handle pull requests is like:
And someday I'll even figure out how to make a git macro for that ;) |
There is trickiness with the number of tests. There are points where I've renamed the test .py file (test_iterator.py to test_nditer.py), and if you don't wipe out the destination install path, the installer doesn't remove test_iterator.py, giving an inflated number of tests. That would increase the number of tests, though. Whenever I've encountered yield-based tests, I've perhaps cursed a little bit, then changed them to not use yield so it's actually possible to track down where the failure is happening and what went wrong. This typically reduces the number of tests. I don't have an accounting for the number of tests, but it probably needs a closer look to see what precisely is causing the discrepencies. |
I've cleaned out my site-packages and reinstalled directly from the released installer, to be sure I don't have leftover files hanging around. The correct number for 1.6.1 is 3541 tests (on OS X). The ~500 tests missing is due to the executable bit being set. After that's fixed we can look for smaller discrepancies if necessary. |
I've just done the following to try and find the files you refer to: $ find . -iname test_*.py | xargs ls -l > test_files.txt and all 201 of them are '-rw-rw-r--', no execute bit set anywhere. I also get 3064 tests on my machines (Fedora 15 64-bit) |
Hmm, I'm also running Fedora 15 64 bit and git status doesn't show any untracked test files. Might be because the missing data branch is rebased to current master. I don't see any executable test files either. |
If I run numpy.test(label='full') I get
The failure is one of the f2py tests. |
Just in case: nose-1.0.0-py2.7.egg |
It seems to be even worse on some of the buildbots: http://buildbot.scipy.org/builders/Windows_XP_x86/builds/1109/steps/shell_1/logs/stdio I've got 3619 tests run for commit c649139 (before the start of the deprecate_array_access_fields branch) and 3373 when that branch was merged in again. So 250 tests got lost on the way. I'll open a ticket and test this more thoroughly in some automated way. The problem is not (just) the missingdata branch anyway. |
I have a feeling most or all of this is accounted for where I got rid of 'yield' in some of the tests. When a 'yield'-based test fails, its virtually impossible to track down where the failure is coming from, so in order to debug things I converted such tests to use asserts instead. Each 'yield' counts as a test, whereas each assert does not. |
You could be right. Still, this isn't the first time that the test count is inconsistent. We've had bugs with executable files before, and with yield tests in classes, etc. It's not hard to write a script that goes back in time and checks per X commits what the number of executed tests is. |
The NEP still has references to 'hasmaskna' the need to be updated to 'maskna', so you might want to update that separately from the rest of this branch. I think the common methods of array creation -- ones, zeros, ones_like, etc. -- need to get the 'maskna' keyword. That will make it much easier for folks to play and shouldn't be a lot of work. It would also help rationalize expectations to note that fancy indexing doesn't work for np.NA assignment at this point, boolean indexing only. So far I haven't run into any problems with scipy or ipython .11.dev using the missing data branch. |
I've updated the NEP to remove 'hasmaskna'. I've added ones and ones_like. I guess that commit is still in the branch though... empty and zeros were done, I just postponed ones_like because it was a ufunc, and adding maskna as a parameter to ufuncs didn't seem good. I've renamed the ones_like ufunc to _ones_like, and made a function matching the others. I'll take a look at some others, like random.rand, etc. I mentioned the fancy indexing thing in the release notes, but I guess I could point it out in a future email as well. Great to hear it's working for you! |
Re: missing tests, I updated to current master and lost about 350 tests. Git bisect fingers 4386275 as the culprit. |
Reverting the commit fixes the problem, but I don't immediately see why. I assume something is failing during the build without derailing the whole thing. |
Hi, I noticed a subtle difference between numpy 1.6.1 and the missingdata branch.
1.6.1: 2.0.0.dev The latest behaviour seems more consistent, but I've found at least one piece of code that relied on the former.. |
There's an email thread I started about the boolean broadcasting here: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057870.html but it didn't generated any discussion on the issue. I've added the specific point you bring up to the release notes. If it's possible, I'd like to keep the change in, but if it's really necessary based on discussion during the alpha/early beta release cycle, a hack can be introduced to make the behavior match. What's the code you've found that relies on the former behavior? |
Ah, a specific note on the issue would be the best option, I think, as it adds more consistent behavior after all. This line generated the error in PyTables, but it is easily fixable..: |
The github patch no longer applies cleanly to master. I'm beginning to have doubts about github patches... Anyway, it might be worth rebasing this. |
I've rebased against master - the conflict was about the void dtype crash that worked in this branch and then got fixed in master later. |
The file boolean_ops.c needs to be included in numpy/core/src/multiarray/multiarraymodule_onefile.c. |
The imports in numpy/core/_methods.py need fixes for python3
|
Curiously, I needed to reinstall matplotlib on account of the import changes. I'm not sure what's up with that, so this note is just a heads up in case someone else runs into that. |
…PY_ITER_USE_MASKNA
From github users xscript and 87.
From Chris Jordan-Squire
…safer for legacy code These functions now reject inputs with NA, and there are alternative functions PyArray_AllowNAConverter and PyArray_OutputAllowNAConverter that functions should use when they intend to support NA.
It looks about time to merge this so it gets further testing and as preparation for a long release cycle. Once this is in I'll take a look at supporting NA in the various numpy fitting routines. |
Sounds good, is there anything you guys would like me to do/check before this get merged? |
It would be useful to have your thoughts on the way forward following the merge/release, but that is separate from the merge. |
My basic thoughts are:
This list looks pretty long to me, and I think the topics deserve multiple discussions on the mailing list over time. These are just my thoughts at the present, after all. ;) |
That's a pretty long list ;) I think you should post it on the mailing list. If no one complains, I'll merge this this evening. |
I'll turn that into a "roadmap" email in the next few days. |
feat: Add v[max|min]nm_f32
Normally I would have preferred to do lots of smaller merges as I was developing this, but after the extensive design discussion phase, I thought it would be better to first write something that's in good, testable shape. Read the 2.0.0 release notes file for some details about the changes.
I believe this set of patches is in good shape to get merged, so needs the requisite testing and review to ensure things work well on a wide variety of platforms and with other dependent software.