Missingdata - Create masked dtype transfer API #105
This kind of shortcut is inappropriate for the default public API.
This was presumably there so that variables could be called 'fortran' without choking up an obscure unmentioned compiler. I've also gone through and renamed any variables from 'fortran' to something else.
This change allows npy_bool to be a mask which always has payload zero. Combining masks with payloads is then no longer a simple 'min' operation as it was in the previous design, but allowing npy_bool as the mask appears to be a very worthwhile tradeoff.
This implementation has no optimization whatsoever in it yet; it just wraps the unmasked strided transfer functions. It also does not handle struct masks yet.
These functions expose masked copying routines, with and without handling of overlapping data. Also deprecated the np.putmask and PyArray_PutMask functions, because np.copyto supersedes their functionality. This will need to be discussed on the list during the pull request review.
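As a rough illustration of the overlap in functionality, the sketch below applies the same masked scalar assignment with both functions. It assumes a scalar fill value; for array-valued inputs the two may differ, since np.putmask repeats values that are too short while np.copyto broadcasts them. The array names are made up for the example.

```python
import numpy as np

# Two identical arrays so both functions start from the same data.
a = np.arange(6, dtype=np.float64)
b = a.copy()
mask = a > 2  # True for the elements 3, 4, 5

# Older spelling: overwrite the masked elements in place.
np.putmask(a, mask, -1.0)

# Same operation with np.copyto and its 'where=' parameter.
np.copyto(b, -1.0, where=mask)

print(a)
print(b)
```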
This should allow one to create struct dtype arrays with np.ones and np.zeros_like.
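A minimal sketch of what this enables, assuming a simple two-field struct dtype (the field names "x" and "y" are invented for illustration):

```python
import numpy as np

# Hypothetical struct dtype with one integer and one float field.
point = np.dtype([("x", np.int32), ("y", np.float64)])

# Every field of every element is filled with one.
ones = np.ones(3, dtype=point)

# zeros_like keeps the struct dtype and fills every field with zero.
zeros = np.zeros_like(ones)

print(ones)
print(zeros)
```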
I get a lot of the deprecation warnings showing up in the masked array tests. I could probably fix those if you are pressed for time. Pierre might be OK with that. I'm a bit concerned about the clone operation. Would it be worth adding a debug reference count to some of the structures?
Tests pass on python 2.7 and 3.2 (no new errors). Git indicated trailing whitespace in one of the files.
I'll be pretty busy with the scipy conference, so I may or may not have time to fix these things in the near future. When I enable deprecation warnings, I get tons of warnings about using PyCObject instead of PyCapsule, pretty much dominating other deprecation warnings. The rationale for the clone operation is the multithreading use case like in numexpr, having completely separated ownership semantics. It isn't reference counting semantics, but object ownership semantics, so I suppose one could add a special debug build mode appropriate for single threading which adds the pointers to a global dictionary to ensure the object ownership rules are being followed.
I merged this with some style edits and a fix for the copy, copyto typo.
test: Refactor test
This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, which behave analogously to the corresponding unmasked functions. To expose this with a reasonable interface, I added a function np.copyto, which takes a 'where=' parameter just like the element-wise ufuncs.
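A short sketch of the Python-level interface described here; the array contents are arbitrary and chosen only to make the effect of the mask visible:

```python
import numpy as np

dst = np.zeros(5, dtype=np.int64)
src = np.arange(1, 6)
mask = np.array([True, False, True, False, True])

# Copy src into dst only where the mask is True;
# elements of dst at False positions are left untouched.
np.copyto(dst, src, where=mask)

print(dst)
```

Only the positions selected by the mask are written, which mirrors how 'where=' behaves in the element-wise ufuncs.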
I've made no effort to optimize the performance of the code, but the function performs better than boolean indexing:
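The timings from the original comparison are not reproduced here. One way such a comparison might be set up is sketched below; the array size, mask density, and repeat count are assumptions, and the measured numbers will depend on the machine and NumPy version.

```python
import timeit

import numpy as np

n = 100_000  # assumed problem size, not taken from the original
rng = np.random.default_rng(0)
src = rng.random(n)
mask = src > 0.5  # roughly half of the elements selected


def with_copyto():
    # Masked copy through np.copyto and its 'where=' parameter.
    dst = np.zeros(n)
    np.copyto(dst, src, where=mask)
    return dst


def with_boolean_indexing():
    # Same result through boolean (mask) indexing on both sides.
    dst = np.zeros(n)
    dst[mask] = src[mask]
    return dst


t_copyto = timeit.timeit(with_copyto, number=20)
t_index = timeit.timeit(with_boolean_indexing, number=20)
print(f"np.copyto: {t_copyto:.4f} s, boolean indexing: {t_index:.4f} s")
```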
Because of other things I had to do today, I was not able to finish up the masked iteration mode to fix the bug in the ufunc where= parameter with output casting. That will be the next pull request.