-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Add where= parameter to ufuncs #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Impressive speedup indeed, also getting 10-12x improvement for all but transcendental operations (np.power(a, b)...).
|
Thanks. The crashes you're getting definitely shouldn't be happening (NumPy should never crash!), it should automatically convert the data (raising an exception if it's violating the casting rule). |
Since this is the missing data branch, I am also reporting some Python3-errors with tests updated from your previous masked array commits:
|
I've patched those integer divisions. |
Bit more on the first segfault (64 bit linux)
|
Great, I think it would be good to put the crash info in its own trac ticket, since it's independent of this pull request. I can probably get to it some time after this is merged. |
I've added some commits to deprecate some bad namespace pollution in the macros. I've started with some obviously bad C API stuff. |
@@ -291,8 +352,8 @@ New functions added to the ndarray are:: | |||
array is unmasked and has the 'NA' part stripped from the | |||
parameterized type ('NA[f8]' becomes just 'f8'). | |||
|
|||
arr.view(masked=True) | |||
This is a shortcut for 'a = arr.view(); a.flags.hasmask=True'. | |||
arr.view(namasked=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking 'asnamasked' would be more descriptive, but it's getting a bit long.
Where might be more useful if it didn't overwrite the masked portions of the output.
Hmm, that seems to be part of the mixed type problem.
|
I think it would be good to document the API deprecations before merging, but otherwise hopefully it's good to go? Thanks for the reviews! |
I'm concerned about committing when there is a known segfault, otherwise I would have put it in last night. The problem looks to be in the source pointer for the copy to the output. It works fine without the where. Thoughts? |
I haven't tried it, but I read Derek's comment to mean that this bug exists all the way back to 1.6, so this patch wasn't changing anything about it. I suppose I could just track it down first, sure. |
Hmm, you're right. There may actually be two errors, or maybe some combination of the input being freed at different times. I was using small arrays for testing in the hope of avoiding the segfault, but I do see it with large arrays. However, the output is wrong in the small array (no segfault) case with mixed kinds and the where keyword. |
Here is what I am talking about
This behavior is actually different if the arrays are smaller. And if the output is int64
|
Playing with the bug{s}, there appears to be no problem with unary ufuncs. |
proposed elsewhere for customizing subclass ufunc behavior with a | ||
_numpy_ufunc_ member function would allow a subclass with a different | ||
default to be created. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this would be the approach where numpy.ma can be implemented to maintain backwards compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should mostly be possible, yes. For the subtleties and features that the missing data abstraction doesn't provide, I'm also trying to make working with masks easier in general with things like the 'where=' parameter in ufuncs.
I just tried a clean build of this pull request on my 32-bit, python 2.7 platform. Somewhere during the tests, a glibc exception occurs with respect to a double-freeing of memory. |
Sorry about the test error, I added tests for bugs that Derek and Chuck reported but haven't implemented fixes for them yet. |
Hi Mark, This is getting rather large. Would you mind if I committed the NEP and header cleanups separately and then you could rebase the remainder on top of that? |
Sure, go for it! |
Never mind, it isn't that easy to tease things apart ;) |
Left out a cherry-pick, it works now. I'll put it up as a pull request. |
… bit patterns more clear
…ends This is an attempt to reduce potential confusion between the existing numpy.ma and the NA-supporting built-in mask.
This one handles PyArray_DEFAULT -> NPY_DEFAULT_TYPE and the NPY_* array flags -> NPY_ARRAY_*. The PyArray_DEFAULT vs NPY_DEFAULT confusion was particularly egregious here.
… masks' This includes an email comment from Ben about 'np.any' and 'np.all'.
This is a step towards having everyone on the list use the same vocabulary with specific nailed-down definitions for the terms.
Will reenable it once masking iteration features are done.
The bug Derek found is fixed in master now. The bug Chuck found requires implementing mask features in the iterator to properly handle it, so I'd like to merge this as is with the bug still there. I've marked it as known fail, and will unmark it when it can be fixed. |
Sounds good, I put it in. |
Thanks for working on this, numpy no longer crashes, and all combinations with different in and out dtypes also seem to work for the standard function call (or raise the appropriate TypeError)! However, with your where parameter np.add() is now returning wrong results [Python2.7, Mac OS 10.6]:
|
Cool, thanks for confirming all that. I'm presently extending the iterator to have a masked mode which is needed to properly fix that bug. There's a test in the suite which is marked knownfail at the moment. |
feat: Add vqmovn_s16
Also made a standard mechanism for inner loop auxiliary data, converting the existing low level casting loops to use it. I've also moved the ufunc type promotion functions into their own file to tame the ufunc_object.c monster a tiny bit.
A simple test, mainly showing how poorly boolean indexing performs: