Missingdata - Create masked dtype transfer API #105

Closed
wants to merge 13 commits into from

@mwiebe mwiebe commented Jul 8, 2011

This adds the public API functions PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, which behave analogously to the corresponding unmasked functions. To expose this with a reasonable interface, I added a function np.copyto, which takes a 'where=' parameter just like the element-wise ufuncs.

I've made no effort to optimize the performance of the code, but the function performs better than boolean indexing:

In [2]: a = np.zeros((1000,1000))
In [3]: b = np.ones((1000,1000), dtype='f4')
In [4]: m = np.random.rand(1000,1000) > 0.5

In [8]: timeit a[m] = b[m]
10 loops, best of 3: 160 ms per loop

In [9]: timeit np.copyto(a, b, where=m)
100 loops, best of 3: 12.7 ms per loop
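As a rough check that the two timed statements do the same work, the sketch below compares the boolean-indexing assignment against np.copyto with where= on smaller arrays. The array sizes, the seeded generator, and the variable names are illustrative assumptions, not part of the benchmark above.

```python
import numpy as np

# Smaller arrays than in the timing run; the seed is only for reproducibility.
rng = np.random.default_rng(0)
src = np.ones((100, 100), dtype='f4')
mask = rng.random((100, 100)) > 0.5

dst_index = np.zeros((100, 100))
dst_copyto = np.zeros((100, 100))

# Boolean indexing: gathers src[mask] into a temporary, then scatters it.
dst_index[mask] = src[mask]

# Masked copy: writes src into dst only where mask is True,
# casting float32 to float64 along the way.
np.copyto(dst_copyto, src, where=mask)

same = np.array_equal(dst_index, dst_copyto)
print(same)
```

Elements where the mask is False keep their previous value in both versions, so the two destinations should compare equal.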

Because of other things I had to do today, I was not able to finish up the masked iteration mode to fix the bug in the ufunc where= parameter with output casting. That will be the next pull request.

Mark Wiebe added 13 commits July 7, 2011 14:25
This kind of shortcut is inappropriate for the default public API.

This was presumably there so that variables could be called 'fortran'
without choking up an obscure unmentioned compiler. I've also gone
through and renamed any variables from 'fortran' to something else.

This change allows npy_bool to be used as a mask that always has payload
zero. Combining masks with payloads is then no longer a simple
'min' operation as in the previous design, but allowing npy_bool as
the mask appears to be a very worthwhile tradeoff.

This implementation has no optimization whatsoever in it yet,
it just wraps the unmasked strided transfer functions. It also
does not handle struct masks yet.

These functions expose masked copying routines, with and without
handling of overlapping data. This also deprecates the np.putmask and
PyArray_PutMask functions, because np.copyto supersedes their
functionality. This will need to be discussed on the list during
the pull request review.

This should allow one to create struct dtype arrays with np.ones
and np.zeros_like.
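
The commit notes above say np.copyto covers what np.putmask does. A minimal sketch of that overlap, assuming a NumPy version where both functions are available and using made-up arrays of matching shape:

```python
import numpy as np

# Illustrative data; names and values are assumptions for this sketch.
values = np.arange(6, dtype='f8').reshape(2, 3)
mask = np.array([[True, False, True],
                 [False, True, False]])

via_putmask = np.full((2, 3), -1.0)
via_copyto = np.full((2, 3), -1.0)

# putmask takes (array, mask, values) and modifies the array in place.
np.putmask(via_putmask, mask, values)

# copyto takes (dst, src) with the mask passed as the where= keyword.
np.copyto(via_copyto, values, where=mask)

print(via_putmask)
print(via_copyto)
```

The two calls agree when values has the same shape as the destination. They can differ otherwise: np.putmask repeats a shorter values array, while np.copyto broadcasts the source.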

charris commented Jul 9, 2011

I get a lot of deprecation warnings showing up in the masked array tests. I could probably fix those if you are pressed for time. Pierre might be OK with that.

I'm a bit concerned about the clone operation. Would it be worth adding a debug reference count to some of the structures?

charris commented Jul 9, 2011

Tests pass on python 2.7 and 3.2 (no new errors). Git indicated trailing whitespace in one of the files.

mwiebe commented Jul 9, 2011

I'll be pretty busy with the SciPy conference, so I may or may not have time to fix these things in the near future. When I enable deprecation warnings, I get tons of warnings about using PyCObject instead of PyCapsule, pretty much dominating other deprecation warnings.

The rationale for the clone operation is the multithreading use case, as in numexpr, where each thread needs completely separate ownership of its data. These aren't reference counting semantics but object ownership semantics, so I suppose one could add a special debug build mode, appropriate for single threading, which adds the pointers to a global dictionary to ensure the object ownership rules are being followed.

charris commented Jul 9, 2011

I merged this with some style edits and a fix for the copy/copyto typo.
