-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Deprecate PyArrayObject* direct field access #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
NOTE: WIP, code doesn't build
…rface__ incorrectly I'm not sure why this came up after the arrayobject field access changes, since it looks like the bug was in there before already.
…tor being correct There was some fishy code flushing and closing the _mmap property, which looked as if it was paranoid in not trusting mmap.mmap to behave properly. This was done in a way which assumed the .base attribute isn't collapsed, something which I've changed in the refactoring. The easiest way to fix this is to trust mmap.mmap - if this is incorrect, the specific platform on which this fails should have been already documented in the comments!
…patibilities Sticking with inline functions when NPY_NO_DEPRECATED_API is defined.
may be set to True to add a mask to an array which does not have one. | ||
|
||
Second is 'arr.flags.ownnamask', which is True if the array owns the | ||
Second is 'arr.flags.ownmaskna', which is True if the array owns the | ||
memory to the mask, and False if the array has no mask, or has a view | ||
into the mask of another array. If this is set to False in a masked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be simpler if copying the mask could be specified when making the view, something like a 'copymask' keyword.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I see this is done below.
>>> a + b | ||
array([NA(0), 5, NA(2)], namasked='multi') | ||
array([NA(0), 5, NA(2)], maskna='multi') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, the implication here is not to mix masked and bit-patterns? Consider the following:
...
a = np.array([np.NA, 2, 5], maskna=True)
b = np.array([1, np.NA, 7], dtype='NA')
a + b
...
What should be the result? Now, let's say that -- somehow -- resulting array retains the distinction between the masked-NA and the bit-pattern NA. And the user wants to skip masked-NA values, but not skip bitpattern-NA values. I don't see how that is going to be possible in the current framework. Note, I am not saying that it should be possible, I am just raising a possible future issue with maintaining the distinction between bitpattern-NA and masked NA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, it isn't about mixing masks and bitpatterns, its about having multiple distinct NAs in both masks and bitpatterns. The NEP makes no distinction between bitpattern-NA and masked NA with regard to computations, however the multi-NA idea would allow a user to distinguish between them by using different multi-NA payloads.
Consider the following:
Note that for masked arrays, they also do not return 0.0 for completely masked rows:
I am not saying that this is the way it should be, but if there is going to be a difference, then this should be made very clear for former np.ma users. |
I've added a section 'Differences with numpy.ma' which uses your example to show how things behave differently. Thanks! |
…also fixes ticket numpy#1912) This adds an 'aligned'= property to the format dict, which gets put in the str() representation when necessary to preserve it.
I've added another commit which makes the 'align=' struct dtype flag stickiness better, and fixes ticket #1912. Are there any more changes you think are necessary before this gets merged? |
I haven't had time to finish looking through it, but I suspect I won't have anything substantive to add. However, I've been thinking the name PyArray_SetBase is too generic. What is the Base and what does setting it do? So maybe another name and some explanation of the intended use of the function would help. |
Sure, the name PyArray_SetBase is what it is because it matches with PyArray_BASE. I think the possible confusion is due to poor naming conventions in the NumPy API. First, Py* should be reserved for CPython, not NumPy. I think a transition to Npy* should be done when the API has matured some more. Second, PyArray_* has all the PyArrayObject-specific functions and random other functions mixed in together. The overloading of the BASE property together with the UPDATEIFCOPY flag also seems unfortunate, as it significantly changes the meaning of that property depending on the flag. |
I'm guessing that PyArray_SetBase sets a pointer to the array from which a view is taken, so maybe something like PyArray_SetBaseArray or PyArray_SetRootArray, or some such. I really think all arrays should be views of some separate memory/buffer object, but that is another issue. Can you expand on the interplay between the BASE and UPDATECOPY flags? I don't think it matters for this commit, but I would like to learn something ;) |
Yes, except the base can be any object that owns the memory, not just an array. PyArray_SetBaseObject? I agree with you about the separate memory/buffer object. When the UPDATEIFCOPY flag of arr is set, arr's base is then the original array from which a copy is made. Arr's base is set readonly as well. When arr is freed, it copies its data into its base and sets arr to writeable again. Quite different from arr's base owning the memory... |
PyArray_SetBaseObject sounds good, or maybe PyArray_SetBufferObject or PyArray_SetMemoryObject. The UPDATEIFCOPY sounds like two paradigms getting mixed together. |
I'll switch it to PyArray_SetBaseObject. Changing it to a name other than Base would need a bigger cascading set of changes. |
I've changed the name to PyArray_SetBaseObject, and added some documentation about it. |
Chuck, did you have anything else you'd like changed before merging this? |
No, go for it. Might clean up the trailing whitespace though ;) |
feat: Add vtst_s8
This pull request deprecates direct access to PyArrayObject fields. This direct access has been discouraged for a while through comments in the header file and documentation, but up till now, there was no way to disable it. I've created such a mechanism, and C extensions can test that they don't use deprecated C API by #defining NPY_NO_DEPRECATED_API at the top of the C file.
I've confirmed that scipy master builds against this branch, and its test failures look unrelated to these changes (iterative methods failures). Additional testing of different versions and platforms would be appreciated!
This also includes a few other miscellaneous changes: