Skip to content

Deprecate PyArrayObject* direct field access #116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Jul 26, 2011

Conversation

mwiebe
Copy link
Member

@mwiebe mwiebe commented Jul 19, 2011

This pull request deprecates direct access to PyArrayObject fields. This direct access has been discouraged for a while through comments in the header file and documentation, but up till now, there was no way to disable it. I've created such a mechanism, and C extensions can test that they don't use deprecated C API by #defining NPY_NO_DEPRECATED_API at the top of the C file.

I've confirmed that scipy master builds against this branch, and its test failures look unrelated to these changes (iterative methods failures). Additional testing of different versions and platforms would be appreciated!

This also includes a few other miscellaneous changes:

mwiebe and others added 20 commits July 19, 2011 14:00
…rface__ incorrectly

I'm not sure why this came up after the arrayobject field access
changes, since it looks like the bug was in there before already.
…tor being correct

There was some fishy code flushing and closing the _mmap property,
which looked as if it was paranoid in not trusting mmap.mmap to
behave properly. This was done in a way which assumed the .base
attribute isn't collapsed, something which I've changed in the refactoring.

The easiest way to fix this is to trust mmap.mmap - if this is incorrect,
the specific platform on which this fails should have been already
documented in the comments!
…patibilities

Sticking with inline functions when NPY_NO_DEPRECATED_API is defined.
may be set to True to add a mask to an array which does not have one.

Second is 'arr.flags.ownnamask', which is True if the array owns the
Second is 'arr.flags.ownmaskna', which is True if the array owns the
memory to the mask, and False if the array has no mask, or has a view
into the mask of another array. If this is set to False in a masked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be simpler if copying the mask could be specified when making the view, something like a 'copymask' keyword.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see this is done below.

>>> a + b
array([NA(0), 5, NA(2)], namasked='multi')
array([NA(0), 5, NA(2)], maskna='multi')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, the implication here is not to mix masked and bit-patterns? Consider the following:

...

a = np.array([np.NA, 2, 5], maskna=True)
b = np.array([1, np.NA, 7], dtype='NA')
a + b
...
What should be the result? Now, let's say that -- somehow -- resulting array retains the distinction between the masked-NA and the bit-pattern NA. And the user wants to skip masked-NA values, but not skip bitpattern-NA values. I don't see how that is going to be possible in the current framework. Note, I am not saying that it should be possible, I am just raising a possible future issue with maintaining the distinction between bitpattern-NA and masked NA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, it isn't about mixing masks and bitpatterns, its about having multiple distinct NAs in both masks and bitpatterns. The NEP makes no distinction between bitpattern-NA and masked NA with regard to computations, however the multi-NA idea would allow a user to distinguish between them by using different multi-NA payloads.

@WeatherGod
Copy link
Contributor

Consider the following:

>>> import numpy as np
>>> a = np.random.random((3, 2))
>>> b = np.ma.masked_array(a, mask=[[False, True], [True, True], [False, False]])
>>> b
masked_array(data =
 [[0.110804969841 --]
 [-- --]
 [0.955128477746 0.440430735546]],
             mask =
 [[False  True]
 [ True  True]
 [False False]],
       fill_value = 1e+20)
>>> b.mean(axis=0)
masked_array(data = [0.532966723794 0.440430735546],
             mask = [False False],
       fill_value = 1e+20)

>>> b.mean(axis=1)
masked_array(data = [0.110804969841 -- 0.697779606646],
             mask = [False  True False],
       fill_value = 1e+20)

Note that for masked arrays, they also do not return 0.0 for completely masked rows:

>>> b[1].sum()
masked

I am not saying that this is the way it should be, but if there is going to be a difference, then this should be made very clear for former np.ma users.

@mwiebe
Copy link
Member Author

mwiebe commented Jul 20, 2011

I've added a section 'Differences with numpy.ma' which uses your example to show how things behave differently. Thanks!

…also fixes ticket numpy#1912)

This adds an 'aligned'= property to the format dict, which gets put
in the str() representation when necessary to preserve it.
@mwiebe
Copy link
Member Author

mwiebe commented Jul 22, 2011

I've added another commit which makes the 'align=' struct dtype flag stickiness better, and fixes ticket #1912. Are there any more changes you think are necessary before this gets merged?

@charris
Copy link
Member

charris commented Jul 22, 2011

I haven't had time to finish looking through it, but I suspect I won't have anything substantive to add. However, I've been thinking the name PyArray_SetBase is too generic. What is the Base and what does setting it do? So maybe another name and some explanation of the intended use of the function would help.

@mwiebe
Copy link
Member Author

mwiebe commented Jul 22, 2011

Sure, the name PyArray_SetBase is what it is because it matches with PyArray_BASE.

I think the possible confusion is due to poor naming conventions in the NumPy API. First, Py* should be reserved for CPython, not NumPy. I think a transition to Npy* should be done when the API has matured some more. Second, PyArray_* has all the PyArrayObject-specific functions and random other functions mixed in together.

The overloading of the BASE property together with the UPDATEIFCOPY flag also seems unfortunate, as it significantly changes the meaning of that property depending on the flag.

@charris
Copy link
Member

charris commented Jul 22, 2011

I'm guessing that PyArray_SetBase sets a pointer to the array from which a view is taken, so maybe something like PyArray_SetBaseArray or PyArray_SetRootArray, or some such. I really think all arrays should be views of some separate memory/buffer object, but that is another issue.

Can you expand on the interplay between the BASE and UPDATECOPY flags? I don't think it matters for this commit, but I would like to learn something ;)

@mwiebe
Copy link
Member Author

mwiebe commented Jul 22, 2011

Yes, except the base can be any object that owns the memory, not just an array. PyArray_SetBaseObject? I agree with you about the separate memory/buffer object.

When the UPDATEIFCOPY flag of arr is set, arr's base is then the original array from which a copy is made. Arr's base is set readonly as well. When arr is freed, it copies its data into its base and sets arr to writeable again. Quite different from arr's base owning the memory...

@charris
Copy link
Member

charris commented Jul 22, 2011

PyArray_SetBaseObject sounds good, or maybe PyArray_SetBufferObject or PyArray_SetMemoryObject.

The UPDATEIFCOPY sounds like two paradigms getting mixed together.

@mwiebe
Copy link
Member Author

mwiebe commented Jul 22, 2011

I'll switch it to PyArray_SetBaseObject. Changing it to a name other than Base would need a bigger cascading set of changes.

@mwiebe
Copy link
Member Author

mwiebe commented Jul 22, 2011

I've changed the name to PyArray_SetBaseObject, and added some documentation about it.

@mwiebe
Copy link
Member Author

mwiebe commented Jul 26, 2011

Chuck, did you have anything else you'd like changed before merging this?

@charris
Copy link
Member

charris commented Jul 26, 2011

No, go for it. Might clean up the trailing whitespace though ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants