Skip to content

Rename masked column fails for subsequent slicing #3023

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 21, 2014

Conversation

taldcroft
Copy link
Member

This is weird, but when you create a masked column and look at the data attribute, you will see things like mc.data._name, mc.data._parent_table, etc. For reasons I don't understand, this leads to the following bug:

In [1]: from astropy.table import *
In [2]: mc = MaskedColumn([1,2], name='a')
In [3]: mc.data._name
Out[3]: 'a'

In [4]: mc
Out[4]: 
<MaskedColumn name='a' unit=None format=None description=None>
masked_array(data = [1 2],
             mask = [False False],
       fill_value = 999999)

In [5]: mc.name = 'b'

In [6]: mc
Out[6]: 
<MaskedColumn name='b' unit=None format=None description=None>
masked_array(data = [1 2],
             mask = [False False],
       fill_value = 999999)

In [7]: mc[:]   # back to name of 'a'
Out[7]: 
<MaskedColumn name='a' unit=None format=None description=None>
masked_array(data = [1 2],
             mask = [False False],
       fill_value = 999999)

In [8]: mc.data._name
Out[8]: 'a'

Note this is in the current stable release, and the same feature is not seen for Column. This is for Mac / Python 2.7 / numpy 1.8.

@taldcroft
Copy link
Member Author

Fix attached. The root problem seems to be that by design __array_finalize__ is not called. Then when updating the output object after slicing, if the output is a MaskedArray subclass (which it is obviously is), then the original object __dict__ is purposely ignored. Go figure.

For reference see also:
http://stackoverflow.com/questions/14564656/subclassing-numpy-ma-maskedarray

@taldcroft
Copy link
Member Author

@astrofrog or @mhvk can you have a quick look? I think this not too controversial and has reasonable tests.

@taldcroft
Copy link
Member Author

(There was a failing test, but it appeared to just be a random Travis stoppage, re-running now).

# Fixes issue #3023: when calling getitem with a MaskedArray subclass
# the original object attributes are not copied.
if out.__class__ is self.__class__:
out.parent_table = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taldcroft - since the problem appears to be that __array_finalize__ is not called, why not just do that? I.e.,

if out.__class__ is self.__class__:
    out.__array_finalize__(self)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. Am slightly confused why the if statement is necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About using __array_finalize__ directly, I didn't do this because it seemed like the out object was mostly "finalized" already, and that there probably was good a reason the MaskedArray implementation was not calling it. So I started from a bit of fearful ignorance.

Now looking at what ends up getting run via __array_finalize__ there is a bunch of stuff that I'm not sure needs to be done or should be done again on out. But still not having complete understanding, I did just try it out and it failed a couple of tests in test_table_aggregate (which is honestly a bit odd).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the if statement, you don't want to set these attributes if the item was an integer so that the return is now a scalar type object instead of a new MaskedColumn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I understand why __array_finalize__ would do too much: I think it would call MaskedArray.__array_finalize__, which is the one that will have been done already. So, what you have is indeed better.

And, yes, of course the if statement makes sense...

@mhvk
Copy link
Contributor

mhvk commented Oct 21, 2014

@taldcroft - see my comment. Note that this would likely be resolved by numpy/numpy#4586. So, hopefully this workaround will no longer be necessary some day...

@taldcroft
Copy link
Member Author

I noticed pandas 0.15 dropped numpy 1.6, but I suppose for astropy it'll be a while before we drop 1.9... 😄

mhvk added a commit that referenced this pull request Oct 21, 2014
Rename masked column fails for subsequent slicing
@mhvk mhvk merged commit 23ba91e into astropy:master Oct 21, 2014
@mhvk
Copy link
Contributor

mhvk commented Oct 21, 2014

@taldcroft - merged, since it looked all OK now that I understand why __array_finalize__ would do too much.

mhvk added a commit that referenced this pull request Dec 31, 2014
Rename masked column fails for subsequent slicing
@taldcroft taldcroft deleted the table-masked-col-3023 branch October 1, 2019 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants