-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
CLN: In DecimalArray implement aliases as properties #26900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Much safer, since _data may be assigned to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a tests to validate this assertion
def data(self): | ||
return self._data | ||
|
||
# those aliases are currently not working due to assumptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't added commented code unless its germane (its not here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which assertion do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were 2 commented out attributes. I tuned them into properties with the rest, then commented them out to not delete what the author saw fit to leave in. Not sure what to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is fine to keep this in comments.
But for briefness, I think it is enough to have only one property (I would take data
) and then you can simply do items = data
as alias, and keep the values
and _values
in a similar way as alias in comments.
Codecov Report
@@ Coverage Diff @@
## master #26900 +/- ##
==========================================
- Coverage 91.87% 91.86% -0.01%
==========================================
Files 180 180
Lines 50712 50712
==========================================
- Hits 46590 46586 -4
- Misses 4122 4126 +4
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26900 +/- ##
==========================================
- Coverage 91.87% 91.86% -0.02%
==========================================
Files 180 180
Lines 50743 50712 -31
==========================================
- Hits 46620 46586 -34
- Misses 4123 4126 +3
Continue to review full report at Codecov.
|
What do you mean with "got bitten by this" ? This is only a test array, so it might be that it is actually good that those are writable, to make sure we don't mess with such attributes. |
I added a
I don't understand. if they are writable of course you can mess with them. But if they were properties (this PR), you can't do anything to them, and if you tried, there would be an exception). Can't see how this helps testing, on the contrary. It's a bad pattern, and EA authors who look for examples or a related EA from which to build on may pick this up and gain bad habits. |
Sorry, that's still too cryptic. How did that made a bug possible by having those writable attributes?
Then let's make the comment around it more clear about that? We mainly care about having those names on the array, less so whether they are attributes or properties. |
My implementation of I should have used a constructor (and changed it), but if Does that clarify things? |
Sure. But from the comments I've been getting here it seems like in my attempts I either fail to make myself clear, or that I come across as too verbose. I'd love to be both succinct and clear, but it turns out to be quite a challenge for me personally :). |
Ah, now I understand! I overloooked that behaviour of python (I didn't think about the fact that if you overwrite
That's actually a "bug" (or at least wrong usage) you discovered, as it was the idea that it is
That's not for you personally. That's a challenge for all of us (and my replies also often result in longer responses than I initially wanted. But I personally try to err on the clear but verbose side than brief but not clear enough). |
Reminder: we prefer you do not force push :) |
self._dtype = DecimalDtype(context) | ||
|
||
# Certain pandas function currently refer to these attributes | ||
# expecting them to return a numpy array of values. here we | ||
# just treat them as synonyms for `self._data`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, so some historical note (not that you should have known, so no blame!): pandas relied in several places on doing things like getattr(object, 'values')
to ensure to get the underlying ndarray values of an object if it was not yet an ndarray (eg where the object could have been an Index or Series). That's something we don't really like and are trying to get more structure in (but also for us, the pandas code is huge and complex). But so those attributes we added here are rather to ensure that we are not using those wrongly. As from pandas code, we should never directly access the "underlying data" of an EA; all interaction with the EA should go through the official documented interface.
So those attributes here were added in order to have an EA exercise the tests that has some additional attributes that are not part of the official interface, to make sure they don't accidentally mess up some functionality covered in the tests.
And since values
and _values
are still commented out, it seems we were (at the time) not yet sure that it would work, or it probably did not yet work. It would be good (but not here in this PR) to at some point try to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See eg #20735 as one example of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, I see. Not brief, but clear!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this PR is trying to correct something which was put there as a canary, to catch these bugs? that still doesn't make sense. If you want to catch accesses to these, this is a bad way to do it, since you're relying on the access to have an effect you test for. It won't catch reads using the wrong attribute name for example.
Why not make them properties which throw when accessed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to to ensure pandas does not try to "access" the value (which this tests are a way to do), but I agree that "setting" the value (which we also don't want pandas to do) can throw an error. So converting to the property instead of attribute is indeed fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could the original approach catch reads? OTOH, converting to properties will completely hide the problem even of writes, unless the properties are boobytrapped.
Maybe I still don't understand what you were trying to accomplish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: uncomment the values
and _values
aliasing, run the decimals tests, and see if any test fail.
Tests will be failing, because pandas is assuming it can do getattr(obj, 'values'
) on some object and expect that it extracts the data of eg a Series, but not the private data of an EA. See #20735 for an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. and what about data and _items which weren't commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the tests are working, pandas does not call those (anymore). Or at least not in a way that gives incorrect results.
But we added them here to have that tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the failing test due to _values, and why it's difficult to figure out the cases. This PR is kind of mired in that purgatory, so I think I'll just close. Ok?
Much safer, since
_data
may be assigned to. I got bitten by this.git diff upstream/master -u -- "*.py" | flake8 --diff