CLN: In DecimalArray implement aliases as properties #26900

ghost · 2019-06-17T12:48:59Z

Much safer, since _data may be assigned to. I got bitten by this.

tests passed (as long as it doesn't break existing)
passes git diff upstream/master -u -- "*.py" | flake8 --diff

Much safer, since _data may be assigned to

jreback

please add a tests to validate this assertion

jreback · 2019-06-17T13:12:25Z

pandas/tests/extension/decimal/array.py

+    def data(self):
+        return self._data
+
+    # those aliases are currently not working due to assumptions


don't added commented code unless its germane (its not here)

Which assertion do you mean?

There were 2 commented out attributes. I tuned them into properties with the rest, then commented them out to not delete what the author saw fit to leave in. Not sure what to do here.

Yes, it is fine to keep this in comments.

But for briefness, I think it is enough to have only one property (I would take data) and then you can simply do items = data as alias, and keep the values and _values in a similar way as alias in comments.

codecov · 2019-06-17T13:24:42Z

Codecov Report

Merging #26900 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26900      +/-   ##
==========================================
- Coverage   91.87%   91.86%   -0.01%     
==========================================
  Files         180      180              
  Lines       50712    50712              
==========================================
- Hits        46590    46586       -4     
- Misses       4122     4126       +4

Flag	Coverage Δ
#multiple	`90.45% <ø> (ø)`	⬆️
#single	`41.12% <ø> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/frame.py	`96.88% <0%> (-0.12%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update baa77c3...93972dc. Read the comment docs.

codecov · 2019-06-17T13:24:44Z

Codecov Report

Merging #26900 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26900      +/-   ##
==========================================
- Coverage   91.87%   91.86%   -0.02%     
==========================================
  Files         180      180              
  Lines       50743    50712      -31     
==========================================
- Hits        46620    46586      -34     
- Misses       4123     4126       +3

Flag	Coverage Δ
#multiple	`90.45% <ø> (-0.01%)`	⬇️
#single	`41.12% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/io/formats/printing.py	`85.56% <0%> (-1.17%)`	⬇️
pandas/util/_test_decorators.py	`93.22% <0%> (-0.44%)`	⬇️
pandas/core/frame.py	`96.88% <0%> (-0.13%)`	⬇️
pandas/core/indexes/multi.py	`95.67% <0%> (-0.07%)`	⬇️
pandas/core/series.py	`93.64% <0%> (-0.02%)`	⬇️
pandas/core/generic.py	`94.2% <0%> (-0.01%)`	⬇️
pandas/core/strings.py	`98.93% <0%> (ø)`	⬆️
pandas/core/indexes/interval.py	`96.44% <0%> (ø)`	⬆️
pandas/core/indexes/base.py	`96.71% <0%> (ø)`	⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d47947a...914840a. Read the comment docs.

jorisvandenbossche · 2019-06-18T10:08:29Z

Much safer, since _data may be assigned to. I got bitten by this.

What do you mean with "got bitten by this" ?

This is only a test array, so it might be that it is actually good that those are writable, to make sure we don't mess with such attributes.

ghost · 2019-06-19T00:44:06Z

What do you mean with "got bitten by this" ?

I added a round method and test to DecimalArray, and this made a bug possible.

This is only a test array, so it might be that it is actually good that those are writable,
to make sure we don't mess with such attributes.

I don't understand. if they are writable of course you can mess with them. But if they were properties (this PR), you can't do anything to them, and if you tried, there would be an exception).

Can't see how this helps testing, on the contrary. It's a bad pattern, and EA authors who look for examples or a related EA from which to build on may pick this up and gain bad habits.

jorisvandenbossche · 2019-06-19T05:46:46Z

What do you mean with "got bitten by this" ?

I added a round method and test to DecimalArray, and this made a bug possible.

Sorry, that's still too cryptic. How did that made a bug possible by having those writable attributes?

It's a bad pattern, and EA authors who look for examples or a related EA from which to build on may pick this up and gain bad habits.

Then let's make the comment around it more clear about that?

We mainly care about having those names on the array, less so whether they are attributes or properties.
But by making them properties, we given even more attention to it IMO, as they take more space:)

ghost · 2019-06-19T06:55:45Z

Sorry, that's still too cryptic. How did that made a bug possible by having those writable attributes?

My implementation of round return a result by calling self.copy(), and then assigning the calculated result to the _data attribute of the copy, then returning it. The data attribute then still pointed at the old values, and when data was used elsewhere it returned stale data.

I should have used a constructor (and changed it), but if data had been a true alias for _data, and not just a different attribute initialized to the same value, that bug could not have happened. That's why I claim it's a bad pattern.

Does that clarify things?

ghost · 2019-06-19T06:58:37Z

Then let's make the comment around it more clear about that?

Sure. But from the comments I've been getting here it seems like in my attempts I either fail to make myself clear, or that I come across as too verbose. I'd love to be both succinct and clear, but it turns out to be quite a challenge for me personally :).

jorisvandenbossche · 2019-06-19T07:54:41Z

My implementation of round return a result by calling self.copy(), and then assigning the calculated result to the _data attribute of the copy, then returning it. The data attribute then still pointed at the old values, and when data was used elsewhere it returned stale data.

Ah, now I understand! I overloooked that behaviour of python (I didn't think about the fact that if you overwrite .data, it would not update automatically .data and ._items). Yes, that's a good reason to make those properties.

The data attribute then still pointed at the old values, and when data was used elsewhere it returned stale data.

That's actually a "bug" (or at least wrong usage) you discovered, as it was the idea that it is _data that should be used consistently throughout the DecimalArray implementation for the internal data. So you can change that occurence of .data to ._data in _reduce.

I'd love to be both succinct and clear, but it turns out to be quite a challenge for me personally :).

That's not for you personally. That's a challenge for all of us (and my replies also often result in longer responses than I initially wanted. But I personally try to err on the clear but verbose side than brief but not clear enough).

jorisvandenbossche · 2019-06-19T07:54:55Z

Reminder: we prefer you do not force push :)

jorisvandenbossche · 2019-06-19T08:01:24Z

pandas/tests/extension/decimal/array.py

        self._dtype = DecimalDtype(context)

+    # Certain pandas function currently refer to these attributes
+    # expecting them to return a numpy array of values. here we
+    # just treat them as synonyms for `self._data`.


This is not correct, so some historical note (not that you should have known, so no blame!): pandas relied in several places on doing things like getattr(object, 'values') to ensure to get the underlying ndarray values of an object if it was not yet an ndarray (eg where the object could have been an Index or Series). That's something we don't really like and are trying to get more structure in (but also for us, the pandas code is huge and complex). But so those attributes we added here are rather to ensure that we are not using those wrongly. As from pandas code, we should never directly access the "underlying data" of an EA; all interaction with the EA should go through the official documented interface.

So those attributes here were added in order to have an EA exercise the tests that has some additional attributes that are not part of the official interface, to make sure they don't accidentally mess up some functionality covered in the tests.

And since values and _values are still commented out, it seems we were (at the time) not yet sure that it would work, or it probably did not yet work. It would be good (but not here in this PR) to at some point try to fix this.

See eg #20735 as one example of this

Ahhh, I see. Not brief, but clear!

So, this PR is trying to correct something which was put there as a canary, to catch these bugs? that still doesn't make sense. If you want to catch accesses to these, this is a bad way to do it, since you're relying on the access to have an effect you test for. It won't catch reads using the wrong attribute name for example.

Why not make them properties which throw when accessed?

We want to to ensure pandas does not try to "access" the value (which this tests are a way to do), but I agree that "setting" the value (which we also don't want pandas to do) can throw an error. So converting to the property instead of attribute is indeed fine.

How could the original approach catch reads? OTOH, converting to properties will completely hide the problem even of writes, unless the properties are boobytrapped.

Maybe I still don't understand what you were trying to accomplish.

Suggestion: uncomment the values and _values aliasing, run the decimals tests, and see if any test fail.

Tests will be failing, because pandas is assuming it can do getattr(obj, 'values') on some object and expect that it extracts the data of eg a Series, but not the private data of an EA. See #20735 for an example.

I see. and what about data and _items which weren't commented out?

Since the tests are working, pandas does not call those (anymore). Or at least not in a way that gives incorrect results.
But we added them here to have that tested.

I see the failing test due to _values, and why it's difficult to figure out the cases. This PR is kind of mired in that purgatory, so I think I'll just close. Ok?

CLN: In DecimalArray implement aliases as properties

93972dc

Much safer, since _data may be assigned to

jreback requested changes Jun 17, 2019

View reviewed changes

ghost mentioned this pull request Jun 17, 2019

BUG: Fix series.round() handling of EA #26817

Closed

5 tasks

Improve comment

914840a

jorisvandenbossche reviewed Jun 19, 2019

View reviewed changes

ghost closed this Jun 21, 2019

ghost deleted the clean_decimal branch July 31, 2019 16:19

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLN: In DecimalArray implement aliases as properties #26900

CLN: In DecimalArray implement aliases as properties #26900

ghost commented Jun 17, 2019 •

edited by ghost

Loading

jreback left a comment

jreback Jun 17, 2019

ghost Jun 17, 2019

ghost Jun 19, 2019

jorisvandenbossche Jun 19, 2019

codecov bot commented Jun 17, 2019

codecov bot commented Jun 17, 2019 •

edited

Loading

jorisvandenbossche commented Jun 18, 2019

ghost commented Jun 19, 2019

jorisvandenbossche commented Jun 19, 2019

ghost commented Jun 19, 2019 •

edited by ghost

Loading

ghost commented Jun 19, 2019

jorisvandenbossche commented Jun 19, 2019

jorisvandenbossche commented Jun 19, 2019

jorisvandenbossche Jun 19, 2019

jorisvandenbossche Jun 19, 2019

ghost Jun 19, 2019

ghost Jun 19, 2019 •

edited by ghost

Loading

jorisvandenbossche Jun 19, 2019 •

edited

Loading

ghost Jun 19, 2019 •

edited by ghost

Loading

jorisvandenbossche Jun 19, 2019

ghost Jun 19, 2019

jorisvandenbossche Jun 19, 2019

ghost Jun 20, 2019

CLN: In DecimalArray implement aliases as properties #26900

CLN: In DecimalArray implement aliases as properties #26900

Conversation

ghost commented Jun 17, 2019 • edited by ghost Loading

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Jun 17, 2019

Codecov Report

codecov bot commented Jun 17, 2019 • edited Loading

Codecov Report

jorisvandenbossche commented Jun 18, 2019

ghost commented Jun 19, 2019

jorisvandenbossche commented Jun 19, 2019

ghost commented Jun 19, 2019 • edited by ghost Loading

ghost commented Jun 19, 2019

jorisvandenbossche commented Jun 19, 2019

jorisvandenbossche commented Jun 19, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost Jun 19, 2019 • edited by ghost Loading

Choose a reason for hiding this comment

jorisvandenbossche Jun 19, 2019 • edited Loading

Choose a reason for hiding this comment

ghost Jun 19, 2019 • edited by ghost Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost commented Jun 17, 2019 •

edited by ghost

Loading

codecov bot commented Jun 17, 2019 •

edited

Loading

ghost commented Jun 19, 2019 •

edited by ghost

Loading

ghost Jun 19, 2019 •

edited by ghost

Loading

jorisvandenbossche Jun 19, 2019 •

edited

Loading

ghost Jun 19, 2019 •

edited by ghost

Loading