-
-
Notifications
You must be signed in to change notification settings - Fork 18.9k
BUG: Maintain column order with groupby.nth #22811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hello @reidy-p! Thanks for updating the PR.
Comment last updated on October 20, 2018 at 22:24 Hours UTC |
| @@ -497,7 +497,8 @@ def _set_group_selection(self): | |||
|
|
|||
| if len(groupers): | |||
| # GH12839 clear selected obj cache when group selection changes | |||
| self._group_selection = ax.difference(Index(groupers)).tolist() | |||
| self._group_selection = ax.difference(Index(groupers), | |||
| sort=False).tolist() | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index.difference tries to sort its result by default and this means that sometimes the order of the columns was changed from the original DataFrame. I added a new sort parameter to Index.difference with a default of True to control this.
Codecov Report
@@ Coverage Diff @@
## master #22811 +/- ##
=========================================
Coverage ? 92.2%
=========================================
Files ? 169
Lines ? 50927
Branches ? 0
=========================================
Hits ? 46955
Misses ? 3972
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. there are a couple of other PRs out there which add a sort=True default to the set ops, though I think , though might be for intersection IIRC. can you see if you can locate / reference.
| sort : bool, default True | ||
| Sort the resulting index if possible | ||
|
|
||
| .. versionadded:: 0.24.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make sure this is added to all subclasses as well (mutli, interval) I think have there own impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this (in this PR), can ideally update the tests for .difference for all types to parameterize it where appropriate
| try: | ||
| the_diff = sorting.safe_sort(the_diff) | ||
| except TypeError: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests in the index tests to exercise this (prob just parameterize the parameter in the tests)
|
the other issue is #17378 |
|
just wanted to make you aware of the other issues not super necessary to actually do it in this PR unless it’s straightforwars |
|
Mostly cross referencing for myself. #21603 should become a lot easier when this is completed; |
|
lgtm. can you rebase, ping on green. |
dc2428c to
acf06ad
Compare
|
@jreback I've rebased now but do you want me to add @mroeschke I had the exact same thought when I was working on a PR for |
acf06ad to
67218b7
Compare
yes new PR after this is merged. |
| sort : bool, default True | ||
| Sort the resulting index if possible | ||
|
|
||
| .. versionadded:: 0.24.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this (in this PR), can ideally update the tests for .difference for all types to parameterize it where appropriate
8f881c8 to
cdc638b
Compare
|
I have made some updates but I just realised that I need to add the new sort parameter to the tests for the subclasses of Index |
922f1eb to
f7446b5
Compare
| @@ -1040,7 +1040,11 @@ def func(self, other): | |||
| 'objects that have compatible dtypes') | |||
| raise TypeError(msg.format(op=op_name)) | |||
|
|
|||
| result = getattr(self._multiindex, op_name)(other._multiindex) | |||
| if op_name == 'difference': | |||
| result = getattr(self._multiindex, op_name)(other._multiindex, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward at the moment because difference is the only set operation with the sort parameter. But if we add a sort parameter to the other set operations I think we can get rid of the if statement
| @@ -2791,8 +2791,14 @@ def difference(self, other, sort=True): | |||
| labels=[[]] * self.nlevels, | |||
| names=result_names, verify_integrity=False) | |||
|
|
|||
| difference = set(self._ndarray_values) - set(other._ndarray_values) | |||
| this = self._get_unique_index() | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old way of doing this using set did not preserve the original order so I took this code from the difference method in pandas/core/indexes/base.py:
pandas/pandas/core/indexes/base.py
Lines 2950 to 2957 in 145c227
| this = self._get_unique_index() | |
| indexer = this.get_indexer(other) | |
| indexer = indexer.take((indexer != -1).nonzero()[0]) | |
| label_diff = np.setdiff1d(np.arange(this.size), indexer, | |
| assume_unique=True) | |
| the_diff = this.values.take(label_diff) |
| rng1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) | ||
| @pytest.mark.parametrize("sort", [True, False]) | ||
| def test_difference(self, tz, sort): | ||
| rng_dates = ['1/2/2000', '1/3/2000', '1/1/2000', '1/4/2000', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to ensure that the sort parameter was getting a proper test with unsorted data so I have rewritten some tests to have unsorted data (e.g., by manually specifying a list of dates here rather than using date_range). I have made similar changes to other existing tests.
bf9c4f9 to
d4ec2d9
Compare
|
can you rebase and fixup |
ea186c0 to
27ae813
Compare
27ae813 to
d030680
Compare
d030680 to
2e4b31a
Compare
|
rebased. though will look again. |
|
thanks @reidy-p nice job! |
git diff upstream/master -u -- "*.py" | flake8 --diff