Skip to content

COMPAT: Emit warning when groupby by a tuple #18731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Dec 18, 2017
Prev Previous commit
Next Next commit
Correct KeyError
  • Loading branch information
TomAugspurger committed Dec 15, 2017
commit d8c20e85ea9363bbdbe58c16b51fe9b76eb4e24d
12 changes: 9 additions & 3 deletions pandas/core/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -2861,13 +2861,14 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
is_tuple = isinstance(key, tuple)
all_hashable = is_tuple and all(is_hashable(x) for x in key)

original_grouper = None
if is_tuple:
if not all_hashable or key not in obj:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm lost. Why do you check that elements are not hashable? I would have done instead

if all_hashable and key not in obj and set(key).issubset(obj):

or (if we want to account for the to-be-deprecated possibility to index with missing keys):

if all_hashable and key not in obj and set(key) & (obj):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better - performance-wise:

if key not in obj and all(is_hashable(x) for x in key) and set(key).issubset(obj):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the case where you're grouping by non-hashable arrays like in #18314 (comment)

In that case, don't we know that they're certainly relying on groupby((a, b)) to be groupby([a, b]), so we want to warn and listify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do still need to handle your KeyError example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see what you're doing now. Yes, that's probably better, and will make handling the KeyError easier.

msg = ("Interpreting tuple 'by' as a list of keys, rather than "
"a single key. Use 'by={!r}' instead of 'by={!r}'. In the "
"future, a tuple will always mean a single key.".format(
list(key), key))
"a single key. Use 'by=[...]' instead of 'by=(...)'. In "
"the future, a tuple will always mean a single key.")
warnings.warn(msg, FutureWarning, stacklevel=5)
original_grouper = key
key = list(key)

if not isinstance(key, list):
Expand Down Expand Up @@ -2939,6 +2940,11 @@ def is_in_obj(gpr):
elif obj._is_level_reference(gpr):
in_axis, name, level, gpr = False, None, gpr, None
else:
# Want to raise with the correct KeyError here
# The deprecation in #18731 means we may have
# the wrong error message here.
if original_grouper:
gpr = original_grouper
raise KeyError(gpr)
elif isinstance(gpr, Grouper) and gpr.key is not None:
# Add key to exclusions
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -2755,6 +2755,14 @@ def test_tuple_warns_unhashable(self):

assert "Interpreting tuple 'by' as a list" in str(w[0].message)

def test_tuple_correct_keyerror(self):
df = pd.DataFrame(1, index=range(3),
columns=pd.MultiIndex.from_product([[1, 2],
[3, 4]]))
with tm.assert_produces_warning(FutureWarning): # just silence
with tm.assert_raises_regex(KeyError, "(7, 8)"):
df.groupby((7, 8)).mean()


def _check_groupby(df, result, keys, field, f=lambda x: x.sum()):
tups = lmap(tuple, df[keys].values)
Expand Down