BUG: groupby().size gives casting error on 32 bit platform #11189
cc @behzadnouri |
Hmm, although this is maybe more an issue with my numpy installation? |
Possibly an issue with |
since |
with |
I set up a VM for linux-32; these break there for the same reasons as well. |
both of them you added
diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index e72f7c6..06b1105 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -1808,15 +1808,17 @@ class BinGrouper(BaseGrouper):
     @cache_readonly
     def group_info(self):
         ngroups = self.ngroups
-        obs_group_ids = np.arange(ngroups, dtype='int64')
+        obs_group_ids = np.arange(ngroups)
         rep = np.diff(np.r_[0, self.bins])
 
         if ngroups == len(self.bins):
-            comp_ids = np.repeat(np.arange(ngroups, dtype='int64'), rep)
+            comp_ids = np.repeat(np.arange(ngroups), rep)
         else:
-            comp_ids = np.repeat(np.r_[-1, np.arange(ngroups, dtype='int64')], rep)
+            comp_ids = np.repeat(np.r_[-1, np.arange(ngroups)], rep)
 
-        return comp_ids, obs_group_ids, ngroups
+        return comp_ids.astype('int64', copy=False), \
+            obs_group_ids.astype('int64', copy=False), \
+            ngroups
 
     @cache_readonly
     def ngroups(self):
@@ -2565,8 +2567,8 @@ class SeriesGroupBy(GroupBy):
 
         # group boundries are where group ids change
         # unique observations are where sorted values change
-        idx = com._ensure_int64(np.r_[0, 1 + np.nonzero(ids[1:] != ids[:-1])[0]])
-        inc = com._ensure_int64(np.r_[1, val[1:] != val[:-1]])
+        idx = np.r_[0, 1 + np.nonzero(ids[1:] != ids[:-1])[0]]
+        inc = np.r_[1, val[1:] != val[:-1]]
 
         # 1st item of each group is a new unique observation
         mask = isnull(val)
@@ -2577,7 +2579,7 @@ class SeriesGroupBy(GroupBy):
             inc[mask & np.r_[False, mask[:-1]]] = 0
             inc[idx] = 1
 
-        out = np.add.reduceat(inc, idx)
+        out = np.add.reduceat(inc, idx).astype('int64', copy=False)
         return Series(out if ids[0] != -1 else out[1:],
                       index=self.grouper.result_index,
                       name=self.name)
|
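For anyone following along, here is a minimal sketch of the pattern the diff uses: build the arrays with the platform default integer, then cast to int64 once at the boundary. The values of `ngroups` and `rep` below are made up for illustration and are not taken from pandas; only the NumPy calls mirror the diff.

```python
import numpy as np

# Hypothetical inputs, chosen only for illustration.
ngroups = 4
rep = np.array([2, 0, 1, 3])  # how many rows fall into each group

# Without an explicit dtype, np.arange uses the platform default integer,
# which may be int32 or int64 depending on the platform and NumPy build.
obs_group_ids = np.arange(ngroups)
comp_ids = np.repeat(obs_group_ids, rep)

# Cast once on the way out. copy=False skips the copy when the array
# is already int64, so this costs nothing where no cast is needed.
comp_ids64 = comp_ids.astype('int64', copy=False)
obs_group_ids64 = obs_group_ids.astype('int64', copy=False)

print(comp_ids.dtype, '->', comp_ids64.dtype)
```

The point is that the intermediate arrays keep whatever dtype NumPy picks on the platform, while callers always receive int64.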
I changed them because they didn't work on Windows. I will try your diff and let you know. |
COMPAT: platform_int fixes in groupby ops, #11189
Hello, I have an issue with minlength which seems to be fixed by this resolved issue. Are we going to have a pandas 0.17.1 version tag soon? Without it, moving to 0.17.x is really unreliable, even though 0.17.x seems to bring many benefits apart from this ".size() minlength" issue. If not, is using master considered "safe" (that is, as stable as an officially tagged commit)? Thanks |
Sorry, I think I was wrong. This bug fix seems to be included in 0.17.0, but I get this error: <type 'exceptions.ValueError'> minlength must be positive File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 820, in size, which seems related. I just call grouped.size(), which was working fine before I upgraded from 0.15.0 to 0.17.0. Could it be related? |
Downgrading to 0.16.2 solved my issue... :( |
You would have to show a copy-pastable example along with the output of pd.show_versions(). |
I will try to reproduce with dummy data, because I can't include the actual data. Give me a couple of days. Thanks |
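A copy-pastable report of the kind requested above might look roughly like the sketch below. The DataFrame is dummy data invented for illustration and is not known to trigger the minlength error; the column names `key` and `value` are hypothetical too.

```python
import numpy as np
import pandas as pd

# Dummy data standing in for the real dataset (made up for illustration).
df = pd.DataFrame({
    'key': ['a', 'b', 'a', 'c', 'b', 'a'],
    'value': np.arange(6),
})

grouped = df.groupby('key')
sizes = grouped.size()  # the call that fails in the report above
print(sizes)

# Environment details to paste alongside the example.
pd.show_versions()
```

If the error only shows up with the real data, shrinking that data to the smallest frame that still fails, then anonymizing the values, would probably be the next step.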