Pivot_table drop rows whose entries are all NaN #21969

noob-saibot · 2018-07-18T12:04:19Z

>>> dataframe = pd.DataFrame([['a', 1, 10], ['b', 10, 100], ['c', None, None], ['a', 2, 4]], columns=['lit', 'num1', 'num2'])
>>> dataframe.pivot_table(index='lit', columns='num1', values='num2', aggfunc='max')
num1  1.0   2.0    10.0
lit
a     10.0   4.0    NaN
b      NaN   NaN  100.0

pivot_table silently drops rows whose entries are all NaN, even though the documentation mentions only columns (dropna : boolean, default True; Do not include columns whose entries are all NaN).

It works fine in versions 0.21.1 and 0.22.0.

I found only an old bug report from 2013.

Expected Output

>>> dataframe.pivot_table(index='lit', columns='num1', values='num2', aggfunc='max')
num1    1    2      10
lit
a     10.0  4.0    NaN
b      NaN  NaN  100.0
c      NaN  NaN    NaN
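
A possible workaround, shown here only as a sketch, is to reindex the pivoted result against every label present in the source column, which restores any rows that were dropped. The names `pivoted`, `all_labels`, and `restored` are made up for illustration, and the approach assumes a single index column.

```python
import pandas as pd

dataframe = pd.DataFrame(
    [['a', 1, 10], ['b', 10, 100], ['c', None, None], ['a', 2, 4]],
    columns=['lit', 'num1', 'num2'],
)

pivoted = dataframe.pivot_table(index='lit', columns='num1',
                                values='num2', aggfunc='max')

# Collect every label that appears in the source index column.
all_labels = pd.Index(sorted(dataframe['lit'].dropna().unique()), name='lit')

# Reindexing adds back any label missing from the pivot, filled with NaN.
restored = pivoted.reindex(all_labels)
print(restored)
```

Reindexing does not depend on whether the installed pandas version drops the all-NaN row, so it is harmless on versions where the row is already kept.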

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27
numpy: 1.14.5
scipy: None
pyarrow: 0.9.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

gfyoung · 2018-07-18T23:59:05Z

@noob-saibot : Can you add a reference to the old issue?

cc @jreback : Is it intended to drop NA from rows as well? Or is this a regression?

noob-saibot · 2018-07-19T07:25:15Z

@gfyoung issue 3729

neomis · 2018-11-20T15:56:07Z

I'm also experiencing this issue in 0.23.4 with an additional snag. If you set dropna=False it will add nonexistent rows to the output table.

import pandas as pd
from numpy import nan
data_set = [
    {'x': 0, 'y': 0, 'parameter_id': 'a', 'result_value': 3.0},
    {'x': 0, 'y': 0, 'parameter_id': 'b', 'result_value': 1.0},
    {'x': 0, 'y': 0, 'parameter_id': 'c', 'result_value': 3.0},
    {'x': 0, 'y': 0, 'parameter_id': 'd', 'result_value': nan},
    {'x': 0, 'y': 1, 'parameter_id': 'a', 'result_value': 1.0},
    {'x': 0, 'y': 1, 'parameter_id': 'b', 'result_value': 3.0},
    {'x': 0, 'y': 1, 'parameter_id': 'c', 'result_value': nan},
    {'x': 0, 'y': 1, 'parameter_id': 'd', 'result_value': nan},
    {'x': 0, 'y': 2, 'parameter_id': 'a', 'result_value': 1.0},
    {'x': 0, 'y': 2, 'parameter_id': 'b', 'result_value': 3.0},
    {'x': 0, 'y': 2, 'parameter_id': 'c', 'result_value': nan},
    {'x': 0, 'y': 2, 'parameter_id': 'd', 'result_value': nan},
    {'x': 1, 'y': 0, 'parameter_id': 'a', 'result_value': 1.0},
    {'x': 1, 'y': 0, 'parameter_id': 'b', 'result_value': 3.0},
    {'x': 1, 'y': 0, 'parameter_id': 'c', 'result_value': nan},
    {'x': 1, 'y': 0, 'parameter_id': 'd', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'a', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'b', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'c', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'd', 'result_value': nan}]

df = pd.DataFrame(data_set)
df.pivot_table(index=['x', 'y'], columns='parameter_id',
               values='result_value', dropna=False).reset_index()

Output

parameter_id  x  y    a    b    c   d
0             0  0  3.0  1.0  3.0 NaN
1             0  1  1.0  3.0  NaN NaN
2             0  2  1.0  3.0  NaN NaN
3             1  0  1.0  3.0  NaN NaN
4             1  1  NaN  NaN  NaN NaN
5             1  2  NaN  NaN  NaN NaN

Expected Output

parameter_id  x  y    a    b    c   d
0             0  0  3.0  1.0  3.0 NaN
1             0  1  1.0  3.0  NaN NaN
2             0  2  1.0  3.0  NaN NaN
3             1  0  1.0  3.0  NaN NaN
4             1  1  NaN  NaN  NaN NaN
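
One way to get the expected table above, offered as a sketch rather than a fix, is to reindex the `dropna=False` result against the (x, y) pairs that actually occur in the source data. The helper names `pairs`, `values`, `observed`, and `result` are hypothetical, and the data is the same as above, just built more compactly.

```python
import numpy as np
import pandas as pd

# Same data as above: five (x, y) pairs, each with parameters a-d.
pairs = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
values = {
    (0, 0): [3.0, 1.0, 3.0, np.nan],
    (0, 1): [1.0, 3.0, np.nan, np.nan],
    (0, 2): [1.0, 3.0, np.nan, np.nan],
    (1, 0): [1.0, 3.0, np.nan, np.nan],
    (1, 1): [np.nan, np.nan, np.nan, np.nan],
}
rows = [
    {'x': x, 'y': y, 'parameter_id': p, 'result_value': v}
    for (x, y) in pairs
    for p, v in zip('abcd', values[(x, y)])
]
df = pd.DataFrame(rows)

pivoted = df.pivot_table(index=['x', 'y'], columns='parameter_id',
                         values='result_value', dropna=False)

# Keep only the (x, y) combinations that occur in the source data,
# discarding any combination the pivot introduced on its own.
observed = pd.MultiIndex.from_frame(df[['x', 'y']].drop_duplicates())
result = pivoted.reindex(observed).reset_index()
print(result)
```

The reindex step both removes the extra (1, 2) row and guarantees that the all-NaN (1, 1) row is present.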

drewlevitt · 2025-05-14T17:49:06Z

As of 2.2.2, dropna=True drops any rows and columns all of whose values are null.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'tract': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1],
...     'year': [1990, 1990, 2000, 2000, 1990, 1990, 2000, 2000, 1990, 1990, 2000, 2000, 1990],
...     'mode': ['drive', 'walk', 'drive', 'walk', 'drive', 'walk', 'drive', 'walk', 'drive', 'walk', 'drive', 'walk', 'bike'],
...     'count': [np.nan, np.nan, np.nan, 3, 2, np.nan, 1, 4, np.nan, np.nan, 5,  9, np.nan]
... })
>>> df
    tract  year   mode  count
0       1  1990  drive    NaN
1       1  1990   walk    NaN
2       1  2000  drive    NaN
3       1  2000   walk    3.0
4       2  1990  drive    2.0
5       2  1990   walk    NaN
6       2  2000  drive    1.0
7       2  2000   walk    4.0
8       3  1990  drive    NaN
9       3  1990   walk    NaN
10      3  2000  drive    5.0
11      3  2000   walk    9.0
12      1  1990   bike    NaN
>>> df.pivot_table(index=["tract", "year"], columns="mode", values="count")  # dropna=True by default
mode        drive  walk
tract year
1     2000    NaN   3.0
2     1990    2.0   NaN
      2000    1.0   4.0
3     2000    5.0   9.0

Passing dropna=False preserves all rows and columns:

>>> df.pivot_table(index=["tract", "year"], columns="mode", values="count", dropna=False)
mode        bike  drive  walk
tract year
1     1990   NaN    NaN   NaN
      2000   NaN    NaN   3.0
2     1990   NaN    2.0   NaN
      2000   NaN    1.0   4.0
3     1990   NaN    NaN   NaN
      2000   NaN    5.0   9.0

Both of these behaviors seem reasonable to me, but the docs suggest that dropna affects columns only:

Do not include columns whose entries are all NaN.

It seems like either the behavior or the documentation should change; I would think updating the docs is best, but what say others?
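
For anyone who wants what the docs currently describe (drop all-NaN columns but keep all-NaN rows), a minimal sketch is to pass `dropna=False` and then drop the empty columns explicitly with `DataFrame.dropna`. The names `wide` and `columns_only` are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'tract': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1],
    'year': [1990, 1990, 2000, 2000, 1990, 1990, 2000, 2000,
             1990, 1990, 2000, 2000, 1990],
    'mode': ['drive', 'walk', 'drive', 'walk', 'drive', 'walk', 'drive',
             'walk', 'drive', 'walk', 'drive', 'walk', 'bike'],
    'count': [np.nan, np.nan, np.nan, 3, 2, np.nan, 1, 4,
              np.nan, np.nan, 5, 9, np.nan],
})

# Keep everything first, including all-NaN rows and columns.
wide = df.pivot_table(index=['tract', 'year'], columns='mode',
                      values='count', dropna=False)

# Then drop only the columns whose entries are all NaN.
columns_only = wide.dropna(axis='columns', how='all')
print(columns_only)
```

Note that, as reported earlier in this thread, `dropna=False` can also introduce index combinations that never appear in the data; that does not happen here because every tract/year pair is present.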

gfyoung added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jul 18, 2018

tompollard mentioned this issue Aug 1, 2018

pivot_table drops columns for aggregate functions that return all None, even if dropna=False #22159

Closed

john-bodley mentioned this issue Aug 13, 2019

[viz] Revert dropna logic for pivot tables apache/superset#8040

Merged

mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 21, 2021
