WriteApi.write does not support pandas' nullable integer #590

yannsartori · 2023-07-06T23:53:43Z

Specifications

Client Version: 1.36.1
InfluxDB Version: 2.7.0
Platform: Mac

If you have a dataframe with Pandas' nullable integer as one of the column datatypes, and a row includes a pd.NA value, you get the following traceback:

Traceback (most recent call last):
    write_api.write(
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 366, in write
    return self._write_batching(bucket, org, record,
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 469, in _write_batching
    serializer.serialize(chunk_idx),
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 270, in serialize
    return list(lp)
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 268, in <genexpr>
    lp = (re.sub('^(( |[^ ])* ),([a-zA-Z0-9])(.*)', '\\1\\3\\4', self.f(p))
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 269, in <lambda>
    for p in filter(lambda x: _any_not_nan(x, self.field_indexes), _itertuples(chunk)))
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 27, in _any_not_nan
    return any(map(lambda x: _not_nan(p[x]), indexes))
  File "pandas/_libs/missing.pyx", line 388, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

However, if you change the column datatype to a float (which has a native NaN encoding), the write works.
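For context, here is a minimal sketch of why the last frames of the traceback fail. It assumes the serializer's check boils down to the `x == x` idiom and that the result is then consumed by `any(...)`, as the traceback suggests. The function name `naive_not_nan` is made up for illustration and is not part of the client.

```python
import pandas as pd


def naive_not_nan(x):
    # Mirrors the `x == x` idiom: a float NaN is the only value unequal to itself.
    return x == x


# With a float NaN the comparison yields a plain bool.
nan_result = naive_not_nan(float("nan"))

# With pd.NA the comparison propagates NA instead of returning a bool.
na_result = naive_not_nan(pd.NA)

# any() has to call bool() on each element, and bool(pd.NA) raises TypeError.
try:
    any([na_result])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(nan_result, na_result, error_message)
```

So the check is fine for float columns, but for nullable dtypes the comparison itself returns `pd.NA`, and the failure only surfaces once something asks for its truth value.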

Code sample to reproduce problem

import pandas as pd

# get_client() and BUCKET are the reporter's own helper and constant; they are not shown in this snippet.
df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})
with get_client() as client:
    with client.write_api() as write_api:
        write_api.write(BUCKET, record=df, data_frame_measurement_name="test", data_frame_timestamp_column="time")

Expected behavior

I would expect this to behave the same as it does for a float column. My current workaround is to use floats.

If the code is too complicated to fix/would incur significant slowdown for other users, I think at minimum, raising a cleaner exception would be reasonable.
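The float workaround can be sketched roughly as follows. This is only an illustration of the dtype conversion, applied to the same frame as in the reproduction; it does not call the client. Note that the column is then a float, so the field would presumably be written as a float rather than an integer.

```python
import math

import pandas as pd

# Same frame as in the reproduction: "x" is a nullable Int64 column holding pd.NA.
df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})

# Casting to float64 turns pd.NA into a regular float NaN.
df_float = df.astype({"x": "float64"})

first_value = df_float["x"].iloc[0]
second_value = df_float["x"].iloc[1]

print(df_float.dtypes["x"], first_value, math.isnan(second_value))
```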

Actual behavior

I get the exception shown in the traceback above:

TypeError: boolean value of NA is ambiguous

Additional info

My knee-jerk reaction: I saw that in client/write/dataframe_serializer.py there is a function:

def _not_nan(x):
    return x == x

which I think can just be

def _not_nan(x):
    from ...extras import pd
    # pd.isna is True for NaN, None and pd.NA, so negate it to keep the "not NaN" meaning
    return not pd.isna(x)
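As a standalone sketch of that idea (not the library's actual code): the helpers below import pandas directly instead of going through the client's extras module, and the names `not_na`, `any_not_na`, `rows` and `field_indexes` are made up for illustration. The row tuples only assume that the serializer iterates over plain tuples, which is what the traceback hints at.

```python
import pandas as pd


def not_na(x):
    # pd.isna returns True for float NaN, None and pd.NA when given a scalar.
    return not pd.isna(x)


def any_not_na(row, indexes):
    # Keep a row if at least one of its field values is present.
    return any(not_na(row[i]) for i in indexes)


# Hypothetical rows shaped as (time, int_field, float_field).
rows = [
    (0, 1, 2.5),
    (1, pd.NA, float("nan")),  # every field missing, so the row is dropped
    (2, pd.NA, 3.0),           # one field present, so the row is kept
]
field_indexes = [1, 2]

kept = [row[0] for row in rows if any_not_na(row, field_indexes)]
print(kept)
```

With this check the filter never asks for the truth value of `pd.NA`, so the nullable integer column is filtered the same way as a float column.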

However, I saw this block of code:

                if null_columns[index]:
                    key_value = f"""{{
                            '' if {val_format} == '' or type({val_format}) == float and math.isnan({val_format}) else
                            f',{key_format}={{str({val_format}).translate(_ESCAPE_STRING)}}'
                        }}"""

which looks pretty crazy, and I am not sure how the data would look at that point?


ianog-eng · 2024-03-12T18:54:27Z

I have exactly the same issue

yannsartori added the bug label Jul 6, 2023

bednar mentioned this issue Apr 2, 2024

fix: serialize Pandas NaN values into LineProtocol #648 (Merged)

bednar self-assigned this Apr 15, 2024

bednar closed this as completed in #648 Apr 16, 2024
