WriteApi.write does not support pandas' nullable integer #590

Closed
yannsartori opened this issue Jul 6, 2023 · 1 comment · Fixed by #648

@yannsartori

Specifications

  • Client Version: 1.36.1
  • InfluxDB Version: 2.7.0
  • Platform: Mac

If a dataframe has a column with pandas' nullable integer dtype (Int64 in the sample below) and a row contains a pd.NA value, WriteApi.write fails with the following traceback:

Traceback (most recent call last):
    write_api.write(
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 366, in write
    return self._write_batching(bucket, org, record,
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 469, in _write_batching
    serializer.serialize(chunk_idx),
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 270, in serialize
    return list(lp)
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 268, in <genexpr>
    lp = (re.sub('^(( |[^ ])* ),([a-zA-Z0-9])(.*)', '\\1\\3\\4', self.f(p))
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 269, in <lambda>
    for p in filter(lambda x: _any_not_nan(x, self.field_indexes), _itertuples(chunk)))
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 27, in _any_not_nan
    return any(map(lambda x: _not_nan(p[x]), indexes))
  File "pandas/_libs/missing.pyx", line 388, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

However, if you change the column datatype to a float (which has a native NaN encoding), it works.
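The difference between the two dtypes can be reproduced without InfluxDB at all. The sketch below is only an illustration: `not_nan_by_equality` is a stand-in name for an `x == x` style check like the one at the bottom of the traceback, not the library's own code.

```python
import pandas as pd


def not_nan_by_equality(x):
    # Stand-in for an `x == x` check: NaN is the only float
    # that does not compare equal to itself.
    return x == x


# Float NaN: the comparison yields a plain False, so the value can be filtered.
assert not_nan_by_equality(float("nan")) is False

# pd.NA: the comparison propagates NA instead of returning a bool.
result = not_nan_by_equality(pd.NA)
assert result is pd.NA

# any() then has to call bool() on that NA, which pandas refuses to do.
try:
    any([result])
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)
```

Here `raised` ends up `True` and `message` carries the same "boolean value of NA is ambiguous" text as the traceback.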

Code sample to reproduce problem

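import pandas as pd

# get_client() and BUCKET are not defined in this report; they stand for
# a helper returning an InfluxDB client and the name of the target bucket.
df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})
with get_client() as client:
    with client.write_api() as write_api:
        # Fails while serializing the row whose "x" value is pd.NA
        write_api.write(BUCKET, record=df, data_frame_measurement_name="test", data_frame_timestamp_column="time")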

Expected behavior

I would anticipate that this behaves the same as if it were a float. My current workaround is to use floats.
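For anyone hitting the same error, the float workaround might look like the sketch below. It only covers the dataframe conversion; the write call itself is unchanged from the code sample. Note that float64 cannot represent every integer above 2**53 exactly, so this is not a lossless substitute for Int64.

```python
import math

import pandas as pd

df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})

# Cast the nullable integer column to float64 so that pd.NA becomes NaN,
# which has a native float encoding.
df_float = df.astype({"x": "float64"})

print(df_float.dtypes)
print(math.isnan(df_float["x"].iloc[1]))
```

The converted `df_float` can then be passed as `record=` in place of `df`.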

If this is too complicated to fix or would incur a significant slowdown for other users, I think that, at minimum, raising a cleaner exception would be reasonable.
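As a rough idea of what a cleaner exception could look like, here is a sketch of a check that could run before serialization. `check_no_nullable_na` is a hypothetical helper invented for this illustration; it is not part of influxdb_client, and the error wording is only a suggestion.

```python
import pandas as pd


def check_no_nullable_na(data_frame):
    """Hypothetical pre-write check: reject columns that contain pd.NA."""
    offending = [
        str(name)
        for name, column in data_frame.items()
        # Dtypes whose missing-value marker is pd.NA (e.g. Int64) expose it as
        # `na_value`; plain NumPy dtypes have no such attribute.
        if getattr(column.dtype, "na_value", None) is pd.NA
        and column.isna().any()
    ]
    if offending:
        raise TypeError(
            "Columns with pd.NA values are not supported: "
            + ", ".join(offending)
            + ". Consider casting them to float first."
        )


df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})
try:
    check_no_nullable_na(df)
except TypeError as exc:
    print(exc)
```

The check only looks at dtypes and a vectorized `isna()`, so it should add little overhead compared with per-row inspection.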

Actual behavior

I get the same exception as in the traceback above: TypeError: boolean value of NA is ambiguous

Additional info

My knee-jerk reaction: in client/write/dataframe_serializer.py, there is a function:

def _not_nan(x):
    return x == x

which I think can just be

def _not_nan(x):
    from ...extras import pd
    # pd.isna is True for missing values, so negate it to keep the "not NaN" meaning
    return not pd.isna(x)

However, I saw this block of code:

                if null_columns[index]:
                    key_value = f"""{{
                            '' if {val_format} == '' or type({val_format}) == float and math.isnan({val_format}) else
                            f',{key_format}={{str({val_format}).translate(_ESCAPE_STRING)}}'
                        }}"""

which looks pretty convoluted, and I am not sure what the data would look like at that point.
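To get a feel for what a row with a missing Int64 value looks like when it is iterated, and how a null-aware check would treat it, here is a small standalone sketch. The names `not_nan` and `any_not_nan` mirror the helpers from the traceback but are reimplemented here for illustration; `field_indexes` is an assumed list of field positions. The sketch uses pandas' own `itertuples`, which may differ from the serializer's internal iteration.

```python
import math

import pandas as pd


def not_nan(x):
    # Null-aware variant: pd.isna is True for NaN, None, NaT and pd.NA.
    return not pd.isna(x)


def any_not_nan(row, indexes):
    # Keep a row only if at least one of its field values is present.
    return any(not_nan(row[i]) for i in indexes)


df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})
rows = list(df.itertuples(index=False, name=None))
field_indexes = [0]  # assumption: "x" is the only field column

missing = rows[1][0]
print(missing is pd.NA)  # the missing value arrives as the pd.NA singleton
print(str(missing))      # its string form is "<NA>"

# A float-only NaN test does not recognize pd.NA as missing.
float_only_check = type(missing) == float and math.isnan(missing)

kept = [row for row in rows if any_not_nan(row, field_indexes)]
```

Under these assumptions `float_only_check` is `False`, which suggests the `type(...) == float and math.isnan(...)` guard in the template above would also need a null-aware test to skip pd.NA values; `kept` contains only the first row.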

yannsartori added the bug label on Jul 6, 2023
@ianog-eng

I have exactly the same issue.
