This repository was archived by the owner on Oct 29, 2024. It is now read-only.

DataFrameClient issue - seems does not process correctly DateTimeIndex dates #479

Closed
JohnSamler opened this issue Jul 19, 2017 · 6 comments


JohnSamler commented Jul 19, 2017

I'm having problems uploading dataframes into InfluxDB, using write_points.

The problem is due to this part of the code:

File _dataframe_client.py

        # Make array of timestamp ints
        if isinstance(dataframe.index, pd.PeriodIndex):
            time = ((dataframe.index.to_timestamp().values.astype(int) /
                     precision_factor).astype(int).astype(str))
        else:
            time = ((pd.to_datetime(dataframe.index).values.astype(int) /
                     precision_factor).astype(int).astype(str))

That int cast doesn't work in my code. I tested casting datetimes to ints directly, and this is what I get:

    In [24]: numpy.datetime64('2016-01-05T00:00:00.000000000').astype(int)
    Out[24]: -1226113024
    In [25]: numpy.datetime64('2016-01-04T00:00:00.000000000').astype(int)
    Out[25]: 630980608
    In [26]: numpy.version.version
    Out[26]: '1.11.3'
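
For what it's worth, the wrap-around can be reproduced directly (a sketch using the 2016-01-05 timestamp above): a nanosecond Unix epoch needs 64 bits, and a 32-bit cast silently wraps modulo 2**32, which is exactly where the -1226113024 comes from.

```python
import numpy as np

# Reproducing the garbage value above: a nanosecond Unix epoch needs
# 64 bits; a 32-bit cast silently wraps modulo 2**32.
ts = np.array(['2016-01-05T00:00:00.000000000'], dtype='datetime64[ns]')

ns64 = ts.astype(np.int64)    # exact nanoseconds since the epoch
ns32 = ns64.astype(np.int32)  # wraps: this is the -1226113024 above

print(ns64[0])  # 1451952000000000000
print(ns32[0])  # -1226113024
```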

I had to modify the code to cast to np.int64 for it to work (not using PeriodIndex):

        # Make array of timestamp ints
        if isinstance(dataframe.index, pd.PeriodIndex):
            time = ((dataframe.index.to_timestamp().values.astype(int) /
                     precision_factor).astype(int).astype(str))
        else:
            time = ((pd.to_datetime(dataframe.index).values.astype(np.int64) /
                     precision_factor).astype(np.int64).astype(str))

I don't think the cast as currently coded is correct; either that, or a case is missing.
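
A minimal sketch of the corrected conversion path for a plain DatetimeIndex, assuming second precision (i.e. precision_factor = 1e9, converting nanoseconds to seconds); the names mirror the snippet above:

```python
import numpy as np
import pandas as pd

# Sketch of the fixed non-PeriodIndex branch, with precision_factor = 1e9
# (nanoseconds -> seconds). np.int64 keeps the full epoch value intact.
index = pd.DatetimeIndex(['2016-01-04', '2016-01-05'])
precision_factor = 1e9

time = ((pd.to_datetime(index).values.astype(np.int64) /
         precision_factor).astype(np.int64).astype(str))
print(time)  # ['1451865600' '1451952000']
```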

@patrickhoebeke
Contributor

patrickhoebeke commented Aug 23, 2017

I confirm the issue.
On my Ubuntu machine I do not see it: casting to int produces the correct epoch that Influx requires.
On Windows, the conversion is rather erratic when using int.
Casting to np.int64 fixes the issue on both operating systems.

So I propose the following modification:

    # Make array of timestamp ints
    if isinstance(dataframe.index, pd.PeriodIndex):
        time = ((dataframe.index.to_timestamp().values.astype(np.int64) /
                 precision_factor).astype(np.int64).astype(str))
    else:
        time = ((pd.to_datetime(dataframe.index).values.astype(np.int64) /
                 precision_factor).astype(np.int64).astype(str))

As a side note, from a performance point of view, direct casting appears faster. For example, with a DataFrame of 1 million rows:

    %timeit (pd.to_datetime(dataframe.index).values.astype(np.int64) / 1e9).astype(np.int64).astype(str)
    10 loops, best of 3: 113 ms per loop

    %timeit str(np.int64(np.int64(pd.to_datetime(dataframe.index).values) / 1e9))
    100 loops, best of 3: 5.65 ms per loop

(Note, though, that str(...) applied to the whole array returns a single string rather than an array of per-element strings, so the two expressions are not strictly equivalent.)

patrickhoebeke added a commit to patrickhoebeke/influxdb-python-pho that referenced this issue Aug 23, 2017
…to Unix Epoch (e.g .on (some?) Windows machines)
patrickhoebeke added a commit to patrickhoebeke/influxdb-python-pho that referenced this issue Aug 23, 2017
…erted to Unix Epoch (e.g .on (some?) Windows machines)
@patrickhoebeke
Contributor

It seems to be related to the number of bits used by the conversion to int (32-bit vs. 64-bit).
I did not have the issue on my machine (64-bit Linux), but my colleague hit it on his (64-bit Windows). It looks like the default int type on Windows is 32-bit, even on 64-bit machines (see here).
I could reproduce the bug on my Linux machine simply by using np.int32 in place of int.
So it appears astype(int) converts to 32 bits on some machines and 64 bits on others; using np.int64 solved the issue.
I've just created a pull request that fixes it (sorry, it is split across two commits, as my first fix was wrong).

Two remaining questions:

  1. Does this fix work on 32-bit machines? I fear that if we want to send nanosecond timestamps to Influx, np.int64 is a hard requirement. Should the code check for this at runtime?
  2. I am not sure whether a specific unit test is required.
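
On question 1, a sketch of the kind of runtime check hinted at: astype(int) uses NumPy's default integer, whose width historically followed the platform's C long (32-bit on Windows, 64-bit on most Linux/macOS builds), which is exactly why the bug is platform dependent.

```python
import numpy as np

# astype(int) resolves to NumPy's default integer, whose width is
# platform dependent (historically 32-bit on Windows, 64-bit elsewhere).
default_bits = np.dtype(int).itemsize * 8
print('default int width:', default_bits, 'bits')

# np.int64 is 64-bit everywhere, so nanosecond epochs always fit.
print('np.int64 width:', np.dtype(np.int64).itemsize * 8, 'bits')
```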

@patrickhoebeke
Contributor

I've just made a cleaner pull request here: #507

@xginn8 xginn8 closed this as completed in bf232a7 Nov 25, 2017
@mousumipaul

mousumipaul commented Dec 6, 2018

I am also facing the same error. I tried the following code:

    from datetime import datetime
    from influxdb import DataFrameClient
    import pandas as pd

    a = datetime.now()
    times = []
    times.append(a)
    df = pd.DataFrame({'A': [5]}, index=times)
    client = DataFrameClient('XXX.XXX.XX.XXX', 8086, 'XXX', 'XXX', 'XXXX')
    client.write_points(df, 'M1', protocol='json')

I got the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/influxdb/_dataframe_client.py", line 125, in write_points
        field_columns=field_columns)
      File "/usr/local/lib/python2.7/site-packages/influxdb/_dataframe_client.py", line 194, in _convert_dataframe_to_json
        dataframe.index = dataframe.index.to_datetime()
    AttributeError: 'DatetimeIndex' object has no attribute 'to_datetime'

The above code works fine with influxdb version 1.3, but with 1.5 I get the error above.

@patrickhoebeke
Contributor

patrickhoebeke commented Dec 6, 2018

@mousumipaul
I think it is related to the version of pandas you are using.
pandas.DatetimeIndex.to_datetime exists in pandas 0.22 but was removed in 0.23:
https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DatetimeIndex.html
https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DatetimeIndex.html
There are far fewer methods in 0.23.

Side note: your question is not related to the original issue of this post :-)
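
A version-agnostic sketch of the breaking line: pd.to_datetime accepts an index directly (and is effectively a no-op on a DatetimeIndex), so it can stand in for the DatetimeIndex.to_datetime() call that pandas 0.23 removed.

```python
import pandas as pd

# pd.to_datetime works on an index in both pandas 0.22 and 0.23+,
# unlike the removed DatetimeIndex.to_datetime() method.
df = pd.DataFrame({'A': [5]}, index=[pd.Timestamp('2018-12-06 12:00:00')])
df.index = pd.to_datetime(df.index)
print(type(df.index).__name__)  # DatetimeIndex
```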

@mousumipaul

@patrickhoebeke
Thanks Patrick for the answer. You are right: it is a pandas problem. I downgraded pandas from 0.23.0 to 0.22.0 and it worked perfectly.
