Dataframe client support for (i) tag columns and (ii) direct conversion to line protocol #364

mdbartos · 2016-08-29T20:43:11Z

This pull request addresses issues #362 and #363.

- tag_columns and field_columns can now be specified in the write_points method, allowing some columns to be treated as tags and others to be treated as fields. Global tags can still be specified using the tags keyword argument (meaning that this change shouldn't break any old code).
- Dataframes are now converted directly to line protocol. This results in a ~5x speed boost compared to the old method.
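
For readers unfamiliar with line protocol, here is a minimal sketch of how one point might be formatted as a line. It is not the code from this PR: the helper name to_line is hypothetical, escaping of special characters is deliberately left out, and tags are sorted by key only to keep the output deterministic.

```python
def to_line(measurement, tags, fields, timestamp):
    """Format one point as an InfluxDB line-protocol string.

    Hypothetical helper for illustration only; it does not escape
    commas, spaces, or equals signs in keys or values.
    """
    def format_field(value):
        # bool must be checked before int, because bool subclasses int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            # Integer field values carry an "i" suffix.
            return "{}i".format(value)
        if isinstance(value, float):
            return repr(value)
        # Everything else is written as a double-quoted string.
        return '"{}"'.format(value)

    # Tags follow the measurement name, each introduced by a comma.
    tag_part = "".join(
        ",{}={}".format(key, value) for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        "{}={}".format(key, format_field(value)) for key, value in fields.items()
    )
    return "{}{} {} {}".format(measurement, tag_part, field_part, timestamp)


line = to_line("M1", {"tag1": "a", "tag2": "b"}, {"UI": 1.5}, 1000)
```

Building strings like this for a whole dataframe at once, rather than constructing one dictionary per row, is the kind of shortcut that direct conversion makes possible.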

Additions:

_dataframe_client.py
- new functions:
  - _convert_dataframe_to_lines: Converts dataframe to line protocol.
  - _stringify_dataframe: Helper function for converting dataframe to string type.
- changed functions:
  - write_points: Added protocol ('line' or 'json') and numeric precision keyword args.
  - _convert_dataframe_to_json: Tag columns can now be specified.
client.py
- changed functions:
  - write: Added protocol ('line' or 'json') keyword arg, and support for direct line protocol.
  - write_points: Same as write.
  - _write_points: Same as write.
  - send_packet: Same as write.
dataframe_client_test.py
- new functions:
  - test_write_points_from_dataframe_with_tag_columns: Self-explanatory.
  - test_write_points_from_dataframe_with_tag_cols_and_global_tags: Self-explanatory.
  - test_write_points_from_dataframe_with_tag_cols_and_defaults: Tests default behavior (i.e. when tag columns are specified, but field columns aren't, etc.)
  - test_write_points_from_dataframe_with_numeric_precision: Tests for correct numeric precision behavior.
- changed functions:
  - (all tests): In expected_response, order of tags/fields was changed to match the order they appear in the dataframe (for more consistent testing)

Tag/field columns default behavior:

- If neither tag columns nor field columns are specified, all columns are assumed to be field columns (this is consistent with previous behavior).
- If tag columns are specified but field columns are not, all columns not listed in tag columns are assumed to be field columns.
- If field columns are specified but tag columns are not, all columns not listed in field columns are assumed to be tag columns.
- If both tag columns and field columns are specified, only the columns listed in one or the other are included in the write.

See test_write_points_from_dataframe_with_tag_cols_and_defaults in dataframe_client_test.py for examples of expected behavior.
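
The four default rules above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the implementation in _dataframe_client.py: the name resolve_columns is hypothetical, and treating an empty list the same as "not specified" is an assumption.

```python
def resolve_columns(columns, tag_columns=None, field_columns=None):
    """Split column names into (tag_columns, field_columns).

    Hypothetical helper that mirrors the default rules described in
    this PR; None and an empty list are both treated as "not specified".
    """
    columns = list(columns)
    tag_columns = list(tag_columns) if tag_columns else []
    field_columns = list(field_columns) if field_columns else []

    if not tag_columns and not field_columns:
        # Neither specified: every column is a field.
        field_columns = columns
    elif tag_columns and not field_columns:
        # Only tags specified: the remaining columns are fields.
        field_columns = [c for c in columns if c not in tag_columns]
    elif field_columns and not tag_columns:
        # Only fields specified: the remaining columns are tags.
        tag_columns = [c for c in columns if c not in field_columns]
    # Both specified: use them as given; unlisted columns are left out.
    return tag_columns, field_columns


cols = ["tag1", "tag2", "UI"]
only_tags = resolve_columns(cols, tag_columns=["tag1", "tag2"])
```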

Minor issues:

- Haven't tested with older versions of pandas.
- I left _convert_dataframe_to_json in for the time being, but it can probably be removed.
- To get the Travis build to work, I had to disable cache. Cache should be cleared.

When you get time to review, please let me know if you have any questions or concerns.

Thanks,
MDB

mdbartos · 2016-08-29T21:34:30Z

Python2.7 build is failing because Travis can't install pandas:

error: Error -5 while decompressing data: incomplete or truncated stream
...

ERROR: could not install deps [-r/home/travis/build/influxdata/influxdb-python/requirements.txt, -r/home/travis/build/influxdata/influxdb-python/test-requirements.txt, pandas]; v = InvocationError('/home/travis/build/influxdata/influxdb-python/.tox/py27/bin/pip install -r/home/travis/build/influxdata/influxdb-python/requirements.txt -r/home/travis/build/influxdata/influxdb-python/test-requirements.txt pandas (see /home/travis/build/influxdata/influxdb-python/.tox/py27/log/py27-1.log)', 2)

The cache might need to be cleared:
pypa/pip#3359

I can also try running a build with no cache.

tzonghao · 2016-08-30T01:17:09Z

influxdb/_dataframe_client.py

+                             datatype='field'):
+
+        # Find int and string columns for field-type data
+        int_columns = dataframe.select_dtypes(include=['int']).columns


Can we change this to include=['integer'] so other integer subtypes won't be treated as floats?
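
A short sketch of the suggestion, assuming a toy dataframe whose column names and dtypes are made up for illustration. With include=['integer'], pandas selects every integer subtype rather than a single integer dtype.

```python
import numpy as np
import pandas as pd

# Toy dataframe (hypothetical columns) mixing integer widths with a float.
df = pd.DataFrame({
    "a": np.array([1, 2], dtype="int64"),
    "b": np.array([1, 2], dtype="int32"),
    "c": np.array([1, 2], dtype="int16"),
    "d": np.array([1.0, 2.0], dtype="float64"),
})

# 'integer' maps to the generic numpy integer type, so all integer
# subtypes are selected and the float column is excluded.
integer_columns = list(df.select_dtypes(include=["integer"]).columns)
```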

mdbartos · 2016-08-30T07:08:49Z

@tzonghao: This change has been made, along with some other small improvements to _stringify_dataframe. I also added tests for floating point precision in test_write_points_from_dataframe_with_numeric_precision.

Also, for some reason, tox tests on my machine expect a retention policy duration of '0s' instead of '0' in tests.server_tests.client_test_with_server.CommonTests. This means that in order to get the Travis build to pass I have to fail the tox tests on my own machine. Not a big problem, but I'm not sure what's causing it. I'm using influxdb v0.13 on Manjaro Linux, built from AUR [InfluxDB version: InfluxDB v0.13.0 (git: unknown e57fb88a051ee40fd9277094345fbd47bb4783ce)].

tzonghao · 2016-08-30T18:09:16Z

@mdbartos Awesome pull request, it definitely makes things faster and easier. Thank you.

tzonghao · 2016-08-31T14:11:01Z

influxdb/_dataframe_client.py

+                    time_precision,
+                    database,
+                    retention_policy,
+                    protocol='line')


should be protocol=protocol

aviau · 2016-08-31T17:52:47Z

@tzonghao Thank you for reviewing this!

mdbartos · 2016-09-03T05:42:54Z

Thanks again for reviewing @tzonghao. These changes have been made.

tzonghao · 2016-09-06T13:40:42Z

@aviau @mdbartos You're welcome. Unless someone else wants to have another look, we're good to go.

aviau · 2016-09-06T14:09:56Z

Thank you @mdbartos

mdbartos · 2016-09-06T15:13:50Z

One last thing: I edited the .travis.yml file to get the build to work properly, but it should probably be changed back to the way it was (also, the Travis cache should be cleared to allow pandas to build properly on Python 2.7).

aviau · 2016-09-06T15:30:43Z

Yeah, I had noticed and was planning to fix it.

mousumipaul · 2018-05-15T08:31:29Z

Hi, I am new to InfluxDB. When I try to insert a dataframe into InfluxDB, I get the error "NameError: name '_convert_dataframe_to_lines' is not defined". In my code I have imported "from influxdb import dataframe_client".
My dataframe has a datetime index.
It has one field and two tags.
The code for the insertion is:

tags ={'tag1': df[['tag1']], 'tag2': df[['tag2']]}
dp = _convert_dataframe_to_lines(None, dataframe=df, measurement='M1', tag_columns=tags, field_columns=['UI'])
client.write_points(dp)

I have checked that _dataframe_client.py exists, but I can't figure out how to execute it.
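
A note on the error: "from influxdb import dataframe_client" binds only the module name, so a bare call to _convert_dataframe_to_lines raises NameError. The function is also underscore-prefixed, which marks it as an internal helper. Per the PR description, tag_columns and field_columns are passed to write_points instead. A minimal sketch follows, assuming client is an influxdb.DataFrameClient that is already connected and that tag columns are given as a list of column names (the list form is an assumption based on the description above; write_frame is a hypothetical wrapper).

```python
def write_frame(client, df):
    """Write df to measurement 'M1' with two tag columns and one field.

    `client` is assumed to be an influxdb.DataFrameClient instance and
    `df` a pandas DataFrame with a datetime index; this wrapper is
    hypothetical and only forwards to write_points.
    """
    return client.write_points(
        df,
        "M1",
        tag_columns=["tag1", "tag2"],  # column names, not dataframe slices
        field_columns=["UI"],
    )
```

The dataframe is passed whole; the conversion to line protocol happens inside write_points, so there is no need to call the converter directly.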

mdbartos added 5 commits August 29, 2016 04:04

Addressed issues 362 and 363

30e4e87

Added unit tests for tag columns. All tests working.

7e092ad

Added more comments and docstrings

1af9d4e

Rolled back changes to retention policy duration.

c3832a1

Added comments to _dataframe_client. Re-pushing to try and fix travis build.

c8e3e99

Try rebuilding without cache

423b8c9

mdbartos mentioned this pull request Aug 29, 2016

Convert dataframes directly to line protocol during 'write_points()' operation #363

Closed

tzonghao reviewed Aug 30, 2016

Minor changes to _stringify_dataframe. Added test for numeric precision.

56062c5

tzonghao reviewed Aug 31, 2016

Incorporated fixes from @tzonghao. Fixed docstrings.

15a3d33

aviau merged commit 1343ae9 into influxdata:master Sep 6, 2016

gte620v mentioned this pull request Aug 25, 2017

data frame with tag columns #286

Closed
