This repository was archived by the owner on Oct 29, 2024. It is now read-only.
Convert dataframes directly to line protocol during 'write_points()' operation #363
Comments
Tested this approach. It's about 5x faster than the current method when float precision is ignored, and roughly 3x faster when float precision is controlled (i.e. such that the output is exactly the same as the output produced under the current method). The following test uses a dataframe of 1 million entries divided into 5 columns.
Note that the implementation in the previous post has been updated.
Can you please open a pull request?
PR opened as #364
Addressed in #364
The current dataframe client's `write_points` method is probably not as efficient as it could be: the dataframe is first converted to a JSON-like dict with `_convert_dataframe_to_json`, and then written to line protocol with `make_lines`. Converting the dataframe to line protocol directly would avoid a lot of the overhead associated with iterating through dicts.

This could be done by implementing a simple `_convert_dataframe_to_lines` method in the dataframe client that takes advantage of pandas' vectorized string methods. The result could then be passed to the `write` method directly.

Here's an example usage:

This approach also uses the `tag_columns` functionality I proposed in #362.

This implementation could use some more thorough testing for edge cases, but I can start a PR if anyone is interested.
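For readers who want a concrete picture of the idea, here is a minimal sketch of a direct dataframe-to-line-protocol conversion built on column-wise string concatenation. It is not the code from the issue or from #364: the function name `dataframe_to_lines`, its signature, and the sample data are hypothetical. It assumes a timezone-naive `DatetimeIndex`, no missing values, and tag or field values that need no escaping (no spaces, commas, or quotes).

```python
import pandas as pd


def dataframe_to_lines(df, measurement, tag_columns=None):
    """Hypothetical helper: build one line-protocol string per row.

    Assumes a tz-naive DatetimeIndex, no NaN values, and no characters
    that would need escaping in tags or string fields.
    """
    tag_columns = list(tag_columns or [])
    field_columns = [c for c in df.columns if c not in tag_columns]

    # Tags: append ",key=value" for each tag column, one whole column at a time.
    tags = pd.Series("", index=df.index)
    for col in tag_columns:
        tags = tags + "," + col + "=" + df[col].astype(str)

    # Fields: format each column according to its dtype.
    field_parts = []
    for col in field_columns:
        values = df[col]
        if pd.api.types.is_bool_dtype(values):
            text = values.astype(str)            # True / False
        elif pd.api.types.is_integer_dtype(values):
            text = values.astype(str) + "i"      # integers carry an "i" suffix
        elif pd.api.types.is_float_dtype(values):
            text = values.astype(str)            # float precision not controlled
        else:
            text = '"' + values.astype(str) + '"'  # strings are double-quoted
        field_parts.append(col + "=" + text)

    fields = field_parts[0]
    for part in field_parts[1:]:
        fields = fields + "," + part

    # Timestamps: nanoseconds since the Unix epoch.
    epoch_ns = (df.index - pd.Timestamp("1970-01-01")) // pd.Timedelta("1ns")
    timestamps = pd.Series(epoch_ns.to_numpy(), index=df.index).astype(str)

    lines = measurement + tags + " " + fields + " " + timestamps
    return lines.tolist()


# Illustrative data only.
df = pd.DataFrame(
    {"host": ["a", "b"], "load": [0.5, 1.5], "count": [3, 4]},
    index=pd.to_datetime(["1970-01-01 00:00:01", "1970-01-01 00:00:02"]),
)
lines = dataframe_to_lines(df, "cpu", tag_columns=["host"])
print("\n".join(lines))
```

The loops run once per column rather than once per row, which is where a direct conversion would be expected to save time over building a dict for every point. Controlling float precision, as mentioned in the timing comment, would need an extra formatting step on the float columns.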