This repository was archived by the owner on Oct 29, 2024. It is now read-only.

Performance degradation with line protocol on master vs. v5.0.0 #591

Closed
shushen opened this issue Jun 1, 2018 · 0 comments · Fixed by #592

Comments


shushen commented Jun 1, 2018

On master, commit bf232a7 introduced the much-welcomed support for dropping NaN/null fields, but it does so by assembling the output line by line with the function format_line().

This has degraded performance significantly. The attached profiling diagram shows format_line() accounting for the majority of the time consumed.
[Image: performance degradation]

Writing an 86400-row dataframe, which took under 10 seconds with v5.0.0, now takes about 90 seconds in my test.
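To illustrate the kind of row-by-row assembly described above, here is a minimal sketch. It is not the library's format_line(); the function name `rowwise_lines`, the measurement name `m`, and the simplified output format (no tags, no escaping, no type suffixes) are all assumptions made for illustration. The point is only that a Python-level loop runs once per row, which becomes expensive for tens of thousands of rows.

```python
import pandas as pd


def rowwise_lines(df, measurement):
    """Hypothetical helper: build one line per row, skipping NaN fields."""
    lines = []
    # iterrows() yields one (index, Series) pair per row, so all of the
    # string formatting below runs in a Python loop for every row.
    for ts, row in df.iterrows():
        fields = ",".join(
            f"{name}={float(value)}"
            for name, value in row.items()
            if pd.notna(value)  # drop NaN/null fields
        )
        # Timestamp.value is the epoch time in nanoseconds.
        lines.append(f"{measurement} {fields} {ts.value}")
    return lines


df = pd.DataFrame(
    {"a": [1.0, float("nan")], "b": [2.0, 3.0]},
    index=pd.to_datetime(["2018-06-01", "2018-06-02"]),
)
lines = rowwise_lines(df, "m")
```

In this sketch the second row has no value for `a`, so only `b=3.0` appears in its line.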

shushen added a commit to shushen/influxdb-python that referenced this issue Jun 1, 2018
Assembling line by line in commit bf232a7 to remove NaN has a
significant performance impact.

This change fixes the issue by recording the NaN fields before
stringifying the dataframe, replacing those fields with empty strings,
and reverting to the pd.DataFrame.sum() function to yield the lines.

Fixes: influxdata#591
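The idea in the commit message above can be sketched roughly as follows. This is an illustrative approximation, not the code from the commit: the function name `vectorized_lines` and the simplified output format (no tags, no escaping) are assumptions. Where the commit sums the stringified dataframe with pd.DataFrame.sum(), this sketch adds the string columns together one at a time, which gives the same row-wise concatenation while keeping the work inside pandas column operations rather than a per-row Python loop.

```python
import pandas as pd


def vectorized_lines(df, measurement):
    """Hypothetical helper: build lines column by column, dropping NaN fields."""
    fields = pd.Series("", index=df.index)
    for col in df.columns:
        # Record which cells are NaN before the column is stringified,
        # because astype(str) would turn them into the text "nan".
        missing = df[col].isna()
        piece = str(col) + "=" + df[col].astype(str) + ","
        # Replace the NaN cells with an empty string so they add nothing.
        fields = fields + piece.mask(missing, "")
    # Each kept field ends with a comma; strip the trailing one per row.
    fields = fields.str.rstrip(",")
    # Integer epoch representation of the datetime index, as text.
    stamps = pd.Series(df.index.astype("int64"), index=df.index).astype(str)
    return (measurement + " " + fields + " " + stamps).tolist()


df = pd.DataFrame(
    {"a": [1.0, float("nan")], "b": [2.0, 3.0]},
    index=pd.to_datetime(["2018-06-01", "2018-06-02"]),
)
lines = vectorized_lines(df, "m")
```

The loop here runs once per column rather than once per row, so its cost does not grow with the number of rows in the way a per-row loop does. A row in which every field is NaN would produce an empty field set in this sketch and would need separate handling.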
shushen added a commit to shushen/influxdb-python that referenced this issue Jun 14, 2018
shushen added a commit to shushen/influxdb-python that referenced this issue Jun 29, 2018
xginn8 pushed a commit that referenced this issue Jun 30, 2018