Performance degradation with line protocol on master vs. v5.0.0 #591

shushen · 2018-06-01T19:19:32Z

On master, commit bf232a7 introduced the much-welcomed support for dropping NaN/null fields. However, it does so by assembling the output line by line with the function format_line().

This has degraded performance significantly. The attached profiling diagram shows format_line() accounting for the majority of the time consumed.

Writing an 86400-row dataframe, which took under 10 seconds with v5.0.0, now takes about 90 seconds in my test.


Assembling the output line by line in commit bf232a7 to remove NaN fields has a significant performance impact. This change fixes the issue by keeping track of the NaN fields before stringifying the dataframe, replacing those fields with empty strings, and reverting to the pd.DataFrame.sum() function to yield the lines. Fixes: influxdata#591
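To illustrate the difference between the two approaches described in the commit message, here is a minimal sketch. It is not the library's code: the function names, the simplified `measurement fields timestamp` layout, and the integer index used as a timestamp are all assumptions made for illustration, and tags, escaping, and type suffixes are left out. The vectorized version concatenates column Series with `+` rather than calling pd.DataFrame.sum(), which is a stand-in for the same column-wise idea.

```python
import numpy as np
import pandas as pd


def lines_row_by_row(df, measurement):
    """Slow path (hypothetical): build every line in a Python loop,
    skipping NaN fields one row at a time."""
    lines = []
    for ts, row in df.iterrows():
        fields = ",".join(
            f"{col}={val}" for col, val in row.items() if not pd.isnull(val)
        )
        lines.append(f"{measurement} {fields} {ts}")
    return lines


def lines_vectorized(df, measurement):
    """Fast path (hypothetical): record the NaN mask first, stringify whole
    columns, blank out the NaN cells, then concatenate column-wise."""
    mask = df.isnull()  # remember which cells are NaN before stringifying
    fields = pd.Series("", index=df.index)
    for col in df.columns:
        part = col + "=" + df[col].astype(str) + ","
        # Replace the stringified "nan" cells with empty strings.
        fields = fields + part.where(~mask[col], "")
    fields = fields.str.rstrip(",")  # drop the trailing separator
    ts = df.index.to_series().astype(str)
    return (measurement + " " + fields + " " + ts).tolist()


# Small all-float frame; the index stands in for a timestamp.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 4.0], "b": [2.5, 3.0, np.nan]},
    index=[1000, 2000, 3000],
)
```

Both functions should produce the same lines for this frame; the loop version does per-row Python work, while the vectorized one does a fixed number of column operations regardless of row count, which is where the speed difference on an 86400-row frame would come from.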


shushen mentioned this issue Jun 1, 2018

Fix performance degradation with line protocol #592

Merged

xginn8 closed this as completed in #592 Jun 30, 2018
