This repository was archived by the owner on Oct 29, 2024. It is now read-only.
Performance degradation with line protocol on master vs. v5.0.0 #591
shushen added a commit to shushen/influxdb-python that referenced this issue on Jun 1, 2018:

Assembling the lines one by one, as introduced in commit bf232a7 to remove NaN fields, has a significant performance impact. This change fixes the issue by keeping track of the NaN fields before stringifying the DataFrame, replacing those fields with empty strings, and reverting to the pd.DataFrame.sum() function to yield the lines. Fixes: influxdata#591
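The approach described in this commit message can be sketched as follows. This is a minimal sketch, not the code from the commit: it assumes a DataFrame with a DatetimeIndex and only numeric field columns (no tags, string fields or escaping), and `make_lines_vectorized` is a hypothetical name rather than part of influxdb-python's API. It records the NaN positions before stringifying, replaces those fields with empty strings, and builds the lines with whole-column string operations instead of formatting one row at a time.

```python
import pandas as pd


def make_lines_vectorized(df, measurement):
    """Hypothetical sketch: build InfluxDB line-protocol strings column-wise,
    dropping NaN fields.

    Assumes a DatetimeIndex and numeric field columns only (no tags, no
    string fields, no escaping); the real client handles more cases.
    """
    # 1. Record where the NaN fields are *before* stringifying.
    nan_mask = df.isna()

    # 2. Stringify whole columns at once as ",key=value".
    pieces = pd.DataFrame(
        {col: "," + str(col) + "=" + df[col].astype(str) for col in df.columns},
        index=df.index,
    )

    # 3. Replace the NaN fields with empty strings.
    pieces = pieces.mask(nan_mask, "")

    # 4. Concatenate the columns row-wise with vectorized "+" and drop the
    #    leading comma. (The commit uses pd.DataFrame.sum() for this step.)
    fields = pieces.iloc[:, 0]
    for col in pieces.columns[1:]:
        fields = fields + pieces[col]
    fields = fields.str[1:]

    # 5. Nanosecond timestamps taken from the index.
    ts = df.index.to_numpy().astype("datetime64[ns]").astype("int64").astype(str)
    ts = pd.Series(ts, index=df.index)

    return (measurement + " " + fields + " " + ts).tolist()
```

Chaining `+` over the columns performs the same string concatenation explicitly, without depending on how a particular pandas version reduces string columns in `sum()`. Every step is a whole-column operation, so the number of Python-level calls grows with the number of columns rather than the number of rows.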
A commit with the same message was added to shushen/influxdb-python again on Jun 14, 2018 and on Jun 29, 2018, each referencing this issue.
xginn8 pushed a commit with the same message that referenced this issue on Jun 30, 2018 (Fixes: #591).
On master, commit bf232a7 introduced the much-welcomed support for dropping NaN/null fields. However, it does so by assembling the output line by line with the function `format_line()`, which has degraded performance significantly. The attached profiling diagram shows that `format_line()` accounts for the majority of the consumed time. In my test, writing an 86400-row DataFrame, which took less than 10 seconds with v5.0.0, now takes about 90 seconds.
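To illustrate the row-at-a-time pattern described above, here is a minimal pure-Python sketch. `format_row` and `format_rows` are hypothetical names; this is not the code from bf232a7 or influxdb-python's `format_line()`, and it assumes each row's fields arrive as a plain dict of numbers. Every row costs a Python function call plus per-field string formatting, so an 86400-row DataFrame means 86400 such calls.

```python
import math


def format_row(measurement, fields, timestamp_ns):
    """Hypothetical per-row formatter that drops NaN/None fields.

    Not influxdb-python's format_line(); it only illustrates the
    row-at-a-time pattern.
    """
    kept = [
        f"{key}={value}"
        for key, value in fields.items()
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    ]
    return f"{measurement} {','.join(kept)} {timestamp_ns}"


def format_rows(measurement, rows):
    """Format (timestamp_ns, fields) pairs one at a time.

    Each row costs one Python call to format_row(), so the work grows
    linearly with the number of rows.
    """
    return [format_row(measurement, fields, ts) for ts, fields in rows]


if __name__ == "__main__":
    # 86400 rows, one per second, with a NaN field that must be dropped.
    rows = [
        (i * 1_000_000_000, {"value": float(i), "gap": float("nan")})
        for i in range(86400)
    ]
    lines = format_rows("cpu", rows)
    print(lines[0])  # cpu value=0.0 0
```

Wrapping this in a profiler would attribute most of the time to the per-row function, which is the kind of overhead the issue's profile points at.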