Problems writing data #612

Open
tschm opened this issue Jul 9, 2018 · 3 comments

Comments


tschm commented Jul 9, 2018

I am a bit disappointed and surprised by the write speed I observe with my InfluxDB. However, I am probably doing something very wrong and would appreciate any pointers in the right direction.

I am using the latest Docker image.

I have a DataFrame with 10 columns and 10000 rows, i.e. 100,000 data points. Writing them to the database took about 30 seconds! I am running Ubuntu on a machine with 16 GB of RAM and an SSD.

        import numpy as np
        import pandas as pd

        x = pd.date_range(start=pd.Timestamp("2010-01-01"), periods=10000, freq="D")
        y = pd.DataFrame(index=x, data=np.random.randn(10000, 10))
        print(y)

I write column by column (all into the same measurement, using the column name as a tag), e.g.

        for key, data in y.items():
            # every column represents a different tag
            self.client.series_upsert(ts=data, tags={"Global tag": "Peter Maffay", "name": key}, field="random", measurement="measure")
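
A single bulk write of the whole frame would avoid the per-column round trips. Here is a rough sketch of that idea, assuming a DataFrameClient connected to the same database (the measurement, tag, and field names mirror the snippet above; the connection details are hypothetical):

        import numpy as np
        import pandas as pd
        from influxdb import DataFrameClient

        # hypothetical connection details
        client = DataFrameClient(host="localhost", port=8086, database="mydb")

        # reshape wide (time x column) into long rows: one row per (time, column) pair
        long_form = y.stack().reset_index()
        long_form.columns = ["time", "name", "random"]
        long_form["name"] = long_form["name"].astype(str)  # tag values must be strings
        long_form = long_form.set_index("time")

        # one call writes all 100,000 points; the client batches internally
        client.write_points(long_form, "measure",
                            tags={"Global tag": "Peter Maffay"},  # tags shared by every row
                            tag_columns=["name"],                 # per-row tag from the column name
                            field_columns=["random"],
                            batch_size=10000, protocol="line")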

I have tried other methods (e.g. the SeriesHelper), but the speed has never really picked up.
Here's the decisive fragment from my own client (I inherit from your standard client):

    def series_upsert(self, ts, tags, field, measurement):
        if len(ts) > 0:
            json_body = [{'measurement': measurement, 'time': t, 'fields': {field: float(x)}}
                         for t, x in ts.items()]
            self.influxclient.write_points(json_body, time_precision="s", tags=tags, batch_size=10000)
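
For comparison, a variant of the fragment above that builds line-protocol strings instead of per-point JSON dicts. This is a sketch, assuming the wrapped client is influxdb-python's InfluxDBClient; as far as I can tell the tags= argument only applies to the JSON protocol, so the tags are embedded in each line:

    def series_upsert_lines(self, ts, tags, field, measurement):
        # hypothetical alternative to series_upsert: emit line protocol directly
        if len(ts) > 0:
            # spaces in tag keys/values must be escaped in line protocol
            tag_str = ",".join(
                "{}={}".format(str(k).replace(" ", "\\ "), str(v).replace(" ", "\\ "))
                for k, v in tags.items())
            lines = [
                "{},{} {}={} {:d}".format(measurement, tag_str, field,
                                          float(x), int(t.timestamp()))
                for t, x in ts.items()]
            # with protocol='line', write_points sends the strings as-is
            self.influxclient.write_points(lines, time_precision="s",
                                           batch_size=10000, protocol="line")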

tschm commented Jul 9, 2018


epa095 commented Jul 18, 2018

5.2.0 was released a day after you reported this; try that one. Also try 5.0.0 and see whether it is equally slow.
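
To pin and verify which client version is actually in use (a quick sketch, assuming a pip-managed environment):

    # pick a version to compare, e.g.:
    #   pip install influxdb==5.0.0
    #   pip install influxdb==5.1.0
    import influxdb
    print(influxdb.__version__)  # confirm what the interpreter actually imports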


flikka commented Jul 19, 2018

We also see massive performance problems when writing large-ish data frames with version 5.1.0. One example (see below) takes 5 seconds on my local laptop to write a million lines (one column) with version 5.0.0, but with version 5.1.0 it takes minutes (in fact I gave up before it finished). On the current master the performance is back to the 5.0.0 level. The current latest version on PyPI, 5.2.0, is broken (see #616), so I'll use 5.0.0 or master for now. I guess the fix in #617 will reach PyPI soon, though.

I have no idea what the cause is; perhaps something is wrong with our InfluxDB setup itself...

The example below produces this output on master, and similar numbers on 5.0.0:

    InfluxDB python version: 5.0.0
    Number of points: 10000
    Time: 0.20903187998919748
    Number of points: 100000
    Time: 0.6698060079943389
    Number of points: 1000000
    Time: 5.531810616987059

But for version 5.1.0 this is how it looks:

    InfluxDB python version: 5.1.0
    Number of points: 10000
    Time: 3.2126710579905193
    Number of points: 100000
    Time: 34.53647285097395
    Number of points: 1000000
    <Aborted here, didn't bother to wait>

Example code used (tried against both a 1.5.x and a 1.6.0 InfluxDB instance):

import influxdb
from influxdb import DataFrameClient
import numpy as np
import pandas as pd
from timeit import default_timer as timer

host = '0.0.0.0'
port = 8086
user = 'admin'
password = ''
db_name = 'test'

def simple_test(num_points, batch_size):
    client = DataFrameClient(host, port, user, password, db_name)
    x = pd.date_range(start=pd.Timestamp("2010-01-01"), periods=num_points, freq="S")
    y = pd.DataFrame(index=x, data=np.random.randn(num_points, 1))

    client.create_database(db_name)
    client.write_points(y, "perf-test", batch_size=batch_size, protocol='line')
    client.drop_database(db_name)

if __name__ == '__main__':
    print("InfluxDB python version: {}".format(influxdb.__version__))
    num_points_list = [10000, 100000, 1000000]
    batch_size = 10000
    for num_points in num_points_list:
        print("Number of points: {}".format(num_points))
        start = timer()
        simple_test(num_points, batch_size)
        end = timer()
        elapsed = end - start
        print("Time: {}".format(elapsed))
