Still can’t query big datasets #800
Comments
@rbdm-qnt thanks for opening this. Are you really doing a SELECT * over the whole database? Try adding a time constraint to your query to reduce the data being scanned.
I have no alternative to SELECT *: I'm querying financial data and I need all fields or the data doesn't make sense. This is about 4.5 years of data in total; if I query more than 3 months of it I run into this issue. I can't be making 14 separate queries every single time, then patching together the results and reimporting them; this is the reason I have a database and not CSV files in the first place. How can I make Influx return single rows to my functions, erase them from RAM, and then get the next row, and so on? It must be possible one way or another.
@rbdm-qnt You can loop your code and issue the query multiple times, each with a different time range. It's possible you might also be able to use https://docs.influxdata.com/influxdb/v1.7/query_language/data_exploration/#the-offset-and-soffset-clauses to accomplish this, but I would use time filters first.
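A rough sketch of the OFFSET approach, assuming the connection details and measurement name from the report below (the page size is an arbitrary placeholder):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='127.0.0.1', port=8086, database='x')  # placeholder connection
offset = 0
page = 10000  # arbitrary page size; tune to available RAM
while True:
    # LIMIT/OFFSET keeps each individual query small
    q = 'SELECT * FROM "x" LIMIT {} OFFSET {}'.format(page, offset)
    points = list(client.query(q).get_points())
    if not points:
        break
    for point in points:
        pass  # process each row here
    offset += page

(Per the linked docs, OFFSET paginates points while SOFFSET paginates series.)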
Sounds like a plan. I'll try this right away and report back in a few hours. Shouldn't the chunk function do exactly that automatically? What about stream?
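For reference, chunking is requested like this in influxdb-python (the chunk_size value is arbitrary); note that, per the original report below, this alone did not stream rows lazily or bound memory in this setup:

from influxdb import InfluxDBClient

client = InfluxDBClient(host='127.0.0.1', port=8086, database='x')  # placeholder connection
# chunked and chunk_size are parameters of query(); this asks the server
# to send the response in chunks of 10,000 points
result = client.query('SELECT * FROM "x"', chunked=True, chunk_size=10000)
for point in result.get_points():
    pass  # in this setup the full response was still read into memory first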
So, I've tried looping my query in various ways with no luck:
I really don't know what else to try. I've read every Google result and every piece of documentation about this, and went through the entire Python client. I'm out of ideas.
Asked a friend to try a couple queries on his InfluxDB through Grafana:
@rbdm-qnt thanks for trying all those different combinations. Here is an example query that works with InfluxQL and returns data, so you can use those date formats. Could you try manually running that query, for a day of data that you think is valid, directly against your database, and see if anything comes back? Something like this?
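A representative one-day query in that style; the measurement name x and the dates are placeholders, not from the thread:

SELECT * FROM "x" WHERE time >= '2019-01-01T00:00:00Z' AND time < '2019-01-02T00:00:00Z'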
Thanks for this syntax. Partial good news; this:
I tried all of those with both:
@rbdm-qnt you should be able to use
YES! That fixed it. So, to recap for anyone reading in the future: If you need to make queries bigger than your RAM, put the query in a while loop and use this syntax:
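A minimal sketch of that pattern, assuming RFC3339 time bounds and a 30-day window (the measurement name, credentials, date range, and window size are all placeholders):

from datetime import datetime, timedelta
from influxdb import InfluxDBClient

client = InfluxDBClient(host='127.0.0.1', port=8086,
                        username='x', password='x', database='x')
start = datetime(2015, 1, 1)   # placeholder: start of the dataset
end = datetime(2019, 7, 1)     # placeholder: end of the dataset
window = timedelta(days=30)    # keep each query comfortably under RAM

while start < end:
    stop = min(start + window, end)
    # Each query is bounded by a time filter, so only one window is scanned
    q = ("SELECT * FROM \"x\" WHERE time >= '{0}Z' AND time < '{1}Z'"
         .format(start.isoformat(), stop.isoformat()))
    for point in client.query(q).get_points():
        pass  # process/append each row, then let it go out of scope
    start = stop

Each iteration scans only one window, so peak memory stays proportional to the window size rather than to the whole dataset.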
I'm on InfluxDB version 1.7.9 and the Python client 5.2.3.
I have a database that weighs around 28 GB, and I'm trying to query it from a Mac (macOS 10.12.6, if that's relevant) with 16 GB of RAM, using the Python client (on Python 3.7).
I've been fighting this issue for a week now. At the beginning I would get this error:
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Then I read issues #450, #523, #531, #538, and #753, implemented the changes from #753, and now when I run the query Python simply gives me:
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
I was hoping that would turn my client.query into a generator that yields each row as it arrives, so I can process it, free the RAM, and fetch the next row. Basically streaming. result is now a generator, but that does not happen anyway.
This is my Python code:
from datetime import datetime
import time
import pandas as pd
from influxdb import InfluxDBClient

client = InfluxDBClient(host='127.0.0.1', port='8086', username='x', password='x', database='x')
q = 'SELECT * FROM "x"'
result = client.query(q, chunked=True).get_points()
print("Query done " + str(datetime.utcfromtimestamp(time.time())))
df = pd.DataFrame()  # here I initialize an empty dataframe
for msg in result:
    pass  # here I iterate through the results, process them, append them to the dataframe, and finally save the dataframe to a CSV
The 'x' values are redacted.
Thanks in advance