This repository was archived by the owner on Oct 29, 2024. It is now read-only.

Fix chunked query to return chunk resultsets #753

Merged
sebito91 merged 1 commit into influxdata:master from hrbonz:wip/chunked_query on Apr 10, 2020

Conversation

@hrbonz (Contributor) commented on Sep 8, 2019

When querying large data sets, it's vital to get chunked responses to manage memory usage. Wrapping the query response in a generator and streaming the request provides the desired result.
It also fixes `InfluxDBClient.query()` behavior for chunked queries, which currently does not work according to spec.

Closes #585
Closes #531
Closes #538
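
A minimal usage sketch of the chunked behavior described above. The host, database, and measurement names are assumptions for illustration, not taken from this PR:

```python
from influxdb import InfluxDBClient

# Assumed connection details and schema, for illustration only.
client = InfluxDBClient(host="localhost", port=8086, database="telemetry")

# With chunked=True, query() returns a generator that yields one ResultSet
# per chunk instead of loading the whole response into memory at once.
chunks = client.query("SELECT * FROM cpu", chunked=True, chunk_size=10000)

for resultset in chunks:                  # one ResultSet per chunk
    for point in resultset.get_points():  # points within that chunk
        print(point)                      # handle one point at a time here
```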

@hrbonz (Contributor, Author) commented on Sep 8, 2019

I'm currently running the tox tests and will fix whatever fails in Travis CI.

@hrbonz force-pushed the wip/chunked_query branch 2 times, most recently from 2a2d459 to a6bc05d, on September 8, 2019
hrbonz added a commit to hrbonz/influxdump that referenced this pull request Sep 14, 2019
Option to dump data into a folder as fragment files of a given chunk size.
This only works if the chunksize option is properly implemented in
influxdb-python (see influxdata/influxdb-python#753).
@psy0rz commented on Oct 9, 2019

Does this also close #538?

@hrbonz (Contributor, Author) commented on Oct 11, 2019

Yes, it covers that as well. I've had issues with my laptop, so I got caught in a bit of a snag; I'll get back to fixing the tests.
The tests were designed to pass rather than to reflect the method's documented behavior.

@hrbonz force-pushed the wip/chunked_query branch from a6bc05d to 3832bd3 on March 23, 2020
@hrbonz marked this pull request as ready for review on March 23, 2020
@hrbonz (Contributor, Author) commented on Mar 23, 2020

I had to change the test substantially, as it was testing the broken behavior rather than the documented (and, imho, better) one.
I'm now testing that I get the proper number of chunked ResultSets and that each ResultSet returns the proper elements.
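
For reference, a rough sketch of the kind of test described above, using requests_mock as the project's test suite does. The payload and names are illustrative assumptions, not the actual test code:

```python
import json

import requests_mock
from influxdb import InfluxDBClient


def test_chunked_query_yields_one_resultset_per_chunk():
    # InfluxDB streams chunked responses as newline-delimited JSON documents.
    chunk = {"results": [{"statement_id": 0,
                          "series": [{"name": "cpu",
                                      "columns": ["time", "value"],
                                      "values": [["2020-03-23T00:00:00Z", 1]]}]}]}
    body = "\n".join([json.dumps(chunk), json.dumps(chunk)])

    with requests_mock.Mocker() as m:
        m.register_uri(requests_mock.GET, "http://localhost:8086/query", text=body)
        client = InfluxDBClient(database="db")
        resultsets = list(client.query("SELECT * FROM cpu",
                                       chunked=True, chunk_size=1))

    # One ResultSet per chunk, each carrying its own points.
    assert len(resultsets) == 2
    assert all(len(list(rs.get_points())) == 1 for rs in resultsets)
```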

@hrbonz (Contributor, Author) commented on Mar 23, 2020

On a side note, I've been using this code to move data from an old version of InfluxDB to 1.7 (doing some type casting in the middle) with no issues: about 50G of data across 1500 measurements.
Without proper chunking and the generator, my memory usage was going through the roof.
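
For context, a rough sketch of that kind of streaming copy. The hosts, database name, and the cast_point() helper are placeholders rather than the actual migration script:

```python
from influxdb import InfluxDBClient

# Placeholder endpoints and database name; adjust to the real servers involved.
src = InfluxDBClient(host="old-influx", port=8086, database="metrics")
dst = InfluxDBClient(host="new-influx", port=8086, database="metrics")


def cast_point(measurement, point):
    """Placeholder for the type casting done 'in the middle': numeric-looking
    values are coerced to float, everything else is kept as-is."""
    fields = {}
    for key, value in point.items():
        if key == "time" or value is None:
            continue
        try:
            fields[key] = float(value)
        except (TypeError, ValueError):
            fields[key] = value
    return {"measurement": measurement, "time": point["time"], "fields": fields}


for meas in (m["name"] for m in src.get_list_measurements()):
    # Stream each measurement in chunks so memory usage stays flat.
    chunks = src.query('SELECT * FROM "{}"'.format(meas),
                       chunked=True, chunk_size=10000)
    for resultset in chunks:
        points = [cast_point(meas, p) for p in resultset.get_points()]
        if points:
            dst.write_points(points, batch_size=5000)
```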

When querying large data sets, it's vital to get chunked responses to
manage memory usage. Wrapping the query response in a generator and
streaming the request provides the desired result.
It also fixes `InfluxDBClient.query()` behavior for chunked queries,
which currently does not work according to the
[specs](https://github.com/influxdata/influxdb-python/blob/master/influxdb/client.py#L429).

Closes influxdata#585.
Closes influxdata#531.
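
A simplified sketch of the generator-over-streamed-response approach the commit message describes. This is illustrative only; the function name and parameters are assumptions, not the code merged in this PR:

```python
import json

import requests
from influxdb.resultset import ResultSet


def read_chunked_query(url, params):
    # stream=True keeps requests from buffering the whole body in memory.
    response = requests.get(url, params=params, stream=True)
    for line in response.iter_lines():  # one JSON document per chunk
        if not line:
            continue
        data = json.loads(line)
        for result in data.get("results", []):
            # Yield a ResultSet per result in the chunk instead of
            # accumulating everything into a single response object.
            yield ResultSet(result)
```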
@russorat (Contributor) commented

@hrbonz Thank you! I've reached out to @sebito91 to take a look.

@hrbonz (Contributor, Author) commented on Apr 7, 2020

Thanks @russorat, hopefully this PR gets some attention while the new library is getting all the love ;)

@sebito91 self-assigned this on Apr 8, 2020
@sebito91 (Contributor) left a comment

This is great stuff, thank you @hrbonz!

@sebito91 merged commit c903d73 into influxdata:master on Apr 10, 2020
ocworld pushed a commit to AhnLab-OSS/influxdb-python that referenced this pull request Apr 13, 2020
When querying large data sets, it's vital to get chunked responses to
manage memory usage. Wrapping the query response in a generator and
streaming the request provides the desired result.
It also fixes `InfluxDBClient.query()` behavior for chunked queries,
which currently does not work according to the
[specs](https://github.com/influxdata/influxdb-python/blob/master/influxdb/client.py#L429).

Closes influxdata#585.
Closes influxdata#531.
Closes influxdata#538.
@hrbonz (Contributor, Author) commented on Apr 13, 2020

@sebito91 quite happy to see it finally done and merged! Now to ping @psy0rz for the inspiration.

@psy0rz commented on Apr 14, 2020

awesome! thank YOU for actually implementing it all the way :)

Successfully merging this pull request may close these issues.

Out of memory error when querying large data set with chunked=True