Conversation
Looks good! Thanks for contributing. Can you add a test to make sure we don't drop that parameter going forward? Also, it looks like the flake8 lint failed (from Travis).
I'm pretty sure this PR is incorrect: you're confusing requests' chunk_size with influxdb's chunk_size. They are two separate things. In influxdb it means the number of POINTS per line in the HTTP response; in requests it means the number of bytes. I'm not sure what you're trying to fix?
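For illustration, a minimal sketch of the two unrelated parameters. The endpoint, database name, and query here are assumptions, not code from this PR:

```python
import requests

# InfluxDB's chunk_size: the number of POINTS per chunk in the query response.
params = {
    "q": "SELECT * FROM cpu",
    "db": "telegraf",
    "chunked": "true",
    "chunk_size": 10000,  # points per chunk returned by the server
}
resp = requests.get("http://localhost:8086/query", params=params, stream=True)

# requests' chunk_size: the number of BYTES read from the socket per iteration.
for line in resp.iter_lines(chunk_size=512):  # 512 bytes, not 512 points
    print(line)
```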
@psy0rz the issue here is that the client code partially implements chunking, i.e. it defines the associated variables but never actually does anything with them. This PR properly implements those variables and passes them to `response.iter_lines`. Is there a better way to do this?
@xginn8 apologies for the delay - I used my fork for my migration project and haven't needed to come back to this. The CI tests were failing because of a setuptools break that showed up a month or two ago; I believe an updated setuptools will fix the problem. Or do you see something else? The actual problem I'm trying to fix is a performance problem: for extracting ~70k measurements, the client would peg the CPU for 60s, whereas with the specified chunk size it takes a couple of seconds. You can refer to the issue at #524. I can also look at adding a test to make sure the param isn't dropped.
I don't understand... influxdb chunking is something different than HTTP chunking, right?
I fixed your problem with PR #538. It's called streaming mode.
Great, then once that's approved I can do a test. If it solves my performance problem then I'll close this.
I'm traversing 20 million items without very much CPU. The problem you're having is the same one I and everyone else had: the result is combined in memory and then returned as one big result, so it takes a lot of time and memory. Perhaps you're willing to try it by cloning my fork from my GitHub page? It would be great if you could test it :)
@psy0rz is correct, this PR is confusing influx's chunk size with HTTP's (I added the support for influx chunk size). The perceived 'better performance' is only because a 512-byte chunk size on the HTTP request naturally results in fewer influx data points, which take less time to be turned into a list. Returning a generator instead of a pre-generated list is the right way to go, yes, but it breaks compatibility, which is why it was not included in the original chunked responses PR. That said, the existing code leaves a lot of room for better performance even without resorting to generators, which do not help if the client is going to iterate over all items in the generator anyway. If breaking compatibility is feasible, I'd be up for taking a stab at rewriting the bulk of the series parsing code to that end.
My PR is backwards compatible.
But it does not solve the issue of point parsing being bottlenecked on the client (Python) side.
The parsing performance, you mean? My patches only add a generator mode, which will solve memory issues and supports infinite result sets. Performance will be roughly the same with smaller result sets.
Yes. 'Infinite' result sets are not a thing unless you have infinite amounts of RAM :) (influx will truncate). Like I said, a generator does not change performance at all when the user really does want to parse all results in the generator. Performance is in both cases bottlenecked on the Python client side.
I'm using my patch successfully with result sets of over 200 million data points.
And is doing
No, only use streaming mode with large result sets you want to iterate over. If you have a result set that you want to keep in memory anyway, the old method might be faster. (For other people reading this, we're talking about #538.)
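For readers following along, a rough sketch of the generator idea being discussed: parse each chunk of the HTTP response as it arrives and yield points one at a time, rather than building the full list in memory first. This is not the actual code from #538; the function name, endpoint, and parameters are assumptions.

```python
import json
import requests

def stream_points(query, db, host="localhost", port=8086, chunk_size=10000):
    """Yield points one at a time from a chunked InfluxDB query response.

    Illustrative sketch only: authentication and error handling are omitted.
    """
    params = {"q": query, "db": db, "chunked": "true", "chunk_size": chunk_size}
    resp = requests.get("http://%s:%s/query" % (host, port),
                        params=params, stream=True)
    for line in resp.iter_lines():  # the server emits one JSON document per chunk
        if not line:
            continue
        chunk = json.loads(line)
        for result in chunk.get("results", []):
            for series in result.get("series", []):
                for values in series.get("values", []):
                    yield dict(zip(series["columns"], values))

# Memory stays bounded even for very large result sets:
# for point in stream_points("SELECT * FROM cpu", db="telegraf"):
#     handle(point)
```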
Yes, the point is that the overhead of result parsing is significant and bottlenecks clients that do want to process all results. That parsing performance, as well as the memory footprint (for a full result set), has a lot of room for improvement, for which perhaps completely new result-gathering functions and classes would be best so as not to impact backwards compatibility (at the expense of having multiple ways to get results). If enough users feel that parsing performance and memory requirements for a full result set are an issue, then I can take a stab at it; it is impacting my own uses of the library as well.
`chunk_size=0` was already a kwarg param for `client.query()`, but it was not actually passed through to `response.iter_lines`. This PR properly passes the parameter down, defaulting to `requests.models.ITER_CHUNK_SIZE`.
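As a rough sketch of what the change amounts to (the helper name and surrounding logic are simplified assumptions, not the client's exact code):

```python
from requests.models import ITER_CHUNK_SIZE

def _read_chunked_response(response, chunk_size=ITER_CHUNK_SIZE):
    # chunk_size is now forwarded to iter_lines instead of being accepted
    # by query() and then silently ignored.
    for line in response.iter_lines(chunk_size=chunk_size):
        if line:
            yield line
```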