OTLPMetricExporter fails to send more than 4MB of data #2710

Closed
overmeulen opened this issue May 24, 2022 · 14 comments · Fixed by #2809
Labels: bug, metrics

Comments

@overmeulen
Contributor

Describe your environment
Python 3.6.8
Opentelemetry Python 1.12.0
Opentelemetry Collector 0.38.0

Steps to reproduce
Metrics SDK with quite a few observables generating more than 4MB of data (datapoints)
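For illustration, a minimal reproduction sketch along these lines; the endpoint, instrument names, and point counts are made up, and the import paths are those of recent opentelemetry-python releases (they differed slightly around 1.12):

```python
# Hypothetical repro: register enough observable gauges, each reporting many
# distinctly-labelled points, that a single collection exceeds ~4MB on export.
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.metrics import CallbackOptions, Observation, get_meter, set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader


def observe_many(options: CallbackOptions):
    # ~20k data points with distinct attributes per collection cycle
    return [Observation(1, {"index": str(i)}) for i in range(20_000)]


reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="localhost:4317", insecure=True),
    export_interval_millis=10_000,
)
set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = get_meter("repro")
for g in range(10):
    meter.create_observable_gauge(f"big_gauge_{g}", callbacks=[observe_many])
```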

What is the expected behavior?
The datapoints are sent to the collector without any problem

What is the actual behavior?
The export to the collector fails with StatusCode.RESOURCE_EXHAUSTED
The exporter keeps on sending the same batch over and over until the data gets dropped

Additional context
One solution would be to have a configurable "max batch size" in the OTLPMetricExporter, like there is today in the BatchLogProcessor for logs.
Another solution would be for the OTLP gRPC exporter to automatically retry with a smaller batch if it receives a StatusCode.RESOURCE_EXHAUSTED?
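To make the first suggestion concrete, here is a minimal sketch of count-based batching; `chunked` and `send` are hypothetical names, not part of the SDK:

```python
# Hypothetical sketch of a "max batch size": split the points collected in one
# export cycle into fixed-size chunks and send each chunk as its own request.
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield successive chunks of at most max_batch_size items."""
    for start in range(0, len(items), max_batch_size):
        yield list(items[start:start + max_batch_size])


# Usage, assuming `data_points` is the flattened list of points collected in
# one cycle and `send` wraps the gRPC export call:
# for batch in chunked(data_points, 512):
#     send(batch)
```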

overmeulen added the bug label on May 24, 2022
@aabmass
Member

aabmass commented May 24, 2022

Thanks for trying out the metrics SDK 🙂

The difference between metrics and traces/logs here is that all the metrics come in at once and there is no batching in the SDK. It simply evaluates all of the observable instruments, and that is what's causing the issue. Do folks think we should add this batching mechanism to the PeriodicExportingMetricReader so it can buffer metrics into the exporter?

Another solution would be for the OTLP gRPC exporter to automatically retry with a smaller batch if it receives a StatusCode.RESOURCE_EXHAUSTED?

+1, there is this proposal, open-telemetry/opentelemetry-proto#390, which I believe would make this work?

@srikanthccv
Member

Do folks think we should add this batching mechanism to the PeriodicExportingMetricReader so it can buffer metrics into the exporter?

How does this help? The volume of data that reaches the exporter would still be the same size for each export cycle, right?

@aabmass
Member

aabmass commented May 24, 2022

It would serve as a place to configure the batch size, and then the PeriodicExportingMetricReader would call the exporter once per batch. @srikanthccv would you prefer to just have the individual exporters handle batching on their own?

@srikanthccv
Member

Yes, I think we already do this in some exporters which take care of protocol- or encoding-specific limits. I would like to hear more about the batching in metrics. I am trying to understand whether each collect only limits the number of points collected and then calls the exporter, or whether the collection gets all the data at once and then calls the exporter multiple times.

@srikanthccv
Member

We briefly discussed this today. We should look at the spec for the correct status code for errors related to payload size, and see whether the response from the collector includes the acceptable size, so the batch can be divided into chunks before exporting.
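A rough sketch of the split-and-retry idea being discussed, assuming a hypothetical `send(batch)` helper that raises `grpc.RpcError`; this is not the exporter's actual retry logic:

```python
import grpc


def export_with_split(batch, send):
    """Send a batch; on RESOURCE_EXHAUSTED, split it in half and retry each half."""
    try:
        send(batch)
    except grpc.RpcError as err:
        # err.code() is available on errors raised by unary gRPC calls
        if err.code() != grpc.StatusCode.RESOURCE_EXHAUSTED or len(batch) <= 1:
            raise
        mid = len(batch) // 2
        export_with_split(batch[:mid], send)
        export_with_split(batch[mid:], send)
```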

@overmeulen
Contributor Author

It would be great to have this kind of behavior, but I think we should also be able to configure a "max batch size" in the OTLPMetricExporter.
If at each interval I generate 5000 data points for a total size of 6MB, I don't want the first export request to fail every time.
The automatic downsizing of the batch when receiving StatusCode.RESOURCE_EXHAUSTED would be great for sporadic errors.

lzchen added the metrics label on Jun 10, 2022
@aabmass
Member

aabmass commented Jun 23, 2022

It would be great to have this kind of behavior but I think we should also be able to configure a "max batch size" in the OTLPMetricExporter.

@overmeulen any chance you'd be willing to send a PR for this?

@overmeulen
Contributor Author

@aabmass
Member

aabmass commented Jun 23, 2022

Thanks, I'll assign you the issue. We haven't implemented the HTTP exporter yet, so that seems reasonable to me.

@srikanthccv
Member

Is there an env var or something similar spec'd to configure this max batch size?

@overmeulen
Contributor Author

@overmeulen
Contributor Author

PR created and ready to be reviewed

@aabmass
Member

aabmass commented Sep 7, 2022

Just adding the discussion from PR #2809, which adds a max_export_batch_size. @overmeulen said:

I don't really have a recommendation for the max_export_batch_size; it highly depends on the type of metrics and the number of attributes...
Batching on the byte size would indeed be better but much more complex.
The idea here was to add a first level of protection against this 4MB limit but as you said it won't completely prevent you from reaching the limit from time to time.

We are going ahead with this for now to keep it simple. Two alternatives would be:

  • Calling ByteSize() on the protobufs to check that the request doesn't exceed 4MB (or a configurable limit) before sending (see the sketch at the end of this comment). This could be computationally expensive if not done carefully, since byte size calculation is recursive (I believe the protobuf lib does cache this, though). But it would keep the batches as large as possible.
  • Splitting the original request into chunks after receiving a RESOURCE_EXHAUSTED response, as mentioned earlier in this issue (#2710).

There's also the option of sending requests in parallel, whereas #2809 sends each chunk serially.
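For the first alternative, a sketch of ByteSize()-based packing; `pack_requests` and `MAX_BYTES` are made-up names, and the size check is approximate since repeated-field framing adds a few bytes per element:

```python
# Hypothetical sketch: greedily pack ResourceMetrics into export requests,
# starting a new request whenever the serialized size would exceed the limit.
from opentelemetry.proto.collector.metrics.v1.metrics_service_pb2 import (
    ExportMetricsServiceRequest,
)

MAX_BYTES = 4 * 1024 * 1024  # default gRPC max message size


def pack_requests(resource_metrics_list):
    """Yield ExportMetricsServiceRequest messages, each roughly under MAX_BYTES."""
    request = ExportMetricsServiceRequest()
    for rm in resource_metrics_list:
        if request.resource_metrics and request.ByteSize() + rm.ByteSize() > MAX_BYTES:
            yield request
            request = ExportMetricsServiceRequest()
        request.resource_metrics.append(rm)
    if request.resource_metrics:
        yield request
```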

@aabmass
Member

aabmass commented Sep 7, 2022
