OTLPMetricExporter fails to send more than 4MB of data #2710

Closed
overmeulen opened this issue May 24, 2022 · 14 comments · Fixed by #2809
Labels: bug, metrics

Comments

@overmeulen
Contributor

Describe your environment
Python 3.6.8
Opentelemetry Python 1.12.0
Opentelemetry Collector 0.38.0

Steps to reproduce
Metrics SDK with quite a few observables generating more than 4MB of data (datapoints)
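For illustration, a minimal reproduction sketch along these lines; the endpoint, instrument names, and point counts are made up, and the import paths are those of recent opentelemetry-python releases (they differed slightly around 1.12):

```python
# Hypothetical repro: register enough observable gauges, each reporting many
# distinctly-labelled points, that a single collection exceeds ~4MB on export.
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.metrics import CallbackOptions, Observation, get_meter, set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader


def observe_many(options: CallbackOptions):
    # ~20k data points with distinct attributes per collection cycle
    return [Observation(1, {"index": str(i)}) for i in range(20_000)]


reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="localhost:4317", insecure=True),
    export_interval_millis=10_000,
)
set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = get_meter("repro")
for g in range(10):
    meter.create_observable_gauge(f"big_gauge_{g}", callbacks=[observe_many])
```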

What is the expected behavior?
The datapoints are sent to the collector without any problem

What is the actual behavior?
The export to the collector fails with StatusCode.RESOURCE_EXHAUSTED
The exporter keeps on sending the same batch over and over until the data gets dropped

Additional context
One solution would be to have a configurable "max batch size" in the OTLPMetricExporter, like there is today in the BatchLogProcessor for logs.
Another solution would be for the OTLP gRPC exporter to automatically retry with a smaller batch if it receives a StatusCode.RESOURCE_EXHAUSTED?
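To make the first suggestion concrete, here is a minimal sketch of count-based batching; `chunked` and `send` are hypothetical names, not part of the SDK:

```python
# Hypothetical sketch of a "max batch size": split the points collected in one
# export cycle into fixed-size chunks and send each chunk as its own request.
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield successive chunks of at most max_batch_size items."""
    for start in range(0, len(items), max_batch_size):
        yield list(items[start:start + max_batch_size])


# Usage, assuming `data_points` is the flattened list of points collected in
# one cycle and `send` wraps the gRPC export call:
# for batch in chunked(data_points, 512):
#     send(batch)
```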

overmeulen added the bug label on May 24, 2022
@aabmass
Member

aabmass commented May 24, 2022

Thanks for trying out the metrics SDK 🙂

The difference between metrics and traces/logs here is that all the metrics come in at once and there is no batching in the SDK. It simply evaluates all of the observable instruments, and that is what's causing the issue. Do folks think we should add this batching mechanism to the PeriodicExportingMetricReader so it can buffer metrics into the exporter?

Another solution would be for the OTLP gRPC exporter to automatically retry with a smaller batch if it receives a StatusCode.RESOURCE_EXHAUSTED?

+1, there is this proposal, open-telemetry/opentelemetry-proto#390, which I believe would make this work?

@srikanthccv
Member

Do folks think we should add this batching mechanism to the PeriodicExportingMetricReader so it can buffer metrics into the exporter?

How does this help? The volume of data that reaches the exporter would still be the same size for each export cycle, right?

@aabmass
Member

aabmass commented May 24, 2022

It would serve as a place to configure the batch size, and then the PeriodicExportingMetricReader would call the exporter once per batch. @srikanthccv would you prefer to just have the individual exporters handle batching on their own?

@srikanthccv
Member

Yes, I think we already do this in some exporters which take care of protocol- or encoding-specific limits. I would like to hear more about the batching in metrics. I am trying to understand whether each collect only limits the number of points collected and then calls the exporter, or whether the collection gets all the data at once and then calls the exporter multiple times.

@srikanthccv
Member

We briefly discussed this today. We should look at the spec for the correct status code for errors related to payload size, and see whether the response from the collector includes the acceptable size, so the batch can be divided into chunks before exporting.
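A rough sketch of the split-and-retry idea being discussed, assuming a hypothetical `send(batch)` helper that raises `grpc.RpcError`; this is not the exporter's actual retry logic:

```python
import grpc


def export_with_split(batch, send):
    """Send a batch; on RESOURCE_EXHAUSTED, split it in half and retry each half."""
    try:
        send(batch)
    except grpc.RpcError as err:
        # err.code() is available on errors raised by unary gRPC calls
        if err.code() != grpc.StatusCode.RESOURCE_EXHAUSTED or len(batch) <= 1:
            raise
        mid = len(batch) // 2
        export_with_split(batch[:mid], send)
        export_with_split(batch[mid:], send)
```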

@overmeulen
Contributor Author

It would be great to have this kind of behavior, but I think we should also be able to configure a "max batch size" in the OTLPMetricExporter.
If at each interval I generate 5000 data points for a total size of 6MB, I don't want the first export request to fail every time.
The automatic downsizing of the batch when receiving StatusCode.RESOURCE_EXHAUSTED would be great for sporadic errors.

lzchen added the metrics label on Jun 10, 2022
@aabmass
Member

aabmass commented Jun 23, 2022

It would be great to have this kind of behavior but I think we should also be able to configure a "max batch size" in the OTLPMetricExporter.

@overmeulen any chance you'd be willing to send a PR for this?

@overmeulen
Contributor Author

@aabmass
Member

aabmass commented Jun 23, 2022

Thanks, I'll assign you the issue. We haven't implemented the HTTP exporter yet, so that seems reasonable to me.

@srikanthccv
Member

Is there an env var or something similar spec'd to configure this max batch size?

@overmeulen
Contributor Author

@overmeulen
Contributor Author

PR created and ready to be reviewed

@aabmass
Member

aabmass commented Sep 7, 2022

Just adding the discussion from PR #2809, which adds a max_export_batch_size. @overmeulen said:

I don't really have a recommendation for the max_export_batch_size; it highly depends on the type of metrics and the number of attributes...
Batching on the byte size would indeed be better but much more complex.
The idea here was to add a first level of protection against this 4MB limit but as you said it won't completely prevent you from reaching the limit from time to time.

We are going ahead with this for now to keep it simple. Two alternatives would be:

  • Calling ByteSize() on the protobufs to check that the request doesn't exceed 4MB (or a configurable limit) before sending (see the sketch at the end of this comment). This could be computationally expensive if not done carefully, since byte size calculation is recursive (I believe the protobuf lib does cache this, though). But it would keep the batches as large as possible.
  • Splitting the original request into chunks after receiving a RESOURCE_EXHAUSTED response, as mentioned earlier in this issue (#2710).

There's also the option of sending requests in parallel, whereas #2809 sends each chunk serially.
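For the first alternative, a sketch of ByteSize()-based packing; `pack_requests` and `MAX_BYTES` are made-up names, and the size check is approximate since repeated-field framing adds a few bytes per element:

```python
# Hypothetical sketch: greedily pack ResourceMetrics into export requests,
# starting a new request whenever the serialized size would exceed the limit.
from opentelemetry.proto.collector.metrics.v1.metrics_service_pb2 import (
    ExportMetricsServiceRequest,
)

MAX_BYTES = 4 * 1024 * 1024  # default gRPC max message size


def pack_requests(resource_metrics_list):
    """Yield ExportMetricsServiceRequest messages, each roughly under MAX_BYTES."""
    request = ExportMetricsServiceRequest()
    for rm in resource_metrics_list:
        if request.resource_metrics and request.ByteSize() + rm.ByteSize() > MAX_BYTES:
            yield request
            request = ExportMetricsServiceRequest()
        request.resource_metrics.append(rm)
    if request.resource_metrics:
        yield request
```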

@aabmass
Member

aabmass commented Sep 7, 2022
