KAFKA-19599: Reduce the frequency of ReplicaNotAvailableException thrown to clients when RLMM is not ready #20345

kamalcph · 2025-08-12T17:10:12Z

During broker restarts, the topic-based RemoteLogMetadataManager (RLMM)
rebuilds its state by reading the internal __remote_log_metadata
topic. While a partition is not yet ready to perform remote storage
operations, a ReplicaNotAvailableException is thrown back to the
consumer, and the client retries the request immediately.

This results in a large number of FETCH requests on the broker, which
ties up the request handler threads. This patch uses a CountDownLatch
to reduce how often ReplicaNotAvailableException is thrown back to
clients, which improves request handler thread usage on the broker.
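
To illustrate the general idea (not the code in this PR), here is a minimal sketch of gating a request on a per-partition CountDownLatch with a bounded wait. The class name `ReadinessGate`, its methods, the string partition key, and the timeout parameter are all hypothetical, and `IllegalStateException` stands in for ReplicaNotAvailableException so the example stays self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names and structure are assumptions,
// not the classes touched by this PR.
final class ReadinessGate {
    // One latch per partition, counted down once its metadata has been loaded.
    private final ConcurrentMap<String, CountDownLatch> latches = new ConcurrentHashMap<>();

    private CountDownLatch latchFor(String partition) {
        return latches.computeIfAbsent(partition, p -> new CountDownLatch(1));
    }

    // Called by the (assumed) metadata loader when the partition becomes ready.
    void markReady(String partition) {
        latchFor(partition).countDown();
    }

    // Blocks for at most timeoutMs; returns true if the partition became ready.
    boolean awaitReady(String partition, long timeoutMs) throws InterruptedException {
        return latchFor(partition).await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Fails only after the bounded wait, so an immediately retrying client
    // gets at most one error per timeout interval instead of one per request.
    void ensureReady(String partition, long timeoutMs) throws InterruptedException {
        if (!awaitReady(partition, timeoutMs)) {
            // Stand-in for ReplicaNotAvailableException.
            throw new IllegalStateException(
                "Partition " + partition + " is not ready for remote storage operations");
        }
    }
}
```

The trade-off in a sketch like this is that the calling thread is held for up to the timeout, so the timeout has to stay short; once the latch is counted down, later calls return immediately.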

Previously, with a single consumer, the broker received ~9K
FetchConsumer requests / sec while the RLMM was not ready for a
partition. With this patch, the number of FETCH requests is reduced by
95% to 600 / sec.


kamalcph · 2025-08-12T17:14:20Z

@satishd @showuon

Call for review. PTAL. Thanks!

github-actions bot added the triage, storage, tiered-storage, and small labels · Aug 12, 2025
