KAFKA-19296: Revise AsyncKafkaConsumer event wakeup approach #20323

Draft: wants to merge 3 commits into base branch trunk

kirktrue (Contributor) commented Aug 7, 2025

Remove use of wakeup() call to alert the network thread that there's work to do.

github-actions bot added the triage, consumer, clients, and small labels on Aug 7, 2025

kirktrue (Contributor, Author) commented Aug 7, 2025

Results for August 7, 2025 testing

Metrics

The metrics shown below come from three separate test runs:

  1. Test run CLASSIC uses the CLASSIC group protocol from trunk
  2. Test run CONSUMER uses the CONSUMER group protocol, also from trunk
  3. Test run CONSUMER (branch) uses the CONSUMER group protocol from my current development branch (KAFKA-19296-revise-wakeups)

Results:

Metric                                CLASSIC   CONSUMER (trunk)  CONSUMER (branch)
bytes-consumed-rate             161948243.756       48869612.029       49132111.330
bytes-consumed-total          52099983328.000    52098998117.000    52099095023.000
fetch-latency-avg                      30.181             10.375              8.385
fetch-latency-max                     395.000            446.000            274.000
fetch-rate                            158.164             47.708             47.954
fetch-size-avg                    1024027.703        1024429.637        1023398.415
fetch-size-max                    1033664.000        1033664.000        1033664.000
fetch-throttle-time-avg                 0.000              0.000              0.000
fetch-throttle-time-max                 0.000              0.000              0.000
fetch-total                         50876.000          50881.000          50883.000
records-consumed-rate              310836.150          93798.193          94303.477
records-consumed-total           99999968.000       99998077.000       99998263.000
records-lag-max                  28765980.000       26517085.000       26606365.000
records-lead-min                 13050850.000       15613310.000       15524026.000
records-per-request-avg              1965.504           1966.276           1964.296
commit-sync-time-ns-total               0.000              0.000              0.000
committed-time-ns-total                 0.000              0.000              0.000
incoming-byte-rate              162512956.322       49048728.884       49405380.666
incoming-byte-total           52302818435.000    52305011340.000    52306097118.000
io-ratio                                0.238              0.093              0.100
io-time-ns-avg                       3761.542           2267.901            830.057
io-time-ns-total              77782691789.000   100466922476.000   108825346390.000
io-wait-ratio                           0.313              0.522              0.410
io-wait-time-ns-avg                  4946.433          12676.522           3391.519
io-wait-time-ns-total        111955075540.000   575715209235.000   436677230932.000
last-poll-seconds-ago                   0.000              0.000              0.000
network-io-rate                      2289.783            664.182            701.663
network-io-total                   731587.000         698801.000         739721.000
outgoing-byte-rate                  17636.502           5356.321           5397.161
outgoing-byte-total               5678537.000        5713940.000        5712254.000
poll-idle-ratio-avg                       NaN                NaN                NaN
request-rate                          158.645             48.113             48.497
request-size-avg                      111.170            111.327            111.289
request-size-max                      237.000            203.000            202.000
request-total                       51080.000          51341.000          51333.000
response-rate                         158.669             48.129             48.497
response-total                      51076.000          51341.000          51333.000
select-rate                         63253.029          41203.090         120756.753
select-total                     20355063.000       43510072.000      128998886.000
time-between-poll-avg                   0.016              0.053              0.053
time-between-poll-max                  74.000             10.000            272.000
Average CPU load                       77.34%            145.66%            117.61%
P99 CPU load                           77.50%            148.00%            120.00%
Max CPU load                           77.50%            148.00%            120.00%
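
To compare the trunk and branch columns, the percentage change for a few metrics can be computed directly from the table. This is a minimal sketch: the values are copied from the table above, while the choice of metrics and the helper name percent_change are illustrative assumptions and not part of the test harness.

```python
# Values copied from the results table: (CONSUMER trunk, CONSUMER branch).
# The selection of metrics here is illustrative, not exhaustive.
metrics = {
    "records-consumed-rate": (93798.193, 94303.477),
    "fetch-latency-avg": (10.375, 8.385),
    "io-wait-time-ns-avg": (12676.522, 3391.519),
    "select-rate": (41203.090, 120756.753),
    "Average CPU load (%)": (145.66, 117.61),
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {name: percent_change(trunk, branch)
           for name, (trunk, branch) in metrics.items()}

for name, change in changes.items():
    print(f"{name:<24} {change:+8.1f}%")
```

On these numbers, the branch shows roughly 19% lower average CPU load and average fetch latency than trunk at essentially unchanged throughput, while select-rate nearly triples.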

Execution

The test is run on a single AWS instance targeting a six-node cluster. The data on the topic is pre-populated with 100,000,000 messages of 512 bytes each before any of the tests are run. A single “warm up” run of the performance test is executed before executing the three test runs mentioned above.
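
As a sanity check on the data set size, the nominal payload can be compared with the consumed-byte totals in the results table. A minimal sketch; the constants come from the description above and the CLASSIC column of the table, and the variable names are illustrative.

```python
# Test setup as described: 100,000,000 messages of 512 bytes each.
NUM_MESSAGES = 100_000_000
MESSAGE_SIZE_BYTES = 512

# Nominal payload on the topic, in bytes.
payload_bytes = NUM_MESSAGES * MESSAGE_SIZE_BYTES

# Totals reported for the CLASSIC run in the results table.
bytes_consumed_total = 52_099_983_328
records_consumed_total = 99_999_968

# Average bytes accounted per consumed record by the consumer metrics.
bytes_per_record = bytes_consumed_total / records_consumed_total

print(f"nominal payload:  {payload_bytes:,} bytes")
print(f"bytes per record: {bytes_per_record}")
```

The ratio works out to 521 bytes per consumed record, 9 bytes more than the 512-byte message size. The source does not say what accounts for the difference.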

The command to execute the test is:

kafka-consumer-perf-test.sh \
  --bootstrap-server $BOOTSTRAP_SERVER \
  --topic $TOPIC_NAME \
  --messages $NUM_MESSAGES \
  --consumer.config conf/consumer-$GROUP_PROTOCOL.properties \
  --show-detailed-stats \
  --print-metrics

NUM_MESSAGES is set to 100000000 (100 million).
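
The command above is parameterized by GROUP_PROTOCOL, so the per-protocol invocations can be generated rather than typed by hand. The following is only a sketch under stated assumptions: the lowercase protocol labels, the example broker address, and the topic name are hypothetical, and build_perf_command is an illustrative helper that builds the argument list without running anything.

```python
import shlex

# Hypothetical labels; the source only shows the $GROUP_PROTOCOL placeholder.
GROUP_PROTOCOLS = ["classic", "consumer"]


def build_perf_command(bootstrap_server, topic, num_messages, group_protocol):
    """Return the argument list for one perf-test run (does not execute it)."""
    return [
        "kafka-consumer-perf-test.sh",
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--messages", str(num_messages),
        "--consumer.config", f"conf/consumer-{group_protocol}.properties",
        "--show-detailed-stats",
        "--print-metrics",
    ]


# Placeholder broker and topic values, for illustration only.
commands = [
    build_perf_command("broker.example:9092", "perf-topic", 100_000_000, protocol)
    for protocol in GROUP_PROTOCOLS
]

for cmd in commands:
    print(shlex.join(cmd))
```

The third run, CONSUMER (branch), would presumably use the same arguments as the CONSUMER run but with the script taken from a build of the development branch.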

Client configuration

The consumer configuration sets max.poll.records=5, but is otherwise mostly defaults apart from connectivity/authentication:

bootstrap.servers=$BOOTSTRAP_SERVER
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="$USER_ID" \
  password='$USER_PASSWORD' ;
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.endpoint.identification.algorithm=
client.dns.lookup=use_all_dns_ips

# Key
max.poll.records=5
