Bug fix: KafkaConsumer.position() #577

Closed

stefanth (Contributor) commented Mar 9, 2016

KafkaConsumer.position() did not refresh the position correctly
when it called self._update_fetch_positions().

@@ -462,7 +462,7 @@ def _poll_once(self, timeout_ms):
             return records

         self._fetcher.init_fetches()
-        self._client.poll(timeout_ms)
+        self._client.poll(timeout_ms, sleep=True)

Owner

This is a substantial change - why? The current consumer design expects poll() to return immediately if there is nothing to block on.

Contributor Author

When a running consumer has nothing to do and poll is called in a loop (which I assume is the normal way to consume endlessly), it prints a constant stream of the following error:
"_poll called with a non-zero timeout and sleep=False -- but there was nothing to do. This can cause high CPU usage during idle."

This is similar to issue #494, which had been fixed, but it seems that commit b8c2097 broke it again. I feel it is unneeded work for every client to check, before every call to poll, that its partition assignment is non-empty. For the client, there is little difference between receiving no data because no partitions are currently assigned and receiving no data because the topic has no new messages. So allowing poll to sleep when a timeout is specified helps simplify usage of the library.
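
To illustrate the per-call check described above, here is a minimal sketch of what client code might look like if it had to guard every poll itself. It assumes only that the consumer exposes `assignment()` and `poll(timeout_ms=...)`, as kafka-python's KafkaConsumer does. The helper `poll_or_sleep` and the `FakeConsumer` stand-in are hypothetical names introduced here so the sketch runs without a broker. Neither is part of the library or of this PR.

```python
import time


def poll_or_sleep(consumer, timeout_ms, sleep=time.sleep):
    """Poll the consumer, but wait out the timeout if nothing is assigned.

    Hypothetical helper: `consumer` is assumed to provide assignment()
    and poll(timeout_ms=...).
    """
    if not consumer.assignment():
        # No partitions assigned, so there is nothing to fetch from.
        # Sleep instead of spinning through poll() in a tight loop.
        sleep(timeout_ms / 1000.0)
        return {}
    return consumer.poll(timeout_ms=timeout_ms)


class FakeConsumer(object):
    """Hypothetical stand-in for a consumer, used only for illustration."""

    def __init__(self, assigned, records):
        self._assigned = set(assigned)
        self._records = records
        self.poll_calls = 0

    def assignment(self):
        return self._assigned

    def poll(self, timeout_ms=0):
        self.poll_calls += 1
        return self._records
```

Every caller that polls in a loop would need a guard like this, which is the extra work the comment above argues against.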

Owner

Thanks for the thoughts. Can you remove this change from the PR? It is a separate issue (and also, I believe, the incorrect fix for the bug you described).

stefanth (Contributor Author) commented Mar 9, 2016

You are right, it is a different issue that accidentally got added to the pull request because it was tracking my master. I have removed it. The poll() sleep issue is in issue #572.

stefanth closed this Mar 9, 2016