Sort partitions before calling partitioner #905

Merged: 2 commits, Dec 19, 2016

@ms7s (Contributor) commented Dec 6, 2016

Current partitioners assume that the partitions are sorted according
to partition ID in all_partitions. However, this is not guaranteed
in the KafkaProducer implementation as the values that are passed
come from a set. Sets are not guaranteed to iterate values in any
particular order, so we need to sort the values before passing
them further along.

Before this change, the code depended on implementation details of
Python interpreters. In CPython 3.5 and lower it seems that small
integers are returned in sorted order from sets, so the code appears
to work. In PyPy, sets preserve insertion order, and CPython 3.6 does
the same for dictionaries [1], so the code may not work in environments
where iteration follows insertion order (I have not tested this). As
far as I could find, the order of partitions used in this case is the
order that is returned by the broker, but the documentation does not
say anything about partition order.

[1] https://docs.python.org/3.6/whatsnew/3.6.html#whatsnew36-compactdict
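
To illustrate why the ordering matters, here is a minimal sketch, assuming a partitioner that indexes into the list it receives. The `pick_partition` function and its toy hash are hypothetical and are not the library's partitioner; only the idea of choosing by position in all_partitions comes from the description above.

```python
from itertools import permutations


def pick_partition(key, all_partitions):
    """Hypothetical partitioner: maps a key to a position in all_partitions."""
    # Toy deterministic hash (sum of the key's bytes), not the library's hash.
    idx = sum(key) % len(all_partitions)
    return all_partitions[idx]


key = b"user-42"
partition_ids = {0, 1, 2}

# Without sorting, the chosen partition depends on the order of the list.
unsorted_choices = {pick_partition(key, list(order))
                    for order in permutations(partition_ids)}

# With sorting, every input order yields the same partition.
sorted_choices = {pick_partition(key, sorted(order))
                  for order in permutations(partition_ids)}

print(len(unsorted_choices), len(sorted_choices))
```

Under these assumptions, the same key can land on different partitions depending on list order unless the list is sorted first.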

@@ -625,7 +625,7 @@ def _partition(self, topic, partition, key, value,
             assert partition in self._metadata.partitions_for_topic(topic), 'Unrecognized partition'
             return partition
 
-        all_partitions = list(self._metadata.partitions_for_topic(topic))
+        all_partitions = sorted(list(self._metadata.partitions_for_topic(topic)))
Owner

sorted() accepts any iterable, including a set, and returns a new list, so I think we can drop the explicit list conversion
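
A quick check of the reviewer's point, as a small sketch: the built-in `sorted` takes a set directly and returns a list, so the inner `list(...)` call adds nothing. The partition IDs here are made up for illustration.

```python
# Example partition IDs, standing in for the set returned by the metadata lookup.
partitions = {3, 0, 2, 1}

with_list = sorted(list(partitions))   # form used in the diff
without_list = sorted(partitions)      # form suggested in the review

# Both forms produce the same sorted list.
print(with_list == without_list, type(without_list).__name__)
```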

@dpkp dpkp merged commit 46f9b1f into dpkp:master Dec 19, 2016