Description
Hi all
I find the current seek implementation in SimpleConsumer rather confusing. There seem to be two issues:
- For a consumer with multiple partitions, dividing the requested offset by the number of partitions to find each partition's offset does not seem to produce the desired effect. This is described to some extent in "SimpleConsumer seek & pending" (#67).
- The offset/whence behaviour works from a delta. With whence 0, for example, it does not seek to the given absolute offset, but rather to the first available message offset + seek offset.
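To make the two points above concrete, here is a toy model of the whence 0 behaviour as I understand it. It is a sketch, not the actual SimpleConsumer code: the names split_offset and seek_from_start are made up for illustration, the partition ids and earliest offsets are invented sample values, and the even split with the remainder going to the first partitions is my assumption about how the division works.

```python
def split_offset(offset, partitions):
    """Split one requested offset into per-partition deltas.

    Assumed behaviour: an even split, with any remainder handed out
    one at a time to the first partitions (in sorted order).
    """
    partitions = sorted(partitions)
    delta, rem = divmod(offset, len(partitions))
    return {p: delta + (1 if i < rem else 0) for i, p in enumerate(partitions)}


def seek_from_start(offset, earliest):
    """Model of whence 0: earliest available offset + this partition's share."""
    deltas = split_offset(offset, earliest.keys())
    return {p: earliest[p] + deltas[p] for p in earliest}


# Invented sample data: partition id -> first available message offset.
earliest = {0: 100, 1: 250}

# Asking for "offset 10" lands on neither offset 10 nor earliest + 10.
print(seek_from_start(10, earliest))  # {0: 105, 1: 255}
```

In this model, a caller who expects seek(10, 0) to mean "go to offset 10" ends up at 105 and 255 instead.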
The API as described does make sense as an analogue of fseek(), but it is confusing when Kafka is used as a queue: since 0.8.0 does not at present maintain client offsets automatically, most clients are forced to maintain them directly.
Since the seek behaviour always adds the lowest offset to the delta, it becomes difficult to actually maintain these numbers, especially as there is no natural way to get the lowest offset in a partition (which would allow computing the delta).
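As a rough sketch of the bookkeeping this forces on a client that stores absolute offsets: to resume on a single partition it has to turn the stored offset back into a delta, which needs the partition's lowest available offset as an input. The helper name delta_for_absolute and the sample numbers are hypothetical, and how the lowest offset is obtained is deliberately left outside the sketch.

```python
def delta_for_absolute(target, earliest):
    """Delta to pass to a whence 0 seek so one partition lands on `target`.

    `earliest` is the partition's lowest available offset, which the
    client has to obtain by some other means.
    """
    if target < earliest:
        # The stored offset is no longer available in the partition.
        raise ValueError("target offset %d is below earliest offset %d"
                         % (target, earliest))
    return target - earliest


# Invented sample values for a single partition.
stored_offset = 1234    # absolute offset the client saved earlier
earliest_offset = 1000  # lowest offset currently available

delta = delta_for_absolute(stored_offset, earliest_offset)
print(delta)  # 234

# A relative seek of earliest + delta ends up back at the stored offset.
assert earliest_offset + delta == stored_offset
```

An absolute seek would let the client pass stored_offset directly and skip this conversion.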
I feel that the API in general is very confusing and extremely subtle. For my own purposes I have created a new API call, seek_absolute, which can be found in this commit (https://github.com/GregBowyer/kafka-python/commit/056cba565f356787f1b027a881a7487be5a758ce). Rather than make a pull request, though, I think more discussion is needed over how the seek contract should work in general.
Thoughts?