Various changes/fixes #88
Conversation
Previously, if you tried to consume a message with a timeout greater than 10 seconds but didn't receive any data within those 10 seconds, a socket.timeout exception was raised. This change allows a higher socket timeout to be set, or even None for no timeout.
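For illustration only, here is a minimal sketch of a caller-controlled socket timeout at the plain socket level; the function and parameter names are assumptions, not the library's actual API:

import socket

# Hypothetical helper: the caller chooses the socket timeout instead of a
# hard-coded 10 seconds. timeout=None puts the socket in blocking mode, so
# recv() waits indefinitely rather than raising socket.timeout.
def open_connection(host, port, timeout=10.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)      # None disables the timeout entirely
    sock.connect((host, port))
    return sock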
According to the protocol documentation, the 4-byte integer at the beginning of a response represents the size of the payload only, not including those 4 bytes. See http://goo.gl/rg5uom
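A self-contained illustration of that framing rule: the size prefix is big-endian and counts only the bytes that follow it.

import struct

# Build a response the way the broker frames it: 4-byte size, then payload.
payload = b'\x00\x01\x02\x03\x04'
response = struct.pack('>i', len(payload)) + payload

# The prefix says 5 (the payload length), not 9 (payload plus the 4 size bytes).
(size,) = struct.unpack('>i', response[:4])
assert size == len(payload) == 5
assert response[4:4 + size] == payload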
…y time

* Remove bufsize from client and conn, since they're not actually enforced

Notes: This commit changes behavior a bit by raising a BufferUnderflowError when no data is received for the message size, rather than a ConnectionError. Since bufsize in the socket is not actually enforced, but it is used by the consumer when creating requests, it is moved there until a better solution is implemented.
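A rough sketch of the "read exactly N bytes" loop and the BufferUnderflowError behavior described above; the exception and function names are illustrative rather than the library's exact definitions.

class BufferUnderflowError(Exception):
    """Raised when the peer stops sending before num_bytes are received."""

def read_exact(sock, num_bytes):
    # socket.recv(n) may legally return fewer than n bytes, so keep reading
    # until everything requested has arrived (or the connection goes quiet).
    chunks = []
    bytes_left = num_bytes
    while bytes_left > 0:
        data = sock.recv(bytes_left)
        if not data:
            raise BufferUnderflowError(
                "Expected %d more bytes, got nothing" % bytes_left)
        chunks.append(data)
        bytes_left -= len(data)
    return b''.join(chunks)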
* Combine partition fetch requests into a single request
* Put the messages received in a queue and update offsets
* Grab as many messages from the queue as requested
* When the queue is empty, request more
* timeout param for get_messages() is the actual timeout for getting those messages
* Based on dpkp#74 - don't increase min_bytes if the consumer fetch buffer size is too small.

Notes: Change MultiProcessConsumer and _mp_consume() accordingly. Previously, when querying each partition separately, it was possible to block waiting for messages on partition 0 even if there were new ones in partition 1. These changes allow us to block while waiting for messages on all partitions, and reduce the total number of kafka requests. Use Queue.Queue for the single-process queue instead of the already imported multiprocessing.Queue, because the latter doesn't seem to guarantee immediate availability of items after a put:

>>> from multiprocessing import Queue
>>> q = Queue()
>>> q.put(1); q.get_nowait()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 152, in get_nowait
    return self.get(False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 134, in get
    raise Empty
Queue.Empty
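A rough sketch, with hypothetical names, of the get_messages() flow described above: drain whatever is already in the queue, and only ask Kafka for more (one request covering all partitions) when the queue runs dry.

import time
import queue   # named Queue in Python 2

def get_messages(msg_queue, fetch_more, count=1, timeout=0.1):
    # msg_queue is a queue.Queue of already-fetched messages; fetch_more() is a
    # hypothetical callback that issues a single fetch request covering all
    # partitions and puts the results back on the queue (blocking until the
    # broker has data or its own timeout elapses).
    messages = []
    deadline = None if timeout is None else time.time() + timeout
    while len(messages) < count:
        try:
            messages.append(msg_queue.get_nowait())
        except queue.Empty:
            if deadline is not None and time.time() >= deadline:
                break
            fetch_more()
    return messages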
to block forever if it's reached.
… iterator to exit when reached. Also put constant timeout values in pre-defined constants
Will remove once any error handling issues are resolved.
This is pretty much a rewrite. The tests that involve offset requests/responses are not implemented since that API is not supported in kafka 0.8 yet. Only kafka.codec and kafka.protocol are currently tested, so there is more work to be done here.
We always store the offset of the next available message, so we shouldn't decrement the offset deltas by an extra 1 when seeking.
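A tiny worked example of the off-by-one this fixes, under the convention that the stored offset always points at the next message to fetch.

# After consuming messages 0..4, the stored offset is already 5 (the next one).
stored_offset = 5

# Seeking back 2 messages should therefore target offset 3...
assert stored_offset - 2 == 3
# ...whereas subtracting an extra 1 would incorrectly land on offset 2.
assert stored_offset - 2 - 1 == 2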
…data

Also, log.exception() is unhelpfully noisy. Use log.error() with some error details in the message instead.
This differentiates between errors that occur when sending the request and errors that occur when receiving the response, and adds BufferUnderflowError handling.
…tch size is too small

Note: This can cause fetching a message to exceed a given timeout, but timeouts are not guaranteed anyway, and in this case it's the client's fault for not sending a big enough buffer size rather than the Kafka server's. This can be bad if max_fetch_size is None (no limit) and there is some message in Kafka that is crazy huge, but that is why we should have some max_fetch_size.
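A rough sketch of the grow-and-retry idea this note refers to, with an optional upper bound; all names here (do_fetch, FetchSizeTooSmall, max_fetch_size) are hypothetical stand-ins, not the consumer's actual API.

class FetchSizeTooSmall(Exception):
    """Hypothetical: the response held only part of the next message."""

def fetch_with_growing_buffer(do_fetch, fetch_size, max_fetch_size=None):
    # Double the requested fetch size until the next message fits, or give up
    # once the optional upper bound would be exceeded.
    while True:
        try:
            return do_fetch(fetch_size)
        except FetchSizeTooSmall:
            if max_fetch_size is not None and fetch_size * 2 > max_fetch_size:
                raise
            fetch_size *= 2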
…ed in integration tests

If some of the tests stop brokers and then error out, the teardown method will try to close the same brokers and fail. This change allows it to continue.
This is better since the tests stop/start brokers, and if something goes wrong they can affect each other.
for partition in partitions:
    requests.append(FetchRequest(self.topic, partition,
                                 self.offsets[partition],
                                 self.buffer_size))
Isn't buffer_size being passed in as the max message size here? Shouldn't buffer_size be used for min_bytes below (line 374), and this should be a max message size parameter that doesn't grow on error?
buffer_size is the value passed to kafka to tell it how much data we can handle. We don't actually have a real buffer, but we have to pass something to kafka. If that size is smaller than the next available message, kafka will not send any messages. The client can either increase the buffer size, skip an offset, or give up and go home. We increase the buffer size up to self.max_buffer_size if it's not None.
min_bytes is how much data kafka should wait for to be available before responding. It is set to 0 or 1 in the FetchContext based on whether or not we want to block (i.e. have at least 1 message before returning). If we pass buffer_size for min_bytes, and buffer_size is, say, 4k, then kafka will wait for 4k of data to be available before responding, which is not really what we want.
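A small sketch contrasting the two values in a fetch request, using hypothetical field names: buffer_size caps how much data the broker may return for the partition, while min_bytes only controls whether the broker waits for data before answering.

def build_fetch_request(topic, partition, offset, buffer_size, block):
    # Illustrative only; not the library's actual request structure.
    return {
        'topic': topic,
        'partition': partition,
        'offset': offset,
        'max_bytes': buffer_size,        # how much data we claim we can handle
        'min_bytes': 1 if block else 0,  # 1 = wait for at least one byte,
                                         # 0 = answer immediately, even empty
    }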
If buffer_size is what we're sending to Kafka as the max data we can handle, why isn't it always max_buffer_size?
I don't know if this expand-window-and-retry flow makes sense for either parameter we are sending to Kafka (see my comment in another thread about this)
Yeah, increasing the buffer_size as needed looked weird to me too; I just added max_buffer_size as a quick fix to avoid getting to numbers that are too large. Since this is not a bug, and this pull request already has a million commits, I'm more inclined to make this change separately.
* If the connection is dirty, reinit
* If we get a BufferUnderflowError, the server could have gone away, so mark it dirty
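A minimal sketch of that dirty-flag behavior, loosely mirroring the snippet discussed below; the class layout is an approximation for illustration, not the actual connection code.

import socket

class BufferUnderflowError(Exception):
    pass

class ExampleConnection(object):
    def __init__(self, host, port, timeout=None):
        self.host, self.port, self.timeout = host, port, timeout
        self._sock = None
        self._dirty = True            # forces a (re)connect before first use

    def reinit(self):
        # Drop any stale socket and open a fresh connection.
        if self._sock is not None:
            self._sock.close()
        self._sock = socket.create_connection((self.host, self.port),
                                              self.timeout)
        self._dirty = False

    def recv(self, num_bytes):
        if self._dirty:
            self.reinit()
        data = self._sock.recv(num_bytes)
        if not data:
            # No data usually means the server went away; reconnect next time.
            self._dirty = True
            raise BufferUnderflowError("No data received from Kafka")
        return data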
great cleanup. +1. and all tests pass for me!
bytes_left = num_bytes
resp = ''
log.debug("About to read %d bytes from Kafka", num_bytes)
if self._dirty:
Stylistically, I think that this check should probably move into recv instead of _read_bytes (i.e. give this method less responsibility).
The reason I put it here is that it's right before calling self._socket.recv(), just as it is right before calling self._socket.sendall() in send().
Also 👍 merge ASAP!
Both errors are handled the same way when raised and caught, so this makes sense.
Everyone cool with getting this merged?
@rdiomar although I'm not looking forward to my rebase, go for it!
Various changes/fixes, including:

* Allow customizing socket timeouts
* Read the correct number of bytes from kafka
* Guarantee reading the expected number of bytes from the socket every time
* Remove bufsize from client and conn
* SimpleConsumer flow changes
* Fix some error handling
* Add optional upper limit to consumer fetch buffer size
* Add and fix unit and integration tests
This set of changes addresses several bugs and issues, and fixes most of the tests. It also changes the behavior of SimpleConsumer in a few ways. Here is a summary of all the changes, mostly copied from the major commits:
Previously, if you tried to consume a message with a timeout greater than 10 seconds but didn't receive any data within those 10 seconds, a socket.timeout exception was raised. This change allows a higher socket timeout to be set, or even None for no timeout.
According to the protocol documentation, the 4-byte integer at the beginning of a response represents the size of the payload only, not including those 4 bytes. See the Kafka docs here.
Since bufsize in the socket is not actually enforced, but it is used by the consumer when creating requests, it is moved there until a better solution is implemented.
Don't increase min_bytes if the consumer fetch buffer size is too small.
This is pretty much a rewrite. The tests that involve offset requests/responses are not implemented since that API is not supported in kafka 0.8 yet. Only kafka.codec and kafka.protocol are currently tested, so there is more work to be done here.