Redis results backend: apply_async().get() hangs forever after disconnection from redis-server #4857
Comments
Let's continue the discussion there instead? Or should this be opened as a separate related issue? |
@amitlicht did you try configuring a socket timeout? |
@georgepsarakis I did try to circumvent the entire internal pub-sub usage within Celery, by adding our own subscriber on the result channel. Just throwing an idea out there: perhaps Celery's usage of redis-py somehow disables redis-py's disconnection handling? |
@amitlicht sorry, I did not clarify earlier: this is the socket connect timeout, which is a different setting with its own default value. |
@georgepsarakis I did not try |
@amitlicht this is probably a typo; the setting is available in the latest release: https://github.com/celery/celery/blob/v4.2.0/celery/backends/redis.py#L173 |
@georgepsarakis just tried it with a socket connect timeout of 10 seconds. The issue remains. |
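For anyone following along, here is a minimal sketch of what those timeouts could look like in the app configuration. The setting names match the Redis backend options linked above for Celery 4.2, but treat the exact names, defaults, and values as assumptions to verify against your own version; the URLs are placeholders.

```python
from celery import Celery

# Illustrative only: broker/backend URLs and credentials are placeholders.
app = Celery(
    "app",
    broker="amqp://guest:guest@localhost:5672//",
    backend="redis://:mypassword@localhost:6379/0",
)

# Timeouts discussed above, in seconds. redis_socket_timeout bounds blocking
# reads/writes on an established connection; redis_socket_connect_timeout
# bounds connection establishment only.
app.conf.redis_socket_timeout = 30
app.conf.redis_socket_connect_timeout = 10
```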
@amitlicht thanks for trying this, I will look into this a bit further. |
@amitlicht I found this redis-py issue, which mentions that the TCP keep-alive setting may be of use here, since it would result in faster detection of the connection error. I don't think the Redis backend supports this option, but it should not be very difficult for you to try it. If you want help, let me know. |
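A sketch of what enabling TCP keep-alive looks like on a raw redis-py client; whether Celery's Redis backend can forward these options is exactly the open question above, and the host, password, and values below are placeholders. The TCP_KEEP* constants are Linux-specific.

```python
import socket
import redis

# Enable TCP keep-alive so the kernel detects a dead peer instead of a recv()
# blocking indefinitely. Host/password are placeholders.
client = redis.Redis(
    host="localhost",
    port=6379,
    password="mypassword",
    socket_keepalive=True,
    socket_keepalive_options={
        socket.TCP_KEEPIDLE: 60,   # idle seconds before the first probe (Linux)
        socket.TCP_KEEPINTVL: 10,  # seconds between probes
        socket.TCP_KEEPCNT: 3,     # failed probes before the connection is dropped
    },
)
client.ping()
```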
What I don't understand is why you would need any kind of special settings to begin with. As I mentioned, I managed to circumvent the issue entirely by just initializing my own redis-py client, creating a pub/sub consumer from it, and subscribing it to the result channel. This works as-is without any special configuration, server- or client-side, which makes me think the problem on Celery's side is some special configuration to begin with... |
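For reference, a sketch of that kind of workaround with a plain redis-py pub/sub consumer. It assumes the Redis backend's default "celery-task-meta-<task_id>" keys and channels; the URL, timeout, and helper name are illustrative placeholders, not the commenter's actual code.

```python
import json
import redis

def wait_for_result(task_id, redis_url="redis://:mypassword@localhost:6379/0", timeout=30):
    """Listen for a task result directly instead of using Celery's consumer.

    Assumes the default "celery-task-meta-<task_id>" key/channel used by the
    Redis result backend; a lost connection raises redis.ConnectionError
    instead of hanging.
    """
    client = redis.Redis.from_url(redis_url, socket_timeout=timeout)
    pubsub = client.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe("celery-task-meta-" + task_id)
    try:
        # The result may already be stored if the task finished before we
        # subscribed, so check the key first.
        stored = client.get("celery-task-meta-" + task_id)
        if stored:
            return json.loads(stored)
        message = pubsub.get_message(timeout=timeout)
        if message and message["type"] == "message":
            return json.loads(message["data"])
        return None
    finally:
        pubsub.close()
```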
@amitlicht thanks for the feedback. I just noticed that you are using gevent. If possible, please provide a stack trace generated when you stop a client that seems idle; that should help us locate the source of the issue. |
Any update on this? |
@acmisiti please check my previous comment. Can you perhaps provide a stack trace generated when stopping an idle client? |
Providing a stack trace on an idle client is non-trivial since it's all running in a containerized environment. As I said, we went around it using a rather ugly but working hack... |
@georgepsarakis I can provide a stack trace for this, taken from a keyboard interrupt while running a service in a shell that has the same issue. It only affects Celery >= 4.2.0; the Kombu version doesn't seem to matter, the last known good version is Celery == 4.1.0, and the link to Redis was verified by removing the Redis results backend, which clears the issue. Below is the stack trace with the following setup:
Stacktrace:
  File "/opt/conda/envs/the_service/lib/python2.7/site-packages/some/package/that/works.py", line 795, in read_data
    return request.get().get()
  File "/opt/conda/envs/the_service/lib/python2.7/site-packages/celery/result.py", line 224, in get
    on_message=on_message,
  File "/opt/conda/envs/the_service/lib/python2.7/site-packages/celery/backends/async.py", line 188, in wait_for_pending
    for _ in self._wait_for_pending(result, **kwargs):
  File "/opt/conda/envs/the_service/lib/python2.7/site-packages/celery/backends/async.py", line 257, in _wait_for_pending
    sleep(0)
  File "/opt/conda/envs/the_service/lib/python2.7/site-packages/gevent/hub.py", line 167, in sleep
    waiter.get()
  File "/opt/conda/envs/the_service/lib/python2.7/site-packages/gevent/hub.py", line 899, in get
    return self.hub.switch()
  File "/opt/conda/envs/the_service/lib/python2.7/site-packages/gevent/hub.py", line 627, in switch
    switch_out = getattr(getcurrent(), 'switch_out', None)
KeyboardInterrupt
It will just wait forever until the service is killed. |
@stringfellow could you kindly try this with celery 4.3rc3 once? |
@auvipy not easily... after finding this issue and posting the stack trace I removed the Redis backend (this is in production and we weren't using the results anyway) - I'm away from work until early April now so am unlikely to have time to do it. Are you not able to reproduce the issue in testing (I couldn't reproduce it on my local Docker stack)? |
Any update? |
Does updating gevent fix this in master? |
Version info
Steps to reproduce
Celery configuration: the broker is RabbitMQ, the results backend is local Redis (password protected). Both broker and client are gevented.
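A minimal sketch of an app configured like this; the URLs, credentials, task, and pool options are placeholders, not the reporter's actual values.

```python
from celery import Celery

# RabbitMQ broker, password-protected local Redis results backend.
app = Celery(
    "repro",
    broker="amqp://guest:guest@localhost:5672//",
    backend="redis://:mypassword@localhost:6379/0",
)

@app.task
def add(x, y):
    return x + y

# The worker side would use the gevent pool, e.g.:
#   celery -A repro worker --pool gevent --concurrency 100
# The client side is gevent monkey-patched before calling
# add.apply_async(...).get(); the hang is reported when redis-server drops
# the connection while the client is waiting.
```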
Expected behavior
Something like the following error, rather than an indefinite hang:
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
Actual behavior
apply_async().get() hangs forever after the disconnection from redis-server.
Additional information