There was an issue reported to Lyra that seems to have more to do with the rabbitmq-java-client. Basically, the problem is that when RabbitMQ connections are proxied through an AWS Elastic Load Balancer, ELB might accept the TCP connection but never respond to the initial handshake, which leaves the client hanging forever. ELB may even close the connection, but I believe the BlockingCell is still left waiting forever. Here's a call stack from a mocked-up test that reproduces this scenario:
Thread [lyra-recovery-1] (Suspended)
waiting for: BlockingValueOrException<V,E> (id=28)
Object.wait(long) line: not available [native method]
BlockingValueOrException<V,E>(Object).wait() line: 503
BlockingValueOrException<V,E>(BlockingCell<T>).get() line: 50
BlockingValueOrException<V,E>(BlockingCell<T>).uninterruptibleGet() line: 89
BlockingValueOrException<V,E>.uninterruptibleGetValue() line: 33
AMQChannel$SimpleBlockingRpcContinuation(AMQChannel$BlockingRpcContinuation<T>).getReply() line: 348
AMQConnection.start() line: 294
ConnectionFactory.newConnection(ExecutorService, Address[]) line: 603
ConnectionHandler$3.call() line: 243
ConnectionHandler$3.call() line: 236
ConnectionHandler(RetryableResource).callWithRetries(Callable<T>, RecurringPolicy<?>, RecurringStats, Set<Class<Exception>>, boolean, boolean) line: 51
ConnectionHandler.createConnection(RecurringPolicy<?>, Set<Class<Exception>>, boolean) line: 236
ConnectionHandler.recoverConnection() line: 271
ConnectionHandler.access$100(ConnectionHandler) line: 41
ConnectionHandler$ConnectionShutdownListener$1.run() line: 95
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1110
ThreadPoolExecutor$Worker.run() line: 603
Thread.run() line: 722
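For context, this kind of hang can be reproduced without ELB by pointing the client at an endpoint that accepts TCP connections but never speaks AMQP. The following is only a rough sketch of such a test, assuming a client version without any handshake timeout (as described above); the port number and class name are arbitrary:

import com.rabbitmq.client.ConnectionFactory;
import java.net.ServerSocket;

public class SilentEndpointRepro {
    public static void main(String[] args) throws Exception {
        // A "black hole" server: accepts the TCP connection but never sends a
        // single byte, mimicking an ELB backend that never answers the handshake.
        final ServerSocket silent = new ServerSocket(5673);
        new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        silent.accept(); // keep the socket open, respond with nothing
                    }
                } catch (Exception ignored) {
                }
            }
        }).start();

        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setPort(5673);
        factory.setConnectionTimeout(5000); // only bounds the TCP connect itself

        // The TCP connect succeeds within the timeout, but Connection.Start never
        // arrives, so this call blocks inside AMQConnection.start() indefinitely.
        factory.newConnection();
    }
}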
My first thought is that everything that happens inside AMQConnection.start() should be covered by the connection timeout setting, and/or that a subsequent connection closure should unblock the BlockingCell.
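As a rough illustration of that idea, BlockingCell appears to already expose a timed variant of the blocking get. A minimal sketch of bounding the wait for the handshake reply, assuming the uninterruptibleGet(int) overload is present in the version in question (the 5-second timeout and class name are arbitrary):

import com.rabbitmq.utility.BlockingCell;
import java.util.concurrent.TimeoutException;

public class TimedHandshakeWait {
    public static void main(String[] args) {
        BlockingCell<Object> connStartReply = new BlockingCell<Object>();
        try {
            // Nothing ever calls set(...) on the cell, mimicking a broker or
            // proxy that never answers Connection.Start; a bounded wait fails
            // fast instead of parking the recovery thread indefinitely.
            connStartReply.uninterruptibleGet(5000);
        } catch (TimeoutException te) {
            System.err.println("Handshake reply never arrived: " + te);
        }
    }
}

Wiring the connection timeout (or a dedicated handshake timeout) through to that wait, and raising a clear exception when it fires, would let recovery proceed instead of leaving the thread parked.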