Client may hang during initial connection #65

@jhalterman

An issue was reported to Lyra that appears to have more to do with rabbitmq-java-client. When RabbitMQ connections are proxied through an AWS Elastic Load Balancer (ELB), the ELB may accept a TCP connection but never respond to the initial handshake, which leaves the client hanging forever. The ELB may even close the connection, but I believe the BlockingCell is still left waiting forever. Here's a call stack from a mocked-up test that reproduces this scenario:

Thread [lyra-recovery-1] (Suspended)    
    waiting for: BlockingValueOrException<V,E>  (id=28) 
    Object.wait(long) line: not available [native method]   
    BlockingValueOrException<V,E>(Object).wait() line: 503  
    BlockingValueOrException<V,E>(BlockingCell<T>).get() line: 50   
    BlockingValueOrException<V,E>(BlockingCell<T>).uninterruptibleGet() line: 89    
    BlockingValueOrException<V,E>.uninterruptibleGetValue() line: 33    
    AMQChannel$SimpleBlockingRpcContinuation(AMQChannel$BlockingRpcContinuation<T>).getReply() line: 348    
    AMQConnection.start() line: 294 
    ConnectionFactory.newConnection(ExecutorService, Address[]) line: 603   
    ConnectionHandler$3.call() line: 243    
    ConnectionHandler$3.call() line: 236    
    ConnectionHandler(RetryableResource).callWithRetries(Callable<T>, RecurringPolicy<?>, RecurringStats, Set<Class<Exception>>, boolean, boolean) line: 51 
    ConnectionHandler.createConnection(RecurringPolicy<?>, Set<Class<Exception>>, boolean) line: 236    
    ConnectionHandler.recoverConnection() line: 271 
    ConnectionHandler.access$100(ConnectionHandler) line: 41    
    ConnectionHandler$ConnectionShutdownListener$1.run() line: 95   
    ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1110  
    ThreadPoolExecutor$Worker.run() line: 603   
    Thread.run() line: 722  

My first thought is that everything that happens inside AMQConnection.start() should be covered by the connection timeout setting, and/or that an eventual connection closure should unblock the BlockingCell.
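
To illustrate both ideas, here is a minimal sketch using only the JDK. It is not the client's actual BlockingCell; `TimedCell`, `fail` and `TimedCellDemo` are hypothetical names invented for this example. It shows a wait that is bounded by a deadline, and a waiter that is released when some other thread (for example, whatever notices the socket closing) reports a failure.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a blocking cell; NOT the client's BlockingCell. */
final class TimedCell<T> {
    private final Object lock = new Object();
    private boolean done;
    private T value;
    private Exception failure;

    /** Completes the cell with a value and wakes any waiter. */
    void set(T v) {
        synchronized (lock) {
            if (done) return;
            value = v;
            done = true;
            lock.notifyAll();
        }
    }

    /** Completes the cell with a failure, e.g. when the connection is closed. */
    void fail(Exception e) {
        synchronized (lock) {
            if (done) return;
            failure = e;
            done = true;
            lock.notifyAll();
        }
    }

    /** Waits at most timeoutMillis for completion instead of waiting forever. */
    T get(long timeoutMillis) throws Exception {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!done) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new TimeoutException("no reply within " + timeoutMillis + " ms");
                }
                // wait(0) means "wait forever", so always wait at least 1 ms.
                lock.wait(Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            }
            if (failure != null) throw failure;
            return value;
        }
    }
}

final class TimedCellDemo {
    public static void main(String[] args) throws Exception {
        // Case 1: the peer accepts the connection but never answers the handshake.
        TimedCell<String> silent = new TimedCell<>();
        try {
            silent.get(200);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }

        // Case 2: the peer closes the connection while we are waiting.
        TimedCell<String> closed = new TimedCell<>();
        Thread closer = new Thread(() -> closed.fail(new IOException("connection closed by peer")));
        closer.start();
        try {
            closed.get(5_000);
        } catch (IOException e) {
            System.out.println("unblocked: " + e.getMessage());
        }
        closer.join();
    }
}
```

In the client itself, the timeout passed to such a wait would presumably come from the existing connection timeout setting, and the failure path would be triggered by whatever detects the closed socket; both of those wiring details are assumptions here.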
