Commit 8778b27

ajyoung-oracle authored and davem330 committed
ldmvsw: tx queue stuck in stopped state after LDC reset
The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.

The root cause was determined to be from the following series of
events:

1. Guest domain panics - resulting in the guest no longer processing
   network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
   control due to no more available tx drings and stops the tx queue
   for the guest domain
3. The LDC of the network connection for the guest is reset when
   the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
   responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
   the normal method to re-enable the tx queue. But the ACK never comes
   because the tx queue was cleared due to the LDC reset.

To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 04f762e commit 8778b27

1 file changed: +13 -3 lines changed

drivers/net/ethernet/sun/sunvnet_common.c

Lines changed: 13 additions & 3 deletions
@@ -704,9 +704,8 @@ static int handle_mcast(struct vnet_port *port, void *msgbuf)
 	return 0;
 }
 
-/* Got back a STOPPED LDC message on port. If the queue is stopped,
- * wake it up so that we'll send out another START message at the
- * next TX.
+/* If the queue is stopped, wake it up so that we'll
+ * send out another START message at the next TX.
  */
 static void maybe_tx_wakeup(struct vnet_port *port)
 {
@@ -734,6 +733,7 @@ EXPORT_SYMBOL_GPL(sunvnet_port_is_up_common);
 
 static int vnet_event_napi(struct vnet_port *port, int budget)
 {
+	struct net_device *dev = VNET_PORT_TO_NET_DEVICE(port);
 	struct vio_driver_state *vio = &port->vio;
 	int tx_wakeup, err;
 	int npkts = 0;
@@ -747,6 +747,16 @@ static int vnet_event_napi(struct vnet_port *port, int budget)
 		if (event == LDC_EVENT_RESET) {
 			vnet_port_reset(port);
 			vio_port_up(vio);
+
+			/* If the device is running but its tx queue was
+			 * stopped (due to flow control), restart it.
+			 * This is necessary since vnet_port_reset()
+			 * clears the tx drings and thus we may never get
+			 * back a VIO_TYPE_DATA ACK packet - which is
+			 * the normal mechanism to restart the tx queue.
+			 */
+			if (netif_running(dev))
+				maybe_tx_wakeup(port);
 		}
 		port->rx_event = 0;
 		return 0;
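
The deadlock described in the commit message, and the effect of waking the queue on reset, can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not kernel code: `struct sim_port` and the `sim_*` functions are hypothetical names invented for illustration, the tx dring is reduced to a counter of free descriptors, and the `wake_on_reset` flag stands in for the `netif_running(dev)` / `maybe_tx_wakeup(port)` call added by the patch.

```c
#include <stdbool.h>

/* Hypothetical model of one vnet port; not the kernel's struct vnet_port. */
struct sim_port {
	bool dev_running;   /* stands in for netif_running(dev) */
	bool txq_stopped;   /* stands in for a stopped netdev tx queue */
	int free_tx_descs;  /* free entries in the modelled tx dring */
	int ring_size;      /* total entries in the modelled tx dring */
};

static void sim_port_init(struct sim_port *p, int ring_size)
{
	p->dev_running = true;
	p->txq_stopped = false;
	p->free_tx_descs = ring_size;
	p->ring_size = ring_size;
}

/* Queue one packet. Fails if the queue is stopped; stops the queue
 * (flow control) once the last free descriptor has been used.
 */
static bool sim_xmit(struct sim_port *p)
{
	if (p->txq_stopped || p->free_tx_descs == 0)
		return false;
	if (--p->free_tx_descs == 0)
		p->txq_stopped = true;
	return true;
}

/* Peer acknowledges one outstanding descriptor, which wakes the queue.
 * Returns false when nothing is outstanding, i.e. no ACK can ever arrive.
 */
static bool sim_peer_ack(struct sim_port *p)
{
	if (p->free_tx_descs == p->ring_size)
		return false;
	p->free_tx_descs++;
	p->txq_stopped = false;
	return true;
}

/* Models the LDC reset path: the ring is cleared, and the queue is
 * woken only if wake_on_reset is set (the behaviour the patch adds).
 */
static void sim_ldc_reset(struct sim_port *p, bool wake_on_reset)
{
	p->free_tx_descs = p->ring_size;
	if (wake_on_reset && p->dev_running)
		p->txq_stopped = false;
}
```

In this model, filling a two-entry ring with two `sim_xmit()` calls stops the queue. Calling `sim_ldc_reset(&p, false)` afterwards clears the ring but leaves `txq_stopped` set, and `sim_peer_ack()` returns false because nothing is outstanding, so every later `sim_xmit()` fails. With `sim_ldc_reset(&p, true)` the queue is woken and the next `sim_xmit()` succeeds.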
