Commit d2aa060

Huy Nguyen authored and Saeed Mahameed committed
net/mlx5: Cancel health poll before sending panic teardown command
After the panic teardown firmware command, health_care detects the error on the PCI bus and calls mlx5_pci_err_detected. This health_care flow is no longer needed because the panic teardown firmware command will bring down the PCI bus communication with the HCA.

The solution is to cancel the health care timer and its pending workqueue request before sending the panic teardown firmware command.

Kernel trace:

mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80

Fixes: 8812c24 ('net/mlx5: Add fast unload support in shutdown flow')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
1 parent b8cce68 commit d2aa060

1 file changed, +7 -0 lines changed

drivers/net/ethernet/mellanox/mlx5/core/main.c

Lines changed: 7 additions & 0 deletions
@@ -1482,9 +1482,16 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
 		return -EAGAIN;
 	}
 
+	/* Panic tear down fw command will stop the PCI bus communication
+	 * with the HCA, so the health polll is no longer needed.
+	 */
+	mlx5_drain_health_wq(dev);
+	mlx5_stop_health_poll(dev);
+
 	ret = mlx5_cmd_force_teardown_hca(dev);
 	if (ret) {
 		mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret);
+		mlx5_start_health_poll(dev);
 		return ret;
 	}
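
The ordering this change introduces can be illustrated outside the kernel with a small user-space sketch. Everything below is hypothetical: `struct mock_dev` and the `mock_*` functions are stand-ins invented for illustration, not the real mlx5 structures or helpers. The sketch only models the sequence described in the commit message: quiesce health monitoring first, send the teardown command, and resume polling if the command fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's device state; this is not the
 * real struct mlx5_core_dev.
 */
struct mock_dev {
	bool health_poll_active;   /* periodic health timer is armed */
	bool health_work_pending;  /* a health work item is queued */
	bool fw_accepts_teardown;  /* whether the mock firmware succeeds */
	bool torn_down;            /* set once teardown has completed */
};

/* Mock of draining the health workqueue: no queued work remains. */
static void mock_drain_health_wq(struct mock_dev *dev)
{
	dev->health_work_pending = false;
}

/* Mock of cancelling the periodic health timer. */
static void mock_stop_health_poll(struct mock_dev *dev)
{
	dev->health_poll_active = false;
}

/* Mock of re-arming the periodic health timer. */
static void mock_start_health_poll(struct mock_dev *dev)
{
	dev->health_poll_active = true;
}

/* Mock firmware teardown command. The asserts encode the invariant the
 * commit establishes: no health activity may be live when it is sent.
 */
static int mock_force_teardown(struct mock_dev *dev)
{
	assert(!dev->health_poll_active);
	assert(!dev->health_work_pending);
	if (!dev->fw_accepts_teardown)
		return -EIO;
	dev->torn_down = true;
	return 0;
}

/* Same ordering as the patched function: quiesce, tear down, and
 * restart polling only when the teardown command fails.
 */
static int mock_try_fast_unload(struct mock_dev *dev)
{
	int ret;

	mock_drain_health_wq(dev);
	mock_stop_health_poll(dev);

	ret = mock_force_teardown(dev);
	if (ret) {
		/* Device is still alive, so health monitoring resumes. */
		mock_start_health_poll(dev);
		return ret;
	}
	return 0;
}
```

Calling `mock_try_fast_unload()` on a device whose mock firmware accepts the command leaves polling stopped and the device torn down. On a device whose mock firmware rejects it, the call returns the error and polling is active again, mirroring the `mlx5_start_health_poll(dev)` line added in the error path.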
