Skip to content

Readiness probe should not fail when shutting down #13689

@dippynark

Description

@dippynark

What happened:

HTTP request timeout when using externalTrafficPolicy Local and terminating the last Nginx Ingress Controller Pod on a node:

Get "https://example.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

What you expected to happen:

Connections and requests should be drained gracefully.

When an Nginx Ingress Controller Pod receives the TERM signal from the kubelet, the readiness probe starts failing straight away:

n.isShuttingDown = true

if n.isShuttingDown {
return fmt.Errorf("the ingress controller is shutting down")
}

When using externalTrafficPolicy Local, if the last Nginx Ingress Controller Pod on the node is terminated and the Pod readiness check has failed but the load balancer has not yet probed the node health check, kube-proxy will remove all routes and new connections will be blackholed.

This is fairly likely when using Nginx Ingress Controller with its default configuration since the wait-shutdown binary causes the readiness check to fail but allows Nginx to continue serving (which may continue for a while if the --shutdown-grace-period flag is set).

Instead, to make use of KEP-1669: Proxy Terminating Endpoints, the Nginx Ingress Controller readiness check should continue to succeed after termination to allow existing connnections to be served by terminating Pods until the load balancer has stopped sending new connections. Nginx can then terminate any ongoing connections gracefully.

I think this can be fixed by simply removing the following lines:

if n.isShuttingDown {
return fmt.Errorf("the ingress controller is shutting down")
}

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.12.1
  Build:         51c2b819690bbf1709b844dbf321a9acf6eda5a7
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.5

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.33.2-gke.1240000

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): Container-Optimized OS
  • How was the ingress-nginx-controller installed: Bare metal installation manifests

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/bugCategorizes issue or PR as related to a bug.needs-priorityneeds-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions