Description
What happened:
We have an application hosted in AWS EKS, integrated with API Gateway via a VPC Link. There are about 3-6 NGINX pods spread across 3 availability zones, with about 60 application pods also spread across those availability zones. We are using the ingress-nginx Helm chart, version 4.12.1.
We load tested the architecture extensively in lower environments, but after deploying to production we observed that roughly 1 in 6 calls returned a 500. We initially assumed an upstream application error, but on investigation found that the application completes each request successfully; the ingress controller transforms the application's 200 into a 500 and forwards that to API Gateway, which then surfaces an error on the client side.
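For correlation, a minimal sketch of how the matching controller-side log lines can be pulled (assumptions: the utils namespace and the app.kubernetes.io/component=controller label are taken from the pod description further down; adjust the time window and selector as needed):

# Tail all controller pods and keep only requests answered with a 500 plus any
# nginx [error]/[crit] lines logged around the same time.
kubectl -n utils logs -l app.kubernetes.io/component=controller \
  --prefix --since=1h --tail=5000 --max-log-requests=10 \
  | grep -E '" 500 |\[error\]|\[crit\]'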
Error from API Gateway Logs:
<html>
<head>
<title>Internal Server Error</title>
</head>
<body>
<h1><p>Internal Server Error</p></h1>
</body>
</html>
Log from the Ingress Controller (Slightly obfuscated):
10.4.60.47 - - [15/Jul/2025:03:57:42 +0000] "POST /endpoint/a HTTP/1.1" 500 141 "-" "AMAZONAPIGATEWAYID" 660 0.375 [pod_app] [] 10.4.16.33:9001 141 0.375 500 InnerReqId
Something important of note: we have experienced internal server errors before, but when they are caused by an unhandled application error the response has content type application/json, not text/html.
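A sketch of how the two flavours of 500 could be told apart directly in the controller's access log (assumption: this overrides the chart's log-format-upstream ConfigMap key, which accepts standard nginx variables such as $sent_http_content_type and $upstream_status; the ConfigMap name is taken from the controller args further down):

# Log the response content type and the raw upstream status alongside each request,
# so controller-generated text/html 500s stand out from application JSON errors.
kubectl -n utils patch configmap nginx-ingress-controller-ingress-nginx-controller \
  --type merge -p '{"data":{"log-format-upstream":"$remote_addr - [$time_local] \"$request\" $status $upstream_status $sent_http_content_type $req_id"}}'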
What you expected to happen:
Nginx should forward all successful application responses to the gateway unmodified, exactly as they were received from the application.
This might be a misconfiguration, but for the life of us we cannot pin down what that misconfiguration would be. The problem only occurs under high load.
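To surface the underlying nginx error for one of these failing requests, a sketch of what we can try (assumptions: error-log-level is the standard ingress-nginx ConfigMap key with default notice, the ConfigMap name comes from the controller args below, and the change is picked up on reload without restarting the pods):

# Temporarily raise the nginx error-log verbosity so the reason for a
# controller-generated 500 is logged next to the matching access-log entry.
kubectl -n utils patch configmap nginx-ingress-controller-ingress-nginx-controller \
  --type merge -p '{"data":{"error-log-level":"info"}}'

# Revert to the default once a failing request has been captured.
kubectl -n utils patch configmap nginx-ingress-controller-ingress-nginx-controller \
  --type merge -p '{"data":{"error-log-level":"notice"}}'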
NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.12.1
Build: 51c2b819690bbf1709b844dbf321a9acf6eda5a7
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.5
-------------------------------------------------------------------------------
Kubernetes version (use kubectl version):
Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.33.1-eks-595af52
Environment:
- Cloud provider or hardware configuration: AWS EKS Auto Mode
- OS (e.g. from /etc/os-release): Alpine Linux
- Kernel (e.g. uname -a): Linux 6.12.35 #1 SMP PREEMPT_DYNAMIC Wed Jul 16 23:35:39 UTC 2025 x86_64 Linux
- Install tools:
- The cluster was created via Terraform using the AWS EKS module; the ingress controller was installed via Helm, with a ConfigMap providing some specific values. All other values were left at their defaults.
- How was the ingress-nginx-controller installed:
- If helm was used then please show output of helm ls -A | grep -i ingress
nginx-ingress-controller utils 35 2025-07-29 13:43:48.126001586 +0000 UTC deployed ingress-nginx-4.12.1 1.12.1
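For completeness, the release above was installed/upgraded roughly like this (sketch: the repo alias and values file name are placeholders; release name, namespace and chart version come from the helm ls output above):

# Approximate install command; 'ingress-nginx' is the assumed repo alias and
# values.yaml holds the user-supplied values shown below.
helm upgrade --install nginx-ingress-controller ingress-nginx/ingress-nginx \
  --namespace utils --version 4.12.1 -f values.yaml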
- If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
USER-SUPPLIED VALUES:
controller:
  config:
    client-body-timeout: "30"
    forwarded-for-headers: X-Forwarded-For
    keep-alive-requests: "20000"
    max-worker-connections: 32000
    proxy-body-size: 150m
    proxy-read-timeout: "60"
    proxy-send-timeout: "60"
    proxy-stream-timeout: "60"
    ssl-redirect: "true"
    upstream-keepalive-connections: "4000"
    use-forwarded-headers: "true"
    worker-processes: "8"
  kind: Deployment
  metrics:
    enabled: true
  replicaCount: 3
  resources:
    limits:
      cpu: 4000m
      memory: 4Gi
    requests:
      cpu: 1000m
      memory: 1Gi
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-a,subnet-b,subnet-c
    externalTrafficPolicy: Local
    type: LoadBalancer
  spec:
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 5; /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit;
            while pgrep -x nginx; do sleep 1; done
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: '{{ .Release.Name }}'
        app.kubernetes.io/name: '{{ include "ingress-nginx.name" . }}'
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
serviceAccount:
  create: false
  name: nginx-ingress-controller
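To double-check that these ConfigMap keys were actually rendered into the running nginx configuration, a quick sketch (assumptions: the label selector matches the pod labels shown below, and /etc/nginx/nginx.conf is the same path referenced in the preStop hook above):

# Grab one controller pod and grep the rendered config for the tuning values
# derived from the ConfigMap (worker_processes, worker_connections, keepalive
# settings, client_body_timeout, ...).
POD=$(kubectl -n utils get pods -l app.kubernetes.io/component=controller \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n utils exec "$POD" -- \
  grep -E 'worker_processes|worker_connections|keepalive|client_body_timeout' /etc/nginx/nginx.conf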
- Current State of the controller:
kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress-controller
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.12.1
helm.sh/chart=ingress-nginx-4.12.1
Annotations: meta.helm.sh/release-name: nginx-ingress-controller
meta.helm.sh/release-namespace: utils
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name: nginx-ingress-controller
Namespace: utils
Priority: 0
Service Account: nginx-ingress-controller
Node: node-a/10.4.22.49
Start Time: Tue, 29 Jul 2025 04:04:34 -0400
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress-controller
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.12.1
helm.sh/chart=ingress-nginx-4.12.1
pod-template-hash=66f9f9cf6c
Annotations: <none>
Status: Running
IP: 10.4.24.180
IPs:
IP: 10.4.24.180
Controlled By: ReplicaSet/nginx-ingress-controller
Containers:
controller:
Container ID: containerd://3626f36bc31f6b5d59822903d791ea8dfdf1da62aef9ee3c93314e076a539869
Image: registry.k8s.io/ingress-nginx/controller:v1.12.1@sha256:d2fbc4ec70d8aa2050dd91a91506e998765e86c96f32cffb56c503c9c34eed5b
Image ID: registry.k8s.io/ingress-nginx/controller@sha256:d2fbc4ec70d8aa2050dd91a91506e998765e86c96f32cffb56c503c9c34eed5b
Ports: 80/TCP, 443/TCP, 10254/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/nginx-ingress-controller-ingress-nginx-controller
--election-id=nginx-ingress-controller-ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/nginx-ingress-controller-ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--enable-metrics=true
--v=3
State: Running
Started: Tue, 29 Jul 2025 04:04:40 -0400
Ready: True
Restart Count: 0
Limits:
cpu: 4
memory: 4Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-ingress-nginx-controller-66f9f9cfqzfnt (v1:metadata.name)
POD_NAMESPACE: utils (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
...
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5fz7v (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-controller-ingress-nginx-admission
Optional: false
kube-api-access-5fz7v:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
Optional: false
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress-controller,app.kubernetes.io/name=ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
- Noticed the following in the Service events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 57m service Failed deploy model due to operation error Elastic Load Balancing v2: DeleteLoadBalancer, https response error StatusCode: 400, RequestID:..., ResourceInUse: Load balancer '...' cannot be deleted because it is currently associated with another service
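In case it is relevant, a sketch of how the other Service still referencing that load balancer can be located (generic kubectl; the column layout is just for readability):

# List every Service of type LoadBalancer across the cluster together with the
# hostname of the load balancer it points at, to spot the conflicting owner.
kubectl get svc -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type,LB:.status.loadBalancer.ingress[0].hostname' \
  | grep -E 'NAMESPACE|LoadBalancer'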