Description
What happened:
We have an application hosted in AWS EKS, integrated with API Gateway via a VPC Link. There are about 3-6 NGINX pods spread across 3 availability zones, with about 60 application pods also spread across those availability zones. We are using the ingress-nginx Helm chart, version 4.12.1.
We load tested the architecture extensively in lower environments, but after deploying to production we observed that roughly 1 in 6 calls returned a 500. We initially assumed an upstream application error, but on investigation found that the application completes each request successfully; the ingress controller transforms the application's 200 into a 500 and forwards that to API Gateway, which then surfaces an error on the client side.
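For correlation, a minimal sketch of how the matching controller-side log lines can be pulled (assumptions: the utils namespace and the app.kubernetes.io/component=controller label are taken from the pod description further down; adjust the time window and selector as needed):

# Tail all controller pods and keep only requests answered with a 500 plus any
# nginx [error]/[crit] lines logged around the same time.
kubectl -n utils logs -l app.kubernetes.io/component=controller \
  --prefix --since=1h --tail=5000 --max-log-requests=10 \
  | grep -E '" 500 |\[error\]|\[crit\]'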
Error from API Gateway Logs:
<html>
<head>
<title>Internal Server Error</title>
</head>
<body>
<h1><p>Internal Server Error</p></h1>
</body>
</html>
Log from the Ingress Controller (Slightly obfuscated):
10.4.60.47 - - [15/Jul/2025:03:57:42 +0000] "POST /endpoint/a HTTP/1.1" 500 141 "-" "AMAZONAPIGATEWAYID" 660 0.375 [pod_app] [] 10.4.16.33:9001 141 0.375 500 InnerReqId
Something important of note: we have experienced internal server errors before, but when they are caused by an unhandled application error the response has content type application/json, not text/html.
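A sketch of how the two flavours of 500 could be told apart directly in the controller's access log (assumption: this overrides the chart's log-format-upstream ConfigMap key, which accepts standard nginx variables such as $sent_http_content_type and $upstream_status; the ConfigMap name is taken from the controller args further down):

# Log the response content type and the raw upstream status alongside each request,
# so controller-generated text/html 500s stand out from application JSON errors.
kubectl -n utils patch configmap nginx-ingress-controller-ingress-nginx-controller \
  --type merge -p '{"data":{"log-format-upstream":"$remote_addr - [$time_local] \"$request\" $status $upstream_status $sent_http_content_type $req_id"}}'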
What you expected to happen:
Nginx should forward all successful application responses to the gateway unmodified, exactly as they were received from the application.
This might be a misconfiguration, but for the life of us we cannot pin down what that misconfiguration would be. The problem only occurs under high load.
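To surface the underlying nginx error for one of these failing requests, a sketch of what we can try (assumptions: error-log-level is the standard ingress-nginx ConfigMap key with default notice, the ConfigMap name comes from the controller args below, and the change is picked up on reload without restarting the pods):

# Temporarily raise the nginx error-log verbosity so the reason for a
# controller-generated 500 is logged next to the matching access-log entry.
kubectl -n utils patch configmap nginx-ingress-controller-ingress-nginx-controller \
  --type merge -p '{"data":{"error-log-level":"info"}}'

# Revert to the default once a failing request has been captured.
kubectl -n utils patch configmap nginx-ingress-controller-ingress-nginx-controller \
  --type merge -p '{"data":{"error-log-level":"notice"}}'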
NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.12.1
Build: 51c2b819690bbf1709b844dbf321a9acf6eda5a7
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.5
-------------------------------------------------------------------------------
Kubernetes version (use kubectl version):
Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.33.1-eks-595af52
Environment:
- Cloud provider or hardware configuration: AWS EKS Auto Mode
- OS (e.g. from /etc/os-release): Alpine Linux
- Kernel (e.g. uname -a): Linux 6.12.35 #1 SMP PREEMPT_DYNAMIC Wed Jul 16 23:35:39 UTC 2025 x86_64 Linux
- Install tools:
- The cluster was created via Terraform using the AWS EKS module; the ingress controller was installed via Helm, with a ConfigMap providing some specific values. All other values were left at their defaults.
- How was the ingress-nginx-controller installed:
- If helm was used then please show output of helm ls -A | grep -i ingress
nginx-ingress-controller utils 35 2025-07-29 13:43:48.126001586 +0000 UTC deployed ingress-nginx-4.12.1 1.12.1
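For completeness, the release above was installed/upgraded roughly like this (sketch: the repo alias and values file name are placeholders; release name, namespace and chart version come from the helm ls output above):

# Approximate install command; 'ingress-nginx' is the assumed repo alias and
# values.yaml holds the user-supplied values shown below.
helm upgrade --install nginx-ingress-controller ingress-nginx/ingress-nginx \
  --namespace utils --version 4.12.1 -f values.yaml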
- If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
USER-SUPPLIED VALUES:
controller:
  config:
    client-body-timeout: "30"
    forwarded-for-headers: X-Forwarded-For
    keep-alive-requests: "20000"
    max-worker-connections: 32000
    proxy-body-size: 150m
    proxy-read-timeout: "60"
    proxy-send-timeout: "60"
    proxy-stream-timeout: "60"
    ssl-redirect: "true"
    upstream-keepalive-connections: "4000"
    use-forwarded-headers: "true"
    worker-processes: "8"
  kind: Deployment
  metrics:
    enabled: true
  replicaCount: 3
  resources:
    limits:
      cpu: 4000m
      memory: 4Gi
    requests:
      cpu: 1000m
      memory: 1Gi
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-a,subnet-b,subnet-c
    externalTrafficPolicy: Local
    type: LoadBalancer
  spec:
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 5; /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit;
            while pgrep -x nginx; do sleep 1; done
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: '{{ .Release.Name }}'
        app.kubernetes.io/name: '{{ include "ingress-nginx.name" . }}'
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
serviceAccount:
  create: false
  name: nginx-ingress-controller
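To double-check that these ConfigMap keys were actually rendered into the running nginx configuration, a quick sketch (assumptions: the label selector matches the pod labels shown below, and /etc/nginx/nginx.conf is the same path referenced in the preStop hook above):

# Grab one controller pod and grep the rendered config for the tuning values
# derived from the ConfigMap (worker_processes, worker_connections, keepalive
# settings, client_body_timeout, ...).
POD=$(kubectl -n utils get pods -l app.kubernetes.io/component=controller \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n utils exec "$POD" -- \
  grep -E 'worker_processes|worker_connections|keepalive|client_body_timeout' /etc/nginx/nginx.conf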
- Current State of the controller:
kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress-controller
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.12.1
helm.sh/chart=ingress-nginx-4.12.1
Annotations: meta.helm.sh/release-name: nginx-ingress-controller
meta.helm.sh/release-namespace: utils
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name: nginx-ingress-controller
Namespace: utils
Priority: 0
Service Account: nginx-ingress-controller
Node: node-a/10.4.22.49
Start Time: Tue, 29 Jul 2025 04:04:34 -0400
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress-controller
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.12.1
helm.sh/chart=ingress-nginx-4.12.1
pod-template-hash=66f9f9cf6c
Annotations: <none>
Status: Running
IP: 10.4.24.180
IPs:
IP: 10.4.24.180
Controlled By: ReplicaSet/nginx-ingress-controller
Containers:
controller:
Container ID: containerd://3626f36bc31f6b5d59822903d791ea8dfdf1da62aef9ee3c93314e076a539869
Image: registry.k8s.io/ingress-nginx/controller:v1.12.1@sha256:d2fbc4ec70d8aa2050dd91a91506e998765e86c96f32cffb56c503c9c34eed5b
Image ID: registry.k8s.io/ingress-nginx/controller@sha256:d2fbc4ec70d8aa2050dd91a91506e998765e86c96f32cffb56c503c9c34eed5b
Ports: 80/TCP, 443/TCP, 10254/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/nginx-ingress-controller-ingress-nginx-controller
--election-id=nginx-ingress-controller-ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/nginx-ingress-controller-ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--enable-metrics=true
--v=3
State: Running
Started: Tue, 29 Jul 2025 04:04:40 -0400
Ready: True
Restart Count: 0
Limits:
cpu: 4
memory: 4Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-ingress-nginx-controller-66f9f9cfqzfnt (v1:metadata.name)
POD_NAMESPACE: utils (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
...
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5fz7v (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-controller-ingress-nginx-admission
Optional: false
kube-api-access-5fz7v:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
Optional: false
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress-controller,app.kubernetes.io/name=ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
- Noticed the following in the Service events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 57m service Failed deploy model due to operation error Elastic Load Balancing v2: DeleteLoadBalancer, https response error StatusCode: 400, RequestID:..., ResourceInUse: Load balancer '...' cannot be deleted because it is currently associated with another service
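In case it is relevant, a sketch of how the other Service still referencing that load balancer can be located (generic kubectl; the column layout is just for readability):

# List every Service of type LoadBalancer across the cluster together with the
# hostname of the load balancer it points at, to spot the conflicting owner.
kubectl get svc -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type,LB:.status.loadBalancer.ingress[0].hostname' \
  | grep -E 'NAMESPACE|LoadBalancer'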