
Application Pod Returning 200, Nginx Returning 500 #13684

@rkbabunga

What happened:

We have an application hosted in AWS EKS, integrated with API Gateway via VPC Link. We run 3-6 NGINX pods spread across 3 availability zones, with about 60 application pods also spread across those zones. We are using the ingress-nginx Helm chart, version 4.12.1. We load tested the architecture extensively in lower environments; however, after deploying to production we observed that about 1/6th of all calls returned a 500. We initially suspected an upstream application error, but on investigation we found that the application completes each request successfully, while the ingress controller turns the application's 200 into a 500 and forwards that to API Gateway, which then surfaces an error to the client.

Error from API Gateway Logs:

<html>
  <head>
    <title>Internal Server Error</title>
  </head>
  <body>
    <h1><p>Internal Server Error</p></h1>
    
  </body>
</html>

Log from the Ingress Controller (Slightly obfuscated):
10.4.60.47 - - [15/Jul/2025:03:57:42 +0000] "POST /endpoint/a HTTP/1.1" 500 141 "-" "AMAZONAPIGATEWAYID" 660 0.375 [pod_app] [] 10.4.16.33:9001 141 0.375 500 InnerReqId
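To check at scale whether the status sent to the client matches the status recorded for the upstream, the access log can be split into named fields. The following is a minimal sketch, assuming the controller is using the default ingress-nginx `log-format-upstream` field order (we have not overridden it in our values); the function names `parse_access_line` and `status_origin` are hypothetical helpers, not part of ingress-nginx. It does not handle retried requests, where the upstream fields become comma-separated lists.

```python
import re

# ASSUMPTION: default ingress-nginx log-format-upstream field order.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
    r'\[(?P<proxy_upstream_name>[^\]]*)\] '
    r'\[(?P<proxy_alternative_upstream_name>[^\]]*)\] '
    r'(?P<upstream_addr>\S+) (?P<upstream_response_length>\S+) '
    r'(?P<upstream_response_time>\S+) (?P<upstream_status>\S+) '
    r'(?P<req_id>\S+)'
)


def parse_access_line(line):
    """Return a dict of named log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def status_origin(fields):
    """Compare the client-facing status with the recorded upstream status.

    "nginx":     no upstream status was recorded ("-").
    "upstream":  both statuses are identical.
    "rewritten": the client-facing status differs from the upstream status.
    """
    if fields["upstream_status"] == "-":
        return "nginx"
    if fields["status"] == fields["upstream_status"]:
        return "upstream"
    return "rewritten"


if __name__ == "__main__":
    sample = (
        '10.4.60.47 - - [15/Jul/2025:03:57:42 +0000] '
        '"POST /endpoint/a HTTP/1.1" 500 141 "-" "AMAZONAPIGATEWAYID" '
        '660 0.375 [pod_app] [] 10.4.16.33:9001 141 0.375 500 InnerReqId'
    )
    fields = parse_access_line(sample)
    print(fields["status"], fields["upstream_status"], status_origin(fields))
```

Under that assumed format, the line above parses to a client-facing status of 500 and a recorded upstream status of 500 from 10.4.16.33:9001, so it is worth matching `req_id` against the application's own log entry for the same request.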

One important note: we have seen internal server errors before, but when they are caused by an unhandled application error the response has content type application/json, not text/html.
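The content-type distinction can be used to separate the two kinds of 500 when sampling responses. This is a minimal sketch of that heuristic only; the function name `classify_500` and its labels are hypothetical, and the rule (JSON means our application produced the error, HTML means something else did) is our observation about this system rather than a general guarantee.

```python
def classify_500(status, content_type):
    """Label a response by who likely produced the 500, using Content-Type.

    ASSUMPTION: our application only ever emits application/json error bodies.
    """
    if status != 500:
        return "not-a-500"
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return "application"
    if media_type == "text/html":
        return "non-application"
    return "unknown"


if __name__ == "__main__":
    print(classify_500(500, "text/html"))
    print(classify_500(500, "application/json; charset=utf-8"))
```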

What you expected to happen:

NGINX should forward every successful application response to the Gateway unmodified, exactly as it was received.

This might be a misconfiguration, but for the life of us we cannot pin down what it might be. This occurs during high load.

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.12.1
  Build:         51c2b819690bbf1709b844dbf321a9acf6eda5a7
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.5

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

$ kubectl version
Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.33.1-eks-595af52

Environment:

  • Cloud provider or hardware configuration: AWS EKS Auto Mode

  • OS (e.g. from /etc/os-release): Alpine Linux

  • Kernel (e.g. uname -a): Linux 6.12.35 #1 SMP PREEMPT_DYNAMIC Wed Jul 16 23:35:39 UTC 2025 x86_64 Linux

  • Install tools:

    • Cluster was created via Terraform using the AWS EKS Module, Ingress was created via helm with a config map provided for some specific values. All other values would remain default.
  • How was the ingress-nginx-controller installed:

    • If helm was used then please show output of helm ls -A | grep -i ingress
    • nginx-ingress-controller utils 35 2025-07-29 13:43:48.126001586 +0000 UTC deployed ingress-nginx-4.12.1 1.12.1
    • If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
USER-SUPPLIED VALUES:
controller:
  config:
    client-body-timeout: "30"
    forwarded-for-headers: X-Forwarded-For
    keep-alive-requests: "20000"
    max-worker-connections: 32000
    proxy-body-size: 150m
    proxy-read-timeout: "60"
    proxy-send-timeout: "60"
    proxy-stream-timeout: "60"
    ssl-redirect: "true"
    upstream-keepalive-connections: "4000"
    use-forwarded-headers: "true"
    worker-processes: "8"
  kind: Deployment
  metrics:
    enabled: true
  replicaCount: 3
  resources:
    limits:
      cpu: 4000m
      memory: 4Gi
    requests:
      cpu: 1000m
      memory: 1Gi
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-a,subnet-b,subnet-c
    externalTrafficPolicy: Local
    type: LoadBalancer
  spec:
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 5; /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit;
            while pgrep -x nginx; do sleep 1; done
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: '{{ .Release.Name }}'
        app.kubernetes.io/name: '{{ include "ingress-nginx.name" . }}'
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
serviceAccount:
  create: false
  name: nginx-ingress-controller
  • Current State of the controller:
    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=nginx-ingress-controller
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.12.1
              helm.sh/chart=ingress-nginx-4.12.1
Annotations:  meta.helm.sh/release-name: nginx-ingress-controller
              meta.helm.sh/release-namespace: utils
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name:             nginx-ingress-controller
Namespace:        utils
Priority:         0
Service Account:  nginx-ingress-controller
Node:             node-a/10.4.22.49
Start Time:       Tue, 29 Jul 2025 04:04:34 -0400
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=nginx-ingress-controller
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.12.1
                  helm.sh/chart=ingress-nginx-4.12.1
                  pod-template-hash=66f9f9cf6c
Annotations:      <none>
Status:           Running
IP:               10.4.24.180
IPs:
  IP:           10.4.24.180
Controlled By:  ReplicaSet/nginx-ingress-controller
Containers:
  controller:
    Container ID:    containerd://3626f36bc31f6b5d59822903d791ea8dfdf1da62aef9ee3c93314e076a539869
    Image:           registry.k8s.io/ingress-nginx/controller:v1.12.1@sha256:d2fbc4ec70d8aa2050dd91a91506e998765e86c96f32cffb56c503c9c34eed5b
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:d2fbc4ec70d8aa2050dd91a91506e998765e86c96f32cffb56c503c9c34eed5b
    Ports:           80/TCP, 443/TCP, 10254/TCP, 8443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/nginx-ingress-controller-ingress-nginx-controller
      --election-id=nginx-ingress-controller-ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/nginx-ingress-controller-ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --enable-metrics=true
      --v=3
    State:          Running
      Started:      Tue, 29 Jul 2025 04:04:40 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  4Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:                     nginx-ingress-controller-ingress-nginx-controller-66f9f9cfqzfnt (v1:metadata.name)
      POD_NAMESPACE:                utils (v1:metadata.namespace)
      LD_PRELOAD:                   /usr/local/lib/libmimalloc.so
      ...

    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5fz7v (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-ingress-controller-ingress-nginx-admission
    Optional:    false
  kube-api-access-5fz7v:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    Optional:                 false
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               kubernetes.io/os=linux
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress-controller,app.kubernetes.io/name=ingress-nginx
Events:                       <none>
  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
  • Noticed the following in the events
Events:
  Type     Reason             Age    From     Message
  ----     ------             ----   ----     -------
  Warning  FailedDeployModel  57m    service  Failed deploy model due to operation error Elastic Load Balancing v2: DeleteLoadBalancer, https response error StatusCode: 400, RequestID:..., ResourceInUse: Load balancer '...' cannot be deleted because it is currently associated with another service
