[🐛 Bug]: Override the variable SE_DRAIN_AFTER_SESSION_COUNT when using FluxCD HelmReleases (scalingType : deployment) #2904

@lukaszmatura-sa

Description

What happened?

This ticket is almost identical to my previous one:
#2901

but this time I want to emphasize that we need the functionality for scalingType: deployment.

So, the logic in this PR is good, but we'd like to extend it to handle scalingType: deployment as well:
#2902

In brief, the problem is the following:
With scalingType: deployment, the default value of "SE_DRAIN_AFTER_SESSION_COUNT" is set to zero (0), and there is still no way to override it (to "30" in our case). When I try to change it for the Chrome Node like this:

    extraEnvironmentVariables: # Custom environment variables for chromeNode
      - name: SE_DRAIN_AFTER_SESSION_COUNT
        value: "30"

then the HelmRelease (managed by FluxCD) rejects the change and cannot apply the patch:

(screenshot of the FluxCD patch error; the full message is in the log output below)

Therefore, every time we upgrade selenium-grid, we have to manually delete all the underlying Selenium Deployments in all our clusters and then resume the HelmRelease so it recreates all the resources from scratch instead of patching the existing Deployments.
This is cumbersome and causes downtime.
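As an interim mitigation (not a fix), it may be possible to sidestep the patch conflict by telling Flux to replace resources instead of patching them. This is a sketch assuming the `spec.upgrade.force` field of the `helm.toolkit.fluxcd.io/v2` HelmRelease API, which performs upgrades with a replacement strategy; it avoids the strategic-merge-patch conflict at the cost of recreating the Deployments on every upgrade:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: comp-tests-selenium
spec:
  # ... chart/interval/values as before ...
  upgrade:
    force: true  # replace resources instead of patching them, so the
                 # duplicate-env-entry "$setElementOrder" conflict never occurs
```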

Command used to start Selenium Grid with Docker (or Kubernetes)

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: comp-tests-selenium
spec:
  releaseName: comp-tests-selenium
  chart:
    spec:
      chart: selenium-grid
      sourceRef:
        kind: HelmRepository
        name: selenium-grid
      version: "0.45.1"
  interval: 10m
  timeout: 9m30s
  install:
    remediation:
      retries: 3
  # https://github.com/SeleniumHQ/docker-selenium/blob/trunk/charts/selenium-grid/values.yaml
  values:
    global:
      seleniumGrid:
        imagePullSecret: artifactory
        kubectlImage: docker.company.com/bitnami/kubectl:1.31
        imageRegistry: docker.company.com/selenium
    isolateComponents: false
    chromeNode:
      scaledObjectOptions:
        scaleTargetRef:
          name: selenium-chrome-node
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: [ "ALL" ]
        seccompProfile:
          type: RuntimeDefault
      imageName: node-chrome
      dshmVolumeSizeLimit: 1.5Gi
      replicas: 2
      resources:
        limits:
          cpu: 2 # the chart default is 1
          memory: 1.5Gi
        requests:
          memory: 1Gi
          cpu: 1
      startupProbe:
        httpGet:
          path: /status
          port: 5555
        failureThreshold: 120
        periodSeconds: 5
      terminationGracePeriodSeconds: 90
#       Allow pods to shut down cleanly
      deregisterLifecycle:
        preStop:
          exec:
            command: [ "bash", "-c", "/opt/bin/nodePreStop.sh" ]
      extraEnvironmentVariables: # Custom environment variables for chromeNode
        - name: SCREEN_WIDTH
          value: "1920"
        - name: SCREEN_HEIGHT
          value: "1080"
        - name: SCREEN_DEPTH
          value: "24"
        - name: SCREEN_DPI
          value: "74"
        - name: SE_DRAIN_AFTER_SESSION_COUNT
          value: "30"
        - name: SE_NODE_SESSION_TIMEOUT # The Node will automatically kill a session that has not had any activity in the last X seconds. This will release the slot for other tests
          value: "60"
        - name: SE_NODE_GRID_URL
          value: "http://comp-tests-selenium-selenium-hub.comp-tests-selenium${namespace_suffix}.svc:4444" #hrName-selenium-hub.namespace
        - name: SE_EVENT_BUS_HOST
          value: "comp-tests-selenium-selenium-hub.comp-tests-selenium${namespace_suffix}" #hrName-selenium-hub.namespace
      nodeSelector:
        qa: "true"
      tolerations:
        - key: qa
          value: "true"
          effect: NoSchedule
    firefoxNode:
      enabled: false
    edgeNode:
      enabled: false
    hub:
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: [ "ALL" ]
        seccompProfile:
          type: RuntimeDefault
      #      affinity: consider podAntiAffinity between hub and nodes; newer chart versions support this
      imageName: hub
      serviceType: ClusterIP
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1Gi
          cpu: 0.2
      annotations:
        karpenter.sh/do-not-disrupt: "true"
      extraEnvironmentVariables:  # Custom environment variables for hub
        - name: SCREEN_WIDTH
          value: "1920"
        - name: SCREEN_HEIGHT
          value: "1080"
        - name: SCREEN_DEPTH
          value: "24"
        - name: SCREEN_DPI
          value: "74"
        - name: SE_SESSION_REQUEST_TIMEOUT # A new incoming session request is added to the queue. Requests sitting in the queue for longer than the configured time will timeout.
          value: "180"
      nodeSelector:
        qa: "true"
      tolerations:
        - key: qa
          value: "true"
          effect: NoSchedule
    ingress:
      className: private-nginx
      annotations:
        nginx.ingress.kubernetes.io/service-upstream: "true"
        nginx.ingress.kubernetes.io/backend-protocol: HTTP
        external-dns.alpha.kubernetes.io/private: "true"
        cert-manager.io/cluster-issuer: letsencrypt
      hostname: "comp-tests-selenium${namespace_suffix}.tools.${cluster_region}.${cluster_domain}"
      tls:
        - secretName: comp-tests-selenium-private-ingress-tls-selenium
          hosts:
            - "comp-tests-selenium${namespace_suffix}.tools.${cluster_region}.${cluster_domain}"
    autoscaling:
      patchObjectFinalizers:
        enabled: true  #https://github.com/SeleniumHQ/docker-selenium/issues/2196
      enabled: false
      enableWithExistingKEDA: true
      scalingType: deployment
      scaledOptions:
        minReplicaCount: 0
        maxReplicaCount: 5
        pollingInterval: 10
      scaledObjectOptions:
        #        triggers: #consider this section when connection to hub is not properly set
        advanced:
          horizontalPodAutoscalerConfig:
            behavior:
              scaleUp:
                stabilizationWindowSeconds: 30
                policies:
                  - type: Pods
                    value: 4
                    periodSeconds: 10
              scaleDown:
                stabilizationWindowSeconds: 360
                policies:
                  - type: Pods
                    value: 1
                    periodSeconds: 150

Relevant log output

Status:
  Conditions:
    Last Transition Time:  2025-07-17T08:39:26Z
    Message:               Failed to upgrade after 1 attempt(s)
    Observed Generation:   54
    Reason:                RetriesExceeded
    Status:                True
    Type:                  Stalled
    Last Transition Time:  2025-07-17T08:07:07Z
    Message:               Helm upgrade failed for release comp-tests-selenium/comp-tests-selenium with chart selenium-grid@0.45.1: cannot patch "comp-tests-selenium-selenium-node-chrome" with kind Deployment: The order in patch list:
[map[name:SE_NODE_STEREOTYPE_EXTRA value:] map[name:SE_DRAIN_AFTER_SESSION_COUNT value:0] map[name:SE_DRAIN_AFTER_SESSION_COUNT value:30] map[name:SE_NODE_BROWSER_VERSION value:] map[name:SE_NODE_PLATFORM_NAME value:] map[name:SE_OTEL_RESOURCE_ATTRIBUTES value:app.kubernetes.io/component=selenium-grid-4.34.0-20250707,app.kubernetes.io/instance=comp-tests-selenium,app.kubernetes.io/managed-by=helm,app.kubernetes.io/version=4.34.0-20250707,helm.sh/chart=selenium-grid-0.45.1]]
 doesn't match $setElementOrder list:
[map[name:KUBERNETES_NODE_HOST_IP] map[name:SE_NODE_MAX_SESSIONS] map[name:SE_NODE_ENABLE_MANAGED_DOWNLOADS] map[name:SE_NODE_STEREOTYPE_EXTRA] map[name:SE_DRAIN_AFTER_SESSION_COUNT] map[name:SE_NODE_BROWSER_NAME] map[name:SE_NODE_BROWSER_VERSION] map[name:SE_NODE_PLATFORM_NAME] map[name:SE_NODE_CONTAINER_NAME] map[name:SE_OTEL_SERVICE_NAME] map[name:SE_OTEL_RESOURCE_ATTRIBUTES] map[name:SE_NODE_HOST] map[name:SE_NODE_PORT] map[name:SE_NODE_REGISTER_PERIOD] map[name:SE_NODE_REGISTER_CYCLE] map[name:SCREEN_WIDTH] map[name:SCREEN_HEIGHT] map[name:SCREEN_DEPTH] map[name:SCREEN_DPI] map[name:SE_DRAIN_AFTER_SESSION_COUNT] map[name:SE_NODE_SESSION_TIMEOUT] map[name:SE_NODE_GRID_URL] map[name:SE_EVENT_BUS_HOST]]
    Observed Generation:   54
    Reason:                UpgradeFailed
    Status:                False
    Type:                  Ready
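The error boils down to the rendered env list containing SE_DRAIN_AFTER_SESSION_COUNT twice (the chart's default "0" plus the user override "30"), which Kubernetes' strategic merge patch cannot reconcile. A minimal sketch of the behaviour requested here: a user-supplied entry should replace the chart default rather than being appended as a duplicate. (`dedup_env` is a hypothetical helper for illustration, not actual chart code.)

```python
def dedup_env(env):
    """Keep one entry per name; later values win, first-seen order is kept."""
    merged = {}
    for item in env:
        merged[item["name"]] = item["value"]  # later entries override defaults
    return [{"name": n, "value": v} for n, v in merged.items()]

rendered = [
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT", "value": "0"},   # chart default
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT", "value": "30"},  # user override
]
print(dedup_env(rendered))
# → [{'name': 'SE_DRAIN_AFTER_SESSION_COUNT', 'value': '30'}]
```

With the list deduplicated this way before rendering, the patch list and the $setElementOrder list would each contain the variable once, and the upgrade could be patched in place.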

Operating System

Kubernetes EKS

Docker Selenium version (image tag)

4.34.0-20250707

Selenium Grid chart version (chart version)

0.45.1
