Description
What happened?
This ticket is almost identical to my previous one, #2901, but this time I want to emphasize that we need the functionality for scalingType: deployment.
The logic in PR #2902 is good, but we'd like it extended to handle scalingType: deployment as well.
In brief, the problem is the following:
With scalingType: deployment, the default value of SE_DRAIN_AFTER_SESSION_COUNT is zero (0), and there is still no way to override it (to "30" in our case). When I try to set it for the Chrome node like this:
extraEnvironmentVariables: # Custom environment variables for chromeNode
  - name: SE_DRAIN_AFTER_SESSION_COUNT
    value: "30"
the variable ends up in the Deployment twice, and the HelmRelease (managed by FluxCD) fails because it cannot patch the existing Deployment (see the log output below).

Therefore, each time we upgrade selenium-grid, we have to manually delete all the underlying Selenium Deployments from all our clusters and then resume the HelmRelease by hand, so that it recreates all resources from scratch instead of patching the existing Deployments.
This is cumbersome and causes downtime.
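The override behavior requested here can be sketched as a merge by variable name, where a user-supplied entry replaces the chart default instead of being appended next to it. This is only an illustration of the desired outcome, not the chart's actual template logic; `merge_env` and the two sample lists are hypothetical, with values taken from this report.

```python
def merge_env(defaults, overrides):
    """Merge two env lists by name; entries in `overrides` win.

    Hypothetical helper: it illustrates the requested behavior only and
    is not how the selenium-grid chart renders its env section.
    """
    # dicts keep insertion order, so an overridden variable stays in the
    # position of the default it replaces.
    merged = {entry["name"]: entry for entry in defaults}
    for entry in overrides:
        merged[entry["name"]] = entry
    return list(merged.values())


# Defaults as they appear in the patch list of the error message.
chart_defaults = [
    {"name": "SE_NODE_STEREOTYPE_EXTRA", "value": ""},
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT", "value": "0"},
]
# A subset of chromeNode.extraEnvironmentVariables from this report.
user_env = [
    {"name": "SCREEN_WIDTH", "value": "1920"},
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT", "value": "30"},
]

merged_env = merge_env(chart_defaults, user_env)
for entry in merged_env:
    print(entry["name"], "=", repr(entry["value"]))
```

With a merge like this, each variable name appears only once in the rendered Deployment, so the env list no longer contains the duplicate entry reported in the patch error.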
Command used to start Selenium Grid with Docker (or Kubernetes)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: comp-tests-selenium
spec:
  releaseName: comp-tests-selenium
  chart:
    spec:
      chart: selenium-grid
      sourceRef:
        kind: HelmRepository
        name: selenium-grid
      version: "0.45.1"
  interval: 10m
  timeout: 9m30s
  install:
    remediation:
      retries: 3
  # https://github.com/SeleniumHQ/docker-selenium/blob/trunk/charts/selenium-grid/values.yaml
  values:
    global:
      seleniumGrid:
        imagePullSecret: artifactory
        kubectlImage: docker.company.com/bitnami/kubectl:1.31
        imageRegistry: docker.company.com/selenium
    isolateComponents: false
    chromeNode:
      scaledObjectOptions:
        scaleTargetRef:
          name: selenium-chrome-node
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: [ "ALL" ]
        seccompProfile:
          type: RuntimeDefault
      imageName: node-chrome
      dshmVolumeSizeLimit: 1.5Gi
      replicas: 2
      resources:
        limits:
          cpu: 2 # the Helm chart default is 1
          memory: 1.5Gi
        requests:
          memory: 1Gi
          cpu: 1
      startupProbe:
        httpGet:
          path: /status
          port: 5555
        failureThreshold: 120
        periodSeconds: 5
      terminationGracePeriodSeconds: 90
      # Allow the pod to shut down correctly
      deregisterLifecycle:
        preStop:
          exec:
            command: [ "bash", "-c", "/opt/bin/nodePreStop.sh" ]
      extraEnvironmentVariables: # Custom environment variables for chromeNode
        - name: SCREEN_WIDTH
          value: "1920"
        - name: SCREEN_HEIGHT
          value: "1080"
        - name: SCREEN_DEPTH
          value: "24"
        - name: SCREEN_DPI
          value: "74"
        - name: SE_DRAIN_AFTER_SESSION_COUNT
          value: "30"
        - name: SE_NODE_SESSION_TIMEOUT # The Node will automatically kill a session that has not had any activity in the last X seconds. This will release the slot for other tests
          value: "60"
        - name: SE_NODE_GRID_URL
          value: "http://comp-tests-selenium-selenium-hub.comp-tests-selenium${namespace_suffix}.svc:4444" #hrName-selenium-hub.namespace
        - name: SE_EVENT_BUS_HOST
          value: "comp-tests-selenium-selenium-hub.comp-tests-selenium${namespace_suffix}" #hrName-selenium-hub.namespace
      nodeSelector:
        qa: "true"
      tolerations:
        - key: qa
          value: "true"
          effect: NoSchedule
    firefoxNode:
      enabled: false
    edgeNode:
      enabled: false
    hub:
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: [ "ALL" ]
        seccompProfile:
          type: RuntimeDefault
      # affinity: consider podAntiAffinity with hub and nodes, from newer versions chart provides this possibility
      imageName: hub
      serviceType: ClusterIP
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1Gi
          cpu: 0.2
      annotations:
        karpenter.sh/do-not-disrupt: "true"
      extraEnvironmentVariables: # Custom environment variables for hub
        - name: SCREEN_WIDTH
          value: "1920"
        - name: SCREEN_HEIGHT
          value: "1080"
        - name: SCREEN_DEPTH
          value: "24"
        - name: SCREEN_DPI
          value: "74"
        - name: SE_SESSION_REQUEST_TIMEOUT # A new incoming session request is added to the queue. Requests sitting in the queue for longer than the configured time will timeout.
          value: "180"
      nodeSelector:
        qa: "true"
      tolerations:
        - key: qa
          value: "true"
          effect: NoSchedule
    ingress:
      className: private-nginx
      annotations:
        nginx.ingress.kubernetes.io/service-upstream: "true"
        nginx.ingress.kubernetes.io/backend-protocol: HTTP
        external-dns.alpha.kubernetes.io/private: "true"
        cert-manager.io/cluster-issuer: letsencrypt
      hostname: "comp-tests-selenium${namespace_suffix}.tools.${cluster_region}.${cluster_domain}"
      tls:
        - secretName: comp-tests-selenium-private-ingress-tls-selenium
          hosts:
            - "comp-tests-selenium${namespace_suffix}.tools.${cluster_region}.${cluster_domain}"
    autoscaling:
      patchObjectFinalizers:
        enabled: true #https://github.com/SeleniumHQ/docker-selenium/issues/2196
      enabled: false
      enableWithExistingKEDA: true
      scalingType: deployment
      scaledOptions:
        minReplicaCount: 0
        maxReplicaCount: 5
        pollingInterval: 10
      scaledObjectOptions:
        # triggers: #consider this section when connection to hub is not properly set
        advanced:
          horizontalPodAutoscalerConfig:
            behavior:
              scaleUp:
                stabilizationWindowSeconds: 30
                policies:
                  - type: Pods
                    value: 4
                    periodSeconds: 10
              scaleDown:
                stabilizationWindowSeconds: 360
                policies:
                  - type: Pods
                    value: 1
                    periodSeconds: 150
Relevant log output
Status:
  Conditions:
    Last Transition Time: 2025-07-17T08:39:26Z
    Message: Failed to upgrade after 1 attempt(s)
    Observed Generation: 54
    Reason: RetriesExceeded
    Status: True
    Type: Stalled
    Last Transition Time: 2025-07-17T08:07:07Z
    Message: Helm upgrade failed for release comp-tests-selenium/comp-tests-selenium with chart selenium-grid@0.45.1: cannot patch "comp-tests-selenium-selenium-node-chrome" with kind Deployment: The order in patch list:
      [map[name:SE_NODE_STEREOTYPE_EXTRA value:] map[name:SE_DRAIN_AFTER_SESSION_COUNT value:0] map[name:SE_DRAIN_AFTER_SESSION_COUNT value:30] map[name:SE_NODE_BROWSER_VERSION value:] map[name:SE_NODE_PLATFORM_NAME value:] map[name:SE_OTEL_RESOURCE_ATTRIBUTES value:app.kubernetes.io/component=selenium-grid-4.34.0-20250707,app.kubernetes.io/instance=comp-tests-selenium,app.kubernetes.io/managed-by=helm,app.kubernetes.io/version=4.34.0-20250707,helm.sh/chart=selenium-grid-0.45.1]]
      doesn't match $setElementOrder list:
      [map[name:KUBERNETES_NODE_HOST_IP] map[name:SE_NODE_MAX_SESSIONS] map[name:SE_NODE_ENABLE_MANAGED_DOWNLOADS] map[name:SE_NODE_STEREOTYPE_EXTRA] map[name:SE_DRAIN_AFTER_SESSION_COUNT] map[name:SE_NODE_BROWSER_NAME] map[name:SE_NODE_BROWSER_VERSION] map[name:SE_NODE_PLATFORM_NAME] map[name:SE_NODE_CONTAINER_NAME] map[name:SE_OTEL_SERVICE_NAME] map[name:SE_OTEL_RESOURCE_ATTRIBUTES] map[name:SE_NODE_HOST] map[name:SE_NODE_PORT] map[name:SE_NODE_REGISTER_PERIOD] map[name:SE_NODE_REGISTER_CYCLE] map[name:SCREEN_WIDTH] map[name:SCREEN_HEIGHT] map[name:SCREEN_DEPTH] map[name:SCREEN_DPI] map[name:SE_DRAIN_AFTER_SESSION_COUNT] map[name:SE_NODE_SESSION_TIMEOUT] map[name:SE_NODE_GRID_URL] map[name:SE_EVENT_BUS_HOST]]
    Observed Generation: 54
    Reason: UpgradeFailed
    Status: False
    Type: Ready
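The error above lists SE_DRAIN_AFTER_SESSION_COUNT twice, once with the default value 0 and once with our value 30. A small check like the following could be run against a rendered env list to spot such duplicates before an upgrade. It is a sketch only: `duplicate_env_names` is a hypothetical helper, and `rendered_env` is an abbreviated, hand-copied selection of the names from the error message rather than real output of `helm template`.

```python
from collections import Counter


def duplicate_env_names(env):
    """Return the sorted env variable names that occur more than once."""
    counts = Counter(entry["name"] for entry in env)
    return sorted(name for name, n in counts.items() if n > 1)


# Abbreviated selection of the names in the $setElementOrder list,
# kept in the order they appear in the error message.
rendered_env = [
    {"name": "KUBERNETES_NODE_HOST_IP"},
    {"name": "SE_NODE_MAX_SESSIONS"},
    {"name": "SE_NODE_STEREOTYPE_EXTRA"},
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT"},  # chart default ("0")
    {"name": "SE_NODE_BROWSER_VERSION"},
    {"name": "SCREEN_WIDTH"},
    {"name": "SCREEN_HEIGHT"},
    {"name": "SCREEN_DEPTH"},
    {"name": "SCREEN_DPI"},
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT"},  # our override ("30")
    {"name": "SE_NODE_SESSION_TIMEOUT"},
    {"name": "SE_NODE_GRID_URL"},
    {"name": "SE_EVENT_BUS_HOST"},
]

duplicates = duplicate_env_names(rendered_env)
print(duplicates)
```

For this input the only name reported is SE_DRAIN_AFTER_SESSION_COUNT, which matches the entry the patch error complains about.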
Operating System
Kubernetes EKS
Docker Selenium version (image tag)
4.34.0-20250707
Selenium Grid chart version (chart version)
0.45.1