ASG doesn't respect max size when updating EKS managed nodes #3347

Closed

guidodobboletta opened this issue Apr 23, 2025 · 3 comments
@guidodobboletta

Description

When doing a user_data update or an AMI update on my node group, which has min_size: 1, max_size: 1 and desired_size: 1, the ASG spins up 5 more instances for no apparent reason.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.36.0"

  cluster_name    = var.eks_cluster_name
  cluster_version = "1.32"

  cluster_addons = {
    coredns = {
      most_recent = true
    }

    kube-proxy = {
      most_recent = true
    }

    vpc-cni = {
      most_recent = true
    }

    aws-ebs-csi-driver = {
      most_recent = true
    }
  }

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnets

  create_cni_ipv6_iam_policy = false
  cluster_ip_family          = "ipv4"

  cluster_enabled_log_types              = ["audit", "api", "authenticator", "controllerManager", "scheduler"]
  create_cloudwatch_log_group            = true
  cloudwatch_log_group_retention_in_days = 365

  create_node_security_group = true
  cluster_security_group_additional_rules = {
    tailscale = {
      cidr_blocks = [var.vpc_cidr_block]
      description = "Allow all traffic from private VPC"
      from_port   = 443
      to_port     = 443
      protocol    = "all"
      type        = "ingress"
    }
  }

  cluster_endpoint_private_access = true

  enable_cluster_creator_admin_permissions = true

  access_entries = local.access_entries

  eks_managed_node_groups = {

    (var.eks_cluster_name) = {
      min_size     = 1
      max_size     = 1
      desired_size = 1

      use_name_prefix = true
      instance_types  = [var.instance_type]

      update_config = {
        max_unavailable = 1
      }

      pre_bootstrap_user_data = <<-EOT
      #!/bin/bash
      set -ex
      curl -fsSL https://tailscale.com/install.sh | sh
      tailscale up --accept-dns=false --authkey='${data.aws_secretsmanager_secret_version.tailscale_oauth_eks.secret_string}?ephemeral=true&preauthorized=true'
      tailscale set --ssh
      EOT

      ebs_optimized = true

      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 100
            volume_type           = "gp3"
            delete_on_termination = true
          }
        }
      }
    }
  }
}
  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: 20.36.0

  • Terraform version: opentofu 1.9.0

  • Provider version(s): aws 5.95.0

Reproduction Code [Required]

Pasted above

Steps to reproduce the behavior:

Apply a change to pre_bootstrap_user_data (or the AMI) on the node group above and watch the ASG activity.
Expected behavior

I'm expecting EKS to create a single new node and then delete the old one, not create 6 nodes and then delete 5.

Actual behavior

This is the ASG activity history from a change to my pre_bootstrap_user_data:

Status | Instance ID | UTC Time | Local Start Time | Local End Time | Log Details
✅ Launch | i-07cd11bde9c978d4f | 2025-04-23T06:50:49Z | 2025 April 23, 01:51:01 PM +07:00 | 2025 April 23, 01:52:10 PM +07:00 | At 2025-04-23T06:50:49Z a user request update of AutoScalingGroup constraints to min: 1, max: 2, desired: 2 changing the desired capacity from 1 to 2. At 2025-04-23T06:50:59Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 1 to 2.
✅ Launch | i-0879b60cf3b843448 | 2025-04-23T06:52:53Z | 2025 April 23, 01:52:55 PM +07:00 | 2025 April 23, 01:54:07 PM +07:00 | At 2025-04-23T06:52:53Z a user request update of AutoScalingGroup constraints to min: 1, max: 3, desired: 3 changing the desired capacity from 2 to 3. At 2025-04-23T06:52:53Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 2 to 3.
✅ Launch | i-008676abad9cb498d | 2025-04-23T06:54:56Z | 2025 April 23, 01:55:10 PM +07:00 | 2025 April 23, 01:56:20 PM +07:00 | At 2025-04-23T06:54:56Z a user request update of AutoScalingGroup constraints to min: 1, max: 4, desired: 4 changing the desired capacity from 3 to 4. At 2025-04-23T06:55:08Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 3 to 4.
✅ Launch | i-0d91cf39d55ce4819 | 2025-04-23T06:57:00Z | 2025 April 23, 01:57:15 PM +07:00 | 2025 April 23, 01:58:24 PM +07:00 | At 2025-04-23T06:57:00Z a user request update of AutoScalingGroup constraints to min: 1, max: 5, desired: 5 changing the desired capacity from 4 to 5. At 2025-04-23T06:57:13Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 4 to 5.
✅ Launch | i-00522e75bfa472fb5 | 2025-04-23T06:59:03Z | 2025 April 23, 01:59:09 PM +07:00 | 2025 April 23, 02:00:18 PM +07:00 | At 2025-04-23T06:59:03Z a user request update of AutoScalingGroup constraints to min: 1, max: 6, desired: 6 changing the desired capacity from 5 to 6. At 2025-04-23T06:59:07Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 5 to 6.
✅ Terminate | i-07ce15c6c067ce5f7 | 2025-04-23T07:04:49Z | 2025 April 23, 02:04:49 PM +07:00 | 2025 April 23, 02:05:33 PM +07:00 | At 2025-04-23T07:04:49Z instance i-07ce15c6c067ce5f7 was taken out of service in response to a user request, shrinking the capacity from 6 to 5.
✅ Terminate | i-0d91cf39d55ce4819 | 2025-04-23T07:05:41Z | 2025 April 23, 02:05:51 PM +07:00 | 2025 April 23, 02:07:36 PM +07:00 | At 2025-04-23T07:05:41Z a user request update of AutoScalingGroup constraints to min: 1, max: 5, desired: 4 changing the desired capacity from 5 to 4. At 2025-04-23T07:05:51Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 5 to 4. At 2025-04-23T07:05:51Z instance i-0d91cf39d55ce4819 was selected for termination.
✅ Terminate | i-008676abad9cb498d | 2025-04-23T07:07:43Z | 2025 April 23, 02:07:45 PM +07:00 | 2025 April 23, 02:09:30 PM +07:00 | At 2025-04-23T07:07:43Z a user request update of AutoScalingGroup constraints to min: 1, max: 4, desired: 3 changing the desired capacity from 4 to 3. At 2025-04-23T07:07:45Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 4 to 3. At 2025-04-23T07:07:45Z instance i-008676abad9cb498d was selected for termination.
✅ Terminate | i-00522e75bfa472fb5 | 2025-04-23T07:09:45Z | 2025 April 23, 02:09:50 PM +07:00 | 2025 April 23, 02:11:35 PM +07:00 | At 2025-04-23T07:09:45Z a user request update of AutoScalingGroup constraints to min: 1, max: 3, desired: 2 changing the desired capacity from 3 to 2. At 2025-04-23T07:09:50Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 3 to 2. At 2025-04-23T07:09:50Z instance i-00522e75bfa472fb5 was selected for termination.
✅ Terminate | i-0879b60cf3b843448 | 2025-04-23T07:11:46Z | 2025 April 23, 02:11:55 PM +07:00 | (not specified) | At 2025-04-23T07:11:46Z a user request update of AutoScalingGroup constraints to min: 1, max: 2, desired: 1 changing the desired capacity from 2 to 1. At 2025-04-23T07:11:55Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 2 to 1. At 2025-04-23T07:11:55Z instance i-0879b60cf3b843448 was selected for termination.
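
The same activity history can also be pulled straight from the Auto Scaling group behind the node group; a minimal sketch with the AWS CLI, where the group name is a placeholder for the EKS-generated ASG name:

# List the most recent scaling activities for the node group's ASG.
# "my-nodegroup-asg" is a placeholder; EKS generates the real ASG name.
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-nodegroup-asg \
  --max-items 20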
@bryantbiggs
Member

Please familiarize yourself with the service https://docs.aws.amazon.com/eks/latest/userguide/managed-node-update-behavior.html
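
Roughly, that page describes a managed node group update as a surge: EKS temporarily raises the backing ASG's maximum and desired capacity, launches replacement nodes, cordons and drains the old ones, then terminates them and scales the group back down, which is what the activity log above shows. The update settings that feed into this can be read back from the node group; a minimal sketch with the AWS CLI, where the cluster and node group names are placeholders:

# Show the node group's update config (maxUnavailable / maxUnavailablePercentage),
# which controls how many nodes are replaced in parallel during an update.
# "my-cluster" and "my-nodegroup" are placeholder names.
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.updateConfig'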

@guidodobboletta
Author

Oh, I see. I didn't know that detail. Is there a way to tune it to make user data deployments faster? If not that's fine, I was just wondering what was going on.

@bryantbiggs
Member

Is there a way to tune it to make user data deployments faster?

Not that I am aware of. However, I haven't come across folks who are making a lot of changes to the user data.
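
The closest knob appears to be the node group's update config (the max_unavailable / max_unavailable_percentage already set in the module block above); it governs how many nodes can be replaced in parallel, so a one-node group has little to tune there. For larger groups, a hedged sketch of raising it via the AWS CLI, with placeholder names:

# Raise the update parallelism on an existing managed node group.
# "my-cluster" and "my-nodegroup" are placeholders; this mainly helps
# groups with more than one node, not a single-node group like the one above.
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --update-config maxUnavailablePercentage=50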
