Disable EKS auto mode fails #3273

Open

project0 opened this issue Jan 10, 2025 · 18 comments

@project0

project0 commented Jan 10, 2025

Description

Disabling EKS Auto Mode on an existing cluster fails. There are related changes and issues that claimed to fix this, but the fix was reverted again. As all of the related issues are closed, I have decided to raise a new issue.

Versions

  • Module version [Required]:

  • Terraform version:
    Terraform v1.10.2

  • Provider version(s):

14:44:13.210 STDOUT terraform: Providers required by configuration:
14:44:13.210 STDOUT terraform: .
14:44:13.210 STDOUT terraform: ├── provider[registry.terraform.io/hashicorp/aws] >= 5.9.0
14:44:13.210 STDOUT terraform: ├── module.irsa_external-dns
14:44:13.210 STDOUT terraform: │   └── provider[registry.terraform.io/hashicorp/aws] >= 4.0.0
14:44:13.210 STDOUT terraform: ├── module.karpenter
14:44:13.210 STDOUT terraform: │   └── provider[registry.terraform.io/hashicorp/aws] >= 5.81.0
14:44:13.210 STDOUT terraform: ├── module.vpc
14:44:13.210 STDOUT terraform: │   └── provider[registry.terraform.io/hashicorp/aws] >= 5.46.0
14:44:13.210 STDOUT terraform: ├── module.eks
14:44:13.210 STDOUT terraform: │   ├── provider[registry.terraform.io/hashicorp/aws] >= 5.81.0
14:44:13.210 STDOUT terraform: │   ├── provider[registry.terraform.io/hashicorp/tls] >= 3.0.0
14:44:13.210 STDOUT terraform: │   ├── provider[registry.terraform.io/hashicorp/time] >= 0.9.0
14:44:13.210 STDOUT terraform: │   ├── module.eks_managed_node_group
14:44:13.210 STDOUT terraform: │       ├── provider[registry.terraform.io/hashicorp/aws] >= 5.81.0
14:44:13.210 STDOUT terraform: │       └── module.user_data
14:44:13.210 STDOUT terraform: │           ├── provider[registry.terraform.io/hashicorp/cloudinit] >= 2.0.0
14:44:13.210 STDOUT terraform: │           └── provider[registry.terraform.io/hashicorp/null] >= 3.0.0
14:44:13.210 STDOUT terraform: │   ├── module.fargate_profile
14:44:13.210 STDOUT terraform: │       └── provider[registry.terraform.io/hashicorp/aws] >= 5.81.0
14:44:13.210 STDOUT terraform: │   ├── module.kms
14:44:13.210 STDOUT terraform: │       └── provider[registry.terraform.io/hashicorp/aws] >= 4.33.0
14:44:13.210 STDOUT terraform: │   └── module.self_managed_node_group
14:44:13.210 STDOUT terraform: │       ├── provider[registry.terraform.io/hashicorp/aws] >= 5.81.0
14:44:13.210 STDOUT terraform: │       └── module.user_data
14:44:13.210 STDOUT terraform: │           ├── provider[registry.terraform.io/hashicorp/cloudinit] >= 2.0.0
14:44:13.210 STDOUT terraform: │           └── provider[registry.terraform.io/hashicorp/null] >= 3.0.0
14:44:13.210 STDOUT terraform: ├── module.irsa_argocd
14:44:13.210 STDOUT terraform: │   └── provider[registry.terraform.io/hashicorp/aws] >= 4.0.0
14:44:13.210 STDOUT terraform: └── module.irsa_aws-load-balancer-controller
14:44:13.210 STDOUT terraform:     └── provider[registry.terraform.io/hashicorp/aws] >= 4.0.0
14:44:13.210 STDOUT terraform: Providers required by state:
14:44:13.210 STDOUT terraform:     provider[registry.terraform.io/hashicorp/aws]
14:44:13.210 STDOUT terraform:     provider[registry.terraform.io/hashicorp/time]
14:44:13.210 STDOUT terraform:     provider[registry.terraform.io/hashicorp/tls]

Reproduction Code [Required]

Working configuration with auto mode enabled:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.31"

  cluster_name    = local.name
  cluster_version = "1.31"

  # auto mode
  cluster_compute_config = {
    enabled = true
    # see custom node pools in manifests/modules/cluster-aws-eks/
    node_pools = ["system"]
  }

  cluster_endpoint_public_access = true

  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnets
  cluster_ip_family = "ipv6"

  create_cni_ipv6_iam_policy = true
  iam_role_additional_policies = {
    "policy-eks-cluster" = aws_iam_policy.iam_cluster_policy.arn
  }
}

Changing to the following to disable auto mode:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.31"

  cluster_name    = local.name
  cluster_version = "1.31"

  # auto mode
  #cluster_compute_config = {
  #  # disable auto mode
  #  enabled = false
  #  # see custom node pools in manifests/modules/cluster-aws-eks/
  ##  node_pools = ["system"]
  #}


  cluster_endpoint_public_access = true

  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnets
  cluster_ip_family = "ipv6"

  create_cni_ipv6_iam_policy = true
  iam_role_additional_policies = {
    "policy-eks-cluster" = aws_iam_policy.iam_cluster_policy.arn
  }


  eks_managed_node_group_defaults = {
    iam_role_additional_policies = {
      AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
    }
  }

  eks_managed_node_groups = {
    system = {
      # https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html
      ami_type       = "BOTTLEROCKET_ARM_64"
      instance_types = ["t4g.large"]
      capacity_type  = "ON_DEMAND"
      min_size       = 1
      max_size       = 2
      desired_size   = 1

      labels = {
        # Used to ensure Karpenter runs on nodes that it does not manage
        "karpenter.sh/controller" = "true"
        "CriticalAddonsOnly"      = "true"
      }

      taints = {
        # The pods that do not tolerate this taint should run on nodes
        CriticalAddonsOnly = {
          key    = "CriticalAddonsOnly"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }

}

Steps to reproduce the behavior:

Expected behavior

no error

Actual behavior

Error: compute_config.enabled, kubernetes_networking_config.elastic_load_balancing.enabled, and storage_config.block_storage.enabled must all be set to either true or false

Terminal Output Screenshot(s)

  │ Error: compute_config.enabled, kubernetes_networking_config.elastic_load_balancing.enabled, and storage_config.block_storage.enabled must all be set to either true or false
  │ 
  │   with module.eks.aws_eks_cluster.this[0],
  │   on .terraform/modules/eks/main.tf line 35, in resource "aws_eks_cluster" "this":
  │   35: resource "aws_eks_cluster" "this" {

Additional context

@colinjohnsonbeyond

Just hit this issue myself

@mbmblbelt

Same here

There are three structures in the cluster config that are created/modified as a result of enabling auto mode.

  • kubernetesNetworkConfig
  • computeConfig
  • storageConfig

Prior to enabling auto mode, kubernetesNetworkConfig does not have an elasticLoadBalancing setting and the computeConfig and storageConfig structures do not exist in the cluster config at all.

❯ aws eks describe-cluster --name <cluster name> | jq '.cluster'
{
  "name": <redacted>,
  "arn": <redacted>,
  "createdAt": "2022-11-30T09:31:35.889000-05:00",
  "version": "1.30",
  "endpoint": <redacted>,
  "roleArn": <redacted>,
  "resourcesVpcConfig": {
    "subnetIds": [
      <redacted>
    ],
    "securityGroupIds": [
      <redacted>
    ],
    "clusterSecurityGroupId": <redacted>,
    "vpcId": <redacted>,
    "endpointPublicAccess": true,
    "endpointPrivateAccess": true,
    "publicAccessCidrs": [
      <redacted>
    ]
  },
  "kubernetesNetworkConfig": {
    "serviceIpv4Cidr": <redacted>,
    "ipFamily": "ipv4"
  },
  "logging": {
    "clusterLogging": [
      {
        "types": [
          "api",
          "audit",
          "authenticator"
        ],
        "enabled": true
      },
      {
        "types": [
          "controllerManager",
          "scheduler"
        ],
        "enabled": false
      }
    ]
  },
  "identity": {
    "oidc": {
      "issuer": <redacted>
    }
  },
  "status": "ACTIVE",
  "certificateAuthority": {
    "data": <redacted>
  },
  "platformVersion": "eks.23",
  "tags": {
   <redacted>
  },
  "health": {
    "issues": []
  },
  "accessConfig": {
    "authenticationMode": "API_AND_CONFIG_MAP"
  },
  "upgradePolicy": {
    "supportType": "EXTENDED"
  }
}

After enabling auto mode, the elasticLoadBalancing setting is added to the kubernetesNetworkConfig structure and the computeConfig and storageConfig structures are created/added.

❯ aws eks describe-cluster --name <cluster name> | jq '.cluster'
{
  "name": <redacted>,
  "arn": <redacted>,
  "createdAt": "2022-11-30T09:31:35.889000-05:00",
  "version": "1.30",
  "endpoint": <redacted>,
  "roleArn": <redacted>,
  "resourcesVpcConfig": {
    "subnetIds": [
      <redacted>
    ],
    "securityGroupIds": [
      <redacted>
    ],
    "clusterSecurityGroupId": <redacted>,
    "vpcId": <redacted>,
    "endpointPublicAccess": true,
    "endpointPrivateAccess": true,
    "publicAccessCidrs": [
      <redacted>
    ]
  },
  "kubernetesNetworkConfig": {
    "serviceIpv4Cidr": <redacted>,
    "ipFamily": "ipv4",
    "elasticLoadBalancing": {
      "enabled": true
    }
  },
  "logging": {
    "clusterLogging": [
      {
        "types": [
          "api",
          "audit",
          "authenticator"
        ],
        "enabled": true
      },
      {
        "types": [
          "controllerManager",
          "scheduler"
        ],
        "enabled": false
      }
    ]
  },
  "identity": {
    "oidc": {
      "issuer": <redacted>
    }
  },
  "status": "ACTIVE",
  "certificateAuthority": {
    "data": <redacted>
  },
  "platformVersion": "eks.23",
  "tags": {
   <redacted>
  },
  "health": {
    "issues": []
  },
  "accessConfig": {
    "authenticationMode": "API_AND_CONFIG_MAP"
  },
  "upgradePolicy": {
    "supportType": "EXTENDED"
  },
  "computeConfig": {
    "enabled": true,
    "nodePools": ["system", "general-purpose"]
  },
  "storageConfig": {
    "blockStorage": {
      "enabled": true
    }
  }
}

These structures cannot be removed once they have been added, so simply commenting out cluster_compute_config

# cluster_compute_config = {
#     enabled    = true
#     node_pools = ["system", "general-purpose"]
# }

or disabling it

cluster_compute_config = {
    enabled    = false
    node_pools = []
}

results in the module trying to set storageConfig to null, which is not allowed now that the storageConfig structure exists in the cluster config (a sketch of the configuration the API expects instead follows the plan excerpt below):

- storage_config {
          - block_storage {
              - enabled = false -> null
            }
        }
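For reference, a minimal sketch (illustration only, not the module's code) of the raw aws_eks_cluster arguments that the error above and the provider docs quoted further down imply the API wants when turning Auto Mode off: all three toggles set explicitly to false. The resource name, role, and subnet references here are hypothetical placeholders; whether the provider actually sends this shape on an in-place update is exactly what the provider issues linked below track.

# Illustrative sketch only; names and references are placeholders.
resource "aws_eks_cluster" "example" {
  name     = "example-cluster"
  role_arn = aws_iam_role.cluster.arn

  vpc_config {
    subnet_ids = module.vpc.private_subnets
  }

  # Explicitly disable all three Auto Mode toggles together.
  compute_config {
    enabled = false
  }

  kubernetes_network_config {
    elastic_load_balancing {
      enabled = false
    }
  }

  storage_config {
    block_storage {
      enabled = false
    }
  }
}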

@reggora-mmatney

@bryantbiggs I see that you were attempting to solve this problem in #3253 and had to revert in #3255

Are you still working on this problem? If so, do you have an estimate on when a fix will be released? Thanks

@bryantbiggs
Member

This has to be fixed in the provider first

@project0
Author

This issue looks pretty similar, but the error message is somewhat different: hashicorp/terraform-provider-aws#40582

@bryantbiggs
Member

track hashicorp/terraform-provider-aws#41155

lorengordon added a commit to lorengordon/terraform-aws-eks that referenced this issue Feb 22, 2025
The API has a number of constraints that make this a little difficult,
but it does work.

1.  The node_role_arn cannot be changed without recreating the cluster.
    This is an API limitation, not a terraform provider bug.
    https://docs.aws.amazon.com/eks/latest/APIReference/API_ComputeConfigRequest.html

2.  If node_pools is not empty, then the node_role_arn *must* be provided.

3.  If the node_role_arn is provided, then node_pools must not be empty!

If you enable Auto Mode and set node_pools to a non-empty value, which requires
setting node_role_arn, you cannot later change node_pools to an empty value,
unless you also unset the node_role_arn. But if you unset the node_role_arn,
then the cluster will be recreated due to the first point above.

So, we *can* support disabling auto-mode without recreating a cluster, but only
if node_pools is not empty and if node_role_arn does not change.

Fixes terraform-aws-modules#3273
@lorengordon
Contributor

lorengordon commented Feb 22, 2025

I've posted another attempt to address this problem, see #3308. I've confirmed it works to enable and disable Auto Mode, and enable it again. There are some limitations imposed by the API (not the provider), such that you can't change from a non-empty node_pools config to an empty node_pools config. But other than that, it works fine.
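As a rough illustration of staying within that limitation, the sketch below flips Auto Mode with a hypothetical enable_auto_mode variable while leaving node_pools non-empty in both states, which is the pattern the commit message above describes as workable; note that the released module version still fails on this until a fix lands.

# Sketch only, assuming the behavior described in the #3308 changeset:
# only the enabled flag changes, node_pools (and therefore the node role)
# stay the same, so nothing forces cluster recreation.
variable "enable_auto_mode" {
  type    = bool
  default = true
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  # ... other inputs unchanged ...

  cluster_compute_config = {
    enabled    = var.enable_auto_mode
    node_pools = ["system"]
  }
}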

@lorengordon
Contributor

Well, I guess the PR doesn't update once closed, but the branch has the fix. You can review the changeset here instead: master...lorengordon:terraform-aws-eks:fix/compute-config-false-or-null
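If you want to evaluate that branch before a release, one way is a standard Terraform git source (the registry version argument is dropped while doing so); this is a testing-only sketch, not a recommendation:

module "eks" {
  # Testing-only: pin to the fork branch referenced above instead of the
  # registry release. Git sources do not take a version argument.
  source = "git::https://github.com/lorengordon/terraform-aws-eks.git?ref=fix/compute-config-false-or-null"

  # ... same inputs as before ...
}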

@cmam-programize

@bryantbiggs I get that one should track the issue on the provider; just a question though: how is this module compatible with what the latest published version of the provider documents? In the docs, it says:

When using EKS Auto Mode compute_config.enabled, kubernetes_network_config.elastic_load_balancing.enabled, and storage_config.block_storage.enabled must *all* be set to true. Likewise for disabling EKS Auto Mode, all three arguments must be set to false. Enabling EKS Auto Mode also requires that bootstrap_self_managed_addons is set to false.

However, in the module's code, this is not respected, unless I'm missing something.

BTW, the use case I'm seeing this is when creating a new cluster.

Thanks!

@bryantbiggs
Member

What's not respected?

@cmam-programize

cmam-programize commented Feb 27, 2025

The code generates the following blocks if auto_mode_enabled is false:

      + compute_config {
          + enabled = false
        }

      + kubernetes_network_config {
          + ip_family         = "ipv4"
          + service_ipv4_cidr = (known after apply)
          + service_ipv6_cidr = (known after apply)
        }
      # No storage_config block at all

I would expect the following, based on what I understand in the provider's doc though:

      compute_config {
          enabled = false
      }

      kubernetes_network_config {
           ...
           elastic_load_balancing {
               enabled = false
           }
      }

      storage_config {
          block_storage {
            enabled = false
          }
      }

Right?

(Edit: Added some missing brackets)

@bryantbiggs
Member

Right?

No

@cmam-programize

Right?

No

I see, would you share (or point towards) why though? Am I missing something in the provider's doc or in this module's implementation?

@lorengordon
Contributor

lorengordon commented Mar 13, 2025

Right?

No

I see, would you share (or point towards) why though? Am I missing something in the provider's doc or in this module's implementation?

@cmam-programize I believe you actually are correct, and that is exactly why my changeset works, where the current release and prior attempted fixes fail.

@MDBeudekerCN

I wanted to add a variable to my own Terraform setup to enable/disable auto mode, like this:

  # Enable EKS auto
  cluster_compute_config = var.eks_auto_enabled ? {
    enabled    = true
    node_pools = ["general-purpose", "system"]
    } : {
    enabled    = false
    node_pools = []
  }

So when I set var.eks_auto_enabled to false on an existing cluster, I get this terraform plan:

[
  {
    "enabled": false,
    "node_pools": null,
    "node_role_arn": null
  }
]

I get a similar sort of error:

│ Error: creating EKS Cluster (cluster-01): operation error EKS: CreateCluster, https response error StatusCode: 400, RequestID: , InvalidParameterException: For EKS Auto Mode, please ensure that all required configs, including computeConfig, kubernetesNetworkConfig, and blockStorage are all either fully enabled or fully disabled.
│ 
│   with module.eks.aws_eks_cluster.this[0],
│   on .terraform/modules/eks/main.tf line 35, in resource "aws_eks_cluster" "this":
│   35: resource "aws_eks_cluster" "this" {
│ 

Even if I create a new cluster with a managed node group and EKS Auto Mode disabled, things start to break, sadly.

It feels like AWS is pushing EKS Auto Mode, but in its current state it is giving me a bit of a headache.

Will keep a sharp eye on the provider issue though 👀

@swirle13

@MDBeudekerCN I had a similar solution that unfortunately also caused me to run into this issue. I, too, set up a variable to enable/disable auto mode which was implemented like this:

  # Enable EKS auto
  cluster_compute_config = {
    enabled    = var.eks_auto_mode
    node_pools = var.eks_auto_mode ? ["general-purpose", "system"] : []
  }

Going to try updating our private version of this module with the fixes mentioned by @lorengordon just above your comment

@lorengordon
Contributor

lorengordon commented Mar 25, 2025

@swirle13 Be sure to read the full message in my last commit on that changeset. It describes a current limitation of the provider resource, but otherwise does work fine for the use case you mentioned.

Edit: Actually that commit message is a little outdated. I tested a few things in the console, and posted the findings to the provider issue ...

hashicorp/terraform-provider-aws#40582 (comment)


This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Apr 25, 2025