Release 18+ Upgrade Guide Breaks Existing Deployments #1744

@jseiser

Description

I attempted to follow the upgrade guide to get to v18+. Our Terraform deployments generally run from a Jenkins worker pod that lives on the same cluster we are upgrading. The pod has a service account with an IRSA setup that grants it access to the cluster.
This all worked before the upgrade.
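
For context, the Kubernetes provider in our workspace is wired to the cluster roughly as sketched below. The real provider block is not reproduced in this issue, so treat the exact attributes as an assumption; the module outputs and the aws_eks_cluster_auth data source are the same ones referenced in the locals further down.

# Sketch only, not the exact block from our configuration.
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.eks.token
}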

Reproduction

Attempt to follow the upgrade guide for 18.

Code Snippet to Reproduce

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.0.1"

  cluster_name    = format("eks-%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env)
  cluster_version = var.cluster_version

  subnet_ids = data.aws_subnet_ids.private.ids
  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  cluster_security_group_additional_rules = {
    admin_access = {
      description = "Admin ingress to Kubernetes API"
      cidr_blocks = [data.terraform_remote_state.vpc.outputs.vpc_cidr_block]
      protocol    = "tcp"
      from_port   = 443
      to_port     = 443
      type        = "ingress"
    }
  }

  eks_managed_node_group_defaults = {
    ami_type                   = "AL2_x86_64"
    disk_size                  = var.node_group_default_disk_size
    enable_bootstrap_user_data = true
    pre_bootstrap_user_data    = templatefile("${path.module}/templates/userdata.tpl", {})
    desired_size               = lower(var.platform_env) == "prod" ? 3 : 2
    max_size                   = lower(var.platform_env) == "prod" ? 6 : 3
    min_size                   = lower(var.platform_env) == "prod" ? 3 : 1
    instance_types             = lower(var.platform_env) == "prod" ? var.prod_instance_types : var.dev_instance_types
    capacity_type              = "ON_DEMAND"
    additional_tags = {
      Name = format("%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env)
    }
    update_config = {
      max_unavailable_percentage = 50
    }
    update_launch_template_default_version = true
    create_launch_template                 = true
    create_iam_role                        = true
    iam_role_name                          = format("iam-%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env)
    iam_role_use_name_prefix               = false
    iam_role_description                   = "EKS managed node group Role"
    iam_role_tags                          = local.tags
    iam_role_additional_policies = [
      "arn:aws-us-gov:iam::aws:policy/AmazonSSMManagedInstanceCore"
    ]
  }
  eks_managed_node_groups = {
    private1 = {
      subnet_ids = [tolist(data.aws_subnet_ids.private.ids)[0]]
    }
    private2 = {
      subnet_ids = [tolist(data.aws_subnet_ids.private.ids)[1]]
    }
    private3 = {
      subnet_ids = [tolist(data.aws_subnet_ids.private.ids)[2]]
    }
  }

  cluster_enabled_log_types              = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  cloudwatch_log_group_retention_in_days = 7

  enable_irsa = true

  cluster_encryption_config = [
    {
      provider_key_arn = aws_kms_key.eks.arn
      resources        = ["secrets"]
    }
  ]

  tags = merge(
    local.tags,
    {
      "Name"        = format("eks-%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env),
      "EKS_VERSION" = var.cluster_version
    }
  )

}

resource "null_resource" "patch" {
  triggers = {
    kubeconfig = base64encode(local.kubeconfig)
    cmd_patch  = "kubectl patch configmap/aws-auth --patch \"${local.aws_auth_configmap_yaml}\" -n kube-system --kubeconfig <(echo $KUBECONFIG | base64 --decode)"
  }

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    environment = {
      KUBECONFIG = self.triggers.kubeconfig
    }
    command = self.triggers.cmd_patch
  }
}

locals {

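  # Minimal in-memory kubeconfig; it is base64-encoded into the null_resource trigger
  # above and handed to kubectl through the KUBECONFIG environment variable.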
  kubeconfig = yamlencode({
    apiVersion      = "v1"
    kind            = "Config"
    current-context = "terraform"
    clusters = [{
      name = module.eks.cluster_id
      cluster = {
        certificate-authority-data = module.eks.cluster_certificate_authority_data
        server                     = module.eks.cluster_endpoint
      }
    }]
    contexts = [{
      name = "terraform"
      context = {
        cluster = module.eks.cluster_id
        user    = "terraform"
      }
    }]
    users = [{
      name = "terraform"
      user = {
        token = data.aws_eks_cluster_auth.eks.token
      }
    }]
  })

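  # Start from the aws-auth mapRoles rendered by the module and append our CI
  # (GitLab/Jenkins) and SSO admin role mappings so they keep system:masters access.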
  aws_auth_configmap_yaml = <<-EOT
  ${chomp(module.eks.aws_auth_configmap_yaml)}
      - rolearn: arn:${var.iam_partition}:iam::${data.aws_caller_identity.current.account_id}:role/role-gitlab-runner-eks-${var.platform_env}
        username: gitlab:{{SessionName}}
        groups:
          - system:masters
      - rolearn: arn:${var.iam_partition}:iam::${data.aws_caller_identity.current.account_id}:role/role-jenkins-worker-eks-${var.platform_env}
        username: jenkins:{{SessionName}}
        groups:
          - system:masters
      - rolearn: arn:${var.iam_partition}:iam::${data.aws_caller_identity.current.account_id}:role/AWSReservedSSO_AdministratorAccess_f50fcd43baf05a89
        username: AWSAdministratorAccess:{{SessionName}}
        groups:
          - system:masters
  EOT
}

Expected behavior

Module will run to completion

Actual behavior

Current aws-auth

sh-4.2$ kubectl get configmap aws-auth -n kube-system -o yaml
apiVersion: v1
data:
  mapAccounts: |
    []
  mapRoles: |
    - "groups":
      - "system:bootstrappers"
      - "system:nodes"
      "rolearn": "arn:aws-us-gov:iam:::role/eks-ops-eks-dev20211104211936784200000009"
      "username": "system:node:{{EC2PrivateDNSName}}"
    - "groups":
      - "system:masters"
      "rolearn": "arn:aws-us-gov:iam:::role/role-gitlab-runner-eks-dev"
      "username": "gitlab-runner-dev"
    - "groups":
      - "system:masters"
      "rolearn": "arn:aws-us-gov:iam:::role/role-jenkins-worker-eks-dev"
      "username": "jenkins-dev"
  mapUsers: |
    - "groups":
      - "system:masters"
      "userarn": "arn:aws-us-gov:iam:::user/justin.seiser"
      "username": "jseiser"
kind: ConfigMap

The service account on the pod that Terraform runs from:

sh-4.2$ kubectl get sa jenkins-worker -n jenkins -o yaml
apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws-us-gov:iam:::role/role-jenkins-worker-eks-dev

The error Terraform returns:

module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]

Error: configmaps "aws-auth" is forbidden: User "system:serviceaccount:jenkins:jenkins-worker" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

Additional context

I do not doubt that I'm missing something, but whatever it is does not appear to be covered in any documentation I can find.
