bug: Google Cloud VM creates fine in coder create but fails with same template immediately in UI - then bricked #3621

Closed
sharkymark opened this issue Aug 22, 2022 · 6 comments

Comments

@sharkymark
Contributor

Version: v0.8.5+95f26f7

I have verified that I can create a workspace with coder create, but when I use the same template to create one in the UI, it fails immediately.

The workspace is then bricked: delete does not work, so I have to remove it manually (after SSHing into the Coder host):

  1. sudo -u postgres psql to authenticate to the Postgres DB
  2. \c coder to connect to the coder database
  3. look up the workspace ID by the name I used, e.g., select * from workspaces where name like '%workspacename%';
  4. delete the workspace row, e.g., delete from workspaces where id='guid from above select';
  5. refresh the UI, and the workspace is gone
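
The manual cleanup above can be sketched as two small shell helpers that only print the SQL, so it can be reviewed before it is run. This is a rough sketch, not an official Coder procedure: the function names (sql_quote, lookup_sql, delete_sql) are hypothetical, and the table and column names are simply the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that only print SQL; nothing here touches a database.
# Table and column names (workspaces, name, id) are taken from the steps above.

sql_quote() {
  # Double any single quotes so the value is safe inside a SQL string literal.
  local q="'"
  printf '%s' "${1//$q/$q$q}"
}

lookup_sql() {
  # $1: fragment of the workspace name to search for.
  printf "select id, name from workspaces where name like '%%%s%%';\n" "$(sql_quote "$1")"
}

delete_sql() {
  # $1: workspace ID returned by the lookup query.
  printf "delete from workspaces where id = '%s';\n" "$(sql_quote "$1")"
}
```

Assuming the database is named coder as in step 2, the output of lookup_sql my-workspace could be piped into sudo -u postgres psql -d coder, and delete_sql could be piped the same way once the ID has been confirmed. Taking a database backup before deleting rows by hand is probably wise.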

[image: screenshot of failed workspace]

[images: screenshots of successful local creation with CLI]

@ericpaulsen Maybe this is what you ran into. Please try in CLI, then UI.

@sharkymark
Contributor Author

Template using 0.4.9 Coder Terraform provider

@ericpaulsen
Member

yes - I've experienced this, and I've just reproduced the behavior. additionally, there are no logs shown when pulling up the logs page.

[image: Screen Shot 2022-08-22 at 9 39 03 AM]

@kylecarbs
Member

@sharkymark I'm not able to reproduce this. @ericpaulsen can you add steps then re-open this issue if it still exists?

@kylecarbs closed this as not planned on Aug 24, 2022

@ericpaulsen
Member

@kylecarbs I'm still able to reproduce on 0.8.6. no logs are shown upon the immediate failure. here's the template:

terraform {
  required_providers {
    coder = {
      source  = "coder/coder"
      version = "0.4.2"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 4.15"
    }
  }
}

variable "project_id" {
  description = "Which Google Compute Project should your workspace live in?"
}

variable "zone" {
  description = "What region should your workspace live in?"
  default     = "us-central1-a"
  validation {
    condition     = contains(["northamerica-northeast1-a", "us-central1-a", "us-west2-c", "europe-west4-b", "southamerica-east1-a"], var.zone)
    error_message = "Invalid zone!"
  }
}

data "template_file" "sa_token" {
  template = file("gcp-default-key.json")
}

provider "google" {
  zone        = var.zone
  project     = var.project_id
  credentials = file("gcp-default-key.json")
}

data "google_compute_default_service_account" "default" {
}

variable "dotfiles_uri" {
  description = <<-EOF
  Dotfiles repo URI (optional)

  see https://dotfiles.github.io
  EOF
  default = ""
}

data "coder_workspace" "me" {
}

resource "google_compute_disk" "root" {
  name  = "coder-${data.coder_workspace.me.owner}-${data.coder_workspace.me.name}-root"
  type  = "pd-ssd"
  zone  = var.zone
  #image = "debian-cloud/debian-9"
  image = "projects/coder-demo-1/global/images/coder-ubuntu-2004-lts-with-docker-engine"
  lifecycle {
    ignore_changes = [image]
  }
}

resource "coder_agent" "dev" {
  auth = "google-instance-identity"
  arch = "amd64"
  os   = "linux"
  startup_script = <<EOT
#!/bin/bash

# use coder CLI to clone and install dotfiles

coder dotfiles -y ${var.dotfiles_uri} > ~/dotfiles.log 2>&1

# install and start code-server
curl -fsSL https://code-server.dev/install.sh | sh
code-server --auth none --port 13337 &

EOT
}  

# code-server
resource "coder_app" "code-server" {
  agent_id      = coder_agent.dev.id
  name          = "code-server"
  icon          = "/icon/code.svg"
  url           = "http://localhost:13337?folder=/home/coder"
  relative_path = true  
}

resource "google_compute_instance" "dev" {
  zone         = var.zone
  count        = data.coder_workspace.me.start_count
  name         = "coder-${data.coder_workspace.me.owner}-${data.coder_workspace.me.name}"
  machine_type = "e2-micro"
  network_interface {
    network = "default"
    access_config {
      // Ephemeral public IP
    }
  }
  boot_disk {
    auto_delete = false
    source      = google_compute_disk.root.name
  }
  service_account {
    email  = data.google_compute_default_service_account.default.email
    scopes = ["cloud-platform"]
  }
  # The startup script runs as root with no $HOME environment set up, which can break workspace applications, so
  # instead of directly running the agent init script, setup the home directory, write the init script, and then execute
  # it.
  metadata_startup_script = <<EOMETA
#!/usr/bin/env sh
set -eux

mkdir /root || true
cat <<'EOCODER' > /root/coder_agent.sh
${coder_agent.dev.init_script}
EOCODER
chmod +x /root/coder_agent.sh

export HOME=/root
/root/coder_agent.sh

EOMETA
}
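
As a side note on the agent startup script in this template, which sends the output of coder dotfiles to a log file: the order of the two redirections decides whether stderr ends up in that file. The following is a minimal sketch of that behavior only, using a hypothetical emit function in place of the real command; it is not related to the eventual fix for this issue.

```shell
#!/usr/bin/env bash
# emit is a hypothetical stand-in for a command that writes to both streams.
emit() {
  echo "to-stdout"
  echo "to-stderr" >&2
}

log_a=$(mktemp)
log_b=$(mktemp)

# Order 1: stderr is pointed at the current stdout first, and only then is
# stdout sent to the file, so the error text never reaches the file.
leaked=$(emit 2>&1 > "$log_a")

# Order 2: stdout goes to the file first, then stderr follows it there.
emit > "$log_b" 2>&1

echo "leaked past the log: $leaked"
echo "log_a: $(cat "$log_a")"
echo "log_b: $(tr '\n' ' ' < "$log_b")"
```

With the second ordering, both streams are captured, which is usually what is wanted when a log file is the only place startup errors can be seen.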

@BrunoQuaresma
Collaborator

@ericpaulsen did your fix in #3743 resolve this issue?

@ericpaulsen
Member

@BrunoQuaresma yes! I will close this out.

4 participants