
Bug: Obscure error when using Docker template #1198


Closed
ammario opened this issue Apr 27, 2022 · 16 comments
Assignees
Labels
api Area: HTTP API
Milestone

Comments

@ammario
Member

ammario commented Apr 27, 2022

Steps to reproduce

Environment: M1 macOS, v0.0.0-devel+8661f92

  1. Turn on Docker Daemon
  2. Start coder with coder server --dev
  3. Create a default Docker template via templates init and templates create

Observe the obscure error on the client side:
image

Observe the obscure error in the server log:
image

Also: we emit "job has already been marked as failed", which is redundant with "failing running job".


This issue is important since we plan for the Docker-based install to be default.

@ammario ammario changed the title Bug: Obscure error when Docker daemon is off Bug: Obscure error when using Docker template Apr 27, 2022
@misskniss misskniss added this to the Community MVP milestone May 2, 2022
@ammario ammario modified the milestones: Community MVP, V2 Beta May 4, 2022
@misskniss

Same with this one, @ammario. With this sprint so packed to get to Beta, and given that we have a sprint before the switch happens, I am going to align this with the first sprint of community. That does not mean it can't be picked up before that, though. Let me know if you still want to inject this into Beta. CC @tjcran

@misskniss misskniss modified the milestones: V2 Beta, Community MVP May 5, 2022
@misskniss misskniss added the api Area: HTTP API label May 5, 2022
@misskniss

Hey team! Please add your planning poker estimate with ZenHub @jsjoeio @bpmct

@ammario
Member Author

ammario commented May 5, 2022

I think this needs some discovery work before it can be estimated. The bug could be a quick 20m fix or a multi-day long architectural revision. It's impossible to tell from the error which is closer.

@bpmct bpmct removed their assignment May 5, 2022
@bpmct
Member

bpmct commented May 5, 2022

Removed myself from this one, it's a bit outside my wheelhouse

@jsjoeio
Contributor

jsjoeio commented May 6, 2022

I'm also not too sure, I took a wild guess but don't have a ton of context on this. I would probably groom this with the Backend team.

@bpmct
Member

bpmct commented May 9, 2022

I ran into this while troubleshooting an issue with Docker on armv7. On my end, the issue was an architecture mismatch between the system (arm) and the enterprise-image (amd64).

When running the image in Docker directly, the error is clear:

benpotter@elijo:~ $ docker run codercom/enterprise-base:ubuntu
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm/v7) and no specific platform was requested
standard_init_linux.go:228: exec user process caused: exec format error

However, in Coder(/Terraform?), this error is not displayed:

docker_container.workspace[0]: Creation errored after 2s
Error: container exited immediately

I did a tiny bit of digging, but didn't try directly provisioning in TF to see if more errors/details are shared. I agree this is a confusing debugging experience
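
For anyone hitting the same mismatch, here is a minimal sketch of a pre-flight check that compares an image's platform with the Docker host's before provisioning. The function names are hypothetical and not part of Coder. It assumes the image has already been pulled and that the daemon is reachable. A mismatch is not always fatal (emulation may be configured on the host), so treat the result as a hint rather than a verdict.

```shell
# Sketch only: function names are illustrative, not part of Coder or Docker.

# Reduce "os/arch[/variant]" to "os/arch" so "linux/arm/v7" compares equal to "linux/arm".
normalize_platform() {
  np_os=${1%%/*}        # text before the first slash
  np_rest=${1#*/}       # text after the first slash
  np_arch=${np_rest%%/*} # drop any variant suffix such as "/v7"
  printf '%s/%s\n' "$np_os" "$np_arch"
}

# Succeeds (exit 0) when both platforms share the same os/arch.
platforms_match() {
  [ "$(normalize_platform "$1")" = "$(normalize_platform "$2")" ]
}

# Asks Docker for the image and host platforms and reports a mismatch on stderr.
# Returns 2 if Docker cannot be queried, 1 on mismatch, 0 otherwise.
check_image_platform() {
  cip_image=$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1") || return 2
  cip_host=$(docker version --format '{{.Server.Os}}/{{.Server.Arch}}') || return 2
  if ! platforms_match "$cip_image" "$cip_host"; then
    echo "platform mismatch: image is $cip_image, host is $cip_host" >&2
    return 1
  fi
}
```

Usage would look like `check_image_platform codercom/enterprise-base:ubuntu` after a `docker pull` of that image.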

@im-coder-lg
Contributor

Just a recommendation (feature): add Docker logs to the Coder Terraform logs so that debugging is easier.

@misskniss

Please add your planning poker estimate with ZenHub @kylecarbs

@misskniss
Copy link

@johnstcn and @mafredri, please timebox this investigation to see if you can find the root cause and get an estimate for points. Then we will evaluate whether this is still a switch blocker.

@misskniss

Hey team! Please add your planning poker estimate with ZenHub @johnstcn @mafredri

@johnstcn
Member

johnstcn commented May 18, 2022

Paired with @mafredri -- spent a couple hours and was able to partially reproduce the issue, although there have been several improvements since then.

Also noting I am able to run through the docker-local flow on my M1 mac on latest master (using Colima instead of Docker Desktop). So I'm not sure if this can still be called a switch-blocker.

Steps to reproduce

  • Input invalid Terraform to cause the import job to fail (for example, reference a missing variable or provide an invalid parameter).
  • Terraform v1.1.9 has some fixes for CLI output (we currently embed 1.1.7).
  • We didn't have arm64 scripts for coder agent at the time; I believe this might have been the root cause of the missing variable in the terraform input.
  • job has already been marked as failed doesn't appear to happen any more.
  • We do now have the Terraform output in the CLI so that at least provides more detail
$ coder templates create
> Create and upload "/tmp/docker-local"? (yes/no)  
✔ Queued [28ms]
✔ Setting up [3ms]
✔ Adding README.md... [0ms]
✔ Parse parameters [2ms]
⧗  Detecting persistent resources 
  Terraform 1.1.7
  Error: Unsupported attribute
  This object has no argument, nested block, or exported attribute named "init_scripttt". Did you mean "init_script"?
✘ Detecting persistent resources [4694ms]
template import provision for start: recv import provision: plan terraform: exit status 1
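
A failure like the one above can often be caught before uploading by validating the template directory with Terraform directly. The sketch below is an assumption about a reasonable local workflow, not something Coder does for you. The function name is hypothetical, and it assumes a `terraform` binary recent enough to support `-chdir`.

```shell
# Sketch only: validate a template directory locally before `coder templates create`.
# validate_template is an illustrative name, not a Coder command.
validate_template() {
  vt_dir="$1"
  # Install providers without configuring a backend; validate needs the provider schemas.
  terraform -chdir="$vt_dir" init -backend=false -input=false >/dev/null || return 2
  # Reports static errors such as references to attributes that do not exist.
  terraform -chdir="$vt_dir" validate
}
```

For example, `validate_template /tmp/docker-local` would be run before `coder templates create`.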

Fixes

Notes

  • Unable to reproduce issue on rPi 3 with latest version (arm64)
  • Unable to reproduce with auto-installed terraform on version 8661f92 -- coder server --dev fails to start with the error install terraform: unexpected Content-Type: "application/vnd+hashicorp.releases-api.v0+json"

CC @tjcran @misskniss

@jsjoeio
Contributor

jsjoeio commented May 20, 2022

@johnstcn chiming in to see if I can help move this forward. Should we turn your other fixes (the first 3) into issues and close this?

@johnstcn
Member

@jsjoeio I don't think that much overhead is necessary! I was thinking of just attaching PRs to this ticket for each of those.

@dwahler
Contributor

dwahler commented May 24, 2022

One thing that helps a bit with troubleshooting this kind of problem: the default Docker logging driver deletes logs as soon as a container is deleted. But if you use the journald driver instead, you can view logs even after container deletion, using journalctl -u docker.service.
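
As a rough sketch of that approach, the journald driver can also be selected per container, and the journal can then be filtered by container name. The function names here are hypothetical. This assumes a Linux host running systemd-journald, and it relies on the `CONTAINER_NAME` journal field that Docker's journald driver attaches to each entry.

```shell
# Sketch only: helper names are illustrative.

# Start a container whose output goes to the systemd journal instead of the default driver.
# Usage: run_with_journald NAME IMAGE [COMMAND...]
run_with_journald() {
  rj_name="$1"
  shift
  docker run --log-driver=journald --name "$rj_name" "$@"
}

# Read one container's entries from the journal; this still works after `docker rm`.
journald_logs() {
  journalctl CONTAINER_NAME="$1" --no-pager
}
```

For example, `run_with_journald demo codercom/enterprise-base:ubuntu` followed by `journald_logs demo` should show the container's output even if the container exited immediately and was removed.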

@johnstcn
Member

Ended up making a separate issue for provisioner job output, as we need to ensure administrators can set its verbosity independently: #1733

@johnstcn
Member

Closing this issue out, as:

8 participants