-
Notifications
You must be signed in to change notification settings - Fork 881
Warn (or prevent) volumes from being deleted unintentionally #3386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
An alternative idea to solve this would be to introduce the concept of "workspace features". This could be a relatively flexible way to introduce new features without risk of breakage. If we take Scenario 1 for example:
If we take Scenario 2:
Alternatively, if we implement immutable volume names, then Scenario 2 would look like:
|
I wonder if the names expose a larger problem with our usage of Terraform. There isn't a sure-fire way to ensure a persistent volume isn't deleted unless we plan, and prompt the user to confirm the action. That'd be my preferred route - when you update a template, explicitly prompt the user how this will affect the infrastructure. |
I think that could be an acceptable solution to the issue. I do think the deletion of a volume needs to be prominent though, if we train users that the output of "plan" (e.g. update workspace) is usually safe, eventually they will start ignoring it (this is my hunch anyway) and won't notice the potential destructive action. How would planning work for different states of a workspace? For instance, do we plan start or stop for a workspace depending on its state? What if the we only plan stop, but start has the destructive action? Can we plan both? |
Coder templates are a powerful abstraction that means that ordinary users don't usually need to care what is happening in the infrastructure. They press Start and get a workspace. In complex scenarios, end users will likely be unable to fully understand infrastructure level information because it is deliberately hidden from them by their platform teams. I also think we should try to not give template authors a loaded gun that they can shoot themselves in the foot with. In my mind, this means we either shouldn't pass in any unstable names/identifiers at all (i.e. deprecate |
I agree with that take @spikecurtis, what Terraform does (or the resulting state) is only really meaningful to operators. I see no reason that an end user would even want to understand what is going on (in most cases) and this would somewhat take away from the usefulness of showing
Disallowing If we don't make it hard (or rather extremely hard) to lose data -- data will be lost. Ultimately it won't be the template authors fault, it will be "Coder lost my/our data". I talked a bit with @Emyrk yesterday and he pointed out that perhaps workspace parameters (#802) could somehow be utilized for this. Perhaps a volume name could be stored as a workspace parameter and the template format only utilized on workspace creation. It could still be editable on a per-workspace basis, but it could be treated special (i.e. warning: data loss imminent). |
Terraform's prevent_destroy meta-argument was created as a "catch-all" to prevent an important resource from being destroyed. There is also the ignore_changes meta-argument that ensures the resource will not update even if there were changes to a variable/data source/output. I agree with a more comprehensive approach that lets a template admin indicate "this resource should not be deleted" instead of entirely relying on the admin to use best practices with parameters, and naming conventions (use uid instead of username, etc). We might not be able to predict all the ways a Leveraging We also have a way to apply arbitrary Coder-specific data to resources with coder_metadata. Perhaps it could warn the user (#3384) if delete is about to occur (or prevent it altogether) if we add a
I really like this solution for naming-related changes! I know other changes might also destroy a resource before re-building. For example, setting a Kubernetes volume size to a lower-than-previous value.
This seems like a good use case for Terraform's ignore_changes meta argument, to ensure the volume name does not change even if there were changes to the data source. I do think we'll need to have other stopgaps (see above) to ensure other changes don't result in data loss. |
Thanks for your research @bpmct! The It might still be problematic in the Another option discussed with @deansheather would be to use immutable variables, say we had a new resource named resource "coder_immutable_value" "home_volume_name" {
value = "coder-${resource.coder_workspace.me.owner}-${resource.coder_workspace.me.name}-home"
}
resource "docker_volume" "home_volume" {
name = "${resource.coder_immutable_value.home_volume_name.value}"
} Both options still require our templates to set best practices, and template authors to follow them (which is unfortunate). I was hoping we'd be able to leverage the output from Terraform ( This could work fine with metadata attached to the volume though, informing Coder that it must be protected. |
I think #3663 partially addresses this, but think we should think we should have a more declarative way to "protect" resources (prevent_destroy or via metadata). |
How would |
Yeah, it'd definitely be an improvement, but I'm not very happy about the fact operator error can still result in data loss (i.e. rename the volume name format, add prefix, etc.)
I'm not sure how we'd specifically do that, I just have some vague ideas. For instance maybe we could add a |
I've been thinking about a potential solution for this that allows guiding template authors and informing users. This solution would introduce a new resource,
resource "docker_volume" "home_volume" {
name = "coder-${data.coder_workspace.me.owner}-${data.coder_workspace.me.name}-home"
# Added automagically by coder_protected_volume or enforced via checks
# when validating template or provisoning workspace.
lifecycle {
prevent_destroy = true
ignore_changes = [name]
}
}
resource "coder_protected_volume" "home_volume" {
volume_id = docker_volume.home_volume.id
unique = ["name"] # Optional? Some resources may not require this.
} Since I believe this use-case (protecting user data) could be unique enough to warrant the addition of the explicit resource "coder_protected_resource" "home_volume" {
resource_id = docker_volume.home_volume.id
type = volume # Supported types e.g. [volume, generic]
unique = ["name"]
} Questions I don't (yet) have the answer to:
Thoughts @bpmct @kylecarbs? |
I'd be in favor of a more generic resource that doesn't mention "volume" since I may want to protect an entire VM, dedicated machine, API key instead of a volume 👍🏼
What do you think about using resource "coder_metadata" "volume" {
resource_id = resource.docker_volume.homedir
protected = true
unique = ["name"]
} type = volume # Supported types e.g. [volume, generic]` Would the # Added automagically by coder_protected_volume or enforced via checks
# when validating template or provisoning workspace.
lifecycle {
prevent_destroy = true
ignore_changes = [name]
item {
key = "whatever"
value = "here"
}
} What is the rationale for introducing a new resource type vs authors using generic Terraform My understanding is that the resource Without a unique resource type, is it difficult for Coder to recognize which values are unique + warn admins if their templates aren't protected? I don't know we parse the Terraform, so genuinely don't know.
I think it's fine to surface a warning here. My main concern is data loss which I think |
It can probably work, but I wanted to differentiate between these to keep metadata cleaner and to make a difference between "attaching information" vs "creating a constraint".
Yes, exactly so (and also the user). We can't easily know if a resource is a volume or not otherwise (we could program in knowledge for common providers, but that would have its limitations).
The rationale is that if an author adds a Regarding putting variables or If I'm not entirely mistaken, manual parsing of a
To my understanding, yes. The problem is that we don't know the meaning of arguments/values for different providers. For instance, |
This issue is becoming stale. In order to keep the tracker readable and actionable, I'm going close to this issue in 7 days if there isn't more activity. |
I'd really love to see this implemented. Several issues point at this issue, but it's closed as not planned due to going stale :( |
Scenario 1: Currently there is one way for a volume to be deleted unintentionally (with #3000 there are two):
Scenario 2: When #3000 is implemented, our current users will run into this scenario:
To avoid #3000, template authors should move to using
data.coder_workspace.me.id
instead ofdata.coder_workspace.me.name
, which will prevent volume deletion due to a naming change. However, the migration path will mean that all existing coder workspaces will lose their volumes either due to the former case (rename in template + update), or due to workspace rename.There are a few ways we could dampen this impact:
It could be argued that renaming volumes in the template is an operator error and we shouldn't care about it. On the flip-side, operator error or not, it's something we could've guarded against and prevented potential loss in developer productivity.
I think option 4 would be best, as it lets us guard against data loss in all cases. It's also the one requiring least changes to our code to handle the edge cases.
At minimum, we should implement 1 and 2 so that operators and users are aware of the impact. It'd be great to have 3 too in this case but not mandatory if we consider all our users should migrate to using IDs.
I think this is something of a feature-bug, so I've labeled it as such. Depending on solutions it will also require CLI and frontend work (but I've omitted those tags for now).
The text was updated successfully, but these errors were encountered: