docs: add oom/ood to notifications #16582

Merged 6 commits on Mar 5, 2025

18 changes: 15 additions & 3 deletions docs/admin/monitoring/notifications/index.md
@@ -29,14 +29,14 @@ These notifications are sent to the workspace owner:

### User Events

-These notifications sent to users with **owner** and **user admin** roles:
+These notifications are sent to users with **owner** and **user admin** roles:

- User account created
- User account deleted
- User account suspended
- User account activated

-These notifications sent to users themselves:
+These notifications are sent to users themselves:

- User account suspended
- User account activated
@@ -48,6 +48,8 @@ These notifications are sent to users with **template admin** roles:

- Template deleted
- Template deprecated
- Out of memory (OOM) / Out of disk (OOD)
  - [Configure](#configure-oomood-notifications) in the template `main.tf`.
- Report: Workspace builds failed for template
  - This notification is delivered as part of a weekly cron job and summarizes
    the failed builds for a given template.
@@ -63,6 +65,16 @@ flags.
| ✔️ | `--notifications-method` | `CODER_NOTIFICATIONS_METHOD` | `string` | Which delivery method to use (available options: 'smtp', 'webhook'). See [Delivery Methods](#delivery-methods) below. | smtp |
| -️ | `--notifications-max-send-attempts` | `CODER_NOTIFICATIONS_MAX_SEND_ATTEMPTS` | `int` | The upper limit of attempts to send a notification. | 5 |

### Configure OOM/OOD notifications

You can monitor out of memory (OOM) and out of disk (OOD) errors and alert users
when they overutilize memory and disk.

This can help prevent agent disconnects due to OOM/OOD issues.

To enable OOM/OOD notifications on a template, follow the steps in the
[resource monitoring guide](../../templates/extending-templates/resource-monitoring.md).

## Delivery Methods

Notifications can currently be delivered by either SMTP or webhook. Each message
@@ -135,7 +147,7 @@ for more options.

After setting the required fields above:

-1. Setup an account on Microsoft 365 or outlook.com
+1. Set up an account on Microsoft 365 or outlook.com
1. Set the following configuration options:

```text
47 changes: 47 additions & 0 deletions docs/admin/templates/extending-templates/resource-monitoring.md
@@ -0,0 +1,47 @@
# Resource monitoring

Use the
[`resources_monitoring`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#resources_monitoring-1)
block on the
[`coder_agent`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
resource in our Terraform provider to monitor out of memory (OOM) and out of
disk (OOD) errors and alert users when they overutilize memory and disk.

This can help prevent agent disconnects due to OOM/OOD issues.

You can specify one or more volumes to monitor for OOD alerts.
OOM alerts are reported per-agent.

## Prerequisites

Notifications are sent through SMTP.
Configure Coder to [use an SMTP server](../../monitoring/notifications/index.md#smtp-email).

## Example

Add the following example to the template's `main.tf`.
Change the `90`, `80`, and `95` to a threshold that's more appropriate for your
deployment:

```hcl
resource "coder_agent" "main" {
  arch = data.coder_provisioner.dev.arch
  os   = data.coder_provisioner.dev.os
  resources_monitoring {
    memory {
      enabled   = true
      threshold = 90
    }
    volume {
      path      = "/volume1"
      enabled   = true
      threshold = 80
    }
    volume {
      path      = "/volume2"
      enabled   = true
      threshold = 95
    }
  }
}
```
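
The example above hardcodes each threshold. As one possible variation (a sketch, not part of this PR), the thresholds could be lifted into Terraform input variables, with one `volume` block generated per monitored path through a `dynamic` block. The variable names `memory_threshold` and `monitored_volumes` are hypothetical. The sketch also assumes that `data.coder_provisioner.dev` is declared elsewhere in the template, as in the example above, and that thresholds are usage percentages.

```hcl
# Hypothetical variable names, chosen for illustration only.
# The resources_monitoring attributes (enabled, threshold, path)
# are the ones used in the example above.
variable "memory_threshold" {
  type        = number
  default     = 90
  description = "Memory usage threshold that triggers an OOM alert (assumed to be a percentage)."
}

variable "monitored_volumes" {
  type = map(number)
  default = {
    "/volume1" = 80
    "/volume2" = 95
  }
  description = "Map of volume path to the OOD alert threshold for that volume."
}

resource "coder_agent" "main" {
  # Assumes the coder_provisioner data source named "dev" exists in the template.
  arch = data.coder_provisioner.dev.arch
  os   = data.coder_provisioner.dev.os

  resources_monitoring {
    memory {
      enabled   = true
      threshold = var.memory_threshold
    }

    # Generates one volume block per entry in var.monitored_volumes:
    # the map key is the path, the map value is the threshold.
    dynamic "volume" {
      for_each = var.monitored_volumes
      content {
        path      = volume.key
        enabled   = true
        threshold = volume.value
      }
    }
  }
}
```

With this layout, adding a monitored volume or tuning a threshold becomes a change to the variable defaults (or to values supplied at template build time) rather than an edit to the `coder_agent` resource itself.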
5 changes: 5 additions & 0 deletions docs/manifest.json
@@ -389,6 +389,11 @@
    "description": "Display resource state in the workspace dashboard",
    "path": "./admin/templates/extending-templates/resource-metadata.md"
},
{
    "title": "Resource Monitoring",
    "description": "Monitor resources in the workspace dashboard",
    "path": "./admin/templates/extending-templates/resource-monitoring.md"
},
{
    "title": "Resource Ordering",
    "description": "Design the UI of workspaces",