Skip to content

feat: improve resources_monitoring for OOM & OOD monitoring #16241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Feb 4, 2025

Conversation

defelmnq
Copy link
Contributor

As requested for this issue we need to have a new resource resources_monitoring in the agent.

It needs to be parsed from the provisioner and inserted into a new db table.

@defelmnq defelmnq self-assigned this Jan 23, 2025
@defelmnq defelmnq linked an issue Jan 23, 2025 that may be closed by this pull request
@defelmnq defelmnq changed the title feat: work on resources_monitoring for OOM & OOD monitoring feat: improve resources_monitoring for OOM & OOD monitoring Jan 27, 2025
@defelmnq defelmnq force-pushed the agent_resource_monitoring branch from 0b76f49 to 68b022f Compare January 30, 2025 21:35
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly fine to me, but missing some test cases around the unhappy code paths added to ConvertState().

Comment on lines 1 to 2
DROP TABLE IF EXISTS workspace_agent_memory_resource_monitors;
DROP TABLE IF EXISTS workspace_agent_volume_resource_monitors;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obligatory reminder to double-check migration number before merging!

@@ -0,0 +1,14 @@
CREATE TABLE workspace_agent_memory_resource_monitors (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to consider...

CREATE TYPE workspace_agent_resource_monitor AS ENUM ('memory', 'volume');

CREATE TABLE workspace_agent_resource_monitors (
  id uuid not null primary key,
  agent_id uuid not null foreign key ...,
  type workspace_agent_resource_monitor not null,
  enabled bool not null,
  threshold integer not null,
  path text not null,
  created_at ...
)

This would obviously be less ideal if we plan to add other attributes later.

AuthorizeMap: map[bool][]hasAuthSubjects{
true: {owner},
false: {
memberMe, orgMemberMe, otherOrgMember,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not authorizing memberMe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure how to configure RBAC for my needs - I'll let the PR opened until I have enough in the agent one so I can validate RBAC are well configured if good for you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need memberMe and orgMemberMe here, although it will become apparent once we add an API endpoint or similar that tries to read this data.

Adding or extending some existing tests should make it apparent what we need here.

@defelmnq
Copy link
Contributor Author

defelmnq commented Feb 3, 2025

Fixes :

  • Added foreign key & primary key @ db.
  • changed RBAC entities to have one for the resources_monitor instead of one for memory and one for volume.
  • Removed validation that will be done in tf-provider instead.
  • Improved testing coverage on provisionerserver

enabled boolean NOT NULL,
threshold integer NOT NULL,
created_at timestamp with time zone NOT NULL,
PRIMARY KEY (agent_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hhmm, I don't think this will work? The primary key needs to be unique.

https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-PRIMARY-KEYS

A primary key constraint indicates that a column, or group of columns, can be used as a unique identifier for rows in the table.


In general, when you're creating database migrations, it's good practice to perform a few basic operations against them to validate that your design works.

I see what you're trying to do, but I think you're try to be overly efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to me how we can have duplicates here ? I maybe am missing some logic.

As we authorize only one resource monitor of type memory here - and the resource monitor is linked to an agent id - I feel like it is doing what we want ?

If this one breaks it means that there is multiple memory resource monitors declared in the terraform configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hhmm, fair point. It's true that each agent should only have one memory resource monitor.

Please leave a comment in the migration explaining this constraint for future readers without context 👍
I should not review PRs before bed 🫤

@defelmnq defelmnq requested a review from dannykopping February 3, 2025 22:21
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the existing tests pass, I'm in favour of merging this and addressing further issues as they arise (unless @dannykopping has objections). The existing PR has become large enough as well (discounting the changes to generated resources) that further changes here may become unwieldy to review.

Don't forget to check the migration number before merging!

@defelmnq defelmnq merged commit 7cbd77f into main Feb 4, 2025
33 of 35 checks passed
@defelmnq defelmnq deleted the agent_resource_monitoring branch February 4, 2025 17:45
@github-actions github-actions bot locked and limited conversation to collaborators Feb 4, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Template configuration - new coder_agent resource
3 participants