
UI memory leak on a workspace detail page #15921


Closed
raphaelfff opened this issue Dec 18, 2024 · 48 comments
Labels
customer-reported Bugs reported by enterprise customers. Only humans may set this. s2 Broken use cases or features (with a workaround). Only humans may set this. site Area: frontend dashboard

Comments

@raphaelfff

I don't have many details, other than that when I open the workspace details page, the page freezes up and eventually crashes (the "Aw, Snap!" page).

That makes it pretty tricky to capture the issue...

@coder-labeler coder-labeler bot added bug risk Prone to bugs needs-triage Issue that require triage labels Dec 18, 2024
@raphaelfff
Author

I managed to capture a heap profile, but it's too large to share here.

@raphaelfff
Author

Image
It seems like it's infinitely rendering some React component?

@matifali
Member

@raphaelfff What is your Coder version?

@raphaelfff
Author

raphaelfff commented Dec 18, 2024

2.18.1

@raphaelfff
Author

This seems to be happening on one specific workspace only; other workspaces are fine.

@matifali
Member

Is there anything specific on that workspace? Could you share a template that can reproduce the workspace causing memory leak?

@matifali matifali added the need-help Assign this label prompts an engineer to check the issue. Only humans may set this. label Dec 18, 2024
@raphaelfff
Author

Other workspaces with the same template work fine.

@raphaelfff
Author

It almost sounds like something derived from the telemetry is causing this. Could it be the build timeline? It's a workspace created under 2.17.

@mcm

mcm commented Dec 22, 2024

I've been having this issue too, and it seems to happen more often (or perhaps faster?) on workspaces that have been running longer or have more logs in their history. It just happened to me (page froze, eventually "Aw, snap!" page) on a workspace that had been up for 180h, I restarted it, and now the page is fairly snappy.

@matifali matifali removed the needs-triage Issue that require triage label Dec 23, 2024
@archef2000

archef2000 commented Jan 2, 2025

I also had one workspace (the only one I actively use) that crashed my Chrome browser every 2 minutes. I restarted everything (Coder server and workspace), and now it works again. I hadn't seen this in Firefox before, but I needed to switch to Chrome for a project.

Coder v2.18.1

But even after that, loading the logs crashes the window with "Out of Memory".

@BrunoQuaresma
Collaborator

Hey folks, thanks for reporting this bug! Let’s work together to figure out the cause. My initial impression is that the workspace timeline component might be causing the issue. To debug this, I’d need you to share the response from the following API endpoint for the workspace you’re experiencing problems with:

https://<coder-host>/api/v2/workspaces/<workspace-id>/timings

How to Find the Workspace ID:

  1. Inspect the page in your browser.
  2. Go to the Network tab.
  3. Search for your workspace name.
  4. Click on Preview and locate the id attribute.

Example:

Image

Expected Response:

The response should look something like this:

Image

You can share the response with me via your preferred method. While this endpoint shouldn’t return sensitive information, please review it carefully before sharing. If you’d prefer to send it privately, I’m available on Coder’s Discord server.
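For anyone who prefers to fetch the timings outside the browser's Network tab, here is a minimal TypeScript sketch of the same request. buildTimingsRequest is a hypothetical helper, not part of Coder's codebase; the host, workspace ID, and token are placeholders, and it assumes you have an API session token, which Coder's REST API accepts in the Coder-Session-Token header.

```typescript
// Hypothetical helper (not part of Coder's codebase): builds the request
// for the workspace build-timings endpoint described above.
interface TimingsRequest {
  url: string;
  headers: Record<string, string>;
}

function buildTimingsRequest(
  coderHost: string,
  workspaceId: string,
  sessionToken: string,
): TimingsRequest {
  // Resolve the absolute API path against the deployment's origin.
  const url = new URL(
    `/api/v2/workspaces/${encodeURIComponent(workspaceId)}/timings`,
    coderHost,
  );
  return {
    url: url.toString(),
    headers: {
      // Coder's REST API reads the session token from this header.
      "Coder-Session-Token": sessionToken,
      Accept: "application/json",
    },
  };
}

// Placeholder host, workspace ID, and token, for illustration only.
const req = buildTimingsRequest(
  "https://coder.example.com",
  "00000000-0000-0000-0000-000000000000",
  "<session-token>",
);
console.log(req.url);
// https://coder.example.com/api/v2/workspaces/00000000-0000-0000-0000-000000000000/timings
```

In a Node 18+ script, passing req.url and req.headers to fetch would return the same JSON the dashboard receives.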

@BrunoQuaresma BrunoQuaresma added the site Area: frontend dashboard label Jan 2, 2025
@mcm

mcm commented Jan 2, 2025

That returned instantly for me, I confirmed that this workspace is currently experiencing the issue before testing. I'm not actively using it at the moment, so I'm happy to do whatever testing is needed.

{
  "provisioner_timings": [
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:10.993365Z",
      "ended_at": "2024-10-21T19:23:14.224937Z",
      "stage": "init",
      "source": "terraform",
      "action": "initializing terraform",
      "resource": "state file"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.36471Z",
      "ended_at": "2024-10-21T19:23:15.369283Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_parameter.git_user_name"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.364828Z",
      "ended_at": "2024-10-21T19:23:15.369904Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_workspace.me"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.366319Z",
      "ended_at": "2024-10-21T19:23:15.369427Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_parameter.git_email"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.367005Z",
      "ended_at": "2024-10-21T19:23:15.370664Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_parameter.home_disk_size"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.367015Z",
      "ended_at": "2024-10-21T19:23:15.370526Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_parameter.distro_release"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.368156Z",
      "ended_at": "2024-10-21T19:23:15.372657Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_parameter.memory"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.368178Z",
      "ended_at": "2024-10-21T19:23:15.370817Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_parameter.gpu"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.369082Z",
      "ended_at": "2024-10-21T19:23:15.372028Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_parameter.cpu"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.376383Z",
      "ended_at": "2024-10-21T19:23:15.397921Z",
      "stage": "plan",
      "source": "kubernetes",
      "action": "state refresh",
      "resource": "kubernetes_persistent_volume_claim.home"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.379204Z",
      "ended_at": "2024-10-21T19:23:15.38209Z",
      "stage": "plan",
      "source": "coder",
      "action": "state refresh",
      "resource": "coder_agent.main"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.39115Z",
      "ended_at": "2024-10-21T19:23:15.39265Z",
      "stage": "plan",
      "source": "coder",
      "action": "state refresh",
      "resource": "coder_app.code-server"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:15.521289Z",
      "ended_at": "2024-10-21T19:23:17.100033Z",
      "stage": "graph",
      "source": "terraform",
      "action": "building terraform dependency graph",
      "resource": "state file"
    },
    {
      "job_id": "9af0530a-2c43-4e75-a2b4-7c4eacca76d1",
      "started_at": "2024-10-21T19:23:18.010893Z",
      "ended_at": "2024-10-21T19:23:18.125306Z",
      "stage": "apply",
      "source": "kubernetes",
      "action": "create",
      "resource": "kubernetes_deployment.main[0]"
    }
  ],
  "agent_script_timings": [
    {
      "started_at": "2024-12-06T02:01:42.038604Z",
      "ended_at": "2024-12-06T02:01:48.735748Z",
      "exit_code": 0,
      "stage": "start",
      "status": "ok",
      "display_name": "Startup Script",
      "workspace_agent_id": "d43654a6-002c-42e3-8829-c330facc024b",
      "workspace_agent_name": "main"
    },
    {
      "started_at": "2024-12-13T18:15:16.603645Z",
      "ended_at": "2024-12-13T18:15:16.6258Z",
      "exit_code": 0,
      "stage": "start",
      "status": "ok",
      "display_name": "Startup Script",
      "workspace_agent_id": "d43654a6-002c-42e3-8829-c330facc024b",
      "workspace_agent_name": "main"
    }
  ],
  "agent_connection_timings": [
    {
      "started_at": "2024-10-21T19:23:20.244182Z",
      "ended_at": "2024-10-21T19:24:43.831238Z",
      "stage": "connect",
      "workspace_agent_id": "d43654a6-002c-42e3-8829-c330facc024b",
      "workspace_agent_name": "main"
    }
  ]
}

@archef2000

It also happened when using code-server in the browser, so not only on the Coder workspace log page.
I am not experiencing it anymore (after a workspace update), and it was my only workspace, but here are my timings anyway:

{
  "provisioner_timings": [
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:22.799815Z",
      "ended_at": "2025-01-02T17:25:28.292937Z",
      "stage": "init",
      "source": "terraform",
      "action": "initializing terraform",
      "resource": "state file"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.700169Z",
      "ended_at": "2025-01-02T17:25:28.702566Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_provisioner.me"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.700267Z",
      "ended_at": "2025-01-02T17:25:28.702558Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_workspace_owner.me"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.70247Z",
      "ended_at": "2025-01-02T17:25:28.704096Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_workspace.me"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.706564Z",
      "ended_at": "2025-01-02T17:25:28.709574Z",
      "stage": "plan",
      "source": "coder",
      "action": "state refresh",
      "resource": "coder_agent.main"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.706839Z",
      "ended_at": "2025-01-02T17:25:28.726667Z",
      "stage": "plan",
      "source": "docker",
      "action": "state refresh",
      "resource": "docker_image.main"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.717254Z",
      "ended_at": "2025-01-02T17:25:28.718379Z",
      "stage": "plan",
      "source": "coder",
      "action": "state refresh",
      "resource": "coder_app.code-server"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.738701Z",
      "ended_at": "2025-01-02T17:25:28.879501Z",
      "stage": "plan",
      "source": "docker",
      "action": "state refresh",
      "resource": "docker_container.workspace[0]"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:28.977971Z",
      "ended_at": "2025-01-02T17:25:29.427195Z",
      "stage": "graph",
      "source": "terraform",
      "action": "building terraform dependency graph",
      "resource": "state file"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:29.731249Z",
      "ended_at": "2025-01-02T17:25:29.732097Z",
      "stage": "apply",
      "source": "coder",
      "action": "delete",
      "resource": "coder_app.code-server"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:29.743888Z",
      "ended_at": "2025-01-02T17:25:29.745745Z",
      "stage": "apply",
      "source": "coder",
      "action": "create",
      "resource": "coder_app.code-server"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:29.746273Z",
      "ended_at": "2025-01-02T17:25:31.354019Z",
      "stage": "apply",
      "source": "docker",
      "action": "delete",
      "resource": "docker_container.workspace[0]"
    },
    {
      "job_id": "dd7af91f-7908-498f-a1d0-42af56de1d06",
      "started_at": "2025-01-02T17:25:31.401457Z",
      "ended_at": "2025-01-02T17:25:31.998322Z",
      "stage": "apply",
      "source": "docker",
      "action": "create",
      "resource": "docker_container.workspace[0]"
    }
  ],
  "agent_script_timings": [
    {
      "started_at": "2025-01-02T17:25:32.914668Z",
      "ended_at": "2025-01-02T17:25:37.055461Z",
      "exit_code": 0,
      "stage": "start",
      "status": "ok",
      "display_name": "Startup Script",
      "workspace_agent_id": "eb7fcbb7-4007-4b49-9e3d-019e645d3c03",
      "workspace_agent_name": "main"
    }
  ],
  "agent_connection_timings": [
    {
      "started_at": "2025-01-02T17:25:32.612251Z",
      "ended_at": "2025-01-02T17:25:32.858563Z",
      "stage": "connect",
      "workspace_agent_id": "eb7fcbb7-4007-4b49-9e3d-019e645d3c03",
      "workspace_agent_name": "main"
    }
  ]
}

@BrunoQuaresma BrunoQuaresma self-assigned this Jan 2, 2025
@BrunoQuaresma
Collaborator

Thank you, @mcm and @archef2000—this is super helpful! I’ll use this data to mock the API endpoint and see what happens.

@mcm

mcm commented Jan 2, 2025

@BrunoQuaresma I just joined the Coder Discord, feel free to reach out there if you need anything as well. My coder instance is very much non-production so there's not really anything sensitive in there, and I can run live tests etc.

@raphaelfff
Author

That's what I get on the broken workspace:

{
  "provisioner_timings": [],
  "agent_script_timings": [
    {
      "started_at": "2024-12-18T15:04:24.010926Z",
      "ended_at": "2024-12-18T15:04:24.087918Z",
      "exit_code": 0,
      "stage": "start",
      "status": "ok",
      "display_name": "Setup",
      "workspace_agent_id": "5b6acefd-24e6-4a6b-8eee-d31a6ecf823d",
      "workspace_agent_name": "main"
    },
    {
      "started_at": "2024-12-18T15:04:24.010925Z",
      "ended_at": "2024-12-18T15:04:36.019626Z",
      "exit_code": 0,
      "stage": "start",
      "status": "ok",
      "display_name": "code-server",
      "workspace_agent_id": "5b6acefd-24e6-4a6b-8eee-d31a6ecf823d",
      "workspace_agent_name": "main"
    }
  ],
  "agent_connection_timings": [
    {
      "started_at": "2024-10-07T08:48:20.437214Z",
      "ended_at": "2024-10-07T08:50:14.136753Z",
      "stage": "connect",
      "workspace_agent_id": "5b6acefd-24e6-4a6b-8eee-d31a6ecf823d",
      "workspace_agent_name": "main"
    }
  ]
}

@BrunoQuaresma
Collaborator

@raphaelfff, was your workspace running when you got the response? I’m assuming the issue occurs when the workspace is up and running, correct? I’m asking because, typically, when a workspace is running, it returns provisioner timings, but in the response you shared, that field is empty.

@raphaelfff
Author

The workspace is running, but it was created ages ago, before the build timeline was in place.

@archef2000

I don't know if it helps, but here are my timings again, after running the workspace for some time with the code-server browser page crashing in under a minute:

{
  "provisioner_timings": [
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:14.819777Z",
      "ended_at": "2025-01-05T12:47:18.654211Z",
      "stage": "init",
      "source": "terraform",
      "action": "initializing terraform",
      "resource": "state file"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.003863Z",
      "ended_at": "2025-01-05T12:47:19.004863Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_workspace.me"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.004077Z",
      "ended_at": "2025-01-05T12:47:19.004819Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_provisioner.me"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.004141Z",
      "ended_at": "2025-01-05T12:47:19.004989Z",
      "stage": "plan",
      "source": "coder",
      "action": "read",
      "resource": "data.coder_workspace_owner.me"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.00933Z",
      "ended_at": "2025-01-05T12:47:19.030038Z",
      "stage": "plan",
      "source": "docker",
      "action": "state refresh",
      "resource": "docker_image.main"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.01011Z",
      "ended_at": "2025-01-05T12:47:19.013026Z",
      "stage": "plan",
      "source": "coder",
      "action": "state refresh",
      "resource": "coder_agent.main"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.019742Z",
      "ended_at": "2025-01-05T12:47:19.020578Z",
      "stage": "plan",
      "source": "coder",
      "action": "state refresh",
      "resource": "coder_app.code-server"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.056489Z",
      "ended_at": "2025-01-05T12:47:19.494507Z",
      "stage": "graph",
      "source": "terraform",
      "action": "building terraform dependency graph",
      "resource": "state file"
    },
    {
      "job_id": "49fe19b4-19e9-4320-8e17-1a63164453da",
      "started_at": "2025-01-05T12:47:19.815179Z",
      "ended_at": "2025-01-05T12:47:20.238378Z",
      "stage": "apply",
      "source": "docker",
      "action": "create",
      "resource": "docker_container.workspace[0]"
    }
  ],
  "agent_script_timings": [],
  "agent_connection_timings": [
    {
      "started_at": "2025-01-05T12:47:20.782132Z",
      "ended_at": "2025-01-05T12:47:21.05562Z",
      "stage": "connect",
      "workspace_agent_id": "27941bd8-2f3b-4c0a-ad1d-46ea90cca242",
      "workspace_agent_name": "main"
    }
  ]
}

@BrunoQuaresma
Collaborator

@raphaelfff @mcm @archef2000 Would it be possible for one of you to add me as a user in your deployment so I can debug this issue directly? I’ve been trying to replicate it but haven’t had any success so far 😞.

@raphaelfff
Author

raphaelfff commented Jan 7, 2025

@BrunoQuaresma I'm unable to add you to the deployment/workspace, but I'm happy to hop on a Zoom call to debug. What's your email, so I can schedule a call?

@mcm

mcm commented Jan 7, 2025

@BrunoQuaresma Yeah, absolutely. I'm iamthemcmaster on Discord if you want to reach out, and I'll set it up.

@BrunoQuaresma
Collaborator

Hey @raphaelfff, sorry for the delay. I’m going to check with @mcm to see if I can directly access the deployment as a user and try to figure out the problem there. If I can’t resolve it, I’ll schedule a call with you for sure. Does that work for you?

@raphaelfff
Author

raphaelfff commented Feb 3, 2025

I'm afraid this issue needs to be reopened: I just upgraded to 2.18.5, and the issue still occurs.

@matifali
Member

matifali commented Feb 3, 2025

Thank you @raphaelfff. I will let @BrunoQuaresma look into this.
To confirm, did you also update/restart your workspaces after upgrading to v2.18.5? A workspace upgrade is required to upgrade the agent running in the workspace to the latest version.

@matifali matifali reopened this Feb 3, 2025
@matifali
Member

matifali commented Feb 3, 2025

cc: @DanielleMaywood

@raphaelfff
Author

raphaelfff commented Feb 3, 2025

I took some more heap snapshots; here are a couple of findings.
Between two snapshots (t+1s and t+10s), each of the following was allocated over a million times:

  1. The string:
display:flex;justify-content:center;flex-shrink:0;color:#a1a1aa;;width:calc(var(--x-axis-width) * 2);&:not(:first-child){margin-left:calc(-1 * var(--x-axis-width));};
  2. An object called Ute in the minified code
  3. A bunch of li elements with the class css-t7cbn9
  4. A bunch of Text nodes with content such as 1,894,295,000ms

Could it be that the timeline view becomes stupidly long and tries to render every tick for the whole duration (21 days in this case)?
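To put that tick label in perspective, here is a back-of-the-envelope check in TypeScript. It converts 1,894,295,000 ms to days and counts how many ticks a chart would need to reach that value at a few fixed tick intervals. The intervals are hypothetical; the timeline's real interval isn't stated in this thread.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000

// Tick label seen in the heap snapshot.
const labelMs = 1_894_295_000;

// Ticks needed to cover rangeMs when one tick is drawn every stepMs
// (the tick at 0 included).
function tickCount(rangeMs: number, stepMs: number): number {
  return Math.floor(rangeMs / stepMs) + 1;
}

console.log((labelMs / MS_PER_DAY).toFixed(1)); // 21.9

// Hypothetical fixed intervals, for illustration only.
for (const stepMs of [100, 1_000, 5_000]) {
  console.log(`${stepMs} ms step -> ${tickCount(labelMs, stepMs)} ticks`);
}
```

Even at one tick per second, reaching that label takes about 1.9 million ticks, which is in the same order of magnitude as the allocation counts reported above.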

@raphaelfff
Author

Image

Pretty sure my crazy assumption is right...

@matifali matifali added s2 Broken use cases or features (with a workaround). Only humans may set this. and removed need-help Assign this label prompts an engineer to check the issue. Only humans may set this. bug risk Prone to bugs labels Feb 3, 2025
@BrunoQuaresma
Collaborator

@raphaelfff could you please share what your network request for this endpoint looks like? 🙏 #15921 (comment)

@raphaelfff
Author

I already did: #15921 (comment)

@BrunoQuaresma
Collaborator

I already did: #15921 (comment)

@raphaelfff I mean, after updating to the latest Coder version.

@raphaelfff
Author

{
	"provisioner_timings": [],
	"agent_script_timings": [
		{
			"started_at": "2024-12-18T15:04:24.010926Z",
			"ended_at": "2024-12-18T15:04:24.087918Z",
			"exit_code": 0,
			"stage": "start",
			"status": "ok",
			"display_name": "Setup",
			"workspace_agent_id": "5b6acefd-24e6-4a6b-8eee-d31a6ecf823d",
			"workspace_agent_name": "main"
		},
		{
			"started_at": "2024-12-18T15:04:24.010925Z",
			"ended_at": "2024-12-18T15:04:36.019626Z",
			"exit_code": 0,
			"stage": "start",
			"status": "ok",
			"display_name": "code-server",
			"workspace_agent_id": "5b6acefd-24e6-4a6b-8eee-d31a6ecf823d",
			"workspace_agent_name": "main"
		}
	],
	"agent_connection_timings": [
		{
			"started_at": "2024-10-07T08:48:20.437214Z",
			"ended_at": "2024-10-07T08:50:14.136753Z",
			"stage": "connect",
			"workspace_agent_id": "5b6acefd-24e6-4a6b-8eee-d31a6ecf823d",
			"workspace_agent_name": "main"
		}
	]
}

@BrunoQuaresma
Collaborator

What's happening is, for some reason, the agent connection timings are returning a very old date. So, when the component tries to calculate the range and ticks for the chart, it blows up because there are too many ticks. Since the component expects the range to be in milliseconds, not days, it generates millions of ticks, causing a memory explosion.
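This diagnosis can be checked against the timings shared for the broken workspace. The TypeScript sketch below is not the dashboard's code: WorkspaceTimings mirrors only the fields used here, and timingRange and parseTimestamp are hypothetical helpers. It computes the overall range a timeline would have to span and which section contributes the earliest start.

```typescript
// Only the fields this sketch needs from the /timings response.
interface Timing {
  started_at: string;
  ended_at: string;
}

interface WorkspaceTimings {
  provisioner_timings: Timing[];
  agent_script_timings: Timing[];
  agent_connection_timings: Timing[];
}

const SECTIONS = [
  "provisioner_timings",
  "agent_script_timings",
  "agent_connection_timings",
] as const;

// The API returns microseconds (e.g. ".437214Z"); truncate to milliseconds so
// Date.parse only sees the ISO format ECMAScript guarantees to support.
function parseTimestamp(iso: string): number {
  return Date.parse(iso.replace(/(\.\d{3})\d+/, "$1"));
}

// Overall range across all sections, plus the section with the earliest start.
function timingRange(t: WorkspaceTimings) {
  let start = Infinity;
  let end = -Infinity;
  let earliestSection = "";
  for (const section of SECTIONS) {
    for (const entry of t[section]) {
      const s = parseTimestamp(entry.started_at);
      if (s < start) {
        start = s;
        earliestSection = section;
      }
      end = Math.max(end, parseTimestamp(entry.ended_at));
    }
  }
  return { start, end, rangeMs: end - start, earliestSection };
}

// Timestamps copied from the broken workspace's response in this thread.
const broken: WorkspaceTimings = {
  provisioner_timings: [],
  agent_script_timings: [
    { started_at: "2024-12-18T15:04:24.010926Z", ended_at: "2024-12-18T15:04:24.087918Z" },
    { started_at: "2024-12-18T15:04:24.010925Z", ended_at: "2024-12-18T15:04:36.019626Z" },
  ],
  agent_connection_timings: [
    { started_at: "2024-10-07T08:48:20.437214Z", ended_at: "2024-10-07T08:50:14.136753Z" },
  ],
};

const r = timingRange(broken);
console.log(r.earliestSection); // agent_connection_timings
console.log((r.rangeMs / 86_400_000).toFixed(1)); // 72.3
```

For that response the range comes out to roughly 72 days, and the earliest start comes from the October agent connection entry rather than the December startup scripts, which is consistent with the connection timing pointing far back.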

@DanielleMaywood I think we talked about this, but I forgot—why are the agent connection timings returning a very old date compared to the dates returned by the agent script timings?

@archef2000

The problem also affects the code-server instance running in a workspace: when using it, it also crashes after some time.

@matifali matifali added the customer-reported Bugs reported by enterprise customers. Only humans may set this. label Feb 7, 2025
@BrunoQuaresma
Collaborator

I think I have an idea of what’s going on. This happens for workspaces with non-ephemeral resources like storage. So, if the storage is created today and a workspace build is triggered in one month, the timing returned will point one month back, creating a large time range that breaks the UI.

@DanielleMaywood @dannykopping Should we just ignore these kinds of resources when returning the build timings?

@raphaelfff
Author

We are indeed using storage

@dannykopping
Contributor

So, if the storage is created today and a workspace build is triggered in one month, the timing returned will point one month back, creating a large time range that breaks the UI.

Do we have a reproduction of this behaviour?

@DanielleMaywood @dannykopping Should we just ignore these kinds of resources when returning the build timings?

I don't think this is the correct approach. If there's indeed a bug leading to incorrect timeouts, we must fix it.

Additionally, we should either add 1) scaling and/or 2) maximum sizes for timing spans.

There's no value in having the user scroll beyond one or two "pages" (i.e. widths) of the graph. We don't need precision here, we just need an indicative view of where the time was spent. For this reason I think scaling is probably the cleanest solution - but it has the obvious downside that short timings would become impossibly narrow; I think that's OK because you only really care about the lengthy span.
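The scaling option could look roughly like the TypeScript sketch below: instead of drawing ticks at a fixed interval, pick the smallest "nice" interval that keeps the tick count within a budget, and map durations onto a fixed chart width rather than a scrolling one. NICE_STEPS_MS, chooseTickStep, and toPixels are hypothetical and not taken from the Coder codebase.

```typescript
// Hypothetical candidate tick intervals in ms, from 1 ms up to 1 week.
const NICE_STEPS_MS: number[] = [
  1, 2, 5, 10, 20, 50, 100, 200, 500,
  1_000, 2_000, 5_000, 10_000, 30_000, // seconds
  60_000, 300_000, 600_000, 1_800_000, // minutes
  3_600_000, 21_600_000, 86_400_000, 604_800_000, // hours, days, one week
];

// Ticks needed to cover rangeMs at one tick per stepMs (tick at 0 included).
function tickCount(rangeMs: number, stepMs: number): number {
  return Math.floor(rangeMs / stepMs) + 1;
}

// Smallest candidate step whose tick count fits the budget, so the number of
// rendered ticks stays bounded however long the range is.
function chooseTickStep(rangeMs: number, maxTicks: number): number {
  if (maxTicks < 2) throw new Error("maxTicks must be at least 2");
  for (const step of NICE_STEPS_MS) {
    if (tickCount(rangeMs, step) <= maxTicks) return step;
  }
  // Longer than even weekly ticks allow: use a multiple of the largest step.
  const largest = NICE_STEPS_MS[NICE_STEPS_MS.length - 1];
  return largest * Math.ceil(rangeMs / (largest * (maxTicks - 1)));
}

// Scales a duration onto a fixed-width chart instead of a scrolling one.
function toPixels(durationMs: number, rangeMs: number, chartWidthPx: number): number {
  return rangeMs <= 0 ? 0 : (durationMs / rangeMs) * chartWidthPx;
}

console.log(chooseTickStep(12_000, 20)); // 1000
console.log(chooseTickStep(6_243_375_582, 20)); // 604800000 (one week)
```

With a budget of 20 ticks, a 12-second build gets 1-second ticks, while a range of about 72 days (the span of the broken workspace's timings in this thread) gets weekly ticks. As noted above, the trade-off is that a roughly 77 ms script renders as a tiny fraction of a pixel at that scale; enforcing a minimum bar width would be a separate design decision.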

@BrunoQuaresma
Collaborator

I don't think this is the correct approach. If there's indeed a bug leading to incorrect timeouts, we must fix it.

Hmm... I don't think incorrect timeouts are the issue here 🤔. What's causing the memory leak is the FE trying to render the chart for a range as long as one week, when it's expected to span seconds or minutes. That leads me to a question about how we calculate build times: to me, it doesn't make sense to include non-ephemeral resources in the estimation, since they aren't specific to a given build. For example, storage is created on the first build and reused by subsequent ones.

There's no value in having the user scroll beyond one or two "pages" (i.e. widths) of the graph. We don't need precision here, we just need an indicative view of where the time was spent. For this reason I think scaling is probably the cleanest solution - but it has the obvious downside that short timings would become impossibly narrow; I think that's OK because you only really care about the lengthy span

I also think precision isn't a big deal here; more important, to me, is which types of resources we consider when measuring build time.

@dannykopping
Contributor

@BrunoQuaresma I'm in total agreement about sending back the correct measurements, which is why I asked if you have a reproduction so we can validate this. I can't see how the situation you described would occur.

@BrunoQuaresma
Collaborator

@dannykopping I will try to reproduce this on dev.coder.com and share the workspace with you 🙏

@julianpoy

Experiencing this as well.

@BrunoQuaresma
Collaborator

Maybe some of these issues are going to be fixed by #17514

@archef2000

What about the vscode workspace crashing?

@BrunoQuaresma
Collaborator

BrunoQuaresma commented Apr 30, 2025

Updates!

Please reach out to us if you are still facing this issue after this gets released.


8 participants