
chore: use a cache to select best latency #7879


Merged: 3 commits merged into main from stevenmasley/latency_consistent on Jun 7, 2023

Conversation

@Emyrk Emyrk (Member) commented Jun 6, 2023

Problem

Currently the latency report for workspace proxies is occasionally "slow", making it inconsistent. On some page loads the proxy closest to the user reports a very high latency, so a slower proxy is selected instead of the preferred one.

Debugging this has proved difficult.

There are 2 core issues with the current implementation:

  1. Networks are inherently unreliable, so our latency check will always have the potential to misreport a proxy as "high latency".
  2. We currently take the last latency report as fact, having no way to remove outliers.

This PR only begins to address issue 2 above. Collecting data on this has been challenging because it is all client-side.

What this does

Each time we load proxy latencies, we store the fetched values in local storage.

We always use the minimum of the cached latencies plus the latest measurement, since the minimum best describes the best-case round-trip time. We may later change this to some sort of "median" or "average without outliers".

We store at most 8 cached latencies and remove any gathered more than a week ago.
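For illustration, here is a rough sketch of what that caching logic could look like. The storage key, entry shape, and helper name are assumptions for the example, not the PR's actual identifiers.

```ts
// Illustrative sketch only: CACHE_KEY, CachedLatency, and
// updateAndPickLatency are hypothetical names, not the PR's identifiers.
interface CachedLatency {
  latencyMS: number;
  at: number; // unix millis when the measurement was taken
}

const CACHE_KEY = "proxy-latencies";
const MAX_ENTRIES = 8;
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000; // one week

function updateAndPickLatency(proxyID: string, latestMS: number): number {
  const key = `${CACHE_KEY}-${proxyID}`;
  const now = Date.now();
  const prior: CachedLatency[] = JSON.parse(localStorage.getItem(key) ?? "[]");
  // Drop entries older than a week and keep only the newest ones, so the
  // cache never exceeds MAX_ENTRIES once the latest measurement is added.
  const cached = prior
    .filter((e) => now - e.at < MAX_AGE_MS)
    .slice(-(MAX_ENTRIES - 1));
  cached.push({ latencyMS: latestMS, at: now });
  localStorage.setItem(key, JSON.stringify(cached));
  // Report the minimum of cached + latest: the best-case round trip time.
  return Math.min(...cached.map((e) => e.latencyMS));
}
```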

Debugging + Gathering Data

Honestly, I am unsure what the right solution is here. Is the minimum really best? Is the average better? Etc.

Without some data, it is hard to know. Unfortunately, my test deployments behave differently and appear far more consistent, so it is hard to gather data outside of production. And since all this latency data is client-side, we cannot push it into our telemetry or metrics.

To get some data to make these decisions, I added a local storage value called `workspace-proxy-latencies-max`. It can be set to any number, and the browser will then save up to that many latency checks in local storage. We can set it to something like 300, dogfood the product, and then download the data to see how inconsistent our latencies really are.
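For example, to collect a larger sample while dogfooding, the value can be set from the browser devtools console (the key name comes from this PR; 300 is just the example count above):

```ts
// Run in the browser devtools console: keep up to 300 latency checks.
localStorage.setItem("workspace-proxy-latencies-max", "300");
```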

Performance

Performance here is probably not the best and can be improved: local storage is backed by the user's disk, so access is slower than memory. We can optimize later if this proves to be a useful strategy.

@Emyrk Emyrk changed the title chore: Testing using a cache to choose the best latency chore: use a cache to select best latency Jun 6, 2023
@Emyrk Emyrk marked this pull request as ready for review June 6, 2023 20:20
@spikecurtis (Contributor) commented:

> We always use the minimum of the cached latencies plus the latest measurement, since the minimum best describes the best-case round-trip time. We may later change this to some sort of "median" or "average without outliers".

Generally, the thing to do is make multiple measurements and compute a weighted average. For example, you might measure every 5 seconds, and give any measurements in the last minute a weight of 1, anything older than a minute a weight of zero (i.e. a sliding window). Other weight functions are possible, like ones that smoothly decay, but in practice you can keep it pretty simple.
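As an illustration of that sliding-window suggestion (the `Measurement` shape and the one-minute window are assumptions for the example):

```ts
interface Measurement {
  latencyMS: number;
  at: number; // unix millis
}

// Weight 1 for anything measured in the last minute, weight 0 otherwise,
// which reduces to a plain average over the sliding window.
const WINDOW_MS = 60 * 1000;

function windowedAverage(measurements: Measurement[]): number | undefined {
  const now = Date.now();
  const recent = measurements.filter((m) => now - m.at <= WINDOW_MS);
  if (recent.length === 0) {
    return undefined; // no samples in the window yet
  }
  return recent.reduce((sum, m) => sum + m.latencyMS, 0) / recent.length;
}
```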

We can add more statistical sophistication later if needed. There are edge cases that can be annoying, like having two proxies that are very nearly isochronous from the client, and you flap back and forth randomly, but I suggest we ignore this until we actually observe it in practice.

@BrunoQuaresma (Collaborator) commented:

I like the weighted average, but a simpler option might be to use a regular median.
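A plain median over the cached samples could look like this (a hypothetical helper, assuming a non-empty input):

```ts
function medianLatency(latenciesMS: number[]): number {
  // Sort a copy so the caller's array is untouched.
  const sorted = [...latenciesMS].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```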

@BrunoQuaresma BrunoQuaresma left a comment:

FE looks good, but I would like to wait until we have discussed the "calculate latency methodology".

@Emyrk Emyrk (Member, Author) commented Jun 7, 2023

@spikecurtis my fear with the weighted average is that the occasional outlier is drastically different from the norm.

The default proxy is ~32ms for me most of the time. When it is slow, it could be as high as 130ms.

Maybe a weighted average plus some outlier detection with standard deviations to ignore the occasional outlier? But I don't know exactly, so hopefully I can get some actionable data from this PR 🤞
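One sketch of that idea: drop samples more than some number of standard deviations from the mean, then average the rest. The helper name and the cutoff of 2 are arbitrary choices for the example.

```ts
function averageWithoutOutliers(samplesMS: number[], cutoff = 2): number {
  const mean = samplesMS.reduce((s, v) => s + v, 0) / samplesMS.length;
  const variance =
    samplesMS.reduce((s, v) => s + (v - mean) ** 2, 0) / samplesMS.length;
  const std = Math.sqrt(variance);
  const kept = samplesMS.filter((v) => Math.abs(v - mean) <= cutoff * std);
  // With cutoff >= 1 at least one sample is always kept, but fall back to
  // the raw mean just in case everything was filtered out.
  return kept.length === 0
    ? mean
    : kept.reduce((s, v) => s + v, 0) / kept.length;
}
```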

> like having two proxies that are very nearly isochronous from the client, and you flap back and forth randomly

I had not considered this. We should probably then weight the previously selected proxy to keep the choice consistent. 🤔

@Emyrk Emyrk merged commit 1376ba3 into main Jun 7, 2023
@Emyrk Emyrk deleted the stevenmasley/latency_consistent branch June 7, 2023 13:29
@github-actions github-actions bot locked and limited conversation to collaborators Jun 7, 2023