
chore: use a cache to select best latency #7879


Merged: 3 commits merged into main from stevenmasley/latency_consistent on Jun 7, 2023

Conversation

@Emyrk Emyrk (Member) commented Jun 6, 2023

Problem

Currently the latency report for workspace proxies is occasionally "slow", making it inconsistent. On some page loads the proxy closest to the user reports a very high latency, so a slower proxy is selected instead of the preferred one.

Debugging this has proved difficult.

There are 2 core issues with the current implementation:

  1. Networks are inherently unreliable, so our latency check will always have the potential to misreport a proxy as "high latency".
  2. We currently take the last latency report as fact, having no way to remove outliers.

This PR only begins to address issue 2 above. Collecting data on this has been challenging because it is all client-side.

What this does

Each time we load proxy latencies, we store the fetched values in local storage.

We always use the minimum of the cached latencies plus the latest measurement, since the minimum best describes the best-case round-trip time. We may later change this to some sort of "median" or "average without outliers".

We store at most 8 cached latencies and remove any gathered more than a week ago.
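For illustration, here is a rough sketch of what that caching logic could look like. The storage key, entry shape, and helper name are assumptions for the example, not the PR's actual identifiers.

```ts
// Illustrative sketch only: CACHE_KEY, CachedLatency, and
// updateAndPickLatency are hypothetical names, not the PR's identifiers.
interface CachedLatency {
  latencyMS: number;
  at: number; // unix millis when the measurement was taken
}

const CACHE_KEY = "proxy-latencies";
const MAX_ENTRIES = 8;
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000; // one week

function updateAndPickLatency(proxyID: string, latestMS: number): number {
  const key = `${CACHE_KEY}-${proxyID}`;
  const now = Date.now();
  const prior: CachedLatency[] = JSON.parse(localStorage.getItem(key) ?? "[]");
  // Drop entries older than a week and keep only the newest ones, so the
  // cache never exceeds MAX_ENTRIES once the latest measurement is added.
  const cached = prior
    .filter((e) => now - e.at < MAX_AGE_MS)
    .slice(-(MAX_ENTRIES - 1));
  cached.push({ latencyMS: latestMS, at: now });
  localStorage.setItem(key, JSON.stringify(cached));
  // Report the minimum of cached + latest: the best-case round trip time.
  return Math.min(...cached.map((e) => e.latencyMS));
}
```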

Debugging + Gathering Data

Honestly, I am unsure what the right solution is here. Is the minimum really best? Is the average better? Etc.

Without some data, it is hard to know. Unfortunately, my test deployments behave differently and appear far more consistent, so it is hard to gather data outside of production. And since all this latency data is client-side, we cannot push it into our telemetry or metrics.

To get some data to make these decisions, I added a local storage value called `workspace-proxy-latencies-max`. It can be set to any number, and the browser will then save up to that many latency checks in local storage. We can set it to something like 300, dogfood the product, and then download the data to see how inconsistent our latencies really are.
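For example, to collect a larger sample while dogfooding, the value can be set from the browser devtools console (the key name comes from this PR; 300 is just the example count above):

```ts
// Run in the browser devtools console: keep up to 300 latency checks.
localStorage.setItem("workspace-proxy-latencies-max", "300");
```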

Performance

Performance here is probably not the best and can be improved: local storage is backed by the user's disk, so access is slower than memory. We can optimize later if this proves to be a useful strategy.

@Emyrk Emyrk changed the title chore: Testing using a cache to choose the best latency chore: use a cache to select best latency Jun 6, 2023
@Emyrk Emyrk marked this pull request as ready for review June 6, 2023 20:20
@spikecurtis (Contributor) commented:

> We always use the minimum of the cached latencies plus the latest measurement, since the minimum best describes the best-case round-trip time. We may later change this to some sort of "median" or "average without outliers".

Generally, the thing to do is make multiple measurements and compute a weighted average. For example, you might measure every 5 seconds, and give any measurements in the last minute a weight of 1, anything older than a minute a weight of zero (i.e. a sliding window). Other weight functions are possible, like ones that smoothly decay, but in practice you can keep it pretty simple.
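As an illustration of that sliding-window suggestion (the `Measurement` shape and the one-minute window are assumptions for the example):

```ts
interface Measurement {
  latencyMS: number;
  at: number; // unix millis
}

// Weight 1 for anything measured in the last minute, weight 0 otherwise,
// which reduces to a plain average over the sliding window.
const WINDOW_MS = 60 * 1000;

function windowedAverage(measurements: Measurement[]): number | undefined {
  const now = Date.now();
  const recent = measurements.filter((m) => now - m.at <= WINDOW_MS);
  if (recent.length === 0) {
    return undefined; // no samples in the window yet
  }
  return recent.reduce((sum, m) => sum + m.latencyMS, 0) / recent.length;
}
```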

We can add more statistical sophistication later if needed. There are edge cases that can be annoying, like having two proxies that are very nearly isochronous from the client, and you flap back and forth randomly, but I suggest we ignore this until we actually observe it in practice.

@BrunoQuaresma (Collaborator) commented:

I like the weighted average, but a simpler option might be to use a regular median.
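A plain median over the cached samples could look like this (a hypothetical helper, assuming a non-empty input):

```ts
function medianLatency(latenciesMS: number[]): number {
  // Sort a copy so the caller's array is untouched.
  const sorted = [...latenciesMS].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```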

@BrunoQuaresma BrunoQuaresma left a comment:

FE looks good, but I would like to wait until we have discussed the "calculate latency methodology".

@Emyrk Emyrk (Member, Author) commented Jun 7, 2023

@spikecurtis my fear with the weighted average is that the occasional outlier is drastically different from the norm.

The default proxy is ~32ms for me most of the time. When it is slow, it could be as high as 130ms.

Maybe a weighted average plus some outlier detection with standard deviations to ignore the occasional outlier? But I don't know exactly, so hopefully I can get some actionable data from this PR 🤞
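One sketch of that idea: drop samples more than some number of standard deviations from the mean, then average the rest. The helper name and the cutoff of 2 are arbitrary choices for the example.

```ts
function averageWithoutOutliers(samplesMS: number[], cutoff = 2): number {
  const mean = samplesMS.reduce((s, v) => s + v, 0) / samplesMS.length;
  const variance =
    samplesMS.reduce((s, v) => s + (v - mean) ** 2, 0) / samplesMS.length;
  const std = Math.sqrt(variance);
  const kept = samplesMS.filter((v) => Math.abs(v - mean) <= cutoff * std);
  // With cutoff >= 1 at least one sample is always kept, but fall back to
  // the raw mean just in case everything was filtered out.
  return kept.length === 0
    ? mean
    : kept.reduce((s, v) => s + v, 0) / kept.length;
}
```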

> like having two proxies that are very nearly isochronous from the client, and you flap back and forth randomly

I had not considered this. We should probably then weight the previously selected proxy to keep the choice consistent. 🤔

@Emyrk Emyrk merged commit 1376ba3 into main Jun 7, 2023
@Emyrk Emyrk deleted the stevenmasley/latency_consistent branch June 7, 2023 13:29
@github-actions github-actions bot locked and limited conversation to collaborators Jun 7, 2023