
Avoid extra HTTPS connections for most Event Platform beacons
Open, Low, Public

Description

Writing this in a fairly general form assuming the title/description will be edited to be more general since this doesn't exactly relate to Event Platform stuff, even though that's where it arose.

https://phabricator.wikimedia.org/T261340 proposes to resolve some URLs (in this specific case, https://intake-logging.wikimedia.org) to a different datacenter than normal. In particular, that datacenter would differ from the one serving the legacy /beacon endpoint, the https://intake-analytics.wikimedia.org endpoint, etc.

This ticket is based on an observation @Krinkle made in https://phabricator.wikimedia.org/T226986#6467370

For EventLogging, we specifically moved away from separate domains to using /beacon so that the majority of non-deferred events that are sent off during user interactions don't require separate connections to be established etc. It has been a while since we last quantified the benefits of this choice, so it's certainly worth revisiting.

I see that EventGate, which is not used yet for most events, uses a separate domain again at https://intake-logging.wikimedia.org. That'll involve a DNS query, but given it points to dyna.wikimedia.org same as text-lb, I'm assuming this means it is handled through the same connection and traffic layer as other requests.
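
The assumption in the paragraph above can be made a little more concrete. As a rough sketch (not a description of any particular browser's implementation), HTTP/2 allows an existing connection to be reused for a second hostname when that hostname resolves to an address the connection is already using and the connection's certificate covers it. The type, the function names, the IP addresses and the certificate names below are all made up for illustration; real browsers apply further conditions (for example, requests sent without credentials may use a separate connection pool).

```typescript
// Hypothetical model of a resolved host; not a real browser or library API.
interface HostInfo {
  hostname: string;
  addresses: string[]; // IPs the hostname resolved to
}

// True if one of the certificate's names covers the hostname.
// A wildcard such as "*.example.org" matches exactly one leading label.
function certCovers(certNames: string[], hostname: string): boolean {
  const host = hostname.toLowerCase();
  return certNames.some((name) => {
    const san = name.toLowerCase();
    if (san.startsWith("*.")) {
      const dot = host.indexOf(".");
      return dot > 0 && host.slice(dot + 1) === san.slice(2);
    }
    return san === host;
  });
}

// True if a connection opened for `existing` could plausibly also serve
// `candidate`: they share a resolved address and the certificate covers it.
function mayReuseConnection(
  existing: HostInfo,
  certNames: string[],
  candidate: HostInfo,
): boolean {
  const sharesAddress = candidate.addresses.some((ip) =>
    existing.addresses.includes(ip),
  );
  return sharesAddress && certCovers(certNames, candidate.hostname);
}
```

Under this model, a beacon domain that is geo-routed to a different datacenter resolves to a different address, fails the first check and needs its own TCP/TLS connection.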

Again, per @Krinkle:

Starting to establish more than one primary connection on a majority of page views is something we phased out 5+ years ago and would be great not to bring back without further research and consideration first.

Event Timeline

In https://phabricator.wikimedia.org/T226986#6467482 I wrote:

Huh, I'm pretty sure I discussed this with @BBlack or @ema when we were first setting up eventgate-analytics-external, and they preferred that the intake service got its own unique URL, rather than serving it in the wiki domains.

I can't find a public discussion of this in Phab, just https://phabricator.wikimedia.org/T233629#557637, so we must have discussed it in IRC or elsewhere.

Also relevant: T261340: 'skip_first' feature flag for gdnsd GeoIP plugin
Chris is making use of the fact that there is a separate endpoint to route logging events to the next nearest datacenter in case there is something wrong with the route to the nearest datacenter.

To instrument this, and gauge any background/side impact during page load, I'd recommend creating two speed-test scenarios under wikipedia.org/speed-tests/: one that is simple, like the current Banksy page, and another that performs a handful of sendBeacon() calls from inline scripts against a domain that requires a separate DNS resolution and TLS/TCP connection. The target can be either intake-logging, if it has already been configured this way by now, or any other production URL that doesn't share the same DNS target and TLS cert with the canonical wiki domains.

Then add both scenarios to our synthetic test config for a few days to compare them side-by-side:
https://wikitech.wikimedia.org/wiki/Performance/WebPageTest#Add_a_new_URL_to_test
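
For the second scenario, the inline script could look roughly like the sketch below. This is only an illustration: the helper names and the payload shape are invented here, and the endpoint is passed in because the exact URL to test against is still to be decided. The sender is injectable so the helper can be exercised outside a browser; by default it wraps navigator.sendBeacon().

```typescript
// Hypothetical helper for the separate-domain speed-test page.
type BeaconSender = (url: string, body: string) => boolean;

// Default sender: wraps navigator.sendBeacon(), or reports failure
// when the API is not available (e.g. outside a browser).
const browserSender: BeaconSender = (url, body) => {
  const nav = (globalThis as any).navigator;
  return nav && typeof nav.sendBeacon === "function"
    ? nav.sendBeacon(url, body)
    : false;
};

// Sends `count` small beacons to `endpoint` and returns how many
// the sender accepted for delivery.
function sendTestBeacons(
  endpoint: string,
  count: number,
  send: BeaconSender = browserSender,
): number {
  let queued = 0;
  for (let i = 0; i < count; i++) {
    // Made-up payload; the content does not matter for the connection cost.
    const body = JSON.stringify({ scenario: "separate-domain", seq: i });
    if (send(endpoint, body)) {
      queued++;
    }
  }
  return queued;
}
```

The test page would call sendTestBeacons() a handful of times' worth (say, a count of 5) with whichever separate-domain URL is chosen, while the simple scenario omits the script entirely.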

Krinkle renamed this task from "Research and consider network connections made due to Event Platform" to "Avoid extra HTTPS connections for most Event Platform beacons". Oct 3 2022, 9:25 PM

Updated the title to recognise that the original one of these (NEL: Network Error Logging) was intentionally set up on a separate domain and DC, so that the reports are logically separated from the networking problems the browser is trying to notify us about.

However, for all first-party and in-page use of EventLogging/EventGate, this doesn't apply.