
Implement SUL3 central autologin
Open, In Progress, Needs Triage, Public

Description

Enable central autologin (in its various forms - edge login, top-level autologin etc) for SUL3 so the SUL3 shared login domain can serve as a source of identity. This probably just involves changing the URL generation rules for autologin + whitelisting the relevant endpoint in the SSO hook handler.

Event Timeline

Restricted Application added a subscriber: Aklapper.
DAlangi_WMF changed the task status from Open to In Progress. Oct 7 2024, 11:09 AM
DAlangi_WMF claimed this task.
DAlangi_WMF moved this task from Soon to Current Sprint on the MediaWiki-Platform-Team board.
DAlangi_WMF moved this task from Ready to In progress on the SUL3 board.

Change #1078400 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@master] Special: Add support for autologin in SUL3 mode

https://gerrit.wikimedia.org/r/1078400

For autologin, we want the SUL3 flag to be sticky through a single autologin process so we need to add a query flag to SpecialCentralAutologin (and SpecialCentralLogin if we end up using that somehow for SUL3).

I forgot to add it to this task, but we'll want to pass through the SUL3 feature flag during SUL3 central login / central autologin.
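To illustrate what "sticky" means here, a minimal sketch in Python (not the actual CentralAuth code, which is PHP): each step of the flow copies the flag from the URL it was called with onto the URL it redirects to. The `usesul3` parameter name comes from this discussion; the helper name `carry_sul3_flag` and the value `1` are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SUL3_PARAM = "usesul3"  # name taken from the discussion; the value "1" is assumed


def carry_sul3_flag(current_url: str, next_url: str) -> str:
    """Copy the SUL3 flag from the current request URL onto the next redirect URL.

    Hypothetical helper: if the current step was not called in SUL3 mode,
    the next URL is returned unchanged.
    """
    current_query = dict(parse_qsl(urlsplit(current_url).query))
    if current_query.get(SUL3_PARAM) != "1":
        return next_url

    parts = urlsplit(next_url)
    # Drop any existing copy of the flag so it is never duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SUL3_PARAM
    ]
    query.append((SUL3_PARAM, "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With something like this applied at every redirect, a flow that starts in SUL3 mode stays in SUL3 mode even if the user's cohort changes halfway through.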

The control flow of central autologin is something like this:

[Diagram: CentralAuth autologin redirect flow]

It is triggered either by JS code loaded by ResourceLoader, getting the central domain URL as ResourceLoader JSON data; or by embedding an invisible pixel in the page; or by a top-level redirect when the user visits Special:UserLogin. Some of the states are reachable by anonymous users as well (the red arrows on the diagram). Anonymous traffic is something like 1000x that of logged-in users, so those states need to be cacheable.

(Specifically: the JS entry point might come from cache because it is coming from a ResourceLoader module that's loaded on every pageview. The invisible pixel used for autologin that's set in PageDisplayHookHandler::onBeforePageDisplay() will be present in any anonymous pageview in a <noscript> tag, and such pageviews might come from cache. The /start and /checkLoggedIn subpages of Special:CentralAutoLogin need to be cached because anonymous users pass through them. The rest of the entry points don't matter that much - edge login is triggered by a session flag, so will only happen when the user is bypassing cache at least for the local wiki. Top-level autologin is triggered on Special:UserLogin which is never cached.)

For the SUL3 rollout, we want to do a staged rollout where we enable SUL3 for some user cohorts but not others. That means we need to use the SUL2 central login wiki for autologin for some users, and the SUL3 shared domain for others. This needs to work well with caching. I think the most feasible approach is:

  • Include the /start step in the two flows that need to be cacheable (the JS and <noscript> fallback). That way, we always use the same URL in the ResourceLoader module and in cached pageviews (since /start is on the local wiki). /start doesn't really do anything, just issues a redirect to the central wiki, so this should be fine.
    • This also means we can't pass a usesul3 parameter to /start - we need to modify the start step to add that to the list of parameters.
  • To ensure /start is cacheable, we need to split the cache on everything that is relevant for determining whether the user is opted in to SUL3. Currently that's just the sul3OptIn cookie. That has to be done somewhere in puppet, I think here, here and here.
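A rough sketch of the /start decision described in the bullets above, in Python rather than the extension's PHP. The two domains are placeholders, the cookie value `1` is an assumption, and `start_redirect` is a hypothetical name; only the `sul3OptIn` cookie, the `usesul3` parameter and the /checkLoggedIn subpage come from this task.

```python
from urllib.parse import urlencode

# Placeholder domains, not the real ones.
SUL2_CENTRAL_WIKI = "https://login.example.org"
SUL3_SHARED_DOMAIN = "https://auth.example.org"


def start_redirect(cookies: dict, params: dict) -> str:
    """Build the redirect target that the local /start step would issue.

    The cohort is decided from the cookie only. A usesul3 value supplied by
    the caller is discarded, so the URL pointing at /start can be identical
    for everyone and safe to embed in cached output.
    """
    use_sul3 = cookies.get("sul3OptIn") == "1"  # assumed cookie value

    query = {key: value for key, value in params.items() if key != "usesul3"}
    if use_sul3:
        # /start itself adds the flag to the list of forwarded parameters.
        query["usesul3"] = "1"

    base = SUL3_SHARED_DOMAIN if use_sul3 else SUL2_CENTRAL_WIKI
    return f"{base}/wiki/Special:CentralAutoLogin/checkLoggedIn?{urlencode(query)}"
```

Since the response depends only on the `sul3OptIn` cookie, that cookie is the only thing the cache would have to be split on for /start.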

Once we do the user preference option from T375954#10185685, we'll use the <wikiid>UserName cookie to determine whether the user is in the SUL2 or SUL3 cohort. It's not obvious whether the same approach works there. We can split the cache on the sul3OptIn cookie because it can only have one value (two, maybe, if we allow using it as an opt-out). But <wikiid>UserName will have a different value for every user and Varnish can't map it to a finite set of groups. Maybe that's fine: effectively the presence of the cookie will disable caching, and it will be infrequent enough not to matter? The difference between *UserName and *Session/*Token cookies is that the latter are removed on logout. So the username cookie is present for everyone who has logged in within the last 365 days (or within the last 30 days, if they didn't use the "keep me logged in" checkbox).
If that's a problem, we can just make sure that cookie is automatically converted by MediaWiki to sul3OptIn. We only really need to rely on it on the login page.
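To make the cardinality concern concrete, a small Python sketch comparing the two ways of splitting the cache. It only models the cache key, not Varnish or the puppet VCL; the function names, the `enwiki` wiki ID and the cookie value `1` are illustrative assumptions.

```python
def opt_in_split_key(cookies: dict) -> str:
    """Split on sul3OptIn: every request maps to one of two variants."""
    return "sul3" if cookies.get("sul3OptIn") == "1" else "default"


def username_split_key(cookies: dict, wiki_id: str) -> str:
    """Naive split on the raw <wikiid>UserName value: one variant per user."""
    return cookies.get(f"{wiki_id}UserName", "")


# A handful of hypothetical requests hitting the same cached URL.
requests = [
    {},
    {"sul3OptIn": "1"},
    {"enwikiUserName": "Alice"},
    {"enwikiUserName": "Bob", "sul3OptIn": "1"},
    {"enwikiUserName": "Carol"},
]

opt_in_variants = {opt_in_split_key(c) for c in requests}
username_variants = {username_split_key(c, "enwiki") for c in requests}
```

Here `opt_in_variants` never grows beyond two entries, while `username_variants` gains one entry per distinct username, which in practice means no cache hits for anyone carrying that cookie. Converting the username cookie to `sul3OptIn` on the MediaWiki side would keep the split in the first, bounded form.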

@Krinkle if you have some time to review this, I'd appreciate your thoughts.

Change #1084123 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@master] CentralAuthHooks: Point resource loader autologin URL to `/start`

https://gerrit.wikimedia.org/r/1084123

Change #1084123 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Point autologin resourceloader module URL to `/start` endpoint

https://gerrit.wikimedia.org/r/1084123

Change #1092323 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[operations/puppet@production] [SUL3] varnish: Split frontend cache on `sul3OptIn` cookie

https://gerrit.wikimedia.org/r/1092323