
grabText.php - non-English rename causes actor id exception
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):
Operating system: Ubuntu 20.04
Software installed: Apache, MariaDB

  • Install MediaWiki 1.39.5 or 1.39.6
  • Install Grabbers
  • Run grabText.php against the remote wikis

Remote wikis: https://nerfpedialegacy.fandom.com, https://nerf.fandom.com

What happens?:
While grabbing namespace 3 (User talk), the following failure occurs:

Inserting revision 78347
Notice: We encountered an user rename on ID 11530769, SSéÿßlâÐëRèvêñgë => ßéÿßlâÐëRèvêñgë
CannotCreateActorException from line 431 of /var/www/html/w/includes/user/ActorStore.php: Failed to create actor ID for user_id=11530769 user_name="SSéÿßlâÐëRèvêñgë"
#0 /var/www/html/w/grabbers/includes/ExternalWikiGrabber.php(185): MediaWiki\User\ActorStore->acquireActorId()
#1 /var/www/html/w/grabbers/includes/TextGrabber.php(157): ExternalWikiGrabber->getUserIdentity()
#2 /var/www/html/w/grabbers/grabText.php(283): TextGrabber->processRevision()
#3 /var/www/html/w/grabbers/grabText.php(131): GrabText->processPage()
#4 /var/www/html/w/grabbers/grabText.php(94): GrabText->processPagesFromNamespace()
#5 /var/www/html/w/maintenance/includes/MaintenanceRunner.php(309): GrabText->execute()
#6 /var/www/html/w/maintenance/doMaintenance.php(85): MediaWiki\Maintenance\MaintenanceRunner->run()
#7 /var/www/html/w/grabbers/grabText.php(344): require_once('/var/www/html/w...')

What should have happened instead?:
All text revisions should have been copied without error. The script does not appear to handle non-English characters in user names correctly.
Should the Renameuser extension have been installed to avoid this error?

Software version (skip for WMF-hosted wikis like Wikipedia): 1.39.5 or 1.39.6

Event Timeline

Change 991976 had a related patch set uploaded (by Martineznovo; author: Martineznovo):

[mediawiki/tools/grabbers@master] Check for user name normalization differencies

https://gerrit.wikimedia.org/r/991976

Ciencia_Al_Poder subscribed.

Apparently, you can't have a user name "ßéÿßlâÐëRèvêñgë". MediaWiki uppercases the first character of a user name, and the first character here, "LATIN SMALL LETTER SHARP S", is detected as lowercase; its standard uppercase mapping is the two letters "SS". As a result, the MediaWiki classes that handle revisions and insert the actor name automatically convert the user name to "SSéÿßlâÐëRèvêñgë".
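The casing step can be reproduced outside MediaWiki. This is a minimal Python sketch, assuming MediaWiki's first-letter capitalization behaves like uppercasing only the first character with full Unicode case mapping; the function name `ucfirst` is illustrative and is not MediaWiki's own code.

```python
def ucfirst(name: str) -> str:
    """Uppercase only the first character of a name.

    Python's str.upper() applies full Unicode case mapping, so
    U+00DF LATIN SMALL LETTER SHARP S becomes the two-letter string "SS".
    The rest of the name (including the later lowercase ß) is untouched.
    """
    return name[:1].upper() + name[1:]


remote_name = "ßéÿßlâÐëRèvêñgë"
local_name = ucfirst(remote_name)

print(local_name)                 # SSéÿßlâÐëRèvêñgë
print(local_name == remote_name)  # False: the remote name is not canonical
```

The normalized name no longer matches the name stored for user ID 11530769 on the remote wiki, which is consistent with the mismatch reported in the `CannotCreateActorException` above.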

I think that's a bug in MediaWiki core. However, to avoid the failure, the patch checks whether the user name is already in canonical form and, when it isn't, treats the user as if it had been imported from an external source.
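The fallback described above can be sketched as follows. This is a hypothetical Python illustration, not the Grabbers patch itself: it reduces canonicalization to first-letter uppercasing, and it assumes external users are stored with user ID 0 under MediaWiki's `prefix>Name` external user name convention. The prefix `imported` and the function `resolve_actor` are hypothetical names chosen for this sketch.

```python
from typing import Tuple

# Hypothetical prefix; the prefix the Grabbers patch actually uses may differ.
IMPORT_PREFIX = "imported"


def ucfirst(name: str) -> str:
    """Simplified stand-in for MediaWiki's user name normalization."""
    return name[:1].upper() + name[1:]


def resolve_actor(user_id: int, user_name: str) -> Tuple[int, str]:
    """Return the (user_id, actor_name) pair to store for a grabbed revision.

    If the remote name survives local normalization unchanged, keep the
    real user ID. Otherwise fall back to an external-style name with user
    ID 0, so no local account with a mismatched name has to be created.
    """
    if ucfirst(user_name) == user_name:
        return user_id, user_name
    return 0, f"{IMPORT_PREFIX}>{user_name}"


print(resolve_actor(11530769, "ßéÿßlâÐëRèvêñgë"))  # (0, 'imported>ßéÿßlâÐëRèvêñgë')
print(resolve_actor(42, "Example"))                # (42, 'Example')
```

Falling back to an external name trades the link to a local account for an import that completes; the real patch may also apply further validity checks beyond capitalization.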

Ciencia_Al_Poder changed the task status from Open to In Progress. Jan 21 2024, 8:32 PM
Ciencia_Al_Poder triaged this task as Medium priority.

Looking at T353766, although I have no details about its cause, I've amended the patch to check user names for general validity, which should also fix that other task.

SpookyGhost8 updated the task description.

I have tested the patch and confirm that it resolves the issue in this bug report.

I also found a workaround: create the affected user with the createAndPromote.php maintenance script, using the --force option.

Change 991976 merged by Martineznovo:

[mediawiki/tools/grabbers@master] Check for invalid user names with id

https://gerrit.wikimedia.org/r/991976