Page MenuHomePhabricator

PageImages is causing core parser test failures in other extensions, e.g. GrowthExperiments
Closed, ResolvedPublic

Description

PageImages is causing core parser test failures in other extensions, e.g. GrowthExperiments.

Caused by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/743062

Example failing change: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/752711

03:21:16 There was 1 failure:
03:21:16 
03:21:16 1) ParserIntegrationTest::testParse with data set "parserTests.txt: Page status indicators: Torture test" ('legacy parser')
03:21:16 Failed asserting that two strings are equal.
03:21:16 --- Expected
03:21:16 +++ Actual
03:21:16 @@ @@
03:21:16  '01=hello world\n
03:21:16  02=<a href="/wiki/Main_Page" title="Main Page">Main Page</a>\n
03:21:16 -03=<span typeof="mw:Image"><span><img alt="Foobar.jpg" resource="/wiki/File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/25px-Foobar.jpg" decoding="async" width="25" height="3" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/38px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/50px-Foobar.jpg 2x" /></span></span>\n
03:21:16 -04=<span typeof="mw:Image"><a href="/wiki/File:Foobar.jpg"><img alt="Foobar.jpg" resource="/wiki/File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/25px-Foobar.jpg" decoding="async" width="25" height="3" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/38px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/50px-Foobar.jpg 2x" /></a></span>\n
03:21:16 +03=<span typeof="mw:Image"><span><img alt="Foobar.jpg" resource="/wiki/File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/25px-Foobar.jpg" decoding="async" width="25" height="3" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/38px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/50px-Foobar.jpg 2x" /></span></span><!--MW-PAGEIMAGES-CANDIDATE-0-->\n
03:21:16 +04=<span typeof="mw:Image"><a href="/wiki/File:Foobar.jpg"><img alt="Foobar.jpg" resource="/wiki/File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/25px-Foobar.jpg" decoding="async" width="25" height="3" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/38px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/50px-Foobar.jpg 2x" /></a></span><!--MW-PAGEIMAGES-CANDIDATE-1-->\n
03:21:16  05=<ul><li>foo</li>\n
03:21:16  <li>bar</li></ul>\n
03:21:16  06=foo\n
03:21:16 
03:21:16 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:64
03:21:16 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:63
03:21:16 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:63

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 752787 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/PageImages@master] Disable parser HTML changes during parser tests, they cause failures

https://gerrit.wikimedia.org/r/752787

Change 752787 merged by jenkins-bot:

[mediawiki/extensions/PageImages@master] Disable parser HTML changes during parser tests, they cause failures

https://gerrit.wikimedia.org/r/752787

CI is still broken, somehow, even though locally this fix seemed to work for me.

The hack I added only works when running tests using parserTests.php, and we're running them in a different way in CI :/

I started reading the PageImages code more closely, and realized hat the HTML comment causing the failures isn't supposed to appear in the final output. The failing test is "Page status indicators: Torture test", and indicators are a little special. There might be an actual bug here somewhere.

I don't think it's a parser test issue. Those comments are leaking into the page indicators and need to be stripped out.

Change 752813 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/PageImages@master] Strip comments from indicators

https://gerrit.wikimedia.org/r/752813

The relevant test is

Page status indicators: Torture test
!! options
showindicators
!! config
wgParserEnableLegacyMediaDOM=false
!! wikitext
...
<indicator name="03">[[File:Foobar.jpg|25px|link=]]</indicator>
<indicator name="04">[[File:Foobar.jpg|25px]]</indicator>
...
!! html/php
...
03=<span typeof="mw:Image"><span><img alt="Foobar.jpg" resource="/wiki/File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/25px-Foobar.jpg" decoding="async" width="25" height="3" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/38px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/50px-Foobar.jpg 2x" /></span></span>
04=<span typeof="mw:Image"><a href="/wiki/File:Foobar.jpg"><img alt="Foobar.jpg" resource="/wiki/File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/25px-Foobar.jpg" decoding="async" width="25" height="3" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/38px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/50px-Foobar.jpg 2x" /></a></span>
...
!! end

PageImages removes these !--MW-PAGEIMAGES-CANDIDATE-N--> markers in ParserAfterTidy. Indicators (CoreTagHooks::indicator()) are tag hooks - they receive wikitext, and run it through Parser::recursiveTagParseFully() which is similar to Parser::parse() but specifically skips ParserAfterTidy. AIUI the idea here is that the output of the tag hooks will be somewhere inside the HTML blob output by the parser anyway, so when the main parse process runs ParserAfterTidy, that will cover the output of the tag hook as well. But that's not actually the case for indicators.

Running ParserAfterTidy on indicators doesn't seem feasible; various extensions assume it's going to be run once and on the main content, and they append markup etc. All other hooks run before unstripping strip markers so wouldn't clean up the comments that end up inside strip markers. I guess there could either be a new hook that's specific to indicators, or PageImages could make two passes for comment removal - the first in ParserSectionCreate (seems like the only other hook that runs after sections have been identified, which is needed for $wgPageImagesLeadSectionOnly), the second in ParserAfterTidy.

For the time being, let's just disable that one test. It's for page indicator edge cases, not that important.

Eh. Parsoid has a copy of the test. We'd have to disable that too, then make a new release, update core Composer config, update mediawiki/vendor...
Let's just revert the PageImages patch for now.

Oops, sorry, completely missed @tstarling's comment and patch above. I didn't realize indicators can be changed after they are set. Thanks for the fix!

Change 752813 merged by jenkins-bot:

[mediawiki/extensions/PageImages@master] Strip comments from indicators

https://gerrit.wikimedia.org/r/752813

Change 752770 had a related patch set uploaded (by Gergő Tisza; author: Tim Starling):

[mediawiki/extensions/PageImages@wmf/1.38.0-wmf.17] Strip comments from indicators

https://gerrit.wikimedia.org/r/752770

Change 752770 merged by jenkins-bot:

[mediawiki/extensions/PageImages@wmf/1.38.0-wmf.17] Strip comments from indicators

https://gerrit.wikimedia.org/r/752770

Tgr claimed this task.

Confirmed CI is working.

What I don't get is why it didn't break PageImages' own tests. Core parser tests are run in every extension's CI pipeline, right?