
Stack overflow in AbuseFilter when using AbuseFilterCachingParser
Closed, Resolved (Public)

Description

Seen on bgwiki after enabling AbuseFilterCachingParser:

2016-10-19 17:35:19 [WAeu1gpAAEgAABAxmVUAAABD] mw1277 bgwiki 1.28.0-wmf.22 fatal ERROR: [b213742e] PHP Fatal Error: Stack overflow {"exception_id":"b213742e"}
[Exception ErrorException] (/srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php:926) PHP Fatal Error: Stack overflow
  #0 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): NO_FUNCTION_GIVEN()
  #1 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #2 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #3 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #4 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #5 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #6 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #7 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #8 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #9 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)
  #10 /srv/mediawiki/php-1.28.0-wmf.22/extensions/AbuseFilter/AbuseFilter.parser.new.php(926): AbuseFilterCachingParser->evalNode(AFPTreeNode)

Event Timeline

Change 316825 had a related patch set uploaded (by Ori.livneh):
Disable AbuseFilterCachingParser on bgwiki

https://gerrit.wikimedia.org/r/316825

Change 316825 merged by jenkins-bot:
Disable AbuseFilterCachingParser on bgwiki

https://gerrit.wikimedia.org/r/316825

Mentioned in SAL (#wikimedia-operations) [2016-10-19T17:41:03Z] <ori@mira> Synchronized wmf-config/InitialiseSettings.php: I6d28e534: Disable AbuseFilterCachingParser on bgwiki (T148660) (duration: 00m 50s)

So, my current understanding is that we are hitting this because:
(1) there is no recursion limit in the new parser (definitely a bug; a minimal sketch of what such a limit might look like is below), and
(2) the same operations that previously produced a flat token list now produce a tree, so the structures we evaluate are much deeper. I'd need to think about how to work around that.
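For illustration only, here is a minimal sketch of how a recursive evalNode()-style evaluator could enforce a depth limit and fail cleanly instead of overflowing the stack. This is not the extension's actual code; the class, property names and the limit value are invented for the example.

// Hypothetical sketch, not AbuseFilter's real implementation.
class DepthLimitedEvaluator {
    private $depth = 0;
    private const MAX_DEPTH = 200; // invented limit; a real value would need tuning

    public function evalNode( $node ) {
        if ( $this->depth >= self::MAX_DEPTH ) {
            throw new RuntimeException( 'AbuseFilter expression is nested too deeply' );
        }
        $this->depth++;
        try {
            // ... evaluate $node here, calling $this->evalNode() for each child ...
            return null;
        } finally {
            $this->depth--;
        }
    }
}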

Change 321890 had a related patch set uploaded (by Ori.livneh):
Don't use AbuseFilterCachingParser on bgwiki

https://gerrit.wikimedia.org/r/321890

Change 321890 merged by Ori.livneh:
Don't use AbuseFilterCachingParser on bgwiki

https://gerrit.wikimedia.org/r/321890

Mentioned in SAL (#wikimedia-operations) [2016-11-16T15:56:18Z] <ori@tin> Synchronized wmf-config/InitialiseSettings.php: I506f17f6: Don't use AbuseFilterCachingParser on bgwiki (T148660) (duration: 00m 49s)

Sorry that our filter has caused trouble on your side. I've split it into two parts, as kindly suggested by @ori. Just one question, though: is there any way to know the maximum regex length? The thing is, this filter may be extended in the future, and we might hit the same problem again. Of course, we could split it into three regexes right away to get a larger margin for error, but some clear indication of when a regex is too large would still be nice. Could, for instance, a check be made when the filter is saved, so that whoever is updating it knows immediately that it needs to be split into even smaller fragments?
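For illustration only, the kind of save-time check being asked about might look roughly like the sketch below. It assumes the parser can hand back the root node of the filter's syntax tree; the node shape ('children'), the function names, and the limit of 200 are all invented here.

// Hypothetical sketch of a save-time depth check, not existing AbuseFilter code.
function treeDepth( $node ) {
    // Recursive for brevity; an iterative walk would be safer for very deep trees.
    if ( empty( $node->children ) ) {
        return 1;
    }
    $max = 0;
    foreach ( $node->children as $child ) {
        $max = max( $max, treeDepth( $child ) );
    }
    return $max + 1;
}

function isFilterTooDeep( $rootNode, $limit = 200 ) {
    // A save-time hook could call this and warn the editor to split the filter.
    return treeDepth( $rootNode ) > $limit;
}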

BTW, since this is a regex for websites, I might actually split it into 28 parts: one for each starting letter of the alphabet, plus one for domain names starting with a digit and one for those starting with a Cyrillic letter, i.e. forbidden_regex_A, forbidden_regex_B, ..., forbidden_regex_Z, forbidden_regex_NUM, forbidden_regex_CYR. Then, obviously, it'll be something like added_links irlike forbidden_regex_A | added_links irlike forbidden_regex_B | ... | added_links irlike forbidden_regex_CYR. Which would actually be more efficient server-side: regexes that are as large as possible without causing breakage, or many smaller OR-ed regexes?

Thanks, kerberizer!

Strictly speaking, it is not the regular expression that is the problem. The stack overflow is encountered when the new AbuseFilter parser (which was introduced in aa399da279) tries to parse the expression that constructs the regex by concatenating 500+ string fragments. If the regex was declared as a single string literal, this wouldn't be an issue.

You might not need a regex at all. AIUI you should be able to express the rule as

contains_any(added_links, "b19min.bg", "b1kam1.eu", "b1tv.ru", etc...)

@ori, the contains_any approach is arguably simpler, but wouldn't it lead to false positives? If I understand correctly how it works, contains_any(added_links, "my-site.com", ...) would also fire on yummy-site.com.

As for the concatenation, it was a rather crude way to attach a comment to each domain (since each needs an explanation of why it was added). I'll think of an alternative way to do this, so that the regex can become a single string.

Change 322289 had a related patch set uploaded (by Ori.livneh):
Revert "Don't use AbuseFilterCachingParser on bgwiki"

https://gerrit.wikimedia.org/r/322289

Change 322289 merged by jenkins-bot:
Revert "Don't use AbuseFilterCachingParser on bgwiki"

https://gerrit.wikimedia.org/r/322289

Mentioned in SAL (#wikimedia-operations) [2016-11-18T17:34:14Z] <ori@tin> Synchronized wmf-config/InitialiseSettings.php: If3b80b1a: Revert "Don't use AbuseFilterCachingParser on bgwiki" (T148660) (duration: 00m 50s)

Krinkle removed a project: Patch-For-Review.

So, this one is the only non-trivial bug left before we can re-enable the CachingParser. As explained in T148660#2775366, the problem here is that we changed from an ordered list of tokens to a tree. So instead of

Tok( 'x' ) -> Tok( '+' ) -> Tok( 'y' ) -> Tok( '+' )  -> Tok( 'z' ) -> Tok( '+' )  -> Tok( 'n' )

we now have

        Node( '+' )
        /        \
       /          \
Node( 'x' )    Node( '+' )
                /        \
               /          \
        Node( 'y' )    Node( '+' )
                        /        \
                       /          \
                Node( 'z' )    Node( 'n' )

or something like that, which will take up way more memory.
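As a rough, self-contained PHP illustration of why the nested form is heavier (using invented classes, not the extension's actual AFPTreeNode or evalNode): evaluating a right-nested chain of 500 concatenations recursively needs roughly one stack frame per fragment, whereas the old flat token list could be walked in a loop with constant stack depth.

// Simplified illustration only; classes and names are invented for this sketch.
class Node {
    public $op;       // '+' for concatenation, 'str' for a literal
    public $value;    // literal value when $op === 'str'
    public $children; // [left, right] when $op === '+'
    public function __construct( $op, $value = null, $children = [] ) {
        $this->op = $op;
        $this->value = $value;
        $this->children = $children;
    }
}

function evalNodeSketch( Node $n, $depth = 1 ) {
    if ( $n->op === 'str' ) {
        return [ $n->value, $depth ];
    }
    // '+': recurse into both children; stack depth grows with the nesting level.
    [ $l, $dl ] = evalNodeSketch( $n->children[0], $depth + 1 );
    [ $r, $dr ] = evalNodeSketch( $n->children[1], $depth + 1 );
    return [ $l . $r, max( $dl, $dr ) ];
}

// Build "frag0" + "frag1" + ... + "frag499" as a right-nested tree.
$tree = new Node( 'str', 'frag499' );
for ( $i = 498; $i >= 0; $i-- ) {
    $tree = new Node( '+', null, [ new Node( 'str', "frag$i" ), $tree ] );
}

[ $result, $maxDepth ] = evalNodeSketch( $tree );
echo "recursion depth: $maxDepth\n"; // ~500 frames, versus a constant-depth loop
                                     // over the old flat token list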

Now, TBH, if I take bgwiki's filter 12 as it was in September 2016 and try to evaluate it with the CachingParser, I don't get a stack overflow, and the same is true if I add more and more concatenations. At a certain point I get a syntax error instead, but that's another matter.
That could be due to some improvement either in the parser itself or in PHP, but I wouldn't rely on it. Nevertheless, I also don't know how to fix this issue reliably. Maybe add a recursion limit? If so, we'd need to start with a deprecation phase.

Given the above, I'm also uncertain whether this should be considered a blocker to the re-deployment of the CachingParser. @Krinkle Any thoughts?

Daimona lowered the priority of this task from High to Medium. Aug 23 2019, 7:13 PM

IMHO this is not blocking the deployment of the new parser anymore, and can be resolved later. Unless we find this error in production again after enabling the CachingParser, of course.

Daimona claimed this task.

Closing per my previous comments. Can be reopened if the issue appears again, although it's not clear what a solution might look like.