Page MenuHomePhabricator

Gerrit Reviewer Bot is unreliable
Closed, ResolvedPublic

Description

For a significant fraction of patches, Gerrit Reviewer Bot just doesn't seem to work. This is not a new issue, but neither something that has been broken forever; I don't have good recollections but I think it started becoming unreliable some time in the last year?

IMO the bot is de facto core infrastructure by now; it should either be made more reliable, or it should be acknowledged on the wiki page that it doesn't always work and readers should be advised to use Gerrit's built-in feature instead.

Event Timeline

I've been wanting to kill off the terrible bot for ages but people bewilderingly seem to prefer it to the built-in feature. Of course, with the move to GitLab it'll need substantial re-working or retirement anyway. Maybe now's the time to just bit the bullet?

Do people prefer it because of the regexp support? In theory Gerrit's built-in notifications can do that (and more) but in practice I rarely manage to get it to work.

(Also I think there's value in having a publicly visible record of which repos are watched, although of course not as much value as having the watching mechanism actually work.)

The bot's issue tracker is https://github.com/valhallasw/gerrit-reviewer-bot/issues - I don't see anyone having reported any issues recently?

I'd assume it's more of an infrastructure issue than an issue with the bot code as I haven't seen any pattern in when the bot does/doesn't work. Are there any logs that can be checked for how a specific Gerrit changeset got handled?

https://gerrit-reviewer-bot.toolforge.org/ contains the last 50-or-so log lines; if you have access to tool labs I'm fairly certain you can just read the .out and .err logs in the home dir. The logging is not super extensive but should give a decent idea of what's happening.

The rough architecture is as follows:

  • Gerrit emails mediawiki-commits
  • This ends up in a mailbox as queue for Gerrit-Reviewer-Bot
  • GRB grabs the emails one by one and processes them. If an exception occurs, the email is kept and retried, otherwise it is deleted.

If a significant set of patches go missing, this suggests there's an issue in how emails are parsed (e.g. they are not recognised as PS1 emails). But then the question is why it's only a subset that go wrong...

Thanks @valhallasw! What's a reasonable time for the bot to process the queue? (Ie. when should I assume there was an error if I don't see the bot adding reviewers?)

https://gerrit-reviewer-bot.toolforge.org/ contains the last 50-or-so log lines; if you have access to tool labs I'm fairly certain you can just read the .out and .err logs in the home dir. The logging is not super extensive but should give a decent idea of what's happening.

The rough architecture is as follows:

  • Gerrit emails mediawiki-commits
  • This ends up in a mailbox as queue for Gerrit-Reviewer-Bot
  • GRB grabs the emails one by one and processes them. If an exception occurs, the email is kept and retried, otherwise it is deleted.

So I guess we could look in the mailbox to see if the email exists still? Here's one where Reviewer-Bot didn't subscribe reviewers https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/715345 Are the emails deleted or put in trash and only removed X number of days later?

If a significant set of patches go missing, this suggests there's an issue in how emails are parsed (e.g. they are not recognised as PS1 emails). But then the question is why it's only a subset that go wrong...

Maybe using Gerrit's stream-events would be more reliable? You could subscribe to https://gerrit-review.googlesource.com/Documentation/cmd-stream-events.html#_patchset_created events. https://github.com/wikimedia/labs-tools-gerrit-newcomer-bot and https://gitlab.wikimedia.org/KostaHarlan/fix-suggester-bot are both using that. The newcomer bot has example code in python already.

I think I know what's going on -- it's an issue I fixed earlier for ReleaseTaggerBot (which uses the same email based system), but forgot to also apply here: T284587: https://github.com/wikimedia/labs-tools-forrestbot/commit/76a76db66a2fa6a54493b402bd9c68376f0d70e0

tl;dr: either Mailman 3+ or a recent Gerrit update results in emails to contain a weird mix of LF and CRLF.

Unfortunately I don't have access to Toolforge right now -- I can apply the fix in about a week.

tools.gerrit-reviewer-bot@tools-sgebastion-07:~$ grep gerrit_reviewer_bot.out -e 'Ie4d0ef0495b40abeff825d83bf1c28303e81c848' -C5
0 e-mails to process (0 kB)
Done.

Mon Aug 30 02:55:06 UTC 2021: Running as task 8443838 @ tools-sgeexec-0950.tools.eqiad.wmflabs
2 e-mails to process (68 kB)
[MediaWiki-commits] [Gerrit] ...GrowthExperiments[master]: [WIP] Add image recommendations to task type A/B filter =?UTF-8?Q?Gerg=C5=91_Tisza_=28Code_Review=29?= <gerrit@wikimedia.org> Ie4d0ef0495b40abeff825d83bf1c28303e81c848
skipping PS1
[MediaWiki-commits] [Gerrit] ...GrowthExperiments[master]: [WIP] Add mock frontend for Add Image =?UTF-8?Q?Gerg=C5=91_Tisza_=28Code_Review=29?= <gerrit@wikimedia.org> If4483fd3b22635ecc2d76456d4c4dd83f7ee4a7f
skipping PS1
Done.

Why does it skip PS1? Well, it's actually skipping PS1\r:

tools.gerrit-reviewer-bot@tools-sgebastion-07:~$ grep gerrit_reviewer_bot.out -e 'Ie4d0ef0495b40abeff825d83bf1c28303e81c848' -C5 | hd
[...]
00000180  38 0a 73 6b 69 70 70 69  6e 67 20 50 53 31 0d 0a  |8.skipping PS1..|

note the 0d 0a -- the 0d is considered part of the patch set number.

I've backported the ReleaseTaggerBot fix to the reviewer bot, which hopefully will keep it alive until Gerrit is replaced by GitLab.

hashar subscribed.

Thank you for the debugging @valhallasw . Do I understand with the backport fix the issue has been fixed?

Thank you for the debugging @valhallasw . Do I understand with the backport fix the issue has been fixed?

Likewise, thank you @valhallasw for debugging and working to fix it.

I've still seen the problem occur in the last week; the next time I notice it happen with a patch, I will comment here.

The immediate issue has been fixed on spot.

We can later look at moving all the content from https://www.mediawiki.org/wiki/Git/Reviewers toward the Gerrit equivalent.

Config format is held in Gerrit repos under refs/meta/config in a file named reviewers.config https://gerrit.wikimedia.org/r/plugins/reviewers/Documentation/config.md

There is also a REST API https://gerrit.wikimedia.org/r/plugins/reviewers/Documentation/rest-api.md

But I guess that would be another task. I am closing this one since the original issue has been solved.