Page MenuHomePhabricator

Lock wait timeout exceeded; try restarting transaction in User::addToDatabase()
Closed, ResolvedPublic

Description

The CentralAuth maintenance createLocalAccount.php calls User:addToDatabase(), but the script errors out due to MariaDB lock errors:
Mon Aug 24 12:58:56 UTC 2015 mw1 delete3wiki User::addToDatabase 185.52.1.77 1205 Lock wait timeout exceeded; try restarting transaction (185.52.1.77) INSERT IGNORE INTO user (user_id,user_name,user_password,user_newpassword,user_newpass_time,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched) VALUES (NULL,'John','','',NULL,'',NULL,'','[REDACTED]','20150824125805','0','20150824125810')

Timeout:

MariaDB [(none)]> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+
1 row in set (0.00 sec)

How can a timeout of 50 seconds still not be enough?

Event Timeline

Southparkfan raised the priority of this task from to Needs Triage.
Southparkfan updated the task description. (Show Details)

What else is going on in the database?

What else is going on in the database?

MariaDB [(none)]> show full processlist;
+--------+-----------+-------------------+-------------+---------+------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Id     | User      | Host              | db          | Command | Time | State  | Info                                                                                                                                                                                                                                                                                                                                                              | Progress |
+--------+-----------+-------------------+-------------+---------+------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 146694 | root      | localhost         | NULL        | Query   |    0 | init   | show full processlist                                                                                                                                                                                                                                                                                                                                             |    0.000 |
| 146745 | mediawiki | 185.52.1.75:38964 | delete3wiki | Sleep   |   37 |        | NULL                                                                                                                                                                                                                                                                                                                                                              |    0.000 |
| 146746 | mediawiki | 185.52.1.75:39184 | delete3wiki | Query   |   37 | update | INSERT /* User::addToDatabase  */ IGNORE INTO `user` (user_id,user_name,user_password,user_newpassword,user_newpass_time,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched) VALUES (NULL,'Southparkfan','','',NULL,'',NULL,'','[REDACTED]','20150825144634','0','20150825144639') |    0.000 |
+--------+-----------+-------------------+-------------+---------+------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+

@Aklapper: Why do you post that link, do you think that incident is related (I don't think so - but perhaps that's not true...)?

@Aklapper: Why do you post that link

Just a wild guess that might actually be unrelated. Sorry.

What else is going on in the database?

MariaDB [(none)]> show full processlist;
+--------+-----------+-------------------+-------------+---------+------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Id     | User      | Host              | db          | Command | Time | State  | Info                                                                                                                                                                                                                                                                                                                                                              | Progress |
+--------+-----------+-------------------+-------------+---------+------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 146694 | root      | localhost         | NULL        | Query   |    0 | init   | show full processlist                                                                                                                                                                                                                                                                                                                                             |    0.000 |
| 146745 | mediawiki | 185.52.1.75:38964 | delete3wiki | Sleep   |   37 |        | NULL                                                                                                                                                                                                                                                                                                                                                              |    0.000 |
| 146746 | mediawiki | 185.52.1.75:39184 | delete3wiki | Query   |   37 | update | INSERT /* User::addToDatabase  */ IGNORE INTO `user` (user_id,user_name,user_password,user_newpassword,user_newpass_time,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched) VALUES (NULL,'Southparkfan','','',NULL,'',NULL,'','[REDACTED]','20150825144634','0','20150825144639') |    0.000 |
+--------+-----------+-------------------+-------------+---------+------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+

From https://dev.mysql.com/doc/refman/5.0/en/show-processlist.html :

If you have the PROCESS privilege, you can see all threads. Otherwise, you can see only your own threads (that is, threads associated with the MySQL account that you are using).

Are you sure you're seeing all processes of the server?

I think (But not really sure, haven't tried it), that

SHOW ENGINE INNODB STATUS\G

can show information about which transactions have what locks. That might be useful.

Discard my comment about the PROCESS privilege, since it also shows the root user, which obviously has the right since is who's issuing the show full processlist command

Personally, It doesn't really seem possible that you could have deadlock when there is only 1 transaction going. What else could possibly be contending for access? (Although I'm not all that familiar with mysql locking)

Maybe something is messed with your mariadb install.

"Lock wait timeout exceeded" means that InnoDB could not lock the rows that have to be written (beforehand). Looking at your other process, it is clear that 146745 is the cause, even if it is not executing nothing right now it probably wrote to those table but didn't commit the changes, so it continues holding the changes. If you have a permission limitation, it could be other process, and both of those are blocked.

To clarify which processes have rows locked, use "SHOW ENGINE INNODB STATUS\G" on the server while the block is ongoing to understand the reason. Most of mediawiki, at least on WMF, assumes no gap locking, so READ COMMITTED or lower consistency level. I cannot assure that REPEATABLE READS with gap locking does not work, and it could be the cause (but only because that is not unit-tested), but that is something that I cannot say for sure without knowing your configuration. Check our production my.cnf innodb configuration for details.

Will leave it up to SPF to close and give the reason.

Thanks for reopening I guess. Anyway, resolved by following a bit of @jcrespo's advice which can be seen in the commit.

I'm wondering if REPEATABLE READ is not working but READ COMMITTED is, should MediaWiki establish explicitly its transaction isolation level when making the connection instead of relying on the default one, to prevent errors like this?

For mediawiki devs: I would open an upstream bug to check if mediawiki works with the default MySQL isolation level, I cannot confirm that the bug is real unless we have more testing (it could be a race condition, and they just think that I helped them with confirmation bias :-P).

If someones has the time, it should be tested and either fix this particular script (preferred) or documented.

For mediawiki devs: I would open an upstream bug to check if mediawiki works with the default MySQL isolation level, I cannot confirm that the bug is real unless we have more testing (it could be a race condition, and they just think that I helped them with confirmation bias :-P).

If someones has the time, it should be tested and either fix this particular script (preferred) or documented.

My test wiki uses the default mysql "REPEATABLE READ" isolation level. I've never had deadlock issues (Of course, there's only 1 user using my wiki at a time, but the description of the problem above made it sound like there was only a single user, also my testwiki is not using central auth, but still, I find it hard to believe that the default value would cause large problems, when 90% of non-wmf wikis out there use it)

The processlist displays 2 processes, one idle and another performing the update, both from MediaWiki. I guess the script is opening 2 connections, the first one was to check if the user exists, and the other is to insert the user in the database.

I've tried that operation using 2 different connections with transactions and I haven't found any locking issue. The first select doesn't create a lock that prevents the insert from the second connection. Instead, it creates a snapshot, so performing the first select after the insert doesn't display the inserted row. I've tried REPEATABLE READ and SERIALIZABLE (for both connections) and no lock wait timeout happened...

Guess what I've found in the logs:

Sun Mar 13 15:26:38 UTC 2016    mw1     metawiki        User::addToDatabase     185.52.1.77     1205    Lock wait timeout exceeded; try restarting transaction (185.52.1.77)    INSERT IGNORE INTO `user` (user_id,user_name,user_password,user_newpassword,user_newpass_time,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched) VALUES (NULL,'Z','','',NULL,'',NULL,'','[REDACTED]','20160313152546','0','20160313152552')
Sun Mar 13 15:38:22 UTC 2016    mw1     poserdazfreebieswiki    User::addGroup  185.52.1.77     1205    Lock wait timeout exceeded; try restarting transaction (185.52.1.77)    INSERT IGNORE INTO `user_groups` (ug_user,ug_group) VALUES ('13','autopatrolled')

(the User::addGroup() might need a separate ticket, though) This bug causes T115198 ('Could not find local user data') for me (and probably for WMF (production) too, although you should look at your log files which I have no access to), and a lot of users are complaining about all kinds of MWExceptions upon login and such.

MediaWiki1.26.2 (7a623f9)
HHVM3.6.6 (srv)
MariaDB10.0.23-MariaDB-0+deb8u1-log
MariaDB configurationhttps://github.com/miraheze/puppet/blob/master/modules/mariadb/templates/config/mw.cnf.erb

Help is appreciated. Feel free to ask if more information is needed.

Shouldn't be assigned to me.

Krinkle claimed this task.
Krinkle subscribed.

No longer seen in the last 30 days in Wikimedia production logstash.