Commit be9954a

Bug #32800020 EXPLORATORY - INVESTIGATE ODD NON-DATA DEADLOCK
Problem:
The server terminates abnormally because of a deadlock between a user thread (UPDATE) and the purge thread.

Analysis:
1. In 5.6, all undo logs reside either in the system tablespace or in dedicated undo tablespaces.
2. In 5.7, a new type of undo log was introduced for temporary tables. It is not redo-logged and resides in the temporary tablespace. We reserved rollback segment slots [1..32] for temporary tables. (See [http://wl.mysql.oraclecorp.com/worklog/InnoDB-Sprint/?tid=6915])
3. On upgrade from 5.6 to 5.7, the server notices valid rsegs in slots [1..32] and moves them from slots 1..32 of `rseg_array[]` to `pending_purge_rseg_array[]`, so that those slots can be used for non-redo rsegs (temporary tables).
4. This is only an in-memory move; the corresponding on-disk trx system page is not updated. So the next time the server starts, it repeats step 3.
5. Up to this point there is no problem, because every rseg in `rseg_array[]` or `pending_purge_rseg_array[]` is attached to a unique <space_id, page_no>.
6. Now, when undo tablespace truncation is triggered:
   a. Undo tablespace truncation collects all rsegs in an undo tablespace into the truncate vector `m_rseg_for_trunc`.
   b. It scans each rseg in `m_rseg_for_trunc` and ensures that it does not hold any active trx.
   c. If there is any active trx, truncation stops; otherwise the undo tablespace file is truncated.
   d. It scans each rseg in `m_rseg_for_trunc` again, reinitializes the rollback segment, and assigns pages starting at page number 3.
7. Here is the problem: we forgot to reinitialize the rollback segments in `trx_sys->pending_purge_rseg_array`. These rsegs are still attached to their old <space_id, page_no>, so two rollback segments can now be attached to the same <space_id, page_no>: one rseg in `rseg_array[]` and the other in `pending_purge_rseg_array[]`.
8. The rseg slots in the trx system page still hold the <space_id, page_no> of the rsegs in `trx_sys->pending_purge_rseg_array`, so the problem described in step 7 persists even after a restart.
9. In this situation, a deadlock can occur between a user thread and the purge thread when:
   a. The user thread (in trx_commit(), adding an undo log to the history list) has acquired an x-latch on the undo log page and is waiting for an x-latch on the rseg page.
   b. The purge thread (truncating the undo log and history list for `pending_purge_rseg_array`) has acquired an x-latch on the rseg page and is waiting for an x-latch on the undo log page.
   c. If the user thread and the purge thread race for the same undo log page and rseg page, they can deadlock each other.

Fix:
During undo tablespace truncation, scan all rollback segments in `pending_purge_rseg_array[]` and, for every rseg that resides in the tablespace being truncated, reset its slot in the trx system header to FIL_NULL and free the in-memory rseg. The const qualifier is also dropped from the elements of `pending_purge_rseg_array[]`, so the array can be passed to trx_rseg_mem_create() and trx_rseg_mem_free() without a cast.

RB: 26376
Reviewed by: Jakub Lopuszanski <jakub.lopuszanski@oracle.com>
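Steps 3 to 7 can be reproduced in a toy model to show how the <space_id, page_no> collision arises. This is a simplified sketch, not InnoDB code: `Rseg`, `RsegArray`, `truncate_without_fix()`, `has_duplicate_page()`, `run_scenario()` and `kNumRsegs` are hypothetical stand-ins, and the slot numbers, tablespace id and page numbers are invented for illustration. Only the rsegs in `rseg_array[]` are reinitialized (pages handed out from 3), while the rseg that the upgrade moved to the pending array keeps its old page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <set>
#include <utility>

// Hypothetical, simplified stand-in for trx_rseg_t: only the fields
// needed to show the <space_id, page_no> collision.
struct Rseg {
    std::uint32_t id;       // slot number in the trx system header
    std::uint32_t space;    // tablespace id
    std::uint32_t page_no;  // rollback segment header page
};

constexpr std::size_t kNumRsegs = 128;  // stand-in for TRX_SYS_N_RSEGS
using RsegArray = std::array<Rseg*, kNumRsegs>;

// Step 6d, simplified: reinitialize every rseg of `space` found in
// rseg_array[], handing out header pages starting at page 3. The pending
// array is deliberately not touched: this is the omission from step 7.
void truncate_without_fix(RsegArray& rseg_array, std::uint32_t space) {
    std::uint32_t next_page = 3;
    for (Rseg* rseg : rseg_array) {
        if (rseg != nullptr && rseg->space == space) {
            rseg->page_no = next_page++;
        }
    }
}

// True if two live rsegs, in either array, share <space_id, page_no>.
bool has_duplicate_page(const RsegArray& a, const RsegArray& b) {
    std::set<std::pair<std::uint32_t, std::uint32_t>> seen;
    for (const RsegArray* arr : {&a, &b}) {
        for (const Rseg* rseg : *arr) {
            if (rseg != nullptr &&
                !seen.emplace(rseg->space, rseg->page_no).second) {
                return true;
            }
        }
    }
    return false;
}

// Builds the upgrade scenario and returns
// {duplicate before truncate, duplicate after truncate}.
std::pair<bool, bool> run_scenario() {
    RsegArray rseg_array{};  // value-initialized: every slot is nullptr
    RsegArray pending{};

    // Undo tablespace 5 holds two rsegs. Slot 1 was moved to pending[]
    // on upgrade (step 3); slot 40 stayed in rseg_array[].
    Rseg moved{1, 5, 3};
    Rseg kept{40, 5, 4};
    assert(moved.id < kNumRsegs && kept.id < kNumRsegs);
    pending[moved.id] = &moved;
    rseg_array[kept.id] = &kept;

    bool before = has_duplicate_page(rseg_array, pending);
    truncate_without_fix(rseg_array, 5);  // kept.page_no becomes 3
    bool after = has_duplicate_page(rseg_array, pending);
    return {before, after};
}
```

In this model `run_scenario()` returns `{false, true}`: the two rsegs are distinct before truncation and share <5, 3> afterwards, which is the state in which two threads can end up latching the same pages through different rseg objects.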
1 parent 073cb05 commit be9954a

4 files changed (+22, -7 lines)

storage/innobase/include/trx0sys.h

Lines changed: 1 addition & 1 deletion
@@ -627,7 +627,7 @@ struct trx_sys_t {
                    transactions), protected by
                    rseg->mutex */
 
-    trx_rseg_t* const  pending_purge_rseg_array[TRX_SYS_N_RSEGS];
+    trx_rseg_t*        pending_purge_rseg_array[TRX_SYS_N_RSEGS];
                    /*!< Pointer array to rollback segments
                    between slot-1..slot-srv_tmp_undo_logs
                    that are now replaced by non-redo
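Dropping `const` from the array elements is what lets the casts in trx0rseg.cc and trx0sys.cc go away. In C++, an array of `T* const` decays to `T* const*`, which does not convert implicitly to `T**`; and modifying an object declared `const` through a pointer obtained by casting the `const` away is undefined behavior in standard C++. The sketch below illustrates this with hypothetical stand-ins (`Rseg`, `TrxSys`, `rseg_mem_free()`, `free_pending_slot()`, `kNumRsegs`); `rseg_mem_free()` only mirrors the two-argument shape of `trx_rseg_mem_free()` and is assumed here to clear the rseg's slot in the array it receives.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumRsegs = 128;  // stand-in for TRX_SYS_N_RSEGS

struct Rseg {          // hypothetical stand-in for trx_rseg_t
    std::size_t id;    // slot index in the owning array
};

// Mirrors the shape of trx_rseg_mem_free(rseg, rseg_array); assumed
// here to clear rseg's slot in the array it is given.
void rseg_mem_free(Rseg* rseg, Rseg** rseg_array) {
    assert(rseg->id < kNumRsegs);
    rseg_array[rseg->id] = nullptr;
}

// After the fix: plain pointer elements, so the array decays to Rseg**
// and can be passed straight to rseg_mem_free().
struct TrxSys {
    Rseg* pending_purge_rseg_array[kNumRsegs] = {};
};

// Before the fix the member was declared as
//     Rseg* const pending_purge_rseg_array[kNumRsegs];
// which decays to Rseg* const*. That does not convert implicitly to
// Rseg**, so each caller needed a cast such as
//     (Rseg**) sys.pending_purge_rseg_array

// Stores rseg in its pending slot, frees it again, and reports whether
// the slot was cleared.
bool free_pending_slot(TrxSys& sys, Rseg* rseg) {
    assert(rseg->id < kNumRsegs);
    sys.pending_purge_rseg_array[rseg->id] = rseg;
    rseg_mem_free(rseg, sys.pending_purge_rseg_array);  // no cast needed
    return sys.pending_purge_rseg_array[rseg->id] == nullptr;
}
```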

storage/innobase/trx/trx0rseg.cc

Lines changed: 1 addition & 3 deletions
@@ -310,11 +310,9 @@ trx_rseg_schedule_pending_purge(
 
         ut_ad(found);
 
-        trx_rseg_t**    rseg_array =
-            ((trx_rseg_t**) trx_sys->pending_purge_rseg_array);
         rseg = trx_rseg_mem_create(
             slot, space, page_no, page_size,
-            purge_queue, rseg_array, mtr);
+            purge_queue, trx_sys->pending_purge_rseg_array, mtr);
 
         ut_a(rseg->id == slot);
     }

storage/innobase/trx/trx0sys.cc

Lines changed: 2 additions & 3 deletions
@@ -1208,15 +1208,14 @@ trx_sys_close(void)
         }
     }
 
-    rseg_array = ((trx_rseg_t**) trx_sys->pending_purge_rseg_array);
-
     for (ulint i = 0; i < TRX_SYS_N_RSEGS; ++i) {
         trx_rseg_t*    rseg;
 
         rseg = trx_sys->pending_purge_rseg_array[i];
 
         if (rseg != NULL) {
-            trx_rseg_mem_free(rseg, rseg_array);
+            trx_rseg_mem_free(rseg,
+                trx_sys->pending_purge_rseg_array);
         }
     }
 

storage/innobase/trx/trx0undo.cc

Lines changed: 18 additions & 0 deletions
@@ -2181,6 +2181,24 @@ trx_undo_truncate_tablespace(
         rseg->last_trx_no = 0;
         rseg->last_del_marks = FALSE;
     }
+
+    /* During Upgrade, existing rsegs in range from slot-1....slot-32
+    were added into the array pending_purge_rseg_array[]. These rsegs also
+    reside in system or undo tablespace. */
+    trx_sysf_t* sys_header = trx_sysf_get(&mtr);
+    for (ulint i = 0; i < TRX_SYS_N_RSEGS; ++i) {
+        trx_rseg_t* rseg = trx_sys->pending_purge_rseg_array[i];
+        if(rseg != NULL
+           && rseg->space == undo_trunc->get_marked_space_id()) {
+            /* Reset the rollback segment slot in the trx
+            system header */
+            trx_sysf_rseg_set_page_no(
+                sys_header, rseg->id, FIL_NULL, &mtr);
+            /* Free a pending rollback segment instance in memory */
+            trx_rseg_mem_free(rseg,
+                trx_sys->pending_purge_rseg_array);
+        }
+    }
     mtr_commit(&mtr);
 
     return(success);
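The loop added to `trx_undo_truncate_tablespace()` can be modelled outside InnoDB as follows. This is a sketch under stated assumptions: `Rseg`, `RsegArray`, `SysHeaderPages`, `drop_pending_rsegs()`, `kNumRsegs` and `kFilNull` are hypothetical stand-ins; the real code writes the trx system header page inside a mini-transaction and releases the object with `trx_rseg_mem_free()`, and `kFilNull` assumes FIL_NULL is the all-ones 32-bit value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumRsegs = 128;          // stand-in for TRX_SYS_N_RSEGS
constexpr std::uint32_t kFilNull = 0xFFFFFFFF;  // assumed value of FIL_NULL

struct Rseg {             // hypothetical stand-in for trx_rseg_t
    std::size_t id;       // slot in the trx system header
    std::uint32_t space;  // tablespace id
};

// The pending array owns heap-allocated rsegs in this model.
using RsegArray = std::array<Rseg*, kNumRsegs>;
// Models the page_no stored for each slot in the trx system header.
using SysHeaderPages = std::array<std::uint32_t, kNumRsegs>;

// Mirrors the added loop: every pending rseg that lives in the truncated
// tablespace has its header slot reset to FIL_NULL and its in-memory
// object released. Returns the number of rsegs dropped.
std::size_t drop_pending_rsegs(RsegArray& pending, SysHeaderPages& header,
                               std::uint32_t truncated_space) {
    std::size_t freed = 0;
    for (std::size_t i = 0; i < kNumRsegs; ++i) {
        Rseg* rseg = pending[i];
        if (rseg != nullptr && rseg->space == truncated_space) {
            assert(rseg->id == i);
            header[rseg->id] = kFilNull;  // like trx_sysf_rseg_set_page_no(..., FIL_NULL, ...)
            pending[rseg->id] = nullptr;  // like trx_rseg_mem_free(rseg, pending)
            delete rseg;
            ++freed;
        }
    }
    return freed;
}
```

Rsegs in other tablespaces are left alone. Resetting the header slot, and not only freeing the memory object, is presumably what keeps the stale slot from being loaded again on the next restart (step 8 of the commit message).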
