Skip to content

Commit d98547e

Browse files
committed
Protect against SnapshotNow race conditions in pg_tablespace scans.
Use of SnapshotNow is known to expose us to race conditions if the tuple(s) being sought could be updated by concurrently-committing transactions. CREATE DATABASE and DROP DATABASE are particularly exposed because they do heavyweight filesystem operations during their scans of pg_tablespace, so that the scans run for a very long time compared to most. Furthermore, the potential consequences of a missed or twice-visited row are nastier than average: * createdb() could fail with a bogus "file already exists" error, or silently fail to copy one or more tablespace's worth of files into the new database. * remove_dbtablespaces() could miss one or more tablespaces, thus failing to free filesystem space for the dropped database. * check_db_file_conflict() could likewise miss a tablespace, leading to an OID conflict that could result in data loss either immediately or in future operations. (This seems of very low probability, though, since a duplicate database OID would be unlikely to start with.) Hence, it seems worth fixing these three places to use MVCC snapshots, even though this will someday be superseded by a generic solution to SnapshotNow race conditions. Back-patch to all active branches. Stephen Frost and Tom Lane
1 parent 8d1fbf9 commit d98547e

File tree

1 file changed

+54
-3
lines changed

1 file changed

+54
-3
lines changed

src/backend/commands/dbcommands.c

Lines changed: 54 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -129,6 +129,7 @@ createdb(const CreatedbStmt *stmt)
129129
int notherbackends;
130130
int npreparedxacts;
131131
createdb_failure_params fparms;
132+
Snapshot snapshot;
132133

133134
/* Extract options from the statement node tree */
134135
foreach(option, stmt->options)
@@ -532,6 +533,29 @@ createdb(const CreatedbStmt *stmt)
532533
*/
533534
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
534535

536+
/*
537+
* Take an MVCC snapshot to use while scanning through pg_tablespace. For
538+
* safety, register the snapshot (this prevents it from changing if
539+
* something else were to request a snapshot during the loop).
540+
*
541+
* Traversing pg_tablespace with an MVCC snapshot is necessary to provide
542+
* us with a consistent view of the tablespaces that exist. Using
543+
* SnapshotNow here would risk seeing the same tablespace multiple times,
544+
* or worse not seeing a tablespace at all, if its tuple is moved around
545+
* by a concurrent update (eg an ACL change).
546+
*
547+
* Inconsistency of this sort is inherent to all SnapshotNow scans, unless
548+
* some lock is held to prevent concurrent updates of the rows being
549+
* sought. There should be a generic fix for that, but in the meantime
550+
* it's worth fixing this case in particular because we are doing very
551+
* heavyweight operations within the scan, so that the elapsed time for
552+
* the scan is vastly longer than for most other catalog scans. That
553+
* means there's a much wider window for concurrent updates to cause
554+
* trouble here than anywhere else. XXX this code should be changed
555+
* whenever a generic fix is implemented.
556+
*/
557+
snapshot = RegisterSnapshot(GetLatestSnapshot());
558+
535559
/*
536560
* Once we start copying subdirectories, we need to be able to clean 'em
537561
* up if we fail. Use an ENSURE block to make sure this happens. (This
@@ -549,7 +573,7 @@ createdb(const CreatedbStmt *stmt)
549573
* each one to the new database.
550574
*/
551575
rel = heap_open(TableSpaceRelationId, AccessShareLock);
552-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
576+
scan = heap_beginscan(rel, snapshot, 0, NULL);
553577
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
554578
{
555579
Oid srctablespace = HeapTupleGetOid(tuple);
@@ -653,6 +677,9 @@ createdb(const CreatedbStmt *stmt)
653677
}
654678
PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
655679
PointerGetDatum(&fparms));
680+
681+
/* Free our snapshot */
682+
UnregisterSnapshot(snapshot);
656683
}
657684

658685
/*
@@ -1698,9 +1725,20 @@ remove_dbtablespaces(Oid db_id)
16981725
Relation rel;
16991726
HeapScanDesc scan;
17001727
HeapTuple tuple;
1728+
Snapshot snapshot;
1729+
1730+
/*
1731+
* As in createdb(), we'd better use an MVCC snapshot here, since this
1732+
* scan can run for a long time. Duplicate visits to tablespaces would be
1733+
* harmless, but missing a tablespace could result in permanently leaked
1734+
* files.
1735+
*
1736+
* XXX change this when a generic fix for SnapshotNow races is implemented
1737+
*/
1738+
snapshot = RegisterSnapshot(GetLatestSnapshot());
17011739

17021740
rel = heap_open(TableSpaceRelationId, AccessShareLock);
1703-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
1741+
scan = heap_beginscan(rel, snapshot, 0, NULL);
17041742
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
17051743
{
17061744
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1746,6 +1784,7 @@ remove_dbtablespaces(Oid db_id)
17461784

17471785
heap_endscan(scan);
17481786
heap_close(rel, AccessShareLock);
1787+
UnregisterSnapshot(snapshot);
17491788
}
17501789

17511790
/*
@@ -1767,9 +1806,19 @@ check_db_file_conflict(Oid db_id)
17671806
Relation rel;
17681807
HeapScanDesc scan;
17691808
HeapTuple tuple;
1809+
Snapshot snapshot;
1810+
1811+
/*
1812+
* As in createdb(), we'd better use an MVCC snapshot here; missing a
1813+
* tablespace could result in falsely reporting the OID is unique, with
1814+
* disastrous future consequences per the comment above.
1815+
*
1816+
* XXX change this when a generic fix for SnapshotNow races is implemented
1817+
*/
1818+
snapshot = RegisterSnapshot(GetLatestSnapshot());
17701819

17711820
rel = heap_open(TableSpaceRelationId, AccessShareLock);
1772-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
1821+
scan = heap_beginscan(rel, snapshot, 0, NULL);
17731822
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
17741823
{
17751824
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1795,6 +1844,8 @@ check_db_file_conflict(Oid db_id)
17951844

17961845
heap_endscan(scan);
17971846
heap_close(rel, AccessShareLock);
1847+
UnregisterSnapshot(snapshot);
1848+
17981849
return result;
17991850
}
18001851

0 commit comments

Comments
 (0)