Skip to content

Commit 92c1f91

Browse files
committed
Protect against SnapshotNow race conditions in pg_tablespace scans.
Use of SnapshotNow is known to expose us to race conditions if the tuple(s) being sought could be updated by concurrently-committing transactions. CREATE DATABASE and DROP DATABASE are particularly exposed because they do heavyweight filesystem operations during their scans of pg_tablespace, so that the scans run for a very long time compared to most. Furthermore, the potential consequences of a missed or twice-visited row are nastier than average: * createdb() could fail with a bogus "file already exists" error, or silently fail to copy one or more tablespace's worth of files into the new database. * remove_dbtablespaces() could miss one or more tablespaces, thus failing to free filesystem space for the dropped database. * check_db_file_conflict() could likewise miss a tablespace, leading to an OID conflict that could result in data loss either immediately or in future operations. (This seems of very low probability, though, since a duplicate database OID would be unlikely to start with.) Hence, it seems worth fixing these three places to use MVCC snapshots, even though this will someday be superseded by a generic solution to SnapshotNow race conditions. Back-patch to all active branches. Stephen Frost and Tom Lane
1 parent 801babf commit 92c1f91

File tree

1 file changed

+54
-3
lines changed

1 file changed

+54
-3
lines changed

src/backend/commands/dbcommands.c

Lines changed: 54 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -132,6 +132,7 @@ createdb(const CreatedbStmt *stmt)
132132
int notherbackends;
133133
int npreparedxacts;
134134
createdb_failure_params fparms;
135+
Snapshot snapshot;
135136

136137
/* Extract options from the statement node tree */
137138
foreach(option, stmt->options)
@@ -587,6 +588,29 @@ createdb(const CreatedbStmt *stmt)
587588
*/
588589
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
589590

591+
/*
592+
* Take an MVCC snapshot to use while scanning through pg_tablespace. For
593+
* safety, register the snapshot (this prevents it from changing if
594+
* something else were to request a snapshot during the loop).
595+
*
596+
* Traversing pg_tablespace with an MVCC snapshot is necessary to provide
597+
* us with a consistent view of the tablespaces that exist. Using
598+
* SnapshotNow here would risk seeing the same tablespace multiple times,
599+
* or worse not seeing a tablespace at all, if its tuple is moved around
600+
* by a concurrent update (eg an ACL change).
601+
*
602+
* Inconsistency of this sort is inherent to all SnapshotNow scans, unless
603+
* some lock is held to prevent concurrent updates of the rows being
604+
* sought. There should be a generic fix for that, but in the meantime
605+
* it's worth fixing this case in particular because we are doing very
606+
* heavyweight operations within the scan, so that the elapsed time for
607+
* the scan is vastly longer than for most other catalog scans. That
608+
* means there's a much wider window for concurrent updates to cause
609+
* trouble here than anywhere else. XXX this code should be changed
610+
* whenever a generic fix is implemented.
611+
*/
612+
snapshot = RegisterSnapshot(GetLatestSnapshot());
613+
590614
/*
591615
* Once we start copying subdirectories, we need to be able to clean 'em
592616
* up if we fail. Use an ENSURE block to make sure this happens. (This
@@ -604,7 +628,7 @@ createdb(const CreatedbStmt *stmt)
604628
* each one to the new database.
605629
*/
606630
rel = heap_open(TableSpaceRelationId, AccessShareLock);
607-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
631+
scan = heap_beginscan(rel, snapshot, 0, NULL);
608632
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
609633
{
610634
Oid srctablespace = HeapTupleGetOid(tuple);
@@ -708,6 +732,9 @@ createdb(const CreatedbStmt *stmt)
708732
}
709733
PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
710734
PointerGetDatum(&fparms));
735+
736+
/* Free our snapshot */
737+
UnregisterSnapshot(snapshot);
711738
}
712739

713740
/* Error cleanup callback for createdb */
@@ -1692,9 +1719,20 @@ remove_dbtablespaces(Oid db_id)
16921719
Relation rel;
16931720
HeapScanDesc scan;
16941721
HeapTuple tuple;
1722+
Snapshot snapshot;
1723+
1724+
/*
1725+
* As in createdb(), we'd better use an MVCC snapshot here, since this
1726+
* scan can run for a long time. Duplicate visits to tablespaces would be
1727+
* harmless, but missing a tablespace could result in permanently leaked
1728+
* files.
1729+
*
1730+
* XXX change this when a generic fix for SnapshotNow races is implemented
1731+
*/
1732+
snapshot = RegisterSnapshot(GetLatestSnapshot());
16951733

16961734
rel = heap_open(TableSpaceRelationId, AccessShareLock);
1697-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
1735+
scan = heap_beginscan(rel, snapshot, 0, NULL);
16981736
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
16991737
{
17001738
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1740,6 +1778,7 @@ remove_dbtablespaces(Oid db_id)
17401778

17411779
heap_endscan(scan);
17421780
heap_close(rel, AccessShareLock);
1781+
UnregisterSnapshot(snapshot);
17431782
}
17441783

17451784
/*
@@ -1761,9 +1800,19 @@ check_db_file_conflict(Oid db_id)
17611800
Relation rel;
17621801
HeapScanDesc scan;
17631802
HeapTuple tuple;
1803+
Snapshot snapshot;
1804+
1805+
/*
1806+
* As in createdb(), we'd better use an MVCC snapshot here; missing a
1807+
* tablespace could result in falsely reporting the OID is unique, with
1808+
* disastrous future consequences per the comment above.
1809+
*
1810+
* XXX change this when a generic fix for SnapshotNow races is implemented
1811+
*/
1812+
snapshot = RegisterSnapshot(GetLatestSnapshot());
17641813

17651814
rel = heap_open(TableSpaceRelationId, AccessShareLock);
1766-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
1815+
scan = heap_beginscan(rel, snapshot, 0, NULL);
17671816
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
17681817
{
17691818
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1789,6 +1838,8 @@ check_db_file_conflict(Oid db_id)
17891838

17901839
heap_endscan(scan);
17911840
heap_close(rel, AccessShareLock);
1841+
UnregisterSnapshot(snapshot);
1842+
17921843
return result;
17931844
}
17941845

0 commit comments

Comments
 (0)