Commit 5dcaefc

Fix creation of partition descriptor during concurrent detach
When a partition is being detached in concurrent mode, it is possible for find_inheritance_children_extended() to return that partition in the list, and immediately after that receive an invalidation message that sets its relpartbound to NULL just before we read it. (This can happen because table_open() reads invalidation messages.) Currently we raise an error

  ERROR: missing relpartbound for relation %u

about the situation, but that's bogus because the table is no longer a partition, so we shouldn't be complaining about it. A better reaction is to retry the find_inheritance_children_extended call to get a new list, which will no longer have the partition being detached.

Noticed while investigating bug #18377.

Backpatch to 14, where DETACH CONCURRENTLY appeared.

Discussion: https://postgr.es/m/202405201616.y4ht2qe5ihoy@alvherre.pgsql
1 parent 5f200ab commit 5dcaefc

1 file changed (+40, -13 lines)

src/backend/partitioning/partdesc.c

Lines changed: 40 additions & 13 deletions
```diff
@@ -146,16 +146,19 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
     ListCell   *cell;
     int         i,
                 nparts;
+    bool        retried = false;
     PartitionKey key = RelationGetPartitionKey(rel);
     MemoryContext new_pdcxt;
     MemoryContext oldcxt;
     int        *mapping;
 
+retry:
+
     /*
      * Get partition oids from pg_inherits. This uses a single snapshot to
      * fetch the list of children, so while more children may be getting added
-     * concurrently, whatever this function returns will be accurate as of
-     * some well-defined point in time.
+     * or removed concurrently, whatever this function returns will be
+     * accurate as of some well-defined point in time.
      */
     detached_exist = false;
     detached_xmin = InvalidTransactionId;
```
```diff
@@ -198,18 +201,23 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
         }
 
         /*
-         * The system cache may be out of date; if so, we may find no pg_class
-         * tuple or an old one where relpartbound is NULL. In that case, try
-         * the table directly. We can't just AcceptInvalidationMessages() and
-         * retry the system cache lookup because it's possible that a
-         * concurrent ATTACH PARTITION operation has removed itself from the
-         * ProcArray but not yet added invalidation messages to the shared
-         * queue; InvalidateSystemCaches() would work, but seems excessive.
+         * Two problems are possible here. First, a concurrent ATTACH
+         * PARTITION might be in the process of adding a new partition, but
+         * the syscache doesn't have it, or its copy of it does not yet have
+         * its relpartbound set. We cannot just AcceptInvalidationMessages(),
+         * because the other process might have already removed itself from
+         * the ProcArray but not yet added its invalidation messages to the
+         * shared queue. We solve this problem by reading pg_class directly
+         * for the desired tuple.
          *
-         * Note that this algorithm assumes that PartitionBoundSpec we manage
-         * to fetch is the right one -- so this is only good enough for
-         * concurrent ATTACH PARTITION, not concurrent DETACH PARTITION or
-         * some hypothetical operation that changes the partition bounds.
+         * The other problem is that DETACH CONCURRENTLY is in the process of
+         * removing a partition, which happens in two steps: first it marks it
+         * as "detach pending", commits, then unsets relpartbound. If
+         * find_inheritance_children_extended included that partition but we
+         * below we see that DETACH CONCURRENTLY has reset relpartbound for
+         * it, we'd see an inconsistent view. (The inconsistency is seen
+         * because table_open below reads invalidation messages.) We protect
+         * against this by retrying find_inheritance_children_extended().
          */
         if (boundspec == NULL)
         {
```
```diff
@@ -233,6 +241,25 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
             boundspec = stringToNode(TextDatumGetCString(datum));
             systable_endscan(scan);
             table_close(pg_class, AccessShareLock);
+
+            /*
+             * If we still don't get a relpartbound value, then it must be
+             * because of DETACH CONCURRENTLY. Restart from the top, as
+             * explained above. We only do this once, for two reasons: first,
+             * only one DETACH CONCURRENTLY session could affect us at a time,
+             * since each of them would have to wait for the snapshot under
+             * which this is running; and second, to avoid possible infinite
+             * loops in case of catalog corruption.
+             *
+             * Note that the current memory context is short-lived enough, so
+             * we needn't worry about memory leaks here.
+             */
+            if (!boundspec && !retried)
+            {
+                AcceptInvalidationMessages();
+                retried = true;
+                goto retry;
+            }
         }
 
         /* Sanity checks. */
```