Skip to content

Commit fb0fb07

Browse files
committed
Fix partition pruning setup during DETACH CONCURRENTLY
When detaching partition in concurrent mode, it's possible for partition descriptors to not match the set that was recently seen when the plan was made, causing an assertion failure or (in production builds) failure to construct a working plan. The case that was reported involves prepared statements, but I think it may be possible to hit this bug without that too. The problem is that CreatePartitionPruneState is constructing a PartitionPruneState under the assumption that new partitions can be added, but never removed, but it turns out that this isn't true: a prepared statement gets replanned when the DETACH CONCURRENTLY session sends out its invalidation message, but if the invalidation message arrives after ExecInitAppend started, we would build a partition descriptor without the partition, and then CreatePartitionPruneState would refuse to work with it. CreatePartitionPruneState already contains code to deal with the new descriptor having more partitions than before (and behaving for the extra partitions as if they had been pruned), but doesn't have code to deal with less partitions than before, and it is naïve about the case where the number of partitions is the same. We could simply add that a new stanza for less partitions than before, and in simple testing it works to do that; but it's possible to press the test scripts even further and hit the case where one partition is added and a partition is removed quickly enough that we see the same number of partitions, but they don't actually match, causing hangs during execution. To cope with both these problems, we now memcmp() the arrays of partition OIDs, and do a more elaborate mapping (relying on the fact that both OID arrays are in partition-bounds order) if they're not identical. Backpatch to 14, where DETACH CONCURRENTLY appeared. Reported-by: yajun Hu <1026592243@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
1 parent a99b2cc commit fb0fb07

File tree

1 file changed

+60
-50
lines changed

1 file changed

+60
-50
lines changed

src/backend/executor/execPartition.c

Lines changed: 60 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -1786,67 +1786,58 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
17861786
/*
17871787
* Initialize the subplan_map and subpart_map.
17881788
*
1789-
* Because we request detached partitions to be included, and
1790-
* detaching waits for old transactions, it is safe to assume that
1791-
* no partitions have disappeared since this query was planned.
1789+
* The set of partitions that exist now might not be the same that
1790+
* existed when the plan was made. The normal case is that it is;
1791+
* optimize for that case with a quick comparison, and just copy
1792+
* the subplan_map and make subpart_map point to the one in
1793+
* PruneInfo.
17921794
*
1793-
* However, new partitions may have been added.
1795+
* For the case where they aren't identical, we could have more
1796+
* partitions on either side; or even exactly the same number of
1797+
* them on both but the set of OIDs doesn't match fully. Handle
1798+
* this by creating new subplan_map and subpart_map arrays that
1799+
* corresponds to the ones in the PruneInfo where the new
1800+
* partition descriptor's OIDs match. Any that don't match can be
1801+
* set to -1, as if they were pruned. Both arrays must be in
1802+
* numerical OID order.
17941803
*/
1795-
Assert(partdesc->nparts >= pinfo->nparts);
17961804
pprune->nparts = partdesc->nparts;
17971805
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
1798-
if (partdesc->nparts == pinfo->nparts)
1806+
1807+
if (partdesc->nparts == pinfo->nparts &&
1808+
memcmp(partdesc->oids, pinfo->relid_map,
1809+
sizeof(int) * partdesc->nparts) == 0)
17991810
{
1800-
/*
1801-
* There are no new partitions, so this is simple. We can
1802-
* simply point to the subpart_map from the plan, but we must
1803-
* copy the subplan_map since we may change it later.
1804-
*/
18051811
pprune->subpart_map = pinfo->subpart_map;
18061812
memcpy(pprune->subplan_map, pinfo->subplan_map,
18071813
sizeof(int) * pinfo->nparts);
1808-
1809-
/*
1810-
* Double-check that the list of unpruned relations has not
1811-
* changed. (Pruned partitions are not in relid_map[].)
1812-
*/
1813-
#ifdef USE_ASSERT_CHECKING
1814-
for (int k = 0; k < pinfo->nparts; k++)
1815-
{
1816-
Assert(partdesc->oids[k] == pinfo->relid_map[k] ||
1817-
pinfo->subplan_map[k] == -1);
1818-
}
1819-
#endif
18201814
}
18211815
else
18221816
{
18231817
int pd_idx = 0;
18241818
int pp_idx;
18251819

18261820
/*
1827-
* Some new partitions have appeared since plan time, and
1828-
* those are reflected in our PartitionDesc but were not
1829-
* present in the one used to construct subplan_map and
1830-
* subpart_map. So we must construct new and longer arrays
1831-
* where the partitions that were originally present map to
1832-
* the same sub-structures, and any added partitions map to
1833-
* -1, as if the new partitions had been pruned.
1821+
* When the partition arrays are not identical, there could be
1822+
* some new ones but it's also possible that one was removed;
1823+
* we cope with both situations by walking the arrays and
1824+
* discarding those that don't match.
18341825
*
1835-
* Note: pinfo->relid_map[] may contain InvalidOid entries for
1836-
* partitions pruned by the planner. We cannot tell exactly
1837-
* which of the partdesc entries these correspond to, but we
1838-
* don't have to; just skip over them. The non-pruned
1839-
* relid_map entries, however, had better be a subset of the
1840-
* partdesc entries and in the same order.
1826+
* If the number of partitions on both sides match, it's still
1827+
* possible that one partition has been detached and another
1828+
* attached. Cope with that by creating a map that skips any
1829+
* mismatches.
18411830
*/
18421831
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
1832+
18431833
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
18441834
{
18451835
/* Skip any InvalidOid relid_map entries */
18461836
while (pd_idx < pinfo->nparts &&
18471837
!OidIsValid(pinfo->relid_map[pd_idx]))
18481838
pd_idx++;
18491839

1840+
recheck:
18501841
if (pd_idx < pinfo->nparts &&
18511842
pinfo->relid_map[pd_idx] == partdesc->oids[pp_idx])
18521843
{
@@ -1856,24 +1847,43 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
18561847
pprune->subpart_map[pp_idx] =
18571848
pinfo->subpart_map[pd_idx];
18581849
pd_idx++;
1850+
continue;
18591851
}
1860-
else
1852+
1853+
/*
1854+
* There isn't an exact match in the corresponding
1855+
* positions of both arrays. Peek ahead in
1856+
* pinfo->relid_map to see if we have a match for the
1857+
* current partition in partdesc. Normally if a match
1858+
* exists it's just one element ahead, and it means the
1859+
* planner saw one extra partition that we no longer see
1860+
* now (its concurrent detach finished just in between);
1861+
* so we skip that one by updating pd_idx to the new
1862+
* location and jumping above. We can then continue to
1863+
* match the rest of the elements after skipping the OID
1864+
* with no match; no future matches are tried for the
1865+
* element that was skipped, because we know the arrays to
1866+
* be in the same order.
1867+
*
1868+
* If we don't see a match anywhere in the rest of the
1869+
* pinfo->relid_map array, that means we see an element
1870+
* now that the planner didn't see, so mark that one as
1871+
* pruned and move on.
1872+
*/
1873+
for (int pd_idx2 = pd_idx + 1; pd_idx2 < pinfo->nparts; pd_idx2++)
18611874
{
1862-
/* this partdesc entry is not in the plan */
1863-
pprune->subplan_map[pp_idx] = -1;
1864-
pprune->subpart_map[pp_idx] = -1;
1875+
if (pd_idx2 >= pinfo->nparts)
1876+
break;
1877+
if (pinfo->relid_map[pd_idx2] == partdesc->oids[pp_idx])
1878+
{
1879+
pd_idx = pd_idx2;
1880+
goto recheck;
1881+
}
18651882
}
1866-
}
18671883

1868-
/*
1869-
* It might seem that we need to skip any trailing InvalidOid
1870-
* entries in pinfo->relid_map before checking that we scanned
1871-
* all of the relid_map. But we will have skipped them above,
1872-
* because they must correspond to some partdesc->oids
1873-
* entries; we just couldn't tell which.
1874-
*/
1875-
if (pd_idx != pinfo->nparts)
1876-
elog(ERROR, "could not match partition child tables to plan elements");
1884+
pprune->subpart_map[pp_idx] = -1;
1885+
pprune->subplan_map[pp_idx] = -1;
1886+
}
18771887
}
18781888

18791889
/* present_parts is also subject to later modification */

0 commit comments

Comments
 (0)