Commit 525392d

Don't lock partitions pruned by initial pruning

Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.

This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.

* Changes to locking when executing generic plans:

AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc1279), to avoid locking runtime-prunable partitions
unnecessarily. The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.

This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.

* Plan invalidation handling:

Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan.

To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.

UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.

ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value indicating whether plan initialization succeeded with a
valid PlanState tree in QueryDesc.planstate, or false otherwise, in
which case QueryDesc.planstate is NULL. Hook implementations are
required to call standard_ExecutorStart() at the beginning, and if it
returns false, they should do the same without proceeding.

* Testing:

To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after deferred locks.

* Note to extension authors:

ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;

Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc1279), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.

The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.

Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com

1 parent 4aa6fa3 commit 525392d
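
The two-phase locking described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: relations are represented as bits in a 32-bit mask, and every name prefixed with `toy_` or `Toy` is hypothetical and not part of PostgreSQL. It shows the plan cache locking only the unprunable relations up front, with executor startup locking just the prunable relations that survive initial pruning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: each bit stands for one range-table index. */
typedef uint32_t RelidSet;

typedef struct ToyPlan
{
    RelidSet    all_relids;         /* every relation in the range table */
    RelidSet    unprunable_relids;  /* relations pruning can never remove */
} ToyPlan;

typedef struct ToyLockState
{
    RelidSet    locked;     /* relations locked so far */
    int         nlocks;     /* number of lock acquisitions performed */
} ToyLockState;

/* Lock each relation in relids that is not locked yet. */
static void
toy_lock_relids(ToyLockState *ls, RelidSet relids)
{
    for (int i = 0; i < 32; i++)
    {
        RelidSet    bit = (RelidSet) 1 << i;

        if ((relids & bit) && !(ls->locked & bit))
        {
            ls->locked |= bit;
            ls->nlocks++;
        }
    }
}

/* Phase 1 (plan cache): lock only relations that cannot be pruned. */
static void
toy_acquire_executor_locks(const ToyPlan *plan, ToyLockState *ls)
{
    toy_lock_relids(ls, plan->unprunable_relids);
}

/* Phase 2 (executor startup): lock prunable relations that survived. */
static void
toy_do_initial_pruning(const ToyPlan *plan, RelidSet survivors,
                       ToyLockState *ls)
{
    RelidSet    prunable = plan->all_relids & ~plan->unprunable_relids;

    toy_lock_relids(ls, prunable & survivors);
}
```

With one parent and four partitions of which a single partition survives pruning, this model takes two locks instead of five, which is the saving the commit message attributes to deferring the locks.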

33 files changed, +1014 -95 lines changed

contrib/auto_explain/auto_explain.c

+12 -4 lines changed

@@ -76,7 +76,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void explain_ExecutorRun(QueryDesc *queryDesc,
                                 ScanDirection direction,
                                 uint64 count);
@@ -256,9 +256,11 @@ _PG_init(void)
 /*
  * ExecutorStart hook: start up logging if needed
  */
-static void
+static bool
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+    bool        plan_valid;
+
     /*
      * At the beginning of each top-level statement, decide whether we'll
      * sample this statement.  If nested-statement explaining is enabled,
@@ -294,9 +296,13 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
     }
 
     if (prev_ExecutorStart)
-        prev_ExecutorStart(queryDesc, eflags);
+        plan_valid = prev_ExecutorStart(queryDesc, eflags);
     else
-        standard_ExecutorStart(queryDesc, eflags);
+        plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+    /* The plan may have become invalid during standard_ExecutorStart() */
+    if (!plan_valid)
+        return false;
 
     if (auto_explain_enabled())
     {
@@ -314,6 +320,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
             MemoryContextSwitchTo(oldcxt);
         }
     }
+
+    return true;
 }
 
 /*

contrib/pg_stat_statements/pg_stat_statements.c

+12 -4 lines changed

@@ -333,7 +333,7 @@ static PlannedStmt *pgss_planner(Query *parse,
                                  const char *query_string,
                                  int cursorOptions,
                                  ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void pgss_ExecutorRun(QueryDesc *queryDesc,
                              ScanDirection direction,
                              uint64 count);
@@ -987,13 +987,19 @@ pgss_planner(Query *parse,
 /*
  * ExecutorStart hook: start up tracking if needed
  */
-static void
+static bool
 pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+    bool        plan_valid;
+
     if (prev_ExecutorStart)
-        prev_ExecutorStart(queryDesc, eflags);
+        plan_valid = prev_ExecutorStart(queryDesc, eflags);
     else
-        standard_ExecutorStart(queryDesc, eflags);
+        plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+    /* The plan may have become invalid during standard_ExecutorStart() */
+    if (!plan_valid)
+        return false;
 
     /*
      * If query has queryId zero, don't track it.  This prevents double
@@ -1016,6 +1022,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
             MemoryContextSwitchTo(oldcxt);
         }
     }
+
+    return true;
 }
 
 /*
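
Both hook updates above follow the same shape: call the previous hook or the standard function first, and stop immediately if the plan turned out to be invalid. The sketch below reproduces that control flow outside PostgreSQL so it can be compiled on its own. All `mock_` and `Mock` names are hypothetical stand-ins, and the validity flag is a simplification of whatever check the real executor performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QueryDesc; not the PostgreSQL struct. */
typedef struct MockQueryDesc
{
    bool        plan_is_valid;  /* stand-in for the cached plan's validity */
    void       *planstate;      /* non-NULL only after successful startup */
    int         hook_work_done; /* counts hooks that ran their own logic */
} MockQueryDesc;

typedef bool (*mock_ExecutorStart_hook_type) (MockQueryDesc *qd, int eflags);

static int  mock_planstate_storage;
static mock_ExecutorStart_hook_type mock_prev_ExecutorStart = NULL;

/* Mock of the standard function: fails and leaves planstate NULL if stale. */
static bool
mock_standard_ExecutorStart(MockQueryDesc *qd, int eflags)
{
    (void) eflags;
    if (!qd->plan_is_valid)
    {
        qd->planstate = NULL;
        return false;
    }
    qd->planstate = &mock_planstate_storage;
    return true;
}

/* Extension hook following the pattern from the commit message. */
static bool
mock_extension_ExecutorStart(MockQueryDesc *qd, int eflags)
{
    bool        plan_valid;

    if (mock_prev_ExecutorStart)
        plan_valid = mock_prev_ExecutorStart(qd, eflags);
    else
        plan_valid = mock_standard_ExecutorStart(qd, eflags);

    /* Do not touch executor state when startup reported a stale plan. */
    if (!plan_valid)
        return false;

    assert(qd->planstate != NULL);
    qd->hook_work_done++;       /* placeholder for the extension's own work */
    return true;
}
```

Returning early matters because QueryDesc.planstate is NULL after a failed start, so any extension code that dereferences executor state past that point would be working with an uninitialized plan.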

src/backend/commands/copyto.c

+3 -2 lines changed

@@ -556,7 +556,7 @@ BeginCopyTo(ParseState *pstate,
         ((DR_copy *) dest)->cstate = cstate;
 
         /* Create a QueryDesc requesting no output */
-        cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+        cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
                                             GetActiveSnapshot(),
                                             InvalidSnapshot,
                                             dest, NULL, NULL, 0);
@@ -566,7 +566,8 @@ BeginCopyTo(ParseState *pstate,
          *
          * ExecutorStart computes a result tupdesc for us
          */
-        ExecutorStart(cstate->queryDesc, 0);
+        if (!ExecutorStart(cstate->queryDesc, 0))
+            elog(ERROR, "ExecutorStart() failed unexpectedly");
 
         tupDesc = cstate->queryDesc->tupDesc;
     }

src/backend/commands/createas.c

+3 -2 lines changed

@@ -332,12 +332,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
         UpdateActiveSnapshotCommandId();
 
         /* Create a QueryDesc, redirecting output to our tuple receiver */
-        queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+        queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
                                     GetActiveSnapshot(), InvalidSnapshot,
                                     dest, params, queryEnv, 0);
 
         /* call ExecutorStart to prepare the plan for execution */
-        ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+        if (!ExecutorStart(queryDesc, GetIntoRelEFlags(into)))
+            elog(ERROR, "ExecutorStart() failed unexpectedly");
 
         /* run the plan to completion */
         ExecutorRun(queryDesc, ForwardScanDirection, 0);

src/backend/commands/explain.c

+17 -5 lines changed

@@ -519,7 +519,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
     }
 
     /* run it (if needed) and produce output */
-    ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+    ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+                   queryEnv,
                    &planduration, (es->buffers ? &bufusage : NULL),
                    es->memory ? &mem_counters : NULL);
 }
@@ -641,7 +642,9 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
  * to call it.
  */
 void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+               CachedPlanSource *plansource, int query_index,
+               IntoClause *into, ExplainState *es,
                const char *queryString, ParamListInfo params,
                QueryEnvironment *queryEnv, const instr_time *planduration,
                const BufferUsage *bufusage,
@@ -697,7 +700,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
         dest = None_Receiver;
 
     /* Create a QueryDesc for the query */
-    queryDesc = CreateQueryDesc(plannedstmt, queryString,
+    queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
                                 GetActiveSnapshot(), InvalidSnapshot,
                                 dest, params, queryEnv, instrument_option);
 
@@ -711,8 +714,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
     if (into)
         eflags |= GetIntoRelEFlags(into);
 
-    /* call ExecutorStart to prepare the plan for execution */
-    ExecutorStart(queryDesc, eflags);
+    /* Prepare the plan for execution. */
+    if (queryDesc->cplan)
+    {
+        ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+        Assert(queryDesc->planstate);
+    }
+    else
+    {
+        if (!ExecutorStart(queryDesc, eflags))
+            elog(ERROR, "ExecutorStart() failed unexpectedly");
+    }
 
     /* Execute the plan for statistics if asked for */
     if (es->analyze)

src/backend/commands/extension.c

+3 -1 lines changed

@@ -907,11 +907,13 @@ execute_sql_string(const char *sql, const char *filename)
                 QueryDesc  *qdesc;
 
                 qdesc = CreateQueryDesc(stmt,
+                                        NULL,
                                         sql,
                                         GetActiveSnapshot(), NULL,
                                         dest, NULL, NULL, 0);
 
-                ExecutorStart(qdesc, 0);
+                if (!ExecutorStart(qdesc, 0))
+                    elog(ERROR, "ExecutorStart() failed unexpectedly");
                 ExecutorRun(qdesc, ForwardScanDirection, 0);
                 ExecutorFinish(qdesc);
                 ExecutorEnd(qdesc);

src/backend/commands/matview.c

+3 -2 lines changed

@@ -438,12 +438,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
     UpdateActiveSnapshotCommandId();
 
     /* Create a QueryDesc, redirecting output to our tuple receiver */
-    queryDesc = CreateQueryDesc(plan, queryString,
+    queryDesc = CreateQueryDesc(plan, NULL, queryString,
                                 GetActiveSnapshot(), InvalidSnapshot,
                                 dest, NULL, NULL, 0);
 
     /* call ExecutorStart to prepare the plan for execution */
-    ExecutorStart(queryDesc, 0);
+    if (!ExecutorStart(queryDesc, 0))
+        elog(ERROR, "ExecutorStart() failed unexpectedly");
 
     /* run the plan */
     ExecutorRun(queryDesc, ForwardScanDirection, 0);

src/backend/commands/portalcmds.c

+1 line changed

@@ -117,6 +117,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
                       queryString,
                       CMDTAG_SELECT, /* cursor's query is always a SELECT */
                       list_make1(plan),
+                      NULL,
                       NULL);
 
     /*----------

src/backend/commands/prepare.c

+7 -2 lines changed

@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
                       query_string,
                       entry->plansource->commandTag,
                       plan_list,
-                      cplan);
+                      cplan,
+                      entry->plansource);
 
     /*
      * For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -582,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
     MemoryContextCounters mem_counters;
     MemoryContext planner_ctx = NULL;
     MemoryContext saved_ctx = NULL;
+    int         query_index = 0;
 
     if (es->memory)
     {
@@ -654,7 +656,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
         PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
 
         if (pstmt->commandType != CMD_UTILITY)
-            ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+            ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
+                           into, es, query_string, paramLI, pstate->p_queryEnv,
                            &planduration, (es->buffers ? &bufusage : NULL),
                            es->memory ? &mem_counters : NULL);
         else
@@ -665,6 +668,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
         /* Separate plans with an appropriate separator */
         if (lnext(plan_list, p) != NULL)
             ExplainSeparatePlans(es);
+
+        query_index++;
     }
 
     if (estate)

src/backend/commands/trigger.c

+15 lines changed

@@ -5057,6 +5057,21 @@ AfterTriggerBeginQuery(void)
 }
 
 
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by standard_ExecutorEnd() if the query execution was aborted due to
+ * the plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+    /* Revert the actions of AfterTriggerBeginQuery(). */
+    afterTriggers.query_depth--;
+}
+
+
 /* ----------
  * AfterTriggerEndQuery()
  *
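
The new AfterTriggerAbortQuery() exists so that a start attempt abandoned because of a stale plan leaves the trigger query depth where it was. A minimal sketch of that pairing follows. It assumes the depth counter starts at -1 when no query is active, and the `toy_` names are hypothetical simplifications rather than the real trigger manager state.

```c
#include <assert.h>

/* Hypothetical stand-in for afterTriggers.query_depth. */
static int  toy_query_depth = -1;   /* assumption: -1 means no active query */

/* Entering a query level, as done during executor startup. */
static void
toy_AfterTriggerBeginQuery(void)
{
    toy_query_depth++;
}

/* Undo toy_AfterTriggerBeginQuery() for a startup that was abandoned. */
static void
toy_AfterTriggerAbortQuery(void)
{
    assert(toy_query_depth >= 0);
    toy_query_depth--;
}

/* Report the current depth so callers can check begin/abort balance. */
static int
toy_current_query_depth(void)
{
    return toy_query_depth;
}
```

Without the abort step, every retried start would leave the depth one higher than before, so the counter would drift upward across retries.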

src/backend/executor/README

+32 -3 lines changed

@@ -280,6 +280,28 @@ are typically reset to empty once per tuple.  Per-tuple contexts are usually
 associated with ExprContexts, and commonly each PlanState node has its own
 ExprContext to evaluate its qual and targetlist expressions in.
 
+Relation Locking
+----------------
+
+When the executor initializes a plan tree for execution, it doesn't lock
+non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+flag. If the plan tree is outdated (is_valid = false), the executor stops
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan. See ExecutorStartCachedPlan().
 
 Query Processing Control Flow
 -----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
 
     CreateQueryDesc
 
-    ExecutorStart
+    ExecutorStart or ExecutorStartCachedPlan
         CreateExecutorState
             creates per-query context
-        switch to per-query context to run ExecInitNode
+        switch to per-query context to run ExecDoInitialPruning and ExecInitNode
         AfterTriggerBeginQuery
+        ExecDoInitialPruning
+            does initial pruning and locks surviving partitions if needed
         ExecInitNode --- recursively scans plan tree
             ExecInitNode
                 recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
 
     FreeQueryDesc
 
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), the control is
+immediately returned to ExecutorStartCachedPlan(), which will create a new plan
+tree and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
 memory; it'll all go away in FreeExecutorState anyway.  However, we do need to
 be careful to close relations, drop buffer pins, etc, so we do need to scan
 the plan state tree to find these sorts of resources.
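
The README text above describes a retry: if the plan is found stale after the deferred locks are taken, startup is abandoned, the plan is recreated, and startup begins again. The following is a minimal sketch of that loop, not the actual ExecutorStartCachedPlan() implementation. The `Toy` types, the `pending_invalidations` counter used to simulate concurrent DDL, and the returned attempt count are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CachedPlan. */
typedef struct ToyCachedPlan
{
    bool        is_valid;   /* cleared by (simulated) concurrent DDL */
    int         generation; /* bumped each time the plan is rebuilt */
} ToyCachedPlan;

/* Hypothetical stand-in for QueryDesc. */
typedef struct ToyQueryDesc
{
    ToyCachedPlan *cplan;
    bool        have_planstate;         /* true once startup has succeeded */
    int         pending_invalidations;  /* simulated DDL events to deliver */
} ToyQueryDesc;

/* One start attempt: returns false if the plan went stale while locking. */
static bool
toy_ExecutorStart(ToyQueryDesc *qd)
{
    /* Simulate DDL arriving while the deferred locks are being taken. */
    if (qd->pending_invalidations > 0)
    {
        qd->pending_invalidations--;
        qd->cplan->is_valid = false;
    }

    if (!qd->cplan->is_valid)
    {
        qd->have_planstate = false;
        return false;
    }

    qd->have_planstate = true;
    return true;
}

/* Rebuild the stale plan in place, keeping the ToyCachedPlan itself. */
static void
toy_UpdateCachedPlan(ToyCachedPlan *cplan)
{
    cplan->generation++;
    cplan->is_valid = true;
}

/* Retry startup until it succeeds; returns the number of attempts made. */
static int
toy_ExecutorStartCachedPlan(ToyQueryDesc *qd)
{
    int         attempts = 1;

    while (!toy_ExecutorStart(qd))
    {
        toy_UpdateCachedPlan(qd->cplan);
        attempts++;
    }

    assert(qd->have_planstate);
    return attempts;
}
```

Rebuilding the plan inside the existing structure, rather than replacing the structure, mirrors the point in the commit message that callers iterating over the statement list should not see it change underneath them.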
