Skip to content

Commit 3fb9310

Browse files
committed
Fix bugs in RETURNING in cross-partition UPDATE cases.
If the source and destination partitions don't have identical rowtypes (for example, one has dropped columns the other lacks), then the planSlot contents will be different because of that. If the query has a RETURNING list that tries to return resjunk columns out of the planSlot, that is columns from tables that were joined to the target table, we'd get errors or wrong answers. That's because we used the RETURNING list generated for the destination partition, which expects a planSlot matching that partition's subplan. The most practical fix seems to be to convert the updated destination tuple back to the source partition's rowtype, and then apply the RETURNING list generated for the source partition. This avoids making fragile assumptions about whether the per-subpartition subplans generated all the resjunk columns in the same order. This has been broken since v11 introduced cross-partition UPDATE. The lack of field complaints shows that non-identical partitions aren't a common case; therefore, don't stress too hard about making the conversion efficient. There's no such bug in HEAD, because commit 86dc900 got rid of per-target-relation variance in the contents of the planSlot. Hence, patch v11-v13 only. Amit Langote and Etsuro Fujita, small changes by me Discussion: https://postgr.es/m/CA+HiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA@mail.gmail.com
1 parent f7ac005 commit 3fb9310

File tree

3 files changed

+142
-12
lines changed

3 files changed

+142
-12
lines changed

src/backend/executor/nodeModifyTable.c

+77-12
Original file line numberDiff line numberDiff line change
@@ -149,34 +149,40 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
149149
/*
150150
* ExecProcessReturning --- evaluate a RETURNING list
151151
*
152-
* resultRelInfo: current result rel
152+
* projectReturning: the projection to evaluate
153+
* resultRelOid: result relation's OID
153154
* tupleSlot: slot holding tuple actually inserted/updated/deleted
154155
* planSlot: slot holding tuple returned by top subplan node
155156
*
157+
* In cross-partition UPDATE cases, projectReturning and planSlot are as
158+
* for the source partition, and tupleSlot must conform to that. But
159+
* resultRelOid is for the destination partition.
160+
*
156161
* Note: If tupleSlot is NULL, the FDW should have already provided econtext's
157162
* scan tuple.
158163
*
159164
* Returns a slot holding the result tuple
160165
*/
161166
static TupleTableSlot *
162-
ExecProcessReturning(ResultRelInfo *resultRelInfo,
167+
ExecProcessReturning(ProjectionInfo *projectReturning,
168+
Oid resultRelOid,
163169
TupleTableSlot *tupleSlot,
164170
TupleTableSlot *planSlot)
165171
{
166-
ProjectionInfo *projectReturning = resultRelInfo->ri_projectReturning;
167172
ExprContext *econtext = projectReturning->pi_exprContext;
168173

169174
/* Make tuple and any needed join variables available to ExecProject */
170175
if (tupleSlot)
171176
econtext->ecxt_scantuple = tupleSlot;
177+
else
178+
Assert(econtext->ecxt_scantuple);
172179
econtext->ecxt_outertuple = planSlot;
173180

174181
/*
175-
* RETURNING expressions might reference the tableoid column, so
176-
* reinitialize tts_tableOid before evaluating them.
182+
* RETURNING expressions might reference the tableoid column, so be sure
183+
* we expose the desired OID, ie that of the real target relation.
177184
*/
178-
econtext->ecxt_scantuple->tts_tableOid =
179-
RelationGetRelid(resultRelInfo->ri_RelationDesc);
185+
econtext->ecxt_scantuple->tts_tableOid = resultRelOid;
180186

181187
/* Compute the RETURNING expressions */
182188
return ExecProject(projectReturning);
@@ -343,13 +349,25 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
343349
* For INSERT, we have to insert the tuple into the target relation
344350
* and insert appropriate tuples into the index relations.
345351
*
352+
* slot contains the new tuple value to be stored.
353+
* planSlot is the output of the ModifyTable's subplan; we use it
354+
* to access "junk" columns that are not going to be stored.
355+
* In a cross-partition UPDATE, srcSlot is the slot that held the
356+
* updated tuple for the source relation; otherwise it's NULL.
357+
*
358+
* returningRelInfo is the resultRelInfo for the source relation of a
359+
* cross-partition UPDATE; otherwise it's the current result relation.
360+
* We use it to process RETURNING lists, for reasons explained below.
361+
*
346362
* Returns RETURNING result if any, otherwise NULL.
347363
* ----------------------------------------------------------------
348364
*/
349365
static TupleTableSlot *
350366
ExecInsert(ModifyTableState *mtstate,
351367
TupleTableSlot *slot,
352368
TupleTableSlot *planSlot,
369+
TupleTableSlot *srcSlot,
370+
ResultRelInfo *returningRelInfo,
353371
EState *estate,
354372
bool canSetTag)
355373
{
@@ -652,8 +670,46 @@ ExecInsert(ModifyTableState *mtstate,
652670
ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
653671

654672
/* Process RETURNING if present */
655-
if (resultRelInfo->ri_projectReturning)
656-
result = ExecProcessReturning(resultRelInfo, slot, planSlot);
673+
if (returningRelInfo->ri_projectReturning)
674+
{
675+
/*
676+
* In a cross-partition UPDATE with RETURNING, we have to use the
677+
* source partition's RETURNING list, because that matches the output
678+
* of the planSlot, while the destination partition might have
679+
* different resjunk columns. This means we have to map the
680+
* destination tuple back to the source's format so we can apply that
681+
* RETURNING list. This is expensive, but it should be an uncommon
682+
* corner case, so we won't spend much effort on making it fast.
683+
*
684+
* We assume that we can use srcSlot to hold the re-converted tuple.
685+
* Note that in the common case where the child partitions both match
686+
* the root's format, previous optimizations will have resulted in
687+
* slot and srcSlot being identical, cueing us that there's nothing to
688+
* do here.
689+
*/
690+
if (returningRelInfo != resultRelInfo && slot != srcSlot)
691+
{
692+
Relation srcRelationDesc = returningRelInfo->ri_RelationDesc;
693+
AttrNumber *map;
694+
695+
map = convert_tuples_by_name_map_if_req(RelationGetDescr(resultRelationDesc),
696+
RelationGetDescr(srcRelationDesc),
697+
gettext_noop("could not convert row type"));
698+
if (map)
699+
{
700+
TupleTableSlot *origSlot = slot;
701+
702+
slot = execute_attr_map_slot(map, slot, srcSlot);
703+
slot->tts_tid = origSlot->tts_tid;
704+
slot->tts_tableOid = origSlot->tts_tableOid;
705+
pfree(map);
706+
}
707+
}
708+
709+
result = ExecProcessReturning(returningRelInfo->ri_projectReturning,
710+
RelationGetRelid(resultRelationDesc),
711+
slot, planSlot);
712+
}
657713

658714
return result;
659715
}
@@ -1002,7 +1058,9 @@ ldelete:;
10021058
}
10031059
}
10041060

1005-
rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
1061+
rslot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
1062+
RelationGetRelid(resultRelationDesc),
1063+
slot, planSlot);
10061064

10071065
/*
10081066
* Before releasing the target tuple again, make sure rslot has a
@@ -1178,6 +1236,7 @@ lreplace:;
11781236
{
11791237
bool tuple_deleted;
11801238
TupleTableSlot *ret_slot;
1239+
TupleTableSlot *orig_slot = slot;
11811240
TupleTableSlot *epqslot = NULL;
11821241
PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
11831242
int map_index;
@@ -1284,6 +1343,7 @@ lreplace:;
12841343
mtstate->rootResultRelInfo, slot);
12851344

12861345
ret_slot = ExecInsert(mtstate, slot, planSlot,
1346+
orig_slot, resultRelInfo,
12871347
estate, canSetTag);
12881348

12891349
/* Revert ExecPrepareTupleRouting's node change. */
@@ -1480,7 +1540,9 @@ lreplace:;
14801540

14811541
/* Process RETURNING if present */
14821542
if (resultRelInfo->ri_projectReturning)
1483-
return ExecProcessReturning(resultRelInfo, slot, planSlot);
1543+
return ExecProcessReturning(resultRelInfo->ri_projectReturning,
1544+
RelationGetRelid(resultRelationDesc),
1545+
slot, planSlot);
14841546

14851547
return NULL;
14861548
}
@@ -2130,7 +2192,9 @@ ExecModifyTable(PlanState *pstate)
21302192
* ExecProcessReturning by IterateDirectModify, so no need to
21312193
* provide it here.
21322194
*/
2133-
slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
2195+
slot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
2196+
RelationGetRelid(resultRelInfo->ri_RelationDesc),
2197+
NULL, planSlot);
21342198

21352199
estate->es_result_relation_info = saved_resultRelInfo;
21362200
return slot;
@@ -2220,6 +2284,7 @@ ExecModifyTable(PlanState *pstate)
22202284
slot = ExecPrepareTupleRouting(node, estate, proute,
22212285
resultRelInfo, slot);
22222286
slot = ExecInsert(node, slot, planSlot,
2287+
NULL, estate->es_result_relation_info,
22232288
estate, node->canSetTag);
22242289
/* Revert ExecPrepareTupleRouting's state change. */
22252290
if (proute)

src/test/regress/expected/update.out

+40
Original file line numberDiff line numberDiff line change
@@ -447,6 +447,46 @@ UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (r
447447
part_c_1_100 | b | 17 | 95 | 19 |
448448
(6 rows)
449449

450+
-- The following tests computing RETURNING when the source and the destination
451+
-- partitions of a UPDATE row movement operation have different tuple
452+
-- descriptors, which has been shown to be problematic in the cases where the
453+
-- RETURNING targetlist contains non-target relation attributes that are
454+
-- computed by referring to the source partition plan's output tuple. Also,
455+
-- a trigger on the destination relation may change the tuple, which must be
456+
-- reflected in the RETURNING output, so we test that too.
457+
CREATE TABLE part_c_1_c_20 (LIKE range_parted);
458+
ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
459+
ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
460+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END; $$;
461+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
462+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
463+
tableoid | a | b | c | d | e | x | y
464+
---------------+---+----+-----+---+---------------+---+----
465+
part_c_1_c_20 | c | 1 | 1 | 1 | in trigfunc() | a | 1
466+
part_c_1_c_20 | c | 10 | 200 | 1 | in trigfunc() | a | 10
467+
part_c_1_c_20 | c | 12 | 96 | 1 | in trigfunc() | b | 12
468+
(3 rows)
469+
470+
DROP TRIGGER trig ON part_c_1_c_20;
471+
DROP FUNCTION trigfunc;
472+
:init_range_parted;
473+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
474+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
475+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
476+
tableoid | a | b | c | d | e | x | y
477+
----------+---+---+---+---+---+---+---
478+
(0 rows)
479+
480+
:show_data;
481+
partname | a | b | c | d | e
482+
--------------+---+----+-----+----+---
483+
part_c_1_100 | b | 13 | 97 | 2 |
484+
part_d_15_20 | b | 15 | 105 | 16 |
485+
part_d_15_20 | b | 17 | 105 | 19 |
486+
(3 rows)
487+
488+
DROP TABLE part_c_1_c_20;
489+
DROP FUNCTION trigfunc;
450490
-- Transition tables with update row movement
451491
:init_range_parted;
452492
CREATE FUNCTION trans_updatetrigfunc() RETURNS trigger LANGUAGE plpgsql AS

src/test/regress/sql/update.sql

+25
Original file line numberDiff line numberDiff line change
@@ -238,6 +238,31 @@ DROP VIEW upview;
238238
UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (range_parted), *;
239239
:show_data;
240240

241+
-- The following tests computing RETURNING when the source and the destination
242+
-- partitions of a UPDATE row movement operation have different tuple
243+
-- descriptors, which has been shown to be problematic in the cases where the
244+
-- RETURNING targetlist contains non-target relation attributes that are
245+
-- computed by referring to the source partition plan's output tuple. Also,
246+
-- a trigger on the destination relation may change the tuple, which must be
247+
-- reflected in the RETURNING output, so we test that too.
248+
CREATE TABLE part_c_1_c_20 (LIKE range_parted);
249+
ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
250+
ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
251+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END; $$;
252+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
253+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
254+
255+
DROP TRIGGER trig ON part_c_1_c_20;
256+
DROP FUNCTION trigfunc;
257+
258+
:init_range_parted;
259+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
260+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
261+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
262+
:show_data;
263+
264+
DROP TABLE part_c_1_c_20;
265+
DROP FUNCTION trigfunc;
241266

242267
-- Transition tables with update row movement
243268
:init_range_parted;

0 commit comments

Comments
 (0)