Skip to content

Commit 27835b5

Browse files
committed
Fix bugs in RETURNING in cross-partition UPDATE cases.
If the source and destination partitions don't have identical rowtypes (for example, one has dropped columns the other lacks), then the planSlot contents will be different because of that. If the query has a RETURNING list that tries to return resjunk columns out of the planSlot, that is columns from tables that were joined to the target table, we'd get errors or wrong answers. That's because we used the RETURNING list generated for the destination partition, which expects a planSlot matching that partition's subplan. The most practical fix seems to be to convert the updated destination tuple back to the source partition's rowtype, and then apply the RETURNING list generated for the source partition. This avoids making fragile assumptions about whether the per-subpartition subplans generated all the resjunk columns in the same order. This has been broken since v11 introduced cross-partition UPDATE. The lack of field complaints shows that non-identical partitions aren't a common case; therefore, don't stress too hard about making the conversion efficient. There's no such bug in HEAD, because commit 86dc900 got rid of per-target-relation variance in the contents of the planSlot. Hence, patch v11-v13 only. Amit Langote and Etsuro Fujita, small changes by me Discussion: https://postgr.es/m/CA+HiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA@mail.gmail.com
1 parent e34adb2 commit 27835b5

File tree

3 files changed

+161
-11
lines changed

3 files changed

+161
-11
lines changed

src/backend/executor/nodeModifyTable.c

Lines changed: 96 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -146,21 +146,26 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
146146
/*
147147
* ExecProcessReturning --- evaluate a RETURNING list
148148
*
149-
* resultRelInfo: current result rel
149+
* projectReturning: the projection to evaluate
150+
* resultRelOid: result relation's OID
150151
* tupleSlot: slot holding tuple actually inserted/updated/deleted
151152
* planSlot: slot holding tuple returned by top subplan node
152153
*
154+
* In cross-partition UPDATE cases, projectReturning and planSlot are as
155+
* for the source partition, and tupleSlot must conform to that. But
156+
* resultRelOid is for the destination partition.
157+
*
153158
* Note: If tupleSlot is NULL, the FDW should have already provided econtext's
154159
* scan tuple.
155160
*
156161
* Returns a slot holding the result tuple
157162
*/
158163
static TupleTableSlot *
159-
ExecProcessReturning(ResultRelInfo *resultRelInfo,
164+
ExecProcessReturning(ProjectionInfo *projectReturning,
165+
Oid resultRelOid,
160166
TupleTableSlot *tupleSlot,
161167
TupleTableSlot *planSlot)
162168
{
163-
ProjectionInfo *projectReturning = resultRelInfo->ri_projectReturning;
164169
ExprContext *econtext = projectReturning->pi_exprContext;
165170

166171
/*
@@ -177,12 +182,13 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
177182
HeapTuple tuple;
178183

179184
/*
180-
* RETURNING expressions might reference the tableoid column, so
181-
* initialize t_tableOid before evaluating them.
185+
* RETURNING expressions might reference the tableoid column, so be
186+
* sure we expose the desired OID, ie that of the real target
187+
* relation.
182188
*/
183189
Assert(!TupIsNull(econtext->ecxt_scantuple));
184190
tuple = ExecMaterializeSlot(econtext->ecxt_scantuple);
185-
tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
191+
tuple->t_tableOid = resultRelOid;
186192
}
187193
econtext->ecxt_outertuple = planSlot;
188194

@@ -256,13 +262,25 @@ ExecCheckTIDVisible(EState *estate,
256262
* For INSERT, we have to insert the tuple into the target relation
257263
* and insert appropriate tuples into the index relations.
258264
*
265+
* slot contains the new tuple value to be stored.
266+
* planSlot is the output of the ModifyTable's subplan; we use it
267+
* to access "junk" columns that are not going to be stored.
268+
* In a cross-partition UPDATE, srcSlot is the slot that held the
269+
* updated tuple for the source relation; otherwise it's NULL.
270+
*
271+
* returningRelInfo is the resultRelInfo for the source relation of a
272+
* cross-partition UPDATE; otherwise it's the current result relation.
273+
* We use it to process RETURNING lists, for reasons explained below.
274+
*
259275
* Returns RETURNING result if any, otherwise NULL.
260276
* ----------------------------------------------------------------
261277
*/
262278
static TupleTableSlot *
263279
ExecInsert(ModifyTableState *mtstate,
264280
TupleTableSlot *slot,
265281
TupleTableSlot *planSlot,
282+
TupleTableSlot *srcSlot,
283+
ResultRelInfo *returningRelInfo,
266284
EState *estate,
267285
bool canSetTag)
268286
{
@@ -590,8 +608,66 @@ ExecInsert(ModifyTableState *mtstate,
590608
ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
591609

592610
/* Process RETURNING if present */
593-
if (resultRelInfo->ri_projectReturning)
594-
result = ExecProcessReturning(resultRelInfo, slot, planSlot);
611+
if (returningRelInfo->ri_projectReturning)
612+
{
613+
/*
614+
* In a cross-partition UPDATE with RETURNING, we have to use the
615+
* source partition's RETURNING list, because that matches the output
616+
* of the planSlot, while the destination partition might have
617+
* different resjunk columns. This means we have to map the
618+
* destination tuple back to the source's format so we can apply that
619+
* RETURNING list. This is expensive, but it should be an uncommon
620+
* corner case, so we won't spend much effort on making it fast.
621+
*
622+
* We assume that we can use srcSlot to hold the re-converted tuple.
623+
* Note that in the common case where the child partitions both match
624+
* the root's format, previous optimizations will have resulted in
625+
* slot and srcSlot being identical, cueing us that there's nothing to
626+
* do here.
627+
*/
628+
if (returningRelInfo != resultRelInfo && slot != srcSlot)
629+
{
630+
Relation srcRelationDesc = returningRelInfo->ri_RelationDesc;
631+
TupleConversionMap *map;
632+
633+
map = convert_tuples_by_name(RelationGetDescr(resultRelationDesc),
634+
RelationGetDescr(srcRelationDesc),
635+
gettext_noop("could not convert row type"));
636+
if (map)
637+
{
638+
HeapTuple origTuple = ExecMaterializeSlot(slot);
639+
HeapTuple newTuple;
640+
641+
newTuple = do_convert_tuple(origTuple, map);
642+
643+
/* do_convert_tuple doesn't copy system columns, so do that */
644+
newTuple->t_self = newTuple->t_data->t_ctid =
645+
origTuple->t_self;
646+
newTuple->t_tableOid = origTuple->t_tableOid;
647+
648+
HeapTupleHeaderSetXmin(newTuple->t_data,
649+
HeapTupleHeaderGetRawXmin(origTuple->t_data));
650+
HeapTupleHeaderSetCmin(newTuple->t_data,
651+
HeapTupleHeaderGetRawCommandId(origTuple->t_data));
652+
HeapTupleHeaderSetXmax(newTuple->t_data,
653+
InvalidTransactionId);
654+
655+
if (RelationGetDescr(resultRelationDesc)->tdhasoid)
656+
{
657+
Assert(RelationGetDescr(srcRelationDesc)->tdhasoid);
658+
HeapTupleSetOid(newTuple, HeapTupleGetOid(origTuple));
659+
}
660+
661+
slot = ExecStoreTuple(newTuple, srcSlot, InvalidBuffer, true);
662+
663+
free_conversion_map(map);
664+
}
665+
}
666+
667+
result = ExecProcessReturning(returningRelInfo->ri_projectReturning,
668+
RelationGetRelid(resultRelationDesc),
669+
slot, planSlot);
670+
}
595671

596672
return result;
597673
}
@@ -891,7 +967,9 @@ ldelete:;
891967
ExecStoreTuple(&deltuple, slot, InvalidBuffer, false);
892968
}
893969

894-
rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
970+
rslot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
971+
RelationGetRelid(resultRelationDesc),
972+
slot, planSlot);
895973

896974
/*
897975
* Before releasing the target tuple again, make sure rslot has a
@@ -1068,6 +1146,7 @@ lreplace:;
10681146
{
10691147
bool tuple_deleted;
10701148
TupleTableSlot *ret_slot;
1149+
TupleTableSlot *orig_slot = slot;
10711150
TupleTableSlot *epqslot = NULL;
10721151
PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
10731152
int map_index;
@@ -1175,6 +1254,7 @@ lreplace:;
11751254
mtstate->rootResultRelInfo, slot);
11761255

11771256
ret_slot = ExecInsert(mtstate, slot, planSlot,
1257+
orig_slot, resultRelInfo,
11781258
estate, canSetTag);
11791259

11801260
/* Revert ExecPrepareTupleRouting's node change. */
@@ -1334,7 +1414,9 @@ lreplace:;
13341414

13351415
/* Process RETURNING if present */
13361416
if (resultRelInfo->ri_projectReturning)
1337-
return ExecProcessReturning(resultRelInfo, slot, planSlot);
1417+
return ExecProcessReturning(resultRelInfo->ri_projectReturning,
1418+
RelationGetRelid(resultRelationDesc),
1419+
slot, planSlot);
13381420

13391421
return NULL;
13401422
}
@@ -2067,7 +2149,9 @@ ExecModifyTable(PlanState *pstate)
20672149
* ExecProcessReturning by IterateDirectModify, so no need to
20682150
* provide it here.
20692151
*/
2070-
slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
2152+
slot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
2153+
RelationGetRelid(resultRelInfo->ri_RelationDesc),
2154+
NULL, planSlot);
20712155

20722156
estate->es_result_relation_info = saved_resultRelInfo;
20732157
return slot;
@@ -2157,6 +2241,7 @@ ExecModifyTable(PlanState *pstate)
21572241
slot = ExecPrepareTupleRouting(node, estate, proute,
21582242
resultRelInfo, slot);
21592243
slot = ExecInsert(node, slot, planSlot,
2244+
NULL, estate->es_result_relation_info,
21602245
estate, node->canSetTag);
21612246
/* Revert ExecPrepareTupleRouting's state change. */
21622247
if (proute)

src/test/regress/expected/update.out

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -424,6 +424,46 @@ UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (r
424424
part_c_1_100 | b | 17 | 95 | 19 |
425425
(6 rows)
426426

427+
-- The following tests computing RETURNING when the source and the destination
428+
-- partitions of a UPDATE row movement operation have different tuple
429+
-- descriptors, which has been shown to be problematic in the cases where the
430+
-- RETURNING targetlist contains non-target relation attributes that are
431+
-- computed by referring to the source partition plan's output tuple. Also,
432+
-- a trigger on the destination relation may change the tuple, which must be
433+
-- reflected in the RETURNING output, so we test that too.
434+
CREATE TABLE part_c_1_c_20 (LIKE range_parted);
435+
ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
436+
ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
437+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END; $$;
438+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
439+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
440+
tableoid | a | b | c | d | e | x | y
441+
---------------+---+----+-----+---+---------------+---+----
442+
part_c_1_c_20 | c | 1 | 1 | 1 | in trigfunc() | a | 1
443+
part_c_1_c_20 | c | 10 | 200 | 1 | in trigfunc() | a | 10
444+
part_c_1_c_20 | c | 12 | 96 | 1 | in trigfunc() | b | 12
445+
(3 rows)
446+
447+
DROP TRIGGER trig ON part_c_1_c_20;
448+
DROP FUNCTION trigfunc;
449+
:init_range_parted;
450+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
451+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
452+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
453+
tableoid | a | b | c | d | e | x | y
454+
----------+---+---+---+---+---+---+---
455+
(0 rows)
456+
457+
:show_data;
458+
partname | a | b | c | d | e
459+
--------------+---+----+-----+----+---
460+
part_c_1_100 | b | 13 | 97 | 2 |
461+
part_d_15_20 | b | 15 | 105 | 16 |
462+
part_d_15_20 | b | 17 | 105 | 19 |
463+
(3 rows)
464+
465+
DROP TABLE part_c_1_c_20;
466+
DROP FUNCTION trigfunc;
427467
-- Transition tables with update row movement
428468
:init_range_parted;
429469
CREATE FUNCTION trans_updatetrigfunc() RETURNS trigger LANGUAGE plpgsql AS

src/test/regress/sql/update.sql

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -223,6 +223,31 @@ DROP VIEW upview;
223223
UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (range_parted), *;
224224
:show_data;
225225

226+
-- The following tests computing RETURNING when the source and the destination
227+
-- partitions of a UPDATE row movement operation have different tuple
228+
-- descriptors, which has been shown to be problematic in the cases where the
229+
-- RETURNING targetlist contains non-target relation attributes that are
230+
-- computed by referring to the source partition plan's output tuple. Also,
231+
-- a trigger on the destination relation may change the tuple, which must be
232+
-- reflected in the RETURNING output, so we test that too.
233+
CREATE TABLE part_c_1_c_20 (LIKE range_parted);
234+
ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
235+
ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
236+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END; $$;
237+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
238+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
239+
240+
DROP TRIGGER trig ON part_c_1_c_20;
241+
DROP FUNCTION trigfunc;
242+
243+
:init_range_parted;
244+
CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
245+
CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
246+
UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING tableoid::regclass, *;
247+
:show_data;
248+
249+
DROP TABLE part_c_1_c_20;
250+
DROP FUNCTION trigfunc;
226251

227252
-- Transition tables with update row movement
228253
:init_range_parted;

0 commit comments

Comments
 (0)