Commit 89e46da

Amit Kapila committed
Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table scan per tuple change on the subscription when REPLICA IDENTITY or PK index is not available. This makes REPLICA IDENTITY FULL impractical to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA IDENTITY on the subscriber during apply of update/delete. The index that can be used must be a btree index, not a partial index, and it must have at least one column reference (i.e. cannot consist of only expressions). We can uplift these restrictions in the future. There is no smart mechanism to pick the index. If there is more than one index that satisfies these requirements, we just pick the first one. We discussed using some of the optimizer's low-level APIs for this but ruled it out as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and the improvement is proportional to the amount of data in the table. However, there could be some regression in a small number of cases where the indexes have a lot of duplicate and dead rows. It was discussed that those are mostly impractical cases but we can provide a table or subscription level option to disable this feature if required.

Author: Onder Kalaci, Amit Kapila
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com

1 parent 720de00 commit 89e46da
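
The selection rule described in the commit message (btree, non-partial, at least one plain column, first match wins) can be sketched in standalone C. This is only an illustration, not the code from the patch: `IndexDesc`, `index_is_usable` and `pick_first_usable_index` are hypothetical names, and the convention that attribute number 0 marks an expression column is modeled on how `indkey` is used in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYS 8

/* Hypothetical, simplified description of one index on the subscriber table. */
typedef struct IndexDesc
{
	bool	is_btree;			/* access method is btree */
	bool	is_partial;			/* index has a WHERE predicate */
	int		nkeys;				/* number of key columns */
	int		attnums[MAX_KEYS];	/* table column per key; 0 = expression */
} IndexDesc;

/* An index qualifies if it is a non-partial btree with at least one plain column. */
static bool
index_is_usable(const IndexDesc *idx)
{
	if (!idx->is_btree || idx->is_partial)
		return false;

	for (int i = 0; i < idx->nkeys; i++)
	{
		if (idx->attnums[i] != 0)
			return true;
	}

	return false;				/* only expressions, nothing to build a key on */
}

/* No cost-based choice: return the position of the first usable index, or -1. */
static int
pick_first_usable_index(const IndexDesc *indexes, int nindexes)
{
	for (int i = 0; i < nindexes; i++)
	{
		if (index_is_usable(&indexes[i]))
			return i;
	}

	return -1;
}
```

Given a partial index, an expression-only index and a mixed expression/column index in that order, the sketch selects the third one; with no qualifying index it returns -1, which corresponds to the case where the subscriber falls back to a sequential scan.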

6 files changed, +325 -69 lines changed

doc/src/sgml/logical-replication.sgml

+8 -1
@@ -132,7 +132,14 @@
    certain additional requirements) can also be set to be the replica
    identity. If the table does not have any suitable key, then it can be set
    to replica identity <quote>full</quote>, which means the entire row becomes
-   the key. This, however, is very inefficient and should only be used as a
+   the key. When replica identity <quote>full</quote> is specified,
+   indexes can be used on the subscriber side for searching the rows. Candidate
+   indexes must be btree, non-partial, and have at least one column reference
+   (i.e. cannot consist of only expressions). These restrictions
+   on the non-unique index properties adhere to some of the restrictions that
+   are enforced for primary keys. If there are no such suitable indexes,
+   the search on the subscriber side can be very inefficient, therefore
+   replica identity <quote>full</quote> should only be used as a
    fallback if no other solution is possible. If a replica identity other
    than <quote>full</quote> is set on the publisher side, a replica identity
    comprising the same or fewer columns must also be set on the subscriber

src/backend/executor/execReplication.c

+79 -33
@@ -25,6 +25,7 @@
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
+#include "replication/logicalrelation.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
@@ -37,49 +38,63 @@
 #include "utils/typcache.h"
 
 
+static bool tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
+						 TypeCacheEntry **eq);
+
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns to use for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be a btree, non-partial
+ * and have at least one column reference (i.e. cannot consist of only
+ * expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all limitations associated
+ * with that. Note that any other index could also meet these limitations.
  */
-static bool
+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
 						 TupleTableSlot *searchslot)
 {
-	int attoff;
+	int index_attoff;
+	int skey_attoff = 0;
 	bool isnull;
 	Datum indclassDatum;
 	oidvector *opclass;
 	int2vector *indkey = &idxrel->rd_index->indkey;
-	bool hasnulls = false;
-
-	Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
-		   RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));
 
 	indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
 									Anum_pg_index_indclass, &isnull);
 	Assert(!isnull);
 	opclass = (oidvector *) DatumGetPointer(indclassDatum);
 
-	/* Build scankey for every attribute in the index. */
-	for (attoff = 0; attoff < IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+	/* Build scankey for every non-expression attribute in the index. */
+	for (index_attoff = 0; index_attoff < IndexRelationGetNumberOfKeyAttributes(idxrel);
+		 index_attoff++)
 	{
 		Oid operator;
+		Oid optype;
 		Oid opfamily;
 		RegProcedure regop;
-		int pkattno = attoff + 1;
-		int mainattno = indkey->values[attoff];
-		Oid optype = get_opclass_input_type(opclass->values[attoff]);
+		int table_attno = indkey->values[index_attoff];
+
+		if (!AttributeNumberIsValid(table_attno))
+		{
+			/*
+			 * XXX: Currently, we don't support expressions in the scan key,
+			 * see code below.
+			 */
+			continue;
+		}
 
 		/*
 		 * Load the operator info. We need this to get the equality operator
 		 * function for the scan key.
 		 */
-		opfamily = get_opclass_family(opclass->values[attoff]);
+		optype = get_opclass_input_type(opclass->values[index_attoff]);
+		opfamily = get_opclass_family(opclass->values[index_attoff]);
 
 		operator = get_opfamily_member(opfamily, optype,
 									   optype,
@@ -91,23 +106,25 @@ build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
 		regop = get_opcode(operator);
 
 		/* Initialize the scankey. */
-		ScanKeyInit(&skey[attoff],
-					pkattno,
+		ScanKeyInit(&skey[skey_attoff],
+					index_attoff + 1,
 					BTEqualStrategyNumber,
 					regop,
-					searchslot->tts_values[mainattno - 1]);
+					searchslot->tts_values[table_attno - 1]);
 
-		skey[attoff].sk_collation = idxrel->rd_indcollation[attoff];
+		skey[skey_attoff].sk_collation = idxrel->rd_indcollation[index_attoff];
 
 		/* Check for null value. */
-		if (searchslot->tts_isnull[mainattno - 1])
-		{
-			hasnulls = true;
-			skey[attoff].sk_flags |= SK_ISNULL;
-		}
+		if (searchslot->tts_isnull[table_attno - 1])
+			skey[skey_attoff].sk_flags |= (SK_ISNULL | SK_SEARCHNULL);
+
+		skey_attoff++;
 	}
 
-	return hasnulls;
+	/* There must always be at least one attribute for the index scan. */
+	Assert(skey_attoff > 0);
+
+	return skey_attoff;
 }
 
 /*
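
The two-counter bookkeeping in `build_replindex_scan_key` (one offset walking the index columns, another counting the scan keys actually emitted) is easy to misread in diff form. The following standalone sketch models it under simplifying assumptions: `ScanKeySketch` and `build_scan_keys` are hypothetical names, values are plain `long` instead of `Datum`, operator and collation lookup is omitted, and an attribute number of 0 stands for an expression column.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_KEYS 8

/* Hypothetical stand-in for one scan key entry. */
typedef struct ScanKeySketch
{
	int		index_attno;	/* 1-based position of the column in the index */
	long	value;			/* search value taken from the incoming tuple */
	bool	search_null;	/* true: match NULL instead of comparing value */
} ScanKeySketch;

/*
 * Build one key per non-expression index column.
 *
 * indkey[i] is the 1-based table column behind index column i, or 0 when
 * that index column is an expression.  values/isnull describe the tuple
 * being searched for, indexed by table column.
 *
 * Returns the number of keys written, which may be fewer than nkeys.
 */
static int
build_scan_keys(ScanKeySketch *skey,
				const int *indkey, int nkeys,
				const long *values, const bool *isnull)
{
	int		skey_off = 0;

	for (int index_off = 0; index_off < nkeys; index_off++)
	{
		int		table_attno = indkey[index_off];

		if (table_attno == 0)
			continue;		/* expression column: no key, keep skey_off as is */

		/* The key still refers to the column's position inside the index. */
		skey[skey_off].index_attno = index_off + 1;
		skey[skey_off].value = values[table_attno - 1];
		skey[skey_off].search_null = isnull[table_attno - 1];
		skey_off++;
	}

	return skey_off;
}
```

For an index over (column 2, an expression, column 1) the sketch emits two keys that point at index positions 1 and 3. The returned count is what the caller would then pass on as the number of keys for the scan, mirroring how the patch passes `skey_attoff` to `index_beginscan` and `index_rescan`.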
@@ -123,33 +140,58 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 							 TupleTableSlot *outslot)
 {
 	ScanKeyData skey[INDEX_MAX_KEYS];
+	int skey_attoff;
 	IndexScanDesc scan;
 	SnapshotData snap;
 	TransactionId xwait;
 	Relation idxrel;
 	bool found;
+	TypeCacheEntry **eq = NULL;
+	bool isIdxSafeToSkipDuplicates;
 
 	/* Open the index. */
 	idxrel = index_open(idxoid, RowExclusiveLock);
 
-	/* Start an index scan. */
+	isIdxSafeToSkipDuplicates = (GetRelationIdentityOrPK(rel) == idxoid);
+
 	InitDirtySnapshot(snap);
-	scan = index_beginscan(rel, idxrel, &snap,
-						   IndexRelationGetNumberOfKeyAttributes(idxrel),
-						   0);
 
 	/* Build scan key. */
-	build_replindex_scan_key(skey, rel, idxrel, searchslot);
+	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
+
+	/* Start an index scan. */
+	scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
 
 retry:
 	found = false;
 
-	index_rescan(scan, skey, IndexRelationGetNumberOfKeyAttributes(idxrel), NULL, 0);
+	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
 	/* Try to find the tuple */
-	if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+	while (index_getnext_slot(scan, ForwardScanDirection, outslot))
 	{
-		found = true;
+		/*
+		 * Avoid expensive equality check if the index is primary key or
+		 * replica identity index.
+		 */
+		if (!isIdxSafeToSkipDuplicates)
+		{
+			if (eq == NULL)
+			{
+#ifdef USE_ASSERT_CHECKING
+				/* apply assertions only once for the input idxoid */
+				IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+
+				Assert(IsIndexUsableForReplicaIdentityFull(indexInfo));
+#endif
+
+				eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
+			}
+
+			if (!tuples_equal(outslot, searchslot, eq))
+				continue;
+		}
+
 		ExecMaterializeSlot(outslot);
 
 		xwait = TransactionIdIsValid(snap.xmin) ?
@@ -164,6 +206,10 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 			XactLockTableWait(xwait, NULL, NULL, XLTW_None);
 			goto retry;
 		}
+
+		/* Found our tuple and it's not locked */
+		found = true;
+		break;
 	}
 
 	/* Found tuple, try to lock it in the lockmode. */
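
The change from `if` to `while` in `RelationFindReplTupleByIndex` matters because a non-unique index can return several rows for the same key, so each candidate has to be compared against the full search tuple before it is accepted. The sketch below illustrates only that control flow and rests on several assumptions: `Row`, `rows_equal` and `find_tuple` are hypothetical, the "index" is simulated by filtering an array on one key column, and the dirty-snapshot wait and retry logic of the real function is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NCOLS 3

/* Hypothetical fixed-width row; the real code works on TupleTableSlots. */
typedef struct Row
{
	int		cols[NCOLS];
} Row;

/* Whole-row comparison, standing in for tuples_equal(). */
static bool
rows_equal(const Row *a, const Row *b)
{
	return memcmp(a->cols, b->cols, sizeof(a->cols)) == 0;
}

/*
 * Return the position of the row to update/delete, or -1 if none matches.
 *
 * key_col simulates the indexed column.  When the index is the primary key
 * or replica identity, the first key match is trusted; otherwise every key
 * match is rechecked against the complete search row.
 */
static int
find_tuple(const Row *table, int nrows, const Row *search,
		   int key_col, bool index_is_identity)
{
	for (int i = 0; i < nrows; i++)
	{
		/* Simulated index scan: only rows with a matching key are visited. */
		if (table[i].cols[key_col] != search->cols[key_col])
			continue;

		/* Non-identity index: duplicates are possible, so recheck the row. */
		if (!index_is_identity && !rows_equal(&table[i], search))
			continue;

		return i;			/* found our tuple, stop scanning */
	}

	return -1;
}
```

With two rows sharing the same key value, the recheck is what makes the search land on the second row when that is the one whose remaining columns match. This also hints at the regression noted in the commit message: the more duplicates an index holds for a key, the more candidates the loop has to visit and compare.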
