Skip to content

Commit 12e611d

Browse files
committed
Rename jsonb_hash_ops to jsonb_path_ops.
There's no longer much pressure to switch the default GIN opclass for jsonb, but there was still some unhappiness with the name "jsonb_hash_ops", since hashing is no longer a distinguishing property of that opclass, and anyway it seems like a relatively minor detail. At the suggestion of Heikki Linnakangas, we'll use "jsonb_path_ops" instead; that captures the important characteristic that each index entry depends on the entire path from the document root to the indexed value. Also add a user-facing explanation of the implementation properties of these two opclasses.
1 parent e136271 commit 12e611d

File tree

12 files changed

+78
-44
lines changed

12 files changed

+78
-44
lines changed

doc/src/sgml/gin.sgml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -395,7 +395,7 @@
395395
</entry>
396396
</row>
397397
<row>
398-
<entry><literal>jsonb_hash_ops</></entry>
398+
<entry><literal>jsonb_path_ops</></entry>
399399
<entry><type>jsonb</></entry>
400400
<entry>
401401
<literal>@&gt;</>
@@ -415,7 +415,7 @@
415415

416416
<para>
417417
Of the two operator classes for type <type>jsonb</>, <literal>jsonb_ops</>
418-
is the default. <literal>jsonb_hash_ops</> supports fewer operators but
418+
is the default. <literal>jsonb_path_ops</> supports fewer operators but
419419
offers better performance for those operators.
420420
See <xref linkend="json-indexing"> for details.
421421
</para>

doc/src/sgml/json.sgml

Lines changed: 40 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -156,7 +156,7 @@
156156
</table>
157157

158158
<sect2 id="json-keys-elements">
159-
<title><type>jsonb</> Input and Output Syntax</title>
159+
<title>JSON Input and Output Syntax</title>
160160
<para>
161161
The input/output syntax for the JSON data types is as specified in
162162
<acronym>RFC</> 7159.
@@ -366,11 +366,11 @@ SELECT '"foo"'::jsonb ? 'foo';
366366
<programlisting>
367367
CREATE INDEX idxgin ON api USING gin (jdoc);
368368
</programlisting>
369-
The non-default GIN operator class <literal>jsonb_hash_ops</>
369+
The non-default GIN operator class <literal>jsonb_path_ops</>
370370
supports indexing the <literal>@&gt;</> operator only.
371371
An example of creating an index with this operator class is:
372372
<programlisting>
373-
CREATE INDEX idxginh ON api USING gin (jdoc jsonb_hash_ops);
373+
CREATE INDEX idxginp ON api USING gin (jdoc jsonb_path_ops);
374374
</programlisting>
375375
</para>
376376

@@ -444,17 +444,52 @@ SELECT jdoc-&gt;'guid', jdoc-&gt;'name' FROM api WHERE jdoc @&gt; '{"tags": ["qu
444444
</para>
445445

446446
<para>
447-
Although the <literal>jsonb_hash_ops</literal> operator class supports
447+
Although the <literal>jsonb_path_ops</literal> operator class supports
448448
only queries with the <literal>@&gt;</> operator, it has notable
449449
performance advantages over the default operator
450-
class <literal>jsonb_ops</literal>. A <literal>jsonb_hash_ops</literal>
450+
class <literal>jsonb_ops</literal>. A <literal>jsonb_path_ops</literal>
451451
index is usually much smaller than a <literal>jsonb_ops</literal>
452452
index over the same data, and the specificity of searches is better,
453453
particularly when queries contain keys that appear frequently in the
454454
data. Therefore search operations typically perform better
455455
than with the default operator class.
456456
</para>
457457

458+
<para>
459+
The technical difference between a <literal>jsonb_ops</literal>
460+
and a <literal>jsonb_path_ops</literal> GIN index is that the former
461+
creates independent index items for each key and value in the data,
462+
while the latter creates index items only for each value in the
463+
data.<footnote><para>For this purpose, the term <quote>value</>
464+
includes array elements, though JSON terminology sometimes considers
465+
array elements distinct from values within objects.</para></footnote>
466+
But in <literal>jsonb_path_ops</literal>, each index item is a hash
467+
of both the value and the key(s) leading to it; for example to index
468+
<literal>{"foo": {"bar": "baz"}}</literal>, a single index item would
469+
be created incorporating all three of <literal>foo</>, <literal>bar</>,
470+
and <literal>baz</> into the hash value. Thus a containment query
471+
looking for this structure would result in an extremely specific index
472+
search; but there is no way at all to find out whether <literal>foo</>
473+
appears as a key. On the other hand, a <literal>jsonb_ops</literal>
474+
index would create three index items representing <literal>foo</>,
475+
<literal>bar</>, and <literal>baz</> separately; then to do the
476+
containment query, it would look for rows containing all three of
477+
these items. While GIN indexes can perform such an AND search fairly
478+
efficiently, it will still be less specific and slower than the
479+
equivalent <literal>jsonb_path_ops</literal> search, especially if
480+
there are a very large number of rows containing any single one of the
481+
three index items.
482+
</para>
483+
484+
<para>
485+
A disadvantage of the <literal>jsonb_path_ops</literal> approach is
486+
that it produces no index entries for JSON structures not containing
487+
any values, such as <literal>{"a": {}}</literal>. If a search for
488+
documents containing such a structure is requested, it will require a
489+
full-index scan, which is quite slow. <literal>jsonb_path_ops</> is
490+
therefore ill-suited for applications that often perform such searches.
491+
</para>
492+
458493
<para>
459494
<type>jsonb</> also supports <literal>btree</> and <literal>hash</>
460495
indexes. These are usually useful only if it's important to check

src/backend/utils/adt/jsonb_gin.c

Lines changed: 13 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -315,9 +315,9 @@ gin_triconsistent_jsonb(PG_FUNCTION_ARGS)
315315

316316
/*
317317
*
318-
* jsonb_hash_ops GIN opclass support functions
318+
* jsonb_path_ops GIN opclass support functions
319319
*
320-
* In a jsonb_hash_ops index, the GIN keys are uint32 hashes, one per JSON
320+
* In a jsonb_path_ops index, the GIN keys are uint32 hashes, one per JSON
321321
* value; but the JSON key(s) leading to each value are also included in its
322322
* hash computation. This means we can only support containment queries,
323323
* but the index can distinguish, for example, {"foo": 42} from {"bar": 42}
@@ -326,7 +326,7 @@ gin_triconsistent_jsonb(PG_FUNCTION_ARGS)
326326
*/
327327

328328
Datum
329-
gin_extract_jsonb_hash(PG_FUNCTION_ARGS)
329+
gin_extract_jsonb_path(PG_FUNCTION_ARGS)
330330
{
331331
Jsonb *jb = PG_GETARG_JSONB(0);
332332
int32 *nentries = (int32 *) PG_GETARG_POINTER(1);
@@ -349,7 +349,7 @@ gin_extract_jsonb_hash(PG_FUNCTION_ARGS)
349349
/* Otherwise, use 2 * root count as initial estimate of result size */
350350
entries = (Datum *) palloc(sizeof(Datum) * total);
351351

352-
/* We keep a stack of hashes corresponding to parent key levels */
352+
/* We keep a stack of partial hashes corresponding to parent key levels */
353353
tail.parent = NULL;
354354
tail.hash = 0;
355355
stack = &tail;
@@ -439,7 +439,7 @@ gin_extract_jsonb_hash(PG_FUNCTION_ARGS)
439439
}
440440

441441
Datum
442-
gin_extract_jsonb_query_hash(PG_FUNCTION_ARGS)
442+
gin_extract_jsonb_query_path(PG_FUNCTION_ARGS)
443443
{
444444
int32 *nentries = (int32 *) PG_GETARG_POINTER(1);
445445
StrategyNumber strategy = PG_GETARG_UINT16(2);
@@ -449,9 +449,9 @@ gin_extract_jsonb_query_hash(PG_FUNCTION_ARGS)
449449
if (strategy != JsonbContainsStrategyNumber)
450450
elog(ERROR, "unrecognized strategy number: %d", strategy);
451451

452-
/* Query is a jsonb, so just apply gin_extract_jsonb_hash ... */
452+
/* Query is a jsonb, so just apply gin_extract_jsonb_path ... */
453453
entries = (Datum *)
454-
DatumGetPointer(DirectFunctionCall2(gin_extract_jsonb_hash,
454+
DatumGetPointer(DirectFunctionCall2(gin_extract_jsonb_path,
455455
PG_GETARG_DATUM(0),
456456
PointerGetDatum(nentries)));
457457

@@ -463,7 +463,7 @@ gin_extract_jsonb_query_hash(PG_FUNCTION_ARGS)
463463
}
464464

465465
Datum
466-
gin_consistent_jsonb_hash(PG_FUNCTION_ARGS)
466+
gin_consistent_jsonb_path(PG_FUNCTION_ARGS)
467467
{
468468
bool *check = (bool *) PG_GETARG_POINTER(0);
469469
StrategyNumber strategy = PG_GETARG_UINT16(1);
@@ -480,13 +480,12 @@ gin_consistent_jsonb_hash(PG_FUNCTION_ARGS)
480480
elog(ERROR, "unrecognized strategy number: %d", strategy);
481481

482482
/*
483-
* jsonb_hash_ops is necessarily lossy, not only because of hash
483+
* jsonb_path_ops is necessarily lossy, not only because of hash
484484
* collisions but also because it doesn't preserve complete information
485485
* about the structure of the JSON object. Besides, there are some
486-
* special rules around the containment of raw scalar arrays and regular
487-
* arrays that are not handled here. So we must always recheck a match.
488-
* However, if not all of the keys are present, the tuple certainly
489-
* doesn't match.
486+
* special rules around the containment of raw scalars in arrays that are
487+
* not handled here. So we must always recheck a match. However, if not
488+
* all of the keys are present, the tuple certainly doesn't match.
490489
*/
491490
*recheck = true;
492491
for (i = 0; i < nkeys; i++)
@@ -502,7 +501,7 @@ gin_consistent_jsonb_hash(PG_FUNCTION_ARGS)
502501
}
503502

504503
Datum
505-
gin_triconsistent_jsonb_hash(PG_FUNCTION_ARGS)
504+
gin_triconsistent_jsonb_path(PG_FUNCTION_ARGS)
506505
{
507506
GinTernaryValue *check = (GinTernaryValue *) PG_GETARG_POINTER(0);
508507
StrategyNumber strategy = PG_GETARG_UINT16(1);

src/include/catalog/catversion.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,6 @@
5353
*/
5454

5555
/* yyyymmddN */
56-
#define CATALOG_VERSION_NO 201405093
56+
#define CATALOG_VERSION_NO 201405111
5757

5858
#endif

src/include/catalog/pg_amop.h

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -787,20 +787,20 @@ DATA(insert ( 4033 3802 3802 4 s 3245 403 0 ));
787787
DATA(insert ( 4033 3802 3802 5 s 3243 403 0 ));
788788

789789
/*
790-
* hash jsonb ops
790+
* hash jsonb_ops
791791
*/
792792
DATA(insert ( 4034 3802 3802 1 s 3240 405 0 ));
793793

794794
/*
795-
* GIN jsonb ops
795+
* GIN jsonb_ops
796796
*/
797797
DATA(insert ( 4036 3802 3802 7 s 3246 2742 0 ));
798798
DATA(insert ( 4036 3802 25 9 s 3247 2742 0 ));
799799
DATA(insert ( 4036 3802 1009 10 s 3248 2742 0 ));
800800
DATA(insert ( 4036 3802 1009 11 s 3249 2742 0 ));
801801

802802
/*
803-
* GIN jsonb hash ops
803+
* GIN jsonb_path_ops
804804
*/
805805
DATA(insert ( 4037 3802 3802 7 s 3246 2742 0 ));
806806

src/include/catalog/pg_opclass.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -232,6 +232,6 @@ DATA(insert ( 4000 text_ops PGNSP PGUID 4017 25 t 0 ));
232232
DATA(insert ( 403 jsonb_ops PGNSP PGUID 4033 3802 t 0 ));
233233
DATA(insert ( 405 jsonb_ops PGNSP PGUID 4034 3802 t 0 ));
234234
DATA(insert ( 2742 jsonb_ops PGNSP PGUID 4036 3802 t 25 ));
235-
DATA(insert ( 2742 jsonb_hash_ops PGNSP PGUID 4037 3802 f 23 ));
235+
DATA(insert ( 2742 jsonb_path_ops PGNSP PGUID 4037 3802 f 23 ));
236236

237237
#endif /* PG_OPCLASS_H */

src/include/catalog/pg_opfamily.h

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -148,11 +148,11 @@ DATA(insert OID = 3474 ( 4000 range_ops PGNSP PGUID ));
148148
DATA(insert OID = 4015 ( 4000 quad_point_ops PGNSP PGUID ));
149149
DATA(insert OID = 4016 ( 4000 kd_point_ops PGNSP PGUID ));
150150
DATA(insert OID = 4017 ( 4000 text_ops PGNSP PGUID ));
151+
#define TEXT_SPGIST_FAM_OID 4017
151152
DATA(insert OID = 4033 ( 403 jsonb_ops PGNSP PGUID ));
152153
DATA(insert OID = 4034 ( 405 jsonb_ops PGNSP PGUID ));
153154
DATA(insert OID = 4035 ( 783 jsonb_ops PGNSP PGUID ));
154155
DATA(insert OID = 4036 ( 2742 jsonb_ops PGNSP PGUID ));
155-
DATA(insert OID = 4037 ( 2742 jsonb_hash_ops PGNSP PGUID ));
156-
#define TEXT_SPGIST_FAM_OID 4017
156+
DATA(insert OID = 4037 ( 2742 jsonb_path_ops PGNSP PGUID ));
157157

158158
#endif /* PG_OPFAMILY_H */

src/include/catalog/pg_proc.h

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4645,13 +4645,13 @@ DATA(insert OID = 3484 ( gin_consistent_jsonb PGNSP PGUID 12 1 0 0 0 f f f f t
46454645
DESCR("GIN support");
46464646
DATA(insert OID = 3488 ( gin_triconsistent_jsonb PGNSP PGUID 12 1 0 0 0 f f f f t f i 7 0 16 "2281 21 2277 23 2281 2281 2281" _null_ _null_ _null_ _null_ gin_triconsistent_jsonb _null_ _null_ _null_ ));
46474647
DESCR("GIN support");
4648-
DATA(insert OID = 3485 ( gin_extract_jsonb_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ gin_extract_jsonb_hash _null_ _null_ _null_ ));
4648+
DATA(insert OID = 3485 ( gin_extract_jsonb_path PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ gin_extract_jsonb_path _null_ _null_ _null_ ));
46494649
DESCR("GIN support");
4650-
DATA(insert OID = 3486 ( gin_extract_jsonb_query_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i 7 0 2281 "2277 2281 21 2281 2281 2281 2281" _null_ _null_ _null_ _null_ gin_extract_jsonb_query_hash _null_ _null_ _null_ ));
4650+
DATA(insert OID = 3486 ( gin_extract_jsonb_query_path PGNSP PGUID 12 1 0 0 0 f f f f t f i 7 0 2281 "2277 2281 21 2281 2281 2281 2281" _null_ _null_ _null_ _null_ gin_extract_jsonb_query_path _null_ _null_ _null_ ));
46514651
DESCR("GIN support");
4652-
DATA(insert OID = 3487 ( gin_consistent_jsonb_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i 8 0 16 "2281 21 2277 23 2281 2281 2281 2281" _null_ _null_ _null_ _null_ gin_consistent_jsonb_hash _null_ _null_ _null_ ));
4652+
DATA(insert OID = 3487 ( gin_consistent_jsonb_path PGNSP PGUID 12 1 0 0 0 f f f f t f i 8 0 16 "2281 21 2277 23 2281 2281 2281 2281" _null_ _null_ _null_ _null_ gin_consistent_jsonb_path _null_ _null_ _null_ ));
46534653
DESCR("GIN support");
4654-
DATA(insert OID = 3489 ( gin_triconsistent_jsonb_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i 7 0 16 "2281 21 2277 23 2281 2281 2281" _null_ _null_ _null_ _null_ gin_triconsistent_jsonb_hash _null_ _null_ _null_ ));
4654+
DATA(insert OID = 3489 ( gin_triconsistent_jsonb_path PGNSP PGUID 12 1 0 0 0 f f f f t f i 7 0 16 "2281 21 2277 23 2281 2281 2281" _null_ _null_ _null_ _null_ gin_triconsistent_jsonb_path _null_ _null_ _null_ ));
46554655
DESCR("GIN support");
46564656

46574657
/* txid */

src/include/utils/jsonb.h

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -332,18 +332,18 @@ extern Datum jsonb_eq(PG_FUNCTION_ARGS);
332332
extern Datum jsonb_cmp(PG_FUNCTION_ARGS);
333333
extern Datum jsonb_hash(PG_FUNCTION_ARGS);
334334

335-
/* GIN support functions */
335+
/* GIN support functions for jsonb_ops */
336336
extern Datum gin_compare_jsonb(PG_FUNCTION_ARGS);
337337
extern Datum gin_extract_jsonb(PG_FUNCTION_ARGS);
338338
extern Datum gin_extract_jsonb_query(PG_FUNCTION_ARGS);
339339
extern Datum gin_consistent_jsonb(PG_FUNCTION_ARGS);
340340
extern Datum gin_triconsistent_jsonb(PG_FUNCTION_ARGS);
341341

342-
/* GIN hash opclass functions */
343-
extern Datum gin_extract_jsonb_hash(PG_FUNCTION_ARGS);
344-
extern Datum gin_extract_jsonb_query_hash(PG_FUNCTION_ARGS);
345-
extern Datum gin_consistent_jsonb_hash(PG_FUNCTION_ARGS);
346-
extern Datum gin_triconsistent_jsonb_hash(PG_FUNCTION_ARGS);
342+
/* GIN support functions for jsonb_path_ops */
343+
extern Datum gin_extract_jsonb_path(PG_FUNCTION_ARGS);
344+
extern Datum gin_extract_jsonb_query_path(PG_FUNCTION_ARGS);
345+
extern Datum gin_consistent_jsonb_path(PG_FUNCTION_ARGS);
346+
extern Datum gin_triconsistent_jsonb_path(PG_FUNCTION_ARGS);
347347

348348
/* Support functions */
349349
extern int compareJsonbContainers(JsonbContainer *a, JsonbContainer *b);

src/test/regress/expected/jsonb.out

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1685,9 +1685,9 @@ SELECT count(*) FROM testjsonb WHERE j = '{"pos":98, "line":371, "node":"CBA", "
16851685
1
16861686
(1 row)
16871687

1688-
--gin hash
1688+
--gin path opclass
16891689
DROP INDEX jidx;
1690-
CREATE INDEX jidx ON testjsonb USING gin (j jsonb_hash_ops);
1690+
CREATE INDEX jidx ON testjsonb USING gin (j jsonb_path_ops);
16911691
SET enable_seqscan = off;
16921692
SELECT count(*) FROM testjsonb WHERE j @> '{"wait":null}';
16931693
count

src/test/regress/expected/jsonb_1.out

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1685,9 +1685,9 @@ SELECT count(*) FROM testjsonb WHERE j = '{"pos":98, "line":371, "node":"CBA", "
16851685
1
16861686
(1 row)
16871687

1688-
--gin hash
1688+
--gin path opclass
16891689
DROP INDEX jidx;
1690-
CREATE INDEX jidx ON testjsonb USING gin (j jsonb_hash_ops);
1690+
CREATE INDEX jidx ON testjsonb USING gin (j jsonb_path_ops);
16911691
SET enable_seqscan = off;
16921692
SELECT count(*) FROM testjsonb WHERE j @> '{"wait":null}';
16931693
count

src/test/regress/sql/jsonb.sql

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -391,9 +391,9 @@ SET enable_seqscan = off;
391391
SELECT count(*) FROM testjsonb WHERE j > '{"p":1}';
392392
SELECT count(*) FROM testjsonb WHERE j = '{"pos":98, "line":371, "node":"CBA", "indexed":true}';
393393

394-
--gin hash
394+
--gin path opclass
395395
DROP INDEX jidx;
396-
CREATE INDEX jidx ON testjsonb USING gin (j jsonb_hash_ops);
396+
CREATE INDEX jidx ON testjsonb USING gin (j jsonb_path_ops);
397397
SET enable_seqscan = off;
398398

399399
SELECT count(*) FROM testjsonb WHERE j @> '{"wait":null}';

0 commit comments

Comments
 (0)