Skip to content

Commit 2c05560

Browse files
committed
Indexing support for pattern matching operations via separate operator
class when lc_collate is not C.
1 parent 2a2f6cf commit 2c05560

File tree

20 files changed

+488
-207
lines changed

20 files changed

+488
-207
lines changed

doc/src/sgml/charset.sgml

Lines changed: 3 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.35 2003/04/15 13:26:54 petere Exp $ -->
1+
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.36 2003/05/15 15:50:18 petere Exp $ -->
22

33
<chapter id="charset">
44
<title>Localization</>
@@ -213,23 +213,13 @@ initdb --locale=sv_SE
213213
The <function>to_char</> family of functions
214214
</para>
215215
</listitem>
216-
217-
<listitem>
218-
<para>
219-
The <literal>LIKE</> and <literal>~</> operators for pattern
220-
matching
221-
</para>
222-
</listitem>
223216
</itemizedlist>
224217
</para>
225218

226219
<para>
227220
The only severe drawback of using the locale support in
228-
<productname>PostgreSQL</> is its speed. So use locales only if you
229-
actually need it. It should be noted in particular that selecting
230-
a non-C locale disables index optimizations for <literal>LIKE</> and
231-
<literal>~</> operators, which can make a huge difference in the
232-
speed of searches that use those operators.
221+
<productname>PostgreSQL</> is its speed. So use locales only if
222+
you actually need them.
233223
</para>
234224
</sect2>
235225

doc/src/sgml/indices.sgml

Lines changed: 44 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.40 2003/03/25 16:15:36 petere Exp $ -->
1+
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.41 2003/05/15 15:50:18 petere Exp $ -->
22

33
<chapter id="indexes">
44
<title id="indexes-title">Indexes</title>
@@ -132,6 +132,19 @@ CREATE INDEX test1_id_index ON test1 (id);
132132
</simplelist>
133133
</para>
134134

135+
<para>
136+
The optimizer can also use a B-tree index for queries involving the
137+
pattern matching operators <literal>LIKE</>,
138+
<literal>ILIKE</literal>, <literal>~</literal>, and
139+
<literal>~*</literal>, <emphasis>if</emphasis> the pattern is
140+
anchored to the beginning of the string, e.g., <literal>col LIKE
141+
'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
142+
<literal>col LIKE '%bar'</literal>. However, if your server does
143+
not use the C locale you will need to create the index with a
144+
special operator class. See <xref linkend="indexes-opclass">
145+
below.
146+
</para>
147+
135148
<para>
136149
<indexterm>
137150
<primary>indexes</primary>
@@ -405,6 +418,36 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
405418
<literal>bigbox_ops</literal>.
406419
</para>
407420
</listitem>
421+
422+
<listitem>
423+
<para>
424+
The operator classes <literal>text_pattern_ops</literal>,
425+
<literal>varchar_pattern_ops</literal>,
426+
<literal>bpchar_pattern_ops</literal>, and
427+
<literal>name_pattern_ops</literal> support B-tree indexes on
428+
the types <type>text</type>, <type>varchar</type>,
429+
<type>char</type>, and <type>name</type>, respectively. The
430+
difference to the ordinary operator classes is that the values
431+
are compared strictly character by character rather than
432+
according to the locale-specific collation rules. This makes
433+
these operator classes suitable for use by queries involving
434+
pattern matching expressions (<literal>LIKE</literal> or POSIX
435+
regular expressions) if the server does not use the standard
436+
<quote>C</quote> locale. As an example, to index a
437+
<type>varchar</type> column like this:
438+
<programlisting>
439+
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
440+
</programlisting>
441+
If you do use the C locale, you should instead create an index
442+
with the default operator class. Also note that you should
443+
create an index with the default operator class if you want
444+
queries involving ordinary comparisons to use an index. Such
445+
queries cannot use the
446+
<literal><replaceable>xxx</replaceable>_pattern_ops</literal>
447+
operator classes. It is possible, however, to create multiple
448+
indexes on the same column with different operator classes.
449+
</para>
450+
</listitem>
408451
</itemizedlist>
409452
</para>
410453

doc/src/sgml/release.sgml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.187 2003/05/14 03:25:59 tgl Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.188 2003/05/15 15:50:18 petere Exp $
33
-->
44

55
<appendix id="release">
@@ -24,6 +24,7 @@ CDATA means the content is "SGML-free", so you can write without
2424
worries about funny characters.
2525
-->
2626
<literallayout><![CDATA[
27+
Pattern matching operations can use indexes regardless of locale
2728
New frontend/backend protocol supports many long-requested features
2829
SET AUTOCOMMIT TO OFF is no longer supported
2930
Reimplementation of NUMERIC datatype for more speed

doc/src/sgml/runtime.sgml

Lines changed: 8 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.179 2003/05/14 03:26:00 tgl Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.180 2003/05/15 15:50:18 petere Exp $
33
-->
44

55
<Chapter Id="runtime">
@@ -133,26 +133,13 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
133133
</para>
134134

135135
<para>
136-
<command>initdb</command> also initializes the default locale<indexterm><primary>locale</></> for
137-
the database cluster. Normally, it will just take the locale
138-
settings in the environment and apply them to the initialized
139-
database. It is possible to specify a different locale for the
140-
database; more information about that can be found in <xref
141-
linkend="locale">. One surprise you might encounter while running
142-
<command>initdb</command> is a notice similar to this:
143-
<screen>
144-
The database cluster will be initialized with locale de_DE.
145-
This locale setting will prevent the use of indexes for pattern matching
146-
operations. If that is a concern, rerun initdb with the collation order
147-
set to "C". For more information see the documentation.
148-
</screen>
149-
This is intended to warn you that the currently selected locale
150-
will cause indexes to be sorted in an order that prevents them from
151-
being used for <literal>LIKE</> and regular-expression searches. If you need
152-
good performance in such searches, you should set your current
153-
locale to <literal>C</> and re-run <command>initdb</command>, e.g.,
154-
by running <literal>initdb --lc-collate=C</literal>. The sort
155-
order used within a particular database cluster is set by
136+
<command>initdb</command> also initializes the default
137+
locale<indexterm><primary>locale</></> for the database cluster.
138+
Normally, it will just take the locale settings in the environment
139+
and apply them to the initialized database. It is possible to
140+
specify a different locale for the database; more information about
141+
that can be found in <xref linkend="locale">. The sort order used
142+
within a particular database cluster is set by
156143
<command>initdb</command> and cannot be changed later, short of
157144
dumping all data, rerunning <command>initdb</command>, and
158145
reloading the data. So it's important to make this choice correctly

src/backend/access/nbtree/nbtcompare.c

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtcompare.c,v 1.44 2002/06/20 20:29:25 momjian Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtcompare.c,v 1.45 2003/05/15 15:50:18 petere Exp $
1212
*
1313
* NOTES
1414
*
@@ -149,3 +149,12 @@ btnamecmp(PG_FUNCTION_ARGS)
149149

150150
PG_RETURN_INT32(strncmp(NameStr(*a), NameStr(*b), NAMEDATALEN));
151151
}
152+
153+
Datum
154+
btname_pattern_cmp(PG_FUNCTION_ARGS)
155+
{
156+
Name a = PG_GETARG_NAME(0);
157+
Name b = PG_GETARG_NAME(1);
158+
159+
PG_RETURN_INT32(memcmp(NameStr(*a), NameStr(*b), NAMEDATALEN));
160+
}

src/backend/optimizer/path/indxpath.c

Lines changed: 46 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
*
1010
*
1111
* IDENTIFICATION
12-
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/indxpath.c,v 1.137 2003/05/13 04:38:58 tgl Exp $
12+
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/indxpath.c,v 1.138 2003/05/15 15:50:18 petere Exp $
1313
*
1414
*-------------------------------------------------------------------------
1515
*/
@@ -1797,44 +1797,40 @@ match_special_index_operator(Expr *clause, Oid opclass,
17971797
case OID_VARCHAR_LIKE_OP:
17981798
case OID_NAME_LIKE_OP:
17991799
/* the right-hand const is type text for all of these */
1800-
if (locale_is_like_safe())
1801-
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Like,
1802-
&prefix, &rest) != Pattern_Prefix_None;
1800+
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Like,
1801+
&prefix, &rest) != Pattern_Prefix_None;
18031802
break;
18041803

18051804
case OID_BYTEA_LIKE_OP:
18061805
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Like,
1807-
&prefix, &rest) != Pattern_Prefix_None;
1806+
&prefix, &rest) != Pattern_Prefix_None;
18081807
break;
18091808

18101809
case OID_TEXT_ICLIKE_OP:
18111810
case OID_BPCHAR_ICLIKE_OP:
18121811
case OID_VARCHAR_ICLIKE_OP:
18131812
case OID_NAME_ICLIKE_OP:
18141813
/* the right-hand const is type text for all of these */
1815-
if (locale_is_like_safe())
1816-
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Like_IC,
1817-
&prefix, &rest) != Pattern_Prefix_None;
1814+
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Like_IC,
1815+
&prefix, &rest) != Pattern_Prefix_None;
18181816
break;
18191817

18201818
case OID_TEXT_REGEXEQ_OP:
18211819
case OID_BPCHAR_REGEXEQ_OP:
18221820
case OID_VARCHAR_REGEXEQ_OP:
18231821
case OID_NAME_REGEXEQ_OP:
18241822
/* the right-hand const is type text for all of these */
1825-
if (locale_is_like_safe())
1826-
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Regex,
1827-
&prefix, &rest) != Pattern_Prefix_None;
1823+
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Regex,
1824+
&prefix, &rest) != Pattern_Prefix_None;
18281825
break;
18291826

18301827
case OID_TEXT_ICREGEXEQ_OP:
18311828
case OID_BPCHAR_ICREGEXEQ_OP:
18321829
case OID_VARCHAR_ICREGEXEQ_OP:
18331830
case OID_NAME_ICREGEXEQ_OP:
18341831
/* the right-hand const is type text for all of these */
1835-
if (locale_is_like_safe())
1836-
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Regex_IC,
1837-
&prefix, &rest) != Pattern_Prefix_None;
1832+
isIndexable = pattern_fixed_prefix(patt, Pattern_Type_Regex_IC,
1833+
&prefix, &rest) != Pattern_Prefix_None;
18381834
break;
18391835

18401836
case OID_INET_SUB_OP:
@@ -1867,42 +1863,53 @@ match_special_index_operator(Expr *clause, Oid opclass,
18671863
case OID_TEXT_ICLIKE_OP:
18681864
case OID_TEXT_REGEXEQ_OP:
18691865
case OID_TEXT_ICREGEXEQ_OP:
1870-
if (!op_in_opclass(find_operator(">=", TEXTOID), opclass) ||
1871-
!op_in_opclass(find_operator("<", TEXTOID), opclass))
1872-
isIndexable = false;
1866+
if (lc_collate_is_c())
1867+
isIndexable = (op_in_opclass(find_operator(">=", TEXTOID), opclass)
1868+
&& op_in_opclass(find_operator("<", TEXTOID), opclass));
1869+
else
1870+
isIndexable = (op_in_opclass(find_operator("~>=~", TEXTOID), opclass)
1871+
&& op_in_opclass(find_operator("~<~", TEXTOID), opclass));
18731872
break;
18741873

18751874
case OID_BYTEA_LIKE_OP:
1876-
if (!op_in_opclass(find_operator(">=", BYTEAOID), opclass) ||
1877-
!op_in_opclass(find_operator("<", BYTEAOID), opclass))
1878-
isIndexable = false;
1875+
isIndexable = (op_in_opclass(find_operator(">=", BYTEAOID), opclass)
1876+
&& op_in_opclass(find_operator("<", BYTEAOID), opclass));
18791877
break;
18801878

18811879
case OID_BPCHAR_LIKE_OP:
18821880
case OID_BPCHAR_ICLIKE_OP:
18831881
case OID_BPCHAR_REGEXEQ_OP:
18841882
case OID_BPCHAR_ICREGEXEQ_OP:
1885-
if (!op_in_opclass(find_operator(">=", BPCHAROID), opclass) ||
1886-
!op_in_opclass(find_operator("<", BPCHAROID), opclass))
1887-
isIndexable = false;
1883+
if (lc_collate_is_c())
1884+
isIndexable = (op_in_opclass(find_operator(">=", BPCHAROID), opclass)
1885+
&& op_in_opclass(find_operator("<", BPCHAROID), opclass));
1886+
else
1887+
isIndexable = (op_in_opclass(find_operator("~>=~", BPCHAROID), opclass)
1888+
&& op_in_opclass(find_operator("~<~", BPCHAROID), opclass));
18881889
break;
18891890

18901891
case OID_VARCHAR_LIKE_OP:
18911892
case OID_VARCHAR_ICLIKE_OP:
18921893
case OID_VARCHAR_REGEXEQ_OP:
18931894
case OID_VARCHAR_ICREGEXEQ_OP:
1894-
if (!op_in_opclass(find_operator(">=", VARCHAROID), opclass) ||
1895-
!op_in_opclass(find_operator("<", VARCHAROID), opclass))
1896-
isIndexable = false;
1895+
if (lc_collate_is_c())
1896+
isIndexable = (op_in_opclass(find_operator(">=", VARCHAROID), opclass)
1897+
&& op_in_opclass(find_operator("<", VARCHAROID), opclass));
1898+
else
1899+
isIndexable = (op_in_opclass(find_operator("~>=~", VARCHAROID), opclass)
1900+
&& op_in_opclass(find_operator("~<~", VARCHAROID), opclass));
18971901
break;
18981902

18991903
case OID_NAME_LIKE_OP:
19001904
case OID_NAME_ICLIKE_OP:
19011905
case OID_NAME_REGEXEQ_OP:
19021906
case OID_NAME_ICREGEXEQ_OP:
1903-
if (!op_in_opclass(find_operator(">=", NAMEOID), opclass) ||
1904-
!op_in_opclass(find_operator("<", NAMEOID), opclass))
1905-
isIndexable = false;
1907+
if (lc_collate_is_c())
1908+
isIndexable = (op_in_opclass(find_operator(">=", NAMEOID), opclass)
1909+
&& op_in_opclass(find_operator("<", NAMEOID), opclass));
1910+
else
1911+
isIndexable = (op_in_opclass(find_operator("~>=~", NAMEOID), opclass)
1912+
&& op_in_opclass(find_operator("~<~", NAMEOID), opclass));
19061913
break;
19071914

19081915
case OID_INET_SUB_OP:
@@ -2039,6 +2046,7 @@ prefix_quals(Node *leftop, Oid expr_op,
20392046
List *result;
20402047
Oid datatype;
20412048
Oid oproid;
2049+
const char *oprname;
20422050
char *prefix;
20432051
Const *con;
20442052
Expr *expr;
@@ -2098,9 +2106,10 @@ prefix_quals(Node *leftop, Oid expr_op,
20982106
*/
20992107
if (pstatus == Pattern_Prefix_Exact)
21002108
{
2101-
oproid = find_operator("=", datatype);
2109+
oprname = (datatype == BYTEAOID || lc_collate_is_c() ? "=" : "~=~");
2110+
oproid = find_operator(oprname, datatype);
21022111
if (oproid == InvalidOid)
2103-
elog(ERROR, "prefix_quals: no = operator for type %u", datatype);
2112+
elog(ERROR, "prefix_quals: no operator %s for type %u", oprname, datatype);
21042113
con = string_to_const(prefix, datatype);
21052114
expr = make_opclause(oproid, BOOLOID, false,
21062115
(Expr *) leftop, (Expr *) con);
@@ -2113,9 +2122,10 @@ prefix_quals(Node *leftop, Oid expr_op,
21132122
*
21142123
* We can always say "x >= prefix".
21152124
*/
2116-
oproid = find_operator(">=", datatype);
2125+
oprname = (datatype == BYTEAOID || lc_collate_is_c() ? ">=" : "~>=~");
2126+
oproid = find_operator(oprname, datatype);
21172127
if (oproid == InvalidOid)
2118-
elog(ERROR, "prefix_quals: no >= operator for type %u", datatype);
2128+
elog(ERROR, "prefix_quals: no operator %s for type %u", oprname, datatype);
21192129
con = string_to_const(prefix, datatype);
21202130
expr = make_opclause(oproid, BOOLOID, false,
21212131
(Expr *) leftop, (Expr *) con);
@@ -2129,9 +2139,10 @@ prefix_quals(Node *leftop, Oid expr_op,
21292139
greaterstr = make_greater_string(con);
21302140
if (greaterstr)
21312141
{
2132-
oproid = find_operator("<", datatype);
2142+
oprname = (datatype == BYTEAOID || lc_collate_is_c() ? "<" : "~<~");
2143+
oproid = find_operator(oprname, datatype);
21332144
if (oproid == InvalidOid)
2134-
elog(ERROR, "prefix_quals: no < operator for type %u", datatype);
2145+
elog(ERROR, "prefix_quals: no operator %s for type %u", oprname, datatype);
21352146
expr = make_opclause(oproid, BOOLOID, false,
21362147
(Expr *) leftop, (Expr *) greaterstr);
21372148
result = lappend(result, expr);

0 commit comments

Comments
 (0)