|
1 |
| -<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.14 2008/04/14 17:05:32 tgl Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.15 2008/05/16 16:31:01 tgl Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="GIN">
|
4 | 4 | <title>GIN Indexes</title>
|
|
52 | 52 | </para>
|
53 | 53 |
|
54 | 54 | <para>
|
55 |
| - All it takes to get a <acronym>GIN</acronym> access method working |
56 |
| - is to implement four user-defined methods, which define the behavior of |
| 55 | + All it takes to get a <acronym>GIN</acronym> access method working is to |
| 56 | + implement four (or five) user-defined methods, which define the behavior of |
57 | 57 | keys in the tree and the relationships between keys, indexed values,
|
58 | 58 | and indexable queries. In short, <acronym>GIN</acronym> combines
|
59 | 59 | extensibility with generality, code reuse, and a clean interface.
|
60 | 60 | </para>
|
61 | 61 |
|
62 | 62 | <para>
|
63 |
| - The four methods that an index operator class for |
| 63 | + The four methods that an operator class for |
64 | 64 | <acronym>GIN</acronym> must provide are:
|
65 | 65 | </para>
|
66 | 66 |
|
|
77 | 77 | </varlistentry>
|
78 | 78 |
|
79 | 79 | <varlistentry>
|
80 |
| - <term>Datum* extractValue(Datum inputValue, int32 *nkeys)</term> |
| 80 | + <term>Datum *extractValue(Datum inputValue, int32 *nkeys)</term> |
81 | 81 | <listitem>
|
82 | 82 | <para>
|
83 | 83 | Returns an array of keys given a value to be indexed. The
|
|
87 | 87 | </varlistentry>
|
88 | 88 |
|
89 | 89 | <varlistentry>
|
90 |
| - <term>Datum* extractQuery(Datum query, int32 *nkeys, |
91 |
| - StrategyNumber n)</term> |
| 90 | + <term>Datum *extractQuery(Datum query, int32 *nkeys, |
| 91 | + StrategyNumber n, bool **pmatch)</term> |
92 | 92 | <listitem>
|
93 | 93 | <para>
|
94 | 94 | Returns an array of keys given a value to be queried; that is,
|
|
100 | 100 | to consult <literal>n</> to determine the data type of
|
101 | 101 | <literal>query</> and the key values that need to be extracted.
|
102 | 102 | The number of returned keys must be stored into <literal>*nkeys</>.
|
103 |
| - If number of keys is equal to zero then <function>extractQuery</> |
104 |
| - should store 0 or -1 into <literal>*nkeys</>. 0 means that any |
105 |
| - row matches the <literal>query</> and sequence scan should be |
106 |
| - produced. -1 means nothing can satisfy <literal>query</>. |
107 |
| - Choice of value should be based on semantics meaning of operation with |
108 |
| - given strategy number. |
| 103 | + If the query contains no keys then <function>extractQuery</> |
| 104 | + should store 0 or -1 into <literal>*nkeys</>, depending on the |
| 105 | + semantics of the operator. 0 means that every |
| 106 | + value matches the <literal>query</> and a sequential scan should be |
| 107 | + produced. -1 means nothing can match the <literal>query</>. |
| 108 | + <literal>pmatch</> is an output argument for use when partial match |
| 109 | + is supported. To use it, <function>extractQuery</> must allocate |
| 110 | + an array of <literal>*nkeys</> booleans and store its address at |
| 111 | + <literal>*pmatch</>. Each element of the array should be set to TRUE |
| 112 | + if the corresponding key requires partial match, FALSE if not. |
| 113 | + If <literal>*pmatch</> is set to NULL then GIN assumes partial match |
| 114 | + is not required. The variable is initialized to NULL before call, |
| 115 | + so this argument can simply be ignored by operator classes that do |
| 116 | + not support partial match. |
109 | 117 | </para>
|
| 118 | + |
110 | 119 | </listitem>
|
111 | 120 | </varlistentry>
|
112 | 121 |
|
|
133 | 142 |
|
134 | 143 | </variablelist>
|
135 | 144 |
|
| 145 | + <para> |
| 146 | + Optionally, an operator class for |
| 147 | + <acronym>GIN</acronym> can supply a fifth method: |
| 148 | + </para> |
| 149 | + |
| 150 | + <variablelist> |
| 151 | + |
| 152 | + <varlistentry> |
| 153 | + <term>int comparePartial(Datum partial_key, Datum key, StrategyNumber n)</term> |
| 154 | + <listitem> |
| 155 | + <para> |
| 156 | + Compare a partial-match query to an index key. Returns an integer |
| 157 | + whose sign indicates the result: less than zero means the index key |
| 158 | + does not match the query, but the index scan should continue; zero |
| 159 | + means that the index key does match the query; greater than zero |
| 160 | + indicates that the index scan should stop because no more matches |
| 161 | + are possible. The strategy number <literal>n</> of the operator |
| 162 | + that generated the partial match query is provided, in case its |
| 163 | + semantics are needed to determine when to end the scan. |
| 164 | + </para> |
| 165 | + </listitem> |
| 166 | + </varlistentry> |
| 167 | + |
| 168 | + </variablelist> |
| 169 | + |
| 170 | + <para> |
| 171 | + To support <quote>partial match</> queries, an operator class must |
| 172 | + provide the <function>comparePartial</> method, and its |
| 173 | + <function>extractQuery</> method must set the <literal>pmatch</> |
| 174 | + parameter when a partial-match query is encountered. See |
| 175 | + <xref linkend="gin-partial-match"> for details. |
| 176 | + </para> |
| 177 | + |
136 | 178 | </sect1>
|
137 | 179 |
|
138 | 180 | <sect1 id="gin-implementation">
|
|
146 | 188 | list of heap pointers (PL, posting list) if the list is small enough.
|
147 | 189 | </para>
|
148 | 190 |
|
| 191 | + <sect2 id="gin-partial-match"> |
| 192 | + <title>Partial match algorithm</title> |
| 193 | + |
| 194 | + <para> |
| 195 | + GIN can support <quote>partial match</> queries, in which the query |
| 196 | + does not determine an exact match for one or more keys, but the possible |
| 197 | + matches fall within a reasonably narrow range of key values (within the |
| 198 | + key sorting order determined by the <function>compare</> support method). |
| 199 | + The <function>extractQuery</> method, instead of returning a key value |
| 200 | + to be matched exactly, returns a key value that is the lower bound of |
| 201 | + the range to be searched, and sets the <literal>pmatch</> flag true. |
| 202 | + The key range is then searched using the <function>comparePartial</> |
| 203 | + method. <function>comparePartial</> must return zero for an actual |
| 204 | + match, less than zero for a non-match that is still within the range |
| 205 | + to be searched, or greater than zero if the index key is past the range |
| 206 | + that could match. |
| 207 | + </para> |
| 208 | + |
| 209 | + <para> |
| 210 | + During a partial-match scan, all <literal>itemPointer</>s for matching keys |
| 211 | + are OR'ed into a <literal>TIDBitmap</>. |
| 212 | + The scan fails if the <literal>TIDBitmap</> becomes lossy. |
| 213 | + In this case an error message will be reported with advice |
| 214 | + to increase <literal>work_mem</>. |
| 215 | + </para> |
| 216 | + </sect2> |
| 217 | + |
149 | 218 | </sect1>
|
150 | 219 |
|
151 | 220 | <sect1 id="gin-tips">
|
|
236 | 305 | </para>
|
237 | 306 |
|
238 | 307 | <para>
|
239 |
| - <acronym>GIN</acronym> searches keys only by equality matching. This might |
240 |
| - be improved in future. |
| 308 | + It is possible for an operator class to circumvent the restriction against |
| 309 | + full index scan. To do that, <function>extractValue</> must return at least |
| 310 | + one (possibly dummy) key for every indexed value, and |
| 311 | + <function>extractQuery</function> must convert an unrestricted search into |
| 312 | + a partial-match query that will scan the whole index. This is inefficient |
| 313 | + but might be necessary to avoid corner-case failures with operators such |
| 314 | + as LIKE. Note however that failure could still occur if the intermediate |
| 315 | + <literal>TIDBitmap</> becomes lossy. |
241 | 316 | </para>
|
242 | 317 | </sect1>
|
243 | 318 |
|
|
247 | 322 | <para>
|
248 | 323 | The <productname>PostgreSQL</productname> source distribution includes
|
249 | 324 | <acronym>GIN</acronym> operator classes for <type>tsvector</> and
|
250 |
| - for one-dimensional arrays of all internal types. The following |
251 |
| - <filename>contrib</> modules also contain <acronym>GIN</acronym> |
252 |
| - operator classes: |
| 325 | + for one-dimensional arrays of all internal types. Prefix searching in |
| 326 | + <type>tsvector</> is implemented using the <acronym>GIN</> partial match |
| 327 | + feature. |
| 328 | + The following <filename>contrib</> modules also contain |
| 329 | + <acronym>GIN</acronym> operator classes: |
253 | 330 | </para>
|
254 | 331 |
|
255 | 332 | <variablelist>
|
|
0 commit comments