Commit b82323e

This adds mention of my latest tweak to the tsearch2/pg_trgm
integration. It is much better to create a word list of unstemmed words than stemmed ones. Chris K-L
1 parent c2e5631 commit b82323e

1 file changed: +9 -5 lines changed

contrib/pg_trgm/README.pg_trgm

Lines changed: 9 additions & 5 deletions
@@ -100,11 +100,15 @@ Tsearch2 Integration
 The first step is to generate an auxiliary table containing all
 the unique words in the Tsearch2 index:
 
-CREATE TABLE words AS
-SELECT word FROM stat('SELECT vector FROM documents');
-
-Where 'documents' is the table that contains the Tsearch2 index
-column 'vector', of type 'tsvector'.
+CREATE TABLE words AS SELECT word FROM
+stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
+
+Where 'documents' is a table that has a text field 'bodytext'
+that TSearch2 is used to search. The use of the 'simple' dictionary
+with the to_tsvector function, instead of just using the already
+existing vector is to avoid creating a list of already stemmed
+words. This way, only the original, unstemmed words are added
+to the word list.
 
 Next, create a trigram index on the word column:
 
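The rationale for the unstemmed word list can be illustrated with a small sketch. It is written in Python and is not part of the commit. It approximates pg_trgm's similarity measure for single alphanumeric words: lowercase the word, pad it with two leading spaces and one trailing space, and divide the number of shared trigrams by the number of distinct trigrams in either word. The function names, the misspelled query, and the two candidate words are illustrative assumptions; "connect" stands in for the stem that an English stemmer would typically store for "connections".

```python
def trigrams(word):
    """Return the set of trigrams for one word, padded the way pg_trgm pads."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical misspelled search term typed by a user.
query = "conections"

# Candidate taken from an unstemmed word list ('simple' dictionary).
unstemmed_candidate = "connections"
# Candidate taken from a stemmed word list (assumed stem of the same word).
stemmed_candidate = "connect"

print("unstemmed:", round(similarity(query, unstemmed_candidate), 3))
print("stemmed:  ", round(similarity(query, stemmed_candidate), 3))
```

Under these assumptions the misspelling shares far more trigrams with the full word than with its stem, so a word list built with the 'simple' dictionary should give closer matches, and the suggestion shown to the user is a word that actually occurs in the documents.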