@@ -110,75 +110,71 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
110
110
FROM
111
111
generate_series(1,10000000);
112
112
SELECT 10000000
113
- =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
114
- CREATE INDEX
115
- =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
116
- pg_size_pretty
117
- ----------------
118
- 153 MB
119
- (1 row)
120
- =# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
121
- CREATE INDEX
122
- =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
123
- pg_size_pretty
124
- ----------------
125
- 387 MB
126
- (1 row)
127
113
</programlisting>
128
114
129
115
<para>
130
116
A sequential scan over this large table takes a long time:
131
117
<programlisting>
132
118
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
133
- QUERY PLAN
134
- ------------------------------------------------------------------------------------------------------------
135
- Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
119
+ EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
120
+ QUERY PLAN
121
+ ------------------------------------------------------------------------------------------------------
122
+ Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=14.894..14.895 rows=0 loops=1)
136
123
Filter: ((i2 = 898732) AND (i5 = 123451))
137
- Rows Removed by Filter: 10000000
138
- Planning time : 0.177 ms
139
- Execution time: 1445.473 ms
124
+ Rows Removed by Filter: 100000
125
+ Planning Time : 0.350 ms
126
+ Execution Time: 14.916 ms
140
127
(5 rows)
141
128
</programlisting>
142
129
</para>
143
130
144
131
<para>
145
- So the planner will usually select an index scan if possible.
146
- With a btree index, we get results like this :
132
+ Even with the btree index defined the result will still be a
133
+ sequential scan :
147
134
<programlisting>
135
+ =# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
136
+ CREATE INDEX
137
+ =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
138
+ pg_size_pretty
139
+ ----------------
140
+ 3992 kB
141
+ (1 row)
148
142
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
149
- QUERY PLAN
150
- --------------------------------------------------------------------------------------------------------------------------------
151
- Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
152
- Index Cond : ((i2 = 898732) AND (i5 = 123451))
153
- Heap Fetches: 0
154
- Planning time : 0.193 ms
155
- Execution time: 445.770 ms
143
+ QUERY PLAN
144
+ ------------------------------------------------------------------------------------------------------
145
+ Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.004..12.004 rows=0 loops=1)
146
+ Filter : ((i2 = 898732) AND (i5 = 123451))
147
+ Rows Removed by Filter: 100000
148
+ Planning Time : 0.157 ms
149
+ Execution Time: 12.019 ms
156
150
(5 rows)
157
151
</programlisting>
158
152
</para>
159
153
160
154
<para>
161
- Bloom is better than btree in handling this type of search:
155
+ Having the bloom index defined on the table is better than btree in
156
+ handling this type of search:
162
157
<programlisting>
158
+ =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
159
+ CREATE INDEX
160
+ =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
161
+ pg_size_pretty
162
+ ----------------
163
+ 1584 kB
164
+ (1 row)
163
165
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
164
- QUERY PLAN
165
- ---------------------------------------------------------------------------------------------------------------------------
166
- Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
166
+ QUERY PLAN
167
+ ---------------------------------------------------------------------------------------------------------------------
168
+ Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.390..0.391 rows=0 loops=1)
167
169
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
168
- Rows Removed by Index Recheck: 2439
169
- Heap Blocks: exact=2408
170
- -> Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
170
+ Rows Removed by Index Recheck: 19
171
+ Heap Blocks: exact=19
172
+ -> Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.361..0.361 rows=19 loops=1)
171
173
Index Cond: ((i2 = 898732) AND (i5 = 123451))
172
- Planning time : 0.475 ms
173
- Execution time: 76.778 ms
174
+ Planning Time : 0.128 ms
175
+ Execution Time: 0.415 ms
174
176
(8 rows)
175
177
</programlisting>
176
- Note the relatively large number of false positives: 2439 rows were
177
- selected to be visited in the heap, but none actually matched the
178
- query. We could reduce that by specifying a larger signature length.
179
- In this example, creating the index with <literal>length=200</>
180
- reduced the number of false positives to 55; but it doubled the index size
181
- (to 306 MB) and ended up being slower for this query (125 ms overall).
182
178
</para>
183
179
184
180
<para>
@@ -187,24 +183,36 @@ CREATE INDEX
187
183
A better strategy for btree is to create a separate index on each column.
188
184
Then the planner will choose something like this:
189
185
<programlisting>
186
+ =# CREATE INDEX btreeidx1 ON tbloom (i1);
187
+ CREATE INDEX
188
+ =# CREATE INDEX btreeidx2 ON tbloom (i2);
189
+ CREATE INDEX
190
+ =# CREATE INDEX btreeidx3 ON tbloom (i3);
191
+ CREATE INDEX
192
+ =# CREATE INDEX btreeidx4 ON tbloom (i4);
193
+ CREATE INDEX
194
+ =# CREATE INDEX btreeidx5 ON tbloom (i5);
195
+ CREATE INDEX
196
+ =# CREATE INDEX btreeidx6 ON tbloom (i6);
197
+ CREATE INDEX
190
198
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
191
- QUERY PLAN
192
- ------------------------------------------------------------------------------------------------------------------------------
193
- Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148 ..0.148 rows=0 loops=1)
199
+ QUERY PLAN
200
+ ---------------------------------------------------------------------------------------------------------------------------
201
+ Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.031 ..0.032 rows=0 loops=1)
194
202
Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
195
- -> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145 ..0.145 rows=0 loops=1)
196
- -> Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089 ..0.089 rows=10 loops=1)
203
+ -> BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.028 ..0.029 rows=0 loops=1)
204
+ -> Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.028 ..0.028 rows=0 loops=1)
197
205
Index Cond: (i5 = 123451)
198
- -> Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1 )
206
+ -> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed )
199
207
Index Cond: (i2 = 898732)
200
- Planning time: 2.049 ms
201
- Execution time : 0.280 ms
208
+ Planning Time: 0.474 ms
209
+ Execution Time : 0.064 ms
202
210
(9 rows)
203
211
</programlisting>
204
212
Although this query runs much faster than with either of the single
205
- indexes, we pay a large penalty in index size. Each of the single-column
206
- btree indexes occupies 214 MB, so the total space needed is over 1.2GB ,
207
- more than 8 times the space used by the bloom index.
213
+ indexes, we pay a penalty in index size. Each of the single-column
214
+ btree indexes occupies 2 MB, so the total space needed is 12 MB ,
215
+ eight times the space used by the bloom index.
208
216
</para>
209
217
</sect2>
210
218
0 commit comments