Skip to content

Commit 3714665

Browse files
committed
doc: make blooms docs match reality
Parallel execution changed the way bloom queries are executed, so update the EXPLAIN output, and restructure the docs to be clearer and more accurate. Reported-by: Daniel Westermann Discussion: https://postgr.es/m/ZR0P278MB0122119FAE78721A694C30C8D2340@ZR0P278MB0122.CHEP278.PROD.OUTLOOK.COM Author: Daniel Westermann and me Backpatch-through: 9.6
1 parent f38b66e commit 3714665

File tree

1 file changed

+63
-55
lines changed

1 file changed

+63
-55
lines changed

doc/src/sgml/bloom.sgml

Lines changed: 63 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -110,75 +110,71 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
110110
FROM
111111
generate_series(1,10000000);
112112
SELECT 10000000
113-
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
114-
CREATE INDEX
115-
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
116-
pg_size_pretty
117-
----------------
118-
153 MB
119-
(1 row)
120-
=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
121-
CREATE INDEX
122-
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
123-
pg_size_pretty
124-
----------------
125-
387 MB
126-
(1 row)
127113
</programlisting>
128114

129115
<para>
130116
A sequential scan over this large table takes a long time:
131117
<programlisting>
132118
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
133-
QUERY PLAN
134-
------------------------------------------------------------------------------------------------------------
135-
Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
119+
EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
120+
QUERY PLAN
121+
------------------------------------------------------------------------------------------------------
122+
Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=14.894..14.895 rows=0 loops=1)
136123
Filter: ((i2 = 898732) AND (i5 = 123451))
137-
Rows Removed by Filter: 10000000
138-
Planning time: 0.177 ms
139-
Execution time: 1445.473 ms
124+
Rows Removed by Filter: 100000
125+
Planning Time: 0.350 ms
126+
Execution Time: 14.916 ms
140127
(5 rows)
141128
</programlisting>
142129
</para>
143130

144131
<para>
145-
So the planner will usually select an index scan if possible.
146-
With a btree index, we get results like this:
132+
Even with the btree index defined the result will still be a
133+
sequential scan:
147134
<programlisting>
135+
=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
136+
CREATE INDEX
137+
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
138+
pg_size_pretty
139+
----------------
140+
3992 kB
141+
(1 row)
148142
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
149-
QUERY PLAN
150-
--------------------------------------------------------------------------------------------------------------------------------
151-
Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
152-
Index Cond: ((i2 = 898732) AND (i5 = 123451))
153-
Heap Fetches: 0
154-
Planning time: 0.193 ms
155-
Execution time: 445.770 ms
143+
QUERY PLAN
144+
------------------------------------------------------------------------------------------------------
145+
Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.004..12.004 rows=0 loops=1)
146+
Filter: ((i2 = 898732) AND (i5 = 123451))
147+
Rows Removed by Filter: 100000
148+
Planning Time: 0.157 ms
149+
Execution Time: 12.019 ms
156150
(5 rows)
157151
</programlisting>
158152
</para>
159153

160154
<para>
161-
Bloom is better than btree in handling this type of search:
155+
Having the bloom index defined on the table is better than btree in
156+
handling this type of search:
162157
<programlisting>
158+
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
159+
CREATE INDEX
160+
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
161+
pg_size_pretty
162+
----------------
163+
1584 kB
164+
(1 row)
163165
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
164-
QUERY PLAN
165-
---------------------------------------------------------------------------------------------------------------------------
166-
Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
166+
QUERY PLAN
167+
---------------------------------------------------------------------------------------------------------------------
168+
Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.390..0.391 rows=0 loops=1)
167169
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
168-
Rows Removed by Index Recheck: 2439
169-
Heap Blocks: exact=2408
170-
-&gt; Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
170+
Rows Removed by Index Recheck: 19
171+
Heap Blocks: exact=19
172+
-&gt; Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.361..0.361 rows=19 loops=1)
171173
Index Cond: ((i2 = 898732) AND (i5 = 123451))
172-
Planning time: 0.475 ms
173-
Execution time: 76.778 ms
174+
Planning Time: 0.128 ms
175+
Execution Time: 0.415 ms
174176
(8 rows)
175177
</programlisting>
176-
Note the relatively large number of false positives: 2439 rows were
177-
selected to be visited in the heap, but none actually matched the
178-
query. We could reduce that by specifying a larger signature length.
179-
In this example, creating the index with <literal>length=200</>
180-
reduced the number of false positives to 55; but it doubled the index size
181-
(to 306 MB) and ended up being slower for this query (125 ms overall).
182178
</para>
183179

184180
<para>
@@ -187,24 +183,36 @@ CREATE INDEX
187183
A better strategy for btree is to create a separate index on each column.
188184
Then the planner will choose something like this:
189185
<programlisting>
186+
=# CREATE INDEX btreeidx1 ON tbloom (i1);
187+
CREATE INDEX
188+
=# CREATE INDEX btreeidx2 ON tbloom (i2);
189+
CREATE INDEX
190+
=# CREATE INDEX btreeidx3 ON tbloom (i3);
191+
CREATE INDEX
192+
=# CREATE INDEX btreeidx4 ON tbloom (i4);
193+
CREATE INDEX
194+
=# CREATE INDEX btreeidx5 ON tbloom (i5);
195+
CREATE INDEX
196+
=# CREATE INDEX btreeidx6 ON tbloom (i6);
197+
CREATE INDEX
190198
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
191-
QUERY PLAN
192-
------------------------------------------------------------------------------------------------------------------------------
193-
Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1)
199+
QUERY PLAN
200+
---------------------------------------------------------------------------------------------------------------------------
201+
Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.031..0.032 rows=0 loops=1)
194202
Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
195-
-&gt; BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1)
196-
-&gt; Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1)
203+
-&gt; BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.028..0.029 rows=0 loops=1)
204+
-&gt; Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.028..0.028 rows=0 loops=1)
197205
Index Cond: (i5 = 123451)
198-
-&gt; Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1)
206+
-&gt; Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed)
199207
Index Cond: (i2 = 898732)
200-
Planning time: 2.049 ms
201-
Execution time: 0.280 ms
208+
Planning Time: 0.474 ms
209+
Execution Time: 0.064 ms
202210
(9 rows)
203211
</programlisting>
204212
Although this query runs much faster than with either of the single
205-
indexes, we pay a large penalty in index size. Each of the single-column
206-
btree indexes occupies 214 MB, so the total space needed is over 1.2GB,
207-
more than 8 times the space used by the bloom index.
213+
indexes, we pay a penalty in index size. Each of the single-column
214+
btree indexes occupies 2 MB, so the total space needed is 12 MB,
215+
eight times the space used by the bloom index.
208216
</para>
209217
</sect2>
210218

0 commit comments

Comments
 (0)