Skip to content

Commit 10c70b8

Browse files
committed
Add note about space usage of 'manual' approach to clustering, per
suggestion from Sergey Koposov. Also some other minor editing.
1 parent 6fada49 commit 10c70b8

File tree

1 file changed

+20
-17
lines changed

1 file changed

+20
-17
lines changed

doc/src/sgml/ref/cluster.sgml

Lines changed: 20 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.37 2006/10/31 01:52:31 neilc Exp $
2+
$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.38 2006/11/04 19:03:51 tgl Exp $
33
PostgreSQL documentation
44
-->
55

@@ -108,8 +108,8 @@ CLUSTER
108108
If you are requesting a range of indexed values from a table, or a
109109
single indexed value that has multiple rows that match,
110110
<command>CLUSTER</command> will help because once the index identifies the
111-
heap page for the first row that matches, all other rows
112-
that match are probably already on the same heap page,
111+
table page for the first row that matches, all other rows
112+
that match are probably already on the same table page,
113113
and so you save disk accesses and speed up the query.
114114
</para>
115115

@@ -137,30 +137,33 @@ CLUSTER
137137

138138
<para>
139139
There is another way to cluster data. The
140-
<command>CLUSTER</command> command reorders the original table using
141-
the ordering of the index you specify. This can be slow
142-
on large tables because the rows are fetched from the heap
143-
in index order, and if the heap table is unordered, the
140+
<command>CLUSTER</command> command reorders the original table by
141+
scanning it using the index you specify. This can be slow
142+
on large tables because the rows are fetched from the table
143+
in index order, and if the table is disordered, the
144144
entries are on random pages, so there is one disk page
145-
retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache,
146-
but the majority of a big table will not fit in the cache.)
145+
retrieved for every row moved. (<productname>PostgreSQL</productname> has
146+
a cache, but the majority of a big table will not fit in the cache.)
147147
The other way to cluster a table is to use
148148

149149
<programlisting>
150150
CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
151-
SELECT <replaceable class="parameter">columnlist</replaceable> FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
151+
SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
152152
</programlisting>
153153

154-
which uses the <productname>PostgreSQL</productname> sorting code in
155-
the <literal>ORDER BY</literal> clause to create the desired order; this is usually much
156-
faster than an index scan for
157-
unordered data. You then drop the old table, use
154+
which uses the <productname>PostgreSQL</productname> sorting code
155+
to produce the desired order;
156+
this is usually much faster than an index scan for disordered data.
157+
Then you drop the old table, use
158158
<command>ALTER TABLE ... RENAME</command>
159-
to rename <replaceable class="parameter">newtable</replaceable> to the old name, and
160-
recreate the table's indexes. However, this approach does not preserve
159+
to rename <replaceable class="parameter">newtable</replaceable> to the
160+
old name, and recreate the table's indexes.
161+
The big disadvantage of this approach is that it does not preserve
161162
OIDs, constraints, foreign key relationships, granted privileges, and
162163
other ancillary properties of the table &mdash; all such items must be
163-
manually recreated.
164+
manually recreated. Another disadvantage is that this way requires a sort
165+
temporary file about the same size as the table itself, so peak disk usage
166+
is about three times the table size instead of twice the table size.
164167
</para>
165168
</refsect1>
166169

0 commit comments

Comments
 (0)