Skip to content

Commit 014a86a

Browse files
committed
Editorial improvements.
1 parent 74ce5c9 commit 014a86a

File tree

1 file changed

+38
-50
lines changed

1 file changed

+38
-50
lines changed

doc/src/sgml/ref/cluster.sgml

Lines changed: 38 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.18 2002/08/10 21:03:33 momjian Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.19 2002/08/11 02:43:57 tgl Exp $
33
PostgreSQL documentation
44
-->
55

@@ -73,19 +73,6 @@ CLUSTER
7373
</para>
7474
</listitem>
7575
</varlistentry>
76-
<varlistentry>
77-
<term><computeroutput>
78-
ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exist!
79-
</computeroutput></term>
80-
<listitem>
81-
<para>
82-
<comment>
83-
The specified relation was not shown in the error message,
84-
which contained a random string instead of the relation name.
85-
</comment>
86-
</para>
87-
</listitem>
88-
</varlistentry>
8976
</variablelist>
9077
</para>
9178
</refsect2>
@@ -101,7 +88,7 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
10188
<para>
10289
<command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
10390
to cluster the table specified
104-
by <replaceable class="parameter">table</replaceable> approximately
91+
by <replaceable class="parameter">table</replaceable>
10592
based on the index specified by
10693
<replaceable class="parameter">indexname</replaceable>. The index must
10794
already have been defined on
@@ -110,11 +97,11 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
11097

11198
<para>
11299
When a table is clustered, it is physically reordered
113-
based on the index information. The clustering is static.
114-
In other words, as the table is updated, the changes are
115-
not clustered. No attempt is made to keep new instances or
116-
updated tuples clustered. If one wishes, one can
117-
re-cluster manually by issuing the command again.
100+
based on the index information. Clustering is a one-time operation:
101+
when the table is subsequently updated, the changes are
102+
not clustered. That is, no attempt is made to store new or
103+
updated tuples according to their index order. If one wishes, one can
104+
periodically re-cluster by issuing the command again.
118105
</para>
119106

120107
<refsect2 id="R2-SQL-CLUSTER-3">
@@ -146,49 +133,50 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
146133
</para>
147134

148135
<para>
149-
There are two ways to cluster data. The first is with the
150-
<command>CLUSTER</command> command, which reorders the original table with
136+
During the cluster operation, a temporary copy of the table is created
137+
that contains the table data in the index order. Temporary copies of
138+
each index on the table are created as well. Therefore, you need free
139+
space on disk at least equal to the sum of the table size and the index
140+
sizes.
141+
</para>
142+
143+
<para>
144+
CLUSTER preserves GRANT, inheritance, index, foreign key, and other
145+
ancillary information about the table.
146+
</para>
147+
148+
<para>
149+
Because the optimizer records statistics about the ordering of tables, it
150+
is advisable to run <command>ANALYZE</command> on the newly clustered
151+
table. Otherwise, the optimizer may make poor choices of query plans.
152+
</para>
153+
154+
<para>
155+
There is another way to cluster data. The
156+
<command>CLUSTER</command> command reorders the original table using
151157
the ordering of the index you specify. This can be slow
152158
on large tables because the rows are fetched from the heap
153159
in index order, and if the heap table is unordered, the
154160
entries are on random pages, so there is one disk page
155-
retrieved for every row moved. <productname>PostgreSQL</productname> has a cache,
156-
but the majority of a big table will not fit in the cache.
157-
</para>
158-
159-
<para>
160-
Another way to cluster data is to use
161+
retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache,
162+
but the majority of a big table will not fit in the cache.)
163+
The other way to cluster a table is to use
161164

162165
<programlisting>
163166
SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <replaceable class="parameter">newtable</replaceable>
164167
FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>
165168
</programlisting>
166169

167170
which uses the <productname>PostgreSQL</productname> sorting code in
168-
the ORDER BY clause to match the index, and which is much faster for
171+
the ORDER BY clause to create the desired order; this is usually much
172+
faster than an indexscan for
169173
unordered data. You then drop the old table, use
170174
<command>ALTER TABLE...RENAME</command>
171175
to rename <replaceable class="parameter">newtable</replaceable> to the old name, and
172-
recreate the table's indexes. The only problem is that <acronym>OID</acronym>s
173-
will not be preserved. From then on, <command>CLUSTER</command> should be
174-
fast because most of the heap data has already been
175-
ordered, and the existing index is used.
176-
</para>
177-
178-
<para>
179-
During the cluster operation, a temporal table is created that contains
180-
the table in the index order. Due to this, you need to have free space
181-
on disk at least the size of the table itself, or the biggest index if
182-
you have one that is larger than the table.
183-
</para>
184-
185-
<para>
186-
CLUSTER preserves GRANT, inheritance index, and foreign key information.
187-
</para>
188-
189-
<para>
190-
Because the optimizer records the cluster status of tables, it is
191-
advised to run <command>ANALYZE</command> on the newly clustered table.
176+
recreate the table's indexes. However, this approach does not preserve
177+
OIDs, constraints, foreign key relationships, granted privileges, and
178+
other ancillary properties of the table --- all such items must be
179+
manually recreated.
192180
</para>
193181

194182
</refsect2>
@@ -199,7 +187,7 @@ SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <repla
199187
Usage
200188
</title>
201189
<para>
202-
Cluster the employees relation on the basis of its salary attribute:
190+
Cluster the employees relation on the basis of its ID attribute:
203191
</para>
204192
<programlisting>
205193
CLUSTER emp_ind ON emp;

0 commit comments

Comments
 (0)