Skip to content

Commit f5bc741

Browse files
committed
Make GEQO's planning deterministic by having it start from a predictable
random number seed each time. This is how it used to work years ago, but we got rid of the seed reset because it was resetting the main random() sequence and thus having undesirable effects on the rest of the system. To fix, establish a private random number state for each execution of geqo(), and initialize the state using the new GUC variable geqo_seed. People who want to experiment with different random searches can do so by changing geqo_seed, but you'll always get the same plan for the same value of geqo_seed (if holding all other planner inputs constant, of course). The new state is kept in PlannerInfo by adding a "void *" field reserved for use by join_search hooks. Most of the rather bulky code changes in this commit are just arranging to pass PlannerInfo around to all the GEQO functions (many of which formerly didn't receive it). Andres Freund, with some editorialization by Tom
1 parent c43feef commit f5bc741

27 files changed

+305
-202
lines changed

doc/src/sgml/config.sgml

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.221 2009/07/03 19:14:25 petere Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.222 2009/07/16 20:55:44 tgl Exp $ -->
22

33
<chapter Id="runtime-config">
44
<title>Server Configuration</title>
@@ -2149,7 +2149,23 @@ archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows
21492149
</para>
21502150
</listitem>
21512151
</varlistentry>
2152-
2152+
2153+
<varlistentry id="guc-geqo-seed" xreflabel="geqo_seed">
2154+
<term><varname>geqo_seed</varname> (<type>floating point</type>)</term>
2155+
<indexterm>
2156+
<primary><varname>geqo_seed</> configuration parameter</primary>
2157+
</indexterm>
2158+
<listitem>
2159+
<para>
2160+
Controls the initial value of the random number generator used
2161+
by GEQO to select random paths through the join order search space.
2162+
The value can range from zero (the default) to one. Varying the
2163+
value changes the set of join paths explored, and may result in a
2164+
better or worse best path being found.
2165+
</para>
2166+
</listitem>
2167+
</varlistentry>
2168+
21532169
</variablelist>
21542170
</sect2>
21552171
<sect2 id="runtime-config-query-other">

doc/src/sgml/geqo.sgml

Lines changed: 12 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.40 2007/07/21 04:02:41 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.41 2009/07/16 20:55:44 tgl Exp $ -->
22

33
<chapter id="geqo">
44
<chapterinfo>
@@ -49,7 +49,7 @@
4949
methods</firstterm> (e.g., nested loop, hash join, merge join in
5050
<productname>PostgreSQL</productname>) to process individual joins
5151
and a diversity of <firstterm>indexes</firstterm> (e.g.,
52-
B-tree, hash, GiST and GIN in <productname>PostgreSQL</productname>) as
52+
B-tree, hash, GiST and GIN in <productname>PostgreSQL</productname>) as
5353
access paths for relations.
5454
</para>
5555

@@ -88,8 +88,7 @@
8888

8989
<para>
9090
The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
91-
operates through
92-
nondeterministic, randomized search. The set of possible solutions for the
91+
operates through randomized search. The set of possible solutions for the
9392
optimization problem is considered as a
9493
<firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
9594
The degree of adaptation of an individual to its environment is specified
@@ -116,7 +115,7 @@
116115
According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
117116
strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
118117
problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
119-
non-random (better than random).
118+
non-random (better than random).
120119
</para>
121120

122121
<figure id="geqo-diagram">
@@ -260,9 +259,13 @@
260259
<para>
261260
This process is inherently nondeterministic, because of the randomized
262261
choices made during both the initial population selection and subsequent
263-
<quote>mutation</> of the best candidates. Hence different plans may
264-
be selected from one run to the next, resulting in varying run time
265-
and varying output row order.
262+
<quote>mutation</> of the best candidates. To avoid surprising changes
263+
of the selected plan, each run of the GEQO algorithm restarts its
264+
random number generator with the current <xref linkend="guc-geqo-seed">
265+
parameter setting. As long as <varname>geqo_seed</> and the other
266+
GEQO parameters are kept fixed, the same plan will be generated for a
267+
given query (and other planner inputs such as statistics). To experiment
268+
with different search paths, try changing <varname>geqo_seed</>.
266269
</para>
267270

268271
</sect2>
@@ -330,7 +333,7 @@
330333
url="news://comp.ai.genetic"></ulink>)
331334
</para>
332335
</listitem>
333-
336+
334337
<listitem>
335338
<para>
336339
<ulink url="http://www.red3d.com/cwr/evolve.html">

src/backend/optimizer/geqo/Makefile

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
#
66
# Copyright (c) 1994, Regents of the University of California
77
#
8-
# $PostgreSQL: pgsql/src/backend/optimizer/geqo/Makefile,v 1.20 2008/02/19 10:30:07 petere Exp $
8+
# $PostgreSQL: pgsql/src/backend/optimizer/geqo/Makefile,v 1.21 2009/07/16 20:55:44 tgl Exp $
99
#
1010
#-------------------------------------------------------------------------
1111

@@ -14,7 +14,7 @@ top_builddir = ../../../..
1414
include $(top_builddir)/src/Makefile.global
1515

1616
OBJS = geqo_copy.o geqo_eval.o geqo_main.o geqo_misc.o \
17-
geqo_mutation.o geqo_pool.o geqo_recombination.o \
17+
geqo_mutation.o geqo_pool.o geqo_random.o geqo_recombination.o \
1818
geqo_selection.o \
1919
geqo_erx.o geqo_pmx.o geqo_cx.o geqo_px.o geqo_ox1.o geqo_ox2.o
2020

src/backend/optimizer/geqo/geqo_copy.c

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
* Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
66
* Portions Copyright (c) 1994, Regents of the University of California
77
*
8-
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_copy.c,v 1.19 2009/01/01 17:23:43 momjian Exp $
8+
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_copy.c,v 1.20 2009/07/16 20:55:44 tgl Exp $
99
*
1010
*-------------------------------------------------------------------------
1111
*/
@@ -42,7 +42,8 @@
4242
*
4343
*/
4444
void
45-
geqo_copy(Chromosome *chromo1, Chromosome *chromo2, int string_length)
45+
geqo_copy(PlannerInfo *root, Chromosome *chromo1, Chromosome *chromo2,
46+
int string_length)
4647
{
4748
int i;
4849

src/backend/optimizer/geqo/geqo_cx.c

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
* CX operator according to Oliver et al
77
* (Proc 2nd Int'l Conf on GA's)
88
*
9-
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_cx.c,v 1.10 2003/11/29 22:39:49 pgsql Exp $
9+
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_cx.c,v 1.11 2009/07/16 20:55:44 tgl Exp $
1010
*
1111
*-------------------------------------------------------------------------
1212
*/
@@ -44,7 +44,8 @@
4444
* cycle crossover
4545
*/
4646
int
47-
cx(Gene *tour1, Gene *tour2, Gene *offspring, int num_gene, City *city_table)
47+
cx(PlannerInfo *root, Gene *tour1, Gene *tour2, Gene *offspring,
48+
int num_gene, City *city_table)
4849
{
4950

5051
int i,
@@ -62,7 +63,7 @@ cx(Gene *tour1, Gene *tour2, Gene *offspring, int num_gene, City *city_table)
6263
}
6364

6465
/* choose random cycle starting position */
65-
start_pos = geqo_randint(num_gene - 1, 0);
66+
start_pos = geqo_randint(root, num_gene - 1, 0);
6667

6768
/* child inherits first city */
6869
offspring[start_pos] = tour1[start_pos];

src/backend/optimizer/geqo/geqo_erx.c

Lines changed: 26 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
* geqo_erx.c
44
* edge recombination crossover [ER]
55
*
6-
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_erx.c,v 1.20 2005/10/15 02:49:19 momjian Exp $
6+
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_erx.c,v 1.21 2009/07/16 20:55:44 tgl Exp $
77
*
88
*-------------------------------------------------------------------------
99
*/
@@ -36,11 +36,11 @@
3636
#include "optimizer/geqo_random.h"
3737

3838

39-
static int gimme_edge(Gene gene1, Gene gene2, Edge *edge_table);
40-
static void remove_gene(Gene gene, Edge edge, Edge *edge_table);
41-
static Gene gimme_gene(Edge edge, Edge *edge_table);
39+
static int gimme_edge(PlannerInfo *root, Gene gene1, Gene gene2, Edge *edge_table);
40+
static void remove_gene(PlannerInfo *root, Gene gene, Edge edge, Edge *edge_table);
41+
static Gene gimme_gene(PlannerInfo *root, Edge edge, Edge *edge_table);
4242

43-
static Gene edge_failure(Gene *gene, int index, Edge *edge_table, int num_gene);
43+
static Gene edge_failure(PlannerInfo *root, Gene *gene, int index, Edge *edge_table, int num_gene);
4444

4545

4646
/* alloc_edge_table
@@ -50,7 +50,7 @@ static Gene edge_failure(Gene *gene, int index, Edge *edge_table, int num_gene);
5050
*/
5151

5252
Edge *
53-
alloc_edge_table(int num_gene)
53+
alloc_edge_table(PlannerInfo *root, int num_gene)
5454
{
5555
Edge *edge_table;
5656

@@ -70,7 +70,7 @@ alloc_edge_table(int num_gene)
7070
*
7171
*/
7272
void
73-
free_edge_table(Edge *edge_table)
73+
free_edge_table(PlannerInfo *root, Edge *edge_table)
7474
{
7575
pfree(edge_table);
7676
}
@@ -89,7 +89,8 @@ free_edge_table(Edge *edge_table)
8989
*
9090
*/
9191
float
92-
gimme_edge_table(Gene *tour1, Gene *tour2, int num_gene, Edge *edge_table)
92+
gimme_edge_table(PlannerInfo *root, Gene *tour1, Gene *tour2,
93+
int num_gene, Edge *edge_table)
9394
{
9495
int i,
9596
index1,
@@ -121,11 +122,11 @@ gimme_edge_table(Gene *tour1, Gene *tour2, int num_gene, Edge *edge_table)
121122
* twice per edge
122123
*/
123124

124-
edge_total += gimme_edge(tour1[index1], tour1[index2], edge_table);
125-
gimme_edge(tour1[index2], tour1[index1], edge_table);
125+
edge_total += gimme_edge(root, tour1[index1], tour1[index2], edge_table);
126+
gimme_edge(root, tour1[index2], tour1[index1], edge_table);
126127

127-
edge_total += gimme_edge(tour2[index1], tour2[index2], edge_table);
128-
gimme_edge(tour2[index2], tour2[index1], edge_table);
128+
edge_total += gimme_edge(root, tour2[index1], tour2[index2], edge_table);
129+
gimme_edge(root, tour2[index2], tour2[index1], edge_table);
129130
}
130131

131132
/* return average number of edges per index */
@@ -147,7 +148,7 @@ gimme_edge_table(Gene *tour1, Gene *tour2, int num_gene, Edge *edge_table)
147148
* 0 if edge was already registered and edge_table is unchanged
148149
*/
149150
static int
150-
gimme_edge(Gene gene1, Gene gene2, Edge *edge_table)
151+
gimme_edge(PlannerInfo *root, Gene gene1, Gene gene2, Edge *edge_table)
151152
{
152153
int i;
153154
int edges;
@@ -189,13 +190,13 @@ gimme_edge(Gene gene1, Gene gene2, Edge *edge_table)
189190
*
190191
*/
191192
int
192-
gimme_tour(Edge *edge_table, Gene *new_gene, int num_gene)
193+
gimme_tour(PlannerInfo *root, Edge *edge_table, Gene *new_gene, int num_gene)
193194
{
194195
int i;
195196
int edge_failures = 0;
196197

197-
new_gene[0] = (Gene) geqo_randint(num_gene, 1); /* choose int between 1
198-
* and num_gene */
198+
/* choose int between 1 and num_gene */
199+
new_gene[0] = (Gene) geqo_randint(root, num_gene, 1);
199200

200201
for (i = 1; i < num_gene; i++)
201202
{
@@ -204,18 +205,18 @@ gimme_tour(Edge *edge_table, Gene *new_gene, int num_gene)
204205
* table
205206
*/
206207

207-
remove_gene(new_gene[i - 1], edge_table[(int) new_gene[i - 1]], edge_table);
208+
remove_gene(root, new_gene[i - 1], edge_table[(int) new_gene[i - 1]], edge_table);
208209

209210
/* find destination for the newly entered point */
210211

211212
if (edge_table[new_gene[i - 1]].unused_edges > 0)
212-
new_gene[i] = gimme_gene(edge_table[(int) new_gene[i - 1]], edge_table);
213+
new_gene[i] = gimme_gene(root, edge_table[(int) new_gene[i - 1]], edge_table);
213214

214215
else
215216
{ /* cope with fault */
216217
edge_failures++;
217218

218-
new_gene[i] = edge_failure(new_gene, i - 1, edge_table, num_gene);
219+
new_gene[i] = edge_failure(root, new_gene, i - 1, edge_table, num_gene);
219220
}
220221

221222
/* mark this node as incorporated */
@@ -235,7 +236,7 @@ gimme_tour(Edge *edge_table, Gene *new_gene, int num_gene)
235236
*
236237
*/
237238
static void
238-
remove_gene(Gene gene, Edge edge, Edge *edge_table)
239+
remove_gene(PlannerInfo *root, Gene gene, Edge edge, Edge *edge_table)
239240
{
240241
int i,
241242
j;
@@ -277,7 +278,7 @@ remove_gene(Gene gene, Edge edge, Edge *edge_table)
277278
*
278279
*/
279280
static Gene
280-
gimme_gene(Edge edge, Edge *edge_table)
281+
gimme_gene(PlannerInfo *root, Edge edge, Edge *edge_table)
281282
{
282283
int i;
283284
Gene friend;
@@ -340,7 +341,7 @@ gimme_gene(Edge edge, Edge *edge_table)
340341

341342

342343
/* random decision of the possible candidates to use */
343-
rand_decision = (int) geqo_randint(minimum_count - 1, 0);
344+
rand_decision = geqo_randint(root, minimum_count - 1, 0);
344345

345346

346347
for (i = 0; i < edge.unused_edges; i++)
@@ -368,7 +369,7 @@ gimme_gene(Edge edge, Edge *edge_table)
368369
*
369370
*/
370371
static Gene
371-
edge_failure(Gene *gene, int index, Edge *edge_table, int num_gene)
372+
edge_failure(PlannerInfo *root, Gene *gene, int index, Edge *edge_table, int num_gene)
372373
{
373374
int i;
374375
Gene fail_gene = gene[index];
@@ -401,7 +402,7 @@ edge_failure(Gene *gene, int index, Edge *edge_table, int num_gene)
401402
if (four_count != 0)
402403
{
403404

404-
rand_decision = (int) geqo_randint(four_count - 1, 0);
405+
rand_decision = geqo_randint(root, four_count - 1, 0);
405406

406407
for (i = 1; i <= num_gene; i++)
407408
{
@@ -423,7 +424,7 @@ edge_failure(Gene *gene, int index, Edge *edge_table, int num_gene)
423424
else if (remaining_edges != 0)
424425
{
425426
/* random decision of the gene with remaining edges */
426-
rand_decision = (int) geqo_randint(remaining_edges - 1, 0);
427+
rand_decision = geqo_randint(root, remaining_edges - 1, 0);
427428

428429
for (i = 1; i <= num_gene; i++)
429430
{

0 commit comments

Comments
 (0)