2. A multilevel algorithm for partitioning.
As mentioned, the graph partitioning problem is NP-complete, and we therefore
should not expect to solve it in polynomial time. We instead seek an approximate
solution with acceptable execution time. One common approach to the general problem
is to reduce it to a sequence of bisection steps. That is, first divide the graph into
two pieces, then recursively bisect the two subpieces independently. This approach
can generate an arbitrary number of equal-sized sets if the bisections aren't required
to divide evenly. For example, if nine sets are desired, the first cut should divide into
pieces of size 4/9 and 5/9, and then recursively divide the two pieces until there are nine
total. Unfortunately, graph bisection is also NP-complete, so this simplification doesn't
resolve the intractability of the original problem. Also, this simplification has several
inherent shortcomings. First, bisection algorithms are unable to accept a less attractive
initial cut that will lead to net savings in later cuts. Second, the application may require
the minimization of a more complicated function in which edges between some sets are
more costly than edges between others. For example, in parallel computing the overhead
due to a message may depend upon how far apart the communicating processors are.
Bisection algorithms have a difficult time minimizing this kind of metric. In fact, graphs
can be constructed on which an optimal bisection algorithm will do badly. All other
things being equal, it is preferable to divide into as many sets at once as possible so as
to limit the depth of the recursion. Our algorithm allows for this generality.
As mentioned above, there are three different stages to our multilevel graph partitioning
algorithm. First, a sequence of smaller and smaller graphs is created from the
original graph. Second, the smallest graph in the sequence is partitioned carefully. And
third, the partition is propagated back through the sequence of grids, with an occasional
local refinement. This structure is sketched in Fig. 2, and the different stages are discussed
in detail below. The algorithm due to Bui and Jones [2, 15] has a similar structure, but
differs in the details. In particular, unlike their algorithm, ours uses edge and vertex
weights¹ to preserve in the coarse graphs as much structure of the original graph as
possible (see detailed explanation below). This permits application of the algorithm
to weighted graphs and natural extensions to other contexts, for example that of the
terminal propagation technique used in VLSI layout [4, 14].
Coarsening is advantageous because the number of possible partitions grows
exponentially with the number of vertices in the graph, so a good partition of the coarse
graph is much easier to find than a good partition of the original one. The price paid
for this reduction in complexity is that only a small number of the possible fine graph
partitions are represented, and therefore examinable, on the coarse graph. However,
by working on different levels, the local refinement scheme improves the partition on
multiple scales. Although the partition of the coarse graph may not directly induce a
very good partition of the original graph, the repeated application of the local refinement
largely resolves this problem.
¹ The use of edge and vertex weights can be important in some applications. For example, in circuit
layouts weighted graphs are often used to model net lists, and in parallel computing they can model
inhomogeneous computation and communication.
(1) Until graph is small enough
        graph := coarsen(graph)
(2) Partition graph
(3) Until graph = original graph
        graph := uncoarsen(graph)
        partition := uncoarsen(partition)
        locally refine partition if desired

Fig. 2. A multilevel algorithm for graph partitioning.
2.1. Constructing a coarse graph. The fundamental step in our coarsening
scheme is the edge contraction operation. In this step, two vertices joined by an edge
are merged, and the new vertex retains edges to the union of the neighbors of the
merged vertices. The weight of the new vertex is set equal to the sum of the weights of
its constituent vertices. Edge weights are left unchanged unless both merged vertices
are adjacent to the same neighbor. In this case the new edge that represents the two
original edges is given a weight equal to the sum of the weights of the two edges it
replaces. So, for example, contracting one edge of a triangle with unit edge and vertex
weights would produce a single edge of weight two joining vertices of weight one and
two.
A coarsening step in our algorithm requires the contraction of a large number of
edges that are well dispersed throughout the graph. This can be accomplished by first
finding a maximal matching, that is, a maximal set of edges no two of which are incident
on the same vertex. Maximal matchings can be generated quickly using a depth-first
search or a randomized algorithm. In our implementation, the vertices are visited in
a random order; if a vertex is not already matched, a random unmatched neighbor is
selected and the connecting edge is added to the matching. This simple algorithm requires
time proportional to the number of edges in the graph.
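For concreteness, here is a minimal C sketch of this randomized matching step. The
compressed-adjacency graph layout and all identifiers are our own illustrative choices,
not taken from Chaco.

    #include <stdlib.h>

    /* Weighted graph in compressed adjacency form (illustrative layout):
       the neighbors of vertex v are adjncy[xadj[v] .. xadj[v+1]-1], and
       ewgts[]/vwgts[] hold the corresponding edge and vertex weights. */
    typedef struct {
        int nvtxs;            /* number of vertices */
        int *xadj;            /* neighbor-list offsets, size nvtxs+1 */
        int *adjncy;          /* concatenated neighbor lists */
        int *vwgts, *ewgts;   /* vertex and edge weights */
    } Graph;

    /* Randomized maximal matching: visit vertices in random order and
       pair each unmatched vertex with a random unmatched neighbor.  On
       return, match[v] is v's partner or -1; runs in time linear in the
       number of edges. */
    static int random_maximal_matching(const Graph *g, int *match)
    {
        int n = g->nvtxs, nmatched = 0, i, v, j;
        int *order = malloc(n * sizeof(int));

        for (v = 0; v < n; v++) { match[v] = -1; order[v] = v; }

        /* Fisher-Yates shuffle gives the random visiting order. */
        for (v = n - 1; v > 0; v--) {
            int r = rand() % (v + 1);
            int t = order[v]; order[v] = order[r]; order[r] = t;
        }

        for (i = 0; i < n; i++) {
            v = order[i];
            if (match[v] != -1) continue;          /* already matched */

            int nfree = 0;                         /* unmatched neighbors */
            for (j = g->xadj[v]; j < g->xadj[v+1]; j++)
                if (match[g->adjncy[j]] == -1) nfree++;
            if (nfree == 0) continue;              /* v stays unmatched */

            int pick = rand() % nfree;             /* choose one at random */
            for (j = g->xadj[v]; j < g->xadj[v+1]; j++) {
                int u = g->adjncy[j];
                if (match[u] == -1 && pick-- == 0) {
                    match[v] = u; match[u] = v; nmatched++;
                    break;
                }
            }
        }
        free(order);
        return nmatched;
    }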
Our algorithm for graph coarsening is sketched in Fig. 3. First, a maximal
matching in the graph is identified. Each edge in the matching is then contracted, and
vertex and edge weights are adjusted as previously described. This coarsening procedure
has the property that each vertex in the fine graph is mapped to a unique vertex in the
coarse graph.
This coarsening procedure has a number of attractive properties. First, any partition
of the coarse graph corresponds naturally to a partition of the fine graph. Second, since
vertex weights are summed, constraints on the set sizes that depend just on the number
of vertices in a set are preserved in a weighted sense in the coarse graph. Third, since
edge weights are combined, any linear penalty function on the edges crossing between
partitions will be preserved as a weighted metric under the coarsening process. Thus for
most metrics of partition quality, a good partition of the coarse graph will correspond
to a good partition of the fine graph. Lastly, this coarsening algorithm is fast. The
entire operation can be implemented in time proportional to the number of edges in the
original graph.

Find a maximal matching in the graph
For each matching edge (i, j)
    Contract edge to form new vertex v
    Vertex weight(v) := weight(i) + weight(j)
    If i and j are both adjacent to a vertex k Then
        Edge weight(v, k) := weight(i, k) + weight(j, k)

Fig. 3. Constructing a coarse approximation to a graph.
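The contraction step itself can be sketched in C as follows, reusing the Graph layout
and match[] array from the matching sketch above. The per-row scatter array seen[]
is what keeps the merging of parallel edges linear in the number of edges. Again, all
names here are illustrative, not Chaco's.

    #include <stdlib.h>
    /* Reuses the Graph struct from the matching sketch above. */

    /* Contract all matched pairs: match[v] is v's partner or -1, and
       cmap[v] receives the coarse vertex that v maps to.  Parallel edges
       that arise are merged, with their weights summed as described. */
    static Graph *contract(const Graph *g, const int *match, int *cmap)
    {
        int v, j, cn = 0, nz = 0;

        /* Number the coarse vertices; a matched pair shares one number. */
        for (v = 0; v < g->nvtxs; v++)
            if (match[v] == -1 || match[v] > v) cmap[v] = cn++;
        for (v = 0; v < g->nvtxs; v++)
            if (match[v] != -1 && match[v] < v) cmap[v] = cmap[match[v]];

        Graph *c = malloc(sizeof(Graph));
        c->nvtxs  = cn;
        c->xadj   = calloc(cn + 1, sizeof(int));
        c->vwgts  = calloc(cn, sizeof(int));
        c->adjncy = malloc(g->xadj[g->nvtxs] * sizeof(int)); /* upper bound */
        c->ewgts  = malloc(g->xadj[g->nvtxs] * sizeof(int));

        /* seen[u] records where coarse neighbor u sits in the current row. */
        int *seen = malloc(cn * sizeof(int));
        for (v = 0; v < cn; v++) seen[v] = -1;

        for (v = 0; v < g->nvtxs; v++) {
            if (match[v] != -1 && match[v] < v) continue; /* done with partner */
            int cv = cmap[v], row = nz, pass, w = v;

            c->vwgts[cv] = g->vwgts[v]
                         + (match[v] != -1 ? g->vwgts[match[v]] : 0);

            /* Walk v's edge list, then its partner's, summing the weights
               of edges that land on the same coarse neighbor. */
            for (pass = 0; pass < 2; pass++, w = match[v]) {
                if (w == -1) break;
                for (j = g->xadj[w]; j < g->xadj[w+1]; j++) {
                    int cu = cmap[g->adjncy[j]];
                    if (cu == cv) continue;   /* contracted edge vanishes */
                    if (seen[cu] < row) {     /* first occurrence in this row */
                        seen[cu] = nz;
                        c->adjncy[nz] = cu;
                        c->ewgts[nz++] = g->ewgts[j];
                    } else {
                        c->ewgts[seen[cu]] += g->ewgts[j]; /* merge weights */
                    }
                }
            }
            c->xadj[cv + 1] = nz;
        }
        free(seen);
        return c;
    }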
2.2. Partitioning the coarsest graph. The details of the partitioning of the
coarsest graph are not central to the algorithm, so we will not dwell on them. However,
it is important to note that the partitioning algorithm must be able to handle edge and
vertex weights, even if the original graph is unweighted. Our implementation uses a
spectral partitioner for the coarse graph, which requires one, two or three eigenvectors of
the Laplacian matrix of a graph to partition into two, four or eight sets respectively [13].
This is followed by an invocation of the Kernighan-Lin refinement algorithm described
in §3, which can refine a partition into an arbitrary number of sets with any inter-set
cost metric. By dividing into more than two sets at once, the algorithm can attempt
to minimize more complicated cost functions. For example, in parallel computing it
is generally desirable to keep message traffic between architecturally close processors.
This consideration can't easily be included in a recursive bisection approach, but it
can be addressed when a greater amount of partitioning is performed at once. In our
experience, virtually any initial partitioner followed by Kernighan-Lin works well on
small graphs, so the use of spectral partitioning is not essential to the overall algorithm.
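Since the coarsest graph is small, almost any eigensolver suffices. As a deliberately
simple stand-in for the spectral partitioner (the implementation above computes up to
three Laplacian eigenvectors; see [13]), the following C sketch approximates the second
Laplacian eigenvector by power iteration on sigma*I - L, deflating the constant vector.
This is an illustrative sketch under our own graph layout, not the Chaco code.

    #include <math.h>
    #include <stdlib.h>
    /* Reuses the Graph struct from the coarsening sketches. */

    /* Approximate the Fiedler (second Laplacian) eigenvector by power
       iteration on sigma*I - L.  Since sigma bounds L's spectrum, the
       dominant eigenvector orthogonal to the constant vector is the one
       sought.  A toy stand-in for a real Lanczos-based solver. */
    static void fiedler_sketch(const Graph *g, double *x, int iters)
    {
        int n = g->nvtxs, v, j, it;
        double *y = malloc(n * sizeof(double));

        /* sigma >= lambda_max(L): twice the largest weighted degree. */
        double sigma = 0.0;
        for (v = 0; v < n; v++) {
            double d = 0.0;
            for (j = g->xadj[v]; j < g->xadj[v+1]; j++) d += g->ewgts[j];
            if (2.0 * d > sigma) sigma = 2.0 * d;
        }

        for (v = 0; v < n; v++) x[v] = rand() / (double)RAND_MAX - 0.5;

        for (it = 0; it < iters; it++) {
            double mean = 0.0, norm = 0.0;

            /* Deflate the constant eigenvector: subtract the mean. */
            for (v = 0; v < n; v++) mean += x[v];
            mean /= n;
            for (v = 0; v < n; v++) x[v] -= mean;

            /* y = (sigma*I - L)x = sigma*x - degree(v)*x[v] + (A*x)[v]. */
            for (v = 0; v < n; v++) {
                double d = 0.0, ax = 0.0;
                for (j = g->xadj[v]; j < g->xadj[v+1]; j++) {
                    d  += g->ewgts[j];
                    ax += g->ewgts[j] * x[g->adjncy[j]];
                }
                y[v] = (sigma - d) * x[v] + ax;
            }

            for (v = 0; v < n; v++) norm += y[v] * y[v];
            norm = sqrt(norm);
            for (v = 0; v < n; v++) x[v] = y[v] / norm;
        }
        free(y);
    }

Splitting the vertices at the (weighted) median entry of x then yields a balanced
bisection; a production code would instead use Lanczos and the multidimensional
variants cited above.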
2.3. Uncoarsening the partition. The uncoarsening of a partition is trivial.
Each vertex in a coarse graph is simply the union of one or two vertices from the larger
graph. We simply assign a vertex from the larger graph to the same set as its coarse
graph counterpart. Since the weight of the coarse vertex was the sum of the weights of
its constituents, the uncoarsening procedure preserves the sum of the vertex weights in
each set. And it similarly preserves the sum of edge weights connecting different sets.
The uncoarsened graph has more degrees of freedom than the smaller coarse graph.
Consequently, the best partition for the coarse graph may not be optimal for its
uncoarsened counterpart. We therefore apply the local refinement scheme described in the
next section to the uncoarsened partition.
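In code, the projection is a single sweep; cmap is the fine-to-coarse map recorded
during contraction (names again illustrative):

    /* Each fine vertex inherits the set of its coarse image. */
    static void project_partition(int fine_nvtxs, const int *cmap,
                                  const int *coarse_part, int *fine_part)
    {
        for (int v = 0; v < fine_nvtxs; v++)
            fine_part[v] = coarse_part[cmap[v]];
    }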
3. Locally refining the partition. For our multilevel graph partitioning algorithm
to be useful it must incorporate a local refinement scheme that is both fast and
effective. In this section we describe our implementation of one such method, a generalization
of the graph bisection algorithm originally due to Kernighan and Lin (KL) [17].
This algorithm is quite good at finding locally optimal answers, but unless it is initialized
with a good global partition, the resulting local optimum can be far from the best
possible. However, since the multilevel nature of our algorithm allows the refinement
technique to work on different scales, the local finesse of KL is all we require.
Several improvements to the original KL algorithm have been developed over the
years, including the important linear time implementation described by Fiduccia and
Mattheyses (FM) [5]. Our implementation is closely patterned after the algorithm in [5],
but is generalized and optimized in several ways. First, we can refine a partition into
an arbitrary number of sets, rather than just two. A generalization to four sets is
described by Suaris and Kedem for circuit layout problems [22], and we have extended
their idea. Second, we allow for integer weights on edges and vertices (Fiduccia and
Mattheyses allow for weighted vertices). Third, we can handle an arbitrary (integral)
inter-set cost metric. This can be important in mapping parallel computations since
the cost of sending a message between two processors may depend on how near these
processors are in the machine. Fourth, we include an element of randomness in the
algorithm, which improves robustness. Fifth, when only a small fraction of the vertices
are moved, we are able to reuse computations in a manner that significantly reduces
the overall run time of the algorithm. And sixth, we use a lazy evaluation strategy
to minimize the number of gain values computed, and thereby significantly reduce run
time. These generalizations complicate the implementation in a variety of ways, which
are discussed in detail below.
The fundamental idea behind all KL-like algorithms is the concept of the gain
associated with moving a vertex into a different set. The gain is simply the net reduction
in the weight of cut edges that would result from switching a vertex to a different set.
More formally, if vertex i were to move from set l to set k, the corresponding gain g^k(i)
can be expressed as

\[
g^k(i) \;=\; \sum_{(i,j)\in E}
\begin{cases}
\phantom{-}w_{ij}\, c_{kl} & \text{if } P(j) = k, \\
-w_{ij}\, c_{kl} & \text{if } P(j) = l, \\
\phantom{-}w_{ij}\,(c_{lm} - c_{km}) & \text{if } P(j) = m,\ m \neq k,\ m \neq l,
\end{cases}
\tag{1}
\]

where P(j) is the current set for vertex j, w_{ij} is the weight of the edge between vertices
i and j, and c_{lk} is the symmetric inter-set cost metric for an edge between sets l and k.
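A direct transcription of Eq. (1) into C, reusing the Graph struct from the coarsening
sketches; part[] holds each vertex's current set and cost[] the symmetric inter-set
metric (all names are ours):

    /* Gain g^k(i) for moving vertex i from its current set l = part[i]
       to set k, per Eq. (1).  cost is an nsets-by-nsets symmetric matrix
       stored row-major. */
    static int gain(const Graph *g, const int *part, int nsets,
                    const int *cost, int i, int k)
    {
        int l = part[i], total = 0, j;

        for (j = g->xadj[i]; j < g->xadj[i+1]; j++) {
            int m = part[g->adjncy[j]], w = g->ewgts[j];
            if (m == k)              /* edge becomes internal */
                total += w * cost[k * nsets + l];
            else if (m == l)         /* edge becomes cut */
                total -= w * cost[k * nsets + l];
            else                     /* cut either way; its cost changes */
                total += w * (cost[l * nsets + m] - cost[k * nsets + m]);
        }
        return total;
    }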
Clearly, if the gain associated with moving a vertex is positive, then making that
move will reduce the total cost of the edges cut in the partition. We could simply make
all such moves and see what partition results, but this approach has two shortcomings.
First, we need to enforce some constraints on how large the difference in set sizes can
become, and hence may disallow certain moves. Second, a purely greedy strategy that
stops when no single allowed move improves the partition may miss more complicated
sequences of moves that lead to a net improvement. Hence the algorithm continues
for some time to consider moves that actually make the partition worse. The basic
structure of our algorithm is sketched in Fig. 4.
The algorithm consists of two nested loops. The inner loop presides over a sequence
of moves of vertices from one set to another. The outer loop continues allowing
attempted sequences until no further improvement is detected. We will refer to a single
iteration of the outer loop as a pass of the algorithm.

Until no better partition is discovered
    Best Partition := Current Partition
    Compute all initial gains
    Until termination criteria reached
        Select vertex to move
        Perform move
        Update gains of all neighbors of moved vertex
        If Current Partition balanced and better than Best Partition Then
            Best Partition := Current Partition
    End Until
    Current Partition := Best Partition
End Until

Fig. 4. An algorithm for refining graph partitions.

At each step in the inner loop, a
vertex is selected to be moved from one set to another. To avoid the possibility of an
infinite loop we insist that a vertex may be moved at most once within a single pass.
The move that is selected is the one with the largest gain, subject to some balance
constraints. It is important to note that this gain may be negative; that is, the current
move may make the partition worse. However, several moves that reduce the partition
quality may lead to later moves that more than compensate for the initial regression.
This ability to climb out of local minima is a crucial feature of the KL algorithm.
When a vertex is moved, the gains of all its neighbors must be modified. Then
another move is selected and the process repeated. The best partition that is encountered
in this sequence is recorded, and the data is moved into that configuration prior to the
start of the next pass of the algorithm. In practice, the number of passes required is
quite small, so the run time of the algorithm consists of two terms: a startup cost and
a cost proportional to the execution time of a single pass through the inner loop.
Another important detail is the data structure used for representing gains. The
key difference between the KL and FM algorithms is that Fiduccia and Mattheyses use
a more complicated data structure that allows a single pass through the outer loop to
be performed in time proportional to the number of edges in the graph. The basic idea
is to bucket sort the gains associated with all possible moves. Each bucket consists of
a doubly linked list of moves with a particular gain value. Selecting a move with the
highest gain simply requires finding the highest ranked nonempty bucket. The gains of
neighbor vertices are updated by moving the appropriate markers from one bucket to
another, which can be done in constant time.
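The following C sketch shows one way to realize such a bucket structure for a single
move type. The array-of-doubly-linked-lists layout follows the description above,
though the details (and all names) are our own rather than FM's or Chaco's.

    #include <stdlib.h>

    /* FM-style buckets for one move type: a doubly linked list of
       vertices per gain value, so the best move is found by scanning
       down from the highest nonempty bucket, and updates are O(1).
       maxgain bounds |gain|; buckets are indexed by gain + maxgain. */
    typedef struct {
        int maxgain;      /* gains run over -maxgain .. +maxgain */
        int topgain;      /* highest gain that might be nonempty */
        int *head;        /* head[gain+maxgain] = first vertex, or -1 */
        int *next, *prev; /* per-vertex doubly linked list pointers */
        int *vgain;       /* current gain of each vertex */
    } Buckets;

    static Buckets *buckets_new(int nvtxs, int maxgain)
    {
        Buckets *b = malloc(sizeof(Buckets));
        b->maxgain = maxgain;
        b->topgain = -maxgain;
        b->head  = malloc((2 * maxgain + 1) * sizeof(int));
        b->next  = malloc(nvtxs * sizeof(int));
        b->prev  = malloc(nvtxs * sizeof(int));
        b->vgain = malloc(nvtxs * sizeof(int));
        for (int i = 0; i <= 2 * maxgain; i++) b->head[i] = -1;
        return b;
    }

    /* Insert at the front of its gain bucket (the randomized
       tie-breaking order described in Section 3.3 below). */
    static void buckets_insert(Buckets *b, int v, int gain)
    {
        int ix = gain + b->maxgain;
        b->vgain[v] = gain;
        b->prev[v] = -1;
        b->next[v] = b->head[ix];
        if (b->head[ix] != -1) b->prev[b->head[ix]] = v;
        b->head[ix] = v;
        if (gain > b->topgain) b->topgain = gain;
    }

    static void buckets_remove(Buckets *b, int v)
    {
        int ix = b->vgain[v] + b->maxgain;
        if (b->prev[v] != -1) b->next[b->prev[v]] = b->next[v];
        else                  b->head[ix] = b->next[v];
        if (b->next[v] != -1) b->prev[b->next[v]] = b->prev[v];
    }

    /* Constant-time gain update: unlink and reinsert. */
    static void buckets_update(Buckets *b, int v, int newgain)
    {
        buckets_remove(b, v);
        buckets_insert(b, v, newgain);
    }

    /* Best move: scan down from topgain for a nonempty bucket. */
    static int buckets_best(Buckets *b)
    {
        while (b->topgain > -b->maxgain
               && b->head[b->topgain + b->maxgain] == -1)
            b->topgain--;
        return b->head[b->topgain + b->maxgain];   /* -1 if empty */
    }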
Up to this point, the discussion describes Fiduccia and Mattheyses' algorithm.
Our algorithm extends the data structure from FM to allow for an arbitrary number
of sets. If there are s sets, n vertices and m edges, then there are s(s − 1) different
types of moves. We maintain a bucket sorted data structure for each of these move
types. Since each vertex can move to any of s − 1 other sets, this requires memory of
size Θ(n(s − 1)). Before entering the innermost loop, gain values are computed and
placed in the appropriate doubly linked lists comprising the different buckets. If all the
gains are computed, the cost of this operation is O((s − 1)n + m). However, using the
tricks described in §3.4 and §3.5, only a fraction of the gains need be computed. If the
innermost loop continues through all vertices, move selection requires time O(s(s − 1)n),
and the remainder of the inner loop requires O(sm) time. However, for large problems
the innermost loop always exits early due to the termination criteria discussed in §3.2
below. Hence, the execution time of KL can be sublinear in the size of the graph, and
for large problems, the performance of the overall algorithm is limited by the time spent
constructing the sequence of coarse graphs.
3.1. Move selection. To select the next move, the algorithm finds the top bucket
for all the s(s − 1) different types of moves. Rather than simply choosing whichever of
these moves has the highest gain, the algorithm also considers the effect of each move
on set balance². Balance is encouraged by considering only moves from sets of at least
average size to those that are at or below average size. This does not guarantee that a
balanced partition will result, but in practice it seems to work quite well.
We note that move selection can be easily modified to allow for unequal set sizes.
To do so, in the selection of a move, we simply compare the size of a set to its desired
size, which may not be the same as the desired size of the other sets. Moves are then
allowed only from a set at least as large as desired to a set that is at least as small as
desired.
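Combining the bucket structure above with this balance rule, move selection over the
s(s − 1) move types might look like the following; target[] holds the desired set sizes,
and everything here is an illustrative sketch rather than the Chaco implementation.

    #include <limits.h>
    /* Reuses the Buckets type from the sketch above. */

    /* Scan the s(s-1) move types: a move from set l to set k is
       admissible only if l is at least its desired size and k at most
       its desired size.  Returns the best vertex, or -1 if no
       admissible move remains. */
    static int select_move(Buckets **bt /* s*s move types */, int s,
                           const long *setsize, const long *target,
                           int *from, int *to)
    {
        int bestgain = INT_MIN, bestv = -1, l, k;

        for (l = 0; l < s; l++) {
            if (setsize[l] < target[l]) continue;    /* source too small */
            for (k = 0; k < s; k++) {
                if (k == l || setsize[k] > target[k]) continue;
                int v = buckets_best(bt[l * s + k]);
                if (v != -1 && bt[l * s + k]->vgain[v] > bestgain) {
                    bestgain = bt[l * s + k]->vgain[v];
                    bestv = v; *from = l; *to = k;
                }
            }
        }
        return bestv;
    }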
3.2. Termination criteria. Our implementation has two types of termination
criteria. The most obvious condition is when there are no possible moves from large
to small sets. The second condition allows for early termination when further improvement
seems unlikely. This is particularly important in the context of our multilevel
partitioner, since we expect the local refinement algorithm to make only small changes
in the partition. In the second condition, we monitor the difference in quality between
the best partition yet encountered and the current partition. When this difference
becomes large the inner loop is exited. A discussion of other possible termination criteria
can be found in [7].
3.3. Randomization. Since there are typically many different vertex moves that
have the same gain value, the manner in which ties are broken can significantly affect
the results. Our implementation avoids this issue by using randomization. The initial
gain values of the vertices are computed in random order. When a value is updated,
the corresponding vertex is placed at the front of the list of vertices with its gain value.
Thus, collections of adjacent vertices are likely to be considered together, which is in
keeping with the spirit of the overall algorithm. Also, with randomization, a failure
in the outermost loop to find any better partition does not necessarily mean that no
further improvement is possible. Another pass through the algorithm may manage to
find a better partition.

² We consider a partition to be balanced if the difference between the largest and smallest set sizes
is no larger than the largest vertex weight.
3.4. Avoiding recomputing gains. Our multilevel partitioning algorithm typically
provides a fairly good partition to the refinement algorithm, so only a small
number of vertices must be moved. In this case, the cutoffs typically come into play
quite quickly, and for large problems the cost of the algorithm is dominated by the initial
calculation of gain values. Although these values need to be computed once, a simple
observation alleviates the need to recompute and sort most of them. The only vertices
whose gain values are modified in a single pass of the algorithm are (1) the vertices
that are selected to be moved and (2) the neighbors of these vertices. All other vertices
remain unaffected. By reconsidering only the necessary vertices, the full bucket sort is
performed just once rather than on every pass through the outer loop. In the absence
of the lazy evaluation technique described in §3.5, this simplification reduces the overall
run time of the multilevel algorithm by 30% to 95%.
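In code, this amounts to keeping a list of the vertices touched during a pass and
re-bucketing only those at the start of the next pass. The sketch below reuses the
earlier Graph, Buckets and gain pieces; it also assumes each touched vertex still sits
in the lists of its previous source set (oldpart), which a real implementation must
guarantee when it rolls the data back to the best partition.

    /* Re-bucket only the vertices whose gains went stale in the last
       pass: those moved and their neighbors (`touched`).  Everything
       else keeps its bucket position from the previous pass. */
    static void refresh_gains(const Graph *g, const int *part,
                              const int *oldpart, int nsets,
                              const int *cost, Buckets **bt,
                              const int *touched, int ntouched)
    {
        int t, k;
        for (t = 0; t < ntouched; t++) {
            int v = touched[t];

            /* Unlink v from the move types of its previous set... */
            for (k = 0; k < nsets; k++)
                if (k != oldpart[v])
                    buckets_remove(bt[oldpart[v] * nsets + k], v);

            /* ...and reinsert it with fresh gains from its current set. */
            for (k = 0; k < nsets; k++)
                if (k != part[v])
                    buckets_insert(bt[part[v] * nsets + k], v,
                                   gain(g, part, nsets, cost, v, k));
        }
    }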
3.5. Lazy initialization. As observed by several other researchers [25, 16, 26],
when initialized with a good partition a local refinement algorithm need only move
vertices that are on or near the initial set boundaries. In principle, the algorithm need
never compute gain values for most of the vertices, since they aren't near a boundary.
Unfortunately, exploiting this observation requires determining which vertices are on
the boundary, a task which is generally not much cheaper than computing all the gains.
However, if these vertices are already known, a significant reduction in run time is
possible.

Fortunately, in the multilevel algorithm the identity of vertices near the boundary
can be easily propagated between levels during the uncoarsening. On the smallest graph
all vertices are treated as boundary vertices. After each invocation of KL the vertices on
the boundary are determined. The uncoarsened counterparts of these boundary vertices
are the only vertices initialized in the next call to KL.

When running KL in this way, it is possible that a vertex which was not initialized
should be selected for movement. To deal with this possibility we use a lazy evaluation
strategy in which uninitialized vertices are initialized on an as-needed basis. Specifically,
when we select a vertex to move, we try to update the gain values of all its neighbors. If a
neighbor is uninitialized, we activate and initialize it then. Since for large graphs only a
small fraction of the vertices are involved in a pass of KL, this avoidance of unnecessary
calculation results in a significant performance improvement. Similar strategies are
utilized by Walshaw et al. [26] and Karypis and Kumar [16].
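The lazy activation fits naturally into the gain-update step of the inner loop. A
sketch, reusing the pieces above, with initialized[] as our own flag array propagated
from the coarser level's boundary vertices:

    /* Called for each neighbor u of a just-moved vertex.  An
       uninitialized neighbor is activated and bucketed on first
       contact; an initialized one is re-bucketed in O(1).  (A
       production code would adjust gains incrementally instead of
       recomputing them from scratch.) */
    static void touch_neighbor(const Graph *g, const int *part,
                               int nsets, const int *cost, Buckets **bt,
                               char *initialized, int u)
    {
        int l = part[u], k;

        if (!initialized[u]) {
            initialized[u] = 1;                 /* lazy activation */
            for (k = 0; k < nsets; k++)
                if (k != l)
                    buckets_insert(bt[l * nsets + k], u,
                                   gain(g, part, nsets, cost, u, k));
        } else {
            for (k = 0; k < nsets; k++)
                if (k != l)
                    buckets_update(bt[l * nsets + k], u,
                                   gain(g, part, nsets, cost, u, k));
        }
    }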
4. Results and evaluation. We have implemented the algorithm described in
the previous sections in the C programming language as part of the Chaco 2.0 graph
partitioning code [12]. We ran our code on several graphs corresponding to computational
grids for large scientific computing problems. Vertices of these grids represent
computation, while edges indicate data dependencies. To perform these scientific
computations on a parallel machine, the grids must be partitioned into pieces which are
assigned to processors. For optimal performance, each processor should have the same
number of vertices to ensure that the work load is balanced, and since edges crossing
between sets represent communication, the number of these cross edges must be kept
small.
We compared the run times and partitions produced by our multilevel algorithm
against several other partitioning strategies that are popular in the parallel computing
community and also implemented in Chaco 2.0. The first competing algorithm is
inertial bisection, descriptions of which can be found in [21, 27]. This approach ignores
the graph, and instead uses geometric information which is typically available for meshes
used in scientific and engineering computations. Each node is assigned a unit weight,
and the principal axis of the resulting structure is determined. The grid is then divided
by a plane orthogonal to this axis. The intuition behind this approach is that the
principal axis is typically a direction in which the grid is elongated, so a cut orthogonal
to this direction will hopefully cut the grid at a narrow point. The inertial bisection
algorithm can be implemented to run in linear time.
The second competing algorithm is spectral bisection. In this method a constrained
quadratic minimization problem is formulated which corresponds approximately
to finding a partitioning which retains balance and has a small number of cross
edges. This minimization is cast in terms of the Laplacian matrix of the graph and is
solved by an eigenvector of the Laplacian. The eigenvector defines a partitioning with
the desired properties. More detailed descriptions of spectral bisection can be found
in [10, 11, 13, 19, 21].

Both the inertial and spectral partitioning methods can be coupled to advantage
with a local refinement scheme like Kernighan-Lin. These coupled methods provide the
third and fourth alternatives we compare against. All of these methods are included in
Chaco 2.0 [12], making comparison convenient. A similar coupling of global and local
methods is described in [25].
We have run these algorithms on a variety of graphs, and present results here for
three that are typical of what we have generally observed. The first graph is a small
two-dimensional finite element mesh of the space around an airfoil, which is due to Hammond
and can be obtained from RIACS at NASA Ames via anonymous ftp [8]. The second
graph, known as Barth5, is a similar but larger mesh, and can be obtained in the same
way. The third graph is a fairly large finite difference grid of the Earth's oceans provided
by Swisshelm [23].
We recursively divided each graph through six levels of bisection, producing 64 sets.
Results of running the different partitioning algorithms on these grids are presented in
the tables below. The run times shown in the last row of each table are the times for
the entire division into 64 sets. All calculations were performed on a Sun Sparcstation 20
with a 50 MHz clock and 64 Mbytes of memory.
Table 1. Performance of partitioning algorithms on Hammond mesh
(|V| = 4720, |E| = 13722).

                             EDGES CUT
               Inertial             Spectral        Multilevel
               Alone   With KL      Alone   With KL
  2 sets         209       117        117       97        119
  4 sets         559       312        262      227        236
  8 sets         880       486        475      388        393
 16 sets        1342       727        769      668        652
 32 sets        1791      1152       1219     1121       1065
 64 sets        2392      1732       1885     1753       1688
Time (sec)      0.24      1.35       9.32     8.64       3.44

Table 2. Performance of partitioning algorithms on Barth5 mesh
(|V| = 15606, |E| = 45878).

                             EDGES CUT
               Inertial             Spectral        Multilevel
               Alone   With KL      Alone   With KL
  2 sets         245       190        169      146        196
  4 sets         897       425        527      413        412
  8 sets        1441       878        866      699        648
 16 sets        2267      1382       1432     1220       1118
 32 sets        3138      1988       2163     1869       1779
 64 sets        4257      3168       3273     2893       2906
Time (sec)      0.74      3.30       23.7     28.6       6.17

Table 3. Performance of partitioning algorithms on ocean mesh
(|V| = 143437, |E| = 409593).

                             EDGES CUT
               Inertial             Spectral        Multilevel
               Alone   With KL      Alone   With KL
  2 sets        2932      2576        660      533        499
  4 sets        5607      4674       2392     2107       1991
  8 sets        8695      7479       5264     4763       4743
 16 sets       13240     11245      10204     9158       9160
 32 sets       19292     16604      16348    14624      14628
 64 sets       29230     24741      25547    22410      22343
Time (sec)      4.67      26.1      392.6    316.0       38.0

Based on these and similar experiments, we make several observations:

- The local refinement scheme significantly improves the partitions generated
by both the inertial and spectral partitioners, and has modest cost. In fact,
application of KL may actually reduce net run time (see the spectral column
of the ocean or Hammond tables) because in the process of locally refining at
one recursion level, subgraphs are produced which are more efficiently handled
at the next level of recursion; i.e., local refinement effectively preconditions the
problem.
- The inertial approach is quite fast, but produces low quality partitions. However,
when combined with a good local refinement technique, the combination
often produces fairly good answers quite quickly. The partitions thus produced
are generally better than those from spectral bisection; that is, the combination
of simple global and local methods is usually superior to a sophisticated global
method. However, the inertial algorithm must be used with caution, since it
sometimes produces significantly worse partitions, e.g., bisection of the ocean
graph.
- Spectral bisection generally produces better partitions than inertial bisection,
albeit at a substantial increase in run time. This improvement remains after
each method is coupled with local refinement.
- Our multilevel method takes somewhat longer than inertial combined with
KL, but it generally produces partitions similar in quality to those generated
by the spectral method combined with KL, while running considerably faster
than either spectral approach (compare the last rows of the tables above).
[10] B. Hendrickson and R. Leland, An improved spectral load balancing method, in Proc. 6th
SIAM Conf. Parallel Processing for Scientific Computing, SIAM, 1993, pp. 953-961.
[11] B. Hendrickson and R. Leland, Multidimensional spectral load balancing, Tech. Rep.
SAND93-0074, Sandia National Laboratories, Albuquerque, NM, January 1993.
[12] B. Hendrickson and R. Leland, The Chaco user's guide, version 2.0, Tech. Rep. SAND94-2692,
Sandia National Laboratories, Albuquerque, NM, October 1994.
[13] B. Hendrickson and R. Leland, An improved spectral graph partitioning algorithm for mapping
parallel computations, SIAM J. Sci. Comput., 16 (1995).
[14] B. Hendrickson, R. Leland, and R. Van Driessche, Enhancing data locality by using
terminal propagation, in Proc. 29th Hawaii Conf. System Science, January 1996. To appear.
[15] C. A. Jones, Vertex and Edge Partitions of Graphs, PhD thesis, Penn State, Dept. Computer
Science, State College, PA, 1992.
[16] G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular
graphs, Tech. Rep. CORR 95-035, University of Minnesota, Dept. Computer Science,
Minneapolis, MN, June 1995.
[17] B. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System
Technical Journal, 49 (1970), pp. 291-307.
[18] R. Ponnusamy, N. Mansour, A. Choudhary, and G. C. Fox, Graph contraction and
physical optimization methods: a quality-cost tradeoff for mapping data on parallel computers,
in Intl. Conf. Supercomputing, Tokyo, Japan, July 1993.
[19] A. Pothen, H. Simon, and K. Liou, Partitioning sparse matrices with eigenvectors of graphs,
SIAM J. Matrix Anal., 11 (1990), pp. 430-452.
[20] J. E. Savage and M. G. Wloka, Parallelism in graph partitioning, J. Par. Dist. Comput., 13
(1991), pp. 257-272.
[21] H. D. Simon, Partitioning of unstructured problems for parallel processing, in Proc. Conference
on Parallel Methods on Large Scale Structural Analysis and Physics Applications, Pergamon
Press, 1991.
[22] P. Suaris and G. Kedem, An algorithm for quadrisection and its application to standard cell
placement, IEEE Trans. Circuits and Systems, 35 (1988), pp. 294-303.
[23] J. Swisshelm, Personal communication, February 1992.
[24] D. Vanderstraeten, C. Farhat, P. S. Chen, R. Keunings, and O. Zone, A retrofit based
methodology for the fast generation and optimization of mesh partitions: beyond the minimum
interface size criteria, Comput. Meths. Appl. Mech. Eng., (1995). To appear.
[25] D. Vanderstraeten, R. Keunings, and C. Farhat, Optimization of mesh partitions and
impact on parallel CFD, in Parallel Computational Fluid Dynamics, New Trends and Advances,
A. Ecer, J. Hauser, P. Leca, and J. Periaux, eds., Elsevier, 1995, pp. 233-239. (Also in Proc.
Parallel CFD '93.)
[26] C. Walshaw, M. Cross, and M. Everett, A parallelisable algorithm for optimising unstructured
mesh partitions, Tech. Rep. 95/IM/03, University of Greenwich, London SE18 6PF,
UK, 1995. Submitted for publication.
[27] R. Williams, Performance of dynamic load balancing algorithms for unstructured mesh calculations,
Concurrency, 3 (1991), pp. 457-481.