Matching Nuts and Bolts Faster*
Phillip G. Bradford
Rudolf Fleischer
Max-Planck-Institut für Informatik, Im Stadtwald, 66123 Saarbrücken, Germany.
E-mail: {bradford,rudolf}@mpi-sb.mpg.de.
Abstract. The problem of matching nuts and bolts is the following: Given
a collection of n nuts of distinct sizes and n bolts such that there is a one-to-one correspondence between the nuts and the bolts, find for each nut
its corresponding bolt. We can only compare nuts to bolts. That is, we can
neither compare nuts to nuts, nor bolts to bolts. This humble restriction
on the comparisons appears to make this problem very hard to solve. In
fact, the best deterministic solution to date is due to Alon et al. [1] and
takes $\Theta(n \log^4 n)$ time. Their solution uses (efficient) graph expanders. In
this paper, we give a simpler $O(n \log^2 n)$ time algorithm which uses only a
simple (and not so efficient) expander.
1 Introduction
In [7], page 293, Rawlins posed the following interesting problem:
We wish to sort a bag of n nuts and n bolts by size in the dark. We can
compare the sizes of a nut and a bolt by attempting to screw one into the
other. This operation tells us that either the nut is bigger than the bolt; the
bolt is bigger than the nut; or they are the same size (and so fit together).
Because it is dark we are not allowed to compare nuts directly or bolts
directly.
How many fitting operations do we need to sort the nuts and bolts in the
worst case?
As a mathematician (instead of a carpenter) you would probably prefer to see
the problem stated as follows ([1]):
Given two sets $B = \{b_1, \ldots, b_n\}$ and $S = \{s_1, \ldots, s_n\}$, where B is a set
of n distinct real numbers (representing the sizes of the bolts) and S is
a permutation of B, we wish to find efficiently the unique permutation
$\sigma \in S_n$ so that $b_i = s_{\sigma(i)}$ for all i, based on queries of the form compare
$b_i$ and $s_j$. The answer to each such query is either $b_i > s_j$ or $b_i = s_j$ or
$b_i < s_j$.
* The authors were supported by the ESPRIT Basic Research Actions Program, under
contract No. 7141 (project ALCOM II).
The obvious information-theoretic lower bound shows that at least $\Omega(n \log n)$
comparisons are needed to solve the problem, even for a randomized algorithm. In
fact, there is a simple randomized algorithm which achieves an expected running
time of $O(n \log n)$, namely Quicksort: Pick a random nut, find its matching bolt,
and then split the problem into two subproblems which can be solved recursively,
one consisting of the nuts and bolts smaller than the matched pair and one consisting of the larger ones. The standard analysis of randomized Quicksort gives
the expected running time as stated above (see for example [3]).
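The randomized Quicksort described above can be sketched as follows. This is only an illustration under stated assumptions: sizes are modelled as numbers, and the names `compare` and `match_randomized` are ours, not from [1] or [3]. The only operation applied to sizes is a nut-to-bolt comparison.

```python
import random

def compare(nut, bolt):
    """The only allowed query: -1 if nut < bolt, 0 if they fit, 1 if nut > bolt."""
    return (nut > bolt) - (nut < bolt)

def match_randomized(nuts, bolts, rng=random):
    """Return (nut, bolt) pairs in increasing size, using nut-to-bolt comparisons only."""
    if not nuts:
        return []
    pivot_nut = rng.choice(nuts)
    # Compare the pivot nut to every bolt: this finds its partner and splits the bolts.
    small_bolts, large_bolts, pivot_bolt = [], [], None
    for b in bolts:
        c = compare(pivot_nut, b)
        if c == 0:
            pivot_bolt = b
        elif c > 0:
            small_bolts.append(b)
        else:
            large_bolts.append(b)
    # Compare every nut to the pivot bolt to split the nuts the same way.
    small_nuts = [s for s in nuts if compare(s, pivot_bolt) < 0]
    large_nuts = [s for s in nuts if compare(s, pivot_bolt) > 0]
    return (match_randomized(small_nuts, small_bolts, rng)
            + [(pivot_nut, pivot_bolt)]
            + match_randomized(large_nuts, large_bolts, rng))
```

Each call spends a linear number of comparisons on the split, so the expected total is that of randomized Quicksort.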
Unfortunately, it is much harder to find an efficient deterministic algorithm.
The only one known to us is the algorithm by Alon et al. [1] which is also based
on Quicksort. To find a good pivot element which splits the problem into two
subproblems of nearly the same size, they run $\log n$ iterations of a procedure which
eliminates half of the nuts in each iteration while maintaining at least one good
pivot; since there is only one nut left in the end, this one must be a good pivot.
This procedure uses the edges of a highly efficient expander of degree $\Theta(\log^2 n)$
to define its comparisons. Therefore, finding a good pivot takes $\Theta(n \log^3 n)$ time,
and the entire Quicksort takes $\Theta(n \log^4 n)$ time.
In this paper, we propose a simpler algorithm to find a good pivot (see Section
3 for details). First, we connect the set of nuts with the set of bolts via some
expander of constant degree and compare each nut to all the bolts to which it
is connected by an edge of the expander. We discard all nuts which are only
connected to smaller bolts or only connected to larger bolts. Then we play a
simple knockout tournament on the remaining nuts (where in each round half of
the nuts are eliminated) which guarantees that the winner of the tournament is a
good pivot. Since we can play each round of the tournament in $O(n)$ time, we can
find a good pivot in $O(n \log n)$ time. Therefore, we can solve the nuts and bolts
matching problem in $O(n \log^2 n)$ time.
Alon et al. [1] mention two potential applications of this problem: the first is
local sorting of nodes in a given graph [4], and the second is selection in read-only
memory with only a little read/write memory [6].
In the next section, we describe the Quicksort algorithm more formally and
recall some facts about expanders. In Section 3, we show how we can efficiently
find a good pivot. We conclude with some remarks in Section 4.
2 Basic Definitions
Let $S = \{s_1, \ldots, s_n\}$ be a set of nuts of different sizes and $B = \{b_1, \ldots, b_n\}$ be a
set of corresponding bolts. For a nut $s \in S$ define rank(s) as $|\{t \in B \mid s \ge t\}|$. The
rank of a bolt is defined similarly. For a constant $c < \frac{1}{2}$, s is called a c-approximate
median if $cn \le \mathrm{rank}(s) \le (1 - c)n$. Similarly, define the relative rank of s with
respect to a subset $T \subseteq B$ as $\mathrm{rank}_T(s) := \frac{|\{t \in T \mid s \ge t\}|}{|T|}$. If T is a multiset
then the relative rank of s with respect to T is defined analogously, where each
$t \in T$ is counted according to its multiplicity in T.
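As a small illustration of these definitions, the following sketch computes ranks and relative ranks for numeric sizes. It assumes that rank(s) counts the bolts that are not larger than s; the function names are ours, and a multiset is represented as a list with repeated entries.

```python
from fractions import Fraction

def rank(s, bolts):
    """Number of bolts t with s >= t."""
    return sum(1 for t in bolts if s >= t)

def relative_rank(s, T):
    """Fraction of the bolts in the (multi)set T, given as a list, with s >= t."""
    return Fraction(rank(s, T), len(T))

def is_approx_median(s, bolts, c):
    """True if c*n <= rank(s) <= (1 - c)*n, i.e. s is a c-approximate median."""
    n = len(bolts)
    return c * n <= rank(s, bolts) <= (1 - c) * n
```

For bolts of sizes 1 to 10, the nut of size 5 has rank 5 and is a 1/4-approximate median, while the nut of size 1 is not.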
The algorithm for matching nuts and bolts works as follows.
(1) Find a c-approximate median s of the n given nuts (we will determine
c later).
(2) Find the bolt b corresponding to s.
(3) Compare all nuts to b and all bolts to s. This gives two piles of nuts
(and bolts as well), one with the nuts (bolts) smaller than s and one
with the nuts (bolts) bigger than s.
(4) Run the algorithm recursively on the two piles of the smaller nuts and
bolts and the two piles of the bigger nuts and bolts.
In the next section, we will show how to find a c-approximate median in
$O(n \log n)$ time, where c is a small constant. Then our main result follows immediately.
Theorem 1. We can match n nuts with their corresponding bolts in $O(n \log^2 n)$
time.
Proof. The correctness of the algorithm above follows immediately from the correctness of Quicksort. For the running time, observe that each subproblem has size
at most $(1 - c)n$, hence the depth of the recursion is only $O(\log n)$, and in each level
of the recursion we spend at most $O(n \log n)$ time to compute the c-approximate
medians and $O(n)$ time to split the problems into their subproblems. □
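Written as a recurrence, the running-time argument reads as follows. This is only a restatement of the proof; $T(n)$ for the worst-case running time and $n_1, n_2$ for the sizes of the two subproblems are notation introduced here.

```latex
\begin{align*}
T(n) &\le T(n_1) + T(n_2) + O(n \log n),
      \qquad n_1 + n_2 = n - 1, \quad n_1, n_2 \le (1 - c)\,n, \\
\text{recursion depth} &\le \log_{1/(1-c)} n = O(\log n), \\
T(n) &\le O(\log n) \cdot O(n \log n) = O(n \log^2 n).
\end{align*}
```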
We now recall some facts about expanders (see for example [5] if you want
to learn more about expanders). Let $0 < \alpha \le \frac{1}{2}$ and $c < 1$. An $(n, k, \alpha, c)$-expander is a k-regular bipartite graph on vertices I (inputs) and O (outputs),
where $|I| = |O| = n$, such that every subset $A \subseteq I$ of size at most $\alpha n$ is joined by
edges to at least $|A| \left(1 + c\,(1 - \frac{|A|}{n})\right)$ different outputs. The constant c is called
the expansion factor of the graph.
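For very small graphs the expansion condition can be checked by brute force. The sketch below is illustrative only: it tests every subset of inputs, so its running time is exponential in n, and it does not check k-regularity. The function name and the adjacency representation are assumptions made here.

```python
from itertools import combinations

def is_expander(adj, alpha, c):
    """adj[i] is the set of outputs joined to input i; both sides have n = len(adj) vertices.

    Returns True if every set A of at most alpha*n inputs is joined to at least
    |A| * (1 + c * (1 - |A|/n)) different outputs.
    """
    n = len(adj)
    max_size = int(alpha * n)
    for size in range(1, max_size + 1):
        for A in combinations(range(n), size):
            neighbours = set().union(*(adj[i] for i in A))
            if len(neighbours) < size * (1 + c * (1 - size / n)):
                return False
    return True
```

A complete bipartite graph on 4 + 4 vertices passes the test for $\alpha = \frac{1}{2}$, $c = 0.41$, whereas a perfect matching fails it already for single inputs.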
Theorem 2 (Alon, Galil, and Milman [2], Cor. 2.3). If $n = m^2$ for some
integer m, then we can construct an $(n, 9, \frac{1}{2}, 0.41)$-expander in $O(n)$ time.
Corollary 3. Let $0 < \alpha \le \frac{1}{2}$ and $\varepsilon_\alpha = \frac{3 - 5\alpha}{5\alpha(1 - \alpha)}$. Then there exists an integer $q_\alpha$
such that for any n, where $n = m^2$ for some integer m, we can construct an
$(n, q_\alpha, \alpha, \varepsilon_\alpha)$-expander in $O(n)$ time. In such an expander, any subset of the inputs
of size $\alpha n$ is connected to at least $\frac{3}{5}n$ different outputs.
Proof. We take a series of the expanders of Theorem 2 and identify the outputs
$O_1$ of the first one with the inputs $I_2$ of the second one, the outputs $O_2$ of the
second one with the inputs $I_3$ of the third one, and so on. Then there is an integer
k (independent of n) such that any set of $\alpha n$ inputs of $I_1$ is connected to at least
$\frac{n}{2}$ different outputs of $O_k$ and hence to at least $(1 + \frac{0.41}{2})\frac{n}{2} > \frac{3}{5}n$ different outputs
of $O_{k+1}$. We can easily calculate k by computing the series defined by $a_0 := \alpha$
and $a_{i+1} := a_i(1 + 0.41(1 - a_i))$; then k is the smallest index i such that $a_i \ge \frac{1}{2}$.
Hence, to get the desired bipartite graph, we only have to connect each node
v of $I_1$ to all nodes w of $O_{k+1}$ which can be reached from v by traversing a path
which uses exactly one edge from each of the $k + 1$ expanders. Then the degree of
any node is clearly at most $q_\alpha := 9^{k+1}$. To make the graph $q_\alpha$-regular we can add
arbitrary dummy edges without destroying the expansion property. Further, the
expansion factor of the graph is at least $\varepsilon_\alpha$ because $(1 + \varepsilon_\alpha(1 - \alpha))\,\alpha n = \frac{3}{5}n$ (note
that subsets of the inputs of size smaller than $\alpha n$ are even better expanded). □
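The series from the proof of Corollary 3 is easy to evaluate numerically. The sketch below uses a slight variant of the stopping rule in the proof: it counts how many copies of the expander of Theorem 2 are chained until the guaranteed fraction of reached outputs is at least 3/5, and it applies the expansion bound only to sets of size at most n/2, as Theorem 2 requires. The function name and the return format are assumptions made here.

```python
def copies_needed(alpha, c=0.41, target=0.6):
    """Count chained Theorem 2 expanders until the guaranteed fraction reaches target.

    Returns (number of copies, [a_0, a_1, ...]) where a_i is the guaranteed
    fraction of outputs reached after i copies.
    """
    a = alpha
    series = [a]
    while a < target:
        x = min(a, 0.5)              # the bound only covers sets of size at most n/2
        a = x * (1 + c * (1 - x))    # a_{i+1} := a_i (1 + 0.41 (1 - a_i))
        series.append(a)
    return len(series) - 1, series
```

For $\alpha = \frac{1}{20}$ the loop stops after nine steps, which agrees with the nine copies (degree $9^9$) quoted in the Conclusions.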
3 Finding a c-Approximate Median
Our algorithm to find a c-approximate median is based on a knockout tournament
played on some subset of the nuts. We start with a subset $S_1 \subseteq S$ of the nuts
where each nut $s \in S_1$ has a set $T_1(s)$ of two bolts associated with it; for all s,
the sets $T_1(s)$ need not be disjoint, but every bolt may appear only in a constant
number of them. We describe later how $S_1$ is constructed.
We then play $\lceil \log |S_1| \rceil$ rounds of the tournament, where in each round half the
nuts survive for the next one. Intuitively, we take any two nuts together with their
sets of associated bolts, determine which nut splits the union of both sets of bolts
less equally, eliminate that nut, and give both sets of bolts to the surviving nut.
Unfortunately, pairing the nuts arbitrarily does not quite work, i.e., the winner of
the tournament would not necessarily be a c-approximate median, but there is a
simple way to overcome that difficulty.
In general, let $S_i$ be the set of nuts before we start round i. For each nut $s \in S_i$
let $T_i(s)$ be the multiset of bolts associated with s and let $r_i(s) := \mathrm{rank}_{T_i(s)}(s)$ be
the relative rank of s with respect to its set of bolts $T_i(s)$. Let $S_i^1 := \{s \in S_i \mid
r_i(s) \ge \frac{1}{2}\}$ and $S_i^2 := \{s \in S_i \mid r_i(s) < \frac{1}{2}\}$.
We play the knockout tournament as follows.
i := 1;
while $|S_i| > 2$ do
(1) Pair the nuts of $S_i^1$ arbitrarily. If $|S_i^1|$ is odd then we eliminate the
single nut without a partner.
(2) Let $(s_1, s_2)$ be a pair of nuts from $S_i^1$. Compute the relative ranks
of $s_1$ and $s_2$ with respect to the multiset $T := T_i(s_1) \cup T_i(s_2)$. Note
that it is sufficient to compare $s_1$ with all bolts in $T_i(s_2)$ and $s_2$ with
all bolts in $T_i(s_1)$, because $\mathrm{rank}_T(s_j) = \frac{1}{2}\left(r_i(s_j) + \mathrm{rank}_{T_i(s_{3-j})}(s_j)\right)$,
for $j = 1, 2$ (here we use Observation 4 (c)).
Whichever nut s has relative rank closer to $\frac{1}{2}$ survives in $S_{i+1}$ and is
associated with the multiset $T_{i+1}(s) := T$.
(3) Repeat steps (1) and (2) with $S_i^2$ instead of $S_i^1$.
i := i + 1;
od
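The tournament can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: sizes are numbers, a multiset of bolts is a list, and each nut carries the count of bolts in its multiset that are not larger than it, so that relative ranks can be compared without fractions. All names are ours.

```python
def count_not_larger(nut, bolts):
    """Number of bolts t in the list with nut >= t (nut-to-bolt comparisons only)."""
    return sum(1 for t in bolts if nut >= t)

def play_round(group):
    """One round on a list of (nut, bolts, count) triples; returns the survivors."""
    survivors = []
    # Pair neighbours in the list; an unpaired last nut is eliminated.
    for (s1, T1, k1), (s2, T2, k2) in zip(group[0::2], group[1::2]):
        T = T1 + T2                       # multiset union
        k1 += count_not_larger(s1, T2)    # s1 only meets the other nut's bolts
        k2 += count_not_larger(s2, T1)
        # Keep the nut whose relative rank k/|T| is closer to 1/2.
        if abs(2 * k1 - len(T)) <= abs(2 * k2 - len(T)):
            survivors.append((s1, T, k1))
        else:
            survivors.append((s2, T, k2))
    return survivors

def knockout(start):
    """start maps each nut of S_1 to its two bolts T_1(s); returns the nuts of S_l."""
    S = [(s, list(T), count_not_larger(s, T)) for s, T in start.items()]
    while len(S) > 2:
        high = [x for x in S if 2 * x[2] >= len(x[1])]   # relative rank >= 1/2
        low = [x for x in S if 2 * x[2] < len(x[1])]     # relative rank <  1/2
        S = play_round(high) + play_round(low)
    return [s for s, _, _ in S]
```

For example, with sizes 1 to 64 and `start = {s: (s - 1, s + 1) for s in range(2, 64)}`, every bolt appears in at most two of the initial sets, and the returned nuts lie well inside the middle of the size range.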
Let l be the value of i after the while-loop terminates, i.e., $|S_l| \le 2$. We
claim that if $S_1$ was sufficiently large then every nut in $S_l$ is a c-approximate
median, where c is a small constant (see Lemma 5). But first we make a few
simple observations.
Observation 4. Assume we play the tournament starting with some set $S_1$ of
nuts. Then
(a) $\lceil \log |S_1| \rceil - 1 \le l \le \lceil \log |S_1| \rceil$.
(b) $S_l \ne \emptyset$.
(c) For $i = 1, \ldots, l$ and all $s \in S_i$, $|T_i(s)| = 2^i$. In particular, $|T_l(s)| \ge \frac{|S_1|}{2}$ for all
$s \in S_l$.
(d) Each round needs $O(|S_1|)$ time.
Proof.
(a) In each round, we eliminate half of the nuts which could be paired, and at most
two unpaired nuts, i.e., $|S_{i+1}| \ge \frac{|S_i| - 2}{2}$. We stop if at most two nuts remain. It
is easy to show by induction on $|S_1|$ that l must be at least $\log(|S_1| + 2) - 1$.
This proves the first inequality.
The second inequality follows directly from $|S_{i+1}| \le \frac{|S_i|}{2}$.
(b) We never eliminate all nuts.
(c) By induction on i.
(d) Observe that in each round, every bolt is involved in at most one comparison
(in step (2)). Since there are a total of $2|S_1|$ bolts in the first round and we
never let additional bolts enter the game, we do at most $2|S_1|$ comparisons
in each round. Further, pairing the nuts, computing the relative ranks, and
merging two multisets of bolts do not increase the asymptotic complexity. □
Lemma 5. Let $S_1 \subseteq S$ and $\delta = \frac{|S_1|}{n}$. Suppose each nut $s \in S_1$ lies between the two
bolts in $T_1(s)$, i.e., $b_{low}(s) < s < b_{high}(s)$ if $T_1(s) = \{b_{low}(s), b_{high}(s)\}$, and every
bolt appears at most q times in the sets $T_1(s)$. Then any $s \in S_l$ is a c-approximate
median, where $c = \frac{\delta}{8q}$.
Proof. Before the first round, we have $r_1(s) = \frac{1}{2}$ for all $s \in S_1$, and hence $\frac{1}{4} \le
r_2(s) \le \frac{3}{4}$ for all $s \in S_2$. We now prove by induction that this inequality holds
after each round.
Assume we know that $\frac{1}{4} \le r_i(s) \le \frac{3}{4}$ for all $s \in S_i$. Let $(s_1, s_2)$ be a pair from
$S_i^1$, where w.l.o.g. $s_1 < s_2$. Let T be the multiset $T_i(s_1) \cup T_i(s_2)$. Since $s_1$ is larger
than half of the bolts in $T_i(s_1)$, it must be larger than a quarter of the bolts in T.
On the other hand, it is smaller than a quarter of the bolts in $T_i(s_1)$ and smaller
than a quarter of the bolts in $T_i(s_2)$ (because it is smaller than $s_2$); hence it is
smaller than a quarter of the bolts in T. Therefore, the inequality holds for $s_1$,
and we only eliminate $s_1$ if the relative rank of $s_2$ with respect to T is even closer
to $\frac{1}{2}$. The case of a pair from $S_i^2$ is symmetric, with the roles of the smaller and
the larger nut exchanged.
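Let $s \in S_l$. Since $T_l(s)$ contains at least $\frac{|S_1|}{2} = \frac{\delta n}{2}$ bolts by Observation 4 (c), we
conclude from the inequality above that s is larger than $\frac{\delta n}{8}$ bolts and smaller
than another $\frac{\delta n}{8}$ bolts. Since each bolt may have up to q copies in $T_l(s)$, $\frac{\delta n}{8q} \le
\mathrm{rank}(s) \le (1 - \frac{\delta}{8q})n$, i.e., s is a $\frac{\delta}{8q}$-approximate median. □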
Now we can give our algorithm to find a c-approximate median.
(1) If n is not the square of an integer, then add at most $2\sqrt{n}$ dummy
nuts and bolts to make n the square of an integer.
(2) Let B = I and S = O be the sets of vertices of the $(n, q, \frac{1}{20}, \frac{220}{19})$-expander of Corollary 3. We compare each bolt with all the nuts to
which it is connected by an edge of the expander.
Let $S_1$ be the set of nuts which are compared to at least one smaller
bolt and at least one larger bolt. For all $s \in S_1$, pick arbitrarily one
of the smaller and one of the larger bolts and put them into the set
$T_1(s)$.
(3) Now play the knockout tournament starting with $S_1$. Let $S_l$ be the set
of (at most two) winners of the tournament. Choose any $s \in S_l$ as a
good pivot.
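Step (2) can be sketched as follows for an arbitrary bipartite graph between bolts and nuts. Constructing the expander of Corollary 3 is not shown; the graph is supplied by the caller, and the function name and the edge representation are assumptions made here.

```python
def initial_sets(bolts, nuts, edges):
    """Build S_1 and the sets T_1(s) from a bipartite comparison graph.

    edges[i] lists the indices of the nuts joined to bolt i (assumption: any
    bipartite graph may be passed in; it need not be the Corollary 3 expander).
    Returns a dict mapping each nut of S_1 to (smaller bolt, larger bolt).
    """
    smaller, larger = {}, {}
    for i, b in enumerate(bolts):
        for j in edges[i]:
            s = nuts[j]
            if b < s:
                smaller.setdefault(s, b)   # keep the first smaller bolt seen
            elif b > s:
                larger.setdefault(s, b)    # keep the first larger bolt seen
    return {s: (smaller[s], larger[s]) for s in nuts if s in smaller and s in larger}
```

The number of comparisons equals the number of edges, so a graph of constant degree gives $O(n)$ comparisons. The resulting dictionary has exactly the shape needed as the starting point $S_1$, $T_1$ of the tournament.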
Theorem 6. This algorithm computes in $O(n \log n)$ time a c-approximate median, where $c = \frac{1}{80q}$ is a small constant not depending on n.
Proof. Construction of the expander and hence of set $S_1$ takes $O(n)$ time (note
that enlarging n slightly in step (1) does not increase the asymptotic complexity of
the following steps). The tournament takes $O(|S_1| \log |S_1|) = O(n \log n)$ time
by Observation 4.
It remains to show that we really compute a c-approximate median for some
constant c. First, observe that $|S_1| \ge \frac{n}{10}$. To see this, let $B_1$ be the set of bolts
with rank at most $\frac{n}{20}$ and $B_2$ be the set of bolts with rank at least $\frac{19}{20}n$. Since the
expander connects any subset of B of size $\frac{n}{20}$ to at least $\frac{3}{5}n$ different nuts in S,
there must be a set of at least $\frac{n}{5}$ nuts which are connected to bolts in both $B_1$
and $B_2$. But then at least $\frac{n}{5} - 2 \cdot \frac{n}{20} = \frac{n}{10}$ of the nuts must have rank between $\frac{n}{20}$
and $\frac{19}{20}n$, which means they are in the set $S_1$.
Next, observe that each bolt appears in at most q of the sets $T_1(s)$ because the
expander connects each bolt to exactly q nuts. Hence by Lemma 5, any $s \in S_l$ is
a c-approximate median, where $c = \frac{1}{80q}$. □
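The counting step behind the bound on $|S_1|$ can be spelled out as follows, where $N(X)$ denotes the set of nuts joined to at least one bolt of X (notation introduced here for illustration):

```latex
\begin{align*}
|N(B_1) \cap N(B_2)| &\ge |N(B_1)| + |N(B_2)| - |S|
   \ge \tfrac{3}{5}n + \tfrac{3}{5}n - n = \tfrac{n}{5}, \\
|S_1| &\ge \tfrac{n}{5} - 2 \cdot \tfrac{n}{20} = \tfrac{n}{10}.
\end{align*}
```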
4 Conclusions
We have presented an $O(n \log^2 n)$ time deterministic algorithm for matching nuts
and bolts. This improves the first $O(n (\log n)^{O(1)})$-time solution of this problem,
given by Alon et al. [1], by a factor of $\log^2 n$. As already mentioned in [1], the
methods described in this (and their) paper seem not to be sufficient to reduce
the complexity below $O(n \log^2 n)$.
Unfortunately, the constants hidden in our O-notation are incredibly large (far
beyond the constant $10^8$ in [1]). This is mainly due to the iterative construction
in Corollary 3 which produces an expander of enormous, but still constant, degree
(a short calculation shows that the degree is $q_{1/20} = 9^9$ because we must build
the expander from 9 copies of the simple expander of Theorem 2). On the other
hand, the standard counting argument shows that there is a family of bipartite
24-regular graphs on 2n nodes which connect any subset of the inputs of size $\frac{n}{4}$
to at least $\frac{7}{8}n$ different outputs. Using these graphs in the construction of the set
$S_1$ would give us an algorithm with fairly reasonable constants. However, we do
not know an explicit construction for them. But it is clear that any improvement
to our Corollary 3 can drastically reduce the running time of our algorithm (and
make it more practical).
References
1. N. Alon, M. Blum, A. Fiat, S. Kannan, M. Naor and R. Ostrovsky. Matching nuts and bolts. Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'94), 1994, pp. 690-696.
2. N. Alon, Z. Galil and V.D. Milman. Better expanders and superconcentrators. Journal of Algorithms 8 (1987), pp. 337-347.
3. T.H. Cormen, C.E. Leiserson and R.L. Rivest. Introduction to algorithms. MIT Press, 1990.
4. W. Goddard, C. Kenyon, V. King and L. Schulman. Optimal randomized algorithms for local sorting and set-maxima. SIAM Journal on Computing 22 (1993), pp. 272-283.
5. A. Lubotzky. Discrete groups, expanding graphs and invariant measures. Birkhäuser Verlag, 1994.
6. J.I. Munro and M. Paterson. Selection and sorting with limited storage. Theoretical Computer Science 12 (1980), pp. 315-323.
7. G.J.E. Rawlins. Compared to what? An introduction to the analysis of algorithms. Computer Science Press, 1992.