
Matching Nuts and Bolts Faster

1995


Phillip G. Bradford, Rudolf Fleischer

Max-Planck-Institut für Informatik, Im Stadtwald, 66123 Saarbrücken, Germany. E-mail: {bradford,rudolf}@mpi-sb.mpg.de.

Abstract. The problem of matching nuts and bolts is the following: Given a collection of $n$ nuts of distinct sizes and $n$ bolts such that there is a one-to-one correspondence between the nuts and the bolts, find for each nut its corresponding bolt. We can only compare nuts to bolts. That is, we can neither compare nuts to nuts, nor bolts to bolts. This humble restriction on the comparisons appears to make this problem very hard to solve. In fact, the best deterministic solution to date is due to Alon et al. [1] and takes $\Theta(n \log^4 n)$ time. Their solution uses (efficient) graph expanders. In this paper, we give a simpler $O(n \log^2 n)$ time algorithm which uses only a simple (and not so efficient) expander.

1 Introduction

In [7], page 293, Rawlins posed the following interesting problem:

We wish to sort a bag of $n$ nuts and $n$ bolts by size in the dark. We can compare the sizes of a nut and a bolt by attempting to screw one into the other. This operation tells us that either the nut is bigger than the bolt; the bolt is bigger than the nut; or they are the same size (and so fit together). Because it is dark we are not allowed to compare nuts directly or bolts directly. How many fitting operations do we need to sort the nuts and bolts in the worst case?

As a mathematician (instead of a carpenter) you would probably prefer to see the problem stated as follows ([1]):

Given two sets $B = \{b_1, \ldots, b_n\}$ and $S = \{s_1, \ldots, s_n\}$, where $B$ is a set of $n$ distinct real numbers (representing the sizes of the bolts) and $S$ is a permutation of $B$, we wish to find efficiently the unique permutation $\sigma \in S_n$ so that $b_i = s_{\sigma(i)}$ for all $i$, based on queries of the form compare $b_i$ and $s_j$. The answer to each such query is either $b_i > s_j$ or $b_i = s_j$ or $b_i < s_j$.
* The authors were supported by the ESPRIT Basic Research Actions Program, under contract No. 7141 (project ALCOM II).

The obvious information theoretic lower bound shows that at least $\Omega(n \log n)$ comparisons are needed to solve the problem, even for a randomized algorithm. In fact, there is a simple randomized algorithm which achieves an expected running time of $O(n \log n)$, namely Quicksort: Pick a random nut, find its matching bolt, and then split the problem into two subproblems which can be solved recursively, one consisting of the nuts and bolts smaller than the matched pair and one consisting of the larger ones. The standard analysis of randomized Quicksort gives the expected running time as stated above (see for example [3]).

Unfortunately, it is much harder to find an efficient deterministic algorithm. The only one known to us is the algorithm by Alon et al. [1], which is also based on Quicksort. To find a good pivot element which splits the problem into two subproblems of nearly the same size, they run $\log n$ iterations of a procedure which eliminates half of the nuts in each iteration while maintaining at least one good pivot; since there is only one nut left in the end, this one must be a good pivot. This procedure uses the edges of a highly efficient expander of degree $\Theta(\log^2 n)$ to define its comparisons. Therefore, finding a good pivot takes $\Theta(n \log^3 n)$ time, and the entire Quicksort takes $\Theta(n \log^4 n)$ time.

In this paper, we propose a simpler algorithm to find a good pivot (see Section 3 for details). First, we connect the set of nuts with the set of bolts via some expander of constant degree and compare each nut to all the bolts to which it is connected by an edge of the expander. We discard all nuts which are only connected to smaller bolts or only connected to larger bolts. Then we play a simple knockout tournament on the remaining nuts (where in each round half of the nuts are eliminated) which guarantees that the winner of the tournament is a good pivot.

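
The randomized Quicksort strategy described in the Introduction can be sketched in a few lines of Python. This is only an illustration of that randomized baseline, not the deterministic algorithm of this paper. Sizes are modelled as plain numbers, and the names `compare` and `match_nuts_and_bolts` are hypothetical helpers introduced here.

```python
import random


def compare(nut, bolt):
    """Return -1, 0 or 1 if the nut is smaller than, fits, or is larger than the bolt."""
    return (nut > bolt) - (nut < bolt)


def match_nuts_and_bolts(nuts, bolts, rng=random):
    """Return the matched (nut, bolt) pairs in increasing order of size.

    Assumes distinct sizes and that every nut has exactly one fitting bolt.
    Nuts are never compared to nuts, nor bolts to bolts.
    """
    if not nuts:
        return []
    pivot_nut = rng.choice(nuts)

    # Compare the pivot nut to every bolt: this finds its partner and
    # splits the bolts into a smaller and a larger pile.
    pivot_bolt = None
    smaller_bolts, larger_bolts = [], []
    for bolt in bolts:
        outcome = compare(pivot_nut, bolt)
        if outcome == 0:
            pivot_bolt = bolt
        elif outcome > 0:
            smaller_bolts.append(bolt)
        else:
            larger_bolts.append(bolt)
    if pivot_bolt is None:
        raise ValueError("no bolt fits the chosen nut")

    # Compare every nut to the matching bolt to split the nuts the same way.
    smaller_nuts = [s for s in nuts if compare(s, pivot_bolt) < 0]
    larger_nuts = [s for s in nuts if compare(s, pivot_bolt) > 0]

    return (match_nuts_and_bolts(smaller_nuts, smaller_bolts, rng)
            + [(pivot_nut, pivot_bolt)]
            + match_nuts_and_bolts(larger_nuts, larger_bolts, rng))
```

The sketch compares the pivot nut only to bolts and the matching bolt only to nuts, so the restriction on comparisons is respected. The random pivot is what makes the running time an expectation rather than a worst-case bound.
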
Since we can play each round of the tournament in $O(n)$ time, we can find a good pivot in $O(n \log n)$ time. Therefore, we can solve the nuts and bolts matching problem in $O(n \log^2 n)$ time.

Alon et al. [1] mention two potential applications of this problem: the first is local sorting of nodes in a given graph [4], and the second is selection in read-only memory with a little read/write memory [6].

In the next section, we describe the Quicksort algorithm more formally and recall some facts about expanders. In Section 3, we show how we can efficiently find a good pivot. We conclude with some remarks in Section 4.

2 Basic Definitions

Let $S = \{s_1, \ldots, s_n\}$ be a set of nuts of different sizes and $B = \{b_1, \ldots, b_n\}$ be a set of corresponding bolts. For a nut $s \in S$ define $\mathrm{rank}(s)$ as $|\{t \in B \mid s \ge t\}|$. The rank of a bolt is defined similarly. For a constant $c < \frac{1}{2}$, $s$ is called a $c$-approximate median if $cn \le \mathrm{rank}(s) \le (1 - c)n$. Similarly, define the relative rank of $s$ with respect to a subset $T \subseteq B$ as $\mathrm{rank}_T(s) := \frac{|\{t \in T \mid s \ge t\}|}{|T|}$. If $T$ is a multiset then the relative rank of $s$ with respect to $T$ is defined analogously, where each $t \in T$ is counted according to its multiplicity in $T$.

The algorithm for matching nuts and bolts works as follows.

(1) Find a $c$-approximate median $s$ of the $n$ given nuts (we will determine $c$ later).
(2) Find the bolt $b$ corresponding to $s$.
(3) Compare all nuts to $b$ and all bolts to $s$. This gives two piles of nuts (and bolts as well), one with the nuts (bolts) smaller than $s$ and one with the nuts (bolts) bigger than $s$.
(4) Run the algorithm recursively on the two piles of the smaller nuts and bolts and the two piles of the bigger nuts and bolts.

In the next section, we will show how to find a $c$-approximate median in $O(n \log n)$ time, where $c$ is a small constant. Then our main result follows immediately.

Theorem 1. We can match $n$ nuts with their corresponding bolts in $O(n \log^2 n)$ time.

Proof.
The correctness of the algorithm above follows immediately from the correctness of Quicksort. For the running time, observe that each subproblem has size at most $(1 - c)n$, hence the depth of the recursion is only $O(\log n)$, and in each level of the recursion we spend at most $O(n \log n)$ time to compute the $c$-approximate median and $O(n)$ time to split the problem into the two subproblems. □

We now recall some facts about expanders (see for example [5] if you want to learn more about expanders). Let $0 < \alpha \le \frac{1}{2}$ and $c < 1$. An $(n, k, \alpha, c)$-expander is a $k$-regular bipartite graph on vertices $I$ (inputs) and $O$ (outputs), where $|I| = |O| = n$, such that every subset $A \subseteq I$ of size at most $\alpha n$ is joined by edges to at least $|A| \left(1 + c\left(1 - \frac{|A|}{n}\right)\right)$ different outputs. The constant $c$ is called the expansion factor of the graph.

Theorem 2 (Alon, Galil, and Milman [2], Cor. 2.3). If $n = m^2$ for some integer $m$, then we can construct an $(n, 9, \frac{1}{2}, 0.41)$-expander in $O(n)$ time.

Corollary 3. Let $0 < \alpha \le \frac{1}{2}$ and $\beta = \frac{3 - 5\alpha}{5\alpha(1 - \alpha)}$. Then there exists an integer $q_\alpha$ such that for any $n$, where $n = m^2$ for some integer $m$, we can construct an $(n, q_\alpha, \alpha, \beta)$-expander in $O(n)$ time. In such an expander, any subset of the inputs of size $\alpha n$ is connected to at least $\frac{3}{5} n$ different outputs.

Proof. We take a series of the expanders of Theorem 2 and identify the outputs $O_1$ of the first one with the inputs $I_2$ of the second one, the outputs $O_2$ of the second one with the inputs $I_3$ of the third one, and so on. Then there is an integer $k_\alpha$ (independent of $n$) such that any set of $\alpha n$ inputs of $I_1$ is connected to at least $\frac{n}{2}$ different outputs of $O_{k_\alpha}$ and hence to at least $\left(1 + \frac{0.41}{2}\right) \frac{n}{2} > \frac{3}{5} n$ different outputs of $O_{k_\alpha + 1}$. We can easily calculate $k_\alpha$ by computing the series defined by $a_0 := \alpha$ and $a_{i+1} := a_i (1 + 0.41 (1 - a_i))$; then $k_\alpha$ is the smallest index $i$ such that $a_i \ge \frac{1}{2}$.
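
The series $a_0 := \alpha$, $a_{i+1} := a_i(1 + 0.41(1 - a_i))$ is easy to evaluate numerically. The following Python sketch does so under the assumption that ordinary floating-point arithmetic is accurate enough; the function names `expansion_series` and `smallest_index` and the parameter `threshold` are hypothetical and introduced only for illustration.

```python
def expansion_series(alpha, c=0.41, threshold=0.5, max_steps=1000):
    """Return [a_0, ..., a_k], where k is the first index with a_k >= threshold.

    a_0 = alpha and a_{i+1} = a_i * (1 + c * (1 - a_i)), as in the proof of
    Corollary 3. The threshold must be below 1, the limit of the series.
    """
    if not 0 < alpha <= 0.5:
        raise ValueError("alpha must satisfy 0 < alpha <= 1/2")
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    series = [alpha]
    while series[-1] < threshold:
        if len(series) > max_steps:
            raise RuntimeError("threshold not reached within max_steps")
        a = series[-1]
        series.append(a * (1 + c * (1 - a)))
    return series


def smallest_index(alpha, c=0.41, threshold=0.5):
    """Return the smallest index i with a_i >= threshold."""
    return len(expansion_series(alpha, c, threshold)) - 1


if __name__ == "__main__":
    # alpha = 1/20 is the value used for the expander in Section 3.
    for i, a in enumerate(expansion_series(1 / 20)):
        print(i, round(a, 4))
```

For $\alpha = \frac{1}{20}$ and $c = 0.41$ the term $a_8$ is roughly 0.4996, only just below the threshold $\frac{1}{2}$, so the index reported by this sketch is sensitive to the exact value used for the expansion factor.
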
Hence, to get the desired bipartite graph, we only have to connect each node $v$ of $I_1$ to all nodes $w$ of $O_{k_\alpha + 1}$ which can be reached from $v$ by traversing a path which uses exactly one edge from each of the $k_\alpha + 1$ expanders. Then the degree of any node is clearly at most $q_\alpha := 9^{k_\alpha + 1}$. To make the graph $q_\alpha$-regular we can add arbitrary dummy edges without destroying the expansion property. Further, the expansion factor of the graph is at least $\beta$ because $\alpha n (1 + \beta (1 - \alpha)) = \frac{3}{5} n$ (note that subsets of the inputs of size smaller than $\alpha n$ are even better expanded). □

3 Finding a c-Approximate Median

Our algorithm to find a $c$-approximate median is based on a knockout tournament played on some subset of the nuts. We start with a subset $S_1 \subseteq S$ of the nuts where each nut $s \in S_1$ has a set $T_1(s)$ of two bolts associated with it; for all $s$, the sets $T_1(s)$ need not be disjoint, but every bolt may appear only in a constant number of them. We describe later how $S_1$ is constructed. We then play $\lceil \log |S_1| \rceil$ rounds of the tournament, where in each round half the nuts survive for the next one. Intuitively, we take any two nuts together with their sets of associated bolts, determine which nut splits the union of both sets of bolts less equally, eliminate that nut, and give both sets of bolts to the surviving nut. Unfortunately, pairing the nuts arbitrarily does not quite work, i.e., the winner of the tournament would not necessarily be a $c$-approximate median, but there is a simple way to overcome that difficulty.

In general, let $S_i$ be the set of nuts before we start round $i$. For each nut $s \in S_i$ let $T_i(s)$ be the multiset of bolts associated with $s$ and let $r_i(s) := \mathrm{rank}_{T_i(s)}(s)$ be the relative rank of $s$ with respect to its set of bolts $T_i(s)$. Let $S_i^1 := \{s \in S_i \mid r_i(s) \ge \frac{1}{2}\}$ and $S_i^2 := \{s \in S_i \mid r_i(s) < \frac{1}{2}\}$. We play the knockout tournament as follows.

$i := 1$;
while $|S_i| > 2$ do
  (1) Pair the nuts of $S_i^1$ arbitrarily. If $|S_i^1|$ is odd then we eliminate the single nut without a partner.
  (2) Let $(s_1, s_2)$ be a pair of nuts from $S_i^1$. Compute the relative ranks of $s_1$ and $s_2$ with respect to the multiset $T := T_i(s_1) \cup T_i(s_2)$. Note that it is sufficient to compare $s_1$ with all bolts in $T_i(s_2)$ and $s_2$ with all bolts in $T_i(s_1)$, because $\mathrm{rank}_T(s_j) = \frac{1}{2} \left( r_i(s_j) + \mathrm{rank}_{T_i(s_{3-j})}(s_j) \right)$ for $j = 1, 2$ (here we use Observation 4 (c)). Whichever nut $s$ has relative rank closer to $\frac{1}{2}$ survives in $S_{i+1}$ and is associated with the multiset $T_{i+1}(s) := T$.
  (3) Repeat steps (1) and (2) with $S_i^2$ instead of $S_i^1$.
  $i := i + 1$;
od

Let $l$ be the value of $i$ after the while-loop terminates, i.e., $|S_l| \le 2$. We claim that if $S_1$ was sufficiently large then every nut in $S_l$ is a $c$-approximate median, where $c$ is a small constant (see Lemma 5). But first we make a few simple observations.

Observation 4. Assume we play the tournament starting with some set $S_1$ of nuts. Then
(a) $\lceil \log |S_1| \rceil - 1 \le l \le \lceil \log |S_1| \rceil$.
(b) $S_l \ne \emptyset$.
(c) For $i = 1, \ldots, l$ and all $s \in S_i$, $|T_i(s)| = 2^i$. In particular, $|T_l(s)| \ge \frac{|S_1|}{2}$ for all $s \in S_l$.
(d) Each round needs $O(|S_1|)$ time.

Proof.
(a) In each round, we eliminate half of the nuts which could be paired, and at most two unpaired nuts, i.e., $|S_{i+1}| \ge \frac{|S_i| - 2}{2}$. We stop if at most two nuts remain. It is easy to show by induction on $|S_1|$ that $l$ must be at least $\log(|S_1| + 2) - 1$. This proves the first inequality. The second inequality follows directly from $|S_{i+1}| \le \frac{|S_i|}{2}$.
(b) We never eliminate all nuts.
(c) By induction on $i$.
(d) Observe that in each round, every bolt is involved in at most one comparison (in step (2)). Since there are a total of $2|S_1|$ bolts in the first round and we never let additional bolts enter the game, we do at most $2|S_1|$ comparisons in each round. Further, pairing the nuts, computing the relative ranks, and merging two multisets of bolts do not increase the asymptotic complexity. □

Lemma 5. Let $S_1 \subseteq S$ and $\alpha = \frac{|S_1|}{n}$.
Suppose each nut $s \in S_1$ lies between the two bolts in $T_1(s)$, i.e., $b_{low}(s) < s < b_{high}(s)$ if $T_1(s) = \{b_{low}(s), b_{high}(s)\}$, and every bolt appears at most $q$ times in the sets $T_1(s)$. Then any $s \in S_l$ is a $c$-approximate median, where $c = \frac{\alpha}{8q}$.

Proof. Before the first round, we have $r_1(s) = \frac{1}{2}$ for all $s \in S_1$, and hence $\frac{1}{4} \le r_2(s) \le \frac{3}{4}$ for all $s \in S_2$. We now prove by induction that this inequality holds after each round. Assume we know that $\frac{1}{4} \le r_i(s) \le \frac{3}{4}$ for all $s \in S_i$. Let $(s_1, s_2)$ be a pair from $S_i^1$, where w.l.o.g. $s_1 < s_2$. Let $T$ be the multiset $T_i(s_1) \cup T_i(s_2)$. Since $s_1$ is larger than at least half of the bolts in $T_i(s_1)$, it must be larger than at least a quarter of the bolts in $T$. On the other hand, it is smaller than at least a quarter of the bolts in $T_i(s_1)$ and smaller than at least a quarter of the bolts in $T_i(s_2)$ (because it is smaller than $s_2$); hence it is smaller than at least a quarter of the bolts in $T$. Therefore, the inequality holds for $s_1$, and we only eliminate $s_1$ if the relative rank of $s_2$ with respect to $T$ is even closer to $\frac{1}{2}$.

Let $s \in S_l$. Since $T_l(s)$ contains at least $\frac{\alpha n}{2}$ bolts by Observation 4 (c), we conclude from the inequality above that $s$ is larger than $\frac{\alpha n}{8}$ bolts and smaller than another $\frac{\alpha n}{8}$ bolts. Since each bolt may have up to $q$ copies in $T_l(s)$, $\frac{\alpha n}{8q} \le \mathrm{rank}(s) \le \left(1 - \frac{\alpha}{8q}\right) n$, i.e., $s$ is an $\frac{\alpha}{8q}$-approximate median. □

Now we can give our algorithm to find a $c$-approximate median.

(1) If $n$ is not the square of an integer, then add at most $2\sqrt{n}$ dummy nuts and bolts to make $n$ the square of an integer.
(2) Let $B = I$ and $S = O$ be the sets of vertices of the $(n, q, \frac{1}{20}, \frac{220}{19})$-expander of Corollary 3. We compare each bolt with all the nuts to which it is connected by an edge of the expander. Let $S_1$ be the set of nuts which are compared to at least one smaller bolt and at least one larger bolt. For all $s \in S_1$, pick arbitrarily one of the smaller and one of the larger bolts and put them into the set $T_1(s)$.
(3) Now play the knockout tournament starting with $S_1$.
Let $S_l$ be the set of (at most two) winners of the tournament. Choose any $s \in S_l$ as a good pivot.

Theorem 6. This algorithm computes in $O(n \log n)$ time a $c$-approximate median, where $c = \frac{1}{80q}$ is a small constant not depending on $n$.

Proof. Construction of the expander and hence of set $S_1$ takes $O(n)$ time (note that enlarging $n$ slightly in step (1) does not increase the asymptotic complexity of the following steps). And the tournament takes $O(|S_1| \log |S_1|) = O(n \log n)$ time by Observation 4. It remains to show that we really compute a $c$-approximate median for some constant $c$.

First, observe that $|S_1| \ge \frac{n}{10}$. To see this, let $B_1$ be the set of bolts with rank at most $\frac{n}{20}$ and $B_2$ be the set of bolts with rank at least $\frac{19}{20} n$. Since the expander connects any subset of $B$ of size $\frac{n}{20}$ to at least $\frac{3}{5} n$ different nuts in $S$, there must be a set of at least $\frac{n}{5}$ nuts which are connected to bolts in both $B_1$ and $B_2$. But then at least $\frac{n}{5} - 2 \cdot \frac{n}{20} = \frac{n}{10}$ of the nuts must have rank between $\frac{n}{20}$ and $\frac{19}{20} n$, which means they are in the set $S_1$.

Next, observe that each bolt appears in at most $q$ of the sets $T_1(s)$ because the expander connects each bolt to exactly $q$ nuts. Hence by Lemma 5, any $s \in S_l$ is a $c$-approximate median, where $c = \frac{1}{80q}$. □

4 Conclusions

We have presented an $O(n \log^2 n)$ time deterministic algorithm for matching nuts and bolts. This improves the first $O(n (\log n)^{O(1)})$-time solution of this problem, given by Alon et al. [1], by a factor of $\log^2 n$. As already mentioned in [1], the methods described in this (and their) paper seem not to be sufficient to reduce the complexity below $O(n \log^2 n)$. Unfortunately, the constants hidden in our $O$-notation are incredibly large (far beyond the constant $10^8$ in [1]). This is mainly due to the iterative construction in Corollary 3 which produces an expander of enormous, but still constant, degree (a short calculation shows that the degree is $q_{1/20} = 9^9$ because we must build the expander from 9 copies of the simple expander of Theorem 2).

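
To get a feeling for the size of these constants, the degree and the resulting constant $c$ of Theorem 6 can be written out explicitly. The following worked arithmetic simply takes the stated degree $9^9$ at face value:

```latex
\[
q = 9^{9} = 387{,}420{,}489,
\qquad
c = \frac{1}{80\,q} = \frac{1}{30{,}993{,}639{,}120} \approx 3.2 \times 10^{-11}.
\]
```
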
On the other hand, the standard counting argument shows that there is a family of bipartite 24-regular graphs on $2n$ nodes which connect any subset of the inputs of size $\frac{n}{4}$ to at least $\frac{7}{8} n$ different outputs. Using these graphs in the construction of the set $S_1$ would give us an algorithm with fairly reasonable constants. However, we do not know an explicit construction for them. But it is clear that any improvement to our Corollary 3 can drastically reduce the running time of our algorithm (and make it more practical).

References

1. N. Alon, M. Blum, A. Fiat, S. Kannan, M. Naor and R. Ostrovsky. Matching nuts and bolts. Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'94), 1994, pp. 690–696.
2. N. Alon, Z. Galil and V.D. Milman. Better expanders and superconcentrators. Journal of Algorithms 8 (1987), pp. 337–347.
3. T.H. Cormen, C.E. Leiserson and R.L. Rivest. Introduction to algorithms. MIT Press, 1990.
4. W. Goddard, C. Kenyon, V. King and L. Schulman. Optimal randomized algorithms for local sorting and set-maxima. SIAM Journal on Computing 22, 1993, pp. 272–283.
5. A. Lubotzky. Discrete groups, expanding graphs and invariant measures. Birkhäuser Verlag, 1994.
6. J.I. Munro and M. Paterson. Selection and sorting with limited storage. Theoretical Computer Science 12, 1980, pp. 315–323.
7. G.J.E. Rawlins. Compared to what? An introduction to the analysis of algorithms. Computer Science Press, 1992.