A Distributed Graph Algorithm: Knot Detection
J. MISRA and K. M. CHANDY
University of Texas at Austin
A knot in a directed graph is a useful concept in deadlock detection. A distributed algorithm for
identifying a knot in a graph by using a network of processes is presented. The algorithm is based on
the work of Dijkstra and Scholten.
Categories and Subject Descriptors: C.2.4 [Computer-Communication Systems]: Distributed
Systems--distributed applications, network operating systems; D.1.3 [Programming Techniques]: Concurrent Programming; F.2.2 [Analysis of Algorithms and Problem Complexity]:
Nonnumerical Algorithms and Problems--sequencing and scheduling; G.2.2 [Discrete Mathematics]: Graph Theory--graph algorithms, network problems
General Terms: Algorithms
Additional Key Words and Phrases: Distributed algorithms, message communication, knot
1. INTRODUCTION
A vertex vi in a directed graph is in a knot if for every vertex vj reachable from vi,
vi is reachable from vj. Chang [2] shows that knot is a useful concept in deadlock
detection. Dijkstra [3] has proposed a distributed algorithm for detecting whether
a given process in a network of processes is in a knot. His algorithm is based on
his previous work with Scholten [4] on termination detection of diffusing computations.
We propose an algorithm for knot detection which is also based on [4]
but is conceptually simpler. We also discuss the extensions of our algorithm to a
more general class of problems.
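The definition of a knot can be checked directly on a small graph. The following is a minimal sequential sketch in Python, not the distributed algorithm of this paper; the adjacency-dictionary representation and the names `reachable` and `in_knot` are assumptions made for illustration.

```python
from collections import deque

def reachable(graph, start):
    """Return the set of vertices reachable from start by a path of one or more edges."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(graph.get(v, ()))
    return seen

def in_knot(graph, vi):
    """vi is in a knot if vi is reachable from every vertex vj reachable from vi."""
    return all(vi in reachable(graph, vj) for vj in reachable(graph, vi))

# Vertex 3 is reachable from vertex 1 but cannot reach it back,
# so vertex 1 is not in a knot in the second graph.
cycle = {1: [2], 2: [1]}
with_exit = {1: [2], 2: [1, 3], 3: []}
```

Under this reading, `in_knot(cycle, 1)` holds and `in_knot(with_exit, 1)` does not.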
2. MODEL OF A NETWORK OF COMMUNICATING PROCESSES
A process is a sequential program which can communicate with other processes
by sending/receiving messages. Two processes P and Q are said to be neighbors
if they can communicate directly with one another without having messages go
through intermediate processes. We assume that communication channels are
bidirectional: if P can send messages to Q, then Q can send messages to P. A
process knows its neighbors but is otherwise ignorant of the general communication
structure of the network.
Supported in part by the Air Force under grant AFOSR 81-0205.
Authors' address: Department of Computer Sciences, University of Texas, Austin, TX 78712.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association
for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific
permission.
© 1982 ACM 0164-0925/82/1000-0678 $00.75
ACM Transactions on Programming Languages and Systems, Vol. 4, No. 4, October 1982, Pages 678-686.
We assume a very simple protocol for message communication; this protocol is
equivalent to the one used by Dijkstra and Scholten [4]. Every process has an
input buffer of unbounded length. If process P sends a message to a neighbor
process Q, then the message gets appended at the end of the input buffer of Q
after a finite, arbitrary delay. We assume that (1) messages are not lost or altered
during transmission, (2) messages sent from P to Q arrive at Q's input buffer in
the order sent, and (3) two messages arriving simultaneously at an input buffer
are ordered arbitrarily and appended to the buffer. A process receives a message
by removing it from its input buffer.
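The protocol above can be mimicked in a few lines. This is a sketch under assumptions: the class `Network`, its method names, and the use of a seeded random choice to stand in for the "finite, arbitrary delay" are all hypothetical and are not part of the paper.

```python
import random
from collections import defaultdict, deque

class Network:
    """Toy model: per-channel FIFO order, arbitrary interleaving between channels."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.in_flight = defaultdict(deque)  # (sender, receiver) -> messages not yet delivered
        self.buffer = defaultdict(deque)     # receiver -> unbounded input buffer

    def send(self, sender, receiver, msg):
        self.in_flight[(sender, receiver)].append(msg)

    def deliver_one(self):
        """Append the oldest in-flight message of one randomly chosen channel
        to the receiver's input buffer. Returns False if nothing is in flight."""
        busy = sorted(ch for ch, q in self.in_flight.items() if q)
        if not busy:
            return False
        sender, receiver = self.rng.choice(busy)
        msg = self.in_flight[(sender, receiver)].popleft()
        self.buffer[receiver].append((sender, msg))
        return True

    def receive(self, receiver):
        """Remove and return the head of the input buffer, or None if it is empty."""
        buf = self.buffer[receiver]
        return buf.popleft() if buf else None
```

Messages from one sender to one receiver keep their order, while messages from different senders may be interleaved in any way, which corresponds to assumptions (1) to (3).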
The assumption of unbounded-length buffers is for ease of exposition. We show
in Section 5.1 that the input buffer length of process Q can be bounded by four
times the number of neighbors of Q.
3. A DISTRIBUTED ALGORITHM FOR KNOT DETECTION
Consider a network of processes corresponding to a given directed graph G: there
is a one-to-one correspondence between processes in the network and vertices in
the graph; a process pi in the network represents vertex vi in G, for all i; and pi,
pj are neighbors if edge (vi, vj) or (vj, vi) exists in G. Process p1 initiates a
computation to determine if v1 is in a knot.
3.1 Local Variables of Processes
Every process pi maintains the following variables.
succeeding(i):  This Boolean variable is set true when pi determines that vi is
                reachable from v1. Initially this variable is false for all pi, i ≠ 1,
                and is true for p1. Eventually, succeeding(i) will be true if and
                only if vi is reachable from v1.
preceding(i):   Same as above, except that preceding(i) represents whether
                v1 is reachable from vi.
subordinate(i): This is integer valued and will be set to 1 if and only if
                succeeding(i) and not preceding(i); else it will be set to 0. v1
                is in a knot if and only if subordinate(i) is eventually zero for
                every process i.
cs(i):          This is an integer-valued variable which keeps the partial sum
                of some subordinate variables. A goal of the program is to
                establish the following at termination:
                $$ \mathit{cs}(1) = \sum_{i} \mathit{subordinate}(i). $$
                Therefore v1 is in a knot if and only if cs(1) = 0 at termination.
We discuss in Section 3.2 the different types of messages sent among processes.
In short, a process pi may send a message to pj, and pj sends an acknowledgment
(ack) to pi for every message that pj receives from pi. We introduce the following
variables related to message and ack transmission.
num(i):         This is the number of unacknowledged messages, that is, the number
                of messages sent by this process pi for which acks have not been
                received so far.
father(i):      This is a process from which pi, i ≠ 1, received a message when its
                num(i) was last zero. father(i) is undefined initially.
Our goal is to maintain a rooted tree structure at all times over processes whose
num > 0; father will denote the parent in this tree structure, and p1 the root.
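The per-process variables of this section can be collected in one record. The sketch below is an illustration only; the class name `ProcessState`, the helper `initial_state`, and the use of `None` for an undefined father are assumptions, not notation from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcessState:
    succeeding: bool = False       # vi is known to be reachable from v1
    preceding: bool = False        # v1 is known to be reachable from vi
    subordinate: int = 0           # 1 iff succeeding and not preceding
    cs: int = 0                    # partial sum of subordinate values
    num: int = 0                   # messages sent but not yet acknowledged
    father: Optional[int] = None   # None stands for "undefined"

def initial_state(i: int) -> ProcessState:
    """Initial values: both Boolean flags are true only for the initiator p1."""
    is_initiator = (i == 1)
    return ProcessState(succeeding=is_initiator, preceding=is_initiator)
```

A process belongs to the tree exactly when its `num` field is positive, so no separate membership flag is kept.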
3.2 Messages Sent Among Processes
There are two types of messages sent between neighbors in this algorithm.
(i) Structure message, or message, has two components, (type, p), where type
= suc or pre and p is the identity of the sender process. Process pi sends (suc, pi)
to pj if there is a path from v1 to vj in which vi is the prefinal vertex. Process pi
sends (pre, pi) to pj if there is a path from vj to v1 in which vi follows vj in the
path.
(ii) Acknowledgment message, or ack, is of the form (ack, c), where c is an
integer. Acks are used to update cs and num. The entire computation terminates
when process p1 receives acks for all messages that it sent, that is, when num(1)
is decremented to zero. Acks for all messages are sent back as soon as the
messages are received, except for messages received from father; an ack to a
father is sent only when num next becomes zero.
Convention. It is convenient for purposes of proof to define an atomic action
within which invariant assertions may be temporarily violated and outside of
which the invariants must hold. We write ⟨A1; A2; ...; An⟩ to show that
executions of statements A1, A2, ..., An must be considered as an atomic action.
We use PASCAL-like notation with the added commands send and receive to
write our programs.
3.3 Knot-Detection Algorithm
Convention. We write succeeding, preceding, etc., for succeeding(i), preceding(i), etc., when the context is clear.
Overview of the Algorithm. As stated earlier, one goal of the algorithm is to
maintain a rooted directed tree structure over the set of processes pi whose
num(i) > 0. The root of the tree will be p1, and father(i) will be the parent in the
tree for pi, i ≠ 1. In order to maintain the tree structure, we must ensure that (1)
a process pi, i ≠ 1, acquires a father only if it does not have one currently: this is
guaranteed, since a process acquires a father only when its num(i) becomes
nonzero; and (2) a process pi can be removed from the tree (i.e., set its num(i)
= 0) only if it is a leaf node: this will be guaranteed by every process sending its
last ack to its father. Computation terminates when the tree is empty.
We will also maintain the invariant (1) given in Lemma 4.2, which states that
the sum of cs over all processes plus the c's in the acks in transit equals the sum
of subordinates over all processes. The algorithm will ensure that if num(i) = 0
and i ≠ 1, then cs(i) = 0. Therefore, when the tree is empty, cs(i) = 0 for all i,
i ≠ 1, and hence
$$ \mathit{cs}(1) = \sum_{i} \mathit{subordinate}(i). $$
Process p1 is in a knot if and only if cs(1) = 0.
Algorithm for p1
Initialization
    begin
      father is undefined;
      subordinate := 0; cs := 0; num := 0;
      ⟨succeeding := true;
       num := num + number of successors of v1;
       send (suc, p1) to all successors⟩;
      ⟨preceding := true;
       num := num + number of predecessors of v1;
       send (pre, p1) to all predecessors⟩
    end
Upon receiving a structure message (type, p)
    send (ack, 0) to p                                               (M1)
Upon receiving an acknowledgment (ack, c)
    begin
      cs := cs + c; num := num - 1;                                  (M2)
      if num = 0 then terminate computation
      {v1 is in a knot if cs = 0}
    end
Algorithm for pi, i ≠ 1
Initialization
    begin
      father is undefined; subordinate := 0; cs := 0; num := 0;
      succeeding := false; preceding := false
    end
Upon receiving a message (type, p)
    begin
      {update father or send an ack immediately}
      if num = 0
        then father := p
        else begin ⟨send (ack, cs) to p; cs := 0⟩ end;               (L1)
      {update succeeding and preceding if necessary}
      if type = suc and not succeeding
        {For the first time pi has determined that vi is reachable from v1}
        then
          begin ⟨succeeding := true;
                 num := num + number of successors of vi;
                 send (suc, pi) to all successors⟩
          end;
      if type = pre and not preceding
        {For the first time pi has determined that v1 is reachable from vi}
        then
          begin ⟨preceding := true;
                 num := num + number of predecessors of vi;
                 send (pre, pi) to all predecessors⟩
          end;
      {update subordinate if necessary. Also update cs to maintain the invariant in Lemma 4.2}
      if succeeding and not preceding
        then
          begin ⟨cs := cs - subordinate + 1; subordinate := 1⟩ end   (L2)
        else
          begin ⟨cs := cs - subordinate + 0; subordinate := 0⟩ end;  (L3)
      {send ack to father if num = 0}
      if num = 0
        then begin ⟨send (ack, cs) to father; cs := 0⟩ end           (L4)
    end
Upon receiving an acknowledgment (ack, c)
    begin
      cs := cs + c; num := num - 1;                                  (L5)
      if num = 0
        then
          begin ⟨send (ack, cs) to father; cs := 0⟩ end              (L6)
    end
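The two programs above can be exercised on small graphs with a sequential simulation. The following Python sketch is an illustration under assumptions, not the authors' implementation: the function name `detect_knot`, the dictionary-based state, and the seeded random scheduler (which stands in for arbitrary message delays while keeping each channel FIFO) are all hypothetical. Each handler runs to completion, which plays the role of the atomic actions.

```python
import random
from collections import deque

def detect_knot(succ, seed=0):
    """Simulate the algorithm on {vertex: [successors]} with vertex 1 as initiator.
    Returns cs(1) at termination; v1 is in a knot iff the result is 0."""
    rng = random.Random(seed)
    verts = sorted(set(succ) | {w for ws in succ.values() for w in ws})
    pred = {v: [u for u in verts if v in succ.get(u, [])] for v in verts}
    st = {v: dict(suc=False, pre=False, sub=0, cs=0, num=0, father=None) for v in verts}
    chan = {}  # (sender, receiver) -> FIFO queue of messages in transit

    def send(src, dst, msg):
        chan.setdefault((src, dst), deque()).append(msg)

    def spread(i, kind):
        """Set flag `kind` at p_i and notify successors (suc) or predecessors (pre)."""
        s = st[i]
        s[kind] = True
        targets = succ.get(i, []) if kind == "suc" else pred[i]
        s["num"] += len(targets)
        for j in targets:
            send(i, j, (kind, i))

    def on_message(i, kind, p):
        s = st[i]
        if i == 1:                                   # M1: p1 acks at once
            send(1, p, ("ack", 0))
            return
        if s["num"] == 0:
            s["father"] = p
        else:                                        # immediate ack to a non-father
            send(i, p, ("ack", s["cs"]))
            s["cs"] = 0
        if not s[kind]:
            spread(i, kind)
        new_sub = 1 if (s["suc"] and not s["pre"]) else 0
        s["cs"] += new_sub - s["sub"]                # keeps the invariant of Lemma 4.2
        s["sub"] = new_sub
        if s["num"] == 0:                            # leaf: return ack to father
            send(i, s["father"], ("ack", s["cs"]))
            s["cs"] = 0

    def on_ack(i, c):
        s = st[i]
        s["cs"] += c
        s["num"] -= 1
        if s["num"] == 0 and i != 1:                 # last ack: leave the tree
            send(i, s["father"], ("ack", s["cs"]))
            s["cs"] = 0

    spread(1, "suc")
    spread(1, "pre")
    while st[1]["num"] > 0:                          # p1 terminates when num(1) = 0
        src, dst = rng.choice(sorted(k for k, q in chan.items() if q))
        msg = chan[(src, dst)].popleft()
        if msg[0] == "ack":
            on_ack(dst, msg[1])
        else:
            on_message(dst, msg[0], msg[1])
    return st[1]["cs"]
```

For example, `detect_knot({1: [2], 2: [1]})` reports that v1 is in a knot, while adding an edge from vertex 2 to a sink vertex 3 makes subordinate(3) = 1 and so yields a nonzero cs(1).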
4. PROOF OF CORRECTNESS
LEMMA 4.1. At any point in the computation, the set of processes with num
> 0 form a rooted tree with p1 as the root and the parent relation specified by
the local variable "father."
PROOF. The lemma holds vacuously initially. num(i) and father(i) may be
changed only upon receipt of a message or an ack by process i. If a process with
num > 0 receives a message, then it does not alter its father, thus preserving the
tree property. Similarly, if a process has num > 0 after processing an ack, it does
not alter the tree structure. If a process pj changes num(j) from zero, then it
must have received a message from some other process pi on the tree and must
have set father(j) = i, thus preserving the tree property.
We now show that only a leaf node can decrement its num to zero. If pi is on
the tree and is not a leaf, then there is a process pj with num(j) > 0 and father(j)
= i; then pj will not return an ack to pi while pj remains on the tree, and hence
num(i) > 0 while pj remains on the tree. Therefore only a leaf node can decrement
its num to 0, which preserves the tree property. This completes the proof. □
Let T, at any point in computation, denote the set of ack messages which are
in Transit, that is, which have been sent but have not yet been received.
LEMMA 4.2. The following is an invariant:
$$ \sum_{i} \mathit{cs}(i) + \sum_{(ack,\, c) \in T} c = \sum_{i} \mathit{subordinate}(i). \qquad (1) $$
PROOF. The lemma holds initially, since all the terms in the equation are zero.
For pi, i ≠ 1, the terms in the equation are modified only at program points L1-L6,
and for p1, these terms can be modified only at M1 or M2. The reader may
easily convince himself that the equation is left invariant by the execution of the
statements at these program points. □
THEOREM 4.3. Assume that process p1 terminates computation (in step M2).
cs(1) = 0 if and only if v1 is in a knot.
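The case analysis left to the reader in the proof of Lemma 4.2 can be sketched as follows. This is one possible write-up and not part of the original proof; the symbol \Delta (the change caused by one atomic step), the abbreviation s' for the new value of subordinate(i), and the grouping of program points by kind of action are assumptions made for illustration. In the rows for sending an ack, cs(i) denotes the value just before the step.

```latex
% For each kind of step: change of the left side of (1) versus change of the right side.
\begin{align*}
\text{send } (ack, \mathit{cs}(i)) \text{, then } \mathit{cs}(i) := 0:\quad
  & \Delta \sum_i \mathit{cs}(i) = -\mathit{cs}(i), \quad
    \Delta \sum_{(ack,\,c) \in T} c = +\mathit{cs}(i), \quad
    \Delta \sum_i \mathit{subordinate}(i) = 0 \\
\text{send } (ack, 0) \text{ at M1}:\quad
  & \Delta \sum_i \mathit{cs}(i) = 0, \quad
    \Delta \sum_{(ack,\,c) \in T} c = 0, \quad
    \Delta \sum_i \mathit{subordinate}(i) = 0 \\
\text{set } \mathit{subordinate}(i) := s',\ s' \in \{0, 1\}:\quad
  & \Delta \sum_i \mathit{cs}(i) = s' - \mathit{subordinate}(i), \quad
    \Delta \sum_{(ack,\,c) \in T} c = 0, \quad
    \Delta \sum_i \mathit{subordinate}(i) = s' - \mathit{subordinate}(i) \\
\text{receive } (ack, c):\quad
  & \Delta \sum_i \mathit{cs}(i) = +c, \quad
    \Delta \sum_{(ack,\,c) \in T} c = -c, \quad
    \Delta \sum_i \mathit{subordinate}(i) = 0
\end{align*}
```

In every row the first two changes add up to the third, so both sides of (1) change by the same amount.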
PROOF. We first show that when p1 terminates computation (i) cs(i) = 0 for i
≠ 1, (ii) subordinate(i) is correctly set, and (iii) the set T is empty. The theorem
follows directly from the invariant proved in Lemma 4.2.
(i) When p1 terminates computation in step M2, num(1) = 0. Then the tree is
empty, since p1 was the root of the tree. Therefore num(i) = 0 for all i. If num(i)
= 0, then cs(i) = 0 for all i, i ≠ 1, because every change to num(i) is followed by
the code to set cs(i) to 0 if num(i) is 0 (steps L4 and L6).
(ii) If vi is reachable from v1, it follows by induction on path length to vi that
pi will eventually receive a message which will result in succeeding(i) set true;
succeeding(i) remains true thereafter. Similarly for preceding(i). Therefore
subordinate(i) will eventually be set to its correct value. When assignment is
made to succeeding(i) or preceding(i), pi has not returned an ack to its father,
and hence the computation could not be over. Therefore these variables are
assigned their correct values before the termination of computation.
(iii) Since the tree is empty, every process must have received acks corresponding
to all messages sent. Therefore there can be no ack in transit, that is, set T
is empty. □
LEMMA 4.4. p1 will terminate computation in finite time.
PROOF. A process pi sends at most two messages (type, pi) to any other process
pj because (1) a message is sent only when succeeding or preceding is set to true,
and (2) succeeding and preceding are never reset to false. Because the graph is
finite, the total number of messages sent is bounded. Hence the total number of
acks sent is also bounded. Observe that every process must send or receive either
a message or an ack every time it starts to execute. Therefore a process can
switch from idle to executing only a finite number of times. There are no loops in
the program; therefore every executing process will become idle in finite time.
Hence every process in the network will cease to execute in finite time, and no
more messages or acks will be sent or received from then on.
We now show that the tree must be empty at this point. If not, let pi be a leaf
node of the tree; num(i) > 0, since pi is on the tree. There is no pj on the tree for
which father(j) = pi, and hence pi must have received all its outstanding acks;
therefore num(i) = 0, a contradiction! □
5. NOTES ON THE KNOT-DETECTION ALGORITHM
5.1 Bounding the Buffer Size
We assumed earlier for purposes of exposition that buffers are of unbounded
length. In the knot-detection algorithm a process sends at most two messages to
any neighbor process, and therefore no process sends more than two acks to any
other process. The input buffer of a process can thus hold at most two messages
and two acks from each neighbor. Hence the buffer length for any process need
not exceed four times the number of neighbors of the process.
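The bound can be written out as a short calculation. The notation nbr(Q) for the set of neighbors of a process Q is an assumption introduced here for illustration.

```latex
\[
\text{buffer length of } Q
  \;\le\; \sum_{P \in \mathit{nbr}(Q)}
     \bigl(\underbrace{2}_{\text{messages from } P} + \underbrace{2}_{\text{acks from } P}\bigr)
  \;=\; 4\,\lvert \mathit{nbr}(Q) \rvert .
\]
```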
5.2 Efficiency
This algorithm is superior to the brute-force algorithm in which process p1 (1)
computes successor*, the set of vertices reachable from v1; (2) computes
predecessor*, the set of vertices that can reach v1; and (3) then declares that v1 is in a
knot if and only if successor* ⊆ predecessor*. The computation of successor*
(predecessor*) can be done by using an algorithm similar to the one proposed
here: every ack carries with it a set of successors (predecessors). Therefore a
successor at distance d from v1 will have its identity transmitted through d
processes to reach v1. The total message length will be at least O(N^2) for an
N-vertex graph, as opposed to O(E) for our algorithm, where E is the number of
edges.
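One way to see the O(E) figure is to count messages, assuming that each message counts as one unit (an ack carries a single integer). The degree notation below is an assumption introduced for illustration; the counts follow from the fact that each process sends (suc, pi) to its successors at most once and (pre, pi) to its predecessors at most once, and that every structure message is acknowledged exactly once.

```latex
\begin{align*}
\#\,\text{structure messages}
  &\le \sum_{i} \bigl(\mathrm{outdeg}(v_i) + \mathrm{indeg}(v_i)\bigr) = 2E, \\
\#\,\text{acks}
  &= \#\,\text{structure messages} \le 2E, \\
\#\,\text{messages in total}
  &\le 4E = O(E).
\end{align*}
```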
6. EXTENSIONS
We show in this section that the ideas in the knot-detection algorithm can be
extended to solve a very general class of problems. Consider a distributed
computation which is initiated by process pl sending messages to some of its
neighbors. Any other process can send messages only after receiving a message.
The computation terminates when no process has any more messages to send
and all messages that have been sent have been received. Dijkstra and Scholten
[4] were the first to identify this class of computations, which they call diffusing
computations. They proposed an algorithm, using the growing and shrinking tree,
to detect termination of diffusing computations. Our contribution is to show how
the same idea may be exploited to compute a networkwide function of locally
computed results.
Let local-result(i) denote some computed result at process pi, at termination
of the entire computation. It is required to compute global-result at the termination of computation, where
global-result = f(local-result(i), for all i),                       (2)
f being any arbitrary computable function.
The knot-detection algorithm computed the global-result cs(1), with
local-result(i) ≡ subordinate(i), and
$$ \mathit{cs}(1) = \sum_{i} \mathit{subordinate}(i), \qquad (3) $$
that is, f ≡ Σ.
We propose two schemes for computing networkwide functions. Note that our
algorithm can be used to develop distributed algorithms according to the following
methodology. In order to compute some global-result, invent a function f and
local-result(i) satisfying (2) and then design a distributed algorithm to compute
local-result(i) at process pi, for all i. Then superimpose our algorithm to compute
the global-result. A variation of this idea appears in [1], where a number of other
problems amenable to this approach are listed.
One difficulty with a straightforward implementation is that a process cannot
know when network computation has terminated. Process pi knows that network
computation can terminate only when num(i) = 0; however, pi cannot assert the
converse: network computation may not have terminated even if
num(i) = 0. Hence pi must send back its current value of local-result(i) to its
father every time that it decrements num(i) to zero. This causes a problem: pi
may send back a local-result to its father and subsequently get another message
which causes it to compute a new local-result. Therefore pi must cancel the old
local-result value. We propose two mechanisms for canceling out-of-date local
results: bags and time stamps.
To simplify exposition in our discussion of cancellation schemes, we will assume
that there is no delay between sending and receiving a message, that is, that there
is never any message in transit. The reader can easily convince himself that the
arguments also apply when the transmission delay is not zero.
6.1 Bags
Each process pi maintains two bags, all(i) and canceled(i). Each bag element is
of the form (j, local-result(j)). If (j, x) is an element in canceled(i), then process
pj has definitely canceled an out-of-date local-result x. If (j, x) is an element of
all(i), then at some time pj posted a local-result x. The elements in all(i) are not
necessarily current. Every local-result that pj has posted appears in the union of
bags all(i), for every i. Similarly, all local-results that pj has canceled appear in
the union of canceled(i), for every i. Therefore pj's current local-result is in the
difference of these two bag unions. In other words, the goal is to maintain the
following invariant. Let r(j) denote the current local-result of process j, and let
∪ denote the union operation over bags. Then
$$ \bigcup_{j} \{(j, r(j))\} = \bigcup_{i} \mathit{all}(i) - \bigcup_{i} \mathit{canceled}(i). $$
Initially, all(i) holds the initial local-result of pi, and canceled(i) is empty. To
post a current local-result x and cancel the previous local-result y, process pi
adds (i, x) to all(i) and (i, y) to canceled(i).
Two bags abag and cbag are returned with every ack in the form (ack, abag,
cbag). When pj sends an ack, it takes the elements out of bag all(j) and puts
them into abag, and similarly puts elements from canceled(j) into cbag, and
then sends abag and cbag along with the ack. If pi receives (ack, abag, cbag),
it adds the contents of abag to all(i) and cbag to canceled(i).
At termination, all(i) and canceled(i) will be empty for i ≠ 1, canceled(1) will
contain tuples corresponding to all canceled local-results, and all(1) will contain
tuples corresponding to all local-results, current and canceled. By removing the
canceled results (i.e., elements of canceled(1)) from all(1), p1 can determine the
current local-results for all processes. The knot-detection algorithm of Section 3
uses the bag idea; the information in the two bags has been condensed into a
single integer cs. Adding an element (j, x) to all(i) is implemented by incrementing
cs(i) by x. Adding an element (j, y) to canceled(i) is achieved by
decrementing cs(i) by y.
Efficiency. The sizes of the bags returned with acks can be reduced by having
each process pi remove all elements common to all(i) and canceled(i) from both
all(i) and canceled(i).
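The bag scheme can be sketched with Python's `collections.Counter`, which behaves as a multiset. This is an illustration under assumptions: the class `BagProcess` and its method names are hypothetical, local-results are assumed to be hashable values, and the message exchange is reduced to direct method calls.

```python
from collections import Counter

class BagProcess:
    """Holds the two bags all(i) and canceled(i) of one process p_i."""

    def __init__(self, i, initial_result):
        self.i = i
        self.current = initial_result
        self.all = Counter({(i, initial_result): 1})  # all(i) starts with the initial result
        self.canceled = Counter()                     # canceled(i) starts empty

    def post(self, x):
        """Post a new local-result x and cancel the previous one."""
        self.canceled[(self.i, self.current)] += 1
        self.all[(self.i, x)] += 1
        self.current = x

    def make_ack(self):
        """Empty both bags into (abag, cbag), to be returned with an ack."""
        abag, cbag = self.all, self.canceled
        self.all, self.canceled = Counter(), Counter()
        return abag, cbag

    def on_ack(self, abag, cbag):
        """Merge the bags carried by a received ack."""
        self.all.update(abag)
        self.canceled.update(cbag)

    def current_results(self):
        """Bag difference all - canceled; meaningful at p_1 after termination."""
        return dict((self.all - self.canceled).elements())

# p2 posts 1, acks, then returns to 0 and acks again.
p1, p2 = BagProcess(1, 0), BagProcess(2, 0)
p2.post(1)
p1.on_ack(*p2.make_ack())
p2.post(0)
p1.on_ack(*p2.make_ack())
```

In this run the element (2, 0) is posted twice and canceled once, which is why bags rather than sets are needed: a set difference would lose the current result of p2.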
6.2 Time Stamps
Each process pi maintains a set S(i) of triples of the form (j, n(j), local-result(j)),
where n(j) is a time stamp local to process pj. When a process pi wishes to post
a new local-result x (and cancel an out-of-date result), it increments n(i) and
adds (i, n(i), x) to S(i).
When pi sends an ack, it sends (ack, S(i)) and then sets S(i) to empty. Upon
receiving an ack, (ack, B), pi sets S(i) to the union of S(i) and B. Upon
termination, S(i) will be empty for all i ≠ 1, and S(1) will contain all tuples
(i, n(i), local-result(i)) that have been sent. p1 can identify the current local-results
because they will be associated with the latest time stamps.
Efficiency. The sizes of the sets returned with acks can be reduced by having
each process pi discard all elements in S(i) that it can identify as being out of
date.
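The time-stamp scheme admits a similar sketch. The class `StampProcess`, the function `current_results`, and the choice to record the initial local-result with time stamp 0 are assumptions made for illustration; the paper does not fix these details.

```python
class StampProcess:
    """Holds the set S(i) of triples (j, n(j), local-result(j)) of one process p_i."""

    def __init__(self, i, initial_result):
        self.i = i
        self.n = 0                          # local time stamp n(i)
        self.S = {(i, 0, initial_result)}   # assumption: initial result carries stamp 0

    def post(self, x):
        """Post a new local-result x under a fresh, larger time stamp."""
        self.n += 1
        self.S.add((self.i, self.n, x))

    def make_ack(self):
        """Return S(i) to be sent with an ack and reset S(i) to empty."""
        B, self.S = self.S, set()
        return B

    def on_ack(self, B):
        """Set S(i) to the union of S(i) and the received set B."""
        self.S |= B

def current_results(S):
    """For each process j, keep the local-result with the largest time stamp."""
    latest = {}
    for j, n, x in S:
        if j not in latest or n > latest[j][0]:
            latest[j] = (n, x)
    return {j: x for j, (n, x) in latest.items()}

# Same scenario as for bags: p2 posts 1, acks, then posts 0 and acks again.
q1, q2 = StampProcess(1, 0), StampProcess(2, 0)
q2.post(1)
q1.on_ack(q2.make_ack())
q2.post(0)
q1.on_ack(q2.make_ack())
```

Here no explicit cancellation is stored: an older triple is superseded simply because a triple with a larger stamp for the same process arrives.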
ACKNOWLEDGMENTS
We gratefully acknowledge the suggestions of E. W. Dijkstra and C. S. Scholten,
on whose work this paper is based. We are also grateful to two anonymous
referees for their valuable comments.
REFERENCES
1. CHANDY, K.M., AND MISRA, J. Distributed computation on graphs: Shortest path algorithms.
Commun. ACM 25, 11 (Nov. 1982).
2. CHANG, E. Decentralized deadlock detection in distributed systems. Tech. Rep., Univ. of
Victoria, Victoria, B.C., Canada.
3. DIJKSTRA, E.W. In reaction to Ernest Chang's Deadlock Detection. EWD702, Plataanstraat 5,
5671 AL Nuenen, The Netherlands, Feb. 21, 1979.
4. DIJKSTRA, E.W., AND SCHOLTEN, C.S. Termination detection for diffusing computations. Inf.
Process. Lett. 11, 1 (Aug. 1980), 1-4.
Received September 1981; revised May 1982; accepted May 1982