Cooper 2004

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Combinatorics, Probability and Computing

http://journals.cambridge.org/CPC

Additional services for Combinatorics, Probability and


Computing:

Email alerts: Click here


Subscriptions: Click here
Commercial reprints: Click here
Terms of use : Click here

The Size of the Largest Strongly Connected Component of


a Random Digraph with a Given Degree Sequence

COLIN COOPER and ALAN FRIEZE

Combinatorics, Probability and Computing / Volume 13 / Issue 03 / May 2004, pp 319 - 337
DOI: 10.1017/S096354830400611X, Published online: 28 April 2004

Link to this article: http://journals.cambridge.org/abstract_S096354830400611X

How to cite this article:


COLIN COOPER and ALAN FRIEZE (2004). The Size of the Largest Strongly Connected
Component of a Random Digraph with a Given Degree Sequence. Combinatorics, Probability and
Computing, 13, pp 319-337 doi:10.1017/S096354830400611X

Request Permissions : Click here

Downloaded from http://journals.cambridge.org/CPC, IP address: 130.56.64.29 on 16 Mar 2015


Combinatorics, Probability and Computing (2004) 13, 319–337. 
c 2004 Cambridge University Press
DOI: 10.1017/S096354830400611X Printed in the United Kingdom

The Size of the Largest Strongly Connected


Component of a Random Digraph with a
Given Degree Sequence

C O L I N C O O P E R1 and A L A N F R I E Z E2†
1 Department of Mathematical and Computing Sciences, Goldsmiths College, London SW14 6NW, UK
(e-mail: cooper@dcs.kcl.ac.uk)
2 Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh PA15213, USA
(e-mail: alan@random.math.cmu.edu)

Received 18 March 2002; revised 7 October 2003

We give results on the strong connectivity for spaces of sparse random digraphs specified
by degree sequence. A full characterization is provided, in probability, of the fan-in and
fan-out of all vertices including the number of vertices with small (o(n)) and large (cn)
fan-in or fan-out. We also give the size of the giant strongly connected component, if
any, and the structure of the bow-tie digraph induced by the vertices with large fan-in
or fan-out. Our results follow a direct analogy of the extinction probabilities of classical
branching processes.

1. Introduction
One of the most important questions in the theory of random graphs concerns the size
of the largest component of such a graph. In their formative paper [8], Erdős and Rényi
proved a strong dichotomy for the size C1 of the largest component of the random
graph Gn,m when m = cn/2, c constant.1 Erdős and Rényi2 showed that, in Gn,m , if c < 1
then w.h.p. C1 = O(log n) and that if c > 1 then C1 ∼ G(c)n for some function G(c) > 0.
A component of order n is called a giant component. When c = 1 the situation is more
complicated and much effort has gone into an analysis of this case. See, e.g., Bollobás [4],
L
 uczak [12], L
 uczak, Pittel and Wierman [13], Janson, Knuth, L uczak and Pittel [9] and
the books by Bollobás [3] and by Janson, L uczak and Ruciński [10].
Molloy and Reed [14, 15] consider a model of random graphs with a fixed degree
sequence. In this model, the number of vertices of degree j is approximately λj n. More

† Supported in part by NSF grant CCR-9818411.


1 The random graph Gn,m has vertex set [n] and m randomly chosen edges.
2 An event E occurs with high probability (w.h.p.) if Pr(E) tends to 1 as n → ∞.
320 C. Cooper and A. Frieze

precisely, the limiting proportion λj of vertices of a given degree j is fixed, and defines a
sequence of nonnegative real numbers λ0 , λ1 , . . . , λj , . . . where λ0 + λ1 + · · · + λj + · · · = 1.

Let Q = j j(j − 2)λj . Molloy and Reed prove that if Q < 0 then w.h.p. a random graph
Gλ with the implied degree sequence has maximum component size C1 = O(∆2 log n).
However, if Q > 0 then w.h.p. Gλ has a unique giant component of size C1 ∼ cλ n for
some constant cλ > 0, and the second-largest component has size C2 = O(∆2 log n). This
interesting result has recently proved to be useful in the study of random graphs with
degree sequences differing from those seen in the classic theory; for example see Aiello,
Chung and Lu [1] in the context of massive graphs.
When we consider the connectivity of digraphs we find that much less has been done.
The formative paper in this area, by Karp [11], considers the size of the strongly connected
components in the random digraph Dn,p . See also Uno and Ibaraki [16].

1.1. Definitions, theorems and informal derivation of results


 
Let l = (li,j , i, j  0) be a set of nonnegative integers satisfying i,j li,j = n and i,j ili,j =

i,j jli,j . Let D(l) be the space of simple digraphs D with vertex set V = [n] and with the
following property: the degree sequence of D is fixed and there are li,j vertices v with
in-degree d− (v) = i and out-degree d+ (v) = j. Let D denote a digraph chosen uniformly at
random (u.a.r.) from the space D(l).
Let θn be the number of arcs of D. We assume that θ > 0. Then, by counting the total
in-degree, and total out-degree respectively, we find
 
θn = ili,j = jli,j .
i,j i,j

Let
pi,j = li,j /n
and let
 li,j  li,j
p+
j = i and p−
i = j .
θn θn
i j
+
Thus p is the distribution of the out-degree of the terminal vertex of a randomly chosen
arc. Similarly p− is the distribution of the in-degree of the initial vertex of a randomly
chosen arc. Let
 li,j  
d= ij = jp+
j = ip−
i
i,j
θn j i

be the average directed degree.


We will first give an intuitive description of the model and state the main results.
Subsequent sections deal with the technical issues and make our description precise.
We will use a sequence of absolute constants A0 , A1 , . . . whose exact values are not
always specified, but for which precise values can easily be computed.

Definition. Let B + be the independent branching process with a single initial node in
which the probability distribution of the number of descendants of a node is p+ . Let η +
be the probability that B + continues indefinitely.
The Size of the Largest Strongly Connected Component of a Random Digraph 321

Thus 1 − η + , the extinction probability of B + , is given by the smallest positive solution


 + j
of x = pj x and satisfies


1 − η+ = j (1 − η ) .
p+ + j
(1.1)
j=0

If d < 1 then the smallest positive solution of (1.1) is at 1 − η + = 1, whereas if d > 1 there
is a unique solution with 1 − η + ∈ (0, 1).
For a fixed vertex v of D, the fan-out of v, R + (v) is the set of vertices w (including
v) reachable from v by a directed (v, w)-path. Similarly, the fan-in of v, R − (v), is the set
of vertices u which can reach v by a directed (u, v)-path. We will say that v has a small
fan-out if |R + (v)|  A0 ∆2 log n and that v has a small fan-in if |R − (v)|  A0 ∆2 log n. If the
fan-out is not small then w.h.p. it will be of size A1 n, in which case we say it is large. Let
L+ denote the set of vertices with a large fan-out and L− denote the set of vertices with
a large fan-in.
If v has out-degree k then by doing a breadth-first search from v we see that the
arcs of a small R + (v) will be (approximately) the union of k independent copies of the
branching process B + . Thus a root vertex of out-degree k has probability (1 − η + )k of a
finite progeny. Let

1 − π+ = pi,k (1 − η + )k . (1.2)
i,k

Then 1 − π is (approximately) the probability that a randomly chosen vertex v will have
+

a small fan-out. We should therefore expect that


|L+ | ∼ π + n.
Similarly, we expect that
|L− | ∼ π − n,
where

1 − π− = pi,k (1 − η − )k , (1.3)
i,k

and η − is the smallest positive value satisfying



1 − η− = p− − i
i (1 − η ) . (1.4)
i

Thus 1 − η − is the extinction probability of the branching process B − where the distri-
bution of the number of progeny is given by p− .
If d < 1 then η + = η − = 0 and hence π + = π − = 0. This suggests that
d < 1 implies L+ = L− = ∅.
For the number of vertices in a large fan-out we argue as follows. If v has a large
fan-out and w has a large fan-in then it is very likely that there will be an arc directed
from the fan-out of v to the fan-in of w which of course implies that w would be in the
322 C. Cooper and A. Frieze

fan-out of v. Conversely, if w has a small fan-in then this is unlikely. Thus we expect that

R + (v) is large ⇒
R + (v) ⊇ L− and |R + (v) \ L− | = o(n) and so |R + (v)| ∼ π − n. (1.5)

By analogy, we expect

R − (v) is large ⇒
R − (v) ⊇ L+ and |R − (v) \ L+ | = o(n) and so |R − (v)| ∼ π + n. (1.6)

Assume now that p+ 0 , p0 > 0 and d > 1. Let us now consider the expected number of arcs
in a large fan-in or fan-out. We say that an arc (v, w) has a large fan-in (resp. fan-out)
if v has a large fan-in (resp. w has a large fan-out). By analogy with the vertex case,
we would expect that the number of arcs in the fan-out (resp. fan-in) of a vertex with
a large fan-out (resp. fan-in) would be close to the number of arcs with a large fan-in
(resp. fan-out). To estimate the number of arcs with a large fan-in we need to consider
the branching process with progeny distribution p− . Here we consider each node of the
process to correspond to an arc of the fan-in of our initial root arc. Thus we expect the
number of arcs in a large fan-out to be approximately

ξ+
ξ + n where = η− . (1.7)
θ
Again, there are θn arcs and each has probability ∼ η − of having a large fan-in.
Similarly, the number of arcs in a large fan-in is w.h.p. approximately

ξ−
ξ − n where = η+ . (1.8)
θ
Finally, we consider the size of the largest strongly connected component. It follows
from (1.5), (1.6) that w.h.p. L+ ∩ L− is contained in a strongly connected component S
of D. In fact L+ ∩ L− induces a maximal strongly connected component w.h.p. This is
because any vertex in L+ ∩ L− must have a large fan-in and a large fan-out. The size of
this strong component is approximately (π + + π − + ψ − 1)n, where

ψ= pi,j (1 − η − )i (1 − η + )j . (1.9)
i,j

The right-hand side of (1.9) is explained as follows. Choose a random vertex. It has
probability pi,j of having in-degree i and out-degree j. The expression (1 − η − )i (1 − η + )j is
an estimate of the probability that all of the i + j associated branching processes become

extinct, that is, n i,j pi,j (1 − η − )i (1 − η + )j is a good estimate of |L+ ∩ L− |.
The above giant strongly connected component will be unique. Every other strong
component will be of size  A0 ∆2 log n w.h.p. since every vertex not in S either has a
small fan-out or a small fan-in.
This concludes the intuitive description. Now comes the hard work of making the above
discussion precise.
The Size of the Largest Strongly Connected Component of a Random Digraph 323

1.2. Main results


The following theorems summarize what will be rigorously verified. Some notes on the
structure of the proof of these theorems are given in Section 1.4. The theorem refers to
proper degree sequences. The definition of proper is deferred until Section 1.3.

Theorem 1.1. Let the sequence (li,j ) be proper. If d < 1, then w.h.p. L+ = L− = ∅.

Let the parameters η + , π + , ξ + , η − , π − , ξ − , ψ be as defined in (1.1)–(1.9).

Theorem 1.2. Assume that (li,j ) is proper, that p− +


0 , p0 > 0, and that d > 1. Then w.h.p. we
have the following.
(i) |L+ | ∼ π + n and |L− | ∼ π − n.
(ii) If v ∈ L+ then |R + (v)| ∼ π − n. If v ∈ L− then |R − (v)| ∼ π + n.
(iii) If v ∈ L+ then R + (v) contains ∼ ξ + n arcs. If v ∈ L− then R − (v) contains ∼ ξ − n arcs.
(iv) If v ∈ L+ then R + (v) ⊇ L− , and if w ∈ L− then R − (w) ⊇ L+ .
(v) There is a unique giant strongly connected component, with vertex set S = L+ ∩ L− of
size |S| ∼ (π + + π − + ψ − 1)n.

In fact, in Section 4, we prove rather more than this. In the notation of [6] there is a bow-
tie digraph B = D[L+ ∪ L− ] of expected size (1 − ψ)n induced by the union of the vertices
with large fan-out or fan-in. This digraph B consists of a maximal strongly connected
component S with vertex set L+ ∩ L− and wings K + with vertex set L+ ∩ L− and K −
with vertex set L− ∩ L+ . The wing K + has arcs directed towards S and the wing K − has
arcs directed away from S. The size of any branching in the wings is  A0 ∆2 log n. The
wings are of expected size asymptotic to (1 − π + − ψ)n and (1 − π − − ψ)n respectively.
Finally we consider digraphs where there are no vertices of in-degree zero or out-degree
zero. The only significant obstruction to the digraph being strongly connected is the
existence of small directed cycles consisting entirely of vertices of out-degree 1, or entirely
of vertices of in-degree 1.
 
Theorem 1.3. Let (li,j ) be proper, and let li,0 = l0,j = 0, then w.h.p. the structure of D
is as follows.

(i) There is a unique giant strongly connected component S in D, of size n − O(∆ n log n).
(ii) There is a collection C of vertex-disjoint directed cycles. The vertices of any such cycle
are all of out-degree 1 or all of in-degree 1. The subset of C , consisting of those cycles
of in-degree 1 and out-degree 1, has vertex set L+ ∩ L− .
(iii) Each cycle in C is connected to S by zero or more directed induced paths, all such paths
having the same direction with respect to the given cycle.
(iv) Any directed path between two cycles in C goes through S.
(v) The expected number of vertices on cycles in C is β, where
1 1 −
+ − (p1 + p1 ).
+
β = log + log
1 − p−
1 1 − p1

(vi) limn→∞ Pr(D is strongly connected ) = e−β .


324 C. Cooper and A. Frieze

1.3. Proper degree sequences


Definition. We say the sequence l = (li,j , i, j  0) is proper if:
P1: θ = (1 + o(1))θ0 and d = (1 + o(1))d0 where either d0  1 −  or 1 +   d0 for absolute
constants θ0 , d0 , ,
  2
P2: i,j i2 pi,j , i,j j pi,j  A3 ,
 i2 jli,j  j 2 ili,j
P3: Let ρ = max( i,j θn , i,j θn ); if ∆ → ∞ with n then ρ = o(∆),
P4: ∆  n1/12 / log n.

1.4. Structure of the proof


In Section 2 we establish Theorem 1.2(iii) from a gap theorem for the number of arcs in
R + (v) (resp. R − (v)) which establishes (1.7) (resp. (1.8)).
In Section 3 we prove Theorem 1.2(i). We show that if a vertex of out-degree k has a
small fan-out, the number of arcs in this fan-out is well approximated by the number of
nodes of k independent copies of B + . This establishes (1.1), (1.2). A similar analysis of
small fan-in establishes (1.4), (1.3).
In Section 4, we prove Theorem 1.2(iii), (iv) and (v). We show that w.h.p. L+ ∩ L− forms
the vertex set S of a maximal strongly connected component.
In Section 5, we give an outline proof of Theorem 1.3.

2. A gap theorem for the number of arcs of the fan-out R+ (v)


for all v ∈ V , the number of arcs of R + (v) is
Theorem 2.1. With probability 1 − O(1/n2 ),
either  A0 ∆ log n or equal to ξ n(1 + O(∆ (log n)/n)).
2 +

2.1. Configuration model


We use the configuration model of Bollobás [5]. In this model a vertex v ∈ V is repres-
ented by two sets of configuration points Wv+ , Wv− where |Wv+ | = d+ (v), |Wv− | = d− (v). Let
 
W + = v Wv+ and W − = v Wv− . If a ∈ Wu+ then the underlying vertex of point a is u,
and we write φ+ (a) = u as a shorthand for this. Similarly, if b ∈ Wv− we write φ− (b) = v.
A configuration F is a random matching of W + with W − . Let F be written as
F = {(a, b) : a ∈ W + , b ∈ W − }.
Let F denote the space of bipartite matchings of W + with W − with the uniform measure.
Associated with F is an underlying multi-digraph DF ∈ M(l) with vertex set V and arc
set
AF = {(u, v) : (a, b) ∈ F, u = φ+ (a), v = φ− (b)}.
The uniform measure on F induces a probability measure Pr(DF ) on multi-digraphs DF .
To justify working with configurations in this paper, we need to argue (i) that if l is proper
then
Pr(DF is simple)  χ > 0, (2.1)
where χ is a constant independent of n, and (ii) if D1 , D2 are two simple digraphs with
vertex set V and the same degree sequence d− , d+ , then Pr(D1 ) = Pr(D2 ).
The Size of the Largest Strongly Connected Component of a Random Digraph 325

Now it is easily seen that


1  +  −
n n
Pr(DF = D1 ) = Pr(DF = D2 ) = di ! di !. (2.2)
(θn)!
i=1 i=1

Let λ(F) denote the number of loops in DF . Let (a, b) ∈ F be redundant if there exists
(a , b ) ∈ F such that φ(a ) = φ(a), φ(b ) = φ(b) and a < a. Let m(F) denote the number of
redundant pairs. The expected number of loops λ(F) is d, and the expected number of
redundant arcs is bounded above by
  
1  2  2 
µ= 2 i pi,j j pi,j .
2θ i,j i,j

We argue next that we can find χ in (2.1) satisfying


e−(d+2µ)
χ . (2.3)
8(d + µ + 1)2
Let A = 2(d + µ). Then
1
Pr(λ(F) + m(F)  A) . (2.4)
2
Next let M be the set of all possible configurations F and let Ma,b = {F ∈ M : λ(F) =
a, µ(F) = b}. We show that, for a + b  A,
|Ma−1,b | a
 (1 − o(1)) , a > 0, (2.5)
|Ma,b | d
|M0,b−1 | b |M0,b | 1
 (1 − o(1)) or  A−b . (2.6)
|M0,b | 2µ |M| n
To prove (2.5) we consider the set of pairs (F, F ) ∈ Ma−1,b × Ma,b such that F is obtained
from F by replacing 2 pairs (x, y), (z, x ) by (x, x ), (z, y), where (x, x ) is a loop. Observe
that each F ∈ Ma−1,b is in at most dθn such pairs and that each F ∈ Ma,b is in at least
a(θn − o(n)) such pairs, and (2.5) follows.
To prove (2.6) we consider the set of pairs Πb = (F, F ) ∈ M0,b−1 × M0,b such that F
is obtained from F by replacing 2 pairs (x , w), (z, y ) by (x , y ), (z, w), where (x, y) is an
existing edge. Observe that each F ∈ M0,b−1 is in at most

Γ(F) = d+ (φ(x))d− (φ(y))
(x,y)∈F

such pairs. Now if F is chosen uniformly from M then EΓ(F) = 2µθn. Changing (x, y),
(x , y ) ∈ F to (x, y ), (x , y) only changes Γ by at most 2∆2 , and we see by the Azuma–
Hoeffding martingale inequality that
t2
Pr(|Γ(F) − EΓ(F)|  t)  exp − .
4θn∆4
|M |
0,b 1
Suppose then that |M| > nA−b . Then, putting t = n/ log n into the above we see that for
almost all F ∈ M0,b we have Γ(F )  2µθn + o(n). Now let M0,b denote such matchings
and let M0,b−1 be those members F of M0,b−1 which occur with a member F of M0,b in
326 C. Cooper and A. Frieze

a pair (F, F ) ∈ Πb . Then Γ(F)  2µθn + o(n) + 2∆2 = 2µθn + o(n). Now each F ∈ M0,b is
in at least b(θn − o(n)) pairs and we deduce that |M0,b−1 |/|M0,b |  (1 − o(1))b/(2µ) and
(2.6) follows.
It follows from (2.4) that we can choose a0 + b0  A such that
|Ma0 ,b0 | 1
 .
|M| 2(A + 1)2
Applying (2.5) we see that
|M0,b0 | e−d
 .
|M| 2(A + 1)2
Applying (2.6) we obtain
|M0,0 | e−(d+2µ)
 .
|M| 2(A + 1)2
This completes the proof of (2.3) and justifies our use of the configuration model.

2.2. The process C + (v)


The process C + (v) exposes matching pairs of F, and hence arcs of DF , one at a time. We
use s to count the number of exposed arcs.
We fix a vertex v and generate F, starting with those pairs in F which are the edges
used in the construction of R + (v) when fanning out from v.
Let M(0) = ∅, and let M(s) = {(ai , bi ), i = 1, . . . , s} be the partial matching generated by
the end of step s. Let U + (0) = U − (0) = ∅, and let U + (s) = {ai : i = 1, . . . , s}, U − (s) = {bi :
i = 1, . . . , s}.
Let A(0) = {v} and let A(s) = {x ∈ V : φ− (bi ) = x, 1  i  s} be the set of vertices
acquired by the process up to this point. Thus A(s) ⊆ R + (v).
Let I + (s) ⊆ W + be the out-points of those vertices acquired by the process. Thus
I (0) = Wv+ and I + (s) = ∪u∈A(s) Wu+ . The elements of I + (s) are considered to be ordered
+

with respect to the step in which they are acquired. The ordering of elements acquired
in the same step is arbitrary. In terms of the digraph DF each step will either add a new
vertex vs reachable from v or add an arc to a vertex already reached. Let DF (v, s) denote
the subgraph of DF induced by the arcs {φ+ (ai ), φ− (bi )}.
At step s + 1, an element as+1 of I + (s) \ U + (s) is chosen and matched with a point bs+1
chosen u.a.r. from W − \ U − (s). Then
M(s + 1) = M(s) ∪ {(as+1 , bs+1 )},
U + (s + 1) = U + (s) ∪ {as+1 },
U − (s + 1) = U − (s) ∪ {bs+1 },
and if vs+1 = φ− (bs+1 ) then
A(s + 1) = A(s) ∪ {vs+1 },
I + (s + 1) = I + (s) ∪ Wv+s+1 .
We note that, of course, vs+1 may already be an element of A(s).
The Size of the Largest Strongly Connected Component of a Random Digraph 327

Define X(s) = |I + (s)| − s and let


σ = min{s : X(s) = 0}. (2.7)
At time σ we will have computed the fan-out of v, R + (v) = A(σ) and R + (v) induces σ arcs
in DF .
The configuration F can then be completed by randomly pairing up the the elements
of W + \ U + (σ) with W − \ U − (σ).

2.3. Branching process lemma


This lemma summarizes some standard results about probability-generating functions.

Lemma 2.2. Let f(x) = pi xi be a probability-generating function with expected value d.
(i) f(x) is monotone increasing and convex in [0, 1].
(ii) If p1 = 1, the equation f(x) = x has one solution at x = 1 and at most one other solution
β in [0, 1).
(iii) If d > 1 there are exactly two solutions to f(x) = x in [0, 1]. If p0 = 0 then 0, 1 are the
solutions. If p0 > 0 there is a solution β ∈ (0, 1).
(iv) Let d > 1 and let β be the solution in [0, 1) of x = f(x). Then f (x) < 1 for all x ∈ [0, β].

Proof. (iv) If d > 1 then p1 < 1 and there are exactly two solutions, β and 1, to x = f(x)
in [0, 1]. If p0 = 0 then β = 0 and f (0) = p1 < 1. If p0 > 0 then f(0) = p0 > 0, and we have
that f(x) > x for 0  x < β and f(x) < x for β < x < 1. Thus, at x = β, the slope of y = f(x)
at x = β is less than the slope of y = x. Thus f (β) < 1. Moreover f (x) is an increasing
function of x.

2.4. Approximations for the expected value of X (s)


Let the vertex set V = [n] be partitioned into fixed sets Vi,j = {v : d− (v) = i, d+ (v) = j},
where li,j = |Vi,j |.
The set V \ A(s) is the set of vertices of DF not acquired by the process after step s. Let
δi,j (s) = |Vi,j \ A(s)| be the number of un-acquired vertices of in-degree i and out-degree j.
Thus

X(s) = θn − s − jδi,j (s). (2.8)
i,j

Let u be a vertex of in-degree i. Theprobability that no point in Wu− has been matched
by the process after step s is θns− i / θns . This follows because the sequence S − = (b1 , . . . , bs )
has been selected u.a.r. without replacement, from W − \ Wu− .
We see that X(s) is a function only of the initial vertex v0 and the sequence S − . Thus,
for 0  s  θn,

  θns− i
EX(s) = θn − s − j li,j − 1v0 ∈Vi,j θn
 ,
i,j s

where 1v0 ∈Vi,j is an indicator variable for the event that v0 ∈ Vi,j .
We now give some approximations for EX(s).
328 C. Cooper and A. Frieze

Lemma 2.3.
(i) If s∆ = o(n) then

EX(s) = ((d − 1)s + d+ (v0 ))(1 + o(1)).

(ii) If s > ∆ then


 s
i
EX(s) = θn − s − θn p−
i 1− + O(∆).
i
θn

(iii) Suppose d > 1, p−


0  0 and let

s  s
i
g(s) = θn 1 − − p−
i 1− .
θn i
θn

Let f(x) = p− −
i x , and let β = 1 − η be the solution in [0, 1) to x = f(x).
i

(a) 1 − f (β) > 0.


(b) If ξ + is as in (1.7) then g(ξ + n) = 0, and provided h = o(n),

g(ξ + n + h) = −h(1 − f (β) + o(1)).

(c) The function g(s) has a unique maximum s∗ in (0, ξ + n) at s∗ = cn, c > 0.
(iv) If d+ (v0 ) > 0 and d > 1, then the unique solution s0 to EX(s) = 0 in 0  s  θn satisfies
s0 = ξ + n + O(∆).

Proof. (i) Provided s∆ = o(n),


(θn − i)s i i i
= 1− 1− ··· 1 − (2.9)
(θn)s θn θn − 1 θn − s + 1

s−1
1  1 1 s 3 i3
=1−i + i2 +O 3 3
θn − j θn − j θn − k θ n
j=0 j<k
2 2
si s i
=1− +O (2.10)
θn θn2
= 1 − o(1).

Thus, using (2.10),


 (θn − i)s
EX(s) = θn − s + d+ (v0 )(1 − o(1)) − jli,j
i,j
(θn)s
ρs2
= (d − 1)s + d+ (v0 )(1 − o(1)) + O
θn
= ((d − 1)s + d+ (v0 ))(1 − o(1)),

as l is proper.
The Size of the Largest Strongly Connected Component of a Random Digraph 329

(ii) If s > ∆,

(θn − i)s  θn − s − j
i−1
=
(θn)s θn − j
j=0
i
s s i(i − 1) s
= 1− exp − 1+O .
θn (θn)2 2 n

This follows because

θn − s − j θn − s sj 1
= 1− .
θn − j θn (θn)2 (1 − s/θn)(1 − j/θn)

Thus
 s
i
ρs
EX(s) = θn − s − θn p−
i 1− +O + O(∆).
θn θ2 n
i

As l is proper, if ∆ → ∞, ρ = o(∆) the result follows.

(iii) (a) See Lemma 2.2.

(b) Certainly g(ξ + n) = 0, by definition of β. Thus, for some δ ∈ [0, 1],

h2
g(ξ + n + h) = g(ξ + n) + hg (ξ + n) +
g (ξ + n + δh))
2
 jli,j ξ+
i−1
h2 ρ
= h −1 + i 1− +O
θn θ θn

= −h 1 − f (β) + O ,
n
 − i−1
where f (β) = ipi β . As 0  f (β) < 1 (by Lemma 2.2), the Taylor expansion above is
of order h for any β ∈ [0, 1) and h = o(n). Thus

∂g(s)  li,j s
i−1
= −1 + ij 1−
∂s i,j
θn θn
s
= −1 + f 1− .
θn

(c) Now f (0) = p− 1 , f (1) = d and f (1 − s/θn) is strictly monotone decreasing in s if


p−
0 + p−
1 < 1. So provided d > 1, g(s) has a unique maximum at s∗ = cn, c > 0.

(iv) Let s = ξ + n + h, where h = o(n). Then

EX(ξ + n + h) = −h(1 − f (β) + o(1)) + O(∆).


330 C. Cooper and A. Frieze

2.5. Results on the stopping time σ of C + (v)


Theorem 2.4. Let F ∈ F .For v ∈ V let σ(v) be the stopping time of C + (v) defined in (2.7).
With probability 1 − O n12 , for all v ∈ V ,
2

(i) if d < 1 then σ(v) < 6 (d−1)2 log n;

(ii) if d > 1 then


2

(a) either σ(v) < 6 (d−1)2 log n, or

√ √
4∆ n log n 4∆ n log n
(b) ξ + n − 1−f (β) < σ(v) < ξ + n + 1−f (β) .

Proof. (i) If s∆ = o(n) we see from Lemma 2.3(i) that



EX(s) = d+ (v0 ) + (d − 1)s (1 + o(1)),

and so if d < 1, EX(s) < 0 when s > d+ (v0 )/(1 − d).


Now, X(s) depends only on the sample S − = (b1 , . . . , bs ) of length s, which is obtained
by independent sampling without replacement from the set W − . Changing the point bs
selected can only change X(s) by at most ∆. Applying the Azuma–Hoeffding martingale
inequality, we get

Pr(X(s) > 0)  Pr(|EX(s) − X(s)|  |EX(s)|)


(EX(s))2
 2 exp −
2s∆2
(d − 1)2 s
 2 exp − .
3∆2
2
√ √
(ii) Let sL = 6 (d −

1)2
log n, sU = ξ + n − 4∆1 − nf log n + 4∆ n log n
(β) and sUU = ξ n + 1 − f (β) .
We prove that, if X(s) > 0 for all s < sL , then with probability 1 − o(n−3 ), we have
X(s) > 0 for sL  s  sU and X(sUU ) < 0.
For ∆s = o(n),

(d − 1)2 s
Pr(X(s) = 0)  Pr(|EX(s) − X(s)|  EX(s))  2 exp − ,
2∆2

and so the proposition holds for sL  s = o(n/∆). As g(s) = EX(s) − O(∆) is increasing
from s = 0 up to s∗ = cn, we need only consider s∗  s  sU . However, using the Azuma–
Hoeffding martingale inequality once again we see that
 9n
Pr(|EX(s) − X(s)| > 3∆ n log n)  2 exp − log n . (2.11)
2s
√ √
If cn  s < ξ + n − 4∆1 − nf log n
(β) the Lemma 2.3(iii(b)) implies EX(s)  (4 − o(1))∆ n log n and
so (2.11) implies σ(v) is unlikely to be smaller than claimed. A similar argument using
the√ fact that g(ξ + n) = 0 shows that with high enough probability, X(s) < 0 for s = ξ + n +
4∆ n log n
1 − f (β) .
The Size of the Largest Strongly Connected Component of a Random Digraph 331

3. The number of vertices with a large fan-in or fan-out


We make a detailed analysis of the process C + (v).
For s < A0 ∆2 log n, let
qj+ (s + 1) = Pr(|I + (s + 1)| − |I + (s)| = j), for j  0.
Then
 
ili,j − i i|Vi,j ∩ A(s − 1)|
qj+ (s) = i
, j  1, (3.1)
θn − (s − 1)
 
i ili,0 + i,j1 i|Vi,j ∩ A(s − 1)| − (s − 1)
q0+ (s) = . (3.2)
θn − (s − 1)
We see immediately that we can simplify this to
s∆
qj+ (s) = p+
j +O . (3.3)
n
We note that, once a vertex has branched for the first time, it is assigned an out-degree
of zero in subsequent branchings. The value of q0+ (s) is updated to account for its unused
configuration in-points. This covers the case where the fan-out of the vertex is not an
arborescence.
Recall that L+ (resp. L− ) is the set of vertices of D with large ( A1 n) fan-out (resp.
fan-in).

Theorem 3.1. The following results hold w.h.p.


(i) If d < 1 then L+ , L− = ∅.
(ii) If d > 1 then E|L+ | = (1 + o(1))π + n and E|L− | = (1 + o(1))π − n.

Proof. (i) This follows from Theorem 2.4(i).


(ii) Let B + (k) denote k = d+ (v) independent copies of B + rooted at v. We compare the
process C + (v) with B + (k). We consider both processes to grow in a breadth-first manner.
As there is a minor problem associated with indices j for which p+ j is too 
small, we
consider a modified process in which these probabilities become zero. Let ζ = ∆ log n/n,

j  ζ} and let κ =
let J = {j : p+ +
j∈J pj = O(ζ∆). We then define

+ 0 j ∈ J,
p̃j = (3.4)
p+
j /(1 − κ) j∈/ J, j  ∆.

Remark 1. The probability that either C + (v) or B + (k) add a node with out-degree j ∈ J
during the first s steps is at most κs.

So for small s we can ‘safely’ restrict our attention to processes C  + (k) which
 + (v), B
+
are free from small positive pj . In particular, B  (k) is the branching process with root
+

node v of degree k and progeny distribution p̃+ . Similarly C  + (v) is C + (v) conditional
on |I (s + 1)| − |I (s)| ∈
+ +
/ J for s  0. Using q̃j (s) to denote the conditional equivalent of
+
332 C. Cooper and A. Frieze

qj+ (s), we see from (3.3) and (3.4) that


s∆
q̃j+ (s) = p+
j 1+O + ∆ζ . (3.5)

In the case where the branching process has reached extinction after s steps, it forms
a tree T with s edges, rooted at v, in which v has out-degree k. The out-degrees of the
leaves are zero, and in general the out-degrees of the nodes of T are k = k0 , k1 , . . . , ks ,
where nodes are labelled in their order of addition to the branching process. We compare
the probabilities of the two processes in terms of these trees. From (3.5),

s
 + (k) = T ) = p+
Pr(B kj (1 + O(∆ζ)),
i=1

s
i∆
 + (v) = T ) =
Pr(C p+
kj 1 + O + ∆ζ .

i=1

Thus

 + (k) = T ) = Pr(C
 + (v) = T ) s2 ∆
Pr(B 1+O + s∆ζ . (3.6)

Let |B + (k)| denote the number of edges in the tree generated by B + (k). Let ρ+
E (v) denote
the number of edges in the fan-out of v in the digraph D. Remark 1 and (3.6) imply that
+
E (v) = s) = O(κs) + (1 − O(κs)) Pr(|C (v)| = s)
Pr(ρ+
s2 ∆
= Pr(|B + (k)| = s) + O + s∆ζ . (3.7)

Next let τ = 6∆2 log n/(d − 1)2 . We will show (below) that
Pr(τ  |B + (k)| < ∞) = O(n−1 ). (3.8)
Clearly,
Pr(|B + (k)| < ∞) = (1 − η + )k . (3.9)
It follows from (3.7), (3.8) and (3.9) that

Pr(|R + (v)|  τ + 1) = Pr(ρ+
E (v) = s)
sτ

τ3 ∆
= Pr(|B + (k)|  τ) + O + τ2 ∆ζ

= (1 − η + )k + o(1).
Part (ii) now follows from Theorem 2.1 and (1.2).

Proof of (3.8). We consider a regenerative branching process rooted at vertex v, which


permits positive or negative levels of spare nodes. Specifically,
Z(0) = k,
Z(s + 1) = Z(s) + P (s + 1) − 1,
The Size of the Largest Strongly Connected Component of a Random Digraph 333

where P (s + 1); the progeny at step s + 1 is an independent random variable with


distribution p+ . Thus EZ(s) = (d − 1)s + k and
Pr(Z(s)  0)  Pr(|Z(s) − (d − 1)s|  (d − 1)s)
(d − 1)2 s
 2 exp − .
2∆2
Hence
Pr(∃s  τ : Z(s)  0) = O(n−1 )
and (3.8) follows.

4. Structure of the giant component


If d < 1 then w.h.p. any strongly connected component is of size O(∆2 log n). We prove
below that if d > 1 then w.h.p. D contains a giant strongly connected component.
The condition given in [14], for the existence (w.h.p.) of a giant component in a graph

G with degree sequence (λk n), is that Q > 0 where Q = k k(k − 2)λk . In our notation,
  li,j
Q= k(k − 2)
n
k i+j=k
 li,j
= 4θ(d − 1) + (i − j)2 ,
i,j
n

after some re-arrangement.


Thus if d > 1 then Q > 0, and the underlying graph G of the digraph D has a giant com-
ponent w.h.p. We note that the converse is untrue. For example, let |V0,∆ | = n/2, |V∆,0 | = n/2.
Then Q = ∆(∆ − 2) > 0, so for ∆ > 2 w.h.p. there is a giant component in G. However,
there can never be a giant strongly connected component in D.

Lemma 4.1. Let v ∈ V and let u ∈ R + (v). Then


∆5 log2 n

(i) Pr(R − (v) is small | R + (v) is small ) = Pr(R − (v) is small ) + O ,
n
∆5 log2 n

(ii) Pr(R + (u) is small | R + (v) is small ) = Pr(R + (u) is small ) + O n .

Proof. We give a proof of part (i); the proof of part (ii) is similar. Let R + (v) be
small, so that σ = O(∆2 log n). Given the matching M(σ) determined by step σ, the
remaining configuration F  = F \ M(σ) is random. We now start a process C  − (v) with
+ −
W = W + \ U + (σ) and W = W − \ U − (σ). Let τ be the step at which the construction
of R − (v) halts. The results of the previous sections still hold, provided that at no step
t  τ we hit a vertex with a configuration point in U − (σ). Let U − (τ) be the points of W −
used in the construction of R − (v) and let S = ∪x∈R + (v) Wx− be the points we wish to avoid.
Thus

 (τ) = ∅) = O(∆2 log n × ∆)
Pr(S ∩ U O(∆2 log n).
θn
334 C. Cooper and A. Frieze

Corollary 4.2.
(i) E(|L+ ∩ L− |) ∼ ψn.
(ii) E(|L+ ∩ L− |) ∼ (1 − π − − ψ)n.
(iii) E(|L+ ∩ L− |) ∼ (1 − π + − ψ)n.
(iv) E(|L+ ∩ L− |) ∼ (π + + π − + ψ − 1)n.

(v) |L+ |, |L+ |, |L− |, |L− | and the quantities in (i)–(iv) above are concentrated within n∆5/2
(log n)2 of their expected value, with probability 1 − O(1/ log2 n).
Proof. We note that Corollary 4.2(v) follows from Lemma 4.1 and the Chebyshev
inequality.
Lemma 4.3. With high probability, the following conditions hold simultaneously for all
u, v ∈ V .
2
(i) |R + (u)|, |R + (v)| > A0 (d −

1)2
log n implies R + (u) ∩ R + (v) = ∅.
(ii) |R + (u)|, |R − (v)| > A0 (d − log n implies R + (u) ∩ R − (v) = ∅.
2

1)2

Proof. Let R + (u) be large, and let J − (s) be the unpaired ‘in-points’ of A(s) immediately
after step s of C + (u). Thus

J − (s) = ∪x∈A(s) Wx− \ U − (s).
The expected value of J − (s) is  
 θn − i

EJ (s) = ili,j 1− s
θn
 − s.
i,j s

Let qi− =
ili,j
j θn . As R + (u) is large, and so w.h.p. σ ∼ ξ + n then, from Lemma 2.3(ii), we
can write
ξ+  −
i
ξ+
EJ − (σ) = θn(1 + o(1)) 1 − − qi 1 − .
θ i
θ

We now prove that EJ − (σ) ∼ cn for some c > 0.


 
We note that 0  qi−  1 and i qi− = j p+ −
j = 1. Thus (qi ) is a probability distribution

with expected value d > 1. As q0− = 0, the only positive solutions of x = i qi− xi are 0
 
and 1. Moreover i qi− xi is convex, so x > i qi− xi for x ∈ (0, 1).
Given M(σ) determined by halting C + (u) after step σ, we can start C  + (v) on the
+ −
configuration F  with W = W + \ U + (σ) and W = W − \ U − (σ), and consider R  + (v).
+ +
 (v) is completed in s = O(∆2 log n) steps. This implies either R
Possibly R  (v) = R + (v)
+
 (v)) is nonempty.
is small or R + (u) ∩ (R + (v) \ R
+
Suppose now that R  (v) is large. At any step there are at most θn − σ available in-
points. Thus the probability that, at step s, there is a matching arc (as , bs ) with bs ∈ J − (σ)
stochastically dominates the corresponding probability for a binomial B(s, c/θ) random
variable. Thus, when s = (2θ/c) log n,
s

 (s) ∩ J − (σ) = ∅)  c
Pr(U 1− = O(n−2 ).
θ

Let D[X] denote the sub-digraph of D induced by the vertex set X.


The Size of the Largest Strongly Connected Component of a Random Digraph 335

Corollary 4.4.
(i) Let S = L+ ∩ L− . Then w.h.p. |S| ∼ (π + + π − + ψ − 1)n.
(ii) D[S] is a maximal strongly connected component.
(iii) (a) If u, v ∈ S then R + (u) = R + (v).
(b) Let R + (S) be the fan-out of (any vertex of ) S. Let K + (v) = R + (v) \ R + (S). Then
w.h.p. for all v ∈ L+ \ L− , |K + (v)| = O(∆2 log n).
(iv) With high probability, for all v ∈ L+ , R + (v) ⊇ L− .

Proof. (i) follows from Corollary 4.2.


(ii) For all u, v ∈ S, we have R + (u) ∩ R − (v) = ∅ by Lemma 4.3(ii). Hence there is a
directed (u, v)-path, and it follows that S is strongly connected. Let w ∈ V and suppose
there is a directed path from S to w, and from w to S. Thus R + (w) and R − (w) are large,
so w ∈ S.
 + (v) in the proof of Lemma 4.3.
(iii) K + (v) is R
(iv) This follows from Lemma 4.3(ii).

In the notation of [6] there is a bow-tie digraph B = D[L+ ∪ L− ] induced by the union
of the vertices with large fan-out or fan-in. This digraph B consists of a maximal strongly
connected component S with vertex set L+ ∩ L− and wings K + with vertex set L+ ∩ L−
and K − with vertex set L− ∩ L+ . The wing K + consists of directed paths from L+ ∩ L−
terminating at vertices of S. Similarly the wing K − consists of paths directed away from
S and passing through vertices of L− ∩ L+ .
Each vertex v of K + has a small out-branching K + (v) of which the sub-branching
K (v) ∩ K + points from v towards S. Similarly, each vertex v of K − has a small in-
+

branching K − (v) of which the sub-branching K − (v) ∩ K − points away from S towards v.
The maximum size of the branchings K + (v), K − (v) is O(∆2 log n).
For a vertex v ∈ L+ , R + (v) = K + (v) ∪ S ∪ K − . Very possibly K + (v) \ K + = ∅. In fact
these vertices comprise a substantial part of V (C1 ) − V (B) = M , the vertices of the giant
component C1 which do not lie within the bow-tie B.
For sequences satisfying the conditions of [14], we can estimate the size of M . Let

lk = i+j=k li,j , and let γ be the smallest nonnegative solution of
 klk
x= xk−1 .
2θn
k1

From [14], the size of the giant component C1 is asymptotic to αn where


 lk
α=1− γk .
n
κ1

Thus we have the following corollary.

Corollary 4.5. For sequences (lk ) satisfying [14], w.h.p. the size of M is (1 + o(1))(α − 1 +
ψ)n.

More information on the k-cores of the giant is given in [7].


336 C. Cooper and A. Frieze

5. The case where p0− = p0+ = 0: Proof of Theorem 1.3


We first prove a lemma which shows that w.h.p. small fan-outs (resp. fan-ins) are at most
unicyclic.

Lemma 5.1. With high probability, for all v ∈ V , the number of edges induced by R + (v)
during the first O(∆2 log2 n) steps of C + (v) is at most |R + (v)|.

Proof. Let A(s) be the vertices of R + (v) acquired by the end of step s of the process C + (v).
Consider the matching edge (as+1 , bs+1 ) selected at step s + 1. Let J(s) = (∪u∈A(s) Wu− ) −
U − (s) be the set of unused configuration points of A(s) in W − . The probability that an
element of J(s) is chosen as bs+1 is O(∆s/θn).
The probability that there exists a vertex v such that this event occurs twice during the
first O(∆2 log2 n) steps of C + (v) is at most
2
∆3 log2 n
O n(∆2 log2 n)2 = o(1).
θn

Because the minimum in-degree and out-degree of the digraph is at least 1, at least one
of the following is true for every vertex v. The vertex v is either on a directed cycle C,
or on a directed path P which originates in a directed cycle C1 , and leads to a directed
cycle C2 .
If v ∈ L+ ∩ L− then v must be on an isolated directed cycle C. For if not, the edge
density of the fan-out of a vertex of C1 contradicts Lemma 5.1.
Let v ∈ L+ ∩ L− , so that v ∈ K − . Suppose first that v lies on a cycle C. If there is
an edge incident with and directed away from C, this leads to a cycle C contradicting
Lemma 5.1. Next, let v be on a path P originating at a vertex w ∈ S. P terminates in a
cycle C. If any vertex of P , except w, has out-degree at least 2, we obtain a contradiction
to Lemma 5.1.
We now consider the expected number m+ (k) of subsets of vertices of out-degree 1,
forming directed cycles of length k  2. Thus
 (k − 1)!  li,1
m+ (k) = ifi
 (θn)k fi
fi =k i1

1  k 1  fi
= i (li,1 )fi
k f1 , . . . , fi , . . . (θn)k
fi =k i1

1   ili,1
2 fi
k k
= 1+O
n k f1 , . . . , fi , . . . θn
fi =k
2
k 1 +k
= 1+O (p ) .
n k 1
Theorem 1.3(i), (ii) follow from standard techniques. Theorem 1.3(iii) follows by applying
the methods of [2].
The Size of the Largest Strongly Connected Component of a Random Digraph 337

References
[1] Aiello, W., Chung, F. R. and Lu, L. (2000) A random graph model for massive graphs. In Proc.
32nd Annual ACM Symposium on Theory of Computing, pp. 171–180.
[2] Arratia, R., Goldstein, L. and Gordon, L. (1989) Two moments suffice for Poisson approx-
imations: The Chen–Stein method. Ann. Probab. 17 9–25.
[3] Bollobás, B. (2001) Random Graphs, 2nd edn, Academic Press.
[4] Bollobás, B. (1984) The evolution of random graphs. Trans. Amer. Math. Soc. 286 257–274.
[5] Bollobás, B. (1980) A probabilistic proof of an asymptotic formula for the number of labelled
regular graphs. Europ. J. Combin. 1 311–316.
[6] Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and
Wiener, J. (2000) Graph structure in the web. Proc. 9th WWW Conf. 309–320.
[7] Cooper, C. (2002) The size of the cores of a random graph with a given degree sequence.
Submitted to RSA.
[8] Erdős, P. and Rényi, A. (1960) On the evolution of random graphs. Publ. Math. Inst. Hungar.
Acad. Sci. 5 17–61.
[9] Janson, S., Knuth, D. E., L  uczak, T. and Pittel, B. G. (1993) The birth of the giant component.
Random Struct. Alg. 4 233–358.
[10] Janson, S., L uczak, T. and Ruciński, A. (2000) Random Graphs, Wiley, New York.
[11] Karp, R. M. (1990) The transitive closure of a random digraph. Random Struct. Alg. 1 73–94.
[12] L
 uczak, T. (1990) Components behavior near the critical point of the random graph process.
Random Struct. Alg. 1 287–310.
[13] L
 uczak, T., Pittel, B. G. and Wierman, J. C. (1994) The structure of a random graph near the
point of the phase transition. Trans. Amer. Math. Soc. 341 721–748.
[14] Molloy, M. and Reed, B. A. (1995) A critical point for random graphs with a given degree
sequence. Random Struct. Alg. 6 161–180.
[15] Molloy, M. and Reed, B. A. (1998) The size of the largest component of a random graph on a
fixed degree sequence. Combin. Probab. Comput. 7 295–306.
[16] Uno, Y. and Ibaraki, T. Reachability problems of random digraphs. Graphs and Networks,
E81-A 2694.

You might also like