Routing Betweenness Centrality
by
Shlomi Dolev, Yuval Elovici, and Rami Puzis
Technical Report #2009-09
August 2009
Ben-Gurion University of the Negev
August 10, 2009
Abstract
Betweenness centrality measure is often used in social and computer communication networks to
estimate the potential monitoring and control capabilities a vertex may have on data flowing in the
network. In this paper we define the Routing Betweenness Centrality (RBC) measure which generalizes
previously well known Betweenness measures such as the Shortest Path Betweenness, Flow Betweenness,
and Traffic Load Centrality by considering network flows created by arbitrary loop-free routing strategies.
We present algorithms for computing RBC of all the individual vertices in the network and algorithms
for computing the RBC of a given group of vertices, where the RBC of a group of vertices represents
their potential to collaboratively monitor and control data flows in the network. Two types of collaborations are considered: (i) conjunctive – the group is a sequence of vertices controlling traffic where all
members of the sequence process the traffic in the order defined by the sequence and (ii) disjunctive –
the group is a set of vertices controlling traffic where at least one member of the set processes the traffic.
The algorithms presented in this paper also take into consideration different sampling rates of network
monitors, accommodate arbitrary communication patterns between the vertices (traffic matrices), and
can be applied to groups consisting of vertices and/or edges.
For the cases of routing strategies that depend on both the source and the target of the message, we
present algorithms with time complexity of O(n^2 m), where n is the number of vertices in the network and
m is the number of edges in the routing tree (or the routing directed acyclic graph (DAG) for the cases
of multi-path routing strategies). The time complexity can be reduced by a factor of n if we assume
that the routing decisions depend solely on the target of the messages.
Finally, we show that a preprocessing of O(n^2 m) time supports computations of RBC of sequences
in O(kn) time and computations of RBC of sets in O(k^3 n) time, where k is the number of vertices in
the sequence or the set.
1 Introduction
Networks are commonly used to represent a domain, a problem, or a complex dynamic system in a large
variety of scopes [31], for example, social networks [33], protein interactions [9], urban structure [27], and computer communication networks [15, 35]. Various centrality measures such as Degree,
Closeness, and Betweenness [8, 17] were introduced in order to analyze networks and understand both the
global dynamics of the networks and the roles played by individual nodes. Many naturally evolved complex
networks are characterized by a power-law distribution of Degree and Betweenness-Centrality measures of
their nodes [5, 15]. Such networks, referred to as Scale-Free Networks [3, 4], are highly resistant to random
damages but are easily partitioned by removing the most central nodes [6].
The centrality characteristics of nodes are also important in the field of computational epidemiology,
where it has been shown that immunizing central nodes can significantly reduce the impact of epidemics [26].
In the scope of the Internet, Jackson et al. [22] suggest placing monitors on links of the autonomous system
level topology of the Internet, with end nodes having the highest Degree. The computation of Group
Betweenness-Centrality was suggested in [30] to identify groups of autonomous systems which can collaborate
to trace the communications of as many Internet users as possible.
In this paper we concentrate on Betweenness-Centrality measures [2, 16], originally defined to estimate
the control an individual may have over communication flows in social networks. Betweenness-Centrality
measures may be used to estimate the monitoring capabilities, control capabilities, and/or functionality
importance of nodes in communication networks. The concept of Betweenness-Centrality evolved into a
broad class of diverse measures that consider different types of network flows [7]. The original Betweenness measure, which is now
referred to as Shortest Path Betweenness-Centrality (SPBC), assumes that only shortest paths are used to
transfer the network flow. Traffic Load Centrality (TLC) [11, 19, 24] is a variant of betweenness that also
assumes that traffic flows over shortest paths, but uses a different routing mechanism. When devising a
routing strategy in a commercial communication network, factors such as load balancing, fault tolerance,
and service level agreements must be considered. Unfortunately, these factors may lead to traffic flows that
are not routed along shortest paths to the target and are therefore ignored by SPBC and TLC.
Table 1: Time and space complexity of the proposed algorithms.

Section / Alg.   Scope         Routing depends on   Space    Preproc. time   Query time
--------------   -----------   ------------------   ------   -------------   ----------
4.1 / 1          All nodes     source and target    O(m)     -               O(n^2 m)
4.2 / 2          Sequence      source and target    O(m)     -               O(n^2 m)
4.3 / 3          Set (nodes)   source and target    O(m)     -               O(n^2 m)
5.1 / 4          All nodes     only target          O(m)     -               O(nm)
5.2 / 5          Sequence      only target          O(m)     -               O(nm)
5.2 / 6          Sequence      only target          O(n^3)   O(n^2 m)        O(nk)
5.3 / 7          Set (nodes)   only target          O(m)     -               O(nm)
5.3 / 8          Set (nodes)   only target          O(n^3)   O(n^2 m)        O(k^3 n)
5.4 / 9          Set (links)   only target          O(n^3)   O(n^2 m)        O(k^3 n)
5.4 / 10         Set (mixed)   only target          O(n^3)   O(n^2 m)        O(k^3 n)

n = number of nodes in the network; m = maximal number of edges in routing trees (or routing directed acyclic graphs (DAGs) for
multi-path routing schemes); k = number of nodes in a sequence or a set.
Flow Betweenness-Centrality (FBC), proposed by Freeman et al. [18], equally considers routes of all lengths
and assumes that routes are simple (containing no cycles). While assuming simple routes is reasonable
for communication networks, routing strategies in computer networks usually do prefer shorter paths over
longer paths. Random Walk Betweenness-Centrality (RWBC), proposed by Newman [25], assumes that
shorter paths are used more than longer ones. However, RWBC assumes that routes may contain cycles,
which is not the case in most communication networks. Besides the path length issue, each one of the above
Betweenness-Centrality measures assumes a fixed communication model that does not fully match routing
strategies used in communication networks such as the Internet. In this paper we propose a more flexible
and realistic measure called Routing Betweenness Centrality, which accommodates a wide class of routing
strategies.
Most routing protocols create routing tables that match the destination address of a packet with one
output port. Occasionally routing tables are changed if one of the links attached to a router is unavailable
due to malfunction or congestion. During the time period when routing tables do not change, they create
spanning trees rooted at every target node in the network. Some routing protocols maintain shortest paths
to the target while others balance the traffic load on the network by forwarding superfluous traffic through
less loaded routes which are not necessarily shortest [1, 23]. Routing protocols may utilize multiple paths
from source to target which are not necessarily shortest, but it is important to note that, in the stable state,
they do not contain loops.
Routing Betweenness-Centrality (RBC), as defined in this paper, accommodates arbitrary loop-free routing schemes where routing decisions depend on the packet target alone or on both the source and the target
of the packet. It is easy to show that SPBC, TLC, and FBC are particular cases of RBC. We
elaborate on SPBC, TLC, and FBC in Section 2 and show how to define a routing strategy that will match
their communication model in Section 3. We also add the notion of sampling rate, which was not considered by prior algorithms for computing Betweenness-Centrality measures. We present a set of algorithms
for computing RBC of individual nodes, sequences of nodes (e.g. links), and sets of nodes and/or links.
Table 1 summarizes the algorithms discussed in this paper. In Section 4 we present algorithms that, given a
loop-free routing scheme, compute RBC by topologically sorting all nodes between each source-target pair.
In Section 5 we reduce the complexity of RBC computations for routing schemes where the routing decisions
are affected only by the target of the packet, and show how to efficiently compute RBC of sets consisting of both
links and nodes. Conclusions appear in Section 6. Symbols and notation principles used in this paper are
summarized in the Appendix.
2 Preliminaries on Betweenness-Centrality
Shortest Path Betweenness Centrality (SPBC) was introduced in social sciences to measure the potential
influence of an individual over the information flow in a social network [2, 16]. SPBC is defined as the sum
of fractions of all shortest paths between each pair of nodes in a network which traverse a given node:
\[ SPBC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{s,t}(v)}{\sigma_{s,t}} \]
where σs,t is the number of shortest paths connecting s and t and σs,t(v) is the number of shortest paths between
s and t that traverse v. Assume, for example, that the network uses a shortest-path routing scheme where
the route is randomly chosen out of all shortest paths from source to target. Assume also that every node
sends one packet to every other node. In this case, SPBC of a node v is the expected number of packets
that traverse v. Brandes has shown in [10] that shortest paths from a single source s to all other nodes can
be efficiently aggregated by traversing the nodes in the order of a non-increasing distance from s. Efficient
aggregation of shortest paths is used to compute SPBC of all nodes in a network in O(|V||E|) time, where
V is the set of nodes and E is the set of links in the network. SPBC can naturally be extended to Group
Betweenness-Centrality (GBC) [14]. GBC of a set of nodes M is defined as the sum of fractions of all shortest
paths which traverse at least one node in M. GBC of a single group can be computed in O(|V||E|) time [11, 30]
or in O(|M|^3) time following a preprocessing that takes O(|V|^3) time.
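The SPBC definition above can be checked on small graphs with a brute-force sketch. This is not Brandes' O(|V||E|) algorithm; it simply counts shortest paths with a breadth-first search from every node and applies the definition directly. The function name `spbc` and the adjacency-dictionary input are assumptions made for illustration, and the graph is assumed to be unweighted and undirected.

```python
from collections import deque

def spbc(adj):
    """Brute-force SPBC: sum over s != v != t of sigma_st(v) / sigma_st.

    adj maps each node to a list of its neighbors (hypothetical input format).
    """
    nodes = list(adj)
    dist, sigma = {}, {}
    for s in nodes:
        # BFS from s: hop distances and number of shortest paths from s.
        d, c = {s: 0}, {s: 1}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    c[w] = 0
                    queue.append(w)
                if d[w] == d[u] + 1:
                    c[w] += c[u]
        dist[s], sigma[s] = d, c

    result = {v: 0.0 for v in nodes}
    for s in nodes:
        for t in nodes:
            if s == t or t not in dist[s]:
                continue
            for v in nodes:
                if v in (s, t) or v not in dist[s] or t not in dist[v]:
                    continue
                # v lies on a shortest s-t path iff d(s,v) + d(v,t) = d(s,t);
                # the number of such paths is sigma(s,v) * sigma(v,t).
                if dist[s][v] + dist[v][t] == dist[s][t]:
                    result[v] += sigma[s][v] * sigma[v][t] / sigma[s][t]
    return result
```

Because the sum runs over ordered pairs (s, t), the middle node of a three-node path receives a value of 2 in this sketch.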
The definition of GBC resembles the definition of the effectiveness of a group of distributed network
monitors which is defined as the probability that a random packet is sampled at least once by at least one of
the monitors [12]. Moreover, Holme has shown in [21] that the SPBC of a node is highly correlated with the
fraction of time that the node is occupied by traffic. SPBC was also used by Yan et al. [34] for predicting and
avoiding congestion. These findings indicate that SPBC can be used as a heuristic in many network related
tasks such as designing routing protocols, optimizing deployment of network monitors, finding bottlenecks
in the network, etc. Unfortunately, in practice, not all the shortest paths between source and target have
the same probability to transfer a packet as assumed by SPBC.
Traffic Load Centrality (TLC) assumes a more realistic routing strategy, where every node forwards
the packet to a neighbor chosen randomly out of the neighbors that are closest to the target. Like SPBC,
TLC can be computed in O(|V||E|) time for all nodes in the network [11, 24]. The group variant of TLC
can also be computed in O(|V||E|) time, as was indicated in [11]. To the best of our knowledge, there are no
algorithms that reduce the time required to compute TLC of a group of nodes using preprocessing. SPBC
and TLC possess similar statistical properties; however, normalized SPBC and TLC can differ by up to 30%
for individual nodes in large networks [36]. The drawback of SPBC and TLC measures is that they are both
limited to shortest-path routes while in practice traffic flows may deviate from the shortest paths to increase
the network performance.
There are some Betweenness-Centrality measures that are not limited to shortest paths. Freeman et
al. [18] introduce Maximal-Flow Betweenness-Centrality (FBC) that equally considers all paths from source
to target. Roughly speaking, FBC of the node v is the sum of fractions of the maximal flows between each
pair of nodes that are transferred by the node v:
\[ FBC(v) = \sum_{s \neq v \neq t} \frac{\varphi_{s,t}(v)}{\varphi_{s,t}} \]
where φs,t is the maximal flow between s and t and φs,t(v) is the portion of this flow that is transferred by the
node v. Since the maximal flow can utilize different routes from s to t, φs,t(v) should be averaged over all
the possibilities. The main drawback of FBC when applied to communication networks is that it does not
prioritize routes according to their lengths, while in practice, most of the traffic is routed through shortest
paths.
The work of Borgatti and Everett [8] categorizes Betweenness-Centrality measures according to the
types of routes assumed and provides valuable insights into the formulation and computation of generic
Betweenness-Centrality measures. Routing Betweenness-Centrality (RBC) defined in this paper is a generalization of SPBC, FBC, and TLC. We present algorithms for computing RBC of individual nodes as well as
sets and sequences of nodes. Algorithms presented in this paper are applicable to a general case of loop-free
routing schemes where the routing decisions depend on both the source and the target of a packet, and
to source-oblivious schemes. For the latter routing schemes we show that RBC can be computed with the
same time complexity as SPBC and TLC (namely in O(|V ||E|) time). We also show how these times can
be reduced using preprocessing when the size of the evaluated group is small compared to the size of the
network.
3 Routing scheme representation
Throughout this paper we assume a loop-free routing scheme. We ignore temporary loops created by routing
oscillations and treat routing oscillations as unavoidable noise in the system. Instead, we are interested
in a superposition of all stable state routing tables. Each routing decision made along a network path is
dictated by the network topology and the status of the network. Link failures and congestion cause routing
decisions across the network to change from time to time. We assume that either the routing decisions are
deterministic or the probabilities for specific routing decisions can be determined (for example, by analyzing
historical behavior of the network).
Formally, let G = (V, E) be a communication network topology where V is a set of n nodes and E is
a set of links between the nodes. We do not allow self loops such as (v, v) ∈ E. Let T be a traffic matrix
where T (s, t) is the number of packets sent from a source node s to a target node t. In general, T (s, t) can
represent any quantity of interest such as the number of bytes, number of sessions, or the importance of
communication between s and t.
Assume, for example, that a group of monitors is installed on nodes in a network. The total number
of bytes or the total importance of the communication passing through this group can be regarded as its
monitoring potential. However the actual volume of information being monitored depends on the sampling
rates of the monitors (0 ≤ ρv ≤ 1). Our goal is to compute the total expected number of packets sampled by
groups of collaborating monitors. We distinguish between two types of groups: sequences – packets should
be sampled by all the members in the order defined by the sequence – and sets – packets should be sampled
by at least one member.
Let R(s, u, v, t) = p be a quaternary function representing the averaged routing scheme, where p is the
probability that u will forward to v a packet with source address s and target address t. Note that we assume
that all routing decisions (such as (s, u, v, t)) are independent. We will use * ("don't care") to indicate any
value. For example, R(*, u, v, *) = 0 if there is no link from u to v and R(*, v, v, *) = 1 by convention. R
defines a directed acyclic graph (DAG) for each source-target pair. The complexity of the algorithms described
in this paper depends on the number of links in these DAGs. We denote the maximal number of links in all
routing DAGs relevant to the network as m.
The routing scheme R can represent various policies of message or flow transfer methods. We can embed
in R some message transfer methods assumed by different Betweenness-Centrality measures. In these cases
RBC will produce the same values as would be produced by the original Betweenness-Centrality measure.
We will use Figure 1 as a sample network for the following examples.
TLC nodes forward packets to one of the neighbors which are closest to the target with equal probability.
In this case R(s, u, v, t) is equal to one divided by the number of u's neighbors that are closest to t.
For example, in Figure 1, R(s, v1, v2, t) = 0.5.
[Figure 1: Sample network with traffic flowing from s to t according to SPBC, TLC, and FBC flow models. Panels: (a) SPBC, (b) TLC, (c) FBC.]
SPBC nodes forward packets to one of the neighbors which are closest to the target. The probability of
u to forward to v a packet targeted at t is equal to the fraction of shortest paths from u to t that
pass through v: R(s, u, v, t) = σu,t(v) / σu,t. For example, in Figure 1, R(s, s, v1, t) = 2/3 since there are three
shortest paths between s and t, two of which pass through v1.
FBC for each s-t pair, nodes forward packets from s to one of their neighbors to produce maximal flow
between s and t. The probability of u to forward to v a packet from s to t is proportional to the portion
of the s-t flow carried by the undirected link (u, v): R(s, u, v, t) = φs,t((u, v)) / φs,t(u). For example, if we assume
that in Figure 1 the capacity of all links is 0.5, then R(s, v1, v2, t) = 0.5 since the link (v1, v5) is not
utilized by the maximal flow between s and t.
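As an illustration of how the TLC and SPBC models translate into a routing function, the following sketch builds the forwarding probabilities toward one target on an unweighted, undirected graph. The names `bfs_to_target` and `routing_to_target`, the adjacency-dictionary input, and the returned `{(u, v): probability}` format are assumptions introduced here, not part of the paper. Both models are source-oblivious, so the source argument of R is omitted.

```python
from collections import deque

def bfs_to_target(adj, t):
    """Hop distance to t and number of shortest paths to t for every node."""
    dist, sigma = {t: 0}, {t: 1}
    queue = deque([t])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def routing_to_target(adj, t, model):
    """Return {(u, v): probability that u forwards to v a packet targeted at t}.

    model is "TLC" (uniform choice among the neighbors closest to t) or
    "SPBC" (choice proportional to the number of shortest paths through v).
    """
    dist, sigma = bfs_to_target(adj, t)
    R = {}
    for u in adj:
        if u == t or u not in dist:
            continue
        # Neighbors of u that are one hop closer to the target.
        closer = [v for v in adj[u] if dist.get(v) == dist[u] - 1]
        for v in closer:
            if model == "TLC":
                R[(u, v)] = 1.0 / len(closer)
            else:
                # sigma_{u,t}(v) / sigma_{u,t} for a next hop v equals
                # sigma_{v,t} / sigma_{u,t}.
                R[(u, v)] = sigma[v] / sigma[u]
    return R
```

On a graph where s has two next hops, one of which starts two of the three shortest paths to t, the sketch yields 1/2 for TLC and 2/3 for SPBC on that link, which mirrors the difference between the two models described above.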
4 Routing Betweenness-Centrality
In this section we define Routing Betweenness-Centrality (RBC), focusing on routing schemes where the
routing decisions depend on the source and the target of a packet. In Section 5 we will show how the
computation of RBC can be optimized when the routing decisions are source-oblivious. In the next three
subsections we present algorithms for computing RBC of individual nodes, sequences of nodes, and sets of
nodes.
4.1 RBC of individual nodes
Assume that a packet is introduced to the network by source node s and destined to leave the network at
target node t. Let δs,t (v) be the probability that this packet will pass through the node v. We will refer
to δs,t (v) and its variants as pairwise dependency of s and t on the intermediate v. δs,t (v) · T (s, t) is the
expected number of packets sent from s to t that pass through v. Note that for special cases where v equals
s, t, or both it holds that δs,t (s) = δs,t (t) = 1. δs,t (v) can be recursively computed for arbitrary v ∈ V based
on the loop-free routing strategy R(s, u, v, t). Let Pred_{s,t}(v) be the set of all immediate predecessors of v on
the way to t: Pred_{s,t}(v) = {u | R(s, u, v, t) > 0}. Let u be a predecessor of v on the way from s to t. The
probability that a packet will pass through v after visiting u is R(s, u, v, t). Hence, the pairwise dependency
of s and t on v can be computed using pairwise dependency of s and t on v's predecessors.
\[ \delta_{s,t}(s) = 1, \qquad \delta_{s,t}(v) = \sum_{u \in Pred_{s,t}(v)} \delta_{s,t}(u) \cdot R(s, u, v, t) \tag{1} \]
Since we assume loop-free routing, Pred_{s,t} defines a directed acyclic graph (DAG) [20] as shown in Figure 2a.
Therefore, we can compute δs,t(v) for all v ∈ V in O(m) time in the worst case. All we need to do is topologically
sort the DAG induced by Pred_{s,t} and iteratively apply Equation 1 on all nodes starting from s.
[Figure 2: Example of a routing DAG from s1 to t1 (dashed gray arrows). In this example we assume T(s1, t1) = 10. The
numbers on the arrows in sub-figures (a), (b), and (c) indicate the delta values contributed by the topologically sorted nodes
to their successors in Algorithms 1, 2, and 3 respectively. Panels: (a) Single nodes, (b) Sequence (v1, v3), (c) Set {v1, v3}.
Delta values listed in the panels for the node order s1, v1, t2, v3, v4, s2, v2, t1:
(a) 10, 5, 0, 2.5, 7.5, 0, 0, 10
(b) 10, 5, 0, 2.5, 0, 0, 0, 2.5
(c) 10, 5, 0, 0, 5, 0, 0, 5]
Let RBC of a node v (δ•,• (v)) be the expected number of packets that pass through v.
\[ \delta_{\bullet,\bullet}(v) = \sum_{s,t \in V} \delta_{s,t}(v) \cdot T(s, t) \tag{2} \]
δ•,• (v) can be regarded as the potential of v to inspect or alter communications in the network. Equation
2 resembles the original definition of SPBC with two exceptions. First, each δs,t (v) is multiplied by the
number of packets sent from s to t to compute the traffic load on v. Second, end points are included in
the summation to accommodate communications originating from (or destined to) the investigated node.
Algorithm 1 computes the RBC of all individual nodes in O(n^2 m) time using Equations 1 and 2.
Algorithm 1: RBC of nodes
Input: G(V, E), R, T
Output: RBC[1..|V|]
Data: delta[1..|V|]
∀v ∈ V, RBC[v] = 0
for s, t ∈ V do
    ▼ topological sort
    E′ = {(u, v) | R(s, u, v, t) > 0}
    D = directed acyclic graph (V, E′)
    {s = v0 ≼ v1 ≼ ... ≼ vn = t} = topologically sorted nodes of D
    ▼ init delta
    ∀v ∈ V, delta[v] = 0
    delta[s] = T(s, t)
    ▼ accumulate δ•,•(v)
    for i = 0 to n do
        for vj ∈ successors(vi) do
            delta[vj] += delta[vi] · R(s, vi, vj, t)
    for v ∈ V do
        RBC[v] += delta[v]
return RBC
Algorithm 1 is composed of an outer loop that iterates over all s-t pairs of nodes and of three inner
stages. In the first stage the algorithm creates the routing DAG with single source s and single sink t. In the
second stage the delta array is initialized (bold numbers in Figure 2). Entry delta[v] of this array represents
the expected number of packets from s to t that pass through v: δs,t(v) · T(s, t). Finally, in the third stage
the expected number of packets from s to t that pass through each one of the nodes is computed and these
values are accumulated according to Equation 2 to form RBC values of all nodes. Most of the following
algorithms will use the same template and similar content.
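The three stages of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: `routing(s, t)` is a hypothetical callable returning the routing DAG of the pair as a dictionary `{(u, v): R(s, u, v, t)}`, `T` is a dictionary `{(s, t): packets}`, and the standard-library `graphlib` module (Python 3.9+) performs the topological sort. The sketch pulls values from predecessors, as in Equation 1, instead of pushing them to successors; both orders give the same result.

```python
from graphlib import TopologicalSorter

def pair_delta(edges, s, traffic):
    """Expected number of packets per node for one s-t pair.

    edges: {(u, v): probability} describing the routing DAG of the pair.
    Returns delta[v] = delta_{s,t}(v) * T(s, t) for every node in the DAG.
    """
    # Stage 1: predecessor sets and topological order.
    preds = {s: set()}
    for (u, v) in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    order = TopologicalSorter(preds).static_order()  # predecessors come first

    # Stage 2: all traffic of the pair starts at the source.
    delta = {v: 0.0 for v in preds}
    delta[s] = traffic

    # Stage 3: Equation 1, applied in topological order.
    for v in order:
        for u in preds[v]:
            delta[v] += delta[u] * edges[(u, v)]
    return delta

def rbc_nodes(nodes, routing, T):
    """Equation 2: accumulate the per-pair values over all s-t pairs."""
    rbc = {v: 0.0 for v in nodes}
    for (s, t), traffic in T.items():
        for v, d in pair_delta(routing(s, t), s, traffic).items():
            rbc[v] += d
    return rbc
```

For example, with edges s→a (0.5), s→b (0.5), b→a (0.5), b→t (0.5), a→t (1.0) and T(s, t) = 10, the sketch assigns 10 packets to s and t, 5 to b, and 7.5 to a.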
4.2 RBC of ordered sequences
In this subsection we define RBC of ordered sequences of nodes. A link is a special case of a sequence of
size two where the members of the sequence are connected. Betweenness-Centrality of a sequence measures
the extent to which packets traverse all the nodes in the sequence in a given order. For example, RBC of a
sequence of monitors can reveal the level of redundant traffic inspection. The SPBC of sequences was first
mentioned in [28] as a technique to speed up the computation of shortest-path group Betweenness-Centrality
(GBC) by an order of magnitude. We will also use the concept of sequence Betweenness-Centrality to speed
up the computation of RBC of sets in Section 5.3.
Let S = (s1 , . . . , sk ) be a sequence of nodes. Let δ̃s,t (S) be the probability that a single packet emanating
from s and targeted at t will pass through all nodes in the sequence S, first through s1 then through s2 and
so on until sk . δ̃s,t (S) · T (s, t) is the expected number of packets sent from s to t that pass through S. The
sequence S can be any finite sequence of nodes. If the same node appears more than once, all successive
appearances of the node can be reduced to one instance, for example δ̃s,t ((u, v, v, v, w)) = δ̃s,t ((u, v, w)). On
the other hand, if two appearances of a node in the sequence S are separated by a different node this will create
a cycle and δ̃(S) will be equal to zero according to the assumption of loop-free routing. For the same reason,
δ̃s,t (S) is equal to zero if S contains s following some other nodes, for example δ̃s,t ((v, . . . , s, . . .)) = 0. The
following set of equations recursively computes the probability that a packet will pass through the sequence
S:
\[
\begin{aligned}
& \tilde{\delta}_{s,t}((s)) = 1 \\
(v_{k-1} = v_k) \quad & \tilde{\delta}_{s,t}((\ldots, v_{k-1}, v_k)) = \tilde{\delta}_{s,t}((\ldots, v_{k-1})) \\
(v_{k-1} \neq v_k) \quad & \tilde{\delta}_{s,t}((\ldots, v_{k-1}, v_k)) = \sum_{u \in Pred_{s,t}(v_k)} \tilde{\delta}_{s,t}((\ldots, v_{k-1}, u)) \cdot R(s, u, v_k, t)
\end{aligned}
\tag{3}
\]
The set of predecessors (Pred_{s,t}(r)) remains the same as in the previous subsection. Therefore, Equation 3
can also be solved in O(m) time, similarly to Equation 1.
Let Sρ = (s1 , . . . , sk ) be a sequence of nodes with sampling rates ρs1 , . . . , ρsk respectively. For simplicity
of the following discussion we assume that all nodes in S are different. We will denote by S the same sequence
of nodes disregarding their sampling rates. The probability that a packet from s to t will be sampled by
all nodes in Sρ is the probability that it will pass through S multiplied by the product of sampling rates of
all nodes in the sequence. RBC of an ordered sequence of nodes Sρ (denoted by δ̃•,•(Sρ)) is defined as the
expected number of packets sampled by all nodes in Sρ in a given order.
\[ \tilde{\delta}_{\bullet,\bullet}(S_\rho) = \prod_{r \in S} \rho_r \cdot \sum_{s,t \in V} \tilde{\delta}_{s,t}(S) \cdot T(s, t) \tag{4} \]
Note that the RBC of a directed link (u, v) ∈ E and of a single node w ∈ V are simply the RBC of the sequences
(u, v) and (w) respectively. Equations 3 and 4 can be used to compute RBC of one sequence of nodes in
O(n^2 m) time. δ̃•,•(Sρ) is computed by Algorithm 2, by propagating only the portion of traffic that was sampled
by the monitors in Sρ.
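Equations 3 and 4 can be illustrated with a short sketch. Unrolling Equation 3 shows that, under the independence assumption on routing decisions, the probability of passing through the sequence factors into a product of "reach" probabilities between consecutive members; the code below uses that factorization rather than the single propagation pass of Algorithm 2. The helper names `reach` and `sequence_rbc`, the `routing(s, t)` callable returning `{(u, v): probability}`, and the dictionary `rho` of sampling rates are assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from math import prod

def reach(edges, a, b):
    """Probability that a packet currently at node a later passes through b."""
    preds = {}
    for (u, v) in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    if a not in preds or b not in preds:
        return 1.0 if a == b else 0.0
    mass = {}
    for v in TopologicalSorter(preds).static_order():
        if v == a:
            mass[v] = 1.0  # start the propagation at a
        else:
            mass[v] = sum(mass[u] * edges[(u, v)] for u in preds[v])
    return mass[b]

def sequence_rbc(routing, T, seq, rho):
    """Equation 4: expected number of packets sampled by every node of seq, in order."""
    sampling = prod(rho[r] for r in seq)
    total = 0.0
    for (s, t), traffic in T.items():
        edges = routing(s, t)
        p = 1.0
        # Pairs (s, s1), (s1, s2), ..., (s_{k-1}, s_k).
        for a, b in zip((s,) + tuple(seq), seq):
            p *= reach(edges, a, b)
        total += p * traffic
    return sampling * total
```

A sequence listed against the direction of the routing DAG has a reach probability of zero somewhere along the chain, so its RBC is zero, in line with the loop-free argument above.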
4.3 RBC of sets
In this subsection we define the set variant of RBC. Generally, Betweenness-Centrality of a group of nodes
measures the extent to which packets traverse at least one of the nodes in the group. The concept of centrality
was first applied to groups and classes of nodes in networks by Everett and Borgatti in [14]. The set variant
of RBC can be used, for example, for estimating the expected effectiveness of distributed monitors.
Algorithm 2: RBC of sequences (with sampling)
Input: G(V, E), R, T, ρ, S = (s0, s1, ..., sl) (i ≠ j ⇒ si ≠ sj)
Output: RBC of S
Data: delta[1..|V|]
RBCofS = 0
for s, t ∈ V do
    ▶ topological sort
    ▶ init delta
    ▼ accumulate δ̃•,•(Sρ)
    k = 0
    for i = 0 to n do
        if vi = sk then k += 1
        for vj ∈ successors(vi) do
            if vj ≺ sk or vj is sk then
                delta[vj] += delta[vi] · R(s, vi, vj, t)
    RBCofS += delta[t] · Π_{si ∈ S} ρ_{si}
return RBCofS

Algorithm 3: RBC of sets (with sampling)
Input: G(V, E), R, T, ρ
Output: RBC
Data: delta[1..|V|], totalTraffic
RBC = 0
for s, t ∈ V do
    ▶ topological sort
    ▶ init delta
    ▼ accumulate δ̈•,•(Mρ)
    totalTraffic = Σ_{i=0}^{n−1} delta[i]
    for i = 0 to n do
        delta[vi] = delta[vi] · (1 − ρ_{vi})
        for vj ∈ successors(vi) do
            delta[vj] += delta[vi] · R(s, vi, vj, t)
    RBC += (totalTraffic − delta[t])
return RBC
Let M = {v0 , . . . , vk } be a set of nodes. Let δ̈s,t (M ) be the probability that a packet from s to t will
pass through at least one of the nodes in M . δ̈s,t (M ) · T (s, t) is the expected number of packets sent from s
to t that pass through M . If we disregard sampling rates, RBC of set M is:
\[ \ddot{\delta}_{\bullet,\bullet}(M) = \sum_{s,t \in V} \ddot{\delta}_{s,t}(M) \cdot T(s, t). \]
Let ρv be the sampling rate of the monitor installed on the node v. Let Mρ = {v | ρv > 0} be a set of nodes
with positive sampling rates. Mρ can be regarded as a fuzzy set where ρv is the extent to which v belongs to
Mρ. Let δ̈s,t(Mρ) be the probability that a packet from s to t will be sampled by at least one of the nodes
in Mρ. For the sake of simplicity we prefer to compute δ̈s,t(Mρ) using the complementary probability, namely the
probability that a packet from s to t will not be sampled by any monitor in Mρ. Assume, for example, that each
sampled packet is marked by the monitors. Let $\lambda^{M_\rho}_{s,t}(v)$ be the probability that a packet from s to t will pass
through v without being marked either before arriving at v or by v itself. The probability that a packet
passing through v will not be marked by v is 1 − ρv. Therefore, the probability that the packet will leave s without
being marked is 1 − ρs. Let u be a predecessor of v. The product $\lambda^{M_\rho}_{s,t}(u) \cdot R(s, u, v, t)$ is the probability that
the packet will reach v through u without being marked. Summing these products over all predecessors of
v will result in the probability that the packet will get to v without being marked as shown in Equation 5.
\[ \lambda^{M_\rho}_{s,t}(s) = 1 - \rho_s, \qquad \lambda^{M_\rho}_{s,t}(v) = (1 - \rho_v) \cdot \sum_{u \in Pred_{s,t}(v)} \lambda^{M_\rho}_{s,t}(u) \cdot R(s, u, v, t) \tag{5} \]
$\lambda^{M_\rho}_{s,t}(t)$ is the probability that the packet from s to t will not be sampled by any of the monitors. Therefore
$(1 - \lambda^{M_\rho}_{s,t}(t)) \cdot T(s, t)$ is the expected number of distinct packets from s to t captured by the monitors:
\[ \ddot{\delta}_{s,t}(M_\rho) = 1 - \lambda^{M_\rho}_{s,t}(t). \]
The RBC of the fuzzy set Mρ is the expected number of packets sampled by at least one node in Mρ and
can be computed using the complementary probabilities as described in Equation 6.
\[ \ddot{\delta}_{\bullet,\bullet}(M_\rho) = \sum_{s,t \in V} (1 - \lambda^{M_\rho}_{s,t}(t)) \cdot T(s, t) \tag{6} \]
Assume a node v ∈ V and sampling rates ρ such that ρv = 1 and for each u ≠ v, ρu = 0. In this case
δ•,• (v) = δ̈•,• ({v}ρ ) making RBC of sets a valid generalization of RBC of single nodes. In the following
discussions we will occasionally omit the subscript ρ notation when referring solely to the nodes in M or
when sampling rates are assumed to be 0 or 1.
Equations 5 and 6 can be used to compute RBC of one group of monitors with given sampling rates in
O(n^2 m) time as shown in Algorithm 3. In the input to Algorithm 3 we use ρ to represent nodes with positive
sampling rates. In the propagation stage of Algorithm 3 only the traffic that was not sampled propagates
until it reaches t.
Algorithm 3 is composed of an outer loop with three inner stages similarly to Algorithm 1. The first
two phases remain intact. The third phase implements Equation 5 to fill the delta array with the expected
number of packets from s to t that were not captured before or at the respective node. In addition, instead of
computing RBC of all nodes in the networks, the algorithm computes the total expected number of packets
that were captured by at least one monitor according to Equation 6.
This concludes the definition of RBC and its computation methods for routing strategies where the routing
decisions depend on both the source and the target of a packet. Next we will show how the assumption of
source-oblivious routing reduces the time complexity of the presented algorithms from O(n^2 m) to O(nm).
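Equations 5 and 6 can be sketched in a few lines of Python. As before, this is an illustration under assumptions and not the authors' code: `routing(s, t)` is a hypothetical callable returning the routing DAG of the pair as `{(u, v): probability}`, `T` is a dictionary `{(s, t): packets}`, and `rho` maps monitor nodes to sampling rates, with unlisted nodes treated as having rate 0.

```python
from graphlib import TopologicalSorter

def set_rbc(routing, T, rho):
    """Expected number of packets sampled by at least one monitor (Equation 6)."""
    total = 0.0
    for (s, t), traffic in T.items():
        edges = routing(s, t)
        preds = {s: set(), t: set()}
        for (u, v) in edges:
            preds.setdefault(v, set()).add(u)
            preds.setdefault(u, set())

        # lam[v]: probability that the packet passes through v while still
        # unmarked after v has had its chance to sample it (Equation 5).
        lam = {}
        for v in TopologicalSorter(preds).static_order():
            if v == s:
                arriving = 1.0
            else:
                arriving = sum(lam[u] * edges[(u, v)] for u in preds[v])
            lam[v] = (1.0 - rho.get(v, 0.0)) * arriving

        # Whatever is not unmarked at t was sampled somewhere on the way.
        total += (1.0 - lam[t]) * traffic
    return total
```

With a single monitor of sampling rate 1 the result coincides with the RBC of that node, which matches the remark that RBC of sets generalizes RBC of single nodes.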
5 Computing RBC for source-oblivious routing
In this section we will describe how the computation of RBC can be optimized when assuming a source-oblivious routing scheme. We will revise the computation of RBC of single nodes, sets, and sequences and
present their respective algorithms with minimal changes.
5.1 RBC of individual nodes
Let δ•,t (r) be the expected number of packets targeted at t that pass through the node r as defined by
Equation 7.
\[ \delta_{\bullet,t}(r) = \sum_{s \in V} \delta_{s,t}(r) \cdot T(s, t) \tag{7} \]
δ•,t (r) estimates the ability of r to monitor traffic flows targeted at t. We will refer to δ•,t (r) as target
dependency of t on r. In this and following subsections we will show how to compute RBC of individual
nodes, sequences, and sets by aggregating target dependencies. Since target dependency is a summation of
pairwise dependencies over all sources, RBC of the node r is a summation of target dependencies over all
targets as shown in Equation 8.
\[ \delta_{\bullet,\bullet}(r) = \sum_{t \in V} \delta_{\bullet,t}(r) \tag{8} \]
If we are able to compute target dependency directly without using Equation 7 the computation of δ•,• (r)
can be accelerated by replacing the loop over all s − t pairs in Algorithm 1 by a loop over all target nodes t
only. Next, we will show that target dependency can be computed recursively similarly to the computation
of pairwise dependency. The similarity between these computations allows us to introduce only minimal
changes to the pseudo code of Algorithm 1 in order to adapt it to source-oblivious routing strategies and
reduce its complexity.
Let Predt (v) be the set of all predecessors of v on the way to t: Predt (v) = {u | R(, u, v, t) > 0}. In
contrast to Preds,t (v) defined in Section 4, here the set of possible predecessors of v is not influenced
by the source of communication. Let u be a predecessor of v on the way to t. The probability that a packet
passes through v after visiting u is R(, u, v, t). The expected number of packets targeted at t that can
be monitored by v includes the packets introduced to the network by v (T (v, t)) and all packets introduced or
forwarded by v's predecessors, as described by Equation 9.
$$\delta_{\bullet,t}(v) = T(v,t) + \sum_{u \in Pred_t(v)} \delta_{\bullet,t}(u) \cdot R(, u, v, t) \qquad (9)$$
This equation can be derived directly from Equations 1 and 7 which describe the computation of δs,t (v) and
define δ•,t (v) respectively.
Since we assume loop-free routing, Predt defines a DAG similarly to Preds,t , but this time the DAG has
multiple sources and a single sink t, as shown in Figure 3. Equation 9 allows computing the values of δ•,t (v)
for all v ∈ V in O(m) in the worst case. The structural similarity of Equations 1 and 9 suggests that the same
process can be used to compute δs,t (v) and δ•,t (v). In fact, by changing the "init delta" stage of Algorithm
1 as shown in Algorithm 4, we make the accumulation stage fill the delta array with target dependencies
instead of pairwise dependencies. Algorithm 4 initializes each entry delta[v] with T (v, t) instead
of assigning T (s, t) to delta[s] and zero to all other entries.
In contrast to Algorithms 1, 2, and 3, which loop through all s-t pairs of nodes, we need to loop only
through the target nodes to compute RBC under a source-oblivious routing strategy. Algorithm 4 loops
once through all target nodes t ∈ V , performing a three-stage operation similar to Algorithm 1. In the first
stage, the algorithm builds the routing DAG with multiple sources and a single sink (as opposed to the
single-source, single-sink DAG built by the algorithms in the previous section) and sorts its nodes topologically. In the second
stage, the delta array is initialized to T (v, t). For example, in Figure 3, T (s1 , t1 ) = T (s2 , t1 ) = 10. Finally,
in the third stage, the algorithm traverses the topologically sorted nodes of the network and aggregates
RBC values. The third stage remains the same as in Algorithm 1, despite the fact that the delta array now
represents target dependencies and not pairwise dependencies.
[Figure 3 panels: (a) Single nodes, (b) Sequence (v1 , v3 ), (c) Set {v1 , v3 }. Each panel shows the routing DAG and the delta array over the topologically sorted nodes s1 , s2 , t2 , v2 , v1 , v4 , v3 , t1 .]
Figure 3: Example of a source-oblivious routing DAG with a single sink t and two sources (dashed gray lines). In this example
we assume T (s1 , t1 ) = T (s2 , t1 ) = 10 and T (vi , t1 ) = 0. The numbers on the arrows in sub-figures (a), (b), and (c) indicate the
delta values contributed by the topologically sorted nodes to their successors in Algorithms 1, 2, and 3 respectively.
Algorithm 4: s-oblivious RBC of nodes
Input: G(V, E), R, T
Output: RBC[1..|V |]
Data: delta[1..|V |]
∀v∈V , RBC[v] = 0
for t ∈ V do
    ▼ topological sort
    E′ = {(u, v) | R(, u, v, t) > 0}
    D = directed acyclic graph (V, E′)
    {v0 ⪯ v1 ⪯ . . . ⪯ vn = t} = topologically sorted nodes of D
    ▼ init delta
    ∀v∈V , delta[v] = T (v, t)
    ◮ accumulate δ•,• (v)
return RBC
Algorithm 4 iterates once over all nodes in the network and performs, for each one of them, a computation
that takes at most O(m) steps. Thus, the overall complexity of the algorithm is O(nm). This is a factor
of n faster than Algorithm 1, whose complexity is O(n^2 m). Next we present the equations which
adapt RBC computation of sequences and sets to the semantics of target dependencies.
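The three stages of Algorithm 4 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name, the encoding of the source-oblivious routing function as a dictionary `R[t][(u, v)]`, and the encoding of the traffic matrix as a dictionary `T[(s, t)]` (missing pairs read as zero) are all hypothetical choices made for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def rbc_nodes_source_oblivious(nodes, R, T):
    """Illustrative sketch of Algorithm 4 (names and encodings are assumptions).

    nodes: iterable of node identifiers.
    R: dict target t -> dict edge (u, v) -> forwarding probability towards t.
    T: dict (s, t) -> traffic volume; missing pairs count as 0.
    """
    nodes = list(nodes)
    rbc = {v: 0.0 for v in nodes}
    for t in nodes:
        # Stage 1: build the routing DAG towards t and sort it topologically.
        edges = {e: p for e, p in R.get(t, {}).items() if p > 0}
        preds = {v: set() for v in nodes}
        for (u, v) in edges:
            preds[v].add(u)
        order = TopologicalSorter(preds).static_order()
        # Stage 2: initialize delta[v] with the traffic v itself sends to t.
        delta = {v: float(T.get((v, t), 0.0)) for v in nodes}
        # Stage 3: accumulate target dependencies following Equation 9.
        for v in order:
            for u in preds[v]:
                delta[v] += delta[u] * edges[(u, v)]
            rbc[v] += delta[v]
    return rbc
```

For a diamond-shaped DAG in which s splits its traffic evenly between a and b on the way to t, and T(s, t) = 10, the sketch assigns 10 to s and t and 5 to each of a and b.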
5.2 RBC of ordered sequences
Employing target dependency. Let Sρ = (s1 , . . . , sk ) be a sequence of nodes with sampling rates
ρs1 , . . . , ρsk respectively. Let δ̃•,t (S) be the expected number of packets targeted at t that pass through all
nodes in the sequence S:
$$\tilde\delta_{\bullet,t}(S) = \sum_{s \in V} \tilde\delta_{s,t}(S) \cdot T(s,t).$$
Accordingly, δ̃•,t (S) · ∏v∈S ρv is the expected number of packets targeted at t that are sampled by all nodes
in the sequence. Equations 10 and 11 describe RBC of the sequence Sρ in terms of δ̃•,t (Sρ ).
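$$\begin{aligned}
& \tilde\delta_{\bullet,t}((v)) = \delta_{\bullet,t}(v) \\
(v_{k-1} = v_k):\quad & \tilde\delta_{\bullet,t}((\ldots, v_{k-1}, v_k)) = \tilde\delta_{\bullet,t}((\ldots, v)) \\
(v_{k-1} \neq v_k):\quad & \tilde\delta_{\bullet,t}((\ldots, v_{k-1}, v_k)) = \sum_{u \in Pred_t(v_k)} \tilde\delta_{\bullet,t}((\ldots, v_{k-1}, u)) \cdot R(, u, v_k, t)
\end{aligned} \qquad (10)$$
$$\tilde\delta_{\bullet,\bullet}(S_\rho) = \prod_{v \in S} \rho_v \cdot \sum_{t \in V} \tilde\delta_{\bullet,t}(S). \qquad (11)$$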
Algorithm 5: s-oblivious RBC of sequences (with sampling)
Input: G(V, E), R, T, ρ, S = {s0 , . . . , sk }
Output: RBC
Data: delta[1..|V |]
RBC = 0
for t ∈ V do
    ◮ topological sort
    ▼ init delta
    for v ∈ V do
        if v ≺ s0 or v is s0 then
            delta[v] = T (v, t)
        else
            delta[v] = 0
    ◮ accumulate δ̃•,• (Sρ )
return RBC
Algorithm 5 computes RBC of a sequence of monitors in O(nm) time using Equations 10 and 11. During
the iteration over all target nodes, this algorithm sorts the nodes in the same way as Algorithm 4 and accumulates
betweenness in the same way as Algorithm 2. Entries of the delta[v] array represent the expected number
of packets sampled by all monitors in the sequence preceding v in the topological order. In particular, all
entries delta[v] preceding the first element in the sequence are initialized to T (v, t).
Using precomputed data. Next we closely examine the probability that a packet sent from s to t
will pass through u and then through v (δ̃s,t ((u, v))). Consider Figure 4 as an example. Assume a packet
targeted at t that has reached u. The probability that this packet will pass through v on its way to t does
not depend on the source of the packet or on the routing decisions made thus far. Therefore, we can multiply
the probability that the packet from s to t will reach u (δs,t (u)) by the probability that a packet from u to
t will reach v (δu,t (v)) to get the probability that a packet from s to t will pass through both u and v:
$$\tilde\delta_{s,t}((u, v)) = \delta_{s,t}(u) \cdot \delta_{u,t}(v).$$
We can add more nodes to the sequence (u, v) using the following lemma:
Lemma 1 (Dependency chaining) Let S = (s1 , . . . , sk ) be an ordered sequence of nodes. The probability
that a packet sent from a node s to a different node t will pass through all nodes in S in a given order is:
δ̃s,t ((s1 , . . . , sk )) = δs,t (s1 ) · δ̃s1 ,t ((s2 , . . . , sk )).
Proof: The following proof is based on the fact that the probability of a packet passing through (s2 , . . . , sk ),
assuming that the packet already visited s1 , does not depend on the source of the packet since we assume that
the routing scheme under investigation is source-oblivious.
First we will prove the lemma for δ̃s,t ((s1 , . . . , sk )) = 0. δ̃s,t ((s1 , . . . , sk )) is the probability that a packet
emanating from s and targeted to t will first pass through s1 , then through s2 , and so on, until sk . This is
a non-zero probability if, and only if, there is at least one route from s to t that passes through (s1 , . . . , sk )
in this order. Such a route exists if, and only if, there is a route from s to t traversing s1 and there is a
complement route from s1 to t that includes the nodes s2 , . . . , sk . Hence:
$$\tilde\delta_{s,t}((s_1, \ldots, s_k)) = 0 \;\Leftrightarrow\; \delta_{s,t}(s_1) \cdot \tilde\delta_{s_1,t}((s_2, \ldots, s_k)) = 0.$$
Before we continue the proof for the more general case δ̃s,t ((s1 , . . . , sk )) ≠ 0 we will show that for any
set of nodes there is at most one permutation L of these nodes for which δ̃s,t (L) > 0. Note that δ̃ is defined
as a probability and therefore cannot be negative.
Proposition 1 Let s, t ∈ V be two nodes in the network. Let M ⊆ V be a subset of nodes. Let L1 and
L2 be two permutations of M . Then for any loop-free routing strategy where routing decisions depend solely
on s and t (or only t in case of source-oblivious routing), the following two options are mutually exclusive
unless L1 = L2 :
1. δ̃s,t (L1 ) > 0
2. δ̃s,t (L2 ) > 0.
Proof: Let L1 = (v1 , . . . , vl ) and L2 = (u1 , . . . , ul ) be two different permutations of M such that δ̃s,t (L1 ) >
0 and δ̃s,t (L2 ) > 0. Let i be the lowest integer such that vi ≠ ui . Let j be the index of the node vi in L2
(vi = uj ). Let k be the index of the node ui in L1 (ui = vk ). Since all nodes appear only once in both
permutations and i is the lowest index for which nodes are different it holds that i < j and i < k. Without
loss of generality assume that j ≤ k. This means that for each one of the permutations there is at least one
route from s to t passing through all the nodes in the order defined by the permutation. In particular it holds
that there is at least one route from s to t passing through (vi , . . . , vk ) and similarly for (ui , . . . , uj ). Since
routing decisions depend solely on s and t there is a non-zero probability that a packet from s to t will reach
uj = vi through (ui , . . . , uj = vi ) and continue back to ui = vk through (vi , . . . , vk = ui ) in contradiction to
the assumption that the routing strategy is loop-free.
Note that the existence of a route from s to t through (s1 , . . . , sk ) implies that there is no route that
passes through these nodes in a different order, according to Proposition 1. Therefore, if
δ̃s,t ((s1 , . . . , sk )) > 0 then the order of nodes s, s1 , . . . , sk , t is well defined (with s being the first node).
Assume that δ̃s,t ((s1 , . . . , sk )) > 0. Let the event Zv represent all cases where the packet passes through
node v. Let the event Tv represent all cases where the packet is targeted toward v. "Targeting" here is
different from "passing through" since the target of a packet has an effect on routing decisions along the
traversed path.
$$\tilde\delta_{s,t}((s_1, \ldots, s_k)) = Pr\left[\, \bigcap_{i=1}^{k} Z_{s_i} \;\middle|\; Z_s \cap T_t \right] \qquad (12)$$
The next equation immediately follows from Equation 12 since ⋂i=1..k Zsi can be decomposed into two joint
events Zs1 and ⋂i=2..k Zsi .
$$\tilde\delta_{s,t}((s_1, \ldots, s_k)) = Pr\left[ Z_{s_1} \mid Z_s \cap T_t \right] \cdot Pr\left[\, \bigcap_{i=2}^{k} Z_{s_i} \;\middle|\; Z_s \cap Z_{s_1} \cap T_t \right] \qquad (13)$$
Since we are dealing with source-oblivious routing the nodes that the packet passed prior to passing through
s1 (in particular the source node s) have no effect on the remaining routing decisions. Therefore s has no
effect on the probability of a packet targeted at t passing through s2 , . . . , sk after visiting s1 .
$$\tilde\delta_{s,t}((s_1, \ldots, s_k)) = Pr\left[ Z_{s_1} \mid Z_s \cap T_t \right] \cdot Pr\left[\, \bigcap_{i=2}^{k} Z_{s_i} \;\middle|\; Z_{s_1} \cap T_t \right] \qquad (14)$$
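Finally, according to the definitions of δ and δ̃, the two factors in Equation 14 are δs,t (s1 ) and δ̃s1 ,t ((s2 , . . . , sk )) respectively, which completes the proof of Lemma 1.
$$\tilde\delta_{s,t}((s_1, \ldots, s_k)) = \delta_{s,t}(s_1) \cdot \tilde\delta_{s_1,t}((s_2, \ldots, s_k)) \qquad (15)$$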
Using Lemma 1, pairwise dependency on a sequence can be represented as a product of pairwise dependencies on single nodes:
$$\tilde\delta_{s,t}((s_1, \ldots, s_k)) = \delta_{s,t}(s_1) \cdot \delta_{s_1,t}(s_2) \cdot \ldots \cdot \delta_{s_{k-1},t}(s_k) \qquad (16)$$
[Figure 4: routing diagram over the nodes s, u, v, and t.]
Figure 4: In this figure assume that packets are sent from s to t and are forwarded by u and v from the left to the right.
The probability that an arbitrary packet sent from s to t will pass through u and v is smaller than the probability that it will
pass through u. δs,t (u) = 1/3, δs,t (v) = 1/2, δs,v (u) = 1/2, and δu,t (v) = 1/2. δ̃s,t ((u, v)) = 1/3 · 1/2 = 1/6 since we have two decision
points: first on s and then on u. Note that 1/6 = δ̃s,t ((u, v)) ≠ δs,v (u) · δs,t (v) = 1/4. This is because δs,v (u) does not consider
the ultimate target (t) and ignores one possible path from s to t.
Multiplying Equation 16 by T (s, t) and summing it over all sources s ∈ V results in the following target dependency chain:
$$\tilde\delta_{\bullet,t}((s_1, \ldots, s_k)) = \delta_{\bullet,t}(s_1) \cdot \prod_{i=2}^{k} \delta_{s_{i-1},t}(s_i) \qquad (17)$$
Equations 16 and 17 can be used to compute δ̃s,t (S) and δ̃•,t (S) respectively in O(|S|) steps given the
values of δs,t (si ) and δ•,t (s1 ). Consequently, δ̃•,• (S) can be computed in O(n · |S|) steps using the summation
over all target nodes:
$$\tilde\delta_{\bullet,\bullet}(S) = \sum_{t \in V} \tilde\delta_{\bullet,t}(S).$$
The pseudo-code for the computation can be found in Algorithm 6. The pseudo-code is straightforward
and contains two nested loops, where the first one iterates over all target nodes in the network. The second
loop iterates over the sequence members, multiplying the pairwise dependencies.
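The two nested loops described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function name and the dictionary encodings `pair_dep[(s, t, v)]` for δs,t (v) and `target_dep[(t, v)]` for δ•,t (v) are hypothetical, and missing entries are read as zero.

```python
def rbc_sequence_precomputed(nodes, seq, rho, pair_dep, target_dep):
    """Illustrative sketch of Algorithm 6 (Equation 17 with sampling rates).

    nodes: iterable of all nodes (candidate targets).
    seq: ordered list of monitors (s_0, ..., s_k).
    rho: dict node -> sampling rate of the monitor on that node.
    pair_dep: dict (s, t, v) -> precomputed pairwise dependency.
    target_dep: dict (t, v) -> precomputed target dependency.
    """
    rbc = 0.0
    for t in nodes:
        # Start the chain with the target dependency on the first monitor.
        delta = target_dep.get((t, seq[0]), 0.0) * rho[seq[0]]
        # Extend the chain one monitor at a time (Lemma 1).
        for prev, cur in zip(seq, seq[1:]):
            delta *= pair_dep.get((prev, t, cur), 0.0) * rho[cur]
        rbc += delta
    return rbc
```

The running time is O(n · |S|) because each target contributes one product over the sequence, in line with the analysis above.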
5.3 RBC of sets
Employing target dependency. Let λ^{Mρ}_{•,t}(v) be the expected number of packets targeted at t that reach
v without being captured by any of the nodes in Mρ :
$$\lambda^{M_\rho}_{\bullet,t}(v) = \sum_{s \in V} \lambda^{M_\rho}_{s,t}(v) \cdot T(s,t).$$
Algorithm 6: s-oblivious RBC of sequences (with sampling, after preprocessing)
Input: G(V, E), R, T, ρ, S = {s0 , . . . , sk }, δs,t (v), δ•,t (v)
Output: RBC
Data: delta
RBC = 0
for t ∈ V do
    delta = δ•,t (s0 ) · ρs0
    for i = 0 to k − 1 do
        delta *= δsi ,t (si+1 ) · ρsi+1
    RBC += delta
return RBC
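The following equations describe RBC of the fuzzy set Mρ in terms of λ^{Mρ}_{•,t}(v):
$$\lambda^{M_\rho}_{\bullet,t}(v) = (1 - \rho_v) \cdot T(v,t) + \sum_{u \in Pred_t(v)} \lambda^{M_\rho}_{\bullet,t}(u) \cdot R(, u, v, t) \cdot (1 - \rho_v) \qquad (18)$$
$$\ddot\delta_{\bullet,\bullet}(M_\rho) = \sum_{t \in V} \left( \sum_{s \in V} T(s,t) - \lambda^{M_\rho}_{\bullet,t}(t) \right). \qquad (19)$$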
Algorithm 7 computes RBC of a set of monitors installed on nodes in a communication network with a source-oblivious routing strategy, given the sampling rates of the monitors. This algorithm iterates over all nodes in the network and, in each iteration, sorts the nodes in the
same way as Algorithm 4. It initializes each entry of the delta[v] array to (1 − ρv ) · T (v, t) and accumulates
betweenness similarly to Algorithm 3. Thus, the time complexity of Algorithm 7 is O(nm).
Algorithm 7: s-oblivious RBC of sets (with sampling)
Input: G(V, E), R, T, ρ, M
Output: RBC
Data: delta[1..|V |]
RBC = 0
for t ∈ V do
    ◮ topological sort
    ▼ init delta
    ∀v∈V , delta[v] = (1 − ρv ) · T (v, t)
    ◮ accumulate δ̈•,• (Mρ )
return RBC
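Equations 18 and 19 translate almost directly into code. The following Python sketch is one possible reading of Algorithm 7, not the authors' implementation. It assumes a hypothetical dictionary encoding `R[t][(u, v)]` of the source-oblivious routing function, a dictionary `T[(s, t)]` for the traffic matrix, and a dictionary `rho` in which nodes without a monitor are simply absent (sampling rate 0).

```python
from graphlib import TopologicalSorter  # Python 3.9+


def rbc_set_source_oblivious(nodes, R, T, rho):
    """Illustrative sketch of Algorithm 7 (Equations 18 and 19)."""
    nodes = list(nodes)
    rbc = 0.0
    for t in nodes:
        edges = {e: p for e, p in R.get(t, {}).items() if p > 0}
        preds = {v: set() for v in nodes}
        for (u, v) in edges:
            preds[v].add(u)
        # lam[v]: expected number of packets targeted at t that reach v
        # and are still unsampled after leaving v (Equation 18).
        lam = {v: (1.0 - rho.get(v, 0.0)) * T.get((v, t), 0.0) for v in nodes}
        for v in TopologicalSorter(preds).static_order():
            for u in preds[v]:
                lam[v] += lam[u] * edges[(u, v)] * (1.0 - rho.get(v, 0.0))
        # Sampled packets = all packets sent to t minus those arriving unsampled.
        total = sum(T.get((s, t), 0.0) for s in nodes)
        rbc += total - lam[t]
    return rbc
```

On the diamond-shaped example (s splits evenly between a and b, T(s, t) = 10), monitors on s and a with sampling rate 0.5 each sample 6.25 packets in expectation, since a packet escapes both with probability 0.5 · (0.5 · 0.5 + 0.5) = 0.375.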
Contribution to RBC of a set. In this subsection we assume that a set of monitors X is installed on nodes
in a network and that their sampling rates are specified by ρ. We investigate the expected number of unsampled
packets that can be sampled by additional monitors. We will refer to this measure as the contribution of
individual nodes, sets of nodes, or sequences of nodes to RBC of Xρ . In Section 4.3 we have defined λ^{Xρ}_{s,t}(v)
as the probability that a packet from s to t will pass through v without being sampled by monitors in Xρ .
This probability gives no information regarding the probability that this packet will be sampled after passing
through v.
Let χ^{Xρ}_{s,t}(w) be the probability that a packet from s to t will pass through w and will not be sampled
by any of the monitors in Xρ (neither before nor after visiting w). Assume that w monitors the traffic with
sampling rate ρw > 0. Then χ^{Xρ}_{s,t}(w) · ρw · T (s, t) is the expected number of packets from s to t that are
sampled only by w and not by other monitors. In other words, χ^{Xρ}_{s,t}(w) · ρw is the contribution of w to the
capability of the monitors to sample traffic between s and t. χ^{Xρ}_{s,t}(u) can be computed for any u ∈ V by
starting with X = ∅ and adding nodes to X one at a time using the following lemma:
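Lemma 2 (Pairwise dependency contribution) Let X = {v1 , . . . , vk } be a set of nodes with sampling
rates specified by ρv1 , . . . , ρvk respectively. Let w be a node with sampling rate ρw . For any u ∈ V it holds
that:
$$\begin{aligned}
(u = w):\quad & \chi^{X_\rho \cup \{w\}_\rho}_{s,t}(u) = \chi^{X_\rho}_{s,t}(u) \cdot (1 - \rho_w) \\
(u \neq w):\quad & \chi^{X_\rho \cup \{w\}_\rho}_{s,t}(u) = \chi^{X_\rho}_{s,t}(u) - \chi^{X_\rho}_{s,t}(u) \cdot \chi^{X_\rho}_{u,t}(w) \cdot \rho_w - \chi^{X_\rho}_{s,t}(w) \cdot \chi^{X_\rho}_{w,t}(u) \cdot \rho_w
\end{aligned} \qquad (20)$$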
Proof: This lemma describes the computation of the probability that a packet from s to t will pass through
u without being sampled either by monitors in X or by w. The guiding principle of the computation is: to
discard packets that were sampled by w we need to subtract from χ^{Xρ}_{s,t}(u) the probability of the packet being
sampled by w either before or after passing through u.
The probability of a packet passing through w without being sampled by w or any other node in X
(χ^{X∪{w}}_{s,t}(w)) equals its probability of passing through w without being sampled by any node in X and not
being sampled by w. Being sampled by w and being sampled by any node in X are independent events
(assuming that w ∉ X). Hence the first case of the lemma.
Assume w ≠ u. Let the event Yv represent the cases where the packet from s to t passes through the node
v. Let the event Zv represent the cases where this packet was sampled by the node v. Let Z̄v represent the
cases where this packet was not sampled by the node v. χ^{Xρ}_{s,t}(u) is the probability that a packet from s to t
was not sampled by any node in X, but passes through u:
$$\chi^{X_\rho}_{s,t}(u) = Pr\Big[ \bigcap_{v \in X} \bar{Z}_v \cap Y_u \Big]. \qquad (21)$$
Pr[⋂v∈X Z̄v ∩ Yu ] can be decomposed into two cases: packets that were sampled by w and packets that were
not:
$$\chi^{X_\rho}_{s,t}(u) = Pr\Big[ \bigcap_{v \in X} \bar{Z}_v \cap Z_w \cap Y_u \Big] + Pr\Big[ \bigcap_{v \in X} \bar{Z}_v \cap \bar{Z}_w \cap Y_u \Big]. \qquad (22)$$
Assume a packet from s to t that was not sampled by any node in X. The above equation yields that
the probability of this packet passing through u without being sampled by w (case Z̄w ∩ Yu ) is equal to the
probability of the packet passing through u minus the probability of the packet passing through u while being
sampled by w (case Zw ∩ Yu ).
According to Proposition 1 the packet can pass through u and w by either passing first through u and then
through w, or vice versa. Note that a packet can be sampled by w only if it passes through w. Therefore, the
case Zw ∩ Yu can be represented as the sum of two sequence dependencies multiplied by the sampling rate of
w (δ̃s,t ((w, u)) · ρw + δ̃s,t ((u, w)) · ρw ). Moreover, the proof of Proposition 1 can easily be translated from δ
to χ by excluding packets that are sampled by some node in X.
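By replacing the probabilities in Equation 22 with the respective pairwise dependencies we get:
$$\chi^{X_\rho}_{s,t}(u) = \left( \tilde\chi^{X_\rho}_{s,t}((u, w)) \cdot \rho_w + \tilde\chi^{X_\rho}_{s,t}((w, u)) \cdot \rho_w \right) + \chi^{X_\rho \cup \{w\}_\rho}_{s,t}(u). \qquad (23)$$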
According to the Dependency Chaining Lemma, which can also be adjusted to χ by considering only packets
that were not sampled by X, χ̃^{Xρ}_{s,t}((u, w)) can be decomposed into a product of two pairwise dependencies,
completing the proof.
$$\chi^{X}_{s,t}(u) - \chi^{X}_{s,t}(u) \cdot \chi^{X}_{u,t}(w) \cdot \rho_w - \chi^{X}_{s,t}(w) \cdot \chi^{X}_{w,t}(u) \cdot \rho_w = \chi^{X \cup \{w\}}_{s,t}(u). \qquad (24)$$
Let χ^{Xρ}_{•,t}(u) = Σs∈V χ^{Xρ}_{s,t}(u) · T (s, t) be the expected number of packets targeted at t that will pass through
u and will not be sampled by any of the monitors in Xρ . Lemma 2 can be used to compute χ^{Xρ∪{w}ρ}_{•,t}(u) by
multiplying Equation 20 by T (s, t) and summing it over all sources.
$$\begin{aligned}
(u = w):\quad & \chi^{X_\rho \cup \{w\}_\rho}_{\bullet,t}(u) = \chi^{X_\rho}_{\bullet,t}(u) \cdot (1 - \rho_w) \\
(u \neq w):\quad & \chi^{X_\rho \cup \{w\}_\rho}_{\bullet,t}(u) = \chi^{X_\rho}_{\bullet,t}(u) - \chi^{X_\rho}_{\bullet,t}(u) \cdot \chi^{X_\rho}_{u,t}(w) \cdot \rho_w - \chi^{X_\rho}_{\bullet,t}(w) \cdot \chi^{X_\rho}_{w,t}(u) \cdot \rho_w
\end{aligned} \qquad (25)$$
Let χ^{Xρ}_{•,•}(w) = Σt∈V χ^{Xρ}_{•,t}(w) be the expected number of packets between all source-target pairs that pass
through w without being sampled by any node in X. χ^{Xρ}_{•,•}(w) · ρw can be considered as the contribution of
w to RBC of Xρ :
$$\ddot\delta_{\bullet,\bullet}(X_\rho \cup \{w\}_\rho) = \ddot\delta_{\bullet,\bullet}(X_\rho) + \chi^{X_\rho}_{\bullet,\bullet}(w) \cdot \rho_w.$$
Using precomputed data. We assume that all δs,t (v), δ•,t (v), and δ•,• (v) values are computed using
Algorithm 1 and stored in a data structure with O(1) store and retrieval, such as matrices or a hash table.
The speed-up methods presented here are valid for source-oblivious routing strategies. In
particular, we assume that all routing decisions specified by the probabilities R(, u, v, t) are independent.
In order to make the discussion more intuitive we will use an example from set theory. Let A, B, and C
be three sets. We need to compute the size of their union, namely, the number of elements belonging to at
least one of the sets. Assume that we can easily compute the size of an intersection but not the size of a union.
In order to overcome this difficulty we can use the Inclusion-Exclusion rule as follows:
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.
Regrouping the addends results in:
|A ∪ B ∪ C| = |A| + |B ∩ Ā| + |C ∩ Ā ∩ B̄|,
where |B ∩ Ā| = |B| − |A ∩ B| accounts for all elements that belong to B and do not belong to A.
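The regrouping can be checked on a small concrete instance. The sets below are arbitrary illustrative values, not taken from the paper; the snippet only confirms that both expressions count the union.

```python
# Arbitrary example sets (illustrative values only).
A = {1, 2, 3}
B = {3, 4}
C = {1, 4, 5, 6}

# Inclusion-Exclusion rule for three sets.
inclusion_exclusion = (len(A) + len(B) + len(C)
                       - len(A & B) - len(A & C) - len(B & C)
                       + len(A & B & C))

# Regrouped form: each set contributes only the elements
# not already covered by the sets listed before it.
regrouped = len(A) + len(B - A) + len(C - A - B)

union_size = len(A | B | C)
```

In the regrouped form every set contributes only what the earlier sets have not covered, which is the same pattern as summing the marginal contributions of monitors one at a time.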
In our case the size of a union can be associated with RBC of sets, where we account for packets sampled
by at least one monitor in the set. The size of an intersection can be associated with RBC of sequences, where
we account for packets sampled by all monitors in the sequence. Finally, the semantics of |B ∩ Ā| are similar
to the semantics of χ^{{u}}_{s,t}(v), the probability that a packet from s to t passes through v without passing through
u. Next we will apply the technique demonstrated in the above example to computing the expected number
of packets sampled by a set of monitors. Pairwise dependency on a set of monitors can be computed by
summing the contributions of the set members as described by the following lemma:
Lemma 3 (Summing dependency contributions) Let M = {v0 , . . . , vk } be a set of nodes with sampling
rates specified by ρv0 , . . . , ρvk respectively. Let M (i) = {v0 , . . . , vi } be a subset of M . It holds that:
$$\ddot\delta_{s,t}(M_\rho) = \sum_{i=0}^{k} \chi^{M(i-1)_\rho}_{s,t}(v_i) \cdot \rho_{v_i}.$$
Proof: Lemma 3 describes an iterative computation of δ̈s,t (Mρ ). In each iteration we accumulate the contribution of vi , namely the uncovered traffic flows sampled by vi , to the pairwise dependency.
Let the event Zv represent all cases where the packet from s to t is sampled by node v. Let the event Z̄v
represent all cases where the packet from s to t is not sampled by the node v. Let the event Yv represent all
cases where the packet from s to t passes through node v. By definition, δ̈s,t ({v1 , . . . , vk }) is the probability
that the packet from s to t will be sampled by at least one of the k nodes v1 , . . . , vk :
$$\ddot\delta_{s,t}(\{v_1, \ldots, v_k\}_\rho) = Pr\left[\, \bigcup_{i=1}^{k} Z_{v_i} \right].$$
We can substitute the right-hand term in the above equation by a summation as follows:
$$\ddot\delta_{s,t}(\{v_1, \ldots, v_k\}_\rho) = \sum_{i=1}^{k} Pr\left[ Z_{v_i} \cap \bigcap_{j=1}^{i-1} \bar{Z}_{v_j} \right].$$
For each i the term inside the summation above is the probability that the packet from s to t will be sampled
by vi without being sampled by v1 , . . . , vi−1 . According to the definition of χ,
$$Pr\left[ Y_{v_i} \cap \bigcap_{j=1}^{i-1} \bar{Z}_{v_j} \right] = \chi^{\{v_1, \ldots, v_{i-1}\}}_{s,t}(v_i).$$
The sampling rate of the node vi is ρvi and therefore we can substitute the term inside the summation by
χ^{{v1 ,...,vi−1 }}_{s,t}(vi ) · ρvi , completing the proof:
$$\ddot\delta_{s,t}(\{v_1, \ldots, v_k\}) = \sum_{i=1}^{k} \chi^{\{v_1, \ldots, v_{i-1}\}}_{s,t}(v_i) \cdot \rho_{v_i}.$$
Algorithm 8: s-oblivious RBC of sets (with sampling, after preprocessing)
Input: G(V, E), R, T , ρ, M = {v0 , . . . , vk }, δs,t (v), δ•,t (v)
Output: RBC
Data: pdep[k × k × n], tdep[k × n], npdep[k × k × n], ntdep[k × n]
RBC = 0
for s, v ∈ M , t ∈ V do pdep[s, v, t] = δs,t (v)
for v ∈ M , t ∈ V do tdep[v, t] = δ•,t (v)
▼ account for M
for w ∈ M do
    for t ∈ V do
        RBC += tdep[w, t] · ρw
        for u ∈ M do
            if u = w then
                ntdep[u, t] = tdep[u, t] · (1 − ρw )
            else
                ntdep[u, t] = tdep[u, t] − tdep[u, t] · pdep[u, w, t] · ρw − tdep[w, t] · pdep[w, u, t] · ρw
            for s ∈ M do
                if u = w then
                    npdep[s, u, t] = pdep[s, u, t] · (1 − ρw )
                else
                    npdep[s, u, t] = pdep[s, u, t] − pdep[s, u, t] · pdep[u, w, t] · ρw − pdep[s, w, t] · pdep[w, u, t] · ρw
    tdep = ntdep; pdep = npdep
return RBC
Lemma 3 provides us with a tool for iterative computation of pairwise dependency on sets of nodes.
Summing δ̈s,t (Mρ ) over all sources while multiplying each addend by T (s, t) results in Equation 26, which
describes iterative computation of target dependency on a set of nodes.
$$\ddot\delta_{\bullet,t}(M_\rho) = \sum_{i=0}^{k} \chi^{M(i-1)_\rho}_{\bullet,t}(v_i) \cdot \rho_{v_i} \qquad (26)$$
Summing Equation 26 over all targets results in iterative computation of RBC of a set of nodes:
$$\ddot\delta_{\bullet,\bullet}(M_\rho) = \sum_{t \in V} \sum_{i=0}^{k} \chi^{M(i-1)_\rho}_{\bullet,t}(v_i) \cdot \rho_{v_i} \qquad (27)$$
In the last algorithm for sets of nodes (Algorithm 8), we compute RBC of a given set by iterating over
all nodes in the set and summing their marginal contributions as described in Equation 27. The marginal
contributions are computed using Lemma 2. During the algorithm we maintain two matrices. One is the
three-dimensional matrix of pairwise dependencies and the other is the two-dimensional matrix of target
dependencies. The last dimension in these matrices is of size n while the other dimensions are of size k. We
use Equation 20 to update pairwise dependencies and Equation 25 to update target dependencies.
Algorithm 8 is composed of an initialization phase, where precomputed values are copied into temporary
matrices, and four nested loops that compute RBC of the input set of nodes. The temporary arrays pdep and
tdep maintain pairwise and target dependencies respectively. Initially the values in these arrays correspond
to χ^{{}}_{s,t}(v) = δs,t (v) and χ^{{}}_{•,t}(v) = δ•,t (v). In each iteration of the outer loop we process one node from the
input set M . The marginal contributions of nodes to RBC of M are aggregated according to Equation 27
during the first inner loop (that iterates over all t ∈ V ). The second and third inner loops iterate over nodes
in M and update the entries of the tdep and pdep matrices respectively according to Equations 25 and 20.
After the first update of all the values in these matrices they correspond to χ^{M(1)ρ}_{s,t}(v) and χ^{M(1)ρ}_{•,t}(v), where
M (1) contains only the first node in M . During the second iteration we process one more node from M
and update the matrices again, and so on until we process all nodes in M . The overall time complexity of
Algorithm 8 is O(k^3 n), where k is the size of M and n is the number of nodes in the network.
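The four nested loops of Algorithm 8 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name and the dictionary encodings `pair_dep[(s, t, v)]` for δs,t (v) and `target_dep[(t, v)]` for δ•,t (v) are hypothetical, missing entries are read as zero, and the sketch assumes δs,t (s) = 1 is supplied in the precomputed data.

```python
def rbc_set_precomputed(nodes, M, rho, pair_dep, target_dep):
    """Illustrative sketch of Algorithm 8 (Equations 20, 25, and 27).

    nodes: iterable of all nodes (candidate targets).
    M: list of monitored nodes; rho: dict node -> sampling rate.
    """
    nodes = list(nodes)
    # Initialization: copy the precomputed values restricted to M.
    pdep = {(s, v, t): pair_dep.get((s, t, v), 0.0)
            for s in M for v in M for t in nodes}
    tdep = {(v, t): target_dep.get((t, v), 0.0) for v in M for t in nodes}
    rbc = 0.0
    for w in M:                      # process one monitor at a time
        ntdep, npdep = {}, {}
        for t in nodes:
            # Marginal contribution of w (Equation 27).
            rbc += tdep[w, t] * rho[w]
            for u in M:
                # Update target dependencies (Equation 25).
                if u == w:
                    ntdep[u, t] = tdep[u, t] * (1 - rho[w])
                else:
                    ntdep[u, t] = (tdep[u, t]
                                   - tdep[u, t] * pdep[u, w, t] * rho[w]
                                   - tdep[w, t] * pdep[w, u, t] * rho[w])
                for s in M:
                    # Update pairwise dependencies (Equation 20).
                    if u == w:
                        npdep[s, u, t] = pdep[s, u, t] * (1 - rho[w])
                    else:
                        npdep[s, u, t] = (pdep[s, u, t]
                                          - pdep[s, u, t] * pdep[u, w, t] * rho[w]
                                          - pdep[s, w, t] * pdep[w, u, t] * rho[w])
        tdep, pdep = ntdep, npdep
    return rbc
```

On a path 0 → 1 → 2 → 3 carrying 10 packets from node 0 to node 3, monitors on nodes 1 and 2 with sampling rate 0.5 each yield 7.5 sampled packets in expectation, regardless of the order in which the two monitors are processed.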
5.4 RBC of sets of edges and mixed sets
In many applications the monitoring of the traffic is done by tapping the communication links and not the
nodes. The problem of monitoring links can easily be reduced to a problem of monitoring nodes by adding
a phantom node in the middle of the monitored link and updating the routing scheme R and the traffic matrix T
appropriately.
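The phantom-node reduction can be sketched in Python as follows. The sketch is an assumption-laden illustration, not part of the original report: the function name and the dictionary encoding `R[t][(u, v)]` of the source-oblivious routing function are hypothetical. It also assumes that the traffic matrix is stored sparsely, so that a phantom node, which neither sends nor receives traffic, needs no new entries in T.

```python
def add_phantom_node(R, link, phantom):
    """Return a copy of R in which `link` = (u, v) is split by `phantom`.

    R: dict target t -> dict edge (u, v) -> forwarding probability towards t.
    The original R is left unchanged.
    """
    u, v = link
    new_R = {}
    for t, edges in R.items():
        new_edges = dict(edges)
        if link in new_edges:
            p = new_edges.pop(link)
            new_edges[(u, phantom)] = p    # u now forwards to the phantom node
            new_edges[(phantom, v)] = 1.0  # the phantom node always forwards to v
        new_R[t] = new_edges
    return new_R
```

Under this encoding, the RBC of the phantom node in the modified network plays the role of the RBC of the tapped link.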
Algorithms that do not use preprocessing can be configured to skip the phantom nodes in their main
loop. This is an intuitive optimization since phantom nodes do not introduce traffic to the network. In this case, adding
phantom nodes does not increase the complexity of these algorithms but makes each iteration of the main
loop twice as long as before. In algorithms that use preprocessing, after adding phantom nodes, the size of
the precomputed matrices will depend on the number of edges and not only the number of nodes. Thus,
adding phantom nodes increases both the time and the space complexity of the preprocessing stage in Algorithms 6
and 8 from O(n^2 m) time and O(n^3) space to O(|E|^2 m) time and O(|E|^3) space.
In order to avoid the addition of phantom nodes we should remember that the RBC of a (directed) link
is the RBC of the sequence consisting of its two end nodes. Therefore, the RBC of a sequence of directed
links is the RBC of the sequence of nodes comprising the links and can be computed using Algorithms 2, 5, and
6. The RBC of a set of links can be computed iteratively, by taking into account one link at a time, similarly
to the RBC of a set of nodes.
Let (u, v) be a link tapped by a monitor with sampling rate ρ(u,v) . The expected number of packets that
will pass through w and will not be sampled by that monitor is equal to δ•,• (w) minus RBC of the sequences
(w, u, v) and (u, v, w). Note that if the sequence (u, v) represents a link and u ≠ w ≠ v, then the RBC
of the sequence (u, w, v) is zero. We can modify the recursive Equation 20 to compute the probability
that a packet from s to t will pass through w and will not be sampled by any link monitor in a given
set. Let Q(i) = {(u1 , v1 ), . . . , (ui , vi )} be a set of links with sampling rates specified by ρ(u1 ,v1 ) , . . . , ρ(ui ,vi )
respectively.
Algorithm 9: s-oblivious RBC of sets of links (with sampling, after preprocessing)
Input: G(V, E), R, T, ρ, Q = {(u1 , v1 ), . . . , (uk , vk )}, δs,t (v), δ•,t (v)
Output: RBC
Data: pdep, npdep, tdep, ntdep
RBC = 0
X = set of nodes comprising Q
for s, v ∈ X, t ∈ V do pdep[s, v, t] = δs,t (v)
for v ∈ X, t ∈ V do tdep[v, t] = δ•,t (v)
▼ account for Q
for (u, v) ∈ Q do
    for t ∈ V do
        RBC += tdep[u, t] · pdep[u, v, t]
        for w ∈ X do
            ntdep[w, t] = tdep[w, t] − tdep[w, t] · pdep[w, u, t] · pdep[u, v, t] − tdep[u, t] · pdep[u, v, t] · pdep[v, w, t]
            for s ∈ X do
                npdep[s, w, t] = pdep[s, w, t] − pdep[s, w, t] · pdep[w, u, t] · pdep[u, v, t] − pdep[s, u, t] · pdep[u, v, t] · pdep[v, w, t]
    tdep = ntdep; pdep = npdep
return RBC
$$\chi^{Q(i+1)_\rho}_{s,t}(w) = \chi^{Q(i)_\rho}_{s,t}(w) - \tilde\chi^{Q(i)_\rho}_{s,t}((w, u, v)) - \tilde\chi^{Q(i)_\rho}_{s,t}((u, v, w)) \qquad (28)$$
where (u, v) = (u_{i+1}, v_{i+1}),
$$\tilde\chi^{Q(i)_\rho}_{s,t}((w, u, v)) = \chi^{Q(i)_\rho}_{s,t}(w) \cdot \chi^{Q(i)_\rho}_{w,t}(u_{i+1}) \cdot \chi^{Q(i)_\rho}_{u_{i+1},t}(v_{i+1})$$
and
$$\tilde\chi^{Q(i)_\rho}_{s,t}((u, v, w)) = \chi^{Q(i)_\rho}_{s,t}(u_{i+1}) \cdot \chi^{Q(i)_\rho}_{u_{i+1},t}(v_{i+1}) \cdot \chi^{Q(i)_\rho}_{v_{i+1},t}(w).$$
Target dependency with respect to a set of link monitors (χ^{Q(i+1)ρ}_{•,t}(w)) can be computed using
the equation above if we substitute pairwise dependencies with target dependencies.
Algorithm 10: s-oblivious RBC of sets of links and nodes (with sampling, after preprocessing)
Input: G(V, E), R, T, ρ, M, Q, δs,t (v), δ•,t (v)
Output: RBC
Data: pdep, npdep, tdep, ntdep
RBC = 0
X = set of nodes comprising Q and M
for s, v ∈ X, t ∈ V do pdep[s, v, t] = δs,t (v)
for v ∈ X, t ∈ V do tdep[v, t] = δ•,t (v)
◮ account for M
◮ account for Q
return RBC
Algorithm 9 computes the RBC of a set of links similarly to the way that Algorithm 8 computes the
RBC of a set of nodes, but instead of implementing Equations 20 and 25 it implements Equation
28 and its target dependency variant. Another difference between Algorithms 9 and 8 is the size of the
pdep and tdep matrices. The size of the first dimensions of these matrices is now equal to the number of
nodes attached to the input links.
Algorithm 9 has the same four nested loops as Algorithm 8. In addition, the number of nodes comprising
the links in a given set of size k is at least k + 1 and at most 2k. Therefore the time complexity and the
space complexity of Algorithm 9 are O(k^3 n) and O(k^2 n) respectively, the same as those of Algorithm 8.
Both algorithms can be combined in order to compute the RBC of a set of monitors that includes
monitors installed on nodes and monitors installed on links, as shown in Algorithm 10. Let M be the set of
monitored nodes and Q be the set of monitored links. Let X = M ∪ {u, v : (u, v) ∈ Q} be the set of nodes
comprising M and Q. First, all the data relevant to X is copied into the tdep and pdep matrices. Afterwards,
the cores of both algorithms are executed consecutively to compute RBC of the mixed set M ∪ Q.
6 Conclusions
In this paper we have defined a new Betweenness-Centrality measure called Routing Betweenness-Centrality
(RBC), which is a generalization of well-known betweenness centrality measures such as Shortest-Path Betweenness Centrality, Traffic Load Centrality, and Flow Betweenness Centrality. RBC measures the extent
to which nodes or groups of nodes are exposed to the traffic given any loop-free routing strategy.
The algorithms presented in this paper are easily modified to compute RBC of groups consisting of
links and/or nodes (see Section 5.4). In fact, more sophisticated combinations of policies for traffic
monitoring/controlling are supported. Using the methods presented in this paper we can compute the expected
number of packets each one of which satisfies a predicate in disjunctive normal form with at most one negation
clause. For example, packets each one of which is sampled by q, u, and v, or by w and x but is neither
sampled by y nor by w.
The computational complexity of our algorithms depends on whether the routing scheme is source
dependent or source oblivious. Generally speaking, when the routing decisions in the network depend on
both the source and the target of a packet, the time complexity of RBC computation is an order of n higher
than in the source-oblivious cases. For source-oblivious routing schemes, our RBC algorithms can be used to
compute the Shortest Path Betweenness-Centrality and Traffic Load Centrality with complexity matching
the state of the art, while also being capable of computing a larger variety of
Betweenness-Centrality measures.
We show that preprocessing can dramatically reduce the time required for a single computation of RBC
of a sequence. Preprocessing can also reduce the time required to compute the RBC of sets, when the size
of the investigated set is smaller than the cube root of m (the number of edges in the routing tree or the
routing DAG of the given routing scheme).
Since RBC is more general than existing definitions of Betweenness-Centrality and is capable of
better reflecting routing schemes in communication networks, we believe that many applications of RBC will
be found in the near future. We have already found RBC useful for predicting the effectiveness
and the cost of passive network monitoring. RBC can be used in conjunction with various combinatorial
optimization techniques and approximation algorithms, such as those described in [13, 29, 32], for optimizing
the placement of passive monitors within the communication network. Other obvious applications include simulation-free prediction of congestion in communication networks, and the design and examination of routing strategies
and network layouts for, say, balancing the traffic load in the network and assuring service level agreements.
Acknowledgments
The authors would like to thank Omer Zohar for implementing and testing all RBC algorithms and department members who contributed valuable remarks on this paper.
References
[1] Optimized multipath. http://www.faster-light.net/omp/.
[2] J. M. Anthonisse. The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch
Centrum, Amsterdam, 1971.
[3] A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[4] A.-L. Barabasi, R. Albert, and H. Jeong. Scale-free characteristics of random networks: the topology
of the world-wide web. Physica A, 281:69–77, 2000.
[5] M. Barthélemy. Betweenness centrality in large complex networks. The European Physical Journal B –
Condensed Matter, 38(2):163–168, March 2004.
[6] B. Bollobas and O. Riordan. Robustness and vulnerability of scale-free random graphs. Internet
Mathematics, 1(1):1–35, 2003.
[7] S. P. Borgatti. Centrality and network flow. Social Networks, 27:55–71, 2005.
[8] S. P. Borgatti and M. G. Everett. A graph-theoretic perspective on centrality. Social Networks,
28(4):466–484, Oct. 2006.
[9] P. Bork, L. J. Jensen, C. von Mering, A. K. Ramani, I. Lee, and E. M. Marcotte. Protein interaction
networks from yeast to human. Curr. Opin. Struct. Biol., 14(3):292–299, Jun. 2004.
[10] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163–177, 2001.
[11] U. Brandes. On variants of shortest-path betweenness centrality and their generic computation. Social
Networks, 30(2):136–145, 2008.
[12] G. R. Cantieni, G. Iannaccone, C. Barakat, C. Diot, and P. Thiran. Reformulating the monitor placement
problem: optimal network-wide sampling. In CoNEXT ’06: Proceedings of the 2006 ACM CoNEXT
conference, pages 1–12, New York, NY, USA, 2006. ACM.
[13] S. Dolev, Y. Elovici, R. Puzis, and P. Zilberman. Incremental deployment of network monitors based
on group betweenness centrality. to appear in Information Processing Letters.
[14] M. G. Everett and S. P. Borgatti. The centrality of groups and classes. Journal of Mathematical Sociology,
23(3):181–201, 1999.
[15] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology.
SIGCOMM Comput. Comm. Rev., 29(4):251–262, 1999.
[16] L. C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40(1):35–41, 1977.
[17] L. C. Freeman. Centrality in social networks conceptual clarification. Social Networks, 1:215–239, 1979.
[18] L. C. Freeman, S. P. Borgatti, and D. R. White. Centrality in valued graphs: A measure of betweenness
based on network flow. Social Networks, 13(2):141–154, Jun. 1991.
[19] K.-I. Goh, B. Kahng, and D. Kim. Universal behavior of load distribution in scale-free networks. Phys.
Rev. Lett., 87(27):278701, Dec. 2001.
[20] F. Harary, R.Z. Norman, and D. Cartwright. Structural models. An introduction to the theory of directed
graphs. John Wiley and Sons, New York, 1965.
[21] P. Holme. Congestion and centrality in traffic flow on complex networks. Advances in Complex Systems,
6(2):163–176, 2003.
[22] A.W. Jackson, W. Milliken, C.A. Santivanez, M. Condell, and W.T. Strayer. A topological analysis of
monitor placement. Network Computing and Applications, 2007. NCA 2007. Sixth IEEE International
Symposium on, pages 169–178, July 2007.
[23] J. Moy. RFC 2328 - OSPF version 2. http://www.ietf.org/rfc/rfc2328.txt, Apr. 1998.
[24] M. E. J. Newman. Scientific collaboration networks. ii. shortest paths, weighted networks, and centrality.
Phys. Rev. E, 64:016132, 2001.
[25] M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks,
27(1):39–54, Jan. 2005.
[26] R. Pastor-Satorras and A. Vespignani. Immunization of complex networks. Phys. Rev. E, 65:036104,
2002.
[27] S. Porta, P. Crucitti, and V. Latora. The network analysis of urban streets: a primal approach.
Environment and Planning B: Planning and Design, 33(5):705–725, September 2006.
[28] R. Puzis, Y. Elovici, and S. Dolev. Fast algorithm for successive computation of group betweenness
centrality. Phys. Rev. E, 76(5):056709, 2007.
[29] R. Puzis, Y. Elovici, and S. Dolev. Finding the most prominent group in complex networks. AI Comm.,
20:287–296, 2007.
[30] R. Puzis, D. Yagil, Y. Elovici, and D. Braha. Collaborative attack on internet users’ anonymity. Internet
Research, 19(1):60–77.
[31] S. H. Strogatz. Exploring complex networks. Nature, 410:268–276, March 2001.
[32] K. Suh, Y. Guo, J. Kurose, and D. Towsley. Locating network monitors: Complexity, heuristics, and
coverage. Computer Communications, 29:1564–1577, 2006.
[33] S. Wasserman and K. Faust. Social network analysis: Methods and applications. Cambridge, England:
Cambridge University Press., 1994.
[34] G. Yan, T. Zhou, B. Hu, Z.-Q. Fu, and B.-H. Wang. Efficient routing on complex networks. Phys. Rev.
E, 73:046108, 2006.
[35] S.-H. Yook, H. Jeong, and A.-L. Barabasi. Modeling the internet’s large-scale topology. Proceedings of
the National Academy of Sciences, 99(21):13382–13386, Oct. 2002.
[36] T. Zhou, J.-G. Liu, and B.-H. Wang. Notes on the algorithm for calculating betweenness. Chinese
Physics Letters, 23:2327–2329, Aug. 2006.
Table 2: Notations
Symbol                                Meaning
i, j, k, l                            Natural numbers – indexes or sizes of sets and sequences.
n                                     The number of nodes in the network.
m                                     The maximal number of edges in the routing tree (or the routing DAG for multi-path routing schemes).
r, s, t, u, v, w, x, y, z             Nodes.
G(V, E)                               A network with nodes V and edges E.
M, X                                  Sets of nodes.
S                                     Sequence of nodes.
ρv                                    Sampling rate of v.
Mρ, Xρ, Sρ                            Sets and sequences where the sampling rate of members is specified by ρ.
δs,t(v), δ̈s,t(M), δ̃s,t(S)             Pairwise dependency of s and t on a node v, a set M, or a sequence S respectively.
δ•,t(v), δ̈•,t(M), δ̃•,t(S)             Target dependency of t on a node v, a set M, or a sequence S respectively.
δ•,•(v), δ̈•,•(M), δ̃•,•(S)             The Betweenness-Centrality of a node v, a set M, or a sequence S respectively.
λ^{Xρ}_{s,t}(v), λ^{Xρ}_{•,t}(v)      Same as δs,t(v) and δ•,t(v) but accounts only for packets not sampled by the monitors in X prior to reaching v.
χXρ                                   Same as δ but accounts only for packets not sampled by the monitors in X.
p, Pr                                 Probability.
σs,t, σs,t(v)                         Number of shortest paths between s and t and the number of them that pass through v (used for computing SPBC).
φs,t, φs,t(v)                         Maximal flow between s and t and flow transferred by v respectively (used for computing FBC).
APPENDIX A. Notations
We summarize here the symbols and the notation principles used in this paper (see Table 2). The
uppercase Latin letter G represents a network. Other uppercase Latin letters are used to represent sets and
sequences. For example, V is the set of nodes in G, M ⊂ V is an arbitrary set of nodes in G, and S is an
arbitrary sequence of nodes. E denotes the set of edges in G. The lowercase Latin letter n denotes the number
of nodes in the network. The letter m denotes the number of edges in a routing DAG rooted at some target
vertex. When m is used to specify the complexity of an algorithm it represents the maximal number of edges
in any routing DAG. Natural numbers and indexes of arrays and sequences are denoted by i, j, k, or l. The
letters r, s, t, u, v, w, x, y, and z represent nodes. p and Pr are used to represent a probability. R(s, u, v, t)
is a quaternary function that encodes the routing scheme in the network.
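As an illustration of how the routing function and the pairwise dependency fit together, the sketch below propagates traffic from a source along a loop-free routing DAG. It assumes that R(s, u, v, t) returns the probability that u forwards to v a packet with source s and target t; the function name pairwise_dependency, the forwarding table, and the four-node example network are hypothetical and serve only this example.

```python
def pairwise_dependency(R, s, t, order):
    """Return delta, where delta[v] is the expected fraction of s->t packets
    that pass through v.

    Assumptions of this sketch: R(s, u, v, t) is the probability that u
    forwards to v a packet with source s and target t, and `order` lists the
    nodes so that every forwarding node precedes the nodes it forwards to
    (a topological order of the loop-free routing DAG).
    """
    delta = {v: 0.0 for v in order}
    delta[s] = 1.0  # every packet starts at its source
    for i, u in enumerate(order):
        if u == t or delta[u] == 0.0:
            continue  # the target does not forward; idle nodes carry nothing
        for v in order[i + 1:]:
            delta[v] += delta[u] * R(s, u, v, t)
    return delta

# Hypothetical diamond network: s splits traffic evenly between a and b,
# and both a and b forward everything to t.
forwarding = {("s", "a"): 0.5, ("s", "b"): 0.5, ("a", "t"): 1.0, ("b", "t"): 1.0}

def R(s, u, v, t):
    # This example routing is source oblivious, so s is ignored.
    return forwarding.get((u, v), 0.0)

delta = pairwise_dependency(R, "s", "t", ["s", "a", "b", "t"])
print(delta)
```

In this sketch the source and the target both obtain a dependency of 1, and each of the two intermediate nodes carries half of the traffic.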
Greek letters represent an influence on the traffic. ρ denotes the sampling rate of monitors. In
the body of the paper we defined several pairwise dependencies of a source-target pair of nodes on another
node (δs,t(v)), a set of nodes (δ̈s,t(M)), or a sequence of nodes (δ̃s,t(S)). In general, the double-dot accent ¨
is added to functions related to the RBC of sets and the tilde accent ˜ is added to functions related to the RBC
of sequences. λ, which is defined in Section 4.3, and χ, which is defined in Section 5.3, also represent
pairwise dependencies like δ. However, λ refers to the probability of a packet not being sampled prior to
reaching the argument node or nodes and χ refers to the probability of a packet not being sampled at all.
We use a bullet (•) with δ, λ, or χ instead of the subscript indexes to indicate that the pairwise dependency
is summed over all sources and/or targets. RBC is denoted using bullets instead of both subscript indexes
s and t (δ•,•(v)). Target dependency is denoted using a bullet instead of the first
subscript index (δ•,t(v)).
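The bullet notation can be illustrated for the shortest-path special case, where the pairwise dependency is the ratio σs,t(v)/σs,t. The sketch below is a brute-force reference and not one of the algorithms of this paper. It assumes an unweighted network, one packet per ordered source-target pair with s ≠ t, and that the endpoints of a path count as lying on it; the function names and the three-node example network are illustrative.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def shortest_path_dependencies(adj):
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}

    def delta(s, t, v):
        # Pairwise dependency: sigma_{s,t}(v) / sigma_{s,t}.
        dist_s, sigma_s = info[s]
        dist_v, sigma_v = info[v]
        if t not in dist_s or v not in dist_s or t not in dist_v:
            return 0.0
        if dist_s[v] + dist_v[t] != dist_s[t]:
            return 0.0  # v lies on no shortest path from s to t
        return sigma_s[v] * sigma_v[t] / sigma_s[t]

    # Bullet in place of the first index: sum over all sources s != t.
    target_dep = {(t, v): sum(delta(s, t, v) for s in nodes if s != t)
                  for t in nodes for v in nodes}
    # Bullets in place of both indexes: sum the target dependencies over t.
    centrality = {v: sum(target_dep[(t, v)] for t in nodes) for v in nodes}
    return delta, target_dep, centrality

# Hypothetical undirected path network a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
delta, target_dep, centrality = shortest_path_dependencies(adj)
print(centrality)
```

Summing the pairwise dependency over sources first and over targets second mirrors the order in which the bullets replace the subscript indexes.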