On Variants of Shortest-Path Betweenness Centrality and Their Generic Computation
On Variants of Shortest-Path Betweenness Centrality and Their Generic Computation
On Variants of Shortest-Path Betweenness Centrality and Their Generic Computation
136-145
Abstract
Betweenness centrality based on shortest paths is a standard measure of control utilized in numerous studies and implemented in all relevant software tools for network
analysis. In this paper, a number of variants are reviewed, placed into context, and
shown to be computable with simple variants of the algorithm commonly used for
the standard case.
Key words: Betweenness centrality, algorithms, valued networks, load centrality
Introduction
Centrality is a core concept for the analysis of social networks, and betweenness is one of the most prominent measures of centrality. It was introduced
independently by Anthonisse (1971) and Freeman (1977), and measures the
degree to which a vertex is in a position of brokerage by summing up the
fractions of shortest paths between other pairs of vertices that pass through
it. Betweenness is therefore classified as a measure of mediation in Borgatti
and Everett (2006).
While some centrality indices impose requirements on the network data such
as undirectedness and connectedness, Anthonisse (1971) defines betweenness
on directed networks from the very beginning, and Freeman (1977) stresses
applicability to disconnected networks. In this paper, we review and propose
variations of the definition of betweenness that either generalize it to even more
types of network data, or modify slightly the embodied notion of brokerage.
1
12 November 2007
All of the variants treated in this paper, however, maintain the original model
of geodesic trajectories (Borgatti, 2005), i.e., only shortest paths are considered in their definition. This excludes alternative models of transportation
based on, e.g., network-flow (Freeman et al., 1991) or current-flow (Newman,
2005; Brandes and Fleischer, 2005), and is due to the original motivation for
writing this paper, namely to document that none of these variants poses additional algorithmic challenges, as all of them can be calculated with minor
modifications of the efficient betweenness algorithm introduced in Brandes
(2001).
The contribution of this paper is thus two-fold:
(1) A compilation of shortest-path betweenness variants.
(2) Algorithms for computing them efficiently.
As part of the first contribution we also clarify relations with other measures
like stress (Shimbel, 1953) and load (Goh et al., 2001), but there will be no
discussion in favor or against substantive applicability of these variant measures. The second contribution is especially valuable for software tools offering
betweenness computation, since existing implementations can be adapted with
little effort. In fact, most of the modifications discussed in the sequel can be
combined in a single implementation. Let me stress again, however, that an
evaluation of the appropriateness and usefulness of any of these variants, or
their combination, for substantive network research is beyond the scope of this
paper. For recent overviews of centrality measures and associated algorithms
see Brandes and Erlebach (2005, Chapters 35) and Borgatti and Everett
(2006).
In the next section, we recall the definition of shortest-path betweenness centrality and outline its efficient computation. The variations are presented in
Section 3 together with corresponding modifications to the basic algorithm,
and we conclude with a brief discussion.
Shortest-Path Betweenness
2.1
X
s,tV
(s, t|v)
(s, t)
2.2
Basic Algorithm
Efficient computation of betweenness is based on the fact that the cubic number of pair-wise dependencies (s, t|v) = (s, t|v)/(s, t) can be aggregated
without computing all of them explicitly. Defining one-sided dependencies
(s|v) =
(s, t|v) ,
tV
(s|v) =
w:
(v,w)E and
dist(s,w)=dist(s,v)+1
(s, v)
(1 + (s, w)) .
(s, w)
(1)
as shown in Brandes (2001). This recursive relation asserts that the dependency of a vertex s on some v can be compiled from dependencies on vertices
one edge farther away.
P
We have already argued that the basic algorithm is suitable for vertex betweenness in both directed and undirected graphs. In this section, we discuss
modifications that require adjustment of the basic algorithm.
3.1
Endpoints
paths problem minus one equals the increase of cB (s) due to including s as a
source. The increase due to including targets can be added one by one, since
there is a connected pair s, t involving t as a target if and only if t is popped
from stack S during the accumulation for source s.
If desired, we are therefore in a position to restrict the inclusion of endpoints
to either sources or targets, e.g., because we are in a scenario in which only
the source has control over a connection, but not the target.
3.2
Proxies
Above we allowed for the possibility that sources and/or targets have an influence. In extreme cases, one might even argue that only the source or the target
have control over their pair-wise connection. Betweenness would then reduce
to determining the number of actors that can be reached from a source or that
can reach a target. In Algorithm 2, this corresponds to simply excluding the
dependencies from the betweenness summation.
Similarly, we might attribute a special role to yet another position, namely
the first or last actor other than source and target. Borgatti argues 2 that the
final intermediate can be special, because it is with this actor that the target
interacts directly and may thus depend on entirely. Penultimate actors in a
shortest-path connection are thus seen as proxies having full control over the
connection using this path.
Algorithm 3: Proximal betweenness
output: proximal betweenness cBps [v], cBpt [v] for all v V (initialized to 0)
H accumulation
for v V do [v] 0
while S not empty do
pop w S
for v Pred [w] do
[v]
[v] [v] + [w]
(1 + [w])
if v 6= s then
[v]
cBps [v] cBps [v] + [w]
// v is proximal source for w
else
cBpt [w] cBpt [w] + [w] // w is proximal target for v = s
cBps (v) =
sV
t : (v,t)E
(s, t|v)
.
(s, t)
3.3
Bounded-distance Betweenness
3.4
Distance-scaled Betweenness
Algorithm 4: k-Betweenness
input: directed graph G = (V, E), length bound k
output: k-betweenness for all v V (initialized to 0)
for s V do
H single-source shortest-paths problem
I initialization
while Q not empty do
dequeue v Q; push v S
foreach vertex w such that (v, w) E do
H path discovery // w found for the first time?
if dist[w] = then
dist[w] dist[v] + 1
if dist[w] k then enqueue w Q
I path counting
X
s6=tV
1
(s, t|v)
.
dist(s, t) (s, t)
The rationale here is that, the longer a path, the less valuable it may be
to control it. We refer to this measure as length-scaled betweenness, because
the dependencies are scaled by a factor that depends only on the length of a
shortest path and is thus the same for all its inner vertices. This should be
compared to the next measure introduced in this subsection. Since each new
target enters the accumulation by adding one to the dependencies accumulated
so far, we replace this value by its weighted equivalent, 1/dist(s, t), as shown
in Algorithm 5.
Algorithm 5: Length-scaled betweenness
H accumulation
for v V do [v] 0
while S not empty do
pop w S
for v Pred [w] do
[v]
1
[v] [v] + [w]
( dist(s,t)
+ [w])
if w 6= s then cBdist [w] cBdist [w] + [w]
A complementary variant is implicitly introduced in Geisberger et al. (2008),
where the actual goal is the efficient approximation of standard betweenness
centrality. Although designed to be used in a betweenness estimator, it appears
to be a plausible variant of betweenness by itself, because the relative position
9
X
s6=tV
.
dist(s, t) (s, t)
Different from the previous variant, it is not the length of a path that modifies
the dependency on v, but vs relative distance from the source. The farther
away from the source, and hence the closer to the target, the more influential
a vertex is. This measure thus generalizes proximal source betweenness, and
can be used to generalize proximal target betweenness by replacing dist(s, v)
with dist(v, t). Observe, however, that linearly scaled betweenness equals the
usual betweenness in undirected graphs, since the relative distances on (s, t)and (t, s)-paths sum to one.
The additional factor of dist(s, v) is not propagated in Algorithm 6, because
it must not contribute to vertices closer to the root (which have their own,
smaller factor).
3.5
Edge Betweenness
2
1
1
0
shown in Algorithm 7.
An alternative and more general approach for defining edge centralities is
based on the following construction. The line graph L(G) of a graph G consists
of a vertex uv for each edge (u, v) of G, and there is an edge between uv and wx,
if and only if v = w. If G is undirected, the undirected line graph is obtained
by identifying the two vertices created for two directed edges corresponding
to one undirected edge. In other words, the line graph of an undirected graph
G = (V, E) is the graph with vertex set E in which two vertices are adjacent,
if and only if the corresponding edges share a vertex.
The following theorem indicates that the approach via vertex betweenness in
the line graph does not yield the same results as the direct definition of edge
betweenness, and that the difference between the two approaches cannot be
overcome. Kosch
utzki et al. (2005) also give substantive reasons why the line
graph approach, though elegant, may be inappropriate for many application
scenarios.
11
Theorem 1 Vertex betweenness in a line graph L(G) differs from edge betweenness in the original graph G. In fact, if G is undirected, edge betweenness
in G may not equal any vertex centrality that can be defined on L(G).
3.6
Group Betweenness
Everett and Borgatti (1999) define the group betweenness, cB (C), of a subset
C V as
X (s, t|C)
cB (C) =
s,tV (s, t)
where the numerator counts the number of shortest (s, t)-paths containing any
vertex of C as an inner vertex.
Algorithm 8: Group betweenness
input: directed graph G = (V, E), group C V
output: group betweenness cB (C) (initialized to 0)
H accumulation
for v V do [v] 0
while S not empty do
pop w S
for v Pred [w] do
if w C then i 0 else i [w]
[v]
[v] [v] + [w]
(1 + i)
if w C \ {s} then cB (C) cB (C) + [w]
3.7
Actors of Sorts
The actors in a network need not be of the same kind, or may differ with
respect to attributes of interest. Such scenarios are frequently modeled by networks that are referred to as two-mode or, more generally, k-mode networks.
However, two-mode networks are also often equated with bipartite graphs and
represented in rectangular matrices, in other words restricted to cases where
there are no ties between actors of the same mode.
To avoid the confusion with k-partite graphs, we here refer to networks with
a partitioned set of actors, who may nevertheless be connected independent
of the partition, as multi-sort networks, and the partition classes are called
sorts.
Flom et al. (2004) introduce variants of betweenness to determine actors that
bridge between actors of different sorts, defined, e.g., by persons having an
infectious disease and persons that do not. Given a graph G = (V, E) with a
partition of the vertices into sorts A and B, they define two measures Q1 , Q2 in
which dependencies are counted only for pairs of vertices s, t of opposite sorts
as follows. Let 1 and 1 be defined like and , but restricted to paths that
consist of a sequence of vertices of one sort possibly followed by a sequence of
vertices of the other sort, i.e. along which group membership changes at most
once.
cQ1 (v) =
X
sA, tB
1 (s, t|v) +
1 (s, t|v)
sB, tA
13
cQ2 (v) =
(s, t|v) +
sA, tB
(s, t|v)
sB, tA
In the undirected graphs considered in Flom et al. (2004), s and t are chosen from A and B symmetrically. For directed graphs, it is however relevant
whether paths are restricted to run from A to B or vice versa. The above definitions are easily adjusted by leaving out one of the two sums. Normalization
is disregarded again, because it does not pose algorithmic challenges.
Algorithm 9: Q-measures
input: directed graph G = (V, E) with two sorts of vertices V = A ] B
output: Q-measures cQ1 [v], cQ2 [v] for all v V (initialized to 0)
for s V do
H single-source shortest-paths problem
I initialization
for t V do 1 [t] 0
1 [s] 1
while Q not empty do
dequeue v Q; push v S
foreach vertex w such that (v, w) E do
I path discovery
H path counting
if dist[w] = dist[v] + 1 then
if s, v or v, w of same sort then 1 [w] 1 [w] + 1 [v]
[w] [w] + [v]
append v Pred [w]
H accumulation
for v V do [v] 0; 1 [v] 0
while S not empty do
pop w S
for v Pred [w] do
if s, w of different sort then i 1 else i 0
[v]
1 [v] 1 [v] + 11[w]
(i + 1 [w])
[v] [v] +
[v]
[w]
(i + [w])
if w 6= s then
cQ1 [w] cQ1 [w] + 1 [w]
cQ2 [w] cQ2 [w] + [w]
The see why this algorithm works, let us first consider Q2 . Similar to the
dependency filtering used for group betweenness above, we modify the propagation of dependencies. Instead of blocking the propagation of dependencies
from vertices farther from the source, however, we here propagate dependencies only for targets that are of a sort different from the source s.
Since the same pairs of sources and target are considered, the same strategy
works for Q1 , but in addition we have to restrict the shortest paths counted in
1 to those that change sorts at most once. This is done during breadth-first
search from a source s by passing on path counts from a predecessor v to some
w only if at least one of the following two statements is true: s and v are of
the same sort, or v and w are of the same sort. This is because otherwise s
and w are of the same sort and v is an intermediate vertex of the other sort,
so that sorts are changed twice along the (s, w)-path via v.
In principle, both definition and computation strategy carry over to k-sort
networks, but the generalization is not straightforward. This is because of the
exponentially growing numbers of combinations of sorts for source-target pairs
and admissible sort changes along paths.
3.8
Dichotomous ties are either present or absent, and therefore frequently yield
only coarse approximations to the actual relationships being modeled. If valued
ties are measured because they represent a situation more accurately, also the
definition of betweenness centrality should be based on these values.
Alternative measures of betweenness for valued networks have been proposed
(Freeman et al., 1991; Newman, 2005), but are not based on shortest paths.
In fact, different models of transportation (Borgatti, 2005) are assumed, so
that these measures can not be seen as proper generalizations of shortest-path
betweenness.
Although shortest-path betweenness centrality can indeed be generalized to
valued networks while preserving the underlying assumptions, it is important
to be clear about the semantics of those values, so that we will actually end
up with two different generalizations.
This is in resonance with the two essential factors that determine shortestpath betweenness centrality, namely the length of paths (default: number of
edges) and their relative weight (default: uniform). Both these factors can be
adapted in the presence of edge values.
The common generalization of shortest paths to graphs with edge values con15
sists in a re-definition of the length of a path as the sum of values on its edges.
This is a proper generalization, since assigning uniform values of one to the
edges of a non-valued graph yields a path length that equals the number of
edges. The values are referred to as edge lengths, and shortest paths according
to this generalization can be computed in much the same way by replacing
the breadth-first search of the basic algorithm by a more general algorithm
for the single-source shortest-path problem.
The generalization interpreting values as lengths is referred to as weighted
betweenness in Brandes (2001), where also the use of Dijkstras algorithm (see
any of the above cited textbooks on algorithms) is proposed. The necessary
modifications are shown in Algorithm 10 and basically consist of replacing
the breadth-first search queue by a priority queue, which worsens the running
time bound from O(nm) to O(nm + n2 log n), though.
Interpreting edge values as lengths implies, however, that higher values reflect
weaker ties. If higher values indicate stronger ties, for instance in a communication network with frequency-of-contact values on the edges, one might
instead consider reversing the order of all edge values x, e.g., by subtracting
from an upper bound like the maximum plus one, or by taking the inverse
1
or a negative exponential ex . With these transformed values, the previous
x
algorithm can still be applied, but its use can be problematic, since a sum of
such rescaled values might not reflect the intuition of distance between two
vertices.
16
1/4
1/3
1/2
2/3
1/4
1/3
t
1/2
1/2
t
1/3
1/3
Fig. 2. Contribution of a single pair (s, t) to load (left) and betweenness (right).
3.9
[w]
|Pred[w]|
Discussion
References
Anthonisse, J. M., October 1971. The rush in a directed graph. Tech. Rep. BN
9/71, Stichting Mathematisch Centrum, 2e Boerhaavestraat 49 Amsterdam.
Bollobas, B., 1998. Modern Graph Theory. Vol. 184 of Graduate Texts in
Mathematics. Springer-Verlag.
Borgatti, S. P., 2005. Centrality and network flow. Social Networks 27, 5571.
20
Borgatti, S. P., Everett, M. G., 2006. A graph-theoretic perspective on centrality. Social Networks 28, 466484.
Brandes, U., 2001. A faster algorithm for betweenness centrality. Journal of
Mathematical Sociology 25 (2), 163177.
Brandes, U., Erlebach, T. (Eds.), 2005. Network Analysis. Vol. 3418 of Lecture
Notes in Computer Science. Springer-Verlag.
Brandes, U., Fleischer, D., 2005. Centrality measures based on current flow. In:
Proceedings of the 22nd International Symposium on Theoretical Aspects
of Computer Science (STACS05). Vol. 3404 of Lecture Notes in Computer
Science. pp. 533544.
Brandes, U., Pich, C., 2007. Centrality estimation in large networks. International Journal of Bifurcation and Chaos 17 (7), 23032318.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C., 2001. Introduction
to Algorithms, 2nd Edition. MIT Press.
De, P., Singh, A., Wong, T., Yacoub, W., Jolly, A., 2004. Sexual network
analysis of a gonorrhoea outbreak. Sexually Transmitted Infections 80, 280
285.
Diestel, R., 2000. Graph Theory, 2nd Edition. Graduate Texts in Mathematics.
Springer-Verlag.
Everett, M. G., Borgatti, S. P., 1999. The centrality of groups and classes.
Journal of Mathematical Sociology 23 (3), 181201.
Everett, M. G., Borgatti, S. P., 2005. Ego network betweenness. Social Networks 27, 3138.
Flom, P. L., Friedman, S. R., Strauss, S., Neaigus, A., 2004. A new measure
of linkage between two sub-networks. Connections 26 (1), 6270.
Freeman, L. C., 1977. A set of measures of centrality based upon betweenness.
Sociometry 40, 3541.
Freeman, L. C., 1979. Centrality in social networks: Conceptual clarification I.
Social Networks 1, 215239.
Freeman, L. C., Borgatti, S. P., White, D. R., 1991. Centrality in valued
graphs: A measure of betweenness based on network flow. Social Networks
13 (2), 141154.
Geisberger, R., Sanders, P., Schultes, D., 2008. Better approximation of betweenness centrality. In Proceedings of the 10th Workshop on Algorithm
Engineering and Experimentation (ALENEX08). To appear.
Goh, K.-I., Kahng, B., Kim, D., 2001. Universal behavior of load distribution
in scale-free networks. Physical Review Letters 87 (27), 14.
Goodrich, M. T., Tamassia, R., 2002. Algorithm Design. Wiley.
Gould, R. V., 1987. Measures of betweenness in non-symmetric networks. Social Networks 9, 277282.
2006. Algorithm Design. Addison-Wesley.
Kleinberg, J., Tardos, E.,
Kosch
utzki, D., Lehmann, K. A., Peeters, L., Richter, S., Tenfelde-Podehl, D.,
Zlotowski, O., 2005. Centrality indices. In Brandes and Erlebach (Eds.)
Lammer, S., Gehlsen, B., Helbing, D., 2006. Scaling laws in the spatial structure of urban road networks. Physica A 363 (1), 8995.
21
Moody, J. W., 2006. Diffusion and visualization in dynamic networks. Paper presented at the Sunbelt XXVI Social Network Conference (Vancouver,
Canada).
Newman, M. E. J., 2001. Scientific collaboration networks II: Shortest paths,
weighted networks, and centrality. Physical Review E 64, 016132.
Newman, M. E. J., 2004. Analysis of weighted networks. Physical Review E
70, 056131.
Newman, M. E. J., 2005. A measure of betweenness centrality based on random
walks. Social Networks 27, 3954.
Newman, M. E. J., 2006. Erratum: Scientific collaboration networks II:
Shortest paths, weighted networks, and centrality. Physical Review E 73,
039906(E).
Newman, M. E. J., Girvan, M., 2004. Finding and evaluating community structure in networks. Physical Review E 69, 026113.
Perer, A., Shneiderman, B., 2006. Balancing systematic and flexible exploration of social networks. IEEE Transactions on Visualization and Computer Graphics 12 (5), 693700.
Puzis, R., Elovici, Y., Dolev, S., to appear. Fast algorithm for successive computation of group betweenness centrality. Physical Review E.
Shimbel, A., 1953. Structural parameters of communication networks. Bulletin
of Mathematical Biophysics 15, 501507.
Valente, T. W., 2006. Bridges and potential bridges: Changing links to find
critical paths and nodes in a network. Paper presented at the Sunbelt XXVI
Social Network Conference (Vancouver, Canada).
White, D. R., Borgatti, S. P., 1994. Betweenness centrality measures for directed graphs. Social Networks 16, 335346.
22