A Graph-Based Framework For Real-Time

Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

A Graph-based Framework for Real-time

Vulnerability Assessment of Road Networks


Angelo Furno, N. E. El Faouzi Rajesh Sharma Valerio Cammarota, Eugenio Zimeo
Univ Lyon, ENTPE, IFSTTAR, LICIT UMR T9401 University of Tartu University of Sannio
Lyon, France Tartu, Estonia Benevento, Italy
Email: {angelo.furno, nour-eddin.elfaouzi}@ifsttar.fr Email: rajesh.sharma@ut.ee Email: zimeo@unisannio.it

Abstract—The ability to detect critical spots in transportation of failures and other unpredictable events [4], [21], [32], [33].
networks is fundamental to improve traffic operations and A robust transportation network is able reducing the impact of
road-network resilience in smart cities. Real-time monitoring such failures on traffic flow, human safety and urban economy.
of these networks, especially in very large metropolitan areas,
is a compelling challenge due to the complexity of computing Vulnerability can be viewed as the flip-side of resilience: a
robustness metrics. This paper presents a framework for identi- vulnerable system has limited ability to absorb and react to
fying vulnerabilities in very-large road networks. The framework the strains caused by adverse events. The availability of real-
adopts graph-based modeling of road networks and exploits time information provided by a growing number of sensors
big-data techniques and technologies for processing such large and small devices distributed across geographic areas makes
and complex graphs. First, we use the framework to prove the
existence of significant correlation between global efficiency and it possible today to use novel approaches to study robustness
betweenness centrality. Then, we focus on an efficient algorithm, and adaptation ability of cities against possible unpredictable
integrated in the framework, to rank the nodes according to this events, thus reducing their vulnerability. Moreover, the adop-
metric for finding potential vulnerabilities of a road network. To tion of cloud computing and big-data techniques to effectively
keep computation time under a “quasi” real-time threshold, a analyze this large amount of data paves the way to more
fast, requirement-driven, approximated strategy for computing
betweenness centrality is adopted. The evaluation shows that the quantitative studies to characterize city infrastructures. City
algorithm, integrated in the framework, exhibits a very good key indicators derived from quantitative complex analyses will
approximation for the most critical nodes, thus being well-suited help planners and mobility operators to clearly identify vul-
for on-line operational monitoring. nerabilities and adopt relevant strategies to mitigate risks. This
Index Terms—Smart Transportation, Network Resilience, Be- is particularly important for physical transportation networks
tweenness centrality, Contingency Analysis, Big Data.
where accidents, weather conditions and social events create
everyday risky situations for their correct operation.
I. I NTRODUCTION
In this paper, we present a graph-based framework that helps
Nowadays, cities are facing unprecedented challenges due to identify the impact of breakage on very large road networks.
to urbanization, climate change, and fast technology advances. The framework takes into consideration the existence of sig-
They will be forced to cope with growing populations and nificant correlations between network properties and structural
connected (digital and physical) infrastructures whose robust- features of road networks, by focusing, in this work, only on
ness and resilience will become an essential property for their the nominal state of the transportation network with correctly
continuous operation. operating links and free-flow traffic conditions.
Resilience describes the ability of a given system to provide First, we analyze vulnerability using Global Efficiency
fundamental services to people without discontinuity, even in (GE) [24], a network metric that has been widely used to
presence of adverse or catastrophic events. As a consequence, characterize the overall resilience of power grids [22], the
it will be a key property for achieving smartness in cities of the Internet [27], biological [1] and transportation [21] networks.
near future. Resilience concerns several infrastructures used Then, to spot the most vulnerable nodes inside the network
for transporting goods or people: energy, water, information we use Betweenness Centrality (BC), a fundamental metric of
and road networks are a few examples. An unforeseen event centrality already used in transportation to identify topolog-
that originates a breakage in one of these infrastructures ical criticalities [4], [21], and traditionally preferred to other
may cause incalculable damages with serious socio-economic centrality metrics such as degree and closeness centrality [32].
consequences. BC has also been adopted for analyzing and predicting traffic
Transportation networks are bound to controllable (e.g., flows in transportation networks [15], [19]. However, studying
riots, technical errors, etc.) and uncontrollable events (e.g., the impact of network breakage on road traffic remains a
earthquakes, floods, etc.) that can easily cause disruptions very complex task, difficult to be treated with traditional
in the smooth operation of their services. In transportation desktop hardware and state-of-the-art algorithms, especially
literature, resilience is widely considered crucial to understand in large metropolitan areas with a large amount of links and
and quantify network robustness with respect to different kinds intersections.
The lowest time complexity for computing exact BC has [12], [14], [16], [20]. However, such shortcomings can be
been achieved by the algorithm proposed in [5]. Even though overcome by augmenting the graph representation of the
this algorithm is highly parallelizable, effective real-time vul- networks by taking into account also spatio-temporal aspects
nerability monitoring of large agglomerations requires much (e.g., congestion, accidents, road capacity changes, etc.) and
faster solutions on reasonable hardware. To that end, we geometric properties of road network [33], mapping them
propose the adoption in our framework of an approximated on a weighted dynamic graph. This additional information
technique to compute BC. The algorithm has been proposed by contributes to improve the effectiveness of the analysis but
the authors themselves in previous work [9], [10] to accelerate does not impact performance for searching relevant nodes in
computation of BC, by keeping the error below a reasonable very large networks, which is the main objective of this work.
threshold depending on the specific requirements on compu- For the reasons above, we assume that road networks are
tation time. In this paper, we integrate the algorithm in our in steady state and modeled as unweighted and undirected
proposed framework and prove the technique to be adequate graphs. This assumption has been adopted also in other studies
for real-time M -contingency analysis of road networks due to [4], [21], [32]. Some of them have tried to understand the
its ability of tuning precision and performance. effect of breakage, especially of nodes selected via BC, on
M -Contingency analysis is a simulation-based technique transportation network in various parts of the world, such
that considers different qualitative assumptions over M likely as Toronto [21], Melbourne [25], Sweden [17] and other
events or phenomena in order to imagine different scenarios metropolitan cities [4]. In comparison, no other study has
and come up with optimal responses under the analyzed been performed on Lyon metropolitan road network, which,
circumstances. In the paper, the analysis is conducted on the differently from most other studied cases, is a non-scale-free
road network of Lyon, France graph1 [26]. Moreover, the datasets being used in previous
The rest of the paper is organized as follows. In Sec. II, we studies are relatively small, that is in the order of thousands of
present related work. In Sec. III, we describe our model and nodes. A larger size has an important impact on performance
the metrics used to represent road-networks and their vulnera- as the computation time for calculating BC is very high.
bilities, respectively. In Sec. IV, we present the framework Very recently, some researchers have proposed different
for performing M-contingency analysis and in Sec. V the (exact or approximated) solutions to reduce the computation
large-scale dataset used for testing the framework. Sec. VI time of BC [3], [5], [29], [31]. In [23], an efficient algorithm
shows the result of the evaluation of the framework and for calculating BC has been proposed for incremental BC
in particular of the algorithms adopted with the considered computation. However, the high speedup, characterizing the al-
dataset. Sec. VII reports instead on a synthetic performance gorithm when one single node is inhibited, drastically reduces
index that can be helpful for a user to configure the framework when analyzing the impact of M nodes. These incremental
towards the desired behavior depending on the specific domain algorithms are very fast and useful when network changes are
requirements. Sec. VIII concludes the paper by also discussing limited and continuous, but M -contingency analysis is a much
future directions. more complex and challenging problem as it should consider
M different perturbations of the network to simulate possible
II. R ELATED W ORK critical scenarios as consequence of catastrophic events or
Various approaches have been proposed for quantifying the phenomena.
vulnerability of transportation networks, traditionally in off- In conclusion, a proper contingency tool for vulnerability es-
line mode. Works include study of the impact of breakage timation in large-scale transportation networks is still lacking.
in transportation networks in single-layer [21], multilayer [7] In our study, we focus on reducing the computational effort
and multi-modal networks [30], by introducing random [7] required to compute network metrics for vulnerability analysis,
or central (or critical) nodes failure [21]. Similarly, in a in light of proposing a solution for on-line and continuous
recent work [13], some authors of this paper have leveraged a monitoring over large-scale networks.
mesoscopic fully fledged traffic simulator to perform contin-
gency analysis for resilience evaluation, clearly showing the III. N ETWORK MODELING AND METRICS
limitations in terms of computation time of traditional traffic In this section, we introduce the model adopted for repre-
simulation tools, which prove to be inappropriate for real- senting road networks and the metrics used to measure their
time vulnerability assessment, but much accurate in capturing degree of vulnerability. Such representation and metrics are
complex traffic and user dynamics beyond basic topological adopted as part of the framework for vulnerability assessment
network properties. described in Sec. IV.
In graph-based modeling, one of the most used metric to
A. Network representation
calculate nodes’ centrality is BC [8], especially in transporta-
tion network settings [32]. A large stream of transportation We model transportation networks as undirected graphs
studies have exploited BC for traffic flow prediction (e.g., G(V , E), where V denotes nodes and E ⊆ V × V edges.
[2], [12], [14], [33]) or vulnerability quantification (e.g., [4], 1 A scale-free graph is characterized by a power-law degree distribution,
[21], [32]). In this context, some authors have highlighted i.e., a very limited set of nodes in the graph has a very large degree, while
limitations of the BC metric in representing traffic dynamics the majority of the nodes has only a few neighbors.

2
Each node vi represents an intersection in the network and an 1) Global Efficiency (GE): represents the ability to effi-
edge eij between nodes vi and vj represents a road. N = |V | ciently exchange information in the network [24]. Let
denotes the number of nodes in the graph. N be the total number of nodes in the network, and
A path p(vi , vj ), between two nodes vi and vj , consists len(sp(vi , vj )) the length of the shortest path between
of a set of nodes and edges that connect these two nodes. If any two nodes vi and vj , the GE of the network G is
this set does not exist, the graph is disconnected in separated defined as:
components. The length of a path between any two nodes vi
1 X 1
and vj , represented by len(p(vi , vj )), is the number of edges GE(G) = . (2)
(or hops) to reach vj from vi . If nodes vi and vj are directly N (N − 1) len(sp(vi , vj ))
i6=j
connected, then the path length is 1. A shortest path between 2) Vulnerability: is defined as the drop in information
any two nodes vi and vj , denoted as sp(vi , vj ), is a path with exchange due to removal of a node and all corresponding
the minimum number of hops among all the paths connecting edges [6]. Formally it is defined as:
the two nodes. Multiple shortest paths may exist between
the same pair of nodes, i.e., all the paths having the same GE − GEvi
,V vi = (3)
minimum number of hops. We denote as σvi vj the number of GE
shortest paths between vi and vj , while σvi vj (vk ) represents where GE is the global efficiency of the original network
the number of shortest paths from vi to vj that cross node vk . as from Eq. 2 and GEvi is the global efficiency after the
removal of node vi and all the edges incident on it.
B. Nodes removal strategies
To analyze the impact of unpredictable events on network IV. O N - LINE FRAMEWORK FOR CONTINGENCY ANALYSIS
vulnerability, we can consider different nodes removal strate- The framework that we propose in this paper aims at
gies, depending on the scenarios we intend to simulate for the providing public administrators and maintenance enterprises
analysis: with an easy-to-use tool for handling contingencies.
1) Possible case: the nodes are removed uniformly at It has been developed as a multi-tier Web application and,
random. This particular strategy captures real scenarios according to this pattern, it is composed of four main logical
where an event can happen at any road or intersection tiers (see Fig. 1): (a) a Web-based client implemented atop
causing traffic disruptions; AngularJS; (b) a front-end tier, exposing REST services to
2) Worst case: removed nodes are the most critical ac- the client; (c) a business logic tier for supporting transactional
cording to a centrality index; they represent possible accesses to data stored in the data-tier and for processing data
vulnerabilities of a road network. when specific events occur, through Spark parallel jobs; (d) a
data tier for handling graph-structured data, implemented atop
The framework we propose in this paper is able to work
Neo4J, and for storing unstructured or semi-structured data
in both the scenarios. However, for performing the correlation
through HDFS.
analysis between vulnerabilities and graph metrics, we target
the worst case scenario.
Client-side Server-side Business Logic Persistence
As a metric to establish a criticality ranking of nodes, we architecture architecture
Orchestrator neo4j
REST API REST API Dynamic graph DB
consider BC [8], since it measures the number of shortest paths Client Web Web Front-End • Inject Faults (remove nodes)
• Start BC algorithms
application • Restful service • Retrieve BC values
crossing a node. Nodes with higher BC correspond to central controllers
persist graph,
read graph
intersections from an infrastructural perspective, being usually BC Algorithms
WebHDFS
write logs

Distributed
• Exact BC
selected by commuters to reach their destinations and naturally • 1C-Fast BC
• 2C-Fast BC
REST API Storage
HDFS

prone to breakdown compared to other nodes. The BC of a


node vk is defined as:
X σvi vj (vk ) Fig. 1. Road Management Framework: multi-tier architecture
BC(vk ) = , (1)
σ vi vj
vi 6=vk 6=vj
The framework Graphical User Interface (GUI) is presented
where σvi vj is the total number of shortest paths from node in Fig. 2. The sidebar on the left enables a user (a) to filter
vi to node vj and σvi vj (vk ) is the number of those paths that nodes on the basis of their BC values, (b) to execute a job with
cross vk . For scaling purpose, values are often normalized by the current configuration of potential faulty nodes and (c) to
dividing them by N · (N − 1). enable GUI auto-refresh. Additionally, it shows the number of
filtered nodes and the status of the computation.
C. Resilience metrics Faults, on the other hand, can be injected directly on the
Network partitioning and network ability to move vehicles map, shown on the right side of the GUI. Here, each node is
among nodes with a short number of hops might be useful represented by a marker whose selection activates a popover to
metrics to evaluate the impact of network breakages on trans- inject a fault. Several faults could be selected before triggering
portation networks. Therefore, to quantify the resilience of a the computation of the new values of betweenness centrality
road network, we consider the following metrics: through the processing of the underlying graph (job execution).

3
Marker colors map nodes to a discrete representation of BC,
based on three levels: green, 0 ≤ BC < x; orange, x ≤ BC <
y; red, BC ≥ y.

(a) Map (in red) of the analyzed road network and its
geographical extent

Fig. 2. Road Management Framework: GUI

By using the framework, a M-contingency analysis can be


conducted, by identifying the potential critical nodes in case M
hypothetical faults are injected into the network. Similarly, by
exposing the REST API of the framework to external sensors,
real faults could be captured and projected into the graph to
analyze in quasi real-time the new network configuration. The
dynamic interactions among the main software components of
the system are described by the sequence diagram of Fig. 3.
(b) Most critical nodes according to the BC metric (A6 and
D383 highways)
Fig. 4. The Grand Lyon road traffic network.

In the original graph, some roads are split in multiple


segments to capture properties that change along the road itself
(e.g., street name, road slope, etc.). This means that a single
long road with no real intersections along it can be actually
composed of multiple nodes and edges in the graph. We
Fig. 3. Road Management Framework: dynamic diagram capturing the main decided to filter out this fine-grained information, by retaining
interactions among the framework components.
only actual intersections and link endpoints as the nodes of
our dataset. Next, to avoid a disconnected graph, we selected
When a fault is injected into the system, the graph modeling
only the largest connected component (termed GLyon in the
the road network is changed and consequently a new process-
following) by filtering out very small isolated subnetworks
ing job can be started to execute the algorithms proposed by
that typically include country roads or cut areas lying at the
the authors in [9], [10], whose code is written in Scala. The job
border of the analyzed region. Finally, due to partial available
is executed asynchronously on the Spark framework and, once
information on road driving direction and number of lanes,
terminated, the client refreshes the map to show the potential
we treat the network as undirected. We remark here that the
up-to-date vulnerabilities of the new network configuration.
solutions described in the next section can be easily extended
V. DATASET to work with directed and weighted graph.
To test the framework with our algorithms for fast BC The final GLyon graph has 75,474 nodes and 96,406 edges.
processing, we consider a graph corresponding to the road Nodes with the highest values of BC mostly correspond to
network of the Grand Lyon metropolitan area, France, and highway interchanges (e.g., the A6 and D383) and roads
its surroundings, covering an area of approximately 3,000 that cross bridges, as reported in Fig. 4b. These represents
Km2 (see Figure 4a). This dataset was created using digital elements of the network with limited alternative routes, and
maps supplied by the French National Institute of Geographic thus highly vulnerable spots from a topological perspective.
Information (IGN). The network consists of 112,567 nodes The graph has a density of 3.38e−05 and clustering coefficient
and 240,372 edges. of 0.054. Its average path length and diameter are 84.74

4
0.12 M=2, r=0.895 0.005 0.020
M=4, r=0.928
M=8, r=0.977
0.10 0.004
M=16, r=0.951
M=32, r=0.932
0.015
0.08
Vulnerability

Vulnerability

Vulnerability
0.003
0.06 0.010
0.002
0.04
0.005
0.02 0.001

0.00
0.00 0.05 0.10 0.15 0.20 0.000 1 2 3 4 5 6 7 8 0.000 1 2 3 4 5 6 7 8
Mean BC BC Class BC Class
(a) Vulnerability vs node removal (b) Vulnerability analysis with M = 2 (c) Vulnerability analysis with M = 4
0.045 0.09 0.14
0.040 0.08 0.12
0.035 0.07
0.030 0.06 0.10
Vulnerability

Vulnerability

Vulnerability
0.025 0.05 0.08
0.020 0.04 0.06
0.015 0.03 0.04
0.010 0.02
0.005 0.01 0.02
0.000 1 2 3 4 5 6 7 8 0.00 1 2 3 4 5 6 7 8 0.00 1 2 3 4 5 6 7 8
BC Class BC Class BC Class
(d) Vulnerability analysis with M = 8 (e) Vulnerability analysis with M = 16 (f) Vulnerability analysis with M = 32
Fig. 5. Impact of M nodes removal from different BC-ranges on network vulnerability

and 244, respectively. The graph is therefore very sparse and The number of ranges considered in our analysis is equal
characterized by a Gaussian distribution of the node degree, to eight2 . Hence, we perform five tests by removing M (M =
with an average value of 2.55. 2, 4, 8, 16 and 32) nodes, selected uniformly at random from
each of the ranges. Nodes are inserted back in the graph after
VI. E VALUATION each test with a specific BC range. We repeat each test several
We evaluated the framework with our Scala algorithms [9], times to make our results statistically relevant.
[10] for fast computation of BC, by leveraging multi-core pro- Fig. 5a shows the scatter plots of the impact of M -nodes
cessing for parallel execution. Apache Spark was configured to removal from the eight classes (mean BC of the M removed
work in local mode, using 10 threads to partition the execution nodes on the X-axis) on vulnerability (Y-axis), where M
load of the map-reduce tasks on the available cores of an is represented by different colors and markers, and a linear
Intel Xeon E5 2.4 GHz multi-core machine, equipped with 56 regression shows the relative trend. As expected, vulnerability
virtual cores and 128 GB of DDR4 RAM. We also considered increases sharply as mores nodes (i.e., larger M ) are removed
a Scala implementation of Brandes’ algorithm [5] (referred as from each class. This is especially evident in those classes
Exact-BC), used as a benchmark in performance evaluation. with higher BC values. The linear regression for each of the
Exact-BC was executed in the same testing environment used class shows the linear behavior with different values of slope.
for Fast-BC, with 10 threads for parallelism. Figures 5b:5f plot vulnerability for M = 2, 4, 8, 16 and 32,
A. Betweenness centrality for vulnerability assessment respectively. In each one of these micro-analyses, we observe a
similar pattern by using box plots. The analysis clearly shows
In this section, we use the framework to verify and quantify
that removing nodes with BC in higher classes (on the right
the presence of correlation between vulnerability (as from
side of the X-axis) increases vulnerability more than removing
Eq. 3) and nodes’ betweenness centrality on the GLyon network,
nodes in lower BC classes.
the latter values computed with the Exact-BC algorithm.
Nodes with higher values of betweenness centrality are the
The framework is leveraged to simulate network failures,
ones that affect most vulnerability of a road-traffic network.
by removing M nodes from the graph and by measuring the
Monitoring should continuously identify these nodes since, if
new values of the vulnerability metric after nodes removal.
disrupted, they may significantly reduce network connectivity.
As a criterion to select nodes for removal, we consider the
On the other hand, their inhibition may significantly alter
descending order of BC (i.e., worst-case contingency). Since
network topology, forcing to recompute betweenness centrality
GLyon is characterized by many nodes with very close values
for the whole network.
of BC, to better verify the correlation between vulnerability
and BC, we group the nodes of the graph in multiple ranges
of BC, via Jenks natural breaks classification method [18]. 2 Other numbers of ranges have been tested with similar results.

5
0.300
Exact-BC (Brandes) Perc. Error
3000 0.30
2C-Fast-BC (10 threads) Exact-BC (Brandes)
0.275 1C-Fast-BC 2C-Fast-BC 0.001
0.010% 2500 1C-Fast-BC (10 threads)
0.250 Exact-BC (10 threads) 0.25
2000
Normalized BC

Perc. Error [%]


0.225

Normalized BC
0.005%

Exec. Time [s]


0.200 1500 0.20
0.175 0.000%
0.150 1000 0.15
-0.005%
0.125 500
0.100 -0.010% 0.10
0 20 40 60 80 100 0 20 40 60 80 100 0
Nodes ranked by Exact-BC (top-100) Nodes ranked by Exact-BC (top-100) 0.0 0.2 0.4 0.6 0.8 0 20 40 60 80 100
K-Fraction Nodes ranked by Exact-BC (top-100)
(a) Values of normalized BC with (b) Percentage error of 1C-Fast-BC (a) Execution time of 2C-Fast-BC (b) BC values with K-Fraction =
1C-Fast-BC and Exact-BC for the over the top-100 nodes with different K-Fractions 0.001
top-100 nodes for the top-100 nodes
Fig. 6. Accuracy of the 1C-Fast-BC algorithm
0.300
Exact-BC (Brandes) 0.300% Perc. Error
0.275 2C-Fast-BC 0.3
0.250 0.200%

Normalized BC

Perc. Error [%]


B. M -contingency analysis: approximated fast BC 0.225
0.100%
0.200
We tested the framework in two different operation modes, 0.175 0.000%
0.150
based on two different algorithms aimed at rapidly computing -0.100%
0.125
approximate BC values. The first mode, called 1C-Fast-BC, 0.100 -0.200%
0 20 40 60 80 100 0 20 40 60 80 100
exploits topology-based clustering to reduce the number of Nodes ranked by Exact-BC (top-100) Nodes ranked by Exact-BC (top-100)

breadth-first-searches performed by the Brandes’ algorithm. (c) BC values with K-Fraction = 0.3 (d) Percentage error with K-Fraction
This strategy has been originally proposed in [29] by the for the top-100 nodes = 0.3
sover the top-100 nodes
authors. The second mode, called 2C-Fast-BC, exploits an
algorithm aimed at further reducing execution time on non- Fig. 7. Performance Evaluation of the 2C-Fast-BC algorithm
scale-free graphs, by preserving acceptable levels of accuracy
in BC approximation. This behavior is desirable when compu-
tation time must be very low and errors are tolerated below a the network.
specific threshold. The 2C-Fast-BC algorithm extends the one Fig. 6 shows the normalized BC values computed by the
used with 1C-Fast-BC by means of an additional clustering 1C-Fast-BC and Exact-BC algorithms for the top-100 nodes,
procedure, implemented via a Map-reduce parallel version of ranked according to their exact value of BC. Fig. 6a highlights
K-means clustering, based on the implementation provided in an almost perfect overlap of the two curves, confirmed by
Apache Spark MLlib machine learning library. The 2C-Fast- the extremely low percentage error in Fig. 6b, bounded in
BC algorithm has been originally presented in [9], [10] by the the range [-0.015%, 0.015%]. In other words, the 1C-Fast-BC
authors, and it has proved to allow for a quicker computation algorithm produces the exact same results (i.e., identification
of BC values than 1C-Fast-BC, at the price of larger error. of the most critical nodes) reported in Fig. 4b for Exact-BC on
It is worth to remark that, as a consequence of the adopted the GLyon graph. However, the algorithm takes 1,688 seconds
algorithms, the 1C-Fast-BC mode corresponds to a parameter- to complete, a fairly limited improvement with respect to the
free configuration of the framework, while the 2C-Fast-BC 1,982 seconds of the Exact-BC.
one requires the setting of one single parameter (i.e., K- Fig. 7a shows that 2C-Fast-BC execution time stays below
fraction) in the range [0, 1], where 0 means tolerating a larger both 1C-Fast-BC and Exact-BC (that do not depend on the
approximation error in order to achieve the smallest possible K-fraction parameter), up to very large values of K-Fraction
computation time. Conversely, a K-fraction of 1 drives the (i.e., 0.8). Larger values of the parameter result in higher
algorithm towards the most precise approximation (the same execution times (larger than the ones of 1C-Fast-BC for K-
as provided by 1C-Fast-BC), but with a larger execution time. Fraction > 0.8) due to the computation overhead associated
In the following, we briefly report on the performance trade- to K-means clustering. Obviously, lower K-Fractions introduce
off characterizing the 2C-Fast-BC algorithm, compared to the a larger approximation that negatively affects BC values even
1C-Fast-BC and the Exact-BC, evaluated in the GLyon case for critical nodes. As an example, Fig. 7b presents BC values
study. The interested reader may refer to our previous work [9] for the top-100 nodes when using an extremely low K-Fraction
for a more detailed description of the algorithms, as well as of 0.001. For this configuration, computation time is extremely
their evaluation on different kinds of graphs (e.g., scale-free, low (69 s) but percentage error is relatively high (i.e., in the
small world, random). range [-12%, 12%]). Conversely, by choosing slightly larger
Given the specific conceptual design of the proposed al- K-Fractions (e.g., a K-Fraction of 0.3), the algorithm quickly
gorithms and our objective of continuously monitoring only converges towards very good levels of accuracy, as highlighted
the most critical intersections of the road-traffic network, the by the perfect overlap with the Exact-BC values in Fig. 7c.
reported performance analysis focuses on the most-critical Such improved approximation is confirmed by a percentage
nodes of the network (e.g., top-100, top-5%, etc.). However, error in the range [-0.2%, 0.3%] (see Fig. 7d). As in the
it is worth to remark that BC is computed for all the nodes of previous case of 1C-Fast-BC, the most critical nodes reported

6
0.8 0.8 1C-Fast-BC 2C-Fast-BC (0.001)
Exact 1C-Fast-BC
0.7 0.7 1.0 1.0
0.6 0.6

Performance

Performance
0.8 0.8
Performance

Performance
0.5 0.5 0.6 0.6
0.4 0.4 0.4 0.4
0.3 0.3 0.2 0.2
2C-Fast-BC (10 threads) 0.2 2C-Fast-BC (10 threads) 0.0 0.0
0.2 1.0 1.0
1C-Fast-BC (10 threads) 0.1 1C-Fast-BC (10 threads) 0.8 0.8
0.1 Exact-BC (10 threads) Exact-BC (10 threads) 0.6 0.6
0.4 0.4

e max

e max
0.0 0.0 0.2 0.4 0.6 0.8 0.0 0.0 0.2 0.4 0.6 0.8 6000 7000 6000 7000
0.2 3000 4000 5000 0.2 3000 4000 5000
K-Fraction K-Fraction 0.0 0 1000 2000 t max 0.0 0 1000 2000 t max

(a) eM ax = 0.25, tM ax = 1982s (b) eM ax = 0.1, tM ax = 1982s (a) Exact-BC vs 1C-Fast-BC (b) 2C-Fast-BC (K-Fraction 0.001)
vs 1C-Fast-BC
0.8 0.6 2C-Fast-BC (10 threads)
0.7 1C-Fast-BC (10 threads) 2C-Fast-BC (0.3) 2C-Fast-BC (0.9)
0.5 Exact-BC (10 threads) 1C-Fast-BC 1C-Fast-BC
0.6
1.0 1.0
0.4
Performance

Performance

0.5

Performance

Performance
0.8 0.8
0.4 0.3 0.6 0.6
0.3 0.2 0.4 0.4
0.2 2C-Fast-BC (10 threads) 0.2 0.2
0.1 1C-Fast-BC (10 threads) 0.1 0.0 0.0
Exact-BC (10 threads) 1.0 1.0
0.0 0.0 0.2 0.4 0.6 0.8 0.0 0.0 0.2 0.4 0.6 0.8
0.8
0.6
0.8
0.6
K-Fraction K-Fraction 0.4 0.4

e max

e max
6000 7000 6000 7000
0.2 3000 4000 5000 0.2 3000 4000 5000
0.0 0 1000 2000 t max 0.0 0 1000 2000 t max
(c) eM ax = 0.05, tM ax = 1982s (d) eM ax = 0.25, tM ax = 600s
(c) 2C-Fast-BC (K-Fraction 0.3) vs (d) 2C-Fast-BC (K-Fraction 0.9) vs
Fig. 8. Performance 1C-Fast-BC 1C-Fast-BC

Fig. 9. 3-d Performance curves with different values of tmax and emax
in Fig. 4b were correctly identified by the 2C-Fast-BC on (tideal = 0s)
the GLyon graph. Most importantly, the 2C-Fast-BC with a K-
Fraction of 0.3 executed in only 609 seconds, corresponding
to a 277% and a 325% decrease in execution time with respect of 0.001. We set tM ax to the execution time of the Exact-BC
to the 1C-Fast-BC and Exact-BC, respectively. algorithm (i.e., 1,982 seconds) and eM ax to 0.25. This setting
rewards computing times below the one of Exact-BC and
VII. A N AGGREGATE INDEX OF PERFORMANCE FOR
errors below 25% NRMSE. Fig. 8a clearly shows the values
FRAMEWORK CONFIGURATION
of K-Fraction satisfying these performance requirements, i.e.,
Since the 2C-Fast-BC algorithm generates approximated BC K-Fraction should be selected within the range [0.05, 0.7],
values in a time depending on the error we can tolerate, with the most performing configuration being associated to a
an aggregated index for evaluating both performance and K-Fraction of 0.15. A comparison with the performance of
accuracy is useful. To this end, we propose the following: Exact-BC and 1C-Fast-BC is visually reported in the figure.
! ! Different settings of eM ax and tM ax are depicted in
tx − tideal ex Figs. 8b:8d. Such settings correspond to more stringent re-
P (x) = 0.5 1 − + 0.5 1 − (4) quirements on the tolerated NRMSE (Figs. 8b and 8c) and
tM ax − tideal eM ax
execution time (Fig. 8d), respectively. Interestingly, a con-
The equation depends on two variables that are computed figuration with an extremely low tolerated execution time of
after executing 2C-Fast-BC in configuration x: 1) the time 600 seconds and maximum NRMSE of 0.25 (Fig. 8d) can
taken (tx ) for BC calculation; 2) the NRMSE error3 (ex ) be satisfied only by using the 2C-Fast-BC with a K-Fraction
of the approximated BC values. The equation includes three between 0.001 and 0.25.
parameters tideal , tM ax and eM ax that can be fixed according In Fig. 9, we exploit 3D plots to represent the surfaces
to the specific domain requirements. Specifically, tM ax is the generated by varying the two parameters tM ax and eM ax in
maximum tolerated execution time. Similarly, ex represents Eq. 4, for both 1C-Fast-BC and 2C-Fast-BC. Fig. 9a compares
the maximum allowed NRMSE. Finally, tideal constitutes the the performance of 1C-Fast-BC to that of Exact-BC. The two
minimum amount of time allowed to the execution of the surfaces are mostly overlapping, thus showing very similar
algorithm. The idea is to compute P (x) during a profiling performance on the GLyon graph. In particular, performance
phase on a given network, where the framework is tested in is slightly better with 1C-Fast-BC when a larger maximum
different modes and with several values of K-fraction. error (i.e., larger than 0.4) can be tolerated. Exact-BC should
In the first setting of the P metric, represented in Fig. 8a, instead be preferred whenever the maximum tolerated time is
we defined tideal equal to the execution time of the fastest large enough (i.e., larger than 1600 seconds) for the algorithm
2C-Fast-BC configuration, i.e., 60 seconds with a K-Fraction to complete, and the maximum tolerated error is smaller than
q P
n
0.2. Different configurations of 2C-Fast-BC are compared to
NRMSE is defined as: σ̄1 n
3 The 1 2
i=1 ei , with σ̄ denoting the mean 1C-Fast-BC in Figs. 9b:9d. Specifically, Fig. 9b is related to
of the exact BC values and ei representing the difference between exact and
approximated values of BC for node i. NRMSE is generally preferred to the the lowest values of K-Fractions (i.e., 0.001) considered in
percentage error when 0-values are present among the expected ones. our evaluation. Good performance can be achieved with this

7
configuration when the maximum tolerated execution time is [8] L. C. Freeman. A set of measures of centrality based on betweenness.
significantly low (e.g., lower than 100 seconds), and a fairly Sociometry, 25:35–41, 1997.
[9] A. Furno, N. E. El Faouzi, R. Sharma, and E. Zimeo. Reducing pivots
large error on BC values can be tolerated (e.g., larger than of approximated betweenness computation by hierarchically clustering
0.5-0.6). Configurations with larger K-Fractions (e.g., 0.3 in complex networks. International Conference on Complex Networks and
Fig. 9c) appear to be generally preferable when requirements Their Applications, 2017.
[10] A. Furno, N. E. El Faouzi, R. Sharma, and E. Zimeo. Two-level
are not particularly stringent both on BC error and computation clustering fast betweenness centrality computation for requirement-
time. Finally, configurations with larger K-Fractions tend to driven approximation. IEEE Big Data 2017 Conference, 2017.
exhibit performance that are very close to that of 1C-Fast-BC [11] A. Furno, M. Fiore, R. Stanica, C. Ziemlicki, and Z. Smoreda. A tale
of ten cities: Characterizing signatures of mobile traffic in urban areas.
(see Fig. 9d). IEEE Transactions on Mobile Computing, 16(10):2682–2696, 2017.
[12] S. Gao, Y. Wang, Y. Gao, and Y. Liu. Understanding urban traffic-flow
VIII. C ONCLUSION characteristics: a rethinking of betweenness centrality. Environment and
Planning B: Planning and Design, 40(1):135–153, 2013.
We have presented a big-data framework and its tuning [13] P. Gauthier, A. Furno, and N. E. El Faouzi. Road network resilience:
capabilities for performing real-time M -contingency analysis. how to identify critical links subject to day-to-day disruptions? 97th
The framework is based on fast algorithms for computing TRB Annual Meeting, 2018.
[14] P. Holme. Congestion and centrality in traffic flow on complex networks.
betweenness centrality in a graph-based representation of road Advances in Complex Systems (ACS), 06(02):163–176, 2003.
networks. This metric, in fact, has proved to be strongly related [15] A. Jayasinghe, K. Sano, and H. Nishiuchi. Explaining traffic flow
to vulnerabilities, identified via the decrease of the global patterns using centrality measures. International Journal for Traffic &
Transport Engineering, 05(02):134–149, 2015.
efficiency of a network. [16] A. Jayasinghe, K. Sano, and H. Nishiuchi. Explaining traffic flow
The results are very promising since the algorithms, on the patterns using centrality measures. International Journal for Traffic &
basis of a tolerated approximation, are able of rapidly locating Transport Engineering, 5(2):134–149, 2015.
[17] E. Jenelius and L.-G. Mattsson. Road network vulnerability analysis
the most M critical nodes of a faulty road network. of area-covering disruptions: A grid-based approach with case study.
In order to generalize the approach to dynamic networks, Transportation Research Part A: Policy and Practice, 46(5):746–760,
three main extensions are planned: 1) adapting the algorithms 2012.
[18] G. F. Jenks. The data model concept in statistical mapping. International
for fast BC computation to weighted and directed graphs; 2) Yearbook of Cartography, 7:186–190, 1967.
enriching network modeling by taking into account additional [19] A. Kazerani and S. Winter. Can betweenness centrality explain traffic
properties of road networks (e.g., road length and capacity, flow? International Conference on Geographic Information Science,
2009.
number of lanes, land use information [11]), as well as [20] A. Kazerani and S. Winter. Can betweenness centrality explain traffic
dynamic traffic data (e.g., demand, accidents, travel times, flow? AGILE, 2009.
etc.); 3) studying the impact of nodes removal in multi-modal [21] D. King and A. Shalaby. Performance metrics and analysis of transit
network resilience in Toronto. Transportation Research Record, 2016.
transportation networks modelled as multilayer networks [28]. [22] R. Kinney, P. C. , R. Albert, and V. Latora. Modeling cascading failures
Moreover, the framework will be extended with features in the north american power grid. The European Physical Journal B -
to: 1) collect heterogeneous data from sensors; 2) adaptively Condensed Matter and Complex Systems, 46(1):101–107, 2005.
[23] N. Kourtellis, G. D. F. Morales, and F. Bonchi. Scalable online be-
change the settings of the 2C-Fast-BC algorithm accord- tweenness centrality in evolving graphs. 2016 IEEE 32nd International
ing to dynamically evolving domain requirements (e.g., the Conference on Data Engineering (ICDE), 00:1580–1581, 2016.
framework could run in an emergency mode, requiring low [24] V. Latora and M. Marchiori. Efficient behavior of small-world networks.
Phys. Rev. Lett., 87(19), Oct. 2001.
computation time and larger tolerated error, or alternatively in [25] G. Leu, H. Abbass, and N. Curtis. Resilience of ground transportation
a maintenance mode, demanding smaller error but allowing networks: a case study on melbourne. 33rd Australasian Transport
for more execution time). Research Forum Conference, 2010.
[26] K. Ohara, K. Saito, M. Kimura, and H. Motoda. Accelerating Compu-
tation of Distance Based Centrality Measures for Spatial Networks.
R EFERENCES [27] C. R. Palmer, G. Siganos, M. Faloutsos, C. Faloutsos, and P. B. Gibbons.
[1] S. Achard and E. Bullmore. Efficiency and cost of economical brain The connectivity and fault tolerance of the internet topology. ACM
functional networks. PLoS Computational Biology, 03(02), 2007. SIGMOD/PODS Workshop, 2001.
[2] Y. Altshuler, R. Puzis, Y. Elovici, S. Bekhor, and A. BaglioniPentland. [28] R. Sharma, M. Magnani, and D. Montesi. Investigating the types and
Augmented betweenness centrality for mobility prediction in transporta- effects of missing data in multilayer networks. In Proceedings of
tion networks. Finding Patterns of Human Behaviors in Network and the 2015 IEEE/ACM International Conference on Advances in Social
Mobility Data (NEMO), 2011. Networks Analysis and Mining 2015, pages 392–399, 2015.
[3] M. Baglioni, F. Geraci, M. Pellegrini, and E. Lastres. Fast exact com- [29] P. Suppa and E. Zimeo. A clustered approach for fast computation of
putation of betweenness centrality in social networks. In Proceedings betweenness centrality in social networks. In 2015 IEEE International
of the 2012 International Conference on Advances in Social Networks Congress on Big Data, pages 47–54, June 2015.
Analysis and Mining (ASONAM 2012), pages 450–456, 2012. [30] M. A. P. Taylor, G. D’este, et al. Concepts of network vulnerability
[4] B. Berche, C. von Ferber, T. Holovatch, and Y. Holovatch. Resilience of and applications to the identification of critical elements of transport
public transport networks against attacks. THE EUROPEAN PHYSICAL infrastructure. 2003.
JOURNAL B, 71(1):125–137, 2009. [31] J. Yang and Y. Chen. Fast computing betweenness centrality with virtual
[5] U. Brandes. A faster algorithm for betweenness centrality. Journal of nodes on large sparse networks. PLoS One, 2011.
Mathematical Sociology, 25(163), 2001. [32] Y. Zhang, X. Wang, P. Zeng, and X. Chen. Centrality characteristics of
[6] L. Costa, F. Rodrigues, G. Travieso, and P. Boas. Characterization of road network patterns of traffic analysis zones. Transportation Research
complex networks: A survey of measurements. Advances in Physics, Record: Journal of the Transportation Research Board, 2256:16–24,
56(1):167–242, 2007. 2011.
[7] M. De Domenico, A. Solé-Ribalta, S. Gómez, and A. Arenas. Naviga- [33] S. Zhao, P. Zhao, and Y. Cui. A network centrality measure framework
bility of interconnected networks under random failures. Proceedings for analyzing urban traffic flow: A case study of wuhan, china. Physica
of the National Academy of Sciences, 2014. A: Statistical Mechanics and its Applications, 478:143 – 157, 2017.

You might also like