Physica A 390 (2011) 374–386
Contents lists available at ScienceDirect
Physica A
journal homepage: www.elsevier.com/locate/physa
Navigation in large subway networks: An informational approach
Josep Barberillo a, Joan Saldaña b,∗
a Institute of Transportation Studies, University of California, Irvine, CA 92697-3600, United States
b Departament d’Informàtica i Matemàtica Aplicada, Universitat de Girona, Campus de Montilivi, Girona E-17071, Spain
Article info
Article history:
Received 9 June 2010
Received in revised form 7 September 2010
Available online 16 October 2010
Keywords:
Complex networks
Navigation
Route modularity
Search information
Abstract
The structural properties of a subway network are crucial for effective transportation in cities. This paper presents an informational perspective on navigation in four subway networks: New York City, Paris, Barcelona and Moscow. We investigate what makes these networks complicated to navigate and compare them in light of their intrinsic constraints. Our methodological approach is based on a set of cost/efficiency indicators defined in the complex networks literature. We find that the overall complexity of finding stations, measured by the average search information S, increases linearly with the network size N. A direct implication of this finding is that, given these basic levels of required information, the average hide information H(k) can be represented as a function of the node degree k. Finally, by analyzing subway networks in space P, we reveal the existing service modularity among subway routes using a rescaled expression of S.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction and overview of the study
Easily accessible and accurate network information is key to solving congestion and inefficiency problems in a complex transportation system [1,2]. In a subway transportation system, users interact with the subway network on a day-to-day basis.
A typical subway trip involves many steps: boarding the vehicle at an origin, transferring from one route to another where necessary, and continuing the process until the final destination is reached. The computational cost of performing these operations is high. In other words, subway travelers expend a great deal of processing capacity in making their travel decisions, since a certain amount of information is needed to complete a trip properly [3].
The choices of the whole population would determine congestion and other characteristics of the network. Moreover,
the trip quality experienced by tripmakers influences their own future decisions. Therefore, the performance of the overall
transportation system is highly related to users’ behavior, available information (see Fig. 1) and the transportation networks’
structure and information exchange efficiency [4].
It is also known that the decision maker has bounded rationality, since he/she has limited capabilities for gathering and processing information [5]; for example, extracting useful information from a subway map without being overloaded [6] by the information roughly presented on it.
In the literature, there are many studies on the complex networks present in modern engineering systems [7–10], such as the Internet [11] and food webs [12], among others. Other studies specifically address public transport networks as complex systems [13–19], and subway networks in particular [3,20–22]. In Refs. [23–25] the authors approach public transport network analysis from an informational perspective.
∗ Corresponding author. Tel.: +34 972 418837; fax: +34 972 418792.
E-mail address: jsaldana@ima.udg.edu (J. Saldaña).
0378-4371/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2010.09.017
J. Barberillo, J. Saldaña / Physica A 390 (2011) 374–386
Fig. 1. Users’ future behavior.
Fig. 2. The meaning of uncertainty-based information and information gain.
These and many other studies, however, generally take a network perspective rather than an end-user mobility perspective. We present a systematic performance analysis of different complex subway networks in which the information needed to navigate them (a cost/efficiency trade-off measure) plays a key role, in a scenario where all users follow the utility maximization principle and their behavior is strongly determined by the network structure.
Our approach hypothesizes that the key to good tripmaker mobility is access to the minimum and optimum amount of information required to make a trip from an origin to a specific destination. In reality, due to habits [26,27], subway network complexity and other constraints (e.g., not all tripmakers have access to full information about the transportation system), the users’ decisions are uncertain and tend to be inefficient.
The tools developed and the insights gained could help transportation planners to carry out systematic studies and thus to design more efficient and better conceived public transportation networks [28,29].
2. The framework
When a route choice decision involves uncertainty, it is usually possible to reduce that uncertainty by acquiring information [30,31]. Information is indeed a measure of the decrease of uncertainty (see Fig. 2). In our context, it can be understood as the average number of yes/no questions that must be asked in order to find the correct link connecting the current station to the next station on the path to a target node. Individual users also have different knowledge of the network [32] (heterogeneity of the system), which makes it complicated to predict the users’ response and, consequently, its impact on system performance. Fig. 3 illustrates how, in our framework, aggregate behavior is the result of individual decisions; these decisions are directly related to the quantity of information needed to make a trip, which is in turn a product of the specific network structure.
From an information-theoretic perspective, Shannon’s entropy measure [31,33] is an ideal choice for quantifying the cost of acquiring information (roughly speaking, the number of yes/no questions needed to reduce to zero the uncertainty associated with a trip). In other words, this measure assigns a weight (e.g., in number of bits) to the cost of processing the uncertainty associated with user mobility and location management when making a specific trip. Higher levels of information to be managed by the tripmaker when facing a decision mean lower system efficiency, because they affect the three stages of the user decision process [34]: the information phase, the deliberation phase and the implementation phase.
Our approach could be used to better understand the network structure, mobility, and the influence of the communicative process on the users’ response, by making a graph-based quantitative navigational evaluation that incorporates indicators defined in the complex networks literature, such as access and hide information [25,35,36]. As in the Web space [4], the transportation network should be a space designed to let information flow and to create opportunities for cooperation and collaboration that reduce congestion and improve service levels.
Fig. 3. System level versus individual level.
Table 1
NYC, Paris, Barcelona and Moscow street layout.

City            Street layout               Urban planning
New York City   Grid                        Commissioners’ plan (1811)
Paris           Diagonal-radial-irregular   Haussmann plan (1852)
Barcelona       Grid-irregular              Ildefons Cerdà plan (1859)
Moscow          Radial-circumferential      -
3. Subway network study
3.1. Network representation
We base our study on networks of different sizes because this allows us to identify patterns arising from specific network characteristics (e.g., common geographical/geopolitical constraints). Table 1 gives the street layout and urban planning of the cities considered.
The first step is to collect a database (adjacency lists) for the subway networks being studied: New York City (NYC), Moscow, Paris and Barcelona. While NYC has one of the largest population concentrations in the world, Barcelona is a prime harbor location on the Mediterranean. Nowadays it is difficult to discover any special geographic reason for the choice
Fig. 4. Partial view of a subway network: a shared parallel track example.
Table 2
Space L network indicators: N (number of nodes), M (number of links).

City            N     M
New York City   496   605
Paris           300   353
Barcelona       203   237
Moscow          183   215
of Moscow and Paris as specific locations, but it is thought that route intersections and river crossings led to their initial development [37].
The second step is to create the corresponding unweighted and undirected graph representation. The ideas of space L and space P are proposed in general terms in Ref. [13] and used in many other studies [16,17,20,38–40]. In the first topology (space L; see Table 2), vertices represent stations and edges represent the stretches of tunnel that join physically adjacent stations, indicating that at least one route serves two consecutive stations. No multiple links are allowed in either network representation, and the distance between stations equals the number of stops from one to the other. It is interesting to note that, as observed elsewhere [3], subway systems grow by adding new routes or lengthening existing ones rather than by attaching single nodes at different locations of the network. Therefore, under the representation in space L, such networks have a high frequency of stations with two connections due to the strong spatial constraints; indeed, this is one of the main reasons why highly connected nodes tend not to be connected with each other. Similarly, transfer stations of two or more lines usually have even degrees, and nodes with odd degrees correspond to stations where a line begins or ends while sharing a track with other lines (see Fig. 4) [3].
These features of subway networks imply that the average clustering coefficient in space L (the probability that two neighbors of a node are also neighbors of each other) is very low [8,41,42]. As usual, this coefficient should be understood as a measure of the local cohesiveness of the network.
In space P, nodes are the same as in the previous topology, but an edge between two nodes means that one can travel from one station to the other without any line or train change (see Fig. 5). Thus, the distance in number of links between stations equals ‘‘the number of line changes +1’’ that a tripmaker must make to reach a station from his/her starting location. This characteristic will help us when carrying out what we call a ‘‘route modularity study’’: while in space L the different lines are invisible (impossible to distinguish from each other), working in space P allows us to focus specifically on evaluating the modularity among railroad tracks. In this space, the representation of subway networks shows a very high level of clustering [16,38], with values of the clustering coefficient close to 0.9.
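The two representations can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names `space_l` and `space_p`, the route dictionary and the station labels are all hypothetical, and each route is assumed to be given as an ordered list of stops.

```python
from itertools import combinations


def space_l(routes):
    """Space L: link consecutive stops of every route (no multiple links)."""
    adj = {}
    for stops in routes.values():
        for stop in stops:
            adj.setdefault(stop, set())
        for a, b in zip(stops, stops[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def space_p(routes):
    """Space P: link every pair of stops served by the same route."""
    adj = {}
    for stops in routes.values():
        for stop in stops:
            adj.setdefault(stop, set())
        for a, b in combinations(set(stops), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical two-route network sharing one transfer station "X".
routes = {"R1": ["A", "B", "X", "C"], "R2": ["D", "X", "E"]}
L = space_l(routes)
P = space_p(routes)
print(sorted(L["X"]))  # neighbours of the transfer station in space L
print(sorted(P["A"]))  # stations reachable from A without changing line
```

In this toy network, station D is not a neighbour of A in space P, so a trip from A to D has length 2 in that space, i.e., one line change plus 1.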
3.2. Performance measures
To characterize the ease or difficulty of navigation in subway networks, we use the concept of ‘‘search information’’ (S), introduced in Refs. [25,36]. When traveling randomly through a network, the probability of taking a specific exit (link) from a node of degree k is 1/k. In this case, the information needed to locate a given exit is the number of yes/no questions needed to guess the correct link. When links are ordered in such a way that each question halves the number of possible links, the number of questions for a node of degree k equals log2(k) (a wayfinding task [43]: taking a direct path and making no intentional detours). An example of this situation is that of lines intersecting in a subway (or train)
Fig. 5. Representation of a theoretical 2-routes network in space L (top) and space P (bottom).
transfer station where the links emanating from such transfer nodes are ordered (see Ref. [36]). In other words, log2(k_n) bits of information are initially needed to correctly start a path from a station n of degree k_n to any station in the network.
For each path p(s, t) from node s to node t, the probability of following it at random is given by Rosvall et al. [36]:
\[ P(\mathrm{path}(s,t)) = \frac{1}{k_s} \prod_{j \in \mathrm{path}(s,t)} \frac{1}{k_j - 1}, \]
where j runs from the second node on the path to the one before node t, and 1/(k_j − 1) is the probability of taking the correct way given that the arrival link to a station is never a possible exit (no backward movements are allowed when traveling along a path). The idea is to locate t through any of the (one or more) degenerate shortest paths from s to t. The total informational value of knowing any one of the degenerate shortest paths between s and t defines the ‘‘search information between s and t’’, which is given by
\[ S(s,t) = -\log_2 \sum_{\{\mathrm{path}(s,t)\}} P(\mathrm{path}(s,t)), \]
where the sum runs over the set of degenerate paths {path(s, t)} between s and t.
If N is the network size (the total number of stations), the total number of ordered pairs (s, t), s ≠ t, of nodes is N(N − 1). Summing the search information between nodes over all ordered pairs and dividing the sum by the total number of these pairs, we obtain the so-called ‘‘average search information’’ in the network, namely:
\[ S = \frac{1}{N(N-1)} \sum_{s} \sum_{t \neq s} S(s,t). \]
S measures the overall difficulty of navigating in the network and is based on the local minimization of the information
needed to navigate to a target node (see Ref. [36] for a discussion of its properties and a comparison with other measures).
To quantify how difficult it is to find a given node t starting from an arbitrary node in the network, the ‘‘hide information of a node t’’ (H_t) is defined as the average pairwise information needed to reach t from any other node in the network [36]. Since there are N − 1 nodes different from t, this average is given by
\[ H_t = \frac{1}{N-1} \sum_{s \neq t} S(s,t). \tag{1} \]
Similarly, a measure of how good the access to the network is from a given node t is given by the average pairwise information A_t needed to reach any node in the network starting from t (which now appears as the first argument of the search information S):
\[ A_t = \frac{1}{N-1} \sum_{s \neq t} S(t,s). \tag{2} \]
This average is called the ‘‘access information of a node t’’. Note that the average access and hide information values over the whole network, namely $A = \frac{1}{N}\sum_t A_t$ and $H = \frac{1}{N}\sum_t H_t$, are always equal to S.
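The definitions of S(s, t), the hide information of Eq. (1) and the access information of Eq. (2) can be made concrete with a minimal Python sketch. It is an illustration only, not the code used in this study: the function names are hypothetical, the network is assumed to be connected and stored as a dictionary of neighbour lists, and the sum over degenerate shortest paths is accumulated during a breadth-first search rather than by listing paths explicitly.

```python
from collections import deque
from math import log2


def search_information_from(adj, s):
    """Return {t: S(s, t)} in bits for every node t reachable from s."""
    dist = {s: 0}
    weight = {s: 1.0}  # summed probability of all shortest paths s -> node
    queue = deque([s])
    while queue:
        u = queue.popleft()
        # Admissible exits: k_s at the start node, k_j - 1 afterwards.
        exits = len(adj[u]) if u == s else len(adj[u]) - 1
        if exits == 0:
            continue  # nothing lies beyond a degree-1 node
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                weight[v] = 0.0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                weight[v] += weight[u] / exits
    return {t: log2(1 / p) for t, p in weight.items() if t != s}


def information_measures(adj):
    """Average search information S, hide values H_t and access values A_t."""
    n = len(adj)
    S = {s: search_information_from(adj, s) for s in adj}
    access = {t: sum(S[t].values()) / (n - 1) for t in adj}
    hide = {t: sum(S[s][t] for s in adj if s != t) / (n - 1) for t in adj}
    average = sum(access.values()) / n
    return average, hide, access


# Toy example: a three-station line 0 - 1 - 2.
line = {0: [1], 1: [0, 2], 2: [1]}
average, hide, access = information_measures(line)
print(average, hide, access)
```

In the three-station line, a traveler starting at an end station never has to choose, so the access information of both ends is 0 bits, while the middle station needs 1 bit to pick a direction. Averaging either the hide or the access values over all nodes returns the same S.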
Fig. 6. Elementary step of the randomization procedure.
The spatial location of nodes according to their degree (nodes with degree 1 correspond to route ends, nodes with degree 2 are regular (non-transfer) stations, etc.) is expected to affect their values of A_t and H_t and to induce a noticeable dependence between these information measures and the nodal degree. To reveal the existence of such a relationship, we compute the average value of A_t and H_t over the set of nodes in the network with the same degree k [36]. Precisely, if N_k is the number of nodes with degree k and k_t is the degree of a node t, the average values of A and H over the set {t : k_t = k} define the functions of the nodal degree
\[ A(k) = \frac{1}{N_k} \sum_{t:\,k_t = k} A_t \quad \text{and} \quad H(k) = \frac{1}{N_k} \sum_{t:\,k_t = k} H_t. \]
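The degree-wise averages A(k) and H(k) amount to grouping nodes by degree, as in the following short Python sketch. The function name `average_by_degree` and the numerical values are hypothetical and serve only to show the grouping step; they are not results of this study.

```python
from collections import defaultdict


def average_by_degree(values, degree):
    """Average a per-node quantity over the nodes that share a degree k."""
    groups = defaultdict(list)
    for node, value in values.items():
        groups[degree[node]].append(value)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical hide-information values (bits) and degrees for five stations.
hide = {"a": 12.0, "b": 9.0, "c": 8.0, "d": 6.5, "e": 7.5}
degree = {"a": 1, "b": 2, "c": 2, "d": 4, "e": 4}
print(average_by_degree(hide, degree))  # {1: 12.0, 2: 8.5, 4: 7.0}
```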
3.3. Randomization procedure
The previous measures allow us to analyze how the spatial constraints affecting the development of subway networks are reflected in the network architecture. To this end, we compare the architecture of subway networks to that of similar networks not affected by spatial constraints. The simplest way to do this is to construct randomized versions of the original networks. The randomly rewired networks preserve the degree of every node of the original ones and, moreover, are constructed in such a way that disconnected parts are not permitted [44]. Therefore, both types of networks are similar in the sense that they have the same degree distribution but, in the randomized versions, spatial constraints disappear due to the random reshuffling of links. For these networks, we calculate the corresponding average search information, average access information, and average hide information, here denoted by Sr, Ar, and Hr, respectively.
The iterative algorithm for the random rewiring of the original network consists of first randomly selecting a pair of edges {A, B} and {C, D}. The two edges are then rewired in such a way that A becomes connected to D, while C connects to B (see Fig. 6). This step is aborted if one or both of these new links already exist (preventing the appearance of multiple edges connecting the same pair of nodes and of self-loops). Repeated application of this step leads to a randomized version of the original subway network that remains globally connected.
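The rewiring step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, node labels are assumed to be orderable (e.g., integers), and global connectivity is enforced here by undoing any swap that disconnects the graph, which is one simple way to meet the requirement of Ref. [44].

```python
import random
from collections import deque


def is_connected(adj):
    """Breadth-first search from an arbitrary node reaches every node."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def add_edge(adj, x, y):
    adj[x].add(y)
    adj[y].add(x)


def remove_edge(adj, x, y):
    adj[x].remove(y)
    adj[y].remove(x)


def rewire(adj, attempts, seed=0):
    """Degree-preserving random rewiring that keeps the graph connected."""
    rng = random.Random(seed)
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for _ in range(attempts):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick the orientation of the second edge at random
        # Abort if a new link would be a self-loop or already exists.
        if a == d or c == b or d in adj[a] or b in adj[c]:
            continue
        remove_edge(adj, a, b)
        remove_edge(adj, c, d)
        add_edge(adj, a, d)
        add_edge(adj, c, b)
        if is_connected(adj):
            edges[i], edges[j] = (a, d), (c, b)
        else:  # undo a swap that would split the network
            remove_edge(adj, a, d)
            remove_edge(adj, c, b)
            add_edge(adj, a, b)
            add_edge(adj, c, d)
    return adj


# Toy network: a ring of 8 stations with one chord between 0 and 4.
ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
add_edge(ring, 0, 4)
randomized = rewire(ring, attempts=200)
print({u: len(vs) for u, vs in randomized.items()})  # degrees are unchanged
```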
4. Results
In all the analyzed subway networks, the average search information is larger than that of their randomized counterparts. In other words, the universally large S relative to Sr means that the ability to obtain information is more important in these real networks (see Fig. 7). These results are consistent with Ref. [25], where the authors show that it is in general more difficult to navigate in the original networks than in their randomized versions. The spatial distribution of the real networks is planar (due to strong geographical constraints [45]), whereas randomized networks show great homogeneity and integration between their different parts because the rewiring is free of such constraints. In randomized networks, the average access and hide information needed to characterize the overall network operability are independent of the size N, while the topological differences between real and randomized networks increase with N (see Fig. 8). These differences are caused by the geographical constraints (the embedding of the network in two-dimensional space).
Overall, we observe that the Paris and Barcelona networks are more efficiently organized than the NYC subway network, but both are harder to navigate than the Moscow network or their respective randomized networks. Thus, the ability to obtain information is more important in the NYC subway network than in the others.
The value of S increases linearly with the network size N (see Fig. 7), meaning that the communication process gets harder as the network grows, which reflects the limitations of the current design and organization of subway networks. In a small network like Moscow, with a well-connected circumferential structure (compact network topology), the discrepancy between the real and the randomized network is very small (S ≈ Sr). Furthermore, Barcelona and Paris have similar compactness levels (due to their particular topology), and the NYC subway network shows a large difference between S and Sr, reflecting in part its neighborhood community structure (see Table 3).
To compare networks of different sizes and to compensate for the effect of the mean degree on the value of S, we compute the ratio S/Sr in space L (randomized networks have the same connectivity distribution as the original ones; hence, they have
Fig. 7. Network average search information S over network size N. The straight line corresponds to the linear regression for real networks (R2 = 0.98).
Table 3
Space L network average search information (bits).

City            S       Sr      S/Sr
New York City   16.43   9.07    1.81
Paris           10.69   9.05    1.18
Barcelona       8.99    7.81    1.15
Moscow          7.34    7.31    1.00
the same mean connectivity). Values of this ratio close to 1 reflect strong compactness of the network architecture (low characteristic path length and low modularity). The ratio increases with the geographical dispersion of the network. Table 3 shows the results for the analyzed networks, from which we see that NYC’s value of S/Sr is almost twice Moscow’s value. Moreover, Sr follows the logarithmic scaling with the network size N claimed in Ref. [36] for several types of random network architectures (Sr ≈ log2(N) with R² = 0.79).
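As a consistency check on the figures quoted above, the following Python sketch recomputes the S/Sr ratios and the two coefficients of determination from the values listed in Tables 2 and 3. The helper `r_squared` is a hypothetical name; for a simple linear fit it returns the squared Pearson correlation, which is assumed here to be the R² reported in the text.

```python
from math import log2


def r_squared(x, y):
    """R^2 of a least-squares straight line fitted to the points (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Values for NYC, Paris, Barcelona and Moscow taken from Tables 2 and 3.
N = [496, 300, 203, 183]
S = [16.43, 10.69, 8.99, 7.34]
Sr = [9.07, 9.05, 7.81, 7.31]

ratios = [round(s / sr, 2) for s, sr in zip(S, Sr)]
r2_linear = r_squared(N, S)                      # S against N
r2_log = r_squared([log2(n) for n in N], Sr)     # Sr against log2(N)
print(ratios, round(r2_linear, 2), round(r2_log, 2))
```

With only four networks, both fits rest on very few points, so the coefficients should be read as indicative rather than conclusive.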
It is remarkable that the Moscow subway network shows better levels of accessibility than any other network (see Fig. 9). In all the panels of Fig. 8, vertices of degree 1 fall into the upper part of the plot, showing high values of the information needed to find them (hide information), while their accessibility to the rest of the network stays at levels similar to those of vertices with degree greater than 1. A similar pattern holds among nodes with degree greater than 1: while the basic level of information needed to find them decreases with their degree (more connections imply less information needed to find them, and vice versa), their accessibility level is independent of their connectivity. These results suggest a heterogeneous spatial node distribution.
This pattern is observed in all the networks studied (see Fig. 11), and it can be explained by the fact that vertices with degree 1 are located where a line ends/starts. These vertices are difficult for the subway user to reach but in general show very good network accessibility levels.
From the network analysis in terms of station degree, it follows that the basic level of hide information H needed to find a specific station is set by the size of the network N (see Fig. 10). The key point to understand in Fig. 10 is that the randomization procedure affects nodes of every degree class equally, making the network more compact and accessible (lower overall hide and access information values, because shorter paths connecting stations exist) and changing its intrinsic architectural characteristics (e.g., lowering the clustering coefficient). Once the size-related value of H is fixed, the hide information H depends only on the degree k of the station and not on the particular size of the network N. The same behavior was observed in every subway network analyzed, since the slopes of the straight lines are almost equal (see Fig. 11).
We can conclude that a good level of hide information depends on a good position of the station inside the subway system, a fact that is closely related to the station’s own nodal connectivity.
[Fig. 8 plot area: four panels (New York City, Paris, Barcelona, Moscow), each showing hide information (bits) versus access information (bits) for the real network and the randomized network.]
Fig. 8. Hide information H versus the access information A in space L. The analysis shows a correlated increment between these two parameters.
Furthermore, these graphics show a comparison between real (in blue) and randomized (in green) networks. (For interpretation of the references to colour
in this figure legend, the reader is referred to the web version of this article.)
4.1. Space P: route modularity
To analyze the route modularity of the subway networks we compare their topology in space P. Precisely, we measure the route modularity of a subway network of size N by S/log2(N − 1). Note that if a single route serves a whole network of size N, then the average search information in space P is S = log2(N − 1), because all nodes are connected to each other and, hence, all paths have length one. By contrast, if the same network has two (or more) railroad tracks, the network modularity in this space increases and the value of S is always larger. In particular, if a network of size N has two routes of sizes N1 and N2, respectively, with only one node in common (see, for instance, Fig. 5), then N = N1 + N2 − 1, and it can be shown that the average search information S attains its maximum value at N1 = N2 = (N + 1)/2 (i.e., when both routes have the same number of stations) and decreases as the difference between route sizes increases, down to the limit case in which N1 (or N2) equals 1 (see the Appendix for details). Therefore, S = log2(N − 1) is the minimum value of S for a fixed network size N, and we use it to scale the S value in space P and analyze the network modularity.
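The behavior of S for two routes sharing one station can be checked numerically with a small Python sketch. It is an illustration under stated assumptions, not the derivation of the Appendix: the function names are hypothetical, the network size is fixed at N = 9 as an arbitrary example, and S is computed directly from the path probabilities defined in Section 3.2.

```python
from collections import deque
from math import log2


def average_search_information(adj):
    """Average of S(s, t) over all ordered pairs of a connected graph."""
    total = 0.0
    for s in adj:
        dist, weight, queue = {s: 0}, {s: 1.0}, deque([s])
        while queue:
            u = queue.popleft()
            # Admissible exits: k_s at the start node, k_j - 1 afterwards.
            exits = len(adj[u]) if u == s else len(adj[u]) - 1
            if exits == 0:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v], weight[v] = dist[u] + 1, 0.0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    weight[v] += weight[u] / exits
        total += sum(log2(1 / p) for t, p in weight.items() if t != s)
    n = len(adj)
    return total / (n * (n - 1))


def two_routes_space_p(n1, n2):
    """Space P graph of two routes of sizes n1 and n2 sharing node 0."""
    route1 = list(range(n1))
    route2 = [0] + list(range(n1, n1 + n2 - 1))
    adj = {v: set() for v in route1 + route2}
    for route in (route1, route2):
        for a in route:
            adj[a].update(b for b in route if b != a)
    return adj


# Fixed size N = 9: from equal routes (5, 5) to a single route (9, 1).
splits = [(5, 5), (6, 4), (7, 3), (8, 2), (9, 1)]
values = [average_search_information(two_routes_space_p(a, b)) for a, b in splits]
for (a, b), s in zip(splits, values):
    print(a, b, round(s, 3))
```

For this example the computed S is largest for the equal split and falls to log2(8) = 3 bits when a single route serves all nine stations, in line with the statement above.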
The results show strong modularity among routes in the NYC network (high values of S and S/log2(N − 1)). The S values of the Barcelona and Paris networks are quite similar although their sizes differ (the Paris network is about 1.5 times the size of the Barcelona network). However, when the values of S are scaled by the minimum possible value for a given network size N in space P (equivalent to having no modularity), the ranking of the networks becomes quite different (see Table 4). In particular, despite its smaller size, the Barcelona integrated network (studied in more detail in the next section) shows
Fig. 9. Hide information versus access information in space L. Straight lines correspond to the linear regression of the hide information average value as
a function of degree k. Circles are colored according to the node degree.
Fig. 10. This figure shows the real–random hide ratio H (k)/Hr (k) in space L. Different networks show similar qualitative behavior.
greater levels of modularity among routes than the Paris network. The same reasoning applies to the route modularity of
the Moscow subway network with regard to the Paris subway network.
Fig. 11. Hide information averaged over nodes with the same degree k in space L. Straight lines correspond to the linear regression of the hide information
average value H (k) as a function of degree k for four chosen cities.
Table 4
Space P network average search information (bits). No modularity among routes: Smin = log2(N − 1).

City            S       S/log2(N − 1)
New York City   13.58   1.52
Paris           11.43   1.39
Barcelona       11.19   1.46
Moscow          10.74   1.43
4.2. A case study: the Barcelona integrated network
The Barcelona integrated network is operated by three different companies: Transports Metropolitans de Barcelona (TMB), Ferrocarrils de la Generalitat de Catalunya (FGC) and Tramvia de Barcelona (TRAM). This example allows us to study the connectivity among different railroad tracks using space P to represent the integrated network. Thus, by plotting a single stand-alone station to represent each line (i.e., not a transfer station where passengers change from one route to another), we depict the navigational relationship among lines (see Fig. 12).
The space P representation proves useful because it allows us to carry out a route-based study (not only a station-based network analysis, as in space L). Analyzing the graph of all routes represented together, we take a closer look at the global network service structure.
The Barcelona integrated network shows poor levels of overall communication between railroad tracks. Compared with the others, TMB lines have good levels of accessibility and are easily reachable (low hide information value).
Therefore, depending on which company provides the service, the railway lines show large variations in navigability (there is strong modularity in the service). In other words, a community structure is detected in space P as a result of different companies operating in the integrated network (see Fig. 12). The direct implication of the results is that there are few crossing routes or shared lines between different companies.
On the other hand, the outlier at the top of the plot in Fig. 12 is explained as follows. The point belongs to line 11 (L11) of the integrated subway system. Although operated by TMB, this line shows surprisingly high levels of information needed to locate it (its hide information value is 15.4 bits) and a high value of access information (12.7 bits) as well. This line has only one node connecting it to the remaining lines of the network (a single transfer station that links L11 and line 4 (L4)), and it is also the shortest line, composed of only 5 stations. In other words, L11 is a dead-end line. L4, by contrast, has 21 stations, making it one of the longest lines in the integrated network, and its access and hide information values are 10.6 bits and 11 bits, respectively. From this information directly
Fig. 12. Hide information versus access information in space P for each line considered (one single node is selected to represent each route), by operating company (TMB, TRAM and FGC). Three different communities (enclosed by the ellipses) can be observed, as well as one TMB outlier line located in the upper part of the plot, showing poor levels of overall communication (see the main text for an explanation).
extracted from Fig. 12 (and given the special characteristics of this part of the network), we can directly compute approximate values of the access and hide information for L11 from those of L4. The idea is simple: every trip entering or leaving L11 must pass through L4. From Eq. (1) and the fact that H4 = 11 bits, the average probability of finding L4 is 1/2048. Then, since in space P the degree of the station plotted in Fig. 12 representing L4 is k = 21, a straightforward calculation shows that an approximate hide information value for L11 is
\[ H_{11} \approx -\log_2\left(\frac{1}{21} \cdot \frac{1}{2048}\right) \approx 15.4 \text{ bits}. \]
Once located on L4, a subway user traveling to L11 has to choose among 21 different options to locate the correct exit that leads him/her to L11. That is the reason why the hide information value of L11 increases so dramatically.
Following the same principle outlined above but now using Eq. (2), it follows that the average probability of locating any other specific line departing from L4 is 1/1552. Departing from L11, the traveler has to choose among 4 different exits (k = 4), with probability 1/4 of successfully locating the target line L4, as a necessary first step to find any other line when navigating the integrated network. Thus, the approximate value of access information for L11 is
\[ A_{11} \approx -\log_2\left(\frac{1}{4} \cdot \frac{1}{1552}\right) \approx 12.6 \text{ bits}. \]
These rough values agree closely with the corresponding values in Fig. 12. With these results we wish to highlight the influence of route modularity on access and hide information values.
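The two estimates for L11 reduce to simple arithmetic on the values of L4, which the following Python lines reproduce. The variable names are hypothetical; the inputs (H4 = 11 bits, A4 = 10.6 bits, 21 options on L4 and 4 exits on L11) are the ones quoted in the text.

```python
from math import log2

h4 = 11.0   # hide information of L4 (bits): probability 2**-11 = 1/2048
a4 = 10.6   # access information of L4 (bits): probability of about 1/1552

# One extra choice among 21 options on L4 is needed to find L11.
h11 = -log2((1 / 21) * 2 ** -h4)
# One extra choice among 4 exits on L11 is needed to reach L4 first.
a11 = -log2((1 / 4) * 2 ** -a4)

print(round(h11, 1), round(a11, 1))
```

Equivalently, the extra cost is additive in bits: log2(21) is added to H4 and log2(4) = 2 is added to A4.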
5. Discussion
Our research is one example of how information quantification may be included in a transportation network study. In
this paper we have presented an empirical study of different subway networks under different representations (space L and
space P) of network topology.
Subway networks are characterized by significant shared track for many lines, because pre-existing surface transport used to have a single original route passing through the main stations (see Fig. 4), and by strong geographical constraints, which lead to similar topological and informational characteristics among them. In this sense, it is remarkable that the average search information of the analyzed subway networks increases linearly with the network size, in contrast to what happens in random networks, where this information scales as the logarithm of the network size [36].
The presented results also suggest a heterogeneous spatial node distribution for each network, meaning that vertices
with small connectivity are difficult for the subway user to reach, although in general each station shows fairly good network
accessibility levels. Another interesting finding is that the basic level of hide information H needed to find a specific station
in a subway network increases with the network size N. It is clearly more difficult to navigate in a larger and
modular subway network like the one in New York City than to navigate either in a smaller one or in the respective
randomized counterpart. From these basic levels of required information, H can be represented as a linearly decreasing
function of the node degree k.
Fig. 13. Representation of paths of length 1 (top) and 2 (bottom) of a network with two routes in space P. There is one transfer station (node 4) and the
route sizes are N1 = N2 = 5. Any shortest path of length 2 must pass through node 4.
An innovative approach for studying the network community structure using the space P has also been introduced. The
representation of subway networks in this space allows us to compare the route modularity among networks by using the
ratio of average search information in real networks to that of a fully connected network (i.e., to that of a subway network
with only one route). The same representation is used to analyze the global service structure in an integrated network with
different company administrators. This analysis leads to a clear conclusion: because of the massive investment needed to
build integrated rail network systems, it is important to plan them effectively, so as to maintain good cooperation among
the different parts of the subway networks and to avoid poor levels of overall communication between them (such as those
obtained for the Barcelona integrated network).
Acknowledgements
This work has been partially supported by the project FP6-2003-NEST-PATH-1 named ‘‘Unifying Networks for Science
and Society’’ of the Sixth European Framework Programme (JB and JS), the research project MTM2008-06349-C03-02 of the
Spanish government, the project 2009SGR-345 of the Generalitat de Catalunya (JS), a Caja Madrid Foundation grant, and a
Balsells fellowship (JB).
Appendix
In the simple case that a subway network has only two routes, only paths of length one and two are possible in space
P. Therefore, $S = (S_1 + S_2)/(N(N - 1))$ with $S_1$ and $S_2$ being the contributions to $S$ of paths of length 1 and 2, respectively.
Note that if $N_1$ and $N_2$ are the route sizes, then $N = N_1 + N_2 - 1$ (the transfer station is present in both routes and hence
is counted twice). A path of length 1 either starts at a regular station or at the transfer one. In the first case, the probability
to randomly follow a given path of length 1 of the route $i$ is $1/(N_i - 1)$, $i = 1, 2$, because no path of length 1 is possible
between stations of different routes (top of Fig. 13). Since there are $N_i - 1$ paths of length 1 from a given ordinary station of
the route $i$ and each route has $N_i - 1$ ordinary stations, the total number of paths of length 1 starting at ordinary stations of
route $i$ is $(N_i - 1)^2$. On the other hand, the probability of following a given path departing from the transfer station is
$1/(N - 1)$ because there are $N - 1$ regular stations (top of Fig. 13). Therefore, the contribution to $S$ of paths of length 1 is given by
$$S_1 = (N_1 - 1)^2 \log_2 (N_1 - 1) + (N_2 - 1)^2 \log_2 (N_2 - 1) + (N - 1) \log_2 (N - 1).$$
Because (shortest) paths of length 2 always connect regular stations of different routes through the transfer station (bottom
of Fig. 13), the probability that a random path of length 2 goes from a given departure station of route 1 to a given arrival
station of route 2 is equal to the probability that the departure station connects to the transfer station, $1/(N_1 - 1)$, times
the probability that the transfer station connects to the arrival station (backward step not allowed), $1/(N - 2)$. Since the
total number of paths of length 2 from route 1 to route 2 is $(N_1 - 1)(N_2 - 1)$, and it is equal to the number of paths in the
opposite direction, the contribution to $S$ of paths of length 2 is then given by
$$S_2 = (N_1 - 1)(N_2 - 1)\left(\log_2 (N_1 - 1) + 2 \log_2 (N - 2) + \log_2 (N_2 - 1)\right).$$
Taking $S$ as a function of $N_1$ (or $N_2$), from the expressions of $S_1$ and $S_2$ it follows that, for a fixed $N$, the average search
information $S$ attains its maximum at the same value of $N_1$ as does the function
$$F(N_1) = (N - 1)\left((N_1 - 1) \log_2 (N_1 - 1) + (N - N_1) \log_2 (N - N_1)\right) + 2 (N_1 - 1)(N - N_1) \log_2 (N - 2)$$
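with $N_1 = 1, \ldots, N$. The last term in this expression attains its maximum at $N_1 = (N + 1)/2$ (the routes have the same
size) and vanishes at $N_1 = 1$ and $N_1 = N$ (only one route is present), where $F = (N - 1)^2 \log_2 (N - 1)$ (taking
$0 \log_2 0 = 0$). For sufficiently large $N$ this term outweighs the decrease of the first one away from those end points, so
$F$ is smallest there. Therefore, it follows that $\log_2 (N - 1)$ is
the minimum value of $S$ for $1 \le N_i \le N$ ($i = 1, 2$) with $N_1 + N_2 - 1 = N$.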
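The closed-form expressions for S1 and S2 can be checked numerically against a direct enumeration on the two-route graph in space P. The following Python sketch is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, node 0 is arbitrarily taken as the transfer station, and the search information of a pair is taken to be minus the base-2 logarithm of the summed probabilities of its shortest paths, with probability 1/k for the first step and 1/(k − 1) for later steps (no backward step), as in the derivation above.

```python
import math
from itertools import permutations

def search_info_formula(n1: int, n2: int) -> float:
    """Average search information S (bits) from the closed-form S1 and S2."""
    n = n1 + n2 - 1  # the transfer station belongs to both routes

    def weighted_log(count: int, x: int) -> float:
        # count * log2(x), with the convention that an empty set of paths contributes 0
        return count * math.log2(x) if count > 0 else 0.0

    s1 = (weighted_log((n1 - 1) ** 2, n1 - 1)
          + weighted_log((n2 - 1) ** 2, n2 - 1)
          + weighted_log(n - 1, n - 1))
    cross = (n1 - 1) * (n2 - 1)  # paths of length 2 in one direction
    s2 = 0.0
    if cross > 0:
        s2 = cross * (math.log2(n1 - 1) + 2 * math.log2(n - 2) + math.log2(n2 - 1))
    return (s1 + s2) / (n * (n - 1))

def search_info_bruteforce(n1: int, n2: int) -> float:
    """Average S obtained by enumerating shortest paths on the space-P graph."""
    n = n1 + n2 - 1
    route1 = set(range(n1))               # node 0 is the transfer station (assumption)
    route2 = {0} | set(range(n1, n))
    adj = {v: set() for v in range(n)}
    for route in (route1, route2):        # in space P every route is a clique
        for u in route:
            adj[u] |= route - {u}
    total = 0.0
    for s, t in permutations(range(n), 2):
        if t in adj[s]:                   # shortest path of length 1
            p = 1 / len(adj[s])
        else:                             # length 2, through common neighbours
            p = sum(1 / len(adj[s]) / (len(adj[m]) - 1) for m in adj[s] & adj[t])
        total += -math.log2(p)
    return total / (n * (n - 1))

# Usage: the network of Fig. 13 (N1 = N2 = 5) and the one-route limit with N = 9.
print(search_info_formula(5, 5), search_info_bruteforce(5, 5))
print(search_info_formula(9, 1), math.log2(9 - 1))
```

Scanning `search_info_formula(n1, N - n1 + 1)` over n1 = 1, ..., N for a fixed N gives a quick way to inspect where S is largest and smallest.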
References
[1] M. Manheim, Fundamentals of Transportation Systems Analysis Basic Concepts, MIT Press, Cambridge, Massachusetts, 1979.
[2] E. Cascetta, Transportation Systems Engineering: Theory and Methods, Kluwer Academic Publ., Dordrecht, Boston, London, 2001.
[3] P. Angeloudis, D. Fisk, Physica A 367 (2006) 553–558.
[4] T. Berners-Lee, W. Hall, J.A. Hendler, K. O'Hara, N. Shadbolt, D.J. Weitzner, Foundations and Trends in Web Science 1 (2006) 1–130.
[5] H.A. Simon, Models of Bounded Rationality, MIT Press, Cambridge, Massachusetts, 1982.
[6] M. Ben-Akiva, A. De Palma, K. Isam, Transportation Research Part A: General 25 (1991) 251–266.
[7] M.E.J. Newman, J. Park, Physical Review E 68 (2003) 036122.
[8] R. Albert, A.L. Barabasi, Reviews of Modern Physics 74 (2002) 47–97.
[9] M. Buchanan, Nexus: Small Worlds and the Groundbreaking Science of Networks, W.W. Norton & Company, New York, London, 2002.
[10] D.J. Watts, Six Degrees: The Science of a Connected Age, Norton, New York, London, 2003.
[11] R. Albert, H. Jeong, A.-L. Barabasi, Physica A 272 (1999) 173–187.
[12] J.M. Montoya, S.L. Pimm, R.V. Solé, Nature 442 (2006) 259–264.
[13] P. Sen, S. Dasgupta, A. Chatterjee, P.A. Sreeram, G. Mukherjee, S.S. Manna, Physical Review E 67 (2003) 036106.
[14] Y.-Z. Chen, N. Li, D.-R. He, Physica A 376 (2007) 747–754.
[15] C. von Ferber, T. Holovatch, Y. Holovatch, V. Palchykov, Modeling metropolis public transport, in: Traffic and Granular Flow’07, Springer Verlag, 2009, pp. 709–719.
[16] J. Sienkiewicz, J.A. Holyst, Physical Review E 72 (2005) 046127.
[17] X. Xu, J. Hu, F. Liu, L. Liu, Physica A 374 (2007) 441–448.
[18] D.O. Cajueiro, Physical Review E 79 (2009) 046103.
[19] W. Li, X. Cai, Physica A 382 (2007) 693–703.
[20] V. Latora, M. Marchiori, Physica A 314 (2002) 109–113.
[21] I. Vragovich, E. Louis, A. Diaz-Guilera, Physical Review E 71 (2005) 036122.
[22] D. Gattuso, E. Miriello, Networks and Spatial Economics 5 (2005) 395–414.
[23] S.N. Dorogovtsev, J.F.F. Mendes, Evolution of Networks: From Biological Nets to the Internet and WWW, Oxford University Press, Oxford, 2003.
[24] M. Rosvall, P. Minnhagen, K. Sneppen, Physical Review E 71 (2005) 066111.
[25] M. Rosvall, A. Trusina, P. Minnhagen, K. Sneppen, Physical Review Letters 94 (2005) 028701.
[26] T. Gärling, K. Axhausen, Transportation 30 (2003) 1–11.
[27] K.W. Axhausen, A. Zimmermann, S. Schönfelder, G. Rindsfüser, T. Haupt, Transportation 29 (2002) 95–124.
[28] S.C. Wirasinghe, U. Vandebona, Transportation and traffic theory, in: Proceedings of the 14th International Symposium on Transportation and Traffic Theory, Jerusalem, Israel, 1999.
[29] U. Vandebona, 28th Conference of Australian Institutes of Transport Research, School of Civil and Environmental Engineering, UNSW, Australia, 2006.
[30] G.J. Klir, M.J. Wierman, Uncertainty-Based Information: Elements of Generalized Information Theory, Physica-Verlag, Heidelberg, New York, 1998.
[31] C.E. Shannon, W. Weaver, The Mathematical Theory of Communication, The University of Illinois Press, Chicago, 1949.
[32] K.J. Arrow, Information and Economic Behavior, Harvard Univ. Cambridge, Massachusetts, Defense Technical Information Center, Ft. Belvoir, 1973.
[33] T.M. Cover, J.A. Thomas, Elements of Information Theory, John Wiley, New York, 1991.
[34] B. Walliser, Cognitive Economics, Springer, Berlin, 2008.
[35] K. Sneppen, A. Trusina, M. Rosvall, Europhysics Letters 69 (2005) 853–859.
[36] M. Rosvall, A. Grönlund, P. Minnhagen, K. Sneppen, Physical Review E 72 (2005) 046117.
[37] V.R. Vuchic, Urban Transit: Systems and Technology, J. Wiley & Sons, Hoboken, NJ, 2007.
[38] C. von Ferber, T. Holovatch, Y. Holovatch, V. Palchykov, Physica A 380 (2007) 585–591.
[39] X. Xu, J. Hu, F. Liu, Chaos 17 (2007) 023129.
[40] Y. Hu, D. Zhu, Physica A 388 (2009) 2061–2071.
[41] A. Barrat, M. Barthelemy, R. Pastor-Satorras, A. Vespignani, G. Parisi, Proc. Natl. Acad. Sci. USA 101 (2004) 3747–3752.
[42] M.E.J. Newman, S.H. Strogatz, D.J. Watts, Physical Review E 64 (2001) 026118.
[43] S. Timpf, Networks and Spatial Economics 2 (2002) 9–33.
[44] S. Maslov, K. Sneppen, Science 296 (2002) 910–913.
[45] M. Barthelemy, A. Barrat, A. Vespignani, Advances in Complex Systems 10 (2007) 5–28.