
Navigation in large subway networks: An informational approach

2011, Physica A: Statistical Mechanics and its Applications

The structural properties of the subway network are crucial for effective transportation in cities. This paper presents an informational perspective on navigation in four different subway networks: New York City, Paris, Barcelona and Moscow. We investigate what makes it complicated to navigate in these kinds of networks, and we compare the networks and their intrinsic constraints. Our methodological approach is based on a set of cost/efficiency indicators defined in the complex networks literature. We find that the overall complexity of finding stations, measured by the average search information S, increases linearly as a function of the network size N. The direct implication of this finding is that, from these basic levels of required information, the average value H(k) can be represented as a function of the node degree k. Finally, by analyzing subway networks in space P, we reveal the existing service modularity among subway routes using a rescaled expression of S.

Physica A 390 (2011) 374–386

Josep Barberillo (a), Joan Saldaña (b,∗)

(a) Institute of Transportation Studies, University of California, Irvine, CA 92697-3600, United States
(b) Departament d'Informàtica i Matemàtica Aplicada, Universitat de Girona, Campus de Montilivi, Girona E-17071, Spain

Article history: Received 9 June 2010; received in revised form 7 September 2010; available online 16 October 2010.

Keywords: Complex networks; Navigation; Route modularity; Search information

© 2010 Elsevier B.V. All rights reserved.

1. Introduction and overview of the study

Easily accessible and accurate network information is a key to the solution of congestion and inefficiency problems in a complex transportation system [1,2].
In a subway transportation system, users interact with the subway network day to day. A typical subway trip involves many steps: boarding the vehicle at an origin, making the necessary transfers from one route to another, and continuing the process until the final destination is reached. The computational cost of performing these operations is high. In other words, subway travelers expend a great deal of processing capacity in making their travel decisions, since a certain amount of information is needed to perform a trip properly [3]. The choices of the whole population determine congestion and other characteristics of the network. Moreover, the trip quality experienced by tripmakers influences their own future decisions. Therefore, the performance of the overall transportation system is closely related to users' behavior, the available information (see Fig. 1), and the transportation network's structure and information exchange efficiency [4]. It is also known that the decision maker has bounded rationality, since he/she has limited capabilities for gathering and processing information [5], for example when extracting useful information from a subway map without being overloaded [6] by the information roughly presented on it.

Fig. 1. Users' future behavior.

In the literature, there are many studies on the complex networks present in modern engineering systems [7–10], such as the Internet [11] and food webs [12], among others. Other studies specifically address public transport networks as complex systems [13–19], and some focus on subway networks [3,20–22]. In Refs. [23–25] the authors approached public transport network analysis from an informational perspective. In these and many other studies, however, the approach is generally that of the network rather than that of end-user mobility. We present a systematic performance analysis of different complex subway networks in which the information needed to navigate them (a cost/efficiency trade-off measure) plays a key role, in a scenario where all users follow the utility maximization principle and their behavior is strongly determined by the network structure. Our approach hypothesizes that the key to good tripmaker mobility is access to the minimum and optimum amount of information required to make a trip from an origin to a specific destination. In reality, due to habits [26,27], subway network complexity and other constraints (e.g. not all tripmakers have access to full information about the transportation system), users' decisions are uncertain and tend to be inefficient. The tools developed and the insights gained could help show how such an informational approach allows transportation planners to carry out systematic studies aimed at designing and building more efficient and better conceived public transportation networks [28,29].

∗ Corresponding author. Tel.: +34 972 418837; fax: +34 972 418792. E-mail address: jsaldana@ima.udg.edu (J. Saldaña). doi:10.1016/j.physa.2010.09.017

2. The framework

When there is uncertainty in a route choice decision, it can usually be reduced by acquiring information [30,31]. Information is indeed a measure of the decrease of uncertainty (see Fig. 2). In our context, it can be understood as the average number of yes/no questions that must be asked in order to find the correct link connecting the current station to the next station on the path to a target node. Individuals (users) also have different knowledge of the network [32] (heterogeneity of the system), which makes it complicated to predict the users' response and, consequently, its impact on system performance.

Fig. 2. The meaning of uncertainty-based information and information gain.
At this point, we introduce a figure explaining how, in our framework, aggregate behavior results from individual decisions (see Fig. 3), decisions that are directly related to the quantity of information needed to make a trip, which is itself a product of the specific network structure.

Fig. 3. System level versus individual level.

From an information-theoretic perspective, Shannon's entropy measure [31,33] is an ideal choice for quantifying the cost of acquiring information (roughly speaking, the number of yes/no questions needed to reduce to zero the uncertainty associated with a trip). In other words, this measure gives a weight to the cost (e.g., in number of bits) of processing the uncertainty associated with user mobility and location management when making a specific trip. Higher levels of information to be managed by the tripmaker when facing a decision mean lower system efficiency, because they affect the three stages of the user decision process [34]: the information phase, the deliberation phase and the implementation phase. Our approach could be used to better understand the network structure, mobility, and the influence of the communicative process on the users' response, by making a graph-based quantitative navigational evaluation that incorporates indicators defined in the complex networks literature, such as access and hide information [25,35,36]. As in the Web space [4], the transportation network should be a space designed to let information flow and to create opportunities for cooperation and collaboration that reduce congestion levels and improve service levels.

3. Subway network study

3.1. Network representation

We have based our study on networks of different sizes because this allows us to identify patterns that result from specific network characteristics (e.g. common geographical/geopolitical constraints). Table 1 gives the street layout and network patterns of the different cities.

Table 1. NYC, Paris, Barcelona and Moscow street layout.

| | New York City | Paris | Barcelona | Moscow |
| --- | --- | --- | --- | --- |
| Street layout | Grid | Diagonal-radial-irregular | Grid-irregular | Radial-circumferential |
| Urban planning | Commissioners' plan (1811) | Haussmann plan (1852) | Ildefons Cerdà plan (1859) | |

The first step is to collect a database (adjacency lists) from the subway networks being studied: New York City (NYC), Moscow, Paris and Barcelona. While NYC has one of the largest population concentrations in the world, Barcelona is a prime harbor location on the Mediterranean. Nowadays it is difficult to discover any special geographic reason for the choice of Moscow and Paris as specific locations, but it is considered that route intersections and river crossings caused their initial development [37].

Fig. 4. Partial view of a subway network: a shared parallel track example.

The second step is to create the corresponding unweighted and undirected graph representation space. The ideas of space L and space P are proposed in general terms in Ref. [13] and used in many other studies [16,17,20,38–40]. The first topology (space L) (see Table 2) consists of vertices representing stations and edges representing the stretch of tunnel that joins physically adjacent stations, indicating that at least one route serves the two consecutive stations. No multiple links are allowed in either network representation, and the distance between stations equals the number of stops from one to the other.

Table 2. Space L network indicators: N (number of nodes), M (number of links).

| | New York City | Paris | Barcelona | Moscow |
| --- | --- | --- | --- | --- |
| N | 496 | 300 | 203 | 183 |
| M | 605 | 353 | 237 | 215 |
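As a rough illustration of the space L construction described above, the following Python sketch builds an undirected, unweighted graph from per-route stop sequences and counts its nodes (N) and links (M). The route names, the station labels and the helper `build_space_l` are hypothetical and are not taken from the paper's data; the sketch only shows how a stretch of track shared by two routes yields a single link.

```python
def build_space_l(routes):
    """Return a space-L adjacency map {station: set of adjacent stations}.

    `routes` maps a route name to its ordered list of stops
    (a hypothetical input format chosen for this sketch).
    """
    adjacency = {}
    for stops in routes.values():
        for station in stops:
            adjacency.setdefault(station, set())
        # Consecutive stops on a route are physically adjacent stations.
        for a, b in zip(stops, stops[1:]):
            # Sets ignore repeats, so a track shared by two routes gives one link.
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy network: two made-up routes sharing the stretch B-C.
toy_routes = {
    "route_1": ["A", "B", "C", "D"],
    "route_2": ["E", "B", "C", "F"],
}
space_l = build_space_l(toy_routes)
n_nodes = len(space_l)                                      # N
n_links = sum(len(nbrs) for nbrs in space_l.values()) // 2  # M
print(n_nodes, n_links)
```

In this toy case the shared stretch B-C is stored once, so the six stations are joined by five links.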
It is interesting to note that, as has been observed elsewhere [3], subway systems grow by adding new routes to the network or lengthening existing ones rather than by attaching single nodes at different locations of the network. Therefore, under the representation in space L, such networks have a high frequency of stations with two connections due to the strong spatial constraints; this is, indeed, one of the main reasons why highly connected nodes tend not to be connected with each other. Similarly, transfer stations of two or more lines usually have even degrees, and nodes with odd degrees correspond to stations where a line begins or ends while sharing a track with other lines (see Fig. 4) [3]. These features of subway networks imply that the average clustering coefficient in space L (the probability that two neighbors of a node are also neighbors of each other) is very low [8,41,42]. As usual, this coefficient must be understood as a measure of the local cohesiveness of the network.

In space P, nodes are the same as in the previous topology, but here an edge between two nodes means that one can travel from one station to the other without any line or train change (see Fig. 5). Thus, the distance in number of links between stations is equal to "the number of line changes + 1" that a tripmaker must make in order to reach one station from his/her starting location. This characteristic will help us when carrying out what we define as a "route modularity study": while in space L the different existing lines are invisible (impossible to distinguish from each other), working in space P allows us to focus specifically on evaluating the existing modularity among railroad tracks. In this space, the representation of subway networks shows a very high level of clustering [16,38], with values of the clustering coefficient close to 0.9.

Fig. 5. Representation of a theoretical 2-routes network in space L (top) and space P (bottom).

3.2. Performance measures

To characterize the ease or difficulty of navigation in subway networks, we use the concept of "search information" (S), introduced in Refs. [25,36]. When traveling randomly through a network, the probability of taking a specific exit (link) from a node of degree k is 1/k. In this case, the information needed for locating a given exit is the number of yes/no questions needed to guess the correct link. When links are ordered in such a way that each question reduces the number of possible links by a factor of 2, this number of questions for a node of degree k is equal to $\log_2(k)$ (a wayfinding task [43], taking a direct path and not making intentional detours). An example of this situation is that of lines intersecting in a subway (or train) transfer station, where the links emanating from such transfer nodes are ordered (see Ref. [36]). In other words, $\log_2(k_n)$ bits of information are initially needed to correctly start a path from a station n of degree $k_n$ to any station in the network.

For each path path(s, t) from node s to node t, the probability of following it at random is given by Rosvall et al. [36]:

$$P(\mathrm{path}(s,t)) = \frac{1}{k_s} \prod_{j \in \mathrm{path}(s,t)} \frac{1}{k_j - 1},$$

where j runs from the second node on the path to the one before reaching node t, and $1/(k_j - 1)$ is the probability of taking the correct way considering that the arrival link to a specific station is never a possible exit (no backward movements are allowed when traveling along a specific path). The idea is to locate t through any of the (one or many) degenerate shortest paths from s to t.
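A minimal Python sketch of the path probability above, assuming the network is stored as an adjacency map (station to set of neighbours). The function name `path_probability` and the toy graph are illustrative assumptions and do not come from Ref. [36].

```python
import math


def path_probability(adjacency, path):
    """Probability of following `path` (a list of nodes from s to t) at random."""
    source = path[0]
    probability = 1.0 / len(adjacency[source])  # 1/k_s: any exit at the source
    for node in path[1:-1]:
        # At intermediate nodes the arrival link is excluded: 1/(k_j - 1).
        probability /= len(adjacency[node]) - 1
    return probability


# Made-up space-L graph: the line A-B-C-D with branches E-B and C-F.
toy = {
    "A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D", "F"},
    "D": {"C"}, "E": {"B"}, "F": {"C"},
}
p = path_probability(toy, ["A", "B", "C", "D"])
bits = -math.log2(p)
print(p, bits)  # 0.25 2.0
```

Starting at A costs nothing because A has a single exit, while each of the two degree-3 stations B and C contributes one bit (two admissible exits each).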
The total informational value of knowing any one of the degenerate shortest paths between s and t defines the "search information between s and t", which is given by

$$S(s,t) = -\log_2 \sum_{\{\mathrm{path}(s,t)\}} P(\mathrm{path}(s,t)),$$

where the sum runs over the set of degenerate paths {path(s, t)} between s and t. If N is the network size (the total number of stations), the total number of ordered pairs (s, t), s ≠ t, of nodes is N(N − 1). Summing the search information between nodes over all ordered pairs and dividing the sum by the total number of these pairs, we obtain the so-called "average search information" in the network, namely:

$$S = \frac{1}{N(N-1)} \sum_{s} \sum_{t \neq s} S(s,t).$$

S measures the overall difficulty of navigating in the network and is based on the local minimization of the information needed to navigate to a target node (see Ref. [36] for a discussion of its properties and a comparison with other measures). To quantify how difficult it is to find a given node t starting from an arbitrary node in the network, the "hide information of a node t" ($H_t$) is defined as the average pairwise information needed to reach the node t from any other node in the network [36]. Since there are N − 1 nodes different from t, this average is given by

$$H_t = \frac{1}{N-1} \sum_{s \neq t} S(s,t). \tag{1}$$

Similarly, a measure of how good the access to the network is from a given node t is given by the average pairwise information $A_t$ needed to reach any node in the network starting from t (which now appears as the first argument of the search information S):

$$A_t = \frac{1}{N-1} \sum_{s \neq t} S(t,s). \tag{2}$$

This average is called the "access information of a node t". Note that the average access and hide information values over the whole network, namely $A = \frac{1}{N}\sum_t A_t$ and $H = \frac{1}{N}\sum_t H_t$, are always equal to S.

Fig. 6. Elementary step of the randomization procedure.
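The measures defined above, S(s, t), S, $H_t$ and $A_t$, can be sketched in Python as follows. This is one possible implementation under stated assumptions rather than the authors' code: the graph is an adjacency map of a connected network, and the summed probability of all degenerate shortest paths is accumulated with a breadth-first search. The function names and the three-station example are hypothetical.

```python
import math
from collections import deque


def search_information_from(adjacency, s):
    """Return {t: S(s, t)} in bits, summing over all shortest paths from s."""
    dist = {s: 0}
    mass = {s: 1.0}  # summed probability of the shortest paths reaching each node
    queue = deque([s])
    while queue:
        u = queue.popleft()
        # Exits at u: every link at the source, all but the arrival link elsewhere.
        exits = len(adjacency[u]) if u == s else len(adjacency[u]) - 1
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                mass[v] = 0.0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # v lies one step further on a shortest path
                mass[v] += mass[u] / exits
    return {t: -math.log2(p) for t, p in mass.items() if t != s}


def information_measures(adjacency):
    """Return (S, access, hide): the average S plus A_t and H_t for every node."""
    nodes = list(adjacency)
    n = len(nodes)
    pairwise = {s: search_information_from(adjacency, s) for s in nodes}
    access = {t: sum(pairwise[t].values()) / (n - 1) for t in nodes}
    hide = {t: sum(pairwise[s][t] for s in nodes if s != t) / (n - 1) for t in nodes}
    average = sum(access.values()) / n
    return average, access, hide


# Toy line of three stations A - B - C (made-up example, connected graph assumed).
line = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
S, access, hide = information_measures(line)
print(S, access, hide)
```

On this toy line the only non-zero costs are the one-bit choices made when leaving the middle station B, and averaging either the access or the hide values over the three stations gives back S, in line with the identity stated above.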
The spatial location of nodes according to their degree (nodes with degree 1 correspond to route ends, nodes with degree 2 are regular (non-transfer) stations, etc.) is expected to affect their values of $A_t$ and $H_t$ and to induce a noticeable dependence between these information measures and the nodal degree. To reveal the existence of such a relationship, we compute the average values of $A_t$ and $H_t$ over the set of nodes in the network with the same degree k [36]. Precisely, if $N_k$ is the number of nodes with degree k and $k_t$ is the degree of a node t, the average values of A and H over the set $\{t : k_t = k\}$ define the functions of the nodal degree

$$A(k) = \frac{1}{N_k} \sum_{t:\,k_t = k} A_t, \qquad H(k) = \frac{1}{N_k} \sum_{t:\,k_t = k} H_t.$$

3.3. Randomization procedure

The previous measures allow us to analyze how the spatial constraints affecting the development of subway networks are reflected in the network architecture. To this end, we compare the architecture of subway networks to that of similar networks not affected by spatial constraints. The simplest way to do this is to construct randomized versions of the original networks. The randomly rewired networks preserve the degree of every node of the original ones and, moreover, are constructed in such a way that disconnected parts are not permitted [44]. Therefore, both types of networks are similar in the sense that they have the same degree distribution but, in the randomized versions, spatial constraints disappear due to the random reshuffling of links. For these networks, we calculate the corresponding average search information, average access information, and average hide information, here denoted by $S_r$, $A_r$, and $H_r$, respectively.

The iterative algorithm for the random rewiring of the original network consists of first randomly selecting a pair of edges {A, B} and {C, D}. The two edges are then rewired in such a way that A becomes connected to D, while C connects to B (see Fig. 6).
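The elementary rewiring step can be sketched in Python as below. This is a simple, unoptimized illustration and not the procedure of Ref. [44]: rejecting swaps that would create a duplicate link or a self-loop follows the text, whereas enforcing connectivity by undoing any swap that disconnects the graph, the random choice of orientation for the second edge, the fixed seed and all function names are assumptions of this sketch.

```python
import random
from collections import deque


def is_connected(adj):
    """Breadth-first check that every node is reachable from an arbitrary start."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def link(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)


def unlink(adj, u, v):
    adj[u].remove(v)
    adj[v].remove(u)


def rewire(adjacency, attempts, seed=0):
    """Return a degree-preserving, connected randomization of `adjacency`."""
    rng = random.Random(seed)
    adj = {u: set(nbrs) for u, nbrs in adjacency.items()}
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    for _ in range(attempts):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        # Abort if the swap would create a self-loop or a duplicate link.
        if a == d or c == b or d in adj[a] or b in adj[c]:
            continue
        # Swap {A, B}, {C, D} -> {A, D}, {C, B}.
        unlink(adj, a, b)
        unlink(adj, c, d)
        link(adj, a, d)
        link(adj, c, b)
        if is_connected(adj):
            edges[i] = tuple(sorted((a, d)))
            edges[j] = tuple(sorted((c, b)))
        else:
            # Undo swaps that would split the network into disconnected parts.
            unlink(adj, a, d)
            unlink(adj, c, b)
            link(adj, a, b)
            link(adj, c, d)
    return adj


# Made-up test network: a ring of eight stations.
ring = {f"s{i}": {f"s{(i - 1) % 8}", f"s{(i + 1) % 8}"} for i in range(8)}
randomized = rewire(ring, attempts=100)
print(sorted(len(nbrs) for nbrs in randomized.values()))
```

Checking connectivity after every swap costs a full traversal, which is acceptable for networks of a few hundred stations but would need a smarter strategy for much larger graphs.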
This step is aborted if one or both of these new links already exist (which prevents the appearance of multiple edges connecting the same pair of nodes, and of self-loops). Repeated application of this step leads to a randomized version of the original subway network while keeping it globally connected.

4. Results

In all the analyzed subway networks, the average search information is larger than that of their randomized counterparts. In other words, the observation of a universally large S relative to $S_r$ in all subway networks means that the ability to obtain information is more important in these real networks (see Fig. 7). These results are consistent with Ref. [25], where the authors show that, in general, it is more difficult to navigate in the original networks than in their randomized versions. The spatial distribution of the real networks is planar (due to strong geographical constraints [45]), while randomized networks show great homogeneity and integration between different parts because the rewiring gives more flexibility in how they are constructed. In randomized networks, the average access and hide information needed to characterize the overall network operability are independent of the size N, while the topological differences between real and randomized networks increase with N (see Fig. 8). These differences are caused by the geographical constraints (the embedding of the network in a two-dimensional space). Overall, we observe that the Paris and Barcelona networks are more efficiently organized than the NYC subway network, but that both are harder to navigate than the Moscow network or their respective randomized networks. Thus, the ability to obtain information is more important in the NYC subway network. The S value increases linearly with the size of the network N (see Fig.
7) meaning that the communication process gets harder as the network size increases, reflecting the limitations of current subway network design and organization. In a small network like Moscow's, with a well connected circumferential structure (compact network topology), the discrepancy between the real and randomized networks is very small ($S \approx S_r$). Furthermore, Barcelona and Paris have similar compactness levels (due to their particular topology), and the NYC subway network shows large differences between the S and $S_r$ values, reflecting in part its neighborhood community structure (see Table 3).

Fig. 7. Network average search information S over network size N. The straight line corresponds to the linear regression for real networks ($R^2 = 0.98$).

Table 3. Space L network average search information (bits).

| | New York City | Paris | Barcelona | Moscow |
| --- | --- | --- | --- | --- |
| $S$ | 16.43 | 10.69 | 8.99 | 7.34 |
| $S_r$ | 9.07 | 9.05 | 7.81 | 7.31 |
| $S/S_r$ | 1.81 | 1.18 | 1.15 | 1.00 |

To compare networks of different sizes and to compensate for the effect of the mean degree on the value of S, we compute the ratio $S/S_r$ in space L (randomized networks have the same connectivity distribution as the original ones; hence, they have the same mean connectivity). Values of this ratio close to 1 reflect strong compactness of the network architecture (low characteristic path length and low modularity). The ratio increases as the geographical dispersion of the network does. Table 3 shows the results for the analyzed networks, from which we see that NYC's value of $S/S_r$ is almost twice Moscow's value. Moreover, the logarithmic scaling of $S_r$ with the network size N also follows, as claimed in Ref. [36] for several types of random network architectures ($S_r \approx \log_2(N)$ with $R^2 = 0.79$). It is remarkable that the Moscow subway network shows better levels of accessibility than any other network (see Fig. 9). In all the panels of Fig.
8, vertices of degree 1 fall into the upper part of the plot, showing high values of the information needed to find them (hide information), while their level of accessibility to the network remains similar to that of the vertices with degree greater than 1. The same behavior is observed for nodes with degree greater than 1: while the basic level of information needed to find them decreases with their degree (more connections imply less information needed to find them, and vice versa), their accessibility level is independent of their connectivity. The results of the network study suggest a heterogeneous spatial node distribution. This pattern is also observed for the different networks in focus (see Fig. 11), and it can basically be explained by the fact that the vertices with degree 1 are located where a specific line ends/starts. These vertices are difficult for the subway user to reach but in general show very good network accessibility levels. From the network analysis in terms of the station degree, it follows that the basic levels of hide information H needed to find a specific station are fixed by the size of the network N (see Fig. 10). The key point to understand in Fig. 10 is that the randomization procedure affects every node equally (when nodes are categorized by degree), making the network more compact and accessible (lower overall hide and access information values, because shorter paths connecting stations exist) and changing its intrinsic architectural characteristics (e.g., lowering the value of the clustering coefficient). Once the size-related value of H is fixed, the hide information H depends only on the degree k of the station and not on the particular size of the network N. The same behavior was observed in every subway network analyzed, since the slopes of the straight lines are almost equal (see Fig. 11).
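The summary figures quoted earlier in this section (the $S/S_r$ ratios of Table 3, $R^2 = 0.98$ for the linear fit of Fig. 7 and $R^2 = 0.79$ for the logarithmic scaling of $S_r$) can be cross-checked from the tabulated values with a few lines of Python. The sketch assumes that both $R^2$ values come from an ordinary least-squares line with intercept, which the text does not state explicitly; the helper `r_squared` is hypothetical.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Values copied from Table 2 (N) and Table 3 (S, S_r), in the order
# New York City, Paris, Barcelona, Moscow.
sizes = [496, 300, 203, 183]
s_real = [16.43, 10.69, 8.99, 7.34]
s_rand = [9.07, 9.05, 7.81, 7.31]

ratios = [round(s / sr, 2) for s, sr in zip(s_real, s_rand)]
r2_linear = r_squared(sizes, s_real)                        # S against N
r2_log = r_squared([math.log2(n) for n in sizes], s_rand)   # S_r against log2(N)
print(ratios, round(r2_linear, 2), round(r2_log, 2))
```

Under that assumption the script should reproduce the tabulated ratios and both reported $R^2$ values to two decimal places.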
We can conclude that having good levels of hide information depends on the good position of the station inside the subway system, a fact that is closely related to the station's own nodal connectivity.

Fig. 8. Hide information H versus the access information A in space L (panels: New York City, Paris, Barcelona and Moscow; horizontal axis: access information in bits; vertical axis: hide information in bits). The analysis shows a correlated increase between these two parameters. Furthermore, these graphics show a comparison between real (in blue) and randomized (in green) networks.

4.1. Space P: route modularity

To analyze the route modularity of the subway networks we compare their topology in space P. Precisely, we measure the route modularity of a subway network of size N by $S/\log_2(N-1)$. Note that if only a single route serves a whole network of size N, then the average search information in space P is $S = \log_2(N-1)$, because all nodes are fully connected and, hence, all paths have length one. By contrast, if the same network has two (or more) railroad tracks, the network modularity in this space increases and the value of S is always larger. In particular, if a network of size N has two routes of sizes $N_1$ and $N_2$, respectively, with only one node in common (see, for instance, Fig.
5), then $N = N_1 + N_2 - 1$ and it can be shown that the average search information S attains its maximum value at $N_1 = N_2 = (N+1)/2$ (i.e., when both routes have the same number of stations) and decreases as the difference between route sizes increases, until the limit case in which $N_1$ (or $N_2$) equals 1 (see Appendix for details). Therefore, $S = \log_2(N-1)$ is the minimum value of S for a fixed network size N, and we will use it to scale the S value in space P and analyze the network modularity.

The results show that there is strong modularity among routes in the NYC network (high values of S and $S/\log_2(N-1)$). The S values of the Barcelona and Paris networks are quite similar although their sizes differ (the Paris network is 1.5 times the size of the Barcelona network). However, when comparing values of S scaled by the minimum possible modularity for a specific network size N in space P (equivalent to having no modularity), the ranking of the networks becomes quite different (see Table 4). In particular, despite its smaller size, the Barcelona integrated network (studied in more detail in the next section) shows greater levels of modularity among routes than the Paris network. The same reasoning applies to the route modularity of the Moscow subway network with regard to the Paris subway network.

Fig. 9. Hide information versus access information in space L. Straight lines correspond to the linear regression of the hide information average value as a function of degree k. Circles are colored according to the node degree.

Fig. 10. The real–random hide ratio $H(k)/H_r(k)$ in space L. Different networks show similar qualitative behavior.

Fig. 11. Hide information averaged over nodes with the same degree k in space L.
Straight lines correspond to the linear regression of the hide information average value H(k) as a function of degree k for the four chosen cities.

Table 4. Space P network average search information (bits). No modularity among routes: $S_{\min} = \log_2(N-1)$.

| | New York City | Paris | Barcelona | Moscow |
| --- | --- | --- | --- | --- |
| $S$ | 13.58 | 11.43 | 11.19 | 10.74 |
| $S/\log_2(N-1)$ | 1.52 | 1.39 | 1.46 | 1.43 |

4.2. A case study: the Barcelona integrated network

The Barcelona integrated network is operated by three different companies: Transports Metropolitans de Barcelona (TMB), Ferrocarrils de la Generalitat de Catalunya (FGC) and Tramvia de Barcelona (TRAM). This example allows us to study the connectivity among different railroad tracks using space P to represent the integrated network. Thus, by plotting one single stand-alone station to represent each line (not a transfer station where passengers change from one route to another), we have drawn the navigational relationship among the lines (see Fig. 12). The space P representation has proven useful because it allows us to carry out a route-based study (not only a station-based network analysis as in space L). Analyzing the graph of all routes represented together, we take a closer look at the global network service structure. The Barcelona integrated network shows poor levels of overall communication between railroad tracks. Compared with the others, TMB lines have good levels of accessibility and are easily reachable (low hide information value). Therefore, depending on which company provides the service, the different railway lines present huge variations in navigability (there is strong modularity in the service). In other words, a community structure is detected in space P as a result of the existence of different companies operating in the integrated network (see Fig. 12). The direct implication of the results is that there are few multiple crossing routes or shared lines between different companies.
On the other hand, the existence of an outlier at the top of the plot in Fig. 12 is explained as follows. The point belongs to line 11 (L11) of the integrated subway system. Despite being operated by TMB, this line shows surprisingly high levels of information needed to locate it (its hide information value is 15.4 bits) and high values of access information (12.7 bits) as well. This line is characterized by having only one node connecting it to the remaining lines of the network (a single transfer station that links L11 and line 4 (L4)); at the same time, it is the shortest line, composed of only 5 stations. In other words, L11 is a dead-end line. By contrast, L4 has 21 stations, being one of the longest lines in the integrated network, and its access and hide information values are, respectively, 10.6 bits and 11 bits.

Fig. 12. Hide information versus access information in space P for each line considered (selecting one single node to represent each route), according to the operating company (TMB, TRAM and FGC). Three different communities (enclosed by the ellipses) can be observed, as well as one TMB outlier line located in the upper part of the plot showing poor levels of overall communication (see the main text for an explanation).

From this information, extracted directly from Fig. 12 (and due to the special characteristics of this part of the network), it follows that we can directly compute approximate access and hide information values for L11 from those of L4. The idea is quite simple and takes into account the fact that every trip entering or leaving L11 must pass through L4. From the expression given by Eq. (1) and the fact that $H_4 = 11$ bits, we have that the average probability of finding L4 is 1/2048. Then, knowing that in space P the degree of the station plotted in Fig.
12 representing L4 is k = 21, a straightforward calculation shows that an approximate hide information value for L11 is

$$H_{11} \cong -\log_2\left(\frac{1}{21} \cdot \frac{1}{2048}\right) \cong 15.4 \text{ bits}.$$

Once located on L4, a subway user traveling to L11 has to choose among 21 different options to locate the correct exit that leads him/her to L11. That is the reason why the hide information value of L11 increases dramatically. Following the same principle but now using Eq. (2), it follows that the average probability of locating any other specific line departing from L4 is 1/1552. Departing from L11, the traveler has to choose among 4 different exits (k = 4), with probability 1/4 of successfully locating the target line L4, as a first necessary step to find any other line when navigating the integrated network. Thus, the approximate value of access information for L11 is

$$A_{11} \cong -\log_2\left(\frac{1}{4} \cdot \frac{1}{1552}\right) \cong 12.6 \text{ bits}.$$

These rough values agree closely with the corresponding values in Fig. 12. With these results we would like to highlight the influence of route modularity on access and hide information values.

5. Discussion

Our research is one example of how information quantification may be included in a transportation network study. In this paper we have presented an empirical study of different subway networks under different representations (space L and space P) of network topology. Subway networks are characterized by significant shared track for many lines, because pre-existing surface transport used to have one single original route passing through the main stations (see Fig. 4), and by strong geographical constraints, leading to similar topological and informational characteristics among them.
In this sense, it is remarkable that the average search information of the analyzed subway networks increases linearly with the network size, in contrast to what happens in random networks, where this information scales as the logarithm of the network size [36]. The results also suggest a heterogeneous spatial distribution of nodes in each network: vertices with small connectivity are difficult for the subway user to reach, although in general each station shows fairly good levels of network accessibility. Another interesting finding is that the basic level of hide information H needed to find a specific station in a subway network increases with the network size N. It is clearly more difficult to navigate in a larger and modular subway network like that of New York City than to navigate either in a smaller one or in the respective randomized counterpart. From these basic levels of required information, H can be represented as a linearly decreasing function of the node degree k.

Fig. 13. Representation of paths of length 1 (top) and 2 (bottom) of a network with two routes in space P. There is one transfer station (node 4) and the route sizes are N1 = N2 = 5. Any shortest path of length 2 must pass through node 4.

An innovative approach for studying the network community structure using the space P has also been introduced. The representation of subway networks in this space allows us to compare the route modularity among networks by using the ratio of the average search information in real networks to that of a fully connected network (i.e., to that of a subway network with only one route). The same representation is used to analyze the global service structure in an integrated network with different company administrators.
This analysis leads to a clear conclusion: because of the massive investment needed to build integrated rail network systems, it is important to plan them effectively, so as to maintain good cooperation among the different parts of the subway network and to avoid poor levels of overall communication between them (such as those obtained for the Barcelona integrated network).

Acknowledgements

This work has been partially supported by the project FP6-2003-NEST-PATH-1 named "Unifying Networks for Science and Society" of the Sixth European Framework Programme (JB and JS), the research project MTM2008-06349-C03-02 of the Spanish government, the project 2009SGR-345 of the Generalitat de Catalunya (JS), a Caja Madrid Foundation grant, and a Balsells fellowship (JB).

Appendix

In the simple case of a subway network with only two routes, only paths of length one and two are possible in space P. Therefore, $S = (S_1 + S_2)/(N(N-1))$, with $S_1$ and $S_2$ being the contributions to S of paths of length 1 and 2, respectively. Note that if $N_1$ and $N_2$ are the route sizes, then $N = N_1 + N_2 - 1$ (the transfer station belongs to both routes and would otherwise be counted twice).

A path of length 1 starts either at a regular station or at the transfer station. In the first case, the probability of randomly following a given path of length 1 of route i is $1/(N_i - 1)$, i = 1, 2, because no path of length 1 is possible between regular stations of different routes (top of Fig. 13). Since there are $N_i - 1$ paths of length 1 from a given regular station of route i and each route has $N_i - 1$ regular stations, the total number of such paths in route i is $(N_i - 1)^2$. On the other hand, the probability of following a given path departing from the transfer station is $1/(N - 1)$ because there are $N - 1$ regular stations (top of Fig. 13). Therefore, the contribution to S of paths of length 1 is given by

\[ S_1 = (N_1 - 1)^2 \log_2(N_1 - 1) + (N_2 - 1)^2 \log_2(N_2 - 1) + (N - 1) \log_2(N - 1). \]
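The counting argument for paths of length 1 can be checked by brute force. The sketch below builds the space-P graph of two routes sharing one transfer station and sums, over every ordered pair of adjacent stations, the information needed to pick the right neighbor. The function names and the node labelling (node 0 as the transfer station) are assumptions made for this illustration; the closed form assumes both routes have at least two stations.

```python
import math
from itertools import combinations

def two_route_space_p(n1, n2):
    """Adjacency sets of the space-P graph for two routes of sizes n1 and n2.

    Assumed labelling: node 0 is the transfer station, shared by both routes.
    In space P, every pair of stations on the same route is linked.
    """
    route1 = [0] + list(range(1, n1))
    route2 = [0] + list(range(n1, n1 + n2 - 1))
    adj = {v: set() for v in range(n1 + n2 - 1)}
    for route in (route1, route2):
        for u, v in combinations(route, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def s1_brute_force(adj):
    """Sum of -log2(1/k_s) over all ordered pairs (s, t) of adjacent stations."""
    total = 0.0
    for s, neighbours in adj.items():
        for _t in neighbours:
            total += -math.log2(1.0 / len(neighbours))
    return total

def s1_closed_form(n1, n2):
    """Closed-form contribution of paths of length 1 (requires n1, n2 >= 2)."""
    n = n1 + n2 - 1
    return ((n1 - 1) ** 2 * math.log2(n1 - 1)
            + (n2 - 1) ** 2 * math.log2(n2 - 1)
            + (n - 1) * math.log2(n - 1))

# Example of Fig. 13: two routes of five stations each (N = 9).
print(s1_brute_force(two_route_space_p(5, 5)), s1_closed_form(5, 5))
```

For the network of Fig. 13 both evaluations give 88 bits: 64 bits from the eight regular stations (degree 4) and 24 bits from the transfer station (degree 8).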
Because (shortest) paths of length 2 always connect regular stations of different routes through the transfer station (bottom of Fig. 13), the probability that a random path of length 2 goes from a given departure station of route 1 to a given arrival station of route 2 is equal to the probability that the departure station connects to the transfer station, $1/(N_1 - 1)$, times the probability that the transfer station connects to the arrival station (backward step not allowed), $1/(N - 2)$. Since the total number of paths of length 2 from route 1 to route 2 is $(N_1 - 1)(N_2 - 1)$, and it is equal to the number of paths in the opposite direction, the contribution to S of paths of length 2 is then given by

\[ S_2 = (N_1 - 1)(N_2 - 1)\left(\log_2(N_1 - 1) + 2 \log_2(N - 2) + \log_2(N_2 - 1)\right). \]

Taking S as a function of $N_1$ (or $N_2$) and using $N_2 - 1 = N - N_1$, the expressions of $S_1$ and $S_2$ give $S_1 + S_2 = F(N_1) + (N - 1)\log_2(N - 1)$. Hence, for a fixed N, the average search information S attains its maximum at the same value of $N_1$ as does the function

\[ F(N_1) = (N - 1)\left((N_1 - 1) \log_2(N_1 - 1) + (N - N_1) \log_2(N - N_1)\right) + 2 (N_1 - 1)(N - N_1) \log_2(N - 2) \]

with $N_1 = 1, \ldots, N$. The second term of this expression attains its maximum at $N_1 = (N + 1)/2$ (the routes have the same size) and vanishes at $N_1 = 1$ and $N_1 = N$ (only one route is present), where S reduces to $\log_2(N - 1)$. Accordingly, for all but very small N, $\log_2(N - 1)$ is the minimum value of S for $1 \le N_i \le N$ (i = 1, 2) with $N_1 + N_2 - 1 = N$.

References

[1] M. Manheim, Fundamentals of Transportation Systems Analysis Basic Concepts, MIT Press, Cambridge, Massachusetts, 1979.
[2] E. Cascetta, Transportation Systems Engineering: Theory and Methods, Kluwer Academic Publ., Dordrecht, Boston, London, 2001.
[3] P. Angeloudis, D. Fisk, Physica A 367 (2006) 553–558.
[4] B.-L. Tim, H. Wendy, A.H. James, O.H. Kieron, S. Nigel, J.W. Daniel, Foundations and Trends in Web Science 1 (2006) 1–130.
[5] H.A. Simon, Models of Bounded Rationality, MIT Press, Cambridge, Massachusetts, 1982.
[6] M. Ben-Akiva, A. De Palma, K. Isam, Transportation Research Part A: General 25 (1991) 251–266.
[7] M.E.J. Newman, J. Park, Physical Review E 68 (2003) 036122.
[8] R. Albert, A.L. Barabasi, Reviews of Modern Physics 74 (2002) 47–97.
[9] M. Buchanan, Nexus: Small Worlds and the Groundbreaking Science of Networks, W.W. Norton & Company, New York, London, 2002.
[10] D.J. Watts, Six Degrees: The Science of a Connected Age, Norton, New York, London, 2003.
[11] R. Albert, H. Jeong, A.-L. Barabasi, Physica A 272 (1999) 173–187.
[12] J.M. Montoya, S.L. Pimm, R.V. Solé, Nature 442 (2006) 259–264.
[13] P. Sen, S. Dasgupta, A. Chatterjee, P.A. Sreeram, G. Mukherjee, S.S. Manna, Physical Review E 67 (2003) 036106.
[14] Y.-Z. Chen, N. Li, D.-R. He, Physica A 376 (2007) 747–754.
[15] C. von Ferber, T. Holovatch, Y. Holovatch, V. Palchykov, Modeling metropolis public transport, in: Traffic and Granular Flow '07, Springer Verlag, 2009, pp. 709–719.
[16] J. Sienkiewicz, J.A. Holyst, Physical Review E 72 (2005) 046127.
[17] X. Xu, J. Hu, F. Liu, L. Liu, Physica A 374 (2007) 441–448.
[18] D.O. Cajueiro, Physical Review E 79 (2009) 046103.
[19] W. Li, X. Cai, Physica A 382 (2007) 693–703.
[20] V. Latora, M. Marchiori, Physica A 314 (2002) 109–113.
[21] I. Vragovich, E. Louis, A. Diaz-Guilera, Physical Review E 71 (2005) 036122.
[22] D. Gattuso, E. Miriello, Networks and Spatial Economics 5 (2005) 395–414.
[23] S.N. Dorogovtsev, J.F.F. Mendes, Evolution of Networks: From Biological Nets to the Internet and WWW, Oxford University Press, Oxford, 2003.
[24] M. Rosvall, P. Minnhagen, K. Sneppen, Physical Review E 71 (2005) 066111.
[25] M. Rosvall, A. Trusina, P. Minnhagen, K. Sneppen, Physical Review Letters 94 (2005) 028701.
[26] T. Gärling, K. Axhausen, Transportation 30 (2003) 1–11.
[27] K.W. Axhausen, A. Zimmermann, S. Schönfelder, G. Rindsfüser, T. Haupt, Transportation 29 (2002) 95–124.
[28] S.C. Wirasinghe, U. Vandebona, Transportation and traffic theory, in: Proceedings of the 14th International Symposium on Transportation and Traffic Theory, Jerusalem, Israel, 1999.
[29] U. Vandebona, 28th Conference of Australian Institutes of Transport Research, School of Civil and Environmental Engineering, UNSW, Australia, 2006.
[30] G.J. Klir, M.J. Wierman, Uncertainty-Based Information: Elements of Generalized Information Theory, Physica-Verlag, Heidelberg, New York, 1998.
[31] C.E. Shannon, W. Weaver, The Mathematical Theory of Communication, The University of Illinois Press, Chicago, 1949.
[32] K.J. Arrow, Information and Economic Behavior, Harvard Univ., Cambridge, Massachusetts, Defense Technical Information Center, Ft. Belvoir, 1973.
[33] T.M. Cover, J.A. Thomas, Elements of Information Theory, John Wiley, New York, 1991.
[34] B. Walliser, Cognitive Economics, Springer, Berlin, 2008.
[35] K. Sneppen, A. Trusina, M. Rosvall, Europhysics Letters 69 (2005) 853–859.
[36] M. Rosvall, A. Grönlund, P. Minnhagen, K. Sneppen, Physical Review E 72 (2005) 046117.
[37] V.R. Vuchic, Urban Transit: Systems and Technology, J. Wiley & Sons, Hoboken, NJ, 2007.
[38] C. von Ferber, T. Holovatch, Y. Holovatch, V. Palchykov, Physica A 380 (2007) 585–591.
[39] X. Xu, J. Hu, F. Liu, Chaos 17 (2007) 023129.
[40] Y. Hu, D. Zhu, Physica A 388 (2009) 2061–2071.
[41] A. Barrat, M. Barthelemy, R. Pastor-Satorras, A. Vespignani, G. Parisi, Proc. Natl. Acad. Sci. USA 101 (2004) 3747–3752.
[42] M.E.J. Newman, S.H. Strogatz, D.J. Watts, Physical Review E 64 (2001) 026118.
[43] S. Timpf, Networks and Spatial Economics 2 (2002) 9–33.
[44] S. Maslov, K. Sneppen, Science 296 (2002) 910–913.
[45] M. Barthelemy, A. Barrat, A. Vespignani, Advances in Complex Systems 10 (2007) 5–28.