
Communication Bottlenecks in Scale-Free Networks

2006, Computing Research Repository


Sameet Sreenivasan,1,2 Reuven Cohen,3 Eduardo López,4 Zoltán Toroczkai,2,∗ and H. Eugene Stanley1
1 Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215
2 Center for Nonlinear Studies, Los Alamos National Laboratory, MS B258, Los Alamos, NM 87545
3 Laboratory of Networking and Information Systems and Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
4 Theoretical Division, Los Alamos National Laboratory, MS B258, Los Alamos, NM 87545
∗ Department of Physics, University of Notre Dame, Notre Dame, IN 46556 (after Jun. 1 2006)
arXiv:cs/0604023v1 [cs.NI] 6 Apr 2006 (Dated: February 1, 2008)
We consider the effects of network topology on the optimality of packet routing, quantified by γc, the rate of packet insertion beyond which congestion and queue growth occur. The key result of this paper is to show that for any network there exists an absolute upper bound, expressed in terms of vertex separators, for the scaling of γc with network size N, irrespective of the routing algorithm used. We then derive an estimate of this upper bound for scale-free networks, and introduce a novel static routing protocol which is superior to shortest path routing under intense packet insertion rates.
PACS numbers: 89.75.Hc, 89.20.Hh, 89.75.Da
Communication has entered a new era with the advent of the Internet, which makes possible virtually instantaneous information exchange across the globe between any two people who have access to it. Broadcast and advertising messages, homepages, blogs and practically any information posted on the WWW are within reach of anyone who accesses, and thus downloads, those pages. This activity, which has grown exponentially over the past years, involves an enormous amount of information stored and transmitted through the physical infrastructure of the Internet every second of the day.
As the number of computers and users grows into the billions, one might naturally ask about the ultimate limits to using the Internet. In terms of transmission latency, the Internet already performs well. As an illustration, consider the distance between Los Alamos and Boston (as the crow flies), which is about 3109 km. The speed of light in fiber is about 2/3 of that in vacuum, about 2 × 10^5 km/s. Thus the minimal round-trip time for information between Los Alamos and Boston is about 31 ms. A ping from a Los Alamos computer to a computer at Boston University gives a round-trip time of about 64 ms, only about twice the absolute physical bound. Therefore, no order-of-magnitude improvement in transmission latency can be expected for the Internet. The current paradigm for communication on networks is packet switching: the message is divided into packets, which are routed between nodes over data links independently of each other and reassembled at the destination into the original message. This decentralized methodology makes information transmission efficient by providing better utilization of the available bandwidth (a single link can be used to transmit any packet). However, due to the increasing demand for information carried through the Internet, delays can occur in packet delivery, caused mainly by device (end-user and router) latency. Device latency is the amount of time τ that a device needs to process a single packet. Although device latency keeps improving, it is a physical constraint and can never be eliminated completely. Since more packets may arrive at a node than it can process per unit time, queues can accumulate, and thus routers must have storage capacity as well. These queues naturally slow down information transport over the network.
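The latency estimate above is simple arithmetic; the sketch below reproduces it. The function name `round_trip_bound_ms` is hypothetical, and the default signal speed is the 2 × 10^5 km/s fiber value quoted in the text.

```python
def round_trip_bound_ms(distance_km, signal_speed_km_s=2e5):
    """Physical lower bound on round-trip time: out and back at the signal speed, in ms."""
    return 2 * distance_km / signal_speed_km_s * 1000


bound_ms = round_trip_bound_ms(3109)  # Los Alamos to Boston, as the crow flies
measured_ms = 64                      # ping time quoted in the text
print(f"bound = {bound_ms:.1f} ms, measured/bound = {measured_ms / bound_ms:.2f}")
```

This prints `bound = 31.1 ms, measured/bound = 2.06`, i.e. the measured ping is roughly twice the propagation bound.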
Interestingly, the US Postal Service can achieve higher information transmission rates than the current Internet. For example, over a T1 line, which transmits at about 1.544 Mbit/s, downloading a 4.7 GB DVD takes about 6.76 hours. Shipping 1000 DVDs from coast to coast in the US takes about 3 days, yet the effective bandwidth of this shipment matches that of 94 T1 lines. Precisely this fact is exploited by DVD rental delivery companies like Netflix, which distributes about 1.5 Terabytes of data per day, the same order of magnitude as the Internet [1]. In spite of technological advances, the Internet is being driven ever closer to its capacity. These facts lead us to two important questions: (1) How can one characterize the ultimate carrying capacity of a packet-switched communication network? (2) Which routing algorithms achieve this ultimate capacity? In this Letter we present a proof-of-principle study showing that the ultimate carrying capacity is strongly influenced by the network's structure. We demonstrate the existence of an upper bound γT, determined solely by topology, on the congestion threshold γc [2], the packet insertion rate at which queuing and congestion appear in the network. It has been conjectured that the degree distribution of the Internet follows a power law on several levels [3, 4, 5], and recent experimental studies have strengthened the validity of this conjecture [6, 7]. For our study, we confine ourselves to the configuration model (CM) [8], one of the simplest models for generating a random graph with a power-law degree distribution. The approach presented here is, however, applicable to arbitrary graph structures. We measure all time scales in units of the router latency τ, which for simplicity we take to be unity. We also assume that routers have infinite storage capacity.
The Static Routing Problem.
Denote by G(V, E) the physical substrate graph (network) for communication, which we assume to be singly connected. Once a packet entering at node s reaches its destination node d, it disappears from the system. The sequence of nodes and edges the packet visits constitutes the route for that source-destination pair. For a network of size N, the routing problem consists of assigning routes to all N(N − 1)/2 pairs of nodes. We call such an assignment a Static Routing Protocol (SRP). We consider a previously studied [2, 9, 10, 11] model of communication, motivated by the need to study congestion on the router-level Internet. Here, packet transmission is modeled by a discrete-time parallel update algorithm. At each time step t, a packet enters at every node with probability 0 ≤ γ ≤ 1. The packet's destination node is chosen uniformly at random from the remaining N − 1 nodes. Every node i takes the set of all packets sent to it by its neighbors in the previous step, removes from this set the newly arrived packets whose destination is i, adds the freshly injected packet (if there is one), and finally places the elements of this set in a sub-queue in random order. This randomization is needed because times are not resolved below the single-packet processing timescale τ. The sub-queue is then appended to the queue left over from before the t-th step, if there is one. The top packet in the queue is then sent to a neighbor on G following the SRP. There is a critical packet creation rate γc at which congestion sets in, i.e., above γc packets start accumulating on the network [9, 11]; this rate is commonly called the "congestion threshold". In Fig. 3 we show a rescaled version [9] of the steady-state packet growth rate, θ(γ) ≡ lim_{t→∞} [n(t + Δt) − n(t)] / (N γ Δt), as a function of γ for both the shortest path (SP) protocol and the novel protocol proposed in this paper.
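The update rule above can be simulated directly. The following is a minimal sketch, not the authors' code: the names `sp_next_hop`, `simulate` and `theta` are hypothetical, SP routes are fixed by one breadth-first-search tree per destination with ties broken by the lowest node label (an assumption; the text does not say how ties are resolved), and n(t) counts queued packets plus packets in transit.

```python
import random
from collections import deque


def sp_next_hop(adj):
    """Static shortest-path (SP) routing table.

    next_hop[d][u] is the neighbour of u on a fixed shortest path from u to d.
    Ties between equally short paths are broken by the lowest node label.
    """
    table = {}
    for d in adj:
        dist = {d: 0}
        frontier = deque([d])
        while frontier:  # breadth-first search outward from the destination d
            u = frontier.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    frontier.append(v)
        table[d] = {u: min(v for v in adj[u] if dist[v] == dist[u] - 1)
                    for u in adj if u != d}
    return table


def simulate(adj, next_hop, gamma, steps, seed=0):
    """Run the parallel-update packet model; return n(t) after every step."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    queues = {i: deque() for i in nodes}
    arriving = {i: [] for i in nodes}  # packets (stored as destinations) sent to i last step
    history = []
    for _ in range(steps):
        sent = {i: [] for i in nodes}
        for i in nodes:
            # Packets whose destination is i leave the system.
            batch = [d for d in arriving[i] if d != i]
            # With probability gamma inject one packet, destination uniform over the other N - 1 nodes.
            if rng.random() < gamma:
                batch.append(rng.choice([j for j in nodes if j != i]))
            rng.shuffle(batch)       # random order within the sub-queue
            queues[i].extend(batch)  # append the sub-queue to the existing queue
            if queues[i]:            # unit latency: forward only the top packet
                d = queues[i].popleft()
                sent[next_hop[d][i]].append(d)
        arriving = sent
        history.append(sum(map(len, queues.values())) + sum(map(len, arriving.values())))
    return history


def theta(history, n_nodes, gamma, window):
    """Rescaled growth rate [n(t + dt) - n(t)] / (N gamma dt) over the last `window` steps."""
    return (history[-1] - history[-1 - window]) / (n_nodes * gamma * window)
```

As a usage example, on a star with ten leaves (N = 11) every leaf-to-leaf packet crosses the hub, whose load in this sketch is γ(N − 1) packets per step, so θ stays near zero for γ = 0.05 and becomes clearly positive for γ = 0.3.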
Here n(t) is the number of packets on the network at time t. This threshold can be expressed in terms of the maximal node betweenness B for a given SRP. The betweenness b of a node is the number of SRP routes passing through that node. The highest of the N betweenness values (one for each node) resulting from the SRP is the maximal node betweenness B. For a given SRP route between a source s and a destination d, the average packet current injected at the source s is γ/(N − 1). For a node with betweenness b, the average packet inflow current is therefore bγ/(N − 1). Since packets flow out at unit latency, queueing and congestion first appear at the node for which this quantity reaches unity, namely at the node with b = B. Thus
$$\gamma_c = \frac{N-1}{B}. \qquad (1)$$
[FIG. 1: The relative sizes of the betweenness values introduced in the text (B_T, B_opt, B^HA, B^SP, B_e and 1/Q_{X*}).]
For SP routing [2, 9, 10, 11], the node betweenness is identical to the familiar shortest-path betweenness, B^SP [12]. It follows from Eq. (1) that, for a given routing protocol, the dependence of the congestion threshold γc on N is determined by the scaling of the maximal node betweenness B with N. Therefore, from the point of view of avoiding router congestion, the best routing protocol is the one for which B grows most slowly with N. Although ad hoc adaptive protocols that increase γc have been proposed [13, 14, 15], this issue has not been addressed systematically. Next, we show that there is a lower bound BT ≤ B (and thus γc ≤ γT) induced only by the topology of the network G and independent of the routing protocol used. In other words, no SRP can do better than γT. The bound BT thus quantifies the communication bottleneck of a given graph G. Among all possible SRPs (whose set is denoted P), let Bopt be the smallest maximal betweenness value, namely Bopt = min_{SRP∈P} B^SRP, so that BT ≤ Bopt (Fig. 1). It is an open question whether the topological bound can be achieved by a routing protocol. Similar considerations have been made for edge betweenness in Refs. [16, 17]. Here we focus on the scaling of the bound BT as a function of N.
We introduce BT using graph partitioning arguments. Given an arbitrary network G, partition the set of all nodes V into three non-empty sets, denoted A, X and B. Since G is singly connected, edges must run between at least two of the three possible pairs of sets. Choose X such that no edges run directly between A and B; X is then called a vertex separator. Any SRP must designate a route for every pair of nodes, and therefore also for the pairs in which one node is in A and the other in B. Since X is a separator, all routes from A to B must pass through nodes in X. Therefore, at least |A||B| routes pass through X for any SRP. Since the maximum is always greater than or equal to the average, the maximal betweenness on the nodes in X can be no less than |A||B|/|X|. We define the sparsity [18] of the separator X as the quantity Q_X ≡ |X|/(|A||B|). Thus, associated with every vertex separator X there is a quantity B_X = 1/Q_X providing a lower bound on the maximal betweenness on the nodes in X. Let us denote by M the set of all possible vertex separators in G. If we systematically consider all choices of vertex separators X ∈ M, we find (at least) one separator X* for which B_X = 1/Q_X attains its maximal value, which we define as BT. Thus, the topology of the graph constrains the maximal betweenness to be no less than BT, and for arbitrary routing, B ≥ BT = 1/Q_{X*} = 1/min_{X∈M} Q_X.
Finding minimal-sparsity vertex separators is an NP-hard problem [19], and we shall not deal with it here. Owing to the analytical and computational difficulty of determining BT, we focus on obtaining an analytical estimate Be of BT, and derive its scaling with N for random, uncorrelated, scale-free networks. This estimate, while possibly greater than the true topological bound BT, nevertheless provides a comparative value that depends only on the network topology, and it allows us to quantify the performance of the SP protocol. We start by systematically considering every possible vertex separator in the graph as follows. First, bipartition the graph as shown in Fig. 2 into sets A and B with |A| ≤ |B|. Let c(A) be the subset of nodes in A that are adjacent to at least one node in B, and let c(B) be the subset of nodes in B that are adjacent to at least one node in A. We then obtain either a vertex separator c(A), which separates the sets A \ c(A) and B, or a vertex separator c(B), which separates the sets B \ c(B) and A. Thus, going through all possible bipartitions of the graph with |A| ≤ N/2 ensures that we have considered all possible vertex separators of the graph. If c(A) is chosen as the separator, the sparsity is Q_{c(A)} = |c(A)|/(|A \ c(A)||B|) ≥ |c(A)|/(|A||B|). A similar expression holds for Q_{c(B)} if c(B) is chosen as the vertex separator. Therefore
$$Q_{c(A)} \ge \frac{1}{|B|}\frac{|c(A)|}{|A|} \quad \text{and} \quad Q_{c(B)} \ge \frac{1}{|B|}\frac{|c(B)|}{|A|}. \qquad (2)$$
Since |A| ≤ N/2, we have |B| = O(N), and a lower bound for the sparsity Q_{X*} is determined by
$$Q_{X^*} \ge \frac{1}{O(N)} \min_{A\subset V,\ |A|\le N/2} \min\left(\frac{|c(A)|}{|A|},\ \frac{|c(B)|}{|A|}\right). \qquad (3)$$
[FIG. 2: Bipartitioning the graph into two vertex subsets A and B so as to obtain two vertex separators, c(A) and c(B); see text. Regions shown: A \ c(A), c(A), c(B), B \ c(B).]
Next we use the notion of edge expansion χe, defined as follows. For a bipartition of the graph G into sets A and B, denote by c_e(A, B) the set of edges joining a node in A to a node in B. Then
$$\chi_e = \min_{A\subset V,\ |A|\le N/2} \frac{|c_e(A,B)|}{|A|}, \qquad (4)$$
and an edge expander graph has χe ≥ O(1). Next consider a bipartition of the graph into A and B, and let |A| = cN^α, where c is a constant and 0 < α ≤ 1.
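Although minimal-sparsity separators are hard to find in general, the definition of BT can be checked exhaustively on very small graphs. The sketch below, with hypothetical names `sp_routes`, `max_betweenness` and `topological_bound`, fixes one shortest path per unordered pair (BFS, lowest-label tie-break, an assumption) and counts a node's betweenness as the number of routes in which it is an interior node. Counting endpoints or ordered pairs instead would shift B only by lower-order terms or a constant factor. The brute force enumerates all 3^N labelings, so it is usable only for a handful of nodes.

```python
from collections import deque
from itertools import combinations, product


def sp_routes(adj):
    """One fixed shortest path per unordered pair {s, d} (BFS, lowest-label tie-break)."""
    routes = {}
    for s, d in combinations(sorted(adj), 2):
        parent = {s: None}
        frontier = deque([s])
        while d not in parent:  # assumes G is connected
            u = frontier.popleft()
            for v in sorted(adj[u]):
                if v not in parent:
                    parent[v] = u
                    frontier.append(v)
        path, u = [], d
        while u is not None:  # walk back from d to s
            path.append(u)
            u = parent[u]
        routes[(s, d)] = path[::-1]
    return routes


def max_betweenness(adj, routes):
    """B: the largest number of routes for which a node is an interior node."""
    b = dict.fromkeys(adj, 0)
    for path in routes.values():
        for v in path[1:-1]:
            b[v] += 1
    return max(b.values())


def topological_bound(adj):
    """B_T = max |A||B|/|X| over all partitions V = A + X + B with no A-B edge.

    Exhaustive over 3**N labelings, so only usable for very small graphs.
    """
    nodes = sorted(adj)
    best = 0.0
    for labels in product("AXB", repeat=len(nodes)):
        n_a, n_x, n_b = labels.count("A"), labels.count("X"), labels.count("B")
        if 0 in (n_a, n_x, n_b):
            continue  # all three sets must be non-empty
        side = dict(zip(nodes, labels))
        if any(side[u] == "A" and side[v] == "B" for u in nodes for v in adj[u]):
            continue  # X is not a vertex separator for this A, B
        best = max(best, n_a * n_b / n_x)
    return best
```

On a star with five leaves, for example, the hub is an interior node of all 10 leaf-to-leaf routes (B^SP = 10), while the best separator (the hub, with the leaves split 2 + 3) gives BT = 6: BT bounds B from below without necessarily being tight.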
From the edge expansion property of scale-free graphs with kmin ≥ 3 [16], the number of cut edges between A and B is at least χe cN^α = O(N^α). We can bound both |c(A)| and |c(B)| from below (as needed in Eq. (3)) by the minimal size m of a set of nodes that can contribute χe cN^α cut edges. This size is obtained by taking all nodes with degree higher than k̂, where k̂ is fixed by
$$N \int_{\hat k}^{\infty} k\,P(k)\,dk = \chi_e c N^{\alpha},$$
with P(k) = Ak^{−λ} the degree distribution of the graph. This yields k̂ ∼ N^{(1−α)/(λ−2)}. Therefore the minimal size of a set of nodes that can contribute χe cN^α edges is
$$m = N \int_{\hat k}^{\infty} P(k)\,dk \sim N \cdot N^{\frac{(1-\lambda)(1-\alpha)}{\lambda-2}},$$
and therefore
$$|c(A)|,\ |c(B)| \ge m = O\!\left(N \cdot N^{\frac{(1-\lambda)(1-\alpha)}{\lambda-2}}\right). \qquad (5)$$
The quantity m is bounded below by O(1). For a given λ, when α = 1, i.e., when the sets A and B of the bipartition are both O(N), we get m = O(N). For all other values of α, m < O(N^α). As α decreases from 1, m also decreases until it becomes O(1), which happens for the first time when α = 1/(λ − 1). The ratio m/|A| entering (3) is smallest there, where m = O(1) and |A| ∼ N^{1/(λ−1)}. Thus, from (5) and (3), we get Q_{X*} ≥ O(N^{−λ/(λ−1)}), and so
$$B_T \le B_e \equiv O\!\left(N^{\frac{\lambda}{\lambda-1}}\right). \qquad (6)$$
From (6) we see that as λ → 2 we obtain the worst possible scaling, Be = O(N^2), which can be understood from the fact that the graph becomes increasingly star-like, and for such a graph the central node trivially has B = O(N^2). On the other hand, as λ → ∞, Be → O(N). In this limit the graph approaches a random regular graph, and random regular graphs are good vertex expanders [21]. This implies that for any bipartition into A and B there exists a constant µ such that |c(B)| ≥ µ|A|. Applying the same expansion property to A \ c(A), whose outside neighbors all lie in c(A), gives |c(A)| ≥ µ(|A| − |c(A)|), i.e., |c(A)| ≥ µ|A|/(1 + µ). Hence |c(A)| and |c(B)| are linear in |A|, and Be = O(N). When 2 < λ < 3, the networks generated by the configuration model are uncorrelated only if the maximum degree in the network scales as Kmax ∼ N^{1/2} [22]. Incorporating this upper cutoff into the arguments above, we obtain Q_{X*} ≥ O(N^{−3/2}) and hence BT ≤ Be ≡ O(N^{3/2}) (the same as for λ = 3 in (6)).
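The exponent bookkeeping in Eqs. (5) and (6) can be checked with exact rational arithmetic. The sketch below uses hypothetical helper names (`khat_exponent`, `m_exponent`, `be_exponent`); it only tracks powers of N, dropping constants, and clips the exponent of m at 0 because m is bounded below by O(1).

```python
from fractions import Fraction


def khat_exponent(lam, alpha):
    """Exponent of N in the degree threshold k_hat ~ N**((1 - alpha) / (lam - 2))."""
    return (1 - alpha) / (lam - 2)


def m_exponent(lam, alpha):
    """Exponent of N in m from Eq. (5); m is bounded below by O(1), so clip at 0."""
    return max(0, 1 + (1 - lam) * (1 - alpha) / (lam - 2))


def be_exponent(lam, degree_cutoff=True):
    """Exponent of N in B_e from Eq. (6).

    With degree_cutoff=True, 2 < lam < 3 uses K_max ~ N**(1/2), giving 3/2.
    """
    if degree_cutoff and lam < 3:
        return Fraction(3, 2)
    return lam / (lam - 1)


lam = Fraction(4)
print(m_exponent(lam, Fraction(1)), m_exponent(lam, 1 / (lam - 1)), be_exponent(lam))
```

For λ = 4 this prints `1 0 4/3`: m = O(N) at α = 1, m first reaches O(1) at α = 1/(λ − 1) = 1/3, and Be ∼ N^{4/3}. For λ = 2.5, the value used in Fig. 3, `be_exponent(Fraction(5, 2))` returns 3/2, the figure against which the measured exponents 1.48 (HA) and 1.80 (SP) are compared.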
[FIG. 3: Numerical comparison of the performance of the SP and hub avoidance (HA) protocols on a scale-free graph of size N = 10^3 and λ = 2.5. Main panel: θ(γ) versus γ, with the thresholds γc^SP and γc^HA marked; black circles correspond to the SP protocol and red squares to the HA protocol. The congestion threshold γc beyond which packet growth occurs (θ(γ) > 0) is higher for the HA protocol than for the SP protocol. Inset: maximal betweenness Bmax versus N; for both protocols on scale-free graphs it scales as a power of system size. The maximal betweenness B^HA resulting from the HA protocol has scaling exponent 1.48, close to our estimate for the topological bound on the maximal betweenness, Be ∼ N^{3/2}. However, the maximal betweenness B^SP resulting from the SP protocol grows much faster, B^SP ∼ N^{1.80}.]
From the inset in Fig. 3, we see that the maximal betweenness incurred by the SP protocol scales as B^SP ∼ N^{1.80}. This is much worse than the scaling of Be, and therefore suggests that an SRP whose maximal betweenness scales like Be would perform better than the SP protocol from the point of view of congestion. The question arises whether Be can be achieved by any static routing protocol. We answer this question affirmatively by presenting an SRP for which the scaling of the maximal betweenness is even better than that of Be, and therefore significantly better than that of B^SP.
Our derivation of Be suggests that the sparsity is smallest when obtained from a bipartition in which the smaller set has a size of the order of the maximal degree. This suggests that, topologically, the betweenness of hubs is high, and that the SP protocol increases this betweenness further, since shortest paths tend to pass through hubs. Moreover, the SP protocol leaves a large number of alternative paths unused for routing. Exploiting these observations, we obtain a novel SRP, which we call the hub avoidance (HA) protocol, as follows:
(1) Remove the x highest-degree nodes. The network may now consist of several disconnected clusters. In every such cluster, assign a routing path for every pair of nodes using SP.
(2) Place back the removed nodes with their edges. For every pair of nodes that has not been assigned a routing path in Step (1), assign one using the SP protocol.
For our simulations we have chosen x = 0.01N, but for optimal performance the functional dependence of x on N may be different. A detailed theory of this protocol with these considerations will be presented elsewhere. Our primary purpose in presenting the HA protocol here is to show that there exists an SRP for which the scaling of the maximal betweenness not only achieves but surpasses the scaling of the topological estimate Be, and which is therefore a significant improvement over the SP protocol. This improvement comes from using available alternative paths which, while not significantly longer than the shortest paths, considerably alleviate the load on the hubs.
The plot in Fig. 3 shows the improvement in performance achieved by our protocol, reflected in the higher congestion threshold and in the lower number of accumulating packets at a given packet creation rate γ, compared to the shortest path protocol. In summary, we identify a bound on communication that arises purely from the network topology, and use it to show that there exist better SRPs than the SP protocol for routing on scale-free networks.
We thank Donald Thompson for providing latency data and J. Živković for comments. S.S. and H.E.S. were supported by ONR, E.L. and Z.T. were supported by DOE contract No. W-7405-ENG-36, and R. Cohen was supported by ISF and DysoNet.
[1] P. Wayner, New York Times, September 23 (2002), http://www.nytimes.com/ref/open/innovations/23NECO-OPEN.html
[2] H. Fuks and A. T. Lawniczak, Math. Comput. Simul. 51, 101 (1999).
[3] M. Faloutsos et al., Proc. SigComm. ACM (1999).
[4] H. Tangmunarunkit et al., Proc. SigComm. ACM (2002).
[5] A. Vazquez et al., cond-mat/0206084.
[6] http://www.caida.org/tools/measurement/skitter/router_topology/
[7] http://www.netdimes.org
[8] M. Molloy and B. Reed, Random Structures and Algorithms 6, 161 (1995).
[9] A. Arenas et al., Phys. Rev. Lett. 86, 3196 (2001).
[10] R. V. Sole and S. Valverde, Physica A 289, 595 (2001).
[11] L. Zhao et al., Phys. Rev. E 71, 026125 (2005).
[12] L. C. Freeman, Sociometry 40, 35 (1977).
[13] P. Echenique et al., Phys. Rev. E 70, 056105 (2004).
[14] P. Echenique et al., Europhys. Lett. 71, 325 (2005).
[15] G. Yan et al., cond-mat/0505366.
[16] C. Gkantsidis et al., Proc. SigMetrics. ACM (2003).
[17] A. Akella et al., ACM Principles of Distributed Computing (2003).
[18] V. Vazirani, Approximation Algorithms, Springer-Verlag (2001).
[19] T. N. Bui and C. Jones, Inf. Proc. Lett. 42, 153 (1992).
[20] N. Alon, Lecture Notes in Computer Science 1380, Proceedings of the Third Latin American Symposium on Theoretical Informatics, 206 (1998).
[21] G. Davidoff et al., Elementary Number Theory, Group Theory and Ramanujan Graphs, Cambridge University Press (2003).
[22] M. Catanzaro et al., Phys. Rev. E 71, 027103 (2005).
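The two-step hub avoidance (HA) construction described in the text can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `bfs_path` and `hub_avoidance_routes` are hypothetical, routes are built for unordered pairs, hubs are the x nodes of highest degree with ties broken by label, and the choice among equally short paths (lowest label first, via BFS) is an assumption the text leaves open.

```python
from collections import deque
from itertools import combinations


def bfs_path(adj, s, d, allowed):
    """Shortest s-d path using only nodes in `allowed` (lowest-label tie-break), or None."""
    parent = {s: None}
    frontier = deque([s])
    while frontier:
        u = frontier.popleft()
        if u == d:
            path = []
            while u is not None:  # walk back from d to s
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in sorted(adj[u]):
            if v in allowed and v not in parent:
                parent[v] = u
                frontier.append(v)
    return None  # d is not reachable inside `allowed`


def hub_avoidance_routes(adj, x):
    """HA routes for every unordered pair, following steps (1) and (2) of the text."""
    nodes = sorted(adj)
    hubs = set(sorted(nodes, key=lambda v: (-len(adj[v]), v))[:x])  # x highest-degree nodes
    rest = set(nodes) - hubs
    everything = set(nodes)
    routes = {}
    for s, d in combinations(nodes, 2):
        path = None
        if s in rest and d in rest:
            # Step (1): SP inside the hub-free graph; None if s and d lie in different clusters.
            path = bfs_path(adj, s, d, rest)
        if path is None:
            # Step (2): hubs restored; remaining pairs get an SP route on the full graph.
            path = bfs_path(adj, s, d, everything)
        routes[(s, d)] = path
    return routes
```

On a wheel (a hub joined to every node of a 6-cycle) with x = 1, all routes between cycle nodes go around the rim, so the hub appears only as an endpoint; on a star, removing the hub disconnects every leaf, and step (2) routes all leaf pairs back through the hub. With the paper's choice x = 0.01N, a graph with N = 1000 would use x = 10.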