Alzahrani 2015

Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Contents lists available at ScienceDirect

Journal of Network and Computer Applications


journal homepage: www.elsevier.com/locate/jnca

Scalability of information centric networking using mediated


topology management
Bander A. Alzahrani a,n, Martin J. Reed a, Janne Riihijärvi b, Vassilios G. Vassilakis c
a
School of CSEE, University of Essex, Colchester, UK
b
Institute for Networked Systems, RWTH Aachen University, Aachen, Germany
c
Centre for Communication Systems Research (CCSR), Department of Electronic Engineering, University of Surrey, Guildford, UK

art ic l e i nf o a b s t r a c t

Article history: Information centric networking is a new concept that places emphasis on the information items
Received 15 December 2013 themselves rather than on where the information items are stored. Consequently, routing decisions can
Received in revised form be made based on the information items rather than on simply destination addresses. There are a
22 May 2014
number of models proposed for information centric networking and it is important that these models
Accepted 2 July 2014
are investigated for their scalability if we are to move from early prototypes towards proposing that
these models are used for networks operating at the scale of the current Internet. This paper investigates
Keywords: the scalability of an ICN system that uses mediation between information providers and information
Information centric networking consumers using a publish/subscribe delivery mechanism. The scalability is investigated by extrapolating
Bloom filter
current IP traffic models for a typical national-scale network provider in the UK to estimate mediation
Security
workload. The investigation demonstrates that the mediation workload for route determination is on a
Topology management
scale that is comparable to, or less than, that of current IP routing while using a forwarding mechanism
with considerably smaller tables than current IP routing tables. Additionally, the work shows that this
can be achieved using a security mechanism that mitigates against maliciously injected packets thus
stopping attacks such as denial of service that is common with the current IP infrastructure.
& 2014 Elsevier Ltd. All rights reserved.

1. Introduction networking (ICN) (Trossen and Parisis, 2012), content centric


networking (CCN) or named data networking (NDN) (Jacobson
Networking has traditionally focussed on a node centric model: et al., 2009). This paper will concentrate on one of these archi-
a user connects to a particular server through an identifier such as tectures, namely the publish subscribe Internet routing paradigm
a domain name and obtains, or sends, a specific piece of informa- (PSIRP), later furthered by the PURSUIT project (Fotiou et al., 2012),
tion. This model has served well and underpins highly successful and will use the term ICN for this new paradigm. One question
networks such as the Internet. However, coping with scale has that is important to ask with the proposal for a new architecture is
brought about changes to this model through technologies such as why it should be introduced? Certainly the current Internet
content delivery networks (CDN) and HTTP redirection (HTTP-R) architecture is highly successful and has, despite occasional “doom
(Spagna et al., 2013). Both of these technologies effectively break mongering” (Handley, 2006), managed to keep up with strong
the node centric model, although users are essentially unaware demands for growth. Consequently, we do not propose ICN
that a single domain name does not mean a single physical server because the Internet is “broken,” but rather look to an alternative
with one address. Recently researchers have questioned whether so that it can provide a basis for new services and future growth
creating elaborate technologies to bolster up a node centric model with lower transport costs.
is sensible and have turned instead to alternate architectures There are a wide variety of ICN solutions (Ahlgren et al., 2012),
which label information items in a manner that focusses less on however, we can generalize the architectures as systems that label
where they are stored and more on what they are, and how they information items and provide a network architecture where
relate to other information items (Trossen et al., 2010). There is not providers of the information items can advertise the information
yet a single architecture which can be said to definitively provide items; consumers of information items can request the items; and,
this alternative paradigm and there are a number of alternative network nodes can forward information based upon the matching
architectures said to fall under terms such as information centric of provider, consumer and information label. An important issue
with regard to ICN architectures is the methodology for forward-
ing the information. One strategy, taken by techniques such as
n
Corresponding author. NDN, is to forward information based on the content labels

http://dx.doi.org/10.1016/j.jnca.2014.07.002
1084-8045/& 2014 Elsevier Ltd. All rights reserved.

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i
2 B.A. Alzahrani et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎

directly (Jacobson et al., 2009). This allows decisions on what to do function of the architecture in Section 3. Traffic models are
with the data to be made based on the content labels, for example described in Section 4 that are used to drive estimates of the
items may be opportunistically cached if it is known they may be scalability that are presented and discussed in Section 5.
useful for future users thus avoiding unnecessary transport from
the original, more distant, provider we will call this as content
based forwarding. An alternative strategy is to forward using more 2. The PURSUIT ICN architecture
conventional means, including IP or label swapping, and use a
mediation system to match the provider, consumer and informa- The PSIRP project (and later PURSUIT) (Fotiou et al., 2012)
tion to a suitable destination address or a path at the latest brings together two concepts to form an ICN solution: mediated
possible moment, the so-called late binding; we call this strategy assisted forwarding and publish/subscribe communication. A pub-
as mediation assisted forwarding. lisher notifies the mediation system that it can provide an item,
Content based forwarding is attractive as it allows feature rich a subscriber notifies the mediation system that it wishes to obtain
forwarding/caching decisions. However, if we scale this technique the information item. These two events need not be in the
to the size of the Internet this potentially means that a core traditional client/server order: it is possible that the subscriber
Internet forwarding node either has to have a forwarding table the requests something before a publisher has made it available. The
size of all the information item labels in the Internet or at least the mediation system is broken down into two functions: Rendezvous
ability to obtain forwarding information for any of the arbitrary and topology management. These two functions control the third
items in good time i.e. at a speed compatible with increasing function: forwarding. It is assumed in this paper that networks are
packet forwarding speeds. The scalability of this approach has not owned and managed by autonomous systems in the same manner
been suitably investigated and we leave it to the reader to draw as current network infrastructures. Consequently, the three func-
conclusions or further the investigation of the field. tions are required in each autonomous system and here we
Alternatively, mediation assisted forwarding can use an effi- consider how they operate within a single autonomous system.
cient forwarding mechanism allowing the use of high-speed net- Connections between autonomous systems might be expected to
work switching using either existing technology, or as used in the be maintained through mechanisms such as that used in the
work of this paper, alternative switching technology. However, current Internet through BGPv4 routing.
mediation assisted forwarding requires some centralized, or semi-
centralized, network resource that has knowledge of providers and 2.1. Mediation between publishers and subscribers: Rendezvous
consumers and can quickly plan a suitable resource. At first sight
this may seem to be challenging, however, in the current Internet The subject of this paper is mainly regarding the topology
model there is a close equivalent in the form of the domain name management function, but to further understand the mediation
system (DNS); in most Internet based communication there is a mechanism it is important to have a high-level understanding of
requirement for an application to contact this network resource to the Rendezvous function. Each information item is given a statis-
find the forwarding address and return it to the consumer (and tically unique Rendezvous ID (RID). The Rendezvous is a distrib-
often the provider in the form of a reverse name lookup). The uted system that maintains an information graph that links each
current Internet is evidence that such a “centralized” network RID in a hierarchical structure. Each RID is said to belong to a
resource is possible, however, in ICN the granularity of requests is parent scope and a scope may itself be the child of another scope.
not at the scale of domain names, but at the level of individual In this manner it is possible to relate information items through
information items. Thus, the question addressed in this paper is ontologies that can be flexibly defined (Tagger et al., 2013), for
the following: can mediated assisted forwarding be achieved at a example a video content provider could publish videos as each
scale equivalent to that provided by network operators that having a unique RID under different categories (e.g. action, drama)
support the current Internet? We affirm that the answer to this each defined by a different scope. The information graph allows an
question is positive and explore the parameters of the problem RID (or scope) to have more than one parent scope such that,
and suggest some solutions. The work is focussed on solving this following on from the video example, a film that is both a science
question for a single domain comparable in size to a current fiction and an action movie could have both categories as parents.
autonomous system. The Rendezvous is also responsible for matching publishers to
This paper presents an overview of an ICN architecture in subscribers as shown in Fig. 1. The Rendezvous has been studied in
Section 2 and expands the description of the topology management some depth (Trossen and Parisis, 2012) using approaches that are

Rendezvous: Matching events

Publish (RID) Subscribe (RID)


3 Create delivery path
1 2

Topology Manager: Path


Computation
4
Send FID

FW node FW node FW node


Publisher Subscriber
5 Data forwarding

Fig. 1. The PURSUIT architectural model showing the mediation between publisher and subscriber through the Rendezvous and topology manager.

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i
B.A. Alzahrani et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 3

either based on distributed hash tables (Katsaros et al., 2012) or management however inter-domain routing is required as well.
hierarchical naming in systems such as data oriented (and beyond) We assume that a mechanism such as BGP will be used to
network architecture (DONA) (Koponen et al., 2007; Vasilakos et al., maintain reachability information for remote autonomous systems
2012). It would be expected that a Rendezvous function would be as in the current Internet.
maintained by each autonomous system. Once the Rendezvous has The second core functionality of the topology management, the
received both a subscription and a publication event for a construction of delivery trees connecting publishers and subscri-
particular item (an RID) it then passes over to the topology bers, is based on the topology data gathered from the network,
management function to perform the routing required for the and the information on the identities of the publishers and
forwarding function. The Rendezvous mechanism has been shown subscribers obtained from the Rendezvous. Typically this entails
to be scalable to Internet scale solutions by Katsaros et al. (2012) the construction of a shortest path spanning tree connecting the
and Vasilakos et al. (2012). publisher and subscribers, and informing the forwarding function
of the existence of this new delivery tree. In the following section
we shall discuss in more detail how the forwarding function can
2.2. Topology management – an overview be implemented, and how the delivery trees can be represented in
a compact fashion.
Topology management is responsible for maintaining intra- In the case of best-effort delivery – where paths between a
domain knowledge of an autonomous system and to determine publisher and subscriber(s) can be made arbitrarily and no path
which publisher will be used for providing a particular data item. state is required – a topology manager instance can determine a
When the Rendezvous requests that the topology management suitable FID based only on topological information. Consequently,
function calculates a route, a topology manager uses the topolo- any suitable topology manager instance can be used for the FID
gical data to construct a forwarding identifier (FID) that is used by calculation and the topology management function can be spread
the forwarding function. A more detailed description of topology across any suitable number of instances as is required to meet the
management is given in Section 3, here we present an overview of demand.
the function of topology management. It should be stated here
that the topology management is a central network function, but
this does not necessarily mean that the function is carried out by a 2.3. Forwarding
single topology manager instance. In practice the number of
topology manager instances may range from a single instance to The PURSUIT ICN model allows a number of forwarding
one running in every node in the network. One of the aims of this mechanisms to be used. Indeed it is possible that IP could be used
paper is to determine how many instances may be needed and this as the forwarding mechanism, in this case the topology manager
is determined for a typical intra-domain scenario in Section 5. sends a destination IP address (or multicast IP address) as the FID.
The core functionality of topology management consists of two However, PURSUIT has developed a default forwarding model
basic tasks: maintaining an up-to-date representation of the net- based on an approach called Line Speed Publish/Subscribe Inter-
work topology and the construction of delivery trees for forward- networking (LISPIN) (Jokela et al., 2009), a multicast forwarding
ing. Maintaining an up-to-date representation of the network fabric based on Bloom filters (Bloom, 1970). This has the advantage
topology, for example in a form of an annotated graph, enables of providing stateless multicast forwarding. LIPSIN operates using
finding shortest paths and minimal multicast tree construction. source routing that encodes the delivery path (or tree in the case
The details of this function vary somewhat between network of multicast) using link identifiers (LIDs) that are assigned to each
technologies, but in general resemble closely the neighbor dis- link. By using Bloom filters it is possible to encode a number of m-
covery mechanisms of classical Internet routing protocols such as length LIDs into a single m-length Bloom filter.
OSPF. In an ICN implementation the topology manager instances The operation of LIPSIN is shown in Fig. 2. Each unidirectional
running at individual nodes would subscribe to Hello-messages link between two nodes is identified with a LID. This identifier is
and other signalling information in a dedicated topology manage- m-bits long with k-bits set to 1, where k is the number of hash
ment scope, and periodically publish their identities as Hello- functions used to generate bit-positions set to 1. The topology
messages in this scope. Information from these messages can then manager encodes all the delivery tree elements that compose this
be combined by the different nodes and published again to all path into the Bloom filter by simply forming the logical OR of the
interested nodes, enabling the entire network topology to be individual LIDs constituting the tree. We identify the FID con-
reconstructed. This paper concentrates on intra-domain topology structed by the topology manager with this Bloom filter in order to

Fig. 2. LIPSIN based forwarding using Bloom filters: one of the possible forwarding models in the PURSUIT architecture.

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i
4 B.A. Alzahrani et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎

be compatible with the LIPSIN terminology. At the end of this 3. Topology management design parameters
process, the topology manager sends the created FID to the
publisher in order to be placed in the packet header for Here we consider the design of the topology management
forwarding. function from both the perspective of performing its core function
Using the FID that is included in the packet header, a forward- – routing – and the requirement that this routing function cannot
ing node can decide where packets should be forwarded by be subverted by an attacker to send a significant quantity of
performing a bitwise AND between the FID and the LIDs of the malicious traffic.
outgoing adjacent links. If the result of the set membership
function is true, then the forwarder will forward the packet 3.1. Routing functionality in the topology management
through the link assuming that this link is part of the delivery
tree. It should be noted that, while the Bloom filter does not have Recall that the topology management is aware of the network
false negatives, it may have false positives; this is where an FID topology, and mainly responsible for finding the best path
matches LIDs that were not intentionally included. In the extreme between publishers and subscribers. It creates a FID that is used
case an all “1s” FID will match every LID. In practice the to route information object through the determined path. When
parameters of the Bloom filter need to be chosen carefully in the topology manager is instructed by the Rendezvous to find a
order to reduce the false positives, this is discussed further by suitable path and to create FID, the following two strategies may
Carrea et al. (2014). be used: running Dijkstra's algorithm for each request, and,
caching all the network topology paths. In the first case each time
a topology manager receives a request from Rendezvous to create
a delivery tree Dijkstra's algorithm is executed to compute the
2.4. Security and scalability in ICN shortest path considering the present network state. The average
processing time of the algorithm increases as the network size
Although the LIPSIN forwarding mechanism offers highly grows which may introduce some delay. In the case of caching all
efficient forwarding, it has several security issues that can be the network topology paths, in the setup phase, the topology
exploited to attack the network (Rothenberg et al., 2009). manager computes paths between each node pair in the network
An attacker can inject arbitrary traffic to the network by using a and caches them into a fast memory. Once the topology manager
previous valid FID that is created for another traffic request. This gets a request to find a path, the pre-calculated route is used and
attack is referred as a FID replay attack. Another threat could be to then the path LIDs are encoded to create the FID. In the case of
inject traffic by using brute-force attacks. In this attack, a malicious multicast requests, where there is one publisher and several
node tries all possible FIDs to cause some false positive over the subscribers, the topology manager will combine all paths into
links. Furthermore, a computational attack is still possible by one path by ORing all relevant LIDs. In the case where highly
collecting and analyzing many valid FIDs in order to infer some optimized multicast trees are required the topology manager
parts of the network topology and then to build a valid FID could use an improved tree routing algorithm, but a significantly
without the topology manager providing it. higher computation cost.
There are also fundamental scalability challenges in the topol-
ogy management and forwarding mechanisms as described above. 3.2. Securing the PURSUIT forwarding
Since Bloom filters have an inherent false positive rate that
depends on the number of links stored in the filter, the overhead The threats described in Section 2.4 are made possible because
of LIPSIN increases as the multicast trees become denser. As of the use of static forwarding LIDs which allow reuse and
discussed in Jokela et al. (2009) this problem can be mitigated inference of FIDs. To secure the LIPSIN forwarding mechanism
by including some state back into the network, and including against these threats we make use of, and extend, a technique
information on this state in the FID through the use of virtual link proposed by Rothenberg et al. called the zFormation technique
IDs. A more fundamental scalability challenge is related to the (Rothenberg et al., 2009), which is summarized in Fig. 3. The
topology management, as the churn of publishers and subscribers zFormation changes the LIDs within a certain timeframe which
creates a need to rapidly update the delivery trees and continu- leads to dynamic and expiring FIDs. To distinguish between the
ously recompute their respective FIDs. One of our key objectives in original LIDs, described previously, we here refer to the new
this paper is to demonstrate that this scalability challenge can identifier as a link ID tag (LIT). The zFormation technique uses a
be overcome by sufficient computational resources at a scale cryptographic function Z that accepts 4 parameters as input and
similar to the resources already available in present-day operator outputs a new LIT. The input parameters to Z are (1) an in-packet
networks. flow ID i (e.g. the RID), (2) a periodically changing time-based

Fig. 3. The creation of a zFormation for mitigating against malicious traffic injection in LIPSIN forwarding.

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i
B.A. Alzahrani et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 5

secret key Ki(t), (3) the incoming and outgoing port numbers I and improbable to send a packet to an arbitrary end-node by guessing
O, and (4) the optimization index d of the LIT. The index d was a FID. The probability p of guessing a correct FID for a Bloom filter
proposed by Jokela et al. (2009) to reduce the false positive rate constructed with k-hash functions for each LID, maximum fill-
inherited with the use of the Bloom filter. In this proposal d factor ρ and with a path length of h is given by (Rothenberg et al.,
different LITs are assigned to each unidirectional link which 2009; Alzahrani et al., 2013)
subsequently allows the creation of d different candidate FIDs by
p ¼ ρkh ð1Þ
the topology manager. Then the FIDs are evaluated and the one
with lowest false positive rate is selected and its index d is placed In practice we find that a maximum fill factor of 0.41 with k ¼5
in the packet header. To create a dynamic FID, a topology manager hash functions gives a sufficiently low probability of sending a
applies the function LIT ¼ Zði; K i ðtÞ; I; O; dÞ to create all the LITs of randomly generated FID of p ¼ 2:09  10  10 . If a sender could
the delivery tree and then constructs the FID by simply ORing all transmit 106 packets per second this means that if we set
the computed LITs as for the standard Bloom filter. The function Z Δt ¼ 40 min there is less than a 50% chance that a packet would
can be implemented using a stream cipher that uses Ki(t) as a reach an arbitrary user within this time window. This is consider-
time-bound key (Rothenberg et al., 2009). ably better than is possible with the current IP architecture which
For the topology manager to create a FID, it is required to share allows arbitrary users to send packets to end nodes to cause denial
the time-bound shared session key Ki(t). In every Δt, which is the of service. For the remainder of this paper we will assume
time that a LIT is valid, synchronized shared session keys are Δt ¼ 40 min, however, the work holds generally true for other
generated. As a consequence, the result of computing an LIT is values with suitable adjustment. This value is chosen as it gives
different. After creating the FID in the topology manager, the FID, reasonable security against packet injection with relatively relaxed
the flow ID i and the index d are sent to the publisher which then timing constraints as will be explored further in Section 5.
uses this as the packet header for forwarding data. Upon receiving In this section we propose an alternative method of updating
this packet at a forwarding node, the node extracts the FID, flow ID FIDs compared to that introduced in Rothenberg et al. (2009). For
i and d then it computes the dynamic LIT of its outgoing interface the mechanism of updating FIDs, consider that LITs start to be
using the function Z. Then it tests the LIT with the FID for the valid at t and that they are valid for Δt. Define the time of updates
packet to be forwarded. However, as the LITs are changed every to LIDs as occurring in a time window of t þ umin and t þ umax ,
Δt, the in-sessions' FIDs need to be updated according to the new where t þ umin is the time where the FID update period begins
values of LITs. Therefore, for long-lived flows, that last longer than whereas t þ umax is the time where the period ends. We require
Δt, the sessions need a new FID in order to extend the commu- umax o Δt to allow orderly updating without risking that LITs will
nication before the old FID expires. In Alzahrani et al. (2012), we not be updated in time. The valid LITs during period t; …; t þ Δt
have proposed a solution that provides this updated FID by adding will be overlapped with a new LIT being created at t þ Δc where
a new entity called a FID updater that maintains all the necessary Δc ¼ Δt=2, thus two sets of LIDs will be valid at any given time.
information for creating an updated FID. This method relies upon The Rendezvous keeps states of all events that have a match
the Rendezvous to extend the FID as it has knowledge of active between publisher and subscriber. Therefore we give the respon-
publishers and subscribers for each information item. The Ren- sibility of extending FIDs to the Rendezvous. If a subscriber is no
dezvous achieves this by sending an extension request to the FID longer interested in any subscribed item, then the subscriber
updater. Furthermore, in Alzahrani et al. (2013), we demonstrated unsubscribes by sending a request to the Rendezvous to be deleted
that the zFormation forwarding mechanism is still vulnerable to from the list, otherwise the Rendezvous assumes that the sub-
successful brute-force attacks if either the fill factor of the FID is scriber is still walling to receive this item. At time t þumin the
too large or Δt is too long. The fill factor specifies the most 1s that Rendezvous checks for matching event entries for all events that
can be set in a FID, as for example if a publisher was to set all the exist from the previous update slot and have not been updated
FID to 1s (highest fill factor) the packet would match every before. Then it sends them gradually to the topology management
possible LIT and thus would be sent to every end node. This could function until time t þ umax . This time window is designed to
be used by an attacker to launch a denial of service attack which is reduce the workload on updating in any given time slot Δt. Here
highly undesirable and even more dangerous if one considers that the Rendezvous needs to have an approximately synchronized
with one packet it would be duplicated to all end-nodes. This can clock with the topology managers, with a tolerance such that there
be virtually eliminated by only allowing a maximum fill-factor can be no risk that FID updates are not completed before the LIDs
such that it is not possible to send a packet to every node and it is become invalid. This update mechanism is shown in Fig. 4 where

umin umax 40+umin 40+umax


Y-Axis LIT change 1 FID update LIT change 2 FID update LIT change 3
LITx1

LITx2

LITx3

Update Update
FID1 FID2 Safety FID2 FID3 Safety
Information Item 1 margin margin
Information item 2
Information item 3

4 14 18 20 25 30 34 38 40 50 58 60 64 t (min)

Fig. 4. The proposed method for updating the FID to mitigate against malicious traffic injection. The figure shows how three information items, with different dissemination
durations, need different forwarding identifier updates due to the periodic changes in link identifiers.

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i
6 B.A. Alzahrani et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Δt ¼ 40 min and three different information item lifetimes are restricting the model to these application types is not likely to
considered. Information item 1 has a lifetime less than the FID and significantly affect the outcome of this analysis. In particular the
thus does not need to have an updated FID. Information item 3 has HTTP traffic encompasses a wider range of applications than just
a lifetime greater than the FID and thus will require updated FIDs. simply “web browsing.”
Additionally Item 3 becomes published after the first LID change The specific goal of this analysis is to estimate the number of
and thus uses the first updated FID. FIDs required per second so that we can estimate the topology
In practice the topology management function cannot know management workload and computation requirements needed to
the lifetime of an information item, as with Information item 2 in handle all FID requests. Additionally, we analyze the flow duration
the example. At time ΔC ¼ Δt=2 ¼ 20 min in the example Infor- with regard to the requirement to update FIDs for flows that have
mation item 2 is still required to be published and thus the duration that span a FID update every Δt. The methodology
topology management prepares to update the new FID. It does extrapolates from a flow-based model of Internet traffic to
not make sense to immediately update the FID for all such independently analyze the four types of traffic. Then we compose
information items as it might be the case that it finishes before all these types of traffic that form large proportion of today's
it is actually required and updating all FIDs immediately requires a Internet traffic as a total number of requests. The explanation of
high-data rate. Consequently, the FIDs are updated over a suitable how the estimates were obtained is given below.
window. In this case we propose updating over a window of 8 min
starting 10 min after the FID change and completing 2 min before 4.1. Voice model
the end of the valid period of the FID. This window allows for a
loose tolerance on the synchronization of the clocks in the Assumption: For the voice traffic we assume the model of
topology managers and the forwarding nodes. These values are Kaufman (1981). This model assumes a transmission link of fixed
not highly sensitive and are design parameters to be selected by bandwidth capacity C. Each voice call has a bandwidth require-
the network designer depending upon timing tolerances. The ment b. The capacity C and the requirement b are measured in
values given here are indicative of possible choices. bandwidth units (b.u.). For example 1 b.u. could be 24 kb/s if we
assume a G.729 coding for the voice channel. The arrival rate of the
voice calls is denoted by λ and follows a Poisson distribution.
4. Traffic modelling The call service rate is exponentially distributed and denoted by μ.
The system state j ðj ¼ 0; …; CÞ is defined as the total number of
An important aspect with regard to mediation models used in occupied b.u. in the system.
ICN is the scalability of the solution. We know from the existing IP Below we calculate the voice traffic utilization, U, with the aim
DNS model that it is possible to create and maintain a global of finding the inter-arrival time that gives the number of required
database used for the so-called “slow-path”, however, with the ICN FIDs per second. In Kaufman (1981) the author derives the
approaches this generally requires much finer granularity hence following recurrent formula for the calculation of the state
increased resource requirements. Here we are concerned with the probabilities:
scalability of topology management, others have considered the
qðjÞ ¼ b  1=j  λ=μ  qðj bÞ ð2Þ
scalability of matching function such as performed by the Ren-
dezvous in the PURSUIT model. There are two key aspects for with qðxÞ ¼ 0 for x o0 and ∑Cj¼ 1 qðjÞ ¼ 1.
calculating the scalability of the topology management solution: Using the state probabilities it is possible to calculate the link
the first is the number of topology management FID requests utilization as (Kaufman, 1981)
which is related to the number of new publish/subscribe events; C
the second is the number of publish/subscribe matches that exist U ¼ ∑ jqðjÞ ð3Þ
j¼1
beyond the time limit for FID expiry as these will need to have FIDs
that are refreshed. Thus, we need to consider two aspects of the The number of FID requests for voice traffic according to the
traffic model: how many new requests are made each second and assumption used in Gabale et al. (2010), with a village population
how long each item may need to remain available. Ideally, this around 1000 there will be on average 10 simultaneous voice calls.
should be estimated from an actual ICN scenario using ICN events, Therefore for the UK, we will need approximately 15 GB/s (i.e.
but as this ICN is far from deployment there are no existing 660,000 calls – 24,000 b/s,) to allocate all voice calls assuming 50%
statistical models of ICN traffic. Consequently, we estimate ICN utilization with zero blocking probability and G.729 compression
traffic from existing IP data as, while the ICN architecture is quite coding algorithm is used. Therefore, the total capacity for this
different from IP, we may expect that at least the existing services voice network is 30 GB/s. Also according to typical values reported
will be required from an ICN architecture. in Birke et al. (2007), Ramachandran and Beeram (2009) the mean
To estimate the required resources we investigate the future call duration, 1=μ, is 2 min, and with voice compression coding
growth of Internet traffic and estimate the expected number of flow algorithm G.729 the total bandwidth required to transmit each
requests per second. While we do not expect an ICN to follow a voice over an IP based network would be 24 kb/s. While other
traditional flow based delivery model we find in current TCP/IP, it is a codecs could be used, this gives a bandwidth mid-way between
reasonable assumption that there will be a similar number of user the older G.711 codec (likely to be less used by 2016) and other
instigated events that can be represented by a TCP/IP session start or more recent low-bandwidth codecs. If we apply the parameters,
by an ICN publisher/subscriber/information item matching event. mean call duration of 120 s, 30 GB capacity and mean call
Consequently the methodology used in the paper is to base ICN event bandwidth 24 kb/s, to the theoretical model described above, the
workload on Internet traffic scaled to a near future demand. Here we mean time between calls is 0.0002 s. This means the topology
use a model based on a carrier in a country the size and demo- management needs to create 5000 new FIDs each second to route
graphics of the UK with traffic models from known voice, video, each voice call.
HTTP and peer-to-peer application statistics. Specifically the model is Voice traffic growth: The previously calculated number of FIDs
based upon the UK population with 52 million Internet users and the for voice traffic is based on 2010 data. It is estimated that voice
carrier serving 30% of this population. These types of traffic have traffic has increased by an annual growth rate of 13% from the year
been considered as they form the vast majority of today's Internet 2010 to 2012 (Cisco, 2009), and there will be a further expected
traffic. Thus, although other applications are available in the Internet, increase of 3% from 2012 to 2016 (Cisco, 2011). This growth allows

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i
B.A. Alzahrani et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 7

an estimation of 7185 FIDs/s required for voice communication Table 1


in 2016. The number of FID/s (forwarding identifiers per second) that have to be created by
the topology manager in order to deliver information items from publishers to
subscribers. The number of updates/s represents the number of updated FIDs that
4.2. Video model need to be sent due to the security mechanism that requires periodic FID changes
every Δt.
Recent studies have indicated that video streaming is respon-
Flow type Number New FIDs Required Peak % of
sible for 25–40% of all Internet traffic and will reach 62% by the
of new FID/s every Δt updates each Δt updates/s update
end of 2015 (not including P2P video file sharing) (Cisco, 2011).
YouTube is the most popular Internet video service (Gehlen et al., Voice 7185 8.62  106 329,725 686 3.8
2012); consequently, YouTube is a good example of Internet video Video 87,507 1.05  108 561,769 1170 0.53
traffic to be applied to an ICN architecture, especially for the P2P 57,842 6.94  107 6.4  107 134,161 92.2
Web 1.05  106 1.27  109 E0 E0 E0
topology manager.
All flows 1.20  106 1.45  109 6.49  107 136,017 4.5
The number of FID requests for video traffic (YouTube): In order to
know how many FIDs are needed to handle YouTube traffic by the
topology management it is important to know how many
requests/s are sent to the topology management. The study by The results show that peer-to-peer flows have the highest
Zinka et al. (2009) measured YouTube traffic in a university number of updating requests with update percentage of more
campus of 25,000 students and found 4000 requests sent per than 92% of sessions requiring updates. This is due to the long
hour. So we can estimate that for the UK, with 52 million Internet mean flow duration of peer-to-peer traffic. Web flows have the
users, the number of FIDs to be created by the topology manage- highest number of FIDs in a second with more than one million
ment to handle all user requests is approximately 2000 FIDs/s. requests each second requiring a new FID, but almost no updates
Video traffic growth: The YouTube data used above was from are required. This is because web traffic is characterized with a
2009. Video traffic has generally increased by an annual growth very short session periods. In the same table we show the total
rate of 62% from 2009 to 2012, a further increase of 34% is percentage of all FIDs of all types of flows that are required for
expected from the year 2012 to 2016 (Cisco, 2011). Therefore the update with all FIDs originally created by the topology manage-
total number of FIDs requests of YouTube traffic by 2016 would be ment. This percentage is approximately 4.5% from the total flows'
87,507 FIDs/s. requests, almost all of which is accounted for to P2P traffic. This is
small and indicates that most sessions will be terminated before
4.3. HTTP and peer-to-peer traffic any update needs to take place. The total peak load on the
topology management function, including the new updated FIDs,
Web traffic (HTTP) has been found to follow an exponential is approximately 1:3  106 FID=s.
distribution (Mori et al., 2005; Bhole and Popescu, 2005). Study of With the knowledge of the peak FID calculation rate we can
Internet traffic at the University of Calgary (33,000 users) found estimate the number of topology management entities required
that web traffic arrival rate was 80 flows per second (Basher et al., for handling all types of traffic requests which have been calcu-
2008). Scaling this to the UK gives approximately 126  103 FIDs/s lated i.e. 1:3  106 requests per second. Using the Blackadder ICN
for web traffic. Web traffic growth: This number of web requests is platform that is publicly available under GNU GPL2 license (Parisis
built on a study done on 2008. Using an annual growth rate of 26% and Trossen, 2013), we experimentally measured the average time
from the year 2008 to 2012 (Cisco, 2009) and an expected growth taken by a single topology manager entity to create a unicast FID.
rate of 35% from the year 2012 to 2016 (Cisco, 2011) gives The Blackadder was running on a machine with a processor
1,054,841 FID/s by 2016 for web traffic. capability of an Intel Core2 Quad CPU Q6600 @ 2.40 GHz 4 and
In the same study of Basher et al. (2008), it was found that RAM capacity was 4 GB. Only one core was used to carry out the
peer-to-peer traffic has arrival rate of only 6 flows/s and from this processing. In this experiment we use a large real network
we estimate that there will be around 9400 FID/s for the UK based topology, the KDL network with 754 nodes and 899 edges as
on 2008 figures. P 2P traffic growth: Using an annual growth rate of reported by the Internet Topology Zoo (Knight et al., 2011). By
25% from the year 2008 to 2012 (Cisco, 2009) and 26% from the running the experiment many times with different random pub-
year 2012 to 2016 (Cisco, 2011) the total number of peer-to-peer lishers and subscribers we found that the average time is 0.26 ms/
traffic requests is expected to be 57,842 FID/s by the year 2016. FID. This time is for processing one request received from the
Rendezvous and includes finding the shortest path and creating a
FID. For efficiency, once a path between a publisher and a
5. Results and discussion subscriber was found it was cached for later use.
Using the average FID calculation rate for single entity, the total
Using the estimates of traffic models it is possible to investigate number of topology manager entities to handle all traffic requests
the number of FID updates needed for each type of traffic and also in a particular intra-domain can be calculated. Here we assume a
compare the total number of FIDs created by the topology traffic for the UK following the traffic models in Section 4. With
management for all types of traffic to the total number of FIDs 1:3  106 requests per second and 0.012 ms/FID we can estimate
required to be updated. This can be done once the mean flow that there is a need for approximately 16 topology manager cores,
duration of each traffic type and the time between each updated assuming similar CPU capabilities. By 2016 standards this is likely
Δt are known. Following measurements reported in the literature, to be a modest capability. However, for a single carrier this is likely
we use mean durations of voice, YouTube, web and P2P flows as 2, to be much smaller, for example in the UK, the largest carrier has
3, 0.033 and 266 min respectively (Birke et al., 2007; Zinka et al., approximately a 30% share of the total number of UK Internet
2009; Brownlee and Claffy, 2002; Steiner et al., 2009). Then users. Therefore, 5 topology managers are needed for a carrier of
following the traffic models described in Section 4 the number this size. However, the experiment of calculating the topology
of FID required can be determined for each traffic type giving the management capability has been carried out on a normal char-
results shown in Table 1. The results show the number of new FIDs acteristic machine to estimate future traffic in 2016. This is a fairly
required for each new publisher/subscriber event and the number modest CPU requirement and significantly less computational
due to updates for long-lived flows. power than in current IP routing platforms that would be required

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i
8 B.A. Alzahrani et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎

at every forwarder. It should be remembered that the forwarding Bhole Y, Popescu A. Measurement and analysis of http traffic. J Netw Syst Manag
complexity of LIPSIN forwarding is considerably less than an IP or 2005;13(4):357–71.
Birke R, Mellia M, Petracca M, Rossi D. Understanding VoIP from backbone
even an IP/MPLS router. For example a network node using LIPSIN measurements. In: INFOCOM. 26th IEEE international conference on computer
forwarding with a node degree of 20 only needs 20  d LIT entries communications. Anchorage, USA: IEEE; 2007. p. 2027–35.
(with typically d ¼8). This is favorable compared to an IP routing Bloom BH. Space/time trade-offs in hash coding with allowable errors. Commun
table that is approximately equal to the number of intra-domain ACM 1970;13:422–6.
Brownlee N, Claffy K. Understanding internet traffic streams: dragonflies and
links (typically hundreds in a large operator) if using IP/MPLS or tortoises. IEEE Commun Mag 2002;40(10):110–7.
equal to the number of BGP routing prefixes in a pure IP Carrea L, Vernitski A, Reed M. Optimized hash for network path encoding with
implementation. Consequently, the overall routing/forwarding minimized false positives. Comput Netw 2014;58(January):180–91.
Cisco. Cisco visual networking index: forecast and methodology. Technical report;
complexity can be said to be considerably reduced compared to
2009.
an existing IP infrastructure. Cisco. Cisco visual networking index: forecast and methodology. Technical report;
With the zFormation scheme, and its extension of FID update 2011.
that we propose, the forwarding plane can resist an attacker's Fotiou N, Nikander P, Trossen D, Polyzos GC. Developing information networking
further: from psirp to pursuit. In: Broadband communications, networks, and
malicious packet injections, such as a denial of service attack. This systems. Athens, Greece: Springer; 2012. p. 1–13.
is because the LITs become dynamic and changeable within certain Gabale V, Raman B, Chebrolu K, Kulkarni P. LIT MAC: addressing the challenges of
periods thus any valid FID gets expired in due course. Conse- effective voice communication in a low cost, low power wireless mesh
network. In: Proceedings of the first ACM symposium on computing for
quently, determining a valid FID becomes difficult and an attacker
development. London, UK: ACM; 2010. p. 1–11.
that manages to determine a valid FID cannot use it to inject Gehlen V, Finamore A, Mellia M, Munaf M. Uncovering the big players of the web.
unwanted traffic indefinitely. Additionally, by including the incom- In: Traffic monitoring and analysis. Lecture notes in computer science, vol.
ing and outgoing interfaces in the FID construction process, an FID 7189. Berlin, Heidelberg: Springer; 2012. p. 15–28.
Handley M. Why the internet only just works. BT Technol J 2006;24(3):119–29.
is tightly bound to a specific path and originator, so the attacker Jacobson V, Smetters DK, Thornton JD, Plass MF, Briggs NH, Braynard RL. Network-
cannot reach another subscriber using an existing FID. ing named content. In: Proceedings of the fifth international conference on
emerging networking experiments and technologies. Room, Italy: ACM; 2009.
p. 1–12.
6. Conclusion Jokela P, Zahemszky A, Esteve Rothenberg C, Arianfar S, Nikander P. LIPSIN: line
speed publish/subscribe inter-networking. ACM SIGCOMM Comput Commun
Rev 2009;39(4):195–206.
There are a number of ICN architectures that are being Katsaros KV, Fotiou N, Vasilakos X, Ververidis CN, Tsilopoulos C, Xylomenos G, et al.
proposed with differing models for route determination. The On inter-domain name resolution for information-centric networks. In: NET-
scalability of these solutions needs careful inspection if we are to WORKING 2012. Lecture notes in computer science, vol. 7289. Berlin, Heidel-
berg: Springer; 2012. p. 13–26.
move from early prototypes towards proposing them as models for Kaufman J. Blocking in a shared resource environment. IEEE Trans Commun
a future network operating at the scale of the Internet. Here we 1981;29(10):1474–81.
have considered an ICN architecture following the architecture Knight S, Nguyen H, Falkner N, Bowden R, Roughan M. The internet topology zoo.
IEEE J Sel Areas Commun 2011;29(9):1765–75.
proposed by the PSIRP/PURSUIT which has a central network
Koponen T, Chawla M, Chun B-G, Ermolinskiy A, Kim KH, Shenker S, et al. A data-
function called topology management. We have extended this oriented (and beyond) network architecture. ACM SIGCOMM Comput Commun
function to provide mitigation of injected malicious traffic by using Rev 2007;37(4):181–92.
periodically updated forwarding identifiers. This gives protection Mori T, Uchida M, Goto S. Flow analysis of internet traffic: world wide web versus
peer-to-peer. Syst Comput Jpn 2005;36(11):70–81.
from attacks such as denial of service. Parisis G, Trossen D. Blackadder node implementation, v0.4. Source code published
Although the topology management function is termed a through GitHub. URL 〈https://github.com/fp7-pursuit/blackadder〉, 2013
central function we have shown that it can be distributed among [accessed 8.03.2014].
an arbitrary number of topology manager instances. By analyzing Ramachandran K, Beeram S. Supporting enterprise-grade audio conferencing on
the internet. In: Passive and active network measurement. Berlin, Germany:
traffic models of the current Internet and extrapolating them to Springer; 2009. p. 143–52.
levels in 2016 and applying them to the proposed architecture we Rothenberg CE, Jokela P, Nikander P, Sarela M, Ylitalo J. Self-routing denial-of-
have shown that the processing requirements are less than today's service resistant capabilities using in-packet bloom filters. In: European
conference on computer network defense (EC2ND); 2009. p. 46–51.
IP routing infrastructure. Thus we propose that this architecture is
Spagna S, Liebsch M, Baldessari R, Niccolini S, Schmid S, Garroppo R, et al. Design
a viable proposition for a future ICN based network operating at principles of an operator-owned highly distributed content delivery network.
the scale of a major Internet network provider. IEEE Commun Mag 2013;51(4):132–40.
Steiner M, En-Najjary T, Biersack EW. Long term study of peer behavior in the KAD
DHT. IEEE/ACM Trans Netw 2009;17(5):1371–84.
References Tagger B, Trossen D, Kostopoulos A, Porter S, Parisis G. Realising an application
environment for information-centric networking. Comput Netw 2013;57
Ahlgren B, Dannewitz C, Imbrenda C, Kutscher D, Ohlman B. A survey of (16):3249–66.
information-centric networking. IEEE Commun Mag 2012;50(7):26–36. Trossen D, Parisis G. Designing and realizing an information-centric internet. IEEE
Alzahrani B, Vassilakis V, Reed M. Mitigating brute-force attacks on bloom-filter Commun Mag 2012;50(7):60–7.
based forwarding. In: Conference on future internet communications (CFIC); Trossen D, Sarela M, Sollins K. Arguments for an information-centric internetwork-
2013. p. 1–7. ing architecture. SIGCOMM Comput Commun Rev 2010;40(April (2)):26–33.
Alzahrani BA, Reed MJ, Vassilakis VG. Enabling z-filter updates for self-routing Vasilakos X, Katsaros K, Xylomenos, G. Cloud computing for global name-resolution
denial-of-service resistant capabilities. In: The Fourth Computer science and in information-centric networks. In: The second symposium on network cloud
electronic engineering conference (CEEC). Colchester, UK: IEEE; 2012. p. 100–5. computing and applications (NCCA); 2012. p. 88–94.
Basher N, Mahanti A, Mahanti A, Williamson C, Arlitt M. A comparative analysis of Zinka M, Suhb K, Gua Y, Kurosea J. Characteristics of youtube network traffic at a
web and peer-to-peer traffic. In: Proceedings of the 17th international con- campus network—measurements, models, and implications. Comput Netw
ference on world wide web. WWW '08. Beijing, China: ACM; 2008. p. 287–96. 2009;53(4):501–14.

Please cite this article as: Alzahrani BA, et al. Scalability of information centric networking using mediated topology management.
Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.002i

You might also like