International Journal of Advances in Applied
Science and Engineering (IJAEAS)
ISSN (P): 2348-1811; ISSN (E): 2348-182X
Vol. 1, Issue 2, Apr 2014, 21-26
© IIST
ENERGY-EFFICIENT TRAFFIC MERGING FOR DATACENTER
NETWORKS
RUSHIKESH MADHUKAR BAGE, GAURAV DIGAMBAR DOIPHODE
Graduate Students, Department of Computer Science, Mumbai, Maharashtra, India
Email: bagerushi@gmail.com
Abstract:- A number of studies have shown that datacenter networks typically carry loads of between 5% and 25%, yet the energy draw of these networks equals that of operating them at maximum load. In this paper, we propose a novel way to make these networks more energy proportional – that is, to make the energy draw scale with the network load. We propose the idea of traffic aggregation, in which low traffic from N links is combined to create H < N streams of higher traffic. These streams are fed to H switch interfaces which run at maximum rate while the remaining interfaces are switched to the lowest possible rate. We show that this merging can be accomplished with minimal latency and energy cost (less than 0.1 W) while simultaneously giving us a deterministic way of switching link rates between maximum and minimum. Using simulations based on previously developed traffic models, we show that 49% energy savings are obtained at 5% load and 22% at 50% load. Since the packet losses are statistically insignificant, the results show that energy-proportional datacenter networks are indeed possible.
Keywords: perceptual, network, power, data mining, data center, energy.
Indeed, for such rate adaptation schemes the mean increase in latency is between 30 and 70 μs for different loading scenarios.
I. INTRODUCTION
The electricity consumption of datacenters has
become a significant factor in the overall cost of these
centers. As a result, there have been several recent
studies that aim to reduce the energy consumption
profile of servers and datacenter networks. Since the cooling costs scale as 1.3× the total energy consumption of the datacenter hardware, reducing the energy consumption of the hardware will simultaneously lead to a linear decrease in cooling costs as well. Today, the servers account for around 90% of the total energy costs, regardless of loading.
However, since typical CPU utilization of server
clusters is around 10 − 50% , there are several efforts
to scale the energy consumption of the servers with
load. Indeed, it is expected that in the near future
sophisticated algorithms will enable us to scale the
energy consumption of the servers with load. When this happens, as noted in [1], the energy cost of the network will become a dominant factor. Hence, there
is significant interest in reducing the energy
consumption of the datacenter networks, as well.
Previous studies attempt to reduce network-wide energy consumption by dynamically adapting the rate and speed of links, routers and switches, as well as by selecting routes in a way that reduces the total cost [3], [4]. In this respect, these green networking approaches have been based on numerous energy-related criteria applied to network equipment and component interfaces. These ideas tackle the minimization of network power consumption by matching the link capacity to the actual traffic load.
In this paper, we present an innovative approach to adapt the energy consumption to load for datacenter networks. The key idea is to merge traffic from multiple links prior to feeding it to the switch. This simple strategy allows more switch interfaces to remain in a low power mode while having a minimal impact on latency. We have explored the idea of traffic merging in depth in the context of enterprise networks in [6], [7], where we show that savings in excess of 60−70% are obtained with no effect on traffic. Indeed, the big advantage of the merge network is that, unlike most other approaches, it works in the analog domain: it does not introduce store-and-forward delays for Layer 2 (L2) frames, but rather redirects such frames on-the-fly at Layer 1 (L1) between the external and internal links of the merge
Abts et al. [1] and Benson et al. [2] note that the average traffic per link in different datacenter networks tends to range between 5% and 25%. The authors in [1] implement a link rate adaptation scheme to save energy: each link sets its rate every 10−100 μs based on a traffic prediction. The energy savings are shown to be in the range 30−45% for different workloads at loads below 25%. However, the scheme suffers from packet losses due to inaccurate traffic prediction as well as from significantly increased latency.
network itself. In addition, the merge network reduces frequent link speed transitions through the use of the low power mode. In our approach, such transitions happen only infrequently, thus allowing us to minimize both the delay due to the negotiation of the new link rate and the additional energy required for the rate transition.
The flattened butterfly achieves lower cost than a Clos on load-balanced traffic and provides better performance than a conventional butterfly.
A FBFLY topology is a multi-dimensional direct network like a torus (k-ary n-cube). Every switch in
the network is connected to hosts as well as other
switches. Unlike the torus, where each dimension is
connected as a ring, in the FBFLY each dimension is
fully connected. Hence, within a FBFLY dimension,
all switches connect to all others.
In this paper, we apply the merge network concept to a datacenter network topology called the Flattened Butterfly [1], [8]. Using extensive simulations, we show that 22%−49% energy savings are possible for loads between 50% and 5%, respectively. The choice of the FBFLY topology is motivated by the fact that it is the best type of datacenter topology in terms of energy consumption. However, it is possible to consider other types of datacenter topologies, such as the hypercube, torus, folded-Clos, etc.
An example of the interconnection is shown in Fig. 1. It is a 2-dimensional FBFLY (8-ary 2-flat) with 8 × 8 = 64 nodes and eight 15-port switches (7 + 8 = 15: 7 ports connected to the other switches and 8 ports connected to the nodes). The concentration c refers to the number of switch ports connected to the nodes.
Scaling the number of dimensions in a FBFLY consists, essentially, of taking this single 8-switch group, replicating it 8 times, and interconnecting each switch with its peer in the other 7 groups (i.e. the upper-left switch connects to the 7 upper-left switches in the other 7 groups). In this way, one obtains an 8-ary 3-flat with 8² = 64 switches and 8 × 8² = 512 nodes, where each switch has 8 + 7 × 2 = 22 ports. So, the total number of switches for the FBFLY is given by:
The rest of the paper is organized as follows. The next section describes the flattened butterfly network, and in Section III we describe traffic aggregation, introducing the merge network. In Section IV we explain how the merge network is combined with the flattened butterfly topology. The subsequent section discusses our simulation methodology and results. Our conclusions are presented in Section VI.
S(n, k) = k^(n−1)    (1)

II. FLATTENED BUTTERFLY TOPOLOGY

The number of ports per switch can be written as:
As outlined in [1], the flattened butterfly (FBFLY) topology is inherently more power efficient than other commonly proposed topologies for high-performance datacenter networks. It is proposed as a cornerstone for energy-proportional communication in large-scale clusters with 10,000 servers or more. In [1], the authors show why the topology, by itself, is lower in power than a comparable folded-Clos (i.e. fat tree) [10].
P(n, k) = c + (k − 1) × (n − 1)    (2)

and the total number of nodes can be expressed as:

N(n, k) = c × k^(n−1)    (3)
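As a quick sketch, Equations (1)–(3) can be checked directly in code (Python used here purely for illustration; the function names are ours):

```python
def switches(n: int, k: int) -> int:
    """Eq. (1): number of switches in a k-ary n-flat, S(n, k) = k^(n-1)."""
    return k ** (n - 1)

def ports(n: int, k: int, c: int) -> int:
    """Eq. (2): ports per switch, P(n, k) = c + (k - 1)(n - 1)."""
    return c + (k - 1) * (n - 1)

def nodes(n: int, k: int, c: int) -> int:
    """Eq. (3): total nodes, N(n, k) = c * k^(n-1)."""
    return c * k ** (n - 1)
```

For the 8-ary 2-flat of Fig. 1 (c = 8) these give 8 switches, 15 ports, and 64 nodes; for the 8-ary 3-flat, 64 switches, 22 ports, and 512 nodes, matching the figures above.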
(3)
Though a FBFLY scales exponentially with the number of dimensions, it is possible to scale by increasing the radix, too. Usually, it is advantageous to build the highest-radix, lowest-dimension FBFLY that scales high enough without exceeding the number of available switch ports. This reduces the number of hops a packet takes, as well as the number of links and switches in the system.
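The sizing advice above can be sketched as a small search over Eqs. (2) and (3); the function name and search bounds here are our own illustrative choices, not part of the paper:

```python
def size_fbfly(target_nodes: int, max_ports: int, c: int, max_n: int = 6):
    """Pick the lowest-dimension (hence highest-radix) k-ary n-flat that
    reaches target_nodes with concentration c, without exceeding
    max_ports ports per switch.  Returns (n, k) or None."""
    for n in range(2, max_n + 1):
        k = 2
        while c * k ** (n - 1) < target_nodes:  # grow the radix until it scales
            k += 1
        if c + (k - 1) * (n - 1) <= max_ports:  # port budget respected?
            return n, k
    return None
```

For example, 64 nodes with 16-port switches and c = 8 yields the 8-ary 2-flat, while 512 nodes with 22-port switches yields the 8-ary 3-flat, as in the discussion above.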
The FBFLY k-ary n-flat topology exploits high-port-count switches to create a scalable low-diameter network, in order to reduce latency and network cost. This is achieved using fewer physical links compared to a folded-Clos, at the expense of increased routing complexity to load-balance the available links. Indeed, unlike the simple routing in a folded-Clos, the FBFLY topology requires global-adaptive routing to load-balance arbitrary traffic patterns. However, a folded-Clos has a cost that is nearly double that of a FBFLY with equal capacity; the reason is the presence of twice as many long cables in the network, which approximately doubles the cost.
Fig. 1. Logical diagram of an 8-ary 2-flat flattened butterfly
topology.
When packets arrive along both incoming links of a selector, the earlier arriving packet is sent out along the top outgoing link and the later packet along the other one. The hardware implementation, described in [5], is done entirely in the analog domain. Thus, a packet is not received and transmitted in the digital sense; rather, it is switched along different selectors in the network much as a train is switched on the railroad. This ensures that the latency seen by a packet through the merge is minimal and that the energy consumption is very small as well.
III. TRAFFIC AGGREGATION
In this paper, we propose the merge network for energy-efficient datacenters. The merge network is based on the assumption that most ports in a switch see low utilization. Hence, the approach is to merge traffic from multiple links and feed the merged streams to a smaller set of active switch ports. As shown in Fig. 2, the traffic to/from N links is merged before reaching the switch interfaces; by setting the parameter H according to the incoming traffic load, it is possible to reduce the number of active interfaces to exactly H.
An important measure of the merge network is its complexity, which we can define with two numbers: the depth of the network and the total number of selectors. The minimum depth of an N × K merge network is log2(N) + K − 1, and the number of selectors needed is Σ_{i=1..K} (N − i).
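These two complexity measures are easy to compute; a minimal sketch (function names are ours):

```python
import math

def merge_depth(n: int, k: int) -> int:
    """Minimum depth of an N x K merge network: log2(N) + K - 1."""
    return int(math.log2(n)) + k - 1

def selector_count(n: int, k: int) -> int:
    """Total selectors needed: sum over i = 1..K of (N - i)."""
    return sum(n - i for i in range(1, k + 1))
```

For instance, the 8 × 8 merge network used later in our simulations has depth 3 + 7 = 10 and needs 7 + 6 + ... + 0 = 28 selectors.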
For example, if the average traffic load on the 8 links coming into a switch (as in Fig. 1) is 10%, we could merge all the traffic onto one link and feed it to one switch port running at maximum rate, thus allowing the remaining ports to enter low power mode. This solution is different from the approach in [1], where each link individually decides which rate to use every 10−100 μs, resulting in high latency and losses. Indeed, our approach results in almost optimal energy savings with minimal increase in latency (primarily due to the merge network).
On the downlink (i.e. from the switch to the N links), the merge network has to be able to forward up to N packets simultaneously from any of the switch ports (connected to the K outputs of an N × K merge network) to any of the N downlinks. This network uses a simple implementation consisting of multiplexers. However, in order for this part to work correctly, we need to embed the control logic inside the switch, because the packet header has to be parsed to determine on which of the N links each packet must be sent out.
However, before evaluating the impact of merging on datacenter network traffic, we need to develop a better understanding of the merge network itself. A generic N × K (with K ≤ N) merge is defined by the property that if at most K packets arrive on the N uplinks (i.e. from the N links into the switch), then the K packets are sent on to K sequential ports (using some arbitrary numbering system). For example, consider a 4 × 4 merge network as in Fig. 3. The incoming links from the hosts are identified as a–d and the switch ports as 1–4. The traffic coming in from these links is merged such that traffic is first sent to interface 1 but, if that is busy, it is sent to interface 2, and so on. In other words, we load the interfaces sequentially. This packing of packets ensures that many of the higher-numbered interfaces will see no traffic at all, thus allowing them to stay at the lowest rate all the time.
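Although the merge is realized in analog hardware, its sequential-packing behavior can be modeled functionally; the sketch below is our illustration, not the hardware design:

```python
def pack_slot(arrivals):
    """Functional model of the merge for one time slot: packets present
    on the input links are delivered to the lowest-numbered output
    interfaces in order of arrival.

    arrivals: iterable of (arrival_time, link_id) pairs.
    Returns {interface_number: link_id}, interfaces numbered from 1."""
    ordered = sorted(arrivals)  # earlier packets claim lower interfaces
    return {port: link for port, (_, link) in enumerate(ordered, start=1)}
```

With packets on links c, a and d arriving at times 0.3, 0.1 and 0.2, interfaces 1–3 carry a, d and c respectively, and interface 4 stays idle, ready to remain at the lowest rate.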
Finally, the merge network requires a special software layer called port virtualization. Modern switches support a large number of protocols: for instance, QoS and VLANs (IEEE 802.1p, 802.1Q), port-based access control (IEEE 802.1X), and many others. Hence, the solution is to build this software
The key hardware component needed to implement this type of network is called a selector, whose logical operation is described in Fig. 4. There are 2 incoming links and 2 outgoing links. If a packet arrives at only one of the two incoming links, it is always forwarded to the top outgoing link; if packets arrive along both incoming links, the earlier packet is sent out along the top outgoing link and the later one along the bottom.
Fig. 2. Switch without (a) and with (b) the merge network.
Since the merge network loads links sequentially, if link i is the highest-numbered active link, then in the event of an increase in load (from a host) the next link that will need to run at full rate is link i + 1. This determinism in link loading gives us the key to maximizing energy savings. Specifically, the algorithm we use for changing link rates at switches is as follows:
1) if interfaces 1 to H are active (at full rate), then we increase the rate of the (H+1)-th one to the full rate as well. This is done to offset packet loss in the event of a burst of packets;
Fig. 3. A 4 × 4 uplink merge network
layer within the switch. This layer maps packets coming in on the uplink to one of the N virtual ports; on the downlink, it schedules packets for transmission over one of the physical ports to the appropriate downstream links. This mapping is needed to ensure that security protocols like 802.1X and VLANs work unchanged.
2) if at most H − 2 of the H interfaces operating at the full rate are active, then we reduce the rate of the H-th interface to the lowest rate (after it goes idle).
This simple algorithm does not require any traffic prediction and ensures very low packet loss, assuming that the time to change link rates is 1−10 μs as in [1].
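Steps 1) and 2) amount to a simple hysteresis rule. The sketch below is our rendering of it, assuming (as the merge network guarantees) that the full-rate interfaces always form the prefix 1..H:

```python
def adjust_rates(rates, active, n):
    """One decision step of the link-rate rule.

    rates:  dict {interface: "full" | "low"} for interfaces 1..n
    active: set of interfaces currently carrying traffic."""
    full = [i for i in range(1, n + 1) if rates[i] == "full"]
    h = len(full)
    busy = sum(1 for i in full if i in active)
    if busy == h and h < n:
        rates[h + 1] = "full"   # step 1: pre-arm the next interface
    elif busy <= h - 2 and h > 1:
        rates[h] = "low"        # step 2: step the top interface down
    return rates
```

Because the rule only ever touches interface H or H + 1, rate transitions stay infrequent, which is exactly the property exploited above to avoid repeated rate-negotiation delays.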
IV. MERGE NETWORK + FLATTENED BUTTERFLY

We propose adding the merge network to the FBFLY datacenter network of [1] in order to maximize energy savings. The manner in which we introduce the merge network into the FBFLY is simple – we interpose the merge network between the connections from the hosts to the switches, while the connections between the switches are unchanged. In the context of the example from Fig. 1, we introduce eight 8 × 8 merge networks connected to the eight switches. Thus, the eight hosts that connect to a switch have their traffic routed through a merge network.

V. EVALUATION

In order to demonstrate the usefulness and effectiveness of traffic aggregation inside a high-performance datacenter, we evaluate the merge network using the OMNeT++ discrete-event network simulator. OMNeT++ is an open-source (and free for research and educational purposes) system used for modeling communication networks, queuing networks, hardware architectures, and manufacturing and business processes.
We model an 8-ary 2-flat FBFLY (with a concentration c = 8 and 64 nodes) with no over-subscription, so that every host can inject and receive at full line rate. Links have a maximum bandwidth of 40 Gbps. Switches are both input- and output-buffered. We model the merge network and port virtualization in software, using parameters from our prototype in [5]. For our simulations we use 8 × 8 merge networks.
In order to save energy using the merge network, we need to run some number of switch interfaces at full rate while dropping the rate of the rest to the lowest possible. As noted in [1], a 40 Gbps interface can operate at 16 different rates, with the lowest rate equal to 1.25 Gbps. The challenge is to run most of the links into the switch at the lowest rate (which consumes less than 40% of the power of the maximum rate for InfiniBand switches) while minimizing loss and latency.
Fig. 4. Operation of a selector.

In order to model the traffic in the network, we rely on several previous studies. The authors in [2] examine the characteristics of the packet-level communications inside different real datacenters, including commercial cloud, private enterprise, and university campus datacenters. They note that the packet arrivals exhibit an ON/OFF pattern. The distribution of the packet inter-arrival times fits the Lognormal distribution during the OFF period. However, during the ON period, the distribution varies across datacenters due to the various types of running applications. For example, MapReduce will display different inter-switch traffic
ink for InfiniBand switches), minimizing, at the same
time, loss and latency. Since the merge network has
the unique property of loading links sequentially, we
know that, if link i is the highest numbered active link,
characteristics than typical university datacenters. Furthermore, traffic between nodes and switches displays patterns quite different from the inter-switch traffic. The different traffic patterns typically fit one of the Lognormal, Weibull and Exponential distributions. We consider the exponential distribution the most restrictive among the identified distributions, and we use it to represent the general distribution of the packet inter-arrival times. In order to obtain a comprehensive view of the benefits and challenges of using the merge network, we use several different average traffic loads on each link.
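As an illustration of the exponential inter-arrival model, the sketch below generates Poisson packet arrivals for one link; the packet size and seed are illustrative assumptions of ours, not values from the paper:

```python
import random

def arrival_times(load, capacity_gbps, mean_pkt_bits, duration_s, seed=1):
    """Exponential (Poisson) packet arrivals for one link.
    The arrival rate is chosen so the offered load equals `load`
    (a fraction of the link capacity)."""
    rng = random.Random(seed)
    rate = load * capacity_gbps * 1e9 / mean_pkt_bits  # packets per second
    t, times = 0.0, []
    while t < duration_s:
        t += rng.expovariate(rate)  # exponential inter-arrival gap
        times.append(t)
    return times
```

At 10% load on a 40 Gbps link with 1500-byte (12,000-bit) packets, this corresponds to roughly 333,000 packets per second.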
Even at a load of 50%, an average of only 4 interfaces is active. We note that the packet losses are very small (statistically insignificant) and occur only during the time that an interface is being woken up. Fig. 6 shows the throughput of the switch in terms of processed packets per second. As we can see, the throughput scales with the load, uninfluenced by the merge network.
Let us now consider the energy savings obtained by using the merge network. As noted above, the maximum latency introduced by the merge network is 3 μs, which is far below that reported in [1]. As described in [5], the energy consumption of the merge network is derived by extrapolating the energy cost of the selectors, multiplying it by the number of selectors needed, and adding a 10% increment to account for the cost of the control logic. Although the number of selectors necessary to build a merge network grows with the number of input and output ports, its energy cost is very low: even for the largest merge network it is below 0.1 W. Therefore, we can effectively ignore the cost of the merge network in the overall energy calculation.

Fig. 5. Average number of active interfaces as a function of the load.

TABLE I. ENERGY SAVINGS USING A MERGE NETWORK.
Fig. 6. Throughput for switch as function of the load.
In order to compute the energy efficiency of our scheme, we rely on the energy analysis of [1]. As described there, a 40 Gbps InfiniBand link can operate at several lower rates, as low as 1.25 Gbps. This is accomplished by exploiting the underlying hardware: each link is composed of four lanes, each with its own chipset for transmitting and receiving. The chipset can be clocked at four different rates, and thus we have 16 different possible rates on any link. The energy consumption of the lowest rate is 40% that of the maximum. In our system, the links are either operating at the maximum rate (the merge network forwards packets to those links) or at the minimum. Thus, we can very easily calculate the energy savings relative to the baseline, which is the case when all links operate at the maximum rate.
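The 16 rate settings can be enumerated as (lanes, per-lane clock) pairs. The per-lane clock values below are our assumption, chosen only so that the extremes match the 1.25 Gbps and 40 Gbps figures cited in the text:

```python
def link_configs(lane_gbps=(1.25, 2.5, 5.0, 10.0), max_lanes=4):
    """All 16 (lanes, clock, rate) configurations of a 4-lane link
    whose per-lane chipset supports four clock rates."""
    return [(lanes, clk, lanes * clk)
            for lanes in range(1, max_lanes + 1)
            for clk in lane_gbps]
```

This yields 16 configurations, with rates ranging from 1.25 Gbps (one lane at the slowest clock) up to 40 Gbps (four lanes at the fastest clock).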
The traffic loads we simulate are 5%, 10%, 20%, 30%, and 50% of the maximum link capacity of 40 Gbps. The duration of each simulation is 24 hours. In addition, each run is repeated 10 times, and the average performance values have been calculated and plotted.
The metrics of interest are: energy savings, packet loss due to merging traffic, and aggregate throughput achieved. We note that the latency increase due to the merge network is 3 μs (this is based on the time for the selectors in the merge network to sense the presence of packets and appropriately configure the network to switch the packet).
A. RESULTS
Fig. 5 plots the average number of active interfaces as a function of the average load. It is interesting to note that, even for a load of 50%, an average of only 4 interfaces is active.
Using the data from Fig. 5, we obtain the energy savings for different loading patterns. Recall that when H interfaces are active, H + 1 interfaces are running at the maximum rate while the remaining N − H − 1 operate at the lowest rate. Table I provides the average energy savings, calculated relative to the baseline in which all N links run at the maximum rate.
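The savings calculation can be reconstructed from the stated model (H + 1 full-rate links, N − H − 1 links at the lowest rate drawing 40% of full-rate power); the sketch below is our reconstruction, not necessarily the paper's exact formula:

```python
def energy_savings(n, h, low_frac=0.4):
    """Fraction of link energy saved versus all N links at full rate,
    when H interfaces are busy: H+1 links run at full rate and the
    remaining N-H-1 at the lowest rate (low_frac of full power)."""
    consumed = (h + 1) + low_frac * (n - h - 1)
    return 1 - consumed / n
```

For an 8-port switch with one busy interface, the model predicts 1 − (2 + 0.4 × 6)/8 = 45% savings, which is of the same order as the 22−49% range reported in Table I.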
REFERENCES

[1] D. Abts, M. Marty, P. Wells, P. Klausler, and H. Liu, “Energy Proportional Datacenter Networks,” in Proceedings of the 37th International Symposium on Computer Architecture (ISCA). Saint Malo, France: ACM, June 2010, pp. 338–347.
[2] T. Benson, A. Akella, and D. Maltz, “Network Traffic
Characteristics of Data Centers in the Wild,” in Proceedings
of the 10th Conference on Internet Measurement (IMC).
Melbourne, Australia: ACM, November 2010, pp. 267–280.
These energy savings are greater than those obtained in [1], with only a minimal latency cost.
VI. CONCLUSIONS
[3] R. Bolla, R. Bruschi, F. Davoli, and F. Cucchietti, “Energy Efficiency in the Future Internet: A Survey of Existing Approaches and Trends in Energy-Aware Fixed Network Infrastructures,” IEEE Communications Surveys & Tutorials (COMST), vol. 13, no. 2, pp. 223–244, Second Quarter 2011.
This paper discusses the idea of merging traffic inside an energy-efficient datacenter. We consider the FBFLY topology because it is inherently more power efficient than other commonly proposed topologies for high-performance datacenter networks. Simulations with different configurations of traffic load are used to characterize and understand the effectiveness of the traffic aggregation. The results of these simulations show that it is possible to merge traffic inside a datacenter network to obtain 22−49% energy savings. An important implication of this work is that datacenters can be made very lean by using merge networks. Given that the size of large-scale clusters is of the order of 10,000 nodes or more, this degree of energy savings has an enormous global impact.
[4] L. Chiaraviglio, M. Mellia, and F. Neri, “Minimizing ISP Network Energy Cost: Formulation and Solutions,” IEEE/ACM Transactions on Networking, vol. PP, no. 99, pp. 1–14, 2011.
[5] C. Yiu and S. Singh, “Energy-Conserving Switch Architecture for LANs,” in Proceedings of the 47th IEEE International Conference on Communications (ICC). Kyoto, Japan: IEEE Press, June 2011, pp. 1–6.
[6] S. Singh and C. Yiu, “Putting the Cart Before the Horse:
Merging Traffic for Energy Conservation,” IEEE
Communications Magazine, vol. 49, no. 6, pp. 78–82, June
2011.
[7] C. Yiu and S. Singh, “Merging Traffic to Save Energy in the
Enterprise,” in Proceedings of the 2nd International
Conference on Energy-Efficient Computing and Networking
(e-Energy), New York, NY, USA, May–June 2011.
In addition, in our current work, we are using the merge network architecture to replace high-port-density switches with lower-port-density switches, thus yielding even greater energy savings. Despite the positive results concerning energy savings, the proposed merge network solution is not proven to be optimal; we are studying that problem as part of future work. It would also be interesting to test the merge network in datacenter topologies other than the FBFLY and with real traffic traces.
[8] J. Kim, W. Dally, and D. Abts, “Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks,” in Proceedings of the 34th International Symposium on Computer Architecture (ISCA). San Diego, CA, USA: ACM, June 2007, pp. 126–137.
[9] D. Abts and J. Kim, High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities. Morgan & Claypool Publishers, 2011.
[10] C. E. Leiserson, “Fat-Trees: Universal Networks for Hardware-Efficient Supercomputing,” IEEE Transactions on Computers, vol. 34, no. 10, pp. 892–901, October 1985.
***