
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.

Digital Object Identifier 10.1109/ACCESS.2017.DOI

Preventing RLC Buffer Sojourn Delays in 5G
MIKEL IRAZABAL¹, ELENA LOPEZ-AGUILERA¹, ILKER DEMIRKOL², ROBERT SCHMIDT³ AND NAVID NIKAEIN³
¹ Dept. of Network Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain (e-mail: name.surname@upc.edu)
² Dept. of Mining, Industrial and ICT Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain (e-mail: name.surname@upc.edu)
³ Dept. of Communication Systems, Eurecom, Sophia-Antipolis, France (e-mail: name.surname@eurecom.fr)
Corresponding author: Mikel Irazabal (e-mail: mikel.irazabal@upc.edu).
This work was supported in part by the EU Horizon 2020 research and innovation program under grant agreement No. 675806 (5GAuRA),
grant agreement No. 857201 (5G-Victori) and in part by the Secretaria d’Universitats i Recerca del Departament d’Empresa i Coneixement
from the Generalitat de Catalunya under grant agreement No. 2017 SGR 376.

ABSTRACT The 3rd Generation Partnership Project (3GPP) is investing notable effort in mitigating the endogenous stack and protocol delays (e.g., by introducing new numerology, preemptive scheduling or uplink grant-free transmission) to attain the heterogeneous Quality of Service (QoS) latency requirements for which the fifth generation technology standard for broadband cellular networks (5G) is envisioned. However, 3GPP’s goals may become futile if exogenous delays generated by the transport layer (e.g., bufferbloat) and the Radio Link Control (RLC) sublayer segmentation/reassembly procedure are not targeted. On the one hand, the bufferbloat specifically occurs at the Radio Access Network (RAN) since the data path bottleneck is located at the radio link, and contemporary RANs are deployed with large buffers to avoid squandering scarce wireless resources. On the other hand, a Resource Block (RB) scheduling that dismisses 5G’s packet-switched network nature unnecessarily triggers the segmentation procedure at the sender’s RLC sublayer, which adds extra delay as the receiver’s RLC sublayer cannot forward the packets to higher sublayers until they are reassembled. Consequently, the exogenously generated queuing delays can surpass 5G’s stack and protocol endogenous delays, neutralizing 3GPP’s attempt to reduce the latency. We address RLC-related buffer delays and present two solutions: (i) we enhance the 3GPP standard and propose a bufferbloat avoidance algorithm, and (ii) we propose an RB scheduler for circumventing the added sojourn time caused by the packet segmentation/reassembly procedure. Both solutions are implemented and extensively evaluated along with other state-of-the-art proposals in a testbed to verify their suitability and effectiveness under realistic conditions of use (i.e., by considering Modulation and Coding Scheme (MCS) variations, slices, different traffic patterns and off-the-shelf equipment). The results reveal current 3GPP deficits in its QoS model to address the bufferbloat and the contribution of the segmentation/reassembly procedure to the total delay.

INDEX TERMS 5G, Bufferbloat, low-latency, SDAP, RLC, OpenAirInterface.

I. INTRODUCTION
Ultra-reliable low-latency communications (URLLCs) are intended to address two orthogonal weaknesses faced in contemporary cellular networks: reliability and low latency.

On the one hand, as Shannon proved in his information theory founding paper [1], given a noisy discrete channel and a transmission rate smaller than the channel capacity, there exists an encoding scheme capable of generating an arbitrarily small equivocation rate. This theoretical result shows how reliability depends on the coding scheme and confirms that new coding schemes (e.g., low-density parity-check (LDPC) codes) can achieve arbitrarily small equivocation rates at the expense of using big data blocks [2]. 3GPP has already defined the new channel codings [3], and the successful achievement of reliable communications is indispensable for the URLLC adoption. On the other hand, a large amount of effort is being invested by 3GPP to mitigate or eliminate the latency introduced by the endogenous cellular stack design (e.g., new numerology for mini-slots

VOLUME 4, 2016 1
Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

or pre-emptive scheduling [4]) and cellular protocol (e.g., uplink grant-free transmission to avoid the Scheduling Request procedure delay [5]). Moreover, a new sublayer for addressing different services has been introduced (i.e., Service Data Adaptation Protocol (SDAP) [6]), and a new QoS indicator (i.e., QoS Flow Identifier (QFI)) will classify the different flows according to their different requirements [7]. However, exogenous 5G stack latency causes (i.e., latencies not directly induced by the 5G stack such as the bufferbloat or the RLC packet segmentation/reassembly procedure) can ultimately become the main contributors to the delay.

Bufferbloat occurs specifically at the RAN and plays a central role in the delay of low-latency traffic since (i) the data path bottleneck resides at the RAN, as wireless channel capacity is inferior to wired channel capacity in contemporary networks; (ii) RANs are equipped with large buffers to absorb the unpredictable dynamic radio channel capacity and thus avoid squandering wireless resources; and (iii) flows with distinct characteristics share buffers on the data path in the 5G stack due to 5G’s QoS funnel architecture. The first two premises combined with the greedy nature of TCP’s congestion control (e.g., TCP Cubic [8]) are necessary and sufficient to generate a plethora of packets at the bottleneck, which induces delays in the order of seconds [9]. In essence, packets from a greedy flow start accumulating at the bottleneck’s link buffer, impeding a rapid packet delivery from other flows that share the same bottleneck queue. The ongoing bufferbloat research has primarily focused on the IEEE 802.3 and the IEEE 802.11 standards, achieving remarkable results [10] and proposing numerous new algorithms (such as Controlled Delay (CoDel) [11] and Fair Queuing CoDel (FQ-CoDel) [12]). The main challenge with regard to the bufferbloat is two-fold: (i) maintaining the buffer with enough data to fully use the available bandwidth and thus avoid buffer starvation; and (ii) preventing excessive data in the buffer to minimize the packet’s sojourn time.

3GPP has also introduced stack network improvements at the RLC sublayer in 5G [13] compared to 4G [14], aiming to decrease the latency. In 4G, the RLC Protocol Data Unit (PDU) header for data transmission consists of a fixed part (i.e., one or two bytes depending on the configuration) and an extension part. The extension part is only present when more than one packet is assembled and is composed of an Extension bit (E) and a Length Indicator (LI). The E field is one bit long and indicates if another set of E and LI fields follows, or if the next bit is part of the data. The LI indicates the length of the packet and varies from 11 to 15 bits according to the configuration. In 5G the extension part has been renamed to Segment Offset (SO), and consists of 16 bits for all cases. It indicates the absolute position of the packet in bytes within the RLC PDU. This change of paradigm from a relative position to an absolute position (i.e., LI vs. SO) deteriorates the data compression ratio, as the difference between the packets’ starting positions is necessarily smaller than the packets’ absolute starting position. However, in modern processors the minimum amount of data that can be accessed is 1 byte. Therefore, if the information straddles 2 bytes, some bit manipulation assembly instructions need to be generated to access the value (e.g., with three E-LI pairs of 12 bits, the information of the second packet starts at bit 12 + 1 = 13, which corresponds to the second byte, and ends at position 12 + 12 = 24, which corresponds to the third byte). 5G simplifies this process as the packet starting position information is byte aligned and, thus, it can be directly accessed, sacrificing some throughput to reduce the latency. However, this latency reduction may not play an important role if another phenomenon that contributes to augmenting the delay in 5G is ignored: the segmentation/reassembly procedure.

Every TTI, the MAC scheduler requests a PDU (i.e., an amount of bytes) from the RLC sublayer. This may involve segmenting an RLC Service Data Unit (SDU) packet to fit within the demanded total size of the PDU. If the packet is segmented, part of it is transmitted, while the rest waits at the sender’s RLC sublayer. Once the rest of the packet is transmitted, a reassembly at the receiver’s RLC sublayer occurs, and the packet is submitted to the Packet Data Convergence Protocol (PDCP) sublayer in the downlink procedure. Even though the segmentation/reassembly procedure depends on 5G stack exogenous causes (i.e., packet sizes, radio link conditions and MAC scheduler algorithm¹), it has a non-negligible contribution to the delay that a packet suffers, as demonstrated in this paper. Moreover, slicing has emerged as a new 5G pillar feature [7], which fundamentally can be reduced to a resource allocation issue, and thus exacerbates the problem. Unfortunately, recent research studies have mostly focused on resource distribution [15] [16], ignoring 5G’s specificities and the packet-switched network nature of the Internet, and therefore, the delays associated with RLC’s segmentation/reassembly procedure have been overlooked.

¹ 3GPP does not define the MAC scheduler algorithm, and therefore, it cannot be considered an endogenous stack delay cause.

This paper addresses the bufferbloat and the segmentation/reassembly procedure in 5G, and in summary makes the following contributions:
• We analyze 5G’s exogenous delays that arise at the RLC sublayer’s buffers (i.e., bufferbloat and segmentation/reassembly procedure).
• We propose an enhanced 3GPP QoS scenario for improving the latency in 5G.
• We introduce a novel bufferbloat avoidance algorithm (i.e., e5G-BDP) based on the bandwidth delay product, which is the optimal theoretical pacing rate for avoiding the bufferbloat, while fully utilizing the link [17].
• We present a resource scheduling algorithm (i.e., Enhanced Quantum Partition (EQP)) considering 5G specificities that minimizes RLC’s segmentation/reassembly procedure.
• We implement and evaluate the performance of our proposed algorithms (i.e., e5G-BDP and EQP) against current state-of-the-art solutions, emulating real network conditions and using off-the-shelf equipment, ultimately validating our solutions as they significantly reduce the RLC buffer sojourn delay.
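
The byte-alignment argument given in this section for the 4G E/LI pairs versus the 5G SO field can be illustrated with a minimal Python sketch. The helper names (`pack_fields`, `read_packed_field`, `read_aligned_field`) and the toy field layouts are assumptions made for illustration; they are not the exact 3GPP header formats.

```python
import struct

def pack_fields(values, width=12):
    """Pack integers into consecutive width-bit fields (big-endian, zero padded)."""
    acc = 0
    for value in values:
        acc = (acc << width) | value
    total_bits = width * len(values)
    padding = (-total_bits) % 8            # pad up to a whole number of bytes
    return (acc << padding).to_bytes((total_bits + padding) // 8, "big")

def read_packed_field(buf, index, width=12):
    """4G-like access: the field straddles byte boundaries, so shift and mask."""
    start = index * width                  # first bit of the field (0-based)
    end = start + width                    # one past the last bit of the field
    first_byte, last_byte = start // 8, (end - 1) // 8
    chunk = int.from_bytes(buf[first_byte:last_byte + 1], "big")
    trailing = (last_byte + 1) * 8 - end   # bits after the field inside the chunk
    return (chunk >> trailing) & ((1 << width) - 1)

def read_aligned_field(buf, index):
    """5G-like access: a 16-bit field sits on a byte boundary and is read directly."""
    return struct.unpack_from(">H", buf, 2 * index)[0]

# Three hypothetical 12-bit E/LI pairs (E flag in the top bit, 11-bit LI below it).
pairs = [(1 << 11) | 100, (1 << 11) | 1500, 40]
packed = pack_fields(pairs)
second_li = read_packed_field(packed, 1) & 0x7FF   # strip the E flag

# Three hypothetical 16-bit byte-aligned offsets.
aligned = struct.pack(">3H", 0, 100, 1600)
second_so = read_aligned_field(aligned, 1)
```

In this toy layout the packed fields occupy 5 bytes and the aligned ones 6 bytes, which mirrors the trade-off described in the text: the aligned format spends more header bytes but avoids the shift-and-mask step.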
The rest of this paper is structured as follows. In Section II, 5G’s contemporary QoS scenario is presented along with our enhanced 3GPP QoS architecture proposal. Section III thoroughly discusses the bufferbloat and the segmentation/reassembly procedure in 5G and describes related works in the field. Our proposed solutions appear in Section IV, while the utilized evaluation framework is described in Section V. In Section VI, the results from our solutions and the state-of-the-art solutions are evaluated, and lastly, in Section VII we present the conclusions of this paper.

FIGURE 1. Proposed enhanced QoS model per UE.

II. BACKGROUND IN 5G’S QOS MODEL
5G use case heterogeneity inherently creates a non-trivial QoS scenario that is described in [7]. In the following, the most important aspects of the QoS scheme for the data path in the 5G Access Network (5G-AN) are presented, assuming the downlink procedure unless otherwise mentioned.

Packets from the data network flow through the N3 interface to the 5G-AN already marked with a QoS Flow Identifier (QFI) [18]. The QFI denotes, among other things, the maximum data burst volume, the resource type, the packet priority level for scheduling purposes, the tolerated delay, referred to as the Packet Delay Budget (PDB), and the tolerable error rate, expressed through the Packet Error Rate (PER). PDB indicates the upper bound for the permissible delay, measured from the N6 interface (i.e., from the moment that the packet arrives at the User Plane Function (UPF)) until the packet is received by the UE, while PER bounds the fraction of packets forwarded by the RLC sublayer of the 5G-AN that are not successfully delivered to the UE’s PDCP sublayer. Three different resource types are described by 3GPP: Delay-Critical Guaranteed Bit Rate (Delay-Critical GBR), Guaranteed Bit Rate (GBR) and Non-Guaranteed Bit Rate (Non-GBR) [7]. The first entity to apply QoS traffic engineering techniques in the 5G-AN is the newly defined SDAP sublayer [6]. An SDAP entity per PDU session is foreseen in [6], although it is mentioned that other implementations are valid. SDAP’s main function or raison d’être is mapping the QFI flows into Data Radio Bearers (DRBs), according to the configuration provided by the Radio Resource Control (RRC). Therefore, it lacks any queue or scheduling capability, contrary to what is depicted in Fig. 1. However, the QFI is a 6-bit field (i.e., 2^6 = 64 values [19]), while the maximum number of DRBs is 30 [19]. For every DRB a new RLC entity is instantiated and therefore, an RLC buffer per DRB exists. This inevitably generates a funnel, as shown in Fig. 1, where a many-to-one relation will occur following the pigeonhole principle. 3GPP has not explicitly defined any scheduling capabilities in the SDAP. However, if stringent and diverse QoS requirements must be met, mobile network operators will need to provide a packet-based scheduler, since finding and arbitrarily dequeuing packets with different QFIs that are already queued is costly and complex. Furthermore, if no scheduling capabilities are added to the SDAP sublayer, the only sublayer at the 5G-AN where QoS traffic engineering techniques can be applied is the MAC, which will pull data from bloated RLC buffers if the packets are forwarded as they arrive. Down to the RLC sublayer, data is transmitted in packets, with a header added to the original packets that arrived at the SDAP sublayer. However, once the MAC sublayer notifies the RLC sublayer about the amount of bytes that need to be forwarded, packets are joined to form the requested Transport Block (TB). If the requested TB size (TBS) cannot be filled exactly with whole packets (e.g., two 1000-byte packets in the RLC buffer, yet the MAC requests 1500 bytes), a packet segmentation occurs. Consequently, part of the original packet is transmitted to the UE, while the rest waits at the 5G-AN. This phenomenon delays the information delivery, as packets cannot be submitted to the UE’s PDCP until reassembled. Lastly, the MAC scheduler pulls the required data from the RLC queues and forwards it through the Downlink Shared Channel (DL-SCH), as observed in Fig. 1.

To gain finer control over the packets’ QoS, we enhance the current SDAP standard with two new capabilities: (i) we add a queue per QFI (i.e., 64 queues per SDAP entity) with the purpose of segregating different traffic flows and retaining the packets at the SDAP sublayer, and (ii) we provide scheduling capabilities to the SDAP sublayer.

III. PROBLEM DESCRIPTION AND RELATED WORK
This section describes the two phenomena analyzed in this paper along with the existing related work. In the first subsection, the bufferbloat phenomenon in contemporary cellular networks is presented along with the state-of-the-art solutions for addressing it. In the second subsection, RLC’s segmentation/reassembly procedure is thoroughly studied and its effect on augmenting the delay is exposed.

A. BUFFERBLOAT AT THE RLC SUBLAYER
1) Problem description
Bufferbloat is the name given to the effect of excessive buffering of packets in the bottleneck’s data link buffer. Such an effect in the 5G stack is caused by (i) the sender’s congestion control algorithm (e.g., TCP Cubic [8]), and (ii) large buffers at the bottleneck link (i.e., RLC sublayer

buffers). The default Linux kernel congestion control algorithm (i.e., TCP Cubic) is loss-based. Therefore, congestion is detected through a packet loss, which ideally coincides with the available bandwidth being reached.

However, if large buffers are deployed at the bottleneck’s link, a queue is formed once the available bandwidth is reached, misinforming TCP’s congestion control algorithm, as the packets are not lost, but rather experience a larger delay than expected due to the sojourn time at the bottleneck’s link. In essence, TCP’s congestion control algorithm cannot differentiate between the delay generated by congested buffers and the delay produced by the packet propagation. Contemporary wireless links are considerably slower than wired links (e.g., a 20 MHz bandwidth LTE base station can forward approximately 70 Mbit/s with a 28 MCS [20], in contrast with the 400 Gbit/s of an IEEE 802.3db fiber-optic physical media interface), forming the bottleneck at the RAN, and specifically at the RLC sublayer, where the last buffer before the wireless transmission is located. Due to the dynamic nature of the radio channel capacity, service providers deploy large RLC buffers aiming to avoid squandering the scarce wireless resources, and thus unintentionally generate the necessary conditions for the bufferbloat to appear. The challenge of avoiding the bufferbloat in 5G is explained by the following dichotomy. On the one hand, the RLC buffer must contain enough bytes to feed the MAC sublayer. A failure to meet this requirement results in wireless resource under-utilization. On the other hand, no more bytes than those requested by the MAC sublayer should be waiting at the RLC queue, so that a low-latency flow can avoid unnecessary queuing sojourn time.

2) State-of-the-art solutions
Small queue sizes result in lower latencies, as demonstrated in [21]. The basic idea is to reduce the amount of packets in the queues caused by overdimensioned buffers, while not starving the transmission channel. Due to the queue size limits, packets start accumulating at higher sublayers following a phenomenon known as back-pressure. If the packets reside at different queues, a scheduler can easily pull the packet with the most stringent requirements from the higher layer queues, reducing the sojourn delay. Such a scheme is used by Dynamic RLC (DynRLC) [22] and Enhanced Bearer Buffer (EBB) [23], presented by the same authors. These methods estimate the available bandwidth by measuring the packet’s sojourn time. If the sojourn time increases, the allowed size of the buffer, which is defined as the maximum number of SDUs, decreases and no more packets from higher sublayers (i.e., PDCP) are delivered. If, on the contrary, the sojourn time decreases, the buffer capacity limit of the RLC is augmented. The Dynamic RLC Queue Limit (DRQL) [24] is a similar solution based on limiting the queue size, considering the amount of bytes instead of the number of SDUs remaining in the queue.

Another well-studied policy for improving the buffer sojourn time is Active Queue Management (AQM) [25], with CoDel [11] being the most widely applied AQM policy. CoDel is governed by two parameters: the desired delay (5 ms by default) and the interval time value (100 ms by default). A timestamp is added to every newly ingressed packet and the sojourn time is measured when packets egress. If the sojourn time of all the packets egressed during the interval time has been above the desired delay value, the next packet is dropped, informing the sender that congestion is happening, and the control law that determines the next drop time is updated. The time until the next drop is reduced in inverse proportion to the square root of the number of packets dropped since the dropping state was entered. Such an approach permits the existence of bursty traffic during periods shorter than the interval value. Recent results on CoDel in cellular networks [21] [24] [26] show a latency reduction when adopted.

Other state-of-the-art solutions try to avoid the bufferbloat through the "keep the pipe just full, but not fuller" principle described by Kleinrock [17]. The TCP BBR [27] algorithm fulfills such a principle from the OSI Layer 4 perspective. BBR estimates the actual bandwidth by observing the Round Trip Time (RTT) of the packets, and it interprets an increase in the RTT as a sign of a bottleneck forming, thus reducing its sending rate. However, as demonstrated in [24], such an approach may not be optimal in a mobile network. A similar approach is considered in [28], where the congestion control algorithm is manipulated through the ECN bits to accelerate or slow down the packet delivery rate. Other advanced algorithms that are based on the same principle within the cellular network are 5G-BDP and USP [24], which reported promising results.

Segregating the flows into different buffers is one of the solutions that can partly eliminate the sojourn time induced by other flows. It has been successfully implemented in the Fair Queuing scheduler [29], and presents many variants (e.g., FQ-CoDel [12]). A greedy flow will not monopolize a queue’s assigned throughput as it rests in a separate queue, and a scheduler pulls data according to different policies (e.g., round-robin or earliest deadline first). Recently, a new research paradigm (i.e., Low Latency, Low Loss, Scalable Throughput (L4S) [30]) proposes to segregate the packets into two queues: one for the traffic prone to generate queuing delay, and the other one for traffic that inherently avoids the bufferbloat (e.g., traffic generated through TCP’s BBR congestion control algorithm). However, there exist many services that demand stringent low latency to function correctly (e.g., VoIP [31] or mobile gaming [32]), where identifying and tagging them for segregation purposes represents a challenge, due to their number and their origin’s dynamic nature (i.e., servers can be relocated, altering their 5-tuple identifier). Furthermore, the rising digital privacy concerns in contemporary societies will increase the encrypted, as well as relayed, traffic, disabling the tagging capability of deep packet inspectors and origin/destination identifiers. Lastly, as explained in Section II, the 5G QoS architecture forms a funnel with limited QFIs (i.e., 64) and DRBs (i.e., 30) per UE, and therefore, the solution of segregating the packets lacks scalability as the number of flows grows.
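
The CoDel drop decision and control law described in this subsection can be sketched in a few lines of Python. This is a simplified illustration that assumes the default parameters quoted in the text and omits the full state machine of the algorithm; the function names `should_start_dropping` and `next_drop_gap` are hypothetical.

```python
import math

TARGET_S = 0.005     # desired delay: 5 ms by default
INTERVAL_S = 0.100   # interval: 100 ms by default

def should_start_dropping(sojourn_times_s, target_s=TARGET_S):
    """True when every packet egressed during the interval stayed above the target."""
    return bool(sojourn_times_s) and all(s > target_s for s in sojourn_times_s)

def next_drop_gap(drop_count, interval_s=INTERVAL_S):
    """Time until the next drop: the interval divided by the square root of the
    number of packets dropped since the dropping state was entered."""
    return interval_s / math.sqrt(drop_count)

# Gaps between consecutive drops for the first four drops of a dropping state.
gaps = [next_drop_gap(n) for n in range(1, 5)]
```

The gaps shrink from 100 ms for the first drop to 50 ms for the fourth, so a persistently full queue is signalled to the sender more and more often, while a burst shorter than one interval never triggers a drop.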

B. SEGMENTED PACKETS AT THE RLC SUBLAYER


1) Problem description
Packets flow through the 5G stack down to the RLC buffer, as shown in Fig. 1, and may start accumulating there, as the wireless link is the slowest link in the data path. Packets wait at the RLC sublayer until the MAC scheduler pulls a specific number of bytes. While every UE will be provided with at least one DRB, up to 30 DRBs per UE can coexist, and therefore, up to 30 RLC buffers. Parallel queues will be formed, and thus the MAC scheduler has to map the available resources to the RLC buffers according to the scheduler policy (e.g., round-robin). The RLC buffer is a FIFO queue [33] [34] since packets within a DRB should be treated equally [7], and therefore the MAC scheduler can only access the most recently inserted packets after it has pulled the older ones. This introduces a precedence relation for the packets that belong to the same queue (i.e., packets from the same queue can only be egressed following the arrival order).

FIGURE 2. RLC UM functions as described by 3GPP [13].
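
The FIFO precedence constraint can be sketched as follows. This is a toy model under stated assumptions: `RlcFifoBuffer` and its methods are hypothetical names, RLC header overhead is ignored, and the two 1000-byte packets with a 1500-byte MAC request reuse the example from Section II.

```python
from collections import deque

class RlcFifoBuffer:
    """Toy RLC buffer: SDUs leave strictly in arrival order."""

    def __init__(self):
        self._queue = deque()            # entries: [sdu_id, remaining_bytes]

    def push(self, sdu_id, size_bytes):
        self._queue.append([sdu_id, size_bytes])

    def pull(self, requested_bytes):
        """Serve one MAC request; returns (sdu_id, bytes_sent, sdu_complete) tuples."""
        served = []
        while requested_bytes > 0 and self._queue:
            head = self._queue[0]        # only the oldest SDU can be accessed
            sent = min(head[1], requested_bytes)
            head[1] -= sent
            requested_bytes -= sent
            complete = head[1] == 0
            if complete:
                self._queue.popleft()    # fully sent, the next SDU becomes the head
            served.append((head[0], sent, complete))
        return served

buf = RlcFifoBuffer()
buf.push("sdu-0", 1000)
buf.push("sdu-1", 1000)
first_tti = buf.pull(1500)    # sdu-0 whole, sdu-1 segmented
second_tti = buf.pull(1500)   # remainder of sdu-1
```

In this run the second packet is split across two requests, so the receiver can only reassemble and deliver it after the second TTI, even though half of it was already transmitted in the first one.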
Furthermore, the radio resource allocation performed by the MAC scheduler does not directly map bytes to the RLC buffers, but rather assigns Resource Blocks (RBs). RBs are grouped into Resource Block Groups (RBGs) according to the cell configuration (e.g., in a 5 MHz bandwidth cell with Type 0 and Configuration 1, the minimum number of RBs to distribute is 2, except for the last RB [35]), which form the smallest unit that can be assigned to a UE. Moreover, to assure an error rate below 10% [35], the UE delivers a channel quality estimation through the Channel Quality Indicator (CQI) in uplink. The CQI is a scalar and its value is translated into a Modulation and Coding Scheme (MCS) [35]. The MCS defines the modulation to use (i.e., BPSK, QPSK, 16 QAM, 64 QAM or 256 QAM, which transmit 1, 2, 4, 6 or 8 bits per symbol, respectively) and the coding rate, and thus the channel capacity is determined by the radio conditions (i.e., under good radio conditions, a larger amount of information can be transmitted).

Furthermore, 3GPP defines three different modes in which an RLC entity can be instantiated [13]: Transparent Mode (TM), Unacknowledged Mode (UM) and Acknowledged Mode (AM). Through a TM entity, only control information can be forwarded, while data information can flow through either a UM or an AM entity. Both UM and AM share the ability to segment a packet if the TBS notified by the MAC does not match the size of the packets waiting, as seen in Fig. 2. According to 3GPP [13], packets at the RLC sublayer will be segmented if the RLC SDU size is larger than the bytes requested by the MAC sublayer. As seen in Fig. 2, once packets are segmented and an RLC header is added, they are transmitted to the receiver’s RLC, where, after removing the RLC header, they wait for an SDU reassembly before being submitted to the next sublayer (i.e., UE’s PDCP in the downlink procedure). Therefore, information will not be forwarded until a complete reassembly occurs, which in the best case will occur in the next TTI. The segmentation/reassembly procedure guarantees a full frequency spectrum utilization in the cases where the next packet size exceeds the TBS. For example, in a static scenario with an LTE base station with a 5 MHz bandwidth, under the best channel conditions (i.e., 28 MCS), approximately 2289 bytes can be transmitted every TTI [20]. However, bulky flows in an IP network will use the maximum allowable packet size (i.e., 1500 bytes in Ethernet) to minimize the protocol’s overhead and maximize the transmitted information ratio. This example shows that even ignoring the dynamic radio link channel’s capacity (i.e., assuming a static TBS of 2289 bytes), a myriad of fragmented packets are generated at the RLC sublayer, as the TBS notified by the MAC would rarely coincide with the packets’ size, and consequently, the delay is increased.

The principal constraints to consider in RLC’s segmentation/reassembly procedure can be summarized as:
• The RLC buffers are FIFO queues, and thus the packets cannot be pulled arbitrarily.
• The resource allocation is performed through RBGs, rather than bytes.
• The MCS determines the channel capacity, which dynamically changes according to the radio link conditions.

2) State-of-the-art solutions
Resource allocation has recently received significant attention as it is one of the pivotal ideas around the slicing concept [36] [37] [38]. Unfortunately, most literature about slicing discusses the resource allocation problem without considering 5G’s packet-switched network nature (e.g., the segmentation problem, where the information is not forwarded unless all the fragments are reassembled), or 5G’s RB distribution peculiarities (e.g., RBG).

In contrast, in other protocols such as Ethernet (i.e., IEEE 802.3) or Wi-Fi (i.e., IEEE 802.11), the data is transmitted

asynchronously. In Ethernet, the MTU size is 1500 bytes² [39], while in Wi-Fi it reaches 2304 bytes [40]. Even though modern versions of Ethernet do not use any collision detection mechanism, Wi-Fi employs a collision avoidance mechanism to orchestrate the access of multiple stations to the wireless channel. An acknowledgment frame from the receiver is used to confirm that the data arrived correctly. IEEE 802.11 increases its probability of successfully transmitting a packet in a noisy channel through packet fragmentation. However, the asynchronous nature of accessing the channel compels stations to aggregate frames to achieve full bandwidth [41], and therefore, the fragmentation procedure is rare in comparison with the RLC segmentation/reassembly mechanism that arises in cellular networks due to their synchronous nature. Therefore, the segmentation/reassembly procedure can be mostly considered as a cellular network specificity, and consequently, it has not been thoroughly researched in the IEEE 802.3 and IEEE 802.11 standards.

² The standard explicitly talks about octets rather than bytes. Throughout this paper, no differentiation is made and a byte is considered to be 8 bits.

IV. PROPOSED SOLUTIONS TO ADDRESS 5G’S EXOGENOUS DELAYS
In this section, the proposed solutions to mitigate the exogenous delays suffered by the 5G network stack are presented. Specifically, a bufferbloat avoidance algorithm is presented along with a novel algorithm that reduces the RLC SDU segmentation/reassembly procedure delay.

A. ENHANCED 5G-BDP (E5G-BDP)
In [24], we presented different solutions that rely on the communication between the SDAP and the MAC sublayers. Among them, 5G-BDP was presented and shown to be the most successful algorithm to address the bufferbloat. 5G-BDP calculates the Bandwidth Delay Product (BDP), which is the optimal pacing value to forward the packets from the SDAP to the RLC so as to work at the optimal rate [17]: not starving the MAC scheduler, but also avoiding bloating the RLC buffer. However, 5G-BDP expects a uniform bandwidth scheduling between TTIs (i.e., it expects the MAC scheduler to distribute the RBs uniformly, such as {6,6,6,6,6} instead of {0,12,0,0,18} during 5 TTIs). The resource distribution relies on the MAC scheduler policy, from which 5G-BDP is decoupled, as it only acquires RLC buffer occupancy information. For example, the MAC scheduler algorithm may assign RBs according to the current buffer occupancy. In such scenarios, where the RBs are not uniformly distributed, a non-virtuous cycle may occur in 5G-BDP. 5G-BDP will not forward packets to the RLC sublayer, since in the last TTI no RBs were assigned to the RLC buffer, while the MAC scheduler might not assign more RBs to the RLC buffer due to its low occupancy. Additionally, 5G-BDP does not consider the size of the current packet to submit. Once a large packet is queued in the RLC buffer, completely submitting it can last several milliseconds, especially in scenarios with low throughput, and therefore, avoiding forwarding large packets unless necessary also improves the sojourn time suffered by low-latency packets.

To avoid the situations described, we present the enhanced 5G-BDP (e5G-BDP) solution, which is threefold. In the first place, the BDP is calculated through an Exponentially Weighted Moving Average (EWMA). An EWMA smooths out short-term fluctuations while exposing longer-term trends. This provides a more accurate bandwidth computation and absorbs outlier bandwidth oscillations (e.g., bandwidth variations due to HARQ/NACK or non-uniform scheduling). In the second place, packets are forwarded more actively in comparison with 5G-BDP, avoiding the non-virtuous cycle previously explained. In e5G-BDP packets are also forwarded according to the BDP, yet ignoring the amount of accumulated bytes at the RLC buffer. However, if packets are not forwarded in the following TTI, the BDP will be reduced, and thus this second measure can be thought of as a smooth packet forwarding reduction mechanism. In the third place, the size of the packet to forward to the RLC buffer is considered (i.e., small packets are more proactively submitted to the RLC buffer while large packets tend to stay longer at the SDAP sublayer).

e5G-BDP works within a 1 ms TTI, as it is the lowest common denominator of 5G’s possible TTIs, since the slot duration in 5G ranges between 1 ms and 62.5 µs at the expense of using more frequency spectrum [42].

The pseudo-code of e5G-BDP is presented in Algorithms 1, 2 and 3. e5G-BDP algorithms are executed by different sublayers (i.e., SDAP and MAC). Algorithm 1 represents the main e5G-BDP function, which is executed by the SDAP scheduler per active RLC buffer periodically within a TTI, to determine whether a packet can be sent to the subsequent lower sublayer. e5G-BDP calculates the BDP per RLC buffer. It first checks whether the update_flag variable was set (line 1) by the MAC scheduler, and if so, it recalculates the bandwidth by calling the function update_last_bandwidth given in Algorithm 3. The update_flag is set every 1 ms by the MAC sublayer, after the packets from the RLC sublayer are received. At line 5, it is checked whether the sum of the bytes submitted from the SDAP to the RLC (i.e., SDAP-RLC submitted bytes, srs_bytes) and the last accumulated bytes in the RLC buffer surpasses the maximum capacity under the current MCS. If the sum surpasses the capacity, a true value is returned, thus informing the SDAP scheduler that the limit is reached, and therefore, no more packets are forwarded to the RLC buffer. If the sum is smaller than the capacity, the paced_bytes (line 8) are calculated. This feature enables the pacing capability, as the paced_bytes depend on the elapsed time since the last 1 ms TTI (e.g., if the actual bandwidth is 2200 bytes/TTI and the elapsed time since the last 1 ms TTI is 0.5 ms, 1100 bytes are the theoretical paced bytes, without any additive or multiplicative factor). In this manner, large packets (e.g., 1500-byte packets) are more likely submitted during the last moments
objects or octets. of the TTI (e.g., (750, 1000) µs interval), and therefore,
if a low-latency packet with higher priority arrives at the beginning of a TTI (e.g., (0, 750) µs interval) it can avoid a bloated buffer. It can be argued that the best results could be achieved if the packets are kept at the SDAP sublayer and forwarded to the RLC sublayer just a moment before the TTI (i.e., TTI⁻). However, the cellular network is a real-time system, where the lower layers/sublayers (i.e., PHY layer or MAC sublayer) have higher priority than the higher sublayers (i.e., RLC, PDCP or SDAP sublayers). Therefore, it may occur that the TTI⁻ forwarding opportunity at the SDAP sublayer is missed, which would starve the MAC sublayer (i.e., not having enough bytes to forward from the RLC to the MAC) and thus squander throughput. At line 9, the paced_bytes is reduced by a constant in the range (0.0, 1.0], and if it exceeds the sum of the next packet data size (i.e., size of the packet candidate to forward to the RLC from the SDAP) and the already submitted bytes, a false value is returned, informing the SDAP scheduler to submit the packet. This feature enables the forwarding of packets independently of the last measured buffer occupancy, contrary to what happens at line 16, and mostly in the latter moments of the TTI (i.e., in the (500, 1000) µs range rather than the (0, 500) µs range since the last transmission, assuming a 1 ms TTI), as the paced_bytes value increases with the elapsed time and, therefore, dispatches packets more actively as compared with 5G-BDP. It also favors forwarding small packets. At line 12, the extra_bytes variable is initialized to data_size/5, and if a packet has already been transmitted and some bytes were accumulated during the last TTI, it is set to data_size/3 at line 14. It incentivizes forwarding the next packet if during the current TTI no packet has yet been forwarded or the RLC buffer was emptied in the last TTI. This variable also discourages submitting large packets, which are one of the causes of the bufferbloat. Lastly, at line 16, the already submitted bytes (i.e., srs_bytes) plus the last accumulated bytes in the buffer and the extra_bytes variable are compared against the paced_bytes to decide whether to forward the packet or keep it at the SDAP sublayer. Contrary to line 9, in this last condition (i.e., line 16) the occupancy of the RLC buffer after submitting the RLC PDU is considered before forwarding the packet.

Algorithm 1: e5G-BDP limit_reached.
It answers the SDAP sublayer whether to keep a packet or forward it.
Input: Size of the next packet (data_size)
Output: Bool value whether to forward or keep the packet
1: if update_flag == True then
2:     update_bw_est(); // Alg. 2
3:     update_flag = False
4: end if
5: if srs_bytes + last_acc_bytes > max_bytes then
6:     return True;
7: end if
8: paced_bytes = calculate_paced_bytes(); // Alg. 3
9: if paced_bytes × reduce_const > data_size + srs_bytes then
10:     return False;
11: end if
12: extra_bytes = data_size/5;
13: if srs_bytes ≠ 0 ∧ last_acc_bytes ≠ 0 then
14:     extra_bytes = data_size/3;
15: end if
16: if srs_bytes + last_acc_bytes + extra_bytes > paced_bytes then
17:     return True
18: end if
19: return False

Algorithm 2 and Algorithm 3 are helper functions, where the last bandwidth according to the RLC buffer occupancy after the 1 ms TTI is estimated, and the paced_bytes are calculated. In Algorithm 2, the bandwidth is estimated through an EWMA calculation according to the number of bytes that were pulled by the MAC from the RLC buffer. The bytes that remained in the RLC buffer are gathered (i.e., line 1), the bytes that were transmitted to the MAC sublayer are calculated (i.e., line 2), and the new bandwidth is estimated (i.e., line 3). Lastly, the submitted bytes from the SDAP sublayer to the RLC are reset (i.e., line 4), and the last_acc_bytes and last_tti are updated. As previously mentioned, e5G-BDP is intended to be executed per active DRB, and thus, Algorithm 2 is called for each active RLC buffer.

Algorithm 2: e5G-BDP update_bw_est.
It estimates the actual bandwidth per RLC queue.
Input: Current time (now)
Output: Updated bandwidth and RLC buffer occupancy information (rms_bytes, bandwidth, srs_bytes, last_acc_bytes and last_tti)
1: acc_bytes = get_bytes_rlc_queue();
2: rms_bytes = srs_bytes − (acc_bytes − last_acc_bytes);
3: bandwidth = EWMA(rms_bytes/TTI);
4: srs_bytes = 0;
5: last_acc_bytes = acc_bytes;
6: last_tti = now;

In Algorithm 3 the paced_bytes are calculated, based on the elapsed time, the bandwidth and the last accumulated bytes. A multiplicative factor at line 2 (i.e., incr) that depends on the elapsed time takes a larger value during the second half of the TTI. Therefore, packets are forwarded more actively during the last moments of the TTI, avoiding a possible starvation that would impact the throughput. If the bandwidth is not equal to zero, an additive factor of MTU/7 is used at line 5, again to avoid a possible starvation in real-time systems.
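As an illustration, the decision logic of Algorithms 1 to 3 can be condensed into the following Python sketch. It is a minimal reading of the pseudo-code and not the OAI implementation: the class name PacerState, the EWMA weight alpha, the value chosen for reduce_const (the text only bounds it to (0.0, 1.0]) and the max_bytes capacity are assumptions made for illustration, and time is expressed as the elapsed fraction of a 1 ms TTI. The helper replay_example reuses the figures of the scenario discussed with Fig. 3 (600 bytes left in the RLC buffer, 1000 bytes/TTI and an MTU of 1500 bytes), with TTI⁻ approximated by an elapsed fraction of 0.999.

```python
from dataclasses import dataclass

MTU = 1500  # bytes; Ethernet MTU, as assumed in the text


@dataclass
class PacerState:
    """Per-RLC-buffer variables of Algorithms 1-3 (class name is hypothetical)."""
    bandwidth: float = 0.0      # EWMA of bytes pulled by the MAC per TTI
    last_acc_bytes: int = 0     # bytes left in the RLC buffer at the last TTI
    srs_bytes: int = 0          # bytes submitted SDAP -> RLC in the current TTI
    max_bytes: int = 10_000     # assumed capacity under the current MCS
    reduce_const: float = 0.5   # assumed; the text only states (0.0, 1.0]
    alpha: float = 0.25         # assumed EWMA weight (not given in the text)


def update_bw_est(s: PacerState, acc_bytes: int) -> None:
    """Algorithm 2: refresh the bandwidth estimate once per TTI."""
    rms_bytes = s.srs_bytes - (acc_bytes - s.last_acc_bytes)  # RLC -> MAC bytes
    s.bandwidth = s.alpha * rms_bytes + (1 - s.alpha) * s.bandwidth
    s.srs_bytes = 0
    s.last_acc_bytes = acc_bytes


def calculate_paced_bytes(s: PacerState, elapsed: float) -> float:
    """Algorithm 3: elapsed is the fraction of the TTI since the last TTI."""
    incr = 1.2 if elapsed < 0.5 else 1.33
    if s.bandwidth != 0:
        return incr * elapsed * s.bandwidth + MTU / 7
    if s.last_acc_bytes == 0 and elapsed > 0.5:
        return MTU / 4
    return 0.0


def limit_reached(s: PacerState, data_size: int, elapsed: float) -> bool:
    """Algorithm 1: True keeps the packet at the SDAP, False forwards it."""
    if s.srs_bytes + s.last_acc_bytes > s.max_bytes:
        return True
    paced_bytes = calculate_paced_bytes(s, elapsed)
    if paced_bytes * s.reduce_const > data_size + s.srs_bytes:
        return False
    extra_bytes = data_size / 5
    if s.srs_bytes != 0 and s.last_acc_bytes != 0:
        extra_bytes = data_size / 3
    return s.srs_bytes + s.last_acc_bytes + extra_bytes > paced_bytes


def replay_example() -> list:
    """Replays the forwarding opportunities of the Fig. 3 scenario."""
    s = PacerState(bandwidth=1000.0, last_acc_bytes=600)
    decisions = []
    for elapsed, size in [(0.25, 1500), (0.5, 1500), (0.75, 200),
                          (0.75, 1500), (0.999, 1500)]:
        keep = limit_reached(s, size, elapsed)
        if not keep:
            s.srs_bytes += size  # the SDAP submits the packet to the RLC
        decisions.append(keep)
    return decisions
```

Under these assumptions, replay_example() returns [True, True, False, True, False]: the 1500-byte packet is kept at t = 1/4, 1/2 and 3/4, the 200-byte packet is forwarded at t = 3/4, and the large packet is forwarded just before the TTI boundary.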
The value of MTU/7 is chosen as a compromise in a packet-based network such as 5G, where most of the packets will be TCP packets transported through Ethernet (i.e., MTU of 1500 bytes), while UDP packets are normally scarce, small in size and do not contribute as severely to the bufferbloat as the TCP ones. The MTU value is divided by 4 (i.e., line 7) with the intention of forwarding the packet to the RLC buffer in the cases where the bandwidth was 0, the RLC buffer is empty and the elapsed time is beyond 50% of the TTI, as seen in line 6. This assures that a packet will be forwarded from the SDAP, as the theoretical calculated value (i.e., MTU/4) will be bigger than data_size/5 for all packet sizes, as seen in Algorithm 1 at line 16.

Algorithm 3: e5G-BDP calculate_paced_bytes.
It calculates how many bytes could theoretically have been forwarded, according to the current bandwidth and the elapsed time since the last TTI.
Input: Current time (now)
Output: Paced bytes (paced_bytes)
1: elapsed = (now − last_tti)/TTI;
2: incr = elapsed < 0.5 ? 1.2 : 1.33;
3: paced_bytes = 0;
4: if bandwidth ≠ 0 then
5:     paced_bytes = incr × elapsed × bandwidth + MTU/7;
6: else if last_acc_bytes == 0 ∧ elapsed > 0.5 then
7:     paced_bytes = MTU/4;
8: end if
9: return paced_bytes

The e5G-BDP pacing mechanism is illustrated in Fig. 3. Let us assume that 600 bytes remained in the RLC buffer from the previous TTI (i.e., last_acc_bytes = 600 bytes), the calculated bandwidth is 1000 bytes/TTI, the MTU is 1500 bytes and that packets from two different flows share the RLC buffer, one bulky (i.e., 1500-byte packets) and one with low-latency requirements (i.e., 200-byte packets) [31] for a quantitative discussion. Let us assume that at t = 1/4 of the TTI since the last RLC to MAC forwarding event, e5G-BDP obtains an opportunity to forward packets from the SDAP sublayer to the RLC sublayer. These opportunities cannot be precisely predicted, as the upper stack sublayers (i.e., RLC, PDCP and SDAP) may miss some events, in contrast with the lower layers/sublayers (i.e., PHY and MAC), where synchronization, and thus real-time predictability, is mandatory. As observed in Fig. 3, the packet to forward is large in comparison with the current computed bandwidth, which is calculated through an EWMA filter. The paced_bytes value is 1.2 × 0.25 TTI × 1000 bytes/TTI + 1500/7 bytes = 514 bytes, while the sum of the srs_bytes (i.e., bytes forwarded from the SDAP to the RLC during this TTI, 0 bytes), the last_acc_bytes (i.e., 600 bytes) and the extra_bytes (i.e., 1500/5 = 300 bytes) reaches 900 bytes. Therefore, e5G-BDP refuses to submit the packet, and the same reasoning applies when e5G-BDP obtains a forwarding opportunity at t = 1/2 (i.e., paced_bytes = 1.2 × 0.5 TTI × 1000 bytes/TTI + 1500/7 bytes = 814 bytes), since forwarding a large packet during the first moments of the TTI would prevent other packets with higher priority that may arrive during the TTI from being transmitted in the next TTI. At t = 3/4, another forwarding opportunity occurs. In this case, however, a smaller packet with higher priority arrived at the SDAP between t = 1/2 and t = 3/4. Since paced_bytes increases with the elapsed time (i.e., 1.33 × 0.75 TTI × 1000 bytes/TTI + 1500/7 bytes = 1211.5 bytes), the next packet to submit is small (i.e., 200 bytes), and thus so is the variable extra_bytes (i.e., 200/5 = 40 bytes), and the RLC buffer is not bloated (i.e., last_acc_bytes = 600 bytes and srs_bytes = 0 bytes), the packet is forwarded. However, the large packet (i.e., the 1500-byte packet) is not forwarded during this opportunity, as the paced_bytes value did not change (i.e., 1211.5 bytes) and is smaller than the sum of the already sent bytes (i.e., srs_bytes = 200 bytes), the extra_bytes (i.e., 1500/3 = 500 bytes) and the last_acc_bytes (i.e., 600 bytes). Lastly, a forwarding opportunity during the last moments of the TTI (t = TTI⁻) arrives at e5G-BDP. This time, e5G-BDP forwards the large packet that it refused to submit before, as the sum of the submitted bytes variable (i.e., srs_bytes = 200 bytes), the last_acc_bytes (i.e., 600 bytes) and the extra_bytes (i.e., 1500/3 = 500 bytes) is smaller than the paced_bytes (i.e., 1.33 × 1.0 TTI × 1000 bytes/TTI + 1500/7 bytes = 1544 bytes). In this manner, e5G-BDP prioritizes utilizing the full bandwidth (i.e., it submits more bytes than the calculated bandwidth) at the possible cost of generating some sojourn time. In Fig. 3, if we suppose that the next bandwidth will be equal to the calculated one, the packet forwarded at t = TTI⁻ will be segmented and some bytes of it will block the path of future packets. However, through this pacing mechanism, the buffer starvation is avoided, while the sojourn time of packets with different QFIs that share the RLC buffer is reduced.

FIGURE 3. e5G-BDP pacing mechanism outcome.

e5G-BDP was explicitly designed to maintain 5G's sublay-
ers decoupled since coupling impedes the 5G foreseen functional split [43], is prone to bugs [44], and is an undesirable software architecture feature overall.

B. ELASTIC QUANTUM PARTITION (EQP)
As described in Section III-B, the packet segmentation procedure enables full wireless resource utilization, yet it should be avoided whenever possible as it increases the latency. For example, a strict spectrum-based distribution or Fixed Partition (FP) (i.e., fixed number of RBs per RLC buffer every TTI) leads to a plethora of packets waiting at the receivers' RLC buffer, which significantly increases the delay. In Fig. 4, the problem is depicted with four RLC buffers, each containing one packet of 4 RBs, and a bandwidth of 4 RBs/TTI. In the FP approach, one fourth of each packet is sent every TTI. This leads to segmented packets that cannot be forwarded until the 4th TTI, leading to an average delay of 4 × 4 TTIs / 4 packets = 4 TTI/packet. Conversely, a segmentation/reassembly avoidance distribution would assign all the capacity to one RLC buffer every TTI, so that no segmentation/reassembly happens, and packets can flow to the next sublayer, leading to an average delay of (1 + 2 + 3 + 4) TTI / 4 packets = 2.5 TTI/packet.

FIGURE 4. FP and EQP RB distribution algorithms.

We mathematically formalize the RLC segmentation/reassembly model as follows. Consider the partially ordered set (X, ≤) composed of N packets at the RLC sublayer x_1, x_2, ..., x_N, where N ∈ ℕ. Since packets are egressed in FIFO order from every RLC queue, there exists a precedence constraint (i.e., to egress a particular packet, all the packets that arrived before it have to be dequeued first), and therefore, the precedence relation (i, j) exists in X if item i can only be pulled once j has been pulled. We represent the radio channel capacity with C ∈ ℕ, and the set R composed of r_1, r_2, ..., r_N contains the ceil size in RBs of the RLC SDU packets. These values need to be recalculated every TTI due to the dynamicity of the radio link channel. As an example, and without loss of generality, let us assume that an RB can transport up to 88 bytes (i.e., the approximate number of bytes per RB with 28 MCS [45]), and that a queue contains two packets of 1500 and 500 bytes, in that precise order. For transmitting the first packet, ⌈1500/88⌉ = 18 RBs are needed, while for transmitting the first and the second packet, ⌈(1500 + 500)/88⌉ = 23 RBs are required. Hence, adding the second packet has a total contribution in the number of RBs of 23 − 18 = 5 instead of ⌈500/88⌉ = 6. Therefore, that queue would contain an 18 RB packet followed by a 5 RB packet. The objective function to avoid the segmentation/reassembly procedure given the 5G model is to maximize the number of packets pulled from the RLC buffers, given a capacity C during a TTI, which mathematically can be expressed as:

$$\max \left( \sum_{i=1}^{N} x_i \right) \tag{1}$$

s.t.

$$\sum_{i=1}^{N} x_i r_i \leq C \tag{2}$$

$$x_i \in \{0, 1\}, \quad \forall i \in N \tag{3}$$

$$x_i \leq x_j, \quad \forall (i, j) \in X \tag{4}$$

The objective function is (1), where the number of packets transmitted is to be maximized. Thus, x can be either 0 (i.e., not selected) or 1 (i.e., selected) according to constraint (3). Constraint (2) models the requirement that the sum of the RBs, where r represents the amount of RBs of a packet, cannot surpass the capacity C. Lastly, (4) models the precedence constraint. If a precedence constraint exists between two packets (i, j) (i.e., they belong to the same queue and the packet x_j arrived before packet x_i), the packet x_i can only be selected (i.e., get the value 1) if the packet x_j has already been selected (i.e., already has the value 1). This segmentation avoidance algorithm can be reduced to a well-known problem: the precedence constrained knapsack problem (PCKP)³, which is also known as the open pit mining problem. Even though the PCKP is known to be NP-complete [47], there exists a solution that runs in O(NC). In 5G, the capacity C is a modest number that depends on the channel bandwidth (e.g., in 5G the maximum number of RBs is limited to 275 [42]). This fact mitigates the effect of C in the scheduling algorithm, as no more than the equivalent of 275 RBs per queue have to be considered, and Ethernet traffic will be mostly composed of packets larger than one RB⁴. However, N depends on the number of connected UEs, which in commercial base stations can be of the order of 10^4 [48], which should be multiplied by the maximum number of DRBs (i.e., 30 [19]) per UE. The PCKP applied to the RB distribution maximizes the number of non-segmented

3 The Weighted Completion Time and Chains [46] algorithm, which achieves the optimum scheduling for precedence constrained jobs, does not output the optimum permutation as the time is discrete rather than continuous (i.e., it can only be a multiple of a TTI, but not a fraction of it).
4 Since the RB set contains the ceil value in RBs of the packets, adding a new small packet could result in a 0 size RB packet. This has no practical relevance as (i) no infinite number of packets with size zero can be accumulated, as eventually a new RB will be needed, and (ii) packets are usually sent with at least tens of bytes of information to mitigate the header overhead inserted by the transport layer.

packets and, therefore, the delay suffered by segmentation is minimized. However, such a scheduling algorithm does not include any fairness. A rogue flow that transmits smaller packets than its competitors would monopolize the access to the resources if we only cared about the objective function (1). Fortunately, the problem of fairness in a packet-based network has been widely studied in the last decades [49] [50]. In [49], the Deficit Round Robin scheduler is presented, where the quantum is used to represent the total bandwidth fraction of each queue in bytes. At every egress opportunity, the corresponding quantum value is added to a state variable that is maintained per queue. If the following packet's size is smaller than the accumulated quantum, the packet is forwarded and the quantum value is reduced according to the packet's size. This procedure is repeated until either the queue is empty or the following packet's size exceeds the quantum value. In this manner, a past bandwidth deficit is fixed in future egress opportunities, achieving fairness. However, in a typical spectrum sharing scenario, the percentage of the total RBs is agreed between different parties through a Service Level Agreement (SLA), and such percentage of RBs is respected within a time window, similarly to the guarantees offered by QFI in 5G (e.g., GBR) [7]. Hence, a dichotomy between the short and long term objectives is explicitly presented. On the one hand, in the short term we want to reduce the segmentation that causes delay. On the other hand, inside the agreed time window, we want to respect the percentage of RBs between different queues.

Aiming to bring fairness to our algorithm, we introduced a variable named quantum q. Next, we assume a slicing scenario with an SLA between different slices (i.e., an amount of RBs per slice every TTI). Moreover, an analogous scenario arises whenever a UE has different active DRBs, and thus, our proposed algorithm is also valid in such scenarios. The quantum is increased with the theoretical amount of RBs corresponding to the slice during that TTI (e.g., 12 RBs if there are 24 available RBs and the resource share assigned is 50% of the total RBs). Our algorithm lets slices lend (i.e., quantum to increase) or borrow (i.e., quantum to decrease) RBs within a limit. Consequently, a slice can borrow more RBs than planned in a TTI to avoid packet segmentation and, thus, reduce the delay. However, to avoid unfairness, a limit to the amount of quantum is applied. Such quantum limit depends on the deviation from the agreed percentage of RBs inside the time window (e.g., in a time window of 1 second, with a 1 ms TTI, an SLA of 12 RBs and a quantum limit of 100 RBs, a maximum deviation of max_possible_RB_deviation/total_RBs = (100 − (−100)) / (12 RBs × 1000 TTIs/s) ≈ 1.67% is expected). If the transmission of the following packet decreases the quantum beyond the minimum limit, the access to new RBs is denied. The sum of the resource quantums of all the slices is zero, $\sum_{i=1}^{n} q_i = 0$, since the resources lent by a slice are borrowed by another, resulting in a zero sum operation. However, a greedy approach leads to slices working at the quantum limit under certain traffic patterns, and thus, when low-latency packets arrive, an already indebted slice would lack the capability of borrowing enough resources to transmit the low-latency packets rapidly. In recent years, different buffer management policies for packet switches [51] using the competitive analysis [52] have flourished. The competitive analysis measures the optimality of an online algorithm (i.e., an algorithm that takes decisions without knowing the entire input sequence from the beginning) against the performance of the optimal offline algorithm (i.e., an algorithm that knows the entire input sequence before starting) for any input packet sequence. Even though the cellular network segmentation/reassembly problem cannot be reduced to a known scheduling buffer model due to its complexity (e.g., different packet sizes or dynamic radio link capacity), some outputs can be applied to it. In [53], five different buffer management policies are presented for packets that can have two values (i.e., high or low). The most successful policy (i.e., Dynamic Flexible Partition) accepts low value packets according to the amount of enqueued low value packets and its free slots through an exponential function. The reasoning is as follows. Since in an online algorithm the future packet sequence is unknown, the resources for packets that do not contribute significantly to the total reward should be used cautiously, as the algorithm may need the resources in the near future for more profitable packets. We use an analogous approach to discourage an already indebted slice from acquiring more resources. The size of the packets is multiplied by an exponential function that depends on the borrowed RBs (i.e., the quantum value). In this manner, acquiring a larger amount of quantum (i.e., acquiring a new debt) is discouraged when the already borrowed quantum approaches the limit, as the packets seem larger. With this mechanism, the greedy effect of selecting the packets from an indebted slice with small packets is limited, achieving the objective of reducing the packet segmentation effect while maintaining the fairness in the SLA within a small deviation. To achieve such objectives, we propose the Elastic Quantum Partition (EQP) presented in Algorithm 4.

EQP first assigns the corresponding quantum to every slice according to their SLA and the available RBs for the current TTI (i.e., in a 50% slice where 24 RBs can be allocated, ⌊24 × 0.5⌋ = 12 RBs). It then converts the current queues with packets in bytes into queues with RBs, considering the cellular network characteristics (e.g., the MCS) as well as the quantum, through the generate_rbs function. Packets belonging to indebted slices (i.e., negative quantum) are considered larger (i.e., their number of bytes is multiplied by an exponential function that depends on the fraction quantum borrowed/quantum limit). In line 3, Algorithm 4 calls the segmentation avoidance algorithm. It returns a permutation of packets, considering the slices' quantum, that does not trigger the segmentation/reassembly procedure. Next, EQP assigns the RBs to the permutation returned by the seg_avoid function and updates the remaining total_rbs, as well as the slices' quantum.
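The quantum bookkeeping described above (crediting the SLA share every TTI, debiting the assigned RBs and denying access beyond the limit) can be sketched as follows. This is an illustrative reading under stated assumptions rather than the EQP implementation: the class SliceQuantum and the functions try_assign, max_deviation and two_slice_example are hypothetical names, only the lower quantum bound is enforced, and the exponential size penalty is omitted because its exact form is not given in the text.

```python
from math import floor


class SliceQuantum:
    """Quantum account of one slice, in RBs (hypothetical helper class)."""

    def __init__(self, share: float, limit: int):
        self.share = share    # SLA fraction of the RBs, e.g. 0.5
        self.limit = limit    # maximum debt: the quantum may not drop below -limit
        self.quantum = 0      # > 0: the slice has lent RBs; < 0: it has borrowed

    def start_tti(self, available_rbs: int) -> None:
        """Credit the theoretical share of this TTI, e.g. floor(24 * 0.5) = 12."""
        self.quantum += floor(available_rbs * self.share)

    def try_assign(self, rbs: int) -> bool:
        """Debit the RBs of a packet unless the debt would exceed the limit."""
        if self.quantum - rbs < -self.limit:
            return False      # access to new RBs is denied
        self.quantum -= rbs
        return True


def max_deviation(limit: int, sla_rbs: int, ttis: int) -> float:
    """Worst-case deviation from the agreed share inside the time window."""
    return (limit - (-limit)) / (sla_rbs * ttis)


def two_slice_example() -> tuple:
    """Two 50% slices sharing 24 RBs; slice a borrows 6 RBs from slice b."""
    a, b = SliceQuantum(0.5, 100), SliceQuantum(0.5, 100)
    for s in (a, b):
        s.start_tti(24)
    a.try_assign(18)  # an 18 RB packet is sent whole instead of being segmented
    b.try_assign(6)   # the remaining 6 RBs go to the other slice
    return a.quantum, b.quantum
```

In this sketch two_slice_example() returns (-6, 6), so the quantums add up to zero once all 24 RBs are assigned, and max_deviation(100, 12, 1000) evaluates to about 0.0167, the deviation of the time window example.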
However, some RBs may still be unallocated (i.e., total_rbs may not be 0). Therefore, the slices are sorted in quantum descending order and the next UE (i.e., the Radio Network Temporary Identifier (RNTI) is a temporary identifier for a UE) is selected. The empty queues, as well as the queues that will get drained in the next TTI (i.e., the already assigned RBs in lines 4 – 8 will empty them), are excluded from the sorting. If there are still unallocated RBs and the slice quantum is not indebted above 75% of the quantum maximum limit, the RBs are distributed considering 5G specificities (i.e., using RBG). The objective of such a mechanism is twofold. On the one hand, it avoids squandering RBs. For example, in a two slice scenario, if the buffers of the first slice are empty while the second slice buffers contain packets, the second slice can use the RBs from the first slice to minimize the unused RBs, and thus, increase the total throughput. On the other hand, it limits the RBs assigned to a slice that does not manage to forward a full packet. In this way, slices maintain a quantum buffer of around 25% of the total quantum limit, in case these RBs are needed during the next TTIs. This is important in an online scenario, such as 5G, where the traffic patterns cannot be foreseen, and a slice may use the quantum in a more rewarding manner (i.e., being able to forward a larger amount of packets). Lastly, the remaining RBs are assigned to the slices with the highest positive quantum (i.e., line 19), even if the buffers of these slices are empty. Such a decision sacrifices some throughput in favor of fairness, since EQP interprets the lack of more packets in a slice as a desired symptom (e.g., the application may not have more information to transmit) and abides by the SLA.

Algorithm 4: Elastic Quantum Partition (EQP).
It maps the free RBs to non empty RLC buffers.
Input: Total number of RBs to schedule and active RNTI (total_rbs and active_rntis).
1: assign_quantum_slices();
2: idx_q = generate_rbs(total_rbs, active_rntis);
3: out_arr = seg_avoid(idx_q, 0, total_rbs, out_arr); // Alg. 5
4: for all pkt, rnti, slice ∈ out_arr do
5:     assign_rbs(rnti, pkt);
6:     reduce_slice_quantum(slice, pkt);
7:     total_rbs −= pkt;
8: end for
9: sort_slices();
10: rnti = next_rnti()
11: while total_rbs > 0 ∧ slice_quantum(rnti) > −0.75 × limit_quantum do
12:     assign_rbs(rnti, min_rbg);
13:     reduce_slice_quantum(slice, min_rbg);
14:     total_rbs −= min_rbg;
15:     sort_slices();
16:     rnti = next_rnti()
17: end while
18: if total_rbs > 0 then
19:     map_last_RB_slices()
20: end if

Algorithm 5 presents the segmentation avoidance algorithm. It consists of a recursive algorithm where a memoization⁵ (i.e., at lines 7, 8 and 31) on the packet position (i.e., idx_q and idx_pkt) and the capacity is implemented with the goal of obtaining an O(NC) algorithmic complexity rather than O(2^N). Through memoization, repetitions of already calculated permutations in the packet index, queue index and capacity are avoided.

5 Memoization is not a typo from the word memorization but rather an optimization technique.

Due to the recursive nature of Algorithm 5, we will start presenting it from the middle rather than from the beginning. The core of the segmentation avoidance algorithm resides at lines 23 and 24. The algorithm either selects the current packet (i.e., adding the current packet act_pkt in the possible permutation) and reduces the remaining capacity accordingly, or jumps to the next queue maintaining the current capacity, generating all the valid permutations along the way. Once the stack unwinds, two possible permutations are available (i.e., the permutation of selecting the current packet (i.e., take) and reducing the capacity (line 23), or ignoring it and jumping to the next queue maintaining the available capacity (i.e., dont_take) (line 24)). At line 26, the permutation with the largest amount of packets is selected. If lengths are equal, the solution with the largest sum of the quantum of its packets is selected at line 28 (i.e., every packet belongs to a slice with an associated quantum, so the larger sum of the quantum indicates the permutation of packets that reside at slices that lent more RBs). Such a measure balances the quantum values of the slices and achieves higher fairness in the case where the same amount of packets can be pulled. The algorithm lastly saves the obtained result (i.e., out_arr) according to the idx_q, idx_pkt and capacity at line 31 before returning.

As input, Algorithm 5 receives the queue index (i.e., idx_q), the packet index (i.e., idx_pkt), the remaining capacity (i.e., capacity) and a copy of the ongoing selected array permutation (i.e., out_arr). The termination conditions of the recursive Algorithm 5 are coded in the first lines (i.e., lines 1 – 6). At line 1, the capacity is checked, and if zero (i.e., no more RBs available), the current packets permutation (i.e., out_arr) is returned. At line 4, the end of the queues (i.e., no more queues to consider) is checked, and if the end is reached, the permutation is returned. At line 7, it is checked whether the packet permutation has already been resolved, thus avoiding unnecessary work and reducing the algorithm complexity. At line 10, the packet size in RBs according to its queue index (i.e., idx_q) and packet index (i.e., idx_pkt) is acquired (i.e., p_size). At line 11, the remaining capacity is obtained (i.e., t_capacity) if the current packet is selected in the permutation (i.e., packet at queue index idx_q and packet index idx_pkt), with 5G's RBG specificity adjusted.
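The take or dont_take recursion with memoization can be illustrated with the following simplified Python sketch. It is not the authors' implementation: the names marginal_rbs and seg_avoid_sketch are hypothetical, the RBG adjustment performed by adjust_rbgs is omitted, memoization relies on functools.lru_cache over the state (queue index, packet index, capacity) instead of explicit lookup functions, and each queue is given as the marginal number of RBs that every packet adds (e.g., 18 and 5 RBs for 1500 and 500 bytes at an assumed 88 bytes per RB).

```python
from functools import lru_cache
from math import ceil


def marginal_rbs(sizes_bytes, bytes_per_rb=88):
    """RBs that each packet adds to its queue when sent back to back."""
    out, total, used = [], 0, 0
    for size in sizes_bytes:
        total += size
        need = ceil(total / bytes_per_rb)
        out.append(need - used)
        used = need
    return out


def seg_avoid_sketch(queues, quantum, capacity):
    """Select FIFO prefixes of the queues that fit in `capacity` RBs.

    queues:  tuple of tuples with the RB size of every packet, in FIFO order.
    quantum: per-queue slice quantum, only used to break ties.
    Returns a tuple of (queue index, packet index) pairs.
    """

    def key(selection):
        # More packets first; then the larger sum of slice quantum.
        return len(selection), sum(quantum[q] for q, _ in selection)

    @lru_cache(maxsize=None)
    def best(q, p, cap):
        if cap == 0 or q >= len(queues):
            return ()
        dont_take = best(q + 1, 0, cap)           # jump to the next queue
        if p >= len(queues[q]) or queues[q][p] > cap:
            return dont_take                      # taking would segment the packet
        take = ((q, p),) + best(q, p + 1, cap - queues[q][p])
        return take if key(take) > key(dont_take) else dont_take

    return best(0, 0, capacity)
```

For instance, with four queues holding one 4 RB packet each, a capacity of 4 RBs and quantum values (5, 0, -3, 1), the sketch selects the packet of the first queue, whose slice lent the most RBs. Because every state is solved once, the number of evaluated states is bounded by the number of packets times the capacity.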
considering the 5G RBG distribution, the capacity drops below zero, this combination is discarded and the algorithm recursively calls itself at line 15, after updating the queue and packet index. If the current packet is the last one in the queue, the queue and the packet indexes are updated (i.e., idx_q and idx_pkt at lines 19 and 20).

In summary, Algorithm 5 generates a packet permutation where the number of packets to forward without generating segmentation is maximized (i.e., line 26). If two permutations would forward the same number of packets, Algorithm 5 selects the permutation with the largest quantum (i.e., line 28), and thus, it reduces the unfairness between slices, as the slices with larger quantum lent more RBs in the past. Lastly, Algorithm 5 is aware of the RBG distribution and adjusts the remaining capacity accordingly through the function adjust_rbgs.

V. EVALUATION FRAMEWORK
This section gives a complete overview of the evaluation conditions, the testbed configuration and the measurement methodologies.

A. SOFTWARE
To evaluate and validate the proposed e5G-BDP and EQP algorithms, a real cellular network testbed using OpenAirInterface (OAI) [45] was built. For this, we enhanced the OAI implementation with i) the SDAP sublayer with QFI queues and ii) the scheduler proposed in Section II. Downlink traffic was generated to validate the results, and analogous outcomes are expected in the uplink procedure due to the symmetry of the 5G stack. We configured OAI's channel bandwidth to 5 MHz (i.e., 25 RBs). The TTI in OAI is 1 ms. The default bearer was mapped to an RLC-UM bearer, as it was OAI's default option. The SDAP scheduler iteratively asks the e5G-BDP limit_reached function (i.e., Algorithm 1) whether it should forward a packet or hold it. However, OAI is a soft real-time system where an excessive CPU burden de-synchronizes the UE and the eNodeB. Therefore, a 200 µs sleep between the iterative calls was applied, which heuristically proved correct for the hardware tested.

B. HARDWARE
As seen in Fig. 5, a B-200 Ettus USRP connected to a computer with an Intel(R) Core(TM) i9-9980HK CPU @ 2.40 GHz processor with the Linux 5.3.0-51 low-latency kernel was deployed at the eNodeB side. For the first UE (i.e., UE0), a Commercial off-the-shelf (COTS) Huawei E3372 LTE USB stick connected to a Raspberry Pi 4 Model B was used. For the second UE (i.e., UE1), another Huawei E3372 LTE USB stick connected to a computer with an AMD FX(TM) 8120 CPU @ 1.40 GHz processor with the Linux 4.4.0-141-generic kernel was used.

Algorithm 5: EQP seg_avoid.
It selects a permutation of packets according to their size and the slice quantum that does not cause segmentation.
Input: Queue index, packet index, capacity and permutation of selected packets (idx_q, idx_pkt, capacity and out_arr)
Output: Permutation of selected packets (out_arr)
 1: if capacity == 0 then
 2:     return out_arr;
 3: end if
 4: if idx_q > max_idx_q then
 5:     return out_arr;
 6: end if
 7: if is_memoized(idx_q, idx_pkt, capacity) then
 8:     return memoized(idx_q, idx_pkt, capacity);
 9: end if
10: p_size = pkt_size(idx_q, idx_pkt);
11: t_capacity = adjust_rbgs(capacity, out_arr, p_size);
12: if t_capacity < 0 then
13:     idx_q = next_idx_q(idx_q);
14:     idx_pkt = 0;
15:     return seg_avoid(idx_q, idx_pkt, capacity, out_arr)
16: end if
17: idx_pkt += 1;
18: if idx_pkt == max_idx_pkt(idx_q) then
19:     idx_q = next_idx_q(idx_q);
20:     idx_pkt = 0;
21: end if
22: act_pkt = {idx_q, idx_pkt};
23: take = seg_avoid(idx_q, idx_pkt, t_capacity, out_arr + act_pkt);
24: dont_take = seg_avoid(next_idx_q(idx_q), 0, capacity, out_arr);
25: out_arr = dont_take;
26: if len(take) > len(dont_take) then
27:     out_arr = take;
28: else if len(take) == len(dont_take) ∧ quantum(take) > quantum(dont_take) then
29:     out_arr = take;
30: end if
31: memoization(idx_q, idx_pkt, capacity, out_arr);
32: return out_arr;
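
As an illustration of Algorithm 5, the following Python sketch reproduces its take/don't-take recursion with memoization. It is a simplified reading and not the OAI implementation. The helper rbs_needed, the constant BYTES_PER_RB (derived from the 1692 bytes that 18 RBs carry at MCS 28, assuming a linear relation), the constant RBG_SIZE, and the definition of a permutation's quantum as the sum of the quanta of the slices contributing packets are all assumptions made for this example. Unlike the pseudocode, the sketch memoizes the best selection for the remaining queues instead of carrying out_arr through the recursion.

```python
from functools import lru_cache
import math

BYTES_PER_RB = 94  # assumption: 1692 bytes / 18 RBs at MCS 28, taken as linear
RBG_SIZE = 2       # assumption: RBG size of a 5 MHz cell


def rbs_needed(size_bytes):
    """RBs consumed by one packet, rounded up to whole RBGs (hypothetical helper)."""
    rbs = math.ceil(size_bytes / BYTES_PER_RB)
    return math.ceil(rbs / RBG_SIZE) * RBG_SIZE


def seg_avoid(queues, quanta, capacity):
    """Pick whole packets that fit in `capacity` RBs without segmentation.

    queues: tuple of per-slice FIFO queues, each a tuple of packet sizes (bytes)
    quanta: quantum of each slice, used only to break ties
    Returns a list of (queue index, packet index) pairs.
    """

    def score(selection):
        # More packets first; on a tie, the larger accumulated quantum wins.
        return len(selection), sum(quanta[q] for q, _ in selection)

    @lru_cache(maxsize=None)  # memoize on (idx_q, idx_pkt, cap)
    def best(idx_q, idx_pkt, cap):
        if cap == 0 or idx_q >= len(queues):
            return ()
        if idx_pkt >= len(queues[idx_q]):
            return best(idx_q + 1, 0, cap)       # queue exhausted, go to next one
        dont_take = best(idx_q + 1, 0, cap)      # skip the rest of this queue
        left = cap - rbs_needed(queues[idx_q][idx_pkt])
        if left < 0:
            return dont_take                     # packet would have to be segmented
        take = ((idx_q, idx_pkt),) + best(idx_q, idx_pkt + 1, left)
        return take if score(take) > score(dont_take) else dont_take

    return list(best(0, 0, capacity))


# One slice with three VoIP packets, one slice with a bulky packet, 18 RBs available.
print(seg_avoid(((200, 200, 200), (1500,)), quanta=(5, 0), capacity=18))
```

In this example the three 200-byte packets are selected (12 RBs under the assumed conversion) rather than the single 1500-byte packet, because forwarding more whole packets takes precedence over filling the grant.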

C. RADIO LINK CHANNEL CAPACITY


The radio link plays a central role establishing the channel
capacity. To realistically emulate the channel capacity, and
thus, validate our proposals as reliably as possible, we converted the CQI data gathered from two major Irish operators [54] into MCS sequences according to OAI's CQI to MCS conversion function. In [54], LTE UE statistics for five different mobility patterns (i.e., static, pedestrian, car, tram and train), at a granularity of one sample per second, are provided. We selected two mobility patterns (i.e., pedestrian and train), thus exposing our solutions to diverse, and yet realistic, patterns. Every second, we set the MCS according to the data provided by the mobility patterns file, instead of the MCS that would have been assigned in accordance with the CQI provided by the UE (i.e., in our testbed conditions the UE always reports the maximum possible CQI), and therefore, we realistically emulated the dynamic radio link channel capacity.

FIGURE 5. Evaluation testbed.

D. BUFFERBLOAT SOLUTIONS CONFIGURATION

TABLE 1. Bufferbloat algorithms tested.

            Van.   CoDel   BBR    e5G-BDP   Dyn.   DRQL
X-layer     No     No      No     Yes       Yes    Yes
Pacing      No     No      Yes    Yes       No     No
Knob-less   Yes    Yes     Yes    Yes       No     Yes
Drop pkts   No     Yes     No     No        No     No

We implemented the six solutions shown in Table 1 to compare and validate our proposed e5G-BDP. The Vanilla case represents the default OAI implementation. It does not have any cross-layer communication mechanism or pacer, is knob-less and does not drop packets. For the evaluation of CoDel, we substituted the default RLC FIFO buffer with a CoDel queue according to the values described in [11] (i.e., interval value set to 100 ms and target value set to 5 ms). CoDel, contrary to the Vanilla implementation, drops packets to inform the sender that excessive buffering is happening. For the BBR [27] evaluation, we used the default TCP BBR version from the Linux kernel 5.3.0-62-lowlatency. As explained in Section III, BBR calculates the BDP and transmits the packets accordingly through a pacing mechanism. Lastly, the three algorithms that permit a rapid low-latency packet delivery from within the cellular network due to their cross-layer communication capability were evaluated: DynRLC, DRQL and e5G-BDP. DynRLC measures the sojourn time suffered by the SDUs at the RLC, and consequently adjusts the maximum number of allowed SDUs in the RLC buffer. The sojourn time is captured for every packet that egresses the RLC buffer, and depending on whether the packet sojourn time is larger or smaller than a target delay, the number of SDUs permitted at the RLC is reduced or increased accordingly. An α parameter marks the increase or reduction rate. The new maximum number of allowed SDUs is updated every interval of time [22], which we set to 10 ms (i.e., a radio frame duration in LTE), in contrast to the 200 ms mentioned in [22]. We set the target delay to 3 ms instead of the minimum of 30 ms reported by [22], as a smaller value was heuristically found not to achieve full bandwidth, and a larger value just adds unnecessary sojourn time. We set the maximum number of packets to 10, the minimum to 1 and the α value to 0.05. DynRLC lacks any pacing capability as elapsed time is not considered while forwarding the packets. DRQL [24], on the contrary, is based on the number of bytes that remain at the RLC buffer after a TTI, and no parameters need to be adjusted as DRQL tries to maintain the equivalent of one MTU in the RLC buffer after the TTI. Packets are also forwarded in a bursty manner, as DRQL also lacks a pacing mechanism. Lastly, e5G-BDP measures the bandwidth directly through the amount of bytes that were pulled from the RLC buffer in the last TTI. It also has a pacing mechanism to increase the probability of delivering a packet within a TTI, and is a knob-less solution as no parameters need to be configured. We did not use any slicing to evaluate the bufferbloat as it would not offer any advantage over the simple scenario with 1 UE.

E. EQP CONFIGURATION
To test the EQP we created two slices through FlexRAN [55], each one containing a single UE. We evaluated different SLAs for slices with the floor of 75%, 50% and 25% of the total number of RBs (e.g., for 25 RBs this is translated to $\lfloor 25 \times 0.75 \rfloor = 18$, $\lfloor 25 \times 0.50 \rfloor = 12$ and $\lfloor 25 \times 0.25 \rfloor = 6$ RBs per TTI). In this manner, we evaluated EQP for slices with large, as well as small amounts of RBs, emulating different, yet realistic, slicing scenarios. OAI's default slice SLA was preserved, where the slices are generated according to a percentage of the total data RBs excluding the retransmissions (i.e., due to the HARQ/NACK mechanism), as well as the RBs assigned to transmit control information. We compared our proposed EQP solution against the default resource scheduling in OAI for slicing. We named such distribution FP as it corresponds to a fixed partition, which is independent of the size of the packets. Since the bufferbloat effect introduces a delay several orders of magnitude larger than the segmentation/reassembly procedure, we evaluated EQP in conjunction with e5G-BDP due to its superior results when compared with its direct competitors, as shown in this paper. We also tested the effect of the
quantum limit through five different values (i.e., 25, 50, 100, 250 and 500 RBs) in a 25% slice with 8 VoIP flows, as it represents our most challenging scenario for EQP. We set the exponential function of the function generate_rbs to $5 \times (\text{quantum borrowed}/\text{quantum limit})^{5}$ to discourage considering packets from indebted slices for scheduling, and a quantum limit of 100 RBs was set as the default value since, as demonstrated in this paper, it avoids a possible EQP saturation for the scenarios evaluated.

F. TRAFFIC GENERATION
To model realistic low-latency flows, the Isochronous Round-Trip Tester (irtt) [56] was used, where a G.711 VoIP conversation is emulated through UDP data frames of 172 bytes with an interval of 20 ms [31], resulting in a bandwidth consumption of 64 Kbps. Conversely, we used the iperf3 tool in reverse mode to generate a bulky TCP flow that models a service that demands bandwidth (e.g., a mobile application update), with a maximum segment size (i.e., MTU minus 40 bytes of IP and TCP headers) of 1460 bytes. We chose TCP as the competing flow as most of the Internet traffic is forwarded through HTTP/2, which relies on TCP [57] as its transport layer protocol. In this manner, we modelled two different flows (i.e., one bulky and one with low-latency requirements) that try to access limited resources and share the RLC buffer. Both flows were always segregated at the SDAP sublayer according to their 5-tuple, and the UDP flow's packets were placed in a higher priority queue, imitating the behavior of two flows with distinct QFI. TCP Cubic [8] was used for all the bandwidth driven flows, except when BBR [27] is explicitly mentioned, as it is the most deployed TCP congestion control algorithm. To achieve TCP's steady state, avoid its slow start, and generate the bufferbloat, the bulky traffic started 5 seconds before the VoIP flow. Therefore, the VoIP flow avoided possible outlier results generated at the beginning of the transmission and the repeatability of the results was improved. Every VoIP call lasted for 60 seconds, which represents 60000 TTIs (i.e., 60000 ms x 1 TTI/ms) in which 3000 UDP packets were sent. Every test was repeated 5 times (i.e., 300000 TTIs) to correctly verify the results and minimize the effect of possible outliers. Scalability was evaluated with 1, 2, 4 and 8 VoIP flows in parallel, thus modeling a group phone call conference scenario with background TCP traffic.

G. MEASUREMENT METHODOLOGIES
To report the delay, a timestamp was added to all the packets once they enter the SDAP sublayer in the case of DynRLC, e5G-BDP and DRQL, or the RLC sublayer in the case of Vanilla, BBR and CoDel. The elapsed time was measured in the VoIP packets once they were forwarded to the MAC sublayer, and thus, the HARQ/NACK delay was not measured in our setup. By doing so, the delay that occurs in the SDAP/RLC sublayers was isolated, and thus, the effectiveness of the proposed solutions could be verified minimizing the effect of other possible phenomena. The Round Trip Time (RTT) was reported through the irtt tool, which generated the VoIP flows. The achieved bandwidth was measured through the iperf3 tool (i.e., counting the TCP packets that arrived at the UE), as well as by counting the unused transmission opportunities that happened every TTI at the eNodeB (i.e., unused RBs). In this manner, the throughput was verified from two different perspectives. A UDP packet was sent from the UE before the beginning of the emulated VoIP phone call to start reading the MCS profile, when different scenarios (i.e., train and pedestrian) were involved.

VI. EVALUATION RESULTS
In this section, we first evaluate the algorithms that address the bufferbloat (i.e., Vanilla, CoDel, BBR, e5G-BDP, DynRLC and DRQL). To verify their suitability they are tested in a static scenario (i.e., 28 MCS, one VoIP flow and one bulky flow) and a dynamic scenario (i.e., using the train and pedestrian reported MCS, one VoIP flow and one bulky flow). Their scalability is tested with 1, 2, 4 and 8 VoIP flows in parallel, emulating a group phone call conference scenario along a bulky traffic flow that tries to monopolize the access to the RBs.

Next, we show the effect of RLC's segmentation/reassembly procedure and evaluate EQP. To this end, we first show the results in a slicing scenario with different RB percentage shares (i.e., 25%-75% and 50%-50%) and a VoIP and a bulky flow with a fixed 28 MCS. Next, the results obtained from a scalability test in a 25% RB slice with 1, 2, 4, and 8 VoIP flows in parallel along a bulky flow are displayed. Then, the effect of the quantum limit in a 25% slice with 8 VoIP flows in parallel is evaluated. We finish this section showing the effect of EQP on two UEs with 50% of the total RBs each, whose MCS follows two different mobility patterns (i.e., pedestrian and train).

A. RLC BUFFERBLOAT
1) Static MCS scenario
The 3GPP standard does not include any mechanism capable of addressing the bufferbloat problem in current cellular networks. Therefore, the 5G stack is susceptible to experiencing large delays, as it is equipped with large buffers, the RAN is commonly the slowest link of the data path, and packets are usually transmitted through a loss-based TCP congestion control algorithm (e.g., TCP Cubic). Thus, packets with low-latency requirements that share the RLC buffer with bulky flows suffer from large buffer depletion times. Such an effect can be seen in Fig. 6, where the correlation between the VoIP packet's delay and the RLC queue size can be clearly observed in an OAI Vanilla deployment, when one low-latency flow shares the RLC buffer with a bulky flow. The delay suffered by low-latency flows due to bloated buffers generated by bulky flows in Vanilla deployments can reach the order of seconds as reported by [9] and shown in Fig. 6.

FIGURE 6. Delay suffered by a VoIP flow when sharing the RLC buffer with a bulky flow and the RLC buffer occupancy in a Vanilla OAI deployment.

The RLC buffer should contain just enough bytes to satisfy the MAC sublayer demands. Any additional byte just augments the delay [24], as newly arrived data packets suffer the depletion time of the previously enqueued packets. On the other hand, if the RLC cannot satisfy the MAC sublayer's demands, a transmission opportunity is squandered. Fulfilling both objectives is very challenging in the 5G environment due to the dynamic nature of the radio link channel. In Fig. 7, the RLC buffer occupancy for the six different methods from Table 1 can be observed, where a bulky flow shares the RLC buffer with a VoIP flow during 60 seconds. Note the change of scale between the different solutions, where CoDel and BBR reduce the amount of bytes at the RLC queue by at least 5 times when compared with Vanilla. However, CoDel's mechanism for discarding packets results in transmission opportunity losses (i.e., not containing enough bytes at the RLC buffer when a MAC notification arrives). On the other hand, BBR periodically drains the bottleneck queue after 10 seconds during 200 ms if the measured RTT does not decrease in that period [27]. This leads to a small total bandwidth utilization reduction, as the RLC buffer does not contain enough bytes to satisfy the MAC sublayer's TBS demands during those 200 ms periods. This effect can be clearly observed in Fig. 7, as the BBR RLC buffer is drained several times during the emulated 60-second VoIP call. Moreover, Vanilla, CoDel and BBR create a large queue that impedes a fast packet delivery. Conversely, the algorithms that are deployed directly in the cellular network stack (i.e., e5G-BDP, DynRLC and DRQL) estimate the data link capacity more accurately, forwarding enough bytes to feed the MAC requests every TTI, while maintaining the rest segregated at the higher sublayer queues (i.e., SDAP). Thus, a low-latency flow, where e5G-BDP, DynRLC or DRQL is implemented, sharing the bottleneck buffer (i.e., RLC buffer in 5G) with bulky traffic flows largely avoids long queuing sojourn times, as the amount of bytes is meticulously maintained low.

FIGURE 7. RLC buffer size for Vanilla, CoDel, BBR, e5G-BDP, DynRLC and DRQL with MCS 28.

As observed in Fig. 7, e5G-BDP is able to maintain the RLC buffer with fewer bytes than DynRLC or DRQL. Under an MCS index of 28 and 25 RBs in OAI, the MAC scheduler pulls between 2289 and 1569 bytes, depending on whether the subframe contains control and data information or just data. Hence, the queues shown in Fig. 7 should never contain less than 2289 bytes to avoid wasting any transmission opportunity, while any additional byte just increments the delay. In our evaluation, the bulky flow packets are 1500 bytes long, while the packets from the low-latency flow are 200 bytes long (i.e., 172 bytes of data, 20 bytes of IPv4 header and 8 bytes of UDP header). Consequently, precisely occupying the RLC queue with 2289 bytes may not be possible in many TTIs. Hence the mismatch between the packet sizes and the TBS is explicitly shown, as the TB transmits information in bits while the RLC buffer stores packets. It is not possible to pass bits from the SDAP sublayer to the RLC sublayer, and thus, to assure full throughput, the RLC buffer should always contain more than 2289 bytes, which increases the sojourn time.

FIGURE 8. Cumulative Distribution Function (CDF) of the VoIP flow's RTT.

In Fig. 8 the same conclusions are drawn from the perspective of the VoIP flow's RTT reported by irtt. The RTT is composed of the cellular stack delay (i.e., uplink and downlink procedure), the cellular protocol delay (e.g., the uplink Scheduling Request procedure), the packet routing and various buffering delays (e.g., Network Interface Controller (NIC) buffers). However, the importance of segregation in the cellular downlink procedure is clearly shown in Fig. 8, where the packets sent through e5G-BDP, DynRLC and DRQL surpass the alternatives that do not segregate the packets (i.e., Vanilla, CoDel and BBR). e5G-BDP also surpasses DynRLC and DRQL as observed in Fig. 8. It can also be concluded that in our cellular network testbed a minimum of 20 ms RTT delay exists.
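
The mismatch between packet sizes and the TBS discussed above can be checked with a few lines of Python. The sketch below is only an illustration under simplifying assumptions: RLC/MAC header overhead is ignored, and the helper name smallest_fill is hypothetical. It searches for the smallest RLC occupancy, built from whole 1500-byte and 200-byte packets, that still covers a given MAC request.

```python
from itertools import product

BULKY = 1500  # bytes, bulky-flow packet
VOIP = 200    # bytes, low-latency packet (172 data + 20 IPv4 + 8 UDP)
TBS = 2289    # bytes pulled by the MAC at MCS 28 with 25 RBs


def smallest_fill(tbs, sizes=(BULKY, VOIP)):
    """Return (total_bytes, counts) of the smallest whole-packet mix >= tbs."""
    # Upper bound per size: enough packets of that size alone to cover tbs.
    limits = [tbs // s + 1 for s in sizes]
    best = None
    for counts in product(*(range(limit + 1) for limit in limits)):
        total = sum(c * s for c, s in zip(counts, sizes))
        if total >= tbs and (best is None or total < best[0]):
            best = (total, counts)
    return best


total, counts = smallest_fill(TBS)
print(total, counts, total - TBS)
```

With these sizes no combination matches 2289 bytes exactly, since every total is a multiple of 100 bytes. The closest fill that avoids starvation is 2300 bytes, and it requires four queued VoIP packets next to one bulky packet, so in practice the buffer has to hold more than the TBS.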

Fig. 9 depicts the CDF of the queuing delay in the downlink procedure for the SDAP sublayer, the RLC sublayer and the sum of both. Packets sent with the Vanilla, CoDel and BBR algorithms face an order of magnitude larger delay than packets sent with e5G-BDP, DynRLC or DRQL. The points where the CDFs of Vanilla, CoDel and BBR asymptotically converge to 1.0 lie far beyond the range of Fig. 9. Indeed, only 12%, 76% and 91% of the total packets with Vanilla, BBR and CoDel suffer less than 50 ms of delay. Vanilla, CoDel and BBR lack an SDAP sublayer, and therefore, no graph is depicted in the first row of Fig. 9. Between e5G-BDP, DRQL and DynRLC, the lack of a pacing mechanism in DRQL and DynRLC has notable effects. Every new TTI, the SDAP scheduler forwards the packets of non-empty queues according to its priority policy (i.e., in our evaluation scenario it schedules low-latency packets first). This effect can be clearly seen in Fig. 9 at the RLC buffer, where the packets are forwarded from the SDAP sublayer at the beginning of the TTI and have to wait at the RLC buffer until the next transmission opportunity occurs (i.e., most of the DynRLC and DRQL packets at RLC wait between (750, 1000) µs, meaning that they were forwarded from the SDAP sublayer in the (0, 250) µs time interval after a TTI). Moreover, if the packets have not been forwarded at the beginning of the TTI, they have to wait at the SDAP sublayer until the next TTI opportunity (i.e., an additional delay of 1000 µs is added at the SDAP sublayer for DynRLC and DRQL). On the other hand, the pacing capabilities of e5G-BDP permit a newly arrived packet at the SDAP to be forwarded to the RLC immediately, avoiding having to wait for the next TTI. Such effect is also appreciable at the RLC queue, where e5G-BDP, due to its pacing capabilities, permits packets to ingress at any time within a TTI if the limit has not been reached, considerably reducing the delays at the RLC buffer (i.e., the packets stay mostly at the RLC sublayer between (500, 1000) µs). This pacing capability allows e5G-BDP to avoid waiting for a new TTI, and therefore, it outperforms DRQL and DynRLC as shown in Fig. 9, delivering more than 95% of the packets within a TTI, while only around 50% and 30% of the packets using DRQL and DynRLC are delivered during the first 1000 µs.

FIGURE 9. CDF of the queuing delay at RLC, SDAP and the sum of both for a 28 MCS.

Achieving low latency for the new cellular network is of vital importance. However, low latency can be trivially achieved if some throughput is sacrificed, simply by maintaining the RLC buffer with fewer bytes than the TBS requested by the MAC. Therefore, the real challenge is to achieve low latency while not starving the RLC buffer, and thus, utilizing all the available throughput. To this end, the throughput is quantified from two perspectives. First we measured the results from iperf3 (i.e., the bulky flow), which rely on Layer 4 measurements. The results can be observed in Fig. 10, where Vanilla, CoDel, BBR, e5G-BDP, DynRLC and DRQL reported 16.58, 12.95, 16.05, 16.20, 14.35 and 16.30 MBits/sec. As a second approach, we compared the number of unused RBs during the 60 second VoIP conversation. The Vanilla case transports data information in all the RBs during the 60 seconds and, therefore, achieves the highest possible throughput. BBR, as shown in Fig. 7, drains the buffer to get a new measurement of the bottleneck link path every 10 seconds. This effect contributes to wasting 3.1% of the total RBs during the 60 seconds of VoIP conversation. The mechanism of discarding packets utilized by CoDel results in 20.92% of transmission opportunity losses. DynRLC and DRQL leave 10.12% and 1.3% of the RBs unused. Lastly, e5G-BDP does not transmit data information in 1.9% of the total available RBs. These results match the outcomes provided by iperf3 as shown in Fig. 10. In this scenario, e5G-BDP clearly outperforms its competitors. While some RBs are squandered (i.e., 1.9%), more than 95% of the low-latency packets are forwarded within a TTI (i.e., 1000 µs) as shown in Fig. 9, while DRQL needs two TTIs and DynRLC needs three TTIs to forward more than 95% of the packets. A summary of the results can be observed in Table 2.

FIGURE 10. Average throughput reported by iperf3 with MCS 28.

2) Scalability evaluations
In contemporary cellular networks, different flows with heterogeneous QoS requirements share the RLC buffer. Thus, the presented solutions must scale well when the number of flows increases. To this end, we tested the effect of 1, 2, 4 and 8 VoIP flows in parallel along a bulky flow, emulating
a group phone call conference scenario. Note that DynRLC relies on the number of SDUs in the RLC and the delay they suffer to adjust the amount of SDUs at the buffer. However, the MAC sublayer requests bytes from the RLC. This creates a weakness in DynRLC when packets of different sizes are mixed (i.e., DynRLC does not differentiate between a 1500 or a 200 byte SDU), thus wasting transmission opportunities (i.e., not containing enough bytes at the RLC buffer when the MAC requests them) and creating additional delay (i.e., when bursts of low-latency packets occur, they may not all be forwarded to the RLC if they exceed the optimal number of SDUs). Additionally, as observed in Fig. 11, it does not scale well in comparison with e5G-BDP and DRQL, which are methods that rely on the amount of bytes rather than the number of SDUs.

TABLE 2. Performance comparison of static MCS scenario: Percentage of packets with < 1 ms queuing delay, the throughput of iperf3 flow in MBits/sec, and the percentage of used RBs.

          Van.    CoDel    BBR     e5G-BDP   Dyn.     DRQL
Delay     0%      9%       4%      95%       30%      50%
Throu.    16.58   12.95    16.05   16.20     14.35    16.30
RBs       100%    79.08%   96.9%   98.1%     89.88%   98.7%

FIGURE 11. Delay faced by VoIP packets with 1, 2, 4 and 8 flows with 28 MCS.

3) Dynamic MCS scenario
We obtained the previously shown results in a controlled static environment with the maximum MCS index (i.e., 28) and nearly no interference. However, in real cellular network deployments the radio link channel capacity abruptly changes [54]. Therefore, we emulated a scenario with dynamic MCS, setting its value according to real LTE traces [54]. We chose a pedestrian scenario, where the MCS varies more slowly, and a train scenario, where the MCS changes more abruptly. In Fig. 12, the MCS values from these two traces and the number of bytes requested by the MAC sublayer during the 60-second VoIP conversation can be observed. As seen in Fig. 12, they are highly correlated, and therefore, an MCS reduction directly affects the available bandwidth during the following TTI (i.e., bytes requested/TTI). The sudden MCS variation at the 35th second in the train scenario presents a challenge for the algorithms, as they have to rapidly estimate the data link bandwidth, so that the bufferbloat, and thus the delays associated with it, is prevented. Under such conditions we can observe how the proposed algorithms behave in Fig. 13. A reduction at the 35th second in the amount of bytes at the RLC buffer can be observed for CoDel, BBR, e5G-BDP and DRQL, while no specific change is shown for Vanilla and DynRLC. The dynamic MCS scenario shows the limitations of DynRLC, as its buffer capacity calculation is performed according to the number of SDUs. Therefore, the amount of lost transmission opportunities is increased. However, e5G-BDP still delivers the low-latency packets faster than DynRLC as can be observed from Fig. 14, due to the lower sojourn times experienced by the low-latency flows at the SDAP sublayer. No algorithm is capable of fully utilizing the bandwidth and, at the same time, forwarding the low-latency packets within a TTI. Such outcome results from the fact that no preemption is allowed from the RLC buffer in our model. Therefore, if a packet of 1500 bytes has already been forwarded from the SDAP to the RLC and just 500 bytes are forwarded from the RLC to the MAC every TTI, a newly arrived low-latency packet will suffer at least the depletion time from the previous packet (i.e., at least 3 TTIs or 3000 µs).

FIGURE 12. Bytes requested by the MAC sublayer and MCS for the train and pedestrian datasets.

FIGURE 13. RLC buffer size for Vanilla, CoDel, BBR, e5G-BDP, DynRLC and DRQL for the train dataset.

The average throughput for the train scenario reported by
iperf3 can be observed in Fig. 15, with 6.20, 5.20, 6.20, 5.98, 4.30 and 6.0 MBits/sec for Vanilla, CoDel, BBR, e5G-BDP, DynRLC and DRQL.

FIGURE 14. CDF of the queuing delay at RLC, SDAP and the sum of both for the train dataset.

FIGURE 15. Average throughput reported by iperf3 for train dataset.

One of the most important features when analyzing the bufferbloat problem is the bandwidth vs. delay dichotomy. It is relatively easy to obtain the lowest possible latency if some bandwidth is squandered and, analogously, it is relatively easy to obtain full bandwidth if large queues (i.e., bufferbloat) can be formed. However, achieving both is very challenging as not only the dynamic cellular network radio link channel effects have to be taken into account, but also TCP's behavior has to be considered. In Fig. 16, the sojourn time from the low-latency flows vs. the normalized amount of used RBs for the train and pedestrian scenarios is depicted. An RB is composed of several bytes and therefore, a small mismatch between the utilized RBs and the real bandwidth may exist. In Fig. 16 the solutions that nearly maintain full bandwidth while preventing the generation of the bufferbloat, and hence large delays, at the RLC buffer are e5G-BDP and DRQL. DynRLC suffers from the previously mentioned effect of calculating the delay according to the number of packets at the RLC, which maintains the queue depleted, albeit squandering transmission opportunities, as also seen in Fig. 15. Table 3 shows numerically the average delay values presented in Fig. 16. The solutions based on the 5G stack (i.e., e5G-BDP, DRQL and DynRLC) notably surpass the solutions that are based on other mechanisms (i.e., CoDel and BBR), while the default scenario (i.e., Vanilla) manifestly shows the present bufferbloat problem in cellular networks. Table 4 lists the delay at the 95% point of the CDF, as shown in Fig. 14, while Table 5 displays the percentage of used RBs in the train and pedestrian scenarios. It can be clearly observed that in both scenarios, e5G-BDP outperforms every competitor in average sojourn time and in the 95% packet delay, while squandering less than 1% of the available resources. DynRLC and DRQL show comparable results as DynRLC sacrifices bandwidth, while DRQL suffers larger delays. CoDel and BBR show their limitations, while Vanilla clearly exposes the contemporary bufferbloat problem. From the results provided, we conclude that e5G-BDP is the best solution tested in this paper for addressing the bufferbloat.

TABLE 3. Average Delay in ms at the eNodeB for the train and pedestrian scenarios.

         Van.      CoDel    BBR      e5G-BDP   Dyn.    DRQL
Train    2366.08   24.27    84.06    1.93      2.67    4.42
Ped.     2841.73   31.58    85.60    2.19      2.92    5.59

TABLE 4. 95% Delay in ms at the eNodeB for the train and pedestrian scenarios.

         Van.      CoDel    BBR      e5G-BDP   Dyn.    DRQL
Train    5325.93   52.93    186.94   3.93      5.56    8.94
Ped.     5261.16   138.94   218.93   3.90      5.57    8.77

TABLE 5. Percentage of used RBs for the train and pedestrian scenarios.

         Van.      CoDel    BBR      e5G-BDP   Dyn.    DRQL
Train    100%      85.0%    98.1%    99.1%     75.6%   100%
Ped.     100%      93.0%    97.6%    99.3%     80.9%   100%
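
The bandwidth vs. delay dichotomy can also be read directly from Tables 3 and 5. The following Python sketch is an illustrative post-processing step that is not part of the paper's methodology: it marks a solution as dominated when another one offers both a lower (or equal) average delay and a higher (or equal) share of used RBs, and lists the remaining ones. The function name pareto_front is an assumption of this example; the numbers are copied from the tables.

```python
# (average delay in ms from Table 3, used RBs in % from Table 5)
train = {
    "Vanilla": (2366.08, 100.0),
    "CoDel": (24.27, 85.0),
    "BBR": (84.06, 98.1),
    "e5G-BDP": (1.93, 99.1),
    "DynRLC": (2.67, 75.6),
    "DRQL": (4.42, 100.0),
}
pedestrian = {
    "Vanilla": (2841.73, 100.0),
    "CoDel": (31.58, 93.0),
    "BBR": (85.60, 97.6),
    "e5G-BDP": (2.19, 99.3),
    "DynRLC": (2.92, 80.9),
    "DRQL": (5.59, 100.0),
}


def pareto_front(results):
    """Names of the solutions that no other solution beats on delay and RB usage."""
    front = []
    for name, (delay, rbs) in results.items():
        dominated = any(
            d <= delay and r >= rbs and (d < delay or r > rbs)
            for other, (d, r) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(train))
print(pareto_front(pedestrian))
```

For both traces only e5G-BDP and DRQL remain, which agrees with the observation that these two solutions nearly maintain full bandwidth while preventing the bufferbloat.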

FIGURE 16. Normalized Utilized RBs vs Average Delay for the train and pedestrian datasets.

B. RLC SUBLAYER PACKET SEGMENTATION/REASSEMBLY

1) Static MCS scenario
As analysed in Section IV-B, the RLC sublayer's capacity to segment the packets generates an unnecessary, yet avoidable, delay. In this subsection, we evaluated the EQP algorithm against the Fixed Partition (FP) RB distribution algorithm (i.e., the RBs are scheduled in a fixed manner) in a slicing scenario with the e5G-BDP algorithm as our base bufferbloat avoidance solution. Similar comparative results are expected for DRQL or DynRLC. We created two different slicing scenarios, one where both slices are assigned 50% of the available resources, and a second one where the resources are shared with a 75%-25% ratio. For both scenarios we generated a low-latency traffic flow along with a bulky traffic flow.

FIGURE 17. CDF of the queuing delay at SDAP, RLC and the sum of both, for FP and EQP in two slicing scenarios: a) 75%-25% and b) 50%-50% resource distribution with 28 MCS.

FIGURE 18. Average throughput reported by iperf3 in two slicing scenarios (i.e., 75%-25% and 50%-50% resource distribution) for both UEs with 28 MCS for FP and EQP.

As can be observed in Fig. 17, the EQP significantly reduces the delay suffered by low-latency flows when compared to the FP resource scheduling. However, the benefit is more evident in the slices with a scarce RB share. In a 25% RB slice, approximately 78% of the packets are delivered during the first 1000 µs, in contrast with approximately 15% when FP is used. The improvement is reduced, albeit still significant, when the RBs are equally shared, with approximately 64% vs. 14% of packets being forwarded within the first 1000 µs. Moreover, the results are non-negligible for the slice with 75% of the RBs, where 88% vs. 73% is observed. These results confirm the validity of the model presented in Section IV-B. A fixed RB partition does not maximize the number of packets forwarded (i.e., objective function 1), and thus, the delay is larger in comparison with the EQP scheduling.

EQP's objective is to foster the forwarding of full packets to avoid the sojourn time suffered at the UE's RLC when segmentation occurs. This effect can be clearly seen in Fig. 17 where the use of EQP reduces the latency of the VoIP packets for both scenarios. However, EQP squanders bytes when the last packet of the queue is transmitted. For example, 18 RBs can transport up to 1692 bytes with a 28 MCS [20], so if the queue contains 1650 bytes, 42 padding bytes are added into the last RB, and therefore, 42 bytes do not transport information, reducing the throughput. This effect is more pronounced due to the RB distribution based on RBGs [35] (e.g., for a 5 MHz bandwidth cell, the minimum
RBG size is 2). Such behavior poses a challenge for e5G-BDP, as enough packets should be forwarded to the RLC to minimize the buffer starvation, without bloating it.
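
The padding arithmetic of the example above can be sketched in a few lines. This is a minimal illustration and not the testbed's code: both function names are hypothetical, the 1692-byte capacity for 18 RBs at MCS 28 and the 1650-byte queue are the figures quoted in the text, and the rounding helper assumes that RBs are granted in whole RBGs of size 2, as mentioned for a 5 MHz cell.

```python
def padding_bytes(capacity_bytes: int, queued_bytes: int) -> int:
    """Bytes of a grant that carry padding instead of queued data."""
    return max(capacity_bytes - queued_bytes, 0)


def round_up_to_rbg(rbs_needed: int, rbg_size: int) -> int:
    """Round an RB demand up to a whole number of RBGs (assumed granularity)."""
    groups = -(-rbs_needed // rbg_size)  # ceiling division on integers
    return groups * rbg_size


# Figures from the text: 18 RBs carry up to 1692 bytes, the queue holds 1650.
print(padding_bytes(1692, 1650))  # 42
# Hypothetical demand of 17 RBs with an RBG size of 2.
print(round_up_to_rbg(17, 2))     # 18
```

The second helper hints at why RBG granularity makes the effect more pronounced: any odd RB demand is rounded up, so the extra RB may be filled with padding.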

FIGURE 19. Total CDF queuing delay for VoIP packets with 1, 2, 4 and 8 flows with 28 MCS in a 25% slice for FP and EQP.

However, the padding effect, as seen in Fig. 18 for 5 MHz bandwidth base stations, does not substantially reduce the throughput when compared with FP. Throughputs of 15.80 vs. 15.60, 15.45 vs. 15.35 and 16.20 vs. 15.40 MBits/sec are reported for FP and EQP in the 25%-75%, 50%-50% and 75%-25% resource distributions, respectively. OAI's default resource distribution (i.e., FP) discards the last RB (i.e., $\lfloor 0.75 \cdot 25 \rfloor = 18$ RBs and $\lfloor 0.25 \cdot 25 \rfloor = 6$ RBs for the total of 25 RBs), and therefore, the average throughput of Fig. 18 is slightly smaller than the average throughput of e5G-BDP in Fig. 10.
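
The floor-based split behind the discarded RB can be reproduced as follows. This is only a sketch under the assumption that each slice receives the floor of its percentage share of the cell's RBs, as the figures quoted above suggest; `fixed_partition` is a hypothetical helper and not an OAI function.

```python
from typing import List, Tuple


def fixed_partition(total_rbs: int, shares_pct: List[int]) -> Tuple[List[int], int]:
    """Floor each slice's share; return per-slice RBs and the unassigned remainder."""
    # Integer arithmetic avoids floating-point rounding surprises.
    per_slice = [(pct * total_rbs) // 100 for pct in shares_pct]
    unassigned = total_rbs - sum(per_slice)
    return per_slice, unassigned


# 25 RBs shared 75%-25%: 18 and 6 RBs, leaving 1 RB unused.
print(fixed_partition(25, [75, 25]))  # ([18, 6], 1)
```

Under the same assumption, a 50%-50% split of 25 RBs also leaves one RB unassigned, since each slice is floored to 12 RBs.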
The possibility of acquiring more resources than the ones assigned in a strict RB distribution (i.e., FP) emerges here again. As explained in Section IV-B and more concisely in [51], momentarily using more resources than the ones agreed in an SLA reduces the latency in realistic use case scenarios, while achieving competitive throughputs, as seen in the summary of results in Table 6.

TABLE 6. Performance comparison of static MCS scenario: Percentage of packets with < 1 ms queuing delay, and the throughput of the iperf3 flow.

Slice and scheduler   Delay   Throughput
25% EQP               78%     3.50
25% FP                15%     4.10
Ratio (EQP/FP)        5.20    0.85
50% EQP               64%     7.45
50% FP                14%     7.65
Ratio (EQP/FP)        4.57    0.97
75% EQP               88%     11.65
75% FP                73%     11.75
Ratio (EQP/FP)        1.14    0.99

2) Scalability evaluations
As previously mentioned, the cellular network is exposed to services with heterogeneous QoS requirements, which can contain several flows each. To this end, we evaluated 1, 2, 4, and 8 VoIP flows in parallel, to emulate a multi-user VoIP call, in a 25%-75% slice resource sharing scenario, where the 25% slice contains the low-latency flows along with a bulky traffic flow, while the 75% slice only transports one bulky traffic flow. As can be observed in Fig. 19, EQP considerably surpasses the FP solution, reducing the sojourn time of the low-latency packets. However, a saturation effect is observed when the number of VoIP flows increases. When 8 flows are generated, a packet is created every 2.5 ms on average (i.e., a packet is created every 20 ms per flow to emulate a VoIP flow [31] / 8 flows = 2.5 ms/packet). When the interarrival time of low-latency packets approaches one TTI (i.e., 1 ms in our testbed), the quantum mechanism's advantage is attenuated, as EQP's core idea is to momentarily borrow some resources and give them back during the next TTIs. The percentage of packets that are submitted within 2000 µs with EQP is 98%, 98%, 96% and 94% for 1, 2, 4, and 8 VoIP flows, while for similar results (i.e., 96%, 95%, 95% and 88% of the packets) the FP needs 4000 µs.

3) Effect of quantum limit
As explained in Section IV-B, the quantum limit plays a central role in EQP, as it indicates the amount of RBs that a slice can lend or borrow, thus denoting the maximum RB deviation between the SLA and the slice. To test the effect of the quantum limit within the EQM, we generated two slices with a 25%-75% RB distribution. A bulky flow along with 8 VoIP flows traverses the 25% slice, while only a bulky flow is instantiated at the 75% slice. As observed in Fig. 20, augmenting the quantum limit from 25 to 100 improves the latency of the VoIP packets. However, after a value of 100, augmenting the quantum limit value does not provide any latency reduction. This fact shows that there exists a boundary to the achievable latency reduction for the current packet sequence tested and thus, augmenting the quantum limit value beyond it does not report any benefit. However, even though we tested the quantum value in our most stringent scenario, 5G is a heterogeneous network with myriads of different traffic patterns. Therefore, if the quantum value is to be optimized, the traffic patterns that traverse the slices must be known beforehand. Although it constitutes a very interesting problem for contemporary research topics (e.g., machine learning), predicting the future traffic flows lies outside the scope of this paper, and therefore is not analyzed.

In Fig. 21 the boxplot for the quantum distribution can be observed. The zero value indicates the equilibrium (i.e., no lent or borrowed RB), while a negative value shows that the slice is indebted, and a positive value manifests that fewer than the SLA RBs have been used by that slice. The 25% slice tends to borrow resources to forward the VoIP packets, and thus, to maximize the number of forwarded packets, while the 75% slice tends to lend resources as only bulky packets traverse it. Moreover, EQP tends to be a fair RB scheduling (i.e., the median slightly deviates from the equilibrium point) even if large amounts of quantum RBs are available. This is an important property in an environment where the future traffic demands are unknown, as it leaves a margin for borrowing RBs in the future if needed, while
maintaining the fairness stipulated by the SLA. As expected, the median values of the scenarios with lower quantum values are closer to the equilibrium point in comparison with higher quantum values.

FIGURE 20. Total CDF queuing delay for VoIP packets with 8 flows in a 25% slice with 28 MCS for FP and EQP with 25, 50, 100, 250 and 500 RBs quantum limit.

FIGURE 21. Quantum boxplot in a 25%-75% slicing scenario with 8 VoIP flows and a bulky flow in the 25% slice and a bulky flow in the 75% slice.

4) Dynamic MCS scenario
The previously tested scenarios were based on a static MCS. However, the radio link conditions can abruptly change, and thus, directly impact the throughput. Moreover, the MCS changes do not affect all the UEs equally. For example, one pedestrian carrying a UE and one UE inside a train will be exposed to different MCS profiles. Therefore, in our last scenario, we analyzed two slices with a 50% share of the total resources, with a bulky flow and a VoIP flow each, in a pedestrian and train scenario, to realistically validate EQP. In the first slice, a UE (i.e., UE0) with the pedestrian MCS scenario is attached, while for the second slice a UE (i.e., UE1) with the train MCS scenario is utilized. Fig. 22 shows the CDF sojourn time for both slices, where EQP unambiguously surpasses the FP distribution, especially for the pedestrian slice. This occurs since the average MCS for the pedestrian slice is smaller than the average MCS for the train slice (i.e., 11.02 vs. 14.48 on average, cf. Fig. 12), which is strongly connected with the throughput. Therefore, and since EQP especially benefits the scenarios with scarce resources, the difference between EQP and FP for the pedestrian slice is more explicit. Within the first 4000 µs, EQP forwards approximately 66% and 76% of the total low-latency packets for the pedestrian and train slices, in contrast with the 44% and 77% with OAI's default approach (i.e., FP). This is an expected result as the throughput is reduced when compared to the static MCS scenario (i.e., a 28 MCS index against the MCS reported in Fig. 12). Here again EQP significantly reduces the latency of the slice with lower throughput (i.e., pedestrian), while achieving similar results for the slice with higher throughput (i.e., train).

FIGURE 22. Total CDF queuing delay for VoIP packets in two 50%-50% slices for the train and pedestrian datasets for FP and EQP.

VII. CONCLUSION
In this paper we have extensively evaluated RLC buffer delay avoidance solutions in a testbed. The bufferbloat phenomenon in the hierarchical multi-queue 5G QoS architecture has been presented. We verified that 3GPP's Vanilla solution does not address the packet accumulation problem (i.e., the bufferbloat) at RLC's buffer, for which we presented a solution (i.e., e5G-BDP) and evaluated it against different state-of-the-art proposals. We also exposed the contribution of RLC's segmentation/reassembly procedure to the latency, presented the EQP algorithm for minimizing such effect and evaluated it against a strict spectrum scheduling algorithm. The most remarkable results from this paper can be summarized as follows:
• Contemporary cellular networks bloat their RLC sublayer buffers, and therefore, ruin 3GPP's efforts to reduce the latency through stack and protocol improvements.
• Introduction of the SDAP sublayer is insufficient if the heterogeneous QoS requirements of different services have to be fulfilled. 3GPP should consider the insertion of a queue-based architecture in the SDAP sublayer with scheduling capabilities to avoid bloating its stack and achieve its envisioned low-latency goals.
• Classic bufferbloat solutions (e.g., CoDel and BBR) do not suffice to meet the most stringent latency requirements in 5G. New solutions that directly rely on the cellular network stack and gather information from it must be considered as they outperform the classic solutions by an order of magnitude (e.g., e5G-BDP).
• In a packet-based network such as the cellular network, packet sizes must be considered by the resource scheduling algorithm if latency is a concern. Solutions ignoring such fact only generate a myriad of segmented packets at the RLC sublayer, increasing the total latency and impeding a rapid packet delivery. We presented, evaluated and validated an algorithm (i.e., EQP) that reduces such effect, while deviating from the SLA within a limit.

As future work, we plan to study the wired network delays (e.g., Time Sensitive Networks) in conjunction with the 5G stack delays.

REFERENCES
[1] C. Shannon, “A Mathematical Theory of Communication,” The Bell System Technical Journal, Oct. 1948.
[2] S.-Y. Chung, G. Jr, T. Richardson, and R. Urbanke, “On the Design of Low-density Parity-check Codes within 0.0045 dB of the Shannon Limit,” Communications Letters, IEEE, vol. 5, pp. 58–60, Mar. 2001.
[3] 3GPP, “NR, Multiplexing and channel coding,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.212, Apr. 2020, version 16.1.0.
[4] ——, “Study on physical layer enhancements for NR ultra-reliable and low latency case (URLLC),” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.824, Mar. 2019, version 15.5.1.
[5] G. Berardinelli, N. Huda Mahmood, R. Abreu, T. Jacobsen, K. Pedersen, I. Z. Kovács, and P. Mogensen, “Reliability Analysis of Uplink Grant-Free Transmission Over Shared Resources,” IEEE Access, vol. 6, pp. 23 602–23 611, 2018.
[6] 3GPP, “Evolved Universal Terrestrial Radio Access (E-UTRA) and NR; Service Data Adaptation Protocol (SDAP) specification,” 3rd Generation Partnership Project (3GPP), Technical Report (TR) 37.324, Sep. 2018, version 15.1.0.
[7] ——, “System architecture for the 5G System,” 3rd Generation Partnership Project (3GPP), Technical Report (TR) 23.501, Dec. 2018, version 15.4.0.
[8] S. Ha, I. Rhee, and L. Xu, “Cubic: A new tcp-friendly high-speed tcp variant,” SIGOPS Oper. Syst. Rev., vol. 42, no. 5, pp. 64–74, Jul. 2008. [Online]. Available: https://doi.org/10.1145/1400097.1400105
[9] The bufferbloat project. [Online]. Available: https://www.bufferbloat.net/projects/
[10] T. Høiland-Jørgensen, M. Kazior, D. Täht, P. Hurtig, and A. Brunstrom, “Ending the Anomaly: Achieving Low Latency and Airtime Fairness in WiFi,” in 2017 USENIX Annual Technical Conference (USENIX ATC 17). Santa Clara, CA: USENIX Association, 2017, pp. 139–151.
[11] K. Nichols, V. Jacobson, A. McGregor, and J. Iyengar, “Controlled Delay Active Queue Management,” Internet Requests for Comments, RFC Editor, RFC 8289, Jan. 2018.
[12] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet, “The Flow Queue CoDel Packet Scheduler and Active Queue Management Algorithm,” Internet Requests for Comments, RFC Editor, RFC 8290, Jan. 2018.
[13] 3GPP, “NR, Radio Link Control (RLC) specification,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.322, Jan. 2019, version 15.4.0.
[14] ——, “LTE, Radio Link Control (RLC) protocol specification,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 36.322, Jul. 2020, version 16.0.0.
[15] P. Caballero, A. Banchs, G. de Veciana, and X. Costa-Pérez, “Multi-tenant radio access network slicing: Statistical multiplexing of spatial loads,” IEEE/ACM Transactions on Networking, vol. 25, no. 5, pp. 3044–3058, 2017.
[16] F. Fossati, S. Moretti, P. Perny, and S. Secci, “Multi-resource allocation for network slicing,” IEEE/ACM Transactions on Networking, vol. 28, no. 3, pp. 1311–1324, 2020.
[17] L. Kleinrock, “Internet Congestion Control Using the Power Metric: Keep the Pipe Just Full, But No Fuller,” Ad Hoc Networks, pp. 142–157, November 2018.
[18] 3GPP, “Interface between the Control Plane and the User Plane nodes,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 29.244, Jun. 2019, version 16.0.0.
[19] ——, “NR, Radio Resource Control (RRC) protocol specification,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.331, Apr. 2019, version 15.5.1.
[20] Open Air Interface Feature Set. [Online]. Available: https://gitlab.eurecom.fr/oai/openairinterface5g/blob/develop/doc/FEATURE_SET.md
[21] M. Irazabal, E. Lopez-Aguilera, and I. Demirkol, “Active Queue Management as Quality of Service Enabler for 5G Networks,” in 2019 European Conference on Networks and Communication (EuCNC), April 2019, pp. 1–5.
[22] R. Kumar, A. Francini, S. Panwar, and S. Sharma, “Dynamic control of RLC buffer size for latency minimization in mobile RAN,” in 2018 IEEE Wireless Communications and Networking Conference (WCNC), April 2018, pp. 1–6.
[23] R. Kumar, A. Francini, S. Panwar, and S. Sharma, “Design of an enhanced bearer buffer for latency minimization in the mobile ran,” in 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6.
[24] M. Irazabal, E. Lopez-Aguilera, I. Demirkol, and N. Nikaein, “Dynamic Buffer Sizing and Pacing as Enablers of 5G Low-Latency Services,” IEEE Transactions on Mobile Computing, 2020.
[25] F. Baker and G. Fairhurst, “IETF Recommendations Regarding Active Queue Management,” Internet Requests for Comments, RFC Editor, RFC, Jul. 2015.
[26] S. Jung, J. Kim, and J. Kim, “Intelligent Active Queue Management for Stabilized QoS Guarantees in 5G Mobile Networks,” IEEE Systems Journal, pp. 1–10, 2020.
[27] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson, “BBR: Congestion-based Congestion Control,” Commun. ACM, vol. 60, no. 2, pp. 58–66, Jan. 2017.
[28] P. Goyal, M. Alizadeh, and H. Balakrishnan, “Rethinking congestion control for cellular networks,” in Proceedings of the 16th ACM Workshop on Hot Topics in Networks, ser. HotNets-XVI. New York, NY, USA: Association for Computing Machinery, 2017, pp. 29–35. [Online]. Available: https://doi.org/10.1145/3152434.3152437
[29] P. E. McKenney, “Stochastic Fairness Queueing,” in Proc. of IEEE Int. Conf. on Computer Communications, June 1990, pp. 733–740 vol. 2.
[30] B. Briscoe, K. D. Schepper, M. B. Braun, and G. White, “Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service,” Internet Engineering Task Force, Internet-Draft draft-ietf-tsvwg-l4s-arch-06, Mar. 2020, work in progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-06
[31] Voip bandwidth consume. [Online]. Available: https://www.cisco.com/c/en/us/support/docs/voice/voice-quality/7934-bwidth-consume.html
[32] Google stadia. [Online]. Available: https://stadia.google.com/
[33] Open Air Interface. [Online]. Available: https://www.openairinterface.org/
[34] srslte. [Online]. Available: https://github.com/srsLTE/srsLTE
[35] 3GPP, “NR; Physical layer procedures for data,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.214, Sep. 2019, version 15.7.0.
[36] C. Chang and N. Nikaein, “Ran runtime slicing system for flexible and dynamic service execution environment,” IEEE Access, vol. 6, pp. 34 018–34 042, 2018.
[37] H. Halabian, “Distributed resource allocation optimization in 5g virtualized networks,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 3, pp. 627–642, 2019.
[38] R. Schmidt, C. Chang, and N. Nikaein, “Flexvran: A flexible controller for virtualized ran over heterogeneous deployments,” in ICC 2019 - 2019 IEEE International Conference on Communications (ICC), 2019, pp. 1–7.
[39] C. Hornig, “A Standard for the Transmission of IP Datagrams over Ethernet Networks,” Internet Requests for Comments, RFC Editor, RFC, Apr. 1984.
[40] IEEE, “IEEE Standard for Information technology—Telecommunications and information exchange between systems Local and metropolitan area networks—Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications,” IEEE Std 802.11-2016 (Revision of IEEE Std 802.11-2012), pp. 1–3534, 2016.
[41] C. A. Grazia, N. Patriciello, T. Høiland-Jørgensen, M. Klapez, M. Casoni, and J. Mangues-Bafalluy, “Adapting TCP Small Queues for IEEE 802.11 Networks,” in 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Sep. 2018, pp. 1–6.

[42] 3GPP, “NR, Physical channels and modulation,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.211, Jun. 2019, version 15.6.0.
[43] ——, “Study on new radio access technology: Radio access architecture and interfaces,” 3rd Generation Partnership Project (3GPP), Technical report (TR) 38.801, Apr. 2017, version 14.0.0.
[44] M. Mondal, B. Roy, C. K. Roy, and K. A. Schneider, “Investigating the relationship between evolutionary coupling and software bug-proneness,” in Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering, ser. CASCON ’19. USA: IBM Corp., 2019, pp. 173–182.
[45] N. Nikaein, M. K. Marina, S. Manickam, A. Dawson, R. Knopp, and C. Bonnet, “Openairinterface: A flexible platform for 5g research,” SIGCOMM Comput. Commun. Rev., vol. 44, no. 5, pp. 33–38, Oct. 2014.
[46] M. Pinedo, Scheduling: Theory, Algorithms, and Systems, 5th edition. Springer, 2016.
[47] D. S. Johnson and K. A. Niemi, “On knapsacks, partitions, and a new dynamic programming technique for trees,” Mathematics of Operations Research, vol. 8, no. 1, pp. 1–14, 1983.
[48] Dbs3900 huawei distributed base stations. [Online]. Available: https://e.huawei.com/en/products/wireless/elte-trunking/network-element/dbs3900
[49] M. Shreedhar and G. Varghese, “Efficient Fair Queueing Using Deficit Round Robin,” in Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, ser. SIGCOMM ’95. New York, NY, USA: ACM, 1995, pp. 231–242.
[50] A. K. Parekh and R. G. Gallager, “A generalized processor sharing approach to flow control in integrated services networks: the single-node case,” IEEE/ACM Transactions on Networking, vol. 1, no. 3, pp. 344–357, 1993.
[51] M. H. Goldwasser, “A survey of buffer management policies for packet switches,” SIGACT News, vol. 41, no. 1, pp. 100–128, Mar. 2010. [Online]. Available: https://doi.org/10.1145/1753171.1753195
[52] A. Borodin and R. El-Yaniv, Online Computation and Competitive Analysis. Cambridge University Press, 1998.
[53] W. A. Aiello, Y. Mansour, S. Rajagopolan, and A. Rosén, “Competitive queue policies for differentiated services,” Journal of Algorithms, vol. 55, no. 2, pp. 113–141, 2005.
[54] D. Raca, J. J. Quinlan, A. H. Zahran, and C. J. Sreenan, “Beyond Throughput: A 4G LTE Dataset with Channel and Context Metrics,” in Proceedings of the 9th ACM Multimedia Systems Conference, ser. MMSys ’18. New York, NY, USA: ACM, 2018, pp. 460–465.
[55] X. Foukas, N. Nikaein, M. M. Kassem, M. K. Marina, and K. Kontovasilis, “Flexran: A flexible and programmable platform for software-defined radio access networks,” in Proceedings of the 12th International on Conference on Emerging Networking EXperiments and Technologies, ser. CoNEXT ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 427–441. [Online]. Available: https://doi.org/10.1145/2999572.2999599
[56] IRTT (Isochronous Round-Trip Tester). [Online]. Available: https://github.com/heistp/irtt
[57] M. Belshe, R. Peon, and M. Thomson, “Hypertext Transfer Protocol Version 2 (HTTP/2),” Internet Requests for Comments, RFC Editor, RFC 7540, 2015.

MIKEL IRAZABAL received the M.Sc. degree in telecommunications engineering from the Universitat Politècnica de Catalunya, BarcelonaTech (UPC), in 2011. He is currently pursuing the Ph.D. degree with the Department of Network Engineering, UPC. He participated as an Early Stage Researcher (ESR) in the European funded Project Application-Aware User-Centric Programmable Architectures for 5G multi-tenant networks (5G-AuRA) and an Innovative Training Network (ITN) of the Marie Skłodowska-Curie Actions (MSCA). His main research interests include the areas of low-latency wireless communications and programmable wireless communication systems.

ELENA LOPEZ-AGUILERA received the M.Sc. and the Ph.D. degrees in telecommunications engineering from the Universitat Politècnica de Catalunya (UPC), in 2001 and 2008, respectively. She is currently an Associate Professor and a member with the Wireless Networks Group, Networks Engineering Dept., UPC. She has published papers in journals and conferences in the area of wireless communications and has been involved in projects with public and private funding. Her main research interests include the study of IoT enabling technologies and 5G networks. Her experience comprises QoS, radio resource management, location mechanisms, and wake-up radio systems.

ILKER DEMIRKOL received the B.Sc., M.Sc., and Ph.D. degrees in Computer Engineering from Bogazici University, Istanbul, Turkey. He is currently an Assoc. Prof. in Dept. of Mining, Industrial and ICT Engineering with the Universitat Politecnica de Catalunya, where he works on wireless networks and wake-up radio systems. His research targets communication protocol development for the aforementioned networks, along with performance evaluation and optimization of such systems. He was a recipient of the 2010 Best Mentor Award from the Electrical and Computer Engineering Department, University of Rochester, NY.

ROBERT SCHMIDT obtained a diploma with distinction in Information Systems Engineering from Dresden University of Technology, Germany, and a diploma in Engineering from Ecole Centrale Paris/CentraleSupélec, France, in 2017. He currently pursues a Ph.D. in communications within the Communication Systems Department of Eurecom, France. Robert is involved in collaborative research projects in the context of the EU H2020 framework programme and an active contributor to the OpenAirInterface and Mosaic5G projects. His main research interests include 4G and 5G wireless cellular networks, heterogeneous software-defined (radio access) networks, network slicing and MAC layer scheduling.

NAVID NIKAEIN is a Professor in Communication System Department at Eurecom. He received his Ph.D. degree in communication systems from the Swiss Federal Institute of Technology EPFL in 2003. Broadly, his research contributions are in the areas of experimental 4G-5G system research related to radio access, edge, and core networks with a blend of communication, computing, and data analysis with a particular focus on industry-driven use-cases. He is a board member of OpenAirInterface.org software alliance as well as the founder of the Mosaic-5G.io initiative whose goal is to provide software-based 4G/5G service delivery platforms.