Mobile Traffic Prediction From Raw Data Using LSTM Networks: Hoang Duy Trinh, Lorenza Giupponi, Paolo Dini


2018 IEEE 29th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC)

Mobile Traffic Prediction from Raw Data Using LSTM Networks
Hoang Duy Trinh, Lorenza Giupponi, Paolo Dini
CTTC/CERCA, Av. Carl Friedrich Gauss, 7, 08860, Castelldefels, Barcelona, Spain
{hdtrinh, lgiupponi, pdini}@cttc.es

978-1-5386-6009-6/18/$31.00 ©2018 IEEE

Abstract: Predictive analysis of mobile network traffic is becoming of fundamental importance for next-generation cellular networks. Knowing the user demands in advance allows the system to allocate resources optimally. In this paper, we study the mobile traffic of an LTE base station and we design a system for traffic prediction using Recurrent Neural Networks. The mobile traffic information is gathered from the Physical Downlink Control CHannel (PDCCH) of the LTE using the passive tool presented in [1]. Using this tool we are able to collect all the control information at 1 ms resolution from the base station. This information comprises the resource blocks, the transport block size and the modulation scheme assigned to each user connected to the eNodeB.
The design of the prediction system includes long short-term memory units. With respect to a Multilayer Perceptron Network, or other artificial neuron structures, recurrent networks are advantageous for problems with sequential data (e.g. language modeling) [2]. In our case, we state the problem as a supervised multivariate prediction of the mobile traffic, where the objective is to minimize the prediction error given the information extracted from the PDCCH. We evaluate the one-step and the long-term prediction errors of the proposed methodology, considering different durations of the observed values, which determine the memory length of the LSTM network and how much information must be stored for a precise traffic prediction.

I. INTRODUCTION
Understanding the dynamics of the traffic demands in a wireless network is a complex task, due to the massive densification of the mobile devices attached to the network. This is made more challenging by the huge variety of devices available today, and by the different types of service that they can offer. Within a few years, the fifth-generation (5G) cellular network promises to enable a plethora of new applications, including M2M communications, autonomous driving and virtual reality applications, which will require a boost in the performance of the network in terms of latency, capacity and context awareness [3].
To meet these strict requirements it is fundamental that the network becomes aware of the traffic demands. The analysis of the traffic and the precise forecast of the user demands are essential for developing an intelligent network. Knowing the user demands in advance enables the network to promptly manage the resource allocation among the contending users. A smart optimization of the physical resources is crucial to improve the users' quality of experience, but it is also beneficial for the energy efficiency of the overall network.
In recent years, the development of cheaper and more powerful hardware has made it possible to unleash the potential of machine learning algorithms, in particular of deep learning, for a wide range of applications (e.g. object identification, speech recognition, etc.). Using modern GPUs, it is possible to run complex algorithms on large-scale datasets with minimal effort [4]. There are numerous applications in which deep learning algorithms show excellent results (e.g. in computer vision), where data for the learning and training tasks are widely available.
From an academic perspective, one major problem related to cellular networks is the lack of traffic datasets to be studied. User traffic data are not always made available by network operators, or they can be found but with very limited information [5]. Commonly, the available datasets consist of the aggregated traffic derived from the Call Detail Records (CDRs), where text, voice and data are mixed without additional information on the technology and on which base station the users are attached to [6]. Therefore, no information on the utilization of the physical resources or on the scheduling optimization can be assessed.
Numerous efforts have been made to understand the dynamics of cellular networks. The prediction of mobile traffic patterns has usually been studied through time-series analysis methods. Most of the works use techniques such as Auto Regressive Integrated Moving Average (ARIMA) and its different flavours (e.g. SARIMA, ARIMAX, mixed ARIMA) to capture the trends of the temporal evolution of the mobile traffic [7], [8]. However, one of the known limitations of such techniques is their poor robustness to rapid fluctuations of the time-series, since the prediction tends to over-reproduce the average of the past observed values [9]. Additionally, these methods work with homogeneous time-series, where the input and the prediction are within the same set of values.
In this paper, we exploit the exceptional abilities of the Long-Short Term Memory (LSTM) units to present a multistep predictive algorithm for the mobile data
traffic. LSTM units are the elementary parts that form Recurrent Neural Networks (RNN), which are a particular extension of the FeedForward Neural Networks. In particular, RNNs show outstanding results for time-domain problems and sequential data, and they have been heavily adopted for text prediction and machine translation problems [2]. These neural network structures perfectly fit our traffic prediction problem, where the collected dataset is multivariate and presents heterogeneous, non-linear information about the users' communications. Moreover, the time-domain characteristic of the mobile traffic can be assisted by the LSTM properties, which are able to capture the temporal trends of the data.
Results in the literature show that LSTM networks outperform other machine learning approaches for time-series analysis in traffic prediction. LSTM structures have been proposed in [9]. Here, the datasets consist of the spatio-temporal distribution of the mobile traffic in different base stations. Spatial correlation has been used to evidence similarities between neighbouring base stations. Even though LSTM has been applied to traffic prediction, the input data consider only one metric (e.g. the traffic, spatially distributed), and only a one-step prediction is considered.
At the time of writing, to the best of our knowledge, we are the first to present a complete methodology for the data collection and for the design of a multistep predictive network, which exploits the ability of the LSTM to enhance the prediction accuracy. In particular, we directly use the raw data obtained from the traffic control channel, and only an aggregative operation is performed to predict the traffic for multiple steps onwards. This is critical in terms of speed of prediction, which can be done with very little preprocessing of the data and with a limited number of observations. Moreover, working with the raw data, the forecast of the traffic can be accomplished with a high time resolution, since we can capture the traffic variation, which is large even within a time window of a few milliseconds.
The paper is organized as follows: in Section II we describe how we collect the data from the LTE control channel and how we aggregate the dataset. In Section III we present the architecture for the mobile traffic prediction, which includes LSTM units. Sections IV and V are devoted to the numerical results and to the conclusions, respectively.

II. THE LTE SCHEDULING TRAFFIC DATASET

A. LTE Control Channel
Using the methodology of [1] and [10], we can collect the LTE scheduling information of the users connected to a certain eNodeB. The advantage of this tool is the richness of information and the temporal granularity of the data. We can decode the Downlink Control Information (DCI) messages from the PDCCH, which contain the modulation and coding scheme (MCS), the number of resource blocks and the transport block size assigned to the users at each millisecond. This information is available for both the downlink and the uplink directions. Anonymity of data is ensured since the users are identified by a limited number of Radio Network Temporary Identifiers (RNTI), which are refreshed after 10.15 seconds of inactivity. Only scheduling control information is available: the PDCCH is unencrypted, while the actual downlink and uplink data are transmitted over encrypted physical channels to secure the users' privacy.

B. Dataset Aggregation
The collected dataset consists of one month of scheduling information that we gathered by monitoring different eNodeBs located in the city of Barcelona, Spain. Let $D = \{D_{c_1}, D_{c_2}, \ldots\}$ be the dataset, where $D_{c_k}$ is the set of measurements for the monitored cell $k$. Given a $D_{c_k}$, for each connected user at the time $t$, temporarily identified by an RNTI $r$, we decode the DCI message containing the resource blocks, the transport block size and other scheduling information for the uplink and the downlink directions. We store this information in a measure $s_r(t)$, where $t$ corresponds univocally to an LTE subframe number, or TTI, which is 1 ms long.
We calculate the aggregate cell traffic measurements for a given timeslot $T$, which is the sum of the traffic generated by the set $R(T)$ of all the RNTIs connected during the timeslot $T$:

$$S(T) = \sum_{r(t) \in R(T)} \sum_{t \in T} s_r(t) \qquad (1)$$

Thus, $S(T)$ is the vector that contains the number of resource blocks allocated in the uplink and in the downlink directions, the number of messages sent in both directions and the sum of the total transport block sizes for a given timeslot $T$. Moreover, for each timeslot $T$, we include the number of users attached to the eNodeB in the timeslot $T$. In our case, $T$ is the number of TTIs over which we aggregate the traffic.

III. A LONG-SHORT TERM MEMORY NETWORK
Recurrent neural networks are a generalization of feedforward neural networks that have been devised for handling temporal and predictive problems. LSTMs are a particular kind of RNN, introduced in [11]. They have been explicitly designed to avoid the long-term dependency issue, which is caused by the vanishing-gradient problem in normal RNNs [12].
The capability of learning long-term dependencies is due to the structure of the LSTM units, which incorporate gates that regulate the learning process. In a standard LSTM unit (see Fig. 2), the basic operations are accomplished by the input gate $i_t$, the forget gate
$f_t$ and the output gate $o_t$. Moreover, the cell state $c_t$ represents the memory of the unit and it is updated with the information to be kept (or to be forgotten), provided by the input gate (or forget gate).

Fig. 1: Normalized weekly traffic signature of two monitored LTE eNodeBs.

Fig. 2: Standard LSTM unit.

$$
\begin{aligned}
i_t &= \sigma(W_i \cdot (h_{t-1}, x_t) + b_i) \\
f_t &= \sigma(W_f \cdot (h_{t-1}, x_t) + b_f) \\
o_t &= \sigma(W_o \cdot (h_{t-1}, x_t) + b_o) \\
\tilde{c}_t &= \phi(W_c \cdot (h_{t-1}, x_t) + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \phi(c_t)
\end{aligned}
$$

In the previous equations, $\sigma(\cdot)$ is the sigmoid function and $\phi$ is the hyperbolic tangent function (tanh). $W$ and $b$ are respectively the weight matrix and the bias of the gates $i$, $f$, $o$ or of the cell state $c$. The subscript $t$ is the time index and $\odot$ is the element-wise multiplication.

Fig. 3: Single-layer LSTM network.

The LSTM unit combines the output of the previous unit $h_{t-1}$ with the current input $x_t$ using the input, the output and the forget gates to update the memory of the cell. The variables $i_t$ and $f_t$ represent respectively the information that needs to be kept or to be forgotten from the past and the current input. The cell state $c_t$ is updated by summing the previous cell state $c_{t-1}$ and the candidate cell state $\tilde{c}_t$, weighted respectively with $f_t$ and $i_t$. Finally, we obtain the output $h_t$ by applying the tanh function to $c_t$ and multiplying it by $o_t$. Then, the current output $h_t$ is passed to the next unit and combined with the input at the next time index $t + 1$.

Fig. 4: Proposed architecture for the mobile traffic prediction.

Multiple LSTM units are concatenated to form one layer of the LSTM network. Each unit computes the operations on one time index and transfers the output to the next LSTM unit. The number of concatenated cells indicates the number of observations of the data that are considered before making the prediction. In our case, the input $x_t$ is the eNodeB traffic vector $S(T)$, and the number of observations is the number of selected timeslots $T$.
The proposed architecture for the mobile traffic prediction is depicted in Fig. 4. In our design, we consider multiple layers of basic LSTM units to form a stacked LSTM network. The intuition is that the deep LSTM network is able to learn the temporal dependencies of the aggregate mobile traffic: the LSTM unit of each
layer extracts a fixed number of features, which are passed to the next layer. The depth of the network (i.e. the number of layers) serves to increase the accuracy of the prediction, which is performed by the last fully connected layer. For the one-step prediction we use a many-to-one architecture, which means that the network observes the mobile traffic for a fixed number of timeslots until $T$ and then tries to predict the traffic in the next timeslot $T + 1$. In the multi-step prediction, we delay the prediction for a chosen number of timesteps, similarly to what is done in language modeling problems when predicting a sequence of words. Last, the output of the LSTM network is passed to a fully connected neural network, which performs the actual prediction. The motivation for this last layer is the regression problem we are trying to solve: from an implementation perspective, this feedforward layer applies the softmax activation function, which is needed during the training phase to optimize the weights of the network neurons.

IV. NUMERICAL RESULTS

A. Evaluation Setup
We use the set of mobile traffic data from two different eNodeBs, which we collected during one month, to evaluate the performance of the proposed architecture. For each eNodeB, we calculate the aggregate cell traffic, as described in Section II-B.
We choose the Normalized Root Mean Square Error (NRMSE) as the metric to measure the accuracy of the prediction algorithm, which is given as

$$\mathrm{NRMSE} = \frac{1}{\bar{x}} \sqrt{\frac{\sum_{t=1}^{N} (\tilde{x}_t - x_t)^2}{N}} \qquad (2)$$

where $N$ is the total number of points, $\tilde{x}_t$ and $x_t$ are the predicted value and its corresponding observation at the time $t$, and $\bar{x}$ is their mean. This same metric is used to compare the accuracy of the proposed architecture with the one obtained using other predictive algorithms.
The mobile traffic prediction algorithm is implemented in Python, using Keras with TensorFlow as backend. The chosen hyperparameters are reported in Table I. The number of hidden layers is fixed to 5: this is one of the hyperparameters that need to be selected and can affect the tradeoff between the prediction accuracy and the time needed to train the network. A higher number of layers may increase the precision of the prediction, but we want to focus on the relationship between the number of past observed values and the precision of the multi-step prediction, which determines the quantity of information that needs to be memorized and utilized by the network. For the same reason, we fix the number of epochs to 100. Three weeks of data are used to train and to validate the architecture. The results that follow are related to the last week. We use the Adam optimization [13] to update the network weights iteratively based on the training data.

TABLE I: Training Hyperparameters

Parameter                    Value
Initial Learning Rate        0.001
Num. of Epochs               100
LSTM Hidden States           64
LSTM Hidden Layers           5
Feedforward Hidden Layers    1
Optimization Algorithm       Adam
Loss Function                MAE

B. Results Analysis
Next, we present the results of multi-step prediction, that is, when the output is delayed for a fixed number of timeslots and the prediction is performed for later time instants. We show how the accuracy decreases when we try to predict the traffic data in future timesteps. Furthermore, we analyze the effect of the number of observations that the LSTM network can see, and of the duration of the timeslots $T$: these are design parameters that need to be estimated, since they determine the memory length of the LSTM network and how much traffic information needs to be stored for an accurate prediction.
In Fig. 5, we show the results of the mobile traffic prediction for two cells: since they are located in two different areas, the monitored eNodeBs present two distinct traffic profiles and traffic magnitudes. We can see that the prediction is precise for the whole week, despite the oscillating behaviour of the traffic. In this case, the prediction is one-step ahead, which means that we use a fixed number of past values ($K = 10$) to predict the traffic for the next timeslot.
In Fig. 6 and in Fig. 7 we evaluate the prediction error with respect to past observed values. It is also relevant to consider different values for the timeslot duration $T$, which affects the calculation of the aggregated traffic $S(T)$ from the raw LTE traces. Figures are related to the first eNodeB (results are comparable for eNodeB 2). In Fig. 7, we see that the NRMSE is larger for a higher duration of $T$ and, as expected, the error decreases with a larger number of observations. To emphasize the effect of the number of past observations, we plot the increasing accuracy (with respect to observing only one past value) for different values of $T$. We can observe that the major increase in percentage is given for larger values of $T$. For 10 past observed timesteps the accuracy can increase by more than 40%.
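To make the roles of the number of past observations $K$ and of the prediction delay concrete, the following is a minimal sketch of how the aggregated series could be cut into supervised training pairs. It is an illustration under stated assumptions, not the authors' code: the function name `make_windows` and the parameters `k` and `horizon` are hypothetical, and the multi-step case is read here as predicting the sample `horizon` timeslots after the last observed one.

```python
def make_windows(series, k, horizon=1):
    """Pair k consecutive samples with the sample `horizon` slots after them.

    `series` is a list of aggregated traffic samples S(T), scalars or vectors.
    horizon=1 is the one-step (many-to-one) case; horizon>1 delays the target.
    All names are illustrative assumptions, not taken from the paper.
    """
    if k < 1 or horizon < 1:
        raise ValueError("k and horizon must be positive")
    inputs, targets = [], []
    last_start = len(series) - k - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + k])           # K past observations
        targets.append(series[start + k + horizon - 1])  # value to predict
    return inputs, targets


# Toy series standing in for aggregated traffic values.
toy = [0, 1, 2, 3, 4, 5]
one_step_x, one_step_y = make_windows(toy, k=3, horizon=1)
two_step_x, two_step_y = make_windows(toy, k=3, horizon=2)
```

Each input window would then be fed to the network as one sequence of length `k`; a larger `horizon` leaves fewer training pairs for the same amount of data.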

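The NRMSE of Eq. (2) can be sketched in a few lines of plain Python. This is a hedged illustration rather than the authors' implementation: the function name `nrmse` is hypothetical, and the normalizing mean $\bar{x}$ is assumed here to be the mean of the observed values, which is one possible reading of the definition.

```python
import math


def nrmse(predicted, observed):
    """Root mean square error divided by the mean of the observations.

    Assumption: the normalizing mean in Eq. (2) is taken over `observed`.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    mean_observed = sum(observed) / n
    return math.sqrt(mse) / mean_observed


# A constant prediction of 2.0 against observations 1.0 and 3.0:
# RMSE is 1.0 and the observed mean is 2.0, so the NRMSE is 0.5.
example = nrmse([2.0, 2.0], [1.0, 3.0])
```

Because the error is divided by the mean traffic level, values computed on cells with different traffic magnitudes remain comparable.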
Fig. 5: Prediction of the weekly mobile traffic for two different eNodeBs.

Fig. 6: Prediction error versus number of past observed values.

Fig. 7: Percentage of accuracy gain versus number of past observed values.

In Fig. 8 we show the prediction for 15 timesteps ahead. We fix $K = 10$ and $T = 10$ TTIs. At first, the prediction almost corresponds to the measured values, while after some steps the prediction error becomes more dominant. In Fig. 9, we plot the increasing error with respect to future prediction steps. As expected, a longer prediction horizon causes a decrease in the accuracy of the algorithm. The error increases with different trends and it is around 40% when we predict 15 steps ahead. This is similar to what happens in the problem of language synthesis for the prediction of long sentences: when predicting words further ahead, the number of candidate words is larger, therefore there is more uncertainty in the correct choice. In contrast to the number of past observations, changing the duration of the timeslot $T$ does not give useful insight into the prediction error: for longer periods, the variability of the mobile traffic is larger, leading to an oscillating and random error for future predictions.

Fig. 8: Lookup for the predicted mobile traffic vs. ground-truth measurements for 15 steps ahead ($T = 10$).

Finally, we compare the proposed architecture with two time-series prediction methods. An ARIMA model is a well-established technique for time-series analysis, and it is defined by 3 parameters $(p, d, q)$ that determine the auto-regression, the differencing and the moving average, respectively. Here, we use a (10, 1, 5) model. Also, note that we use only one variable for the prediction (i.e. the aggregate traffic), instead of the multiple pieces of information obtained from the raw data. The other traffic prediction is obtained using a deep FeedForward Neural Network (FFNN), where we replace the LSTM neural network with a network of fully connected neurons. For a fair comparison, we use the same number of hidden layers. Figure 10 shows the traffic prediction on the same time window using these two techniques. The accuracy using the ARIMA model is lower, since the prediction tends to be closer to the average value of the traffic. On the other hand, the FFNN is able to follow the periodic trend and the traffic oscillations, but it still lacks high precision. Also, we compare the average error for the three prediction methods on the two traffic profiles: as
expected, thanks to the LSTM properties, the proposed algorithm captures the temporal characteristics of the mobile traffic, and it provides superior accuracy with respect to the FeedForward Neural Network or to the classic ARIMA model.

Fig. 9: Percentage of error increase versus number of predicted timesteps.

Fig. 10: Traffic prediction obtained with different models and errors.

V. CONCLUSION
In this work, we study the effectiveness of recurrent neural networks applied to the prediction of the mobile traffic. The choice of using an LSTM network is imposed by the dataset characteristics, since we use multivariate traffic information that derives directly from the DCI of the LTE control channel. The LSTM units succeed in capturing the temporal correlation of the traffic even for distant timeslots. Predicting the traffic using raw aggregate data from the physical channel is fundamental in time-critical applications and avoids the need for additional resources to process the traffic data.

ACKNOWLEDGEMENT
The research leading to this work has received funding from the Spanish Ministry of Economy and Competitiveness under grant TEC2017-88373-R (5G REFINE) and from the European Union Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 675891 (SCAVENGE).

REFERENCES
[1] N. Bui and J. Widmer, “OWL: a reliable online watcher for LTE control channel measurements,” ACM All Things Cellular, 2016.
[2] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734, 2014.
[3] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, 2014.
[4] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: A CPU and GPU math compiler in Python,” in Proc. 9th Python in Science Conf., vol. 1, 2010.
[5] V. D. Blondel, M. Esch, C. Chan, F. Clérot, P. Deville, E. Huens, F. Morlot, Z. Smoreda, and C. Ziemlicki, “Data for development: the D4D challenge on mobile phone data,” arXiv preprint arXiv:1210.0137, 2012.
[6] T. Italia, “Big data challenge 2015,” aris.me/contents/teaching/data-mining-2015/project/BigDataChallengeData.html, 2015.
[7] Y. Shu, M. Yu, O. Yang, J. Liu, and H. Feng, “Wireless traffic modeling and prediction using seasonal ARIMA models,” vol. E88B, 01 2003.
[8] G. P. Zhang, “Time series forecasting using a hybrid ARIMA and neural network model,” Neurocomputing, vol. 50, pp. 159–175, 2003.
[9] J. Wang, J. Tang, Z. Xu, Y. Wang, G. Xue, X. Zhang, and D. Yang, “Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach,” in INFOCOM 2017 - IEEE Conference on Computer Communications, pp. 1–9, IEEE, 2017.
[10] H. D. Trinh, N. Bui, J. Widmer, L. Giupponi, and P. Dini, “Analysis and modeling of mobile traffic using real traces,” PIMRC, 2017.
[11] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[12] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” 1999.
[13] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
