Mobile Traffic Prediction From Raw Data Using LSTM Networks
Hoang Duy Trinh, Lorenza Giupponi, Paolo Dini
Abstract—Predictive analysis on mobile network traffic is becoming of fundamental importance for the next generation of cellular networks. Proactively knowing the user demands allows the system to perform an optimal resource allocation. In this paper, we study the mobile traffic of an LTE base station and we design a system for traffic prediction using Recurrent Neural Networks. The mobile traffic information is gathered from the Physical Downlink Control CHannel (PDCCH) of LTE using the passive tool presented in [1]. With this tool we are able to collect all the control information at 1 ms resolution from the base station. This information comprises the resource blocks, the transport block size and the modulation scheme assigned to each user connected to the eNodeB. The design of the prediction system relies on long short-term memory units. With respect to a Multilayer Perceptron network, or other artificial neural structures, recurrent networks are advantageous for problems with sequential data (e.g. language modeling) [2]. In our case, we state the problem as a supervised multivariate prediction of the mobile traffic, where the objective is to minimize the prediction error given the information extracted from the PDCCH. We evaluate the one-step and the long-term prediction errors of the proposed methodology, considering different numbers and durations of the observed values, which determine the memory length of the LSTM network and how much information must be stored for a precise traffic prediction.

I. INTRODUCTION

Understanding the dynamics of the traffic demands in a wireless network represents a complex task, due to the massive densification of the mobile devices attached to the network. This is made more challenging by the huge variety of devices available today and by the different typologies of service that they can offer. Within a few years, the fifth-generation (5G) cellular network promises to enable a plethora of new applications, including M2M communications, autonomous driving and virtual reality, which will require a boost in the performance of the network in terms of latency, capacity and context awareness [3].

To meet these strict requirements, it is fundamental that the network becomes aware of the traffic demands. The analysis of the traffic and the precise forecast of the user demands are essential for developing an intelligent network. Knowing the user demands in advance makes the network able to promptly manage the resource allocation among the contending users. A smart optimization of the physical resources is crucial to improve the users' quality of experience, and it is also beneficial for the energy efficiency of the overall network.

In recent years, the development of cheaper and more powerful hardware has made it possible to unleash the potential of machine learning algorithms, in particular deep learning, for a wide range of applications (e.g. object identification, speech recognition, etc.). Using modern GPUs, it is possible to run complex algorithms on large-scale datasets with minimal effort [4]. Deep learning algorithms show excellent results in numerous applications (e.g. in computer vision) where the amount of data for the learning and training tasks is widely available.

From an academic perspective, one major problem related to cellular networks is the lack of traffic datasets to be studied. User traffic data are not always made available by network operators, or they can be found only with very limited information [5]. Commonly, the available datasets consist of the aggregated traffic derived from Call Detail Records (CDRs), where text, voice and data are mixed without additional information on the technology or on which base station the users are attached to [6]. Therefore, no information on the utilization of the physical resources or on the scheduling optimization can be assessed.

Numerous efforts have been devoted to understanding the dynamics of cellular networks. The prediction of mobile traffic patterns has usually been studied through time-series analysis methods. Most of the works use techniques such as the Auto Regressive Integrated Moving Average (ARIMA) and its different flavours (e.g. SARIMA, ARIMAX, mixed ARIMA) to capture the trends of the temporal evolution of the mobile traffic [7], [8]. However, one of the known limitations of such techniques is their poor robustness to rapid fluctuations of the time-series, since the prediction tends to reproduce the average of the past observed values [9]. Additionally, these methods work with homogeneous time-series, where the input and the prediction are within the same set of values.

In this paper, we exploit the exceptional abilities of the Long Short-Term Memory (LSTM) units to present a multistep predictive algorithm for the mobile data ...
... i, f, o or of the cell state c. The subscript t is the time index and ⊙ is the element-wise multiplication.
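For reference, the notation above corresponds to the standard LSTM cell update; a common formulation is sketched below (the exact variant used in the paper, and the names of the weight matrices W, U and biases b, are not visible in this excerpt and are therefore assumptions).

% Standard LSTM cell (common formulation, assumed; x_t: input, h_t: hidden state, c_t: cell state)
\begin{align*}
  i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
  f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
  o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
  \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state}\\
  c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update}\\
  h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{align*}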
... layer extracts a fixed number of features, which are passed to the next layer. The depth of the network (i.e. the number of layers) serves to increase the accuracy of the prediction, which is produced by the last fully connected layer. For the one-step prediction we use a many-to-one architecture, meaning that the network observes the mobile traffic for a fixed number of timeslots up to time T and then tries to predict the traffic in the next timeslot T + 1. In the multi-step prediction, we delay the prediction by a chosen number of timesteps, similarly to what is done in language modeling problems when predicting a sequence of words. Finally, the output of the LSTM network is passed to a fully connected neural network, which performs the actual prediction. The motivation for this last layer is the regression problem we are trying to solve: from an implementation perspective, this feedforward layer applies the softmax activation function, which is needed during the training phase to optimize the weights of the network neurons.
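As an illustration of this many-to-one structure, a minimal Keras sketch of such a predictor is given below. The layer sizes follow Table I, while the window length K, the number of input features F and the linear output layer are assumptions made for illustration; the authors' actual model definition is not reproduced here (compilation and training settings are sketched later, in Section IV).

# Minimal sketch of the many-to-one LSTM predictor described above
# (illustrative; layer sizes taken from Table I, other choices are assumptions).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_lstm_predictor(K=10, F=1, hidden=64, lstm_layers=5):
    """K past timeslots with F features each -> one predicted traffic value."""
    model = Sequential()
    # First recurrent layer defines the input window shape (K timeslots, F features).
    model.add(LSTM(hidden, return_sequences=(lstm_layers > 1), input_shape=(K, F)))
    for i in range(1, lstm_layers):
        # Intermediate layers return full sequences; the last one returns a vector.
        model.add(LSTM(hidden, return_sequences=(i < lstm_layers - 1)))
    # Fully connected output layer performing the regression.
    model.add(Dense(1))
    return model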
IV. NUMERICAL RESULTS

A. Evaluation Setup

We use the mobile traffic data collected during one month from two different eNodeBs to evaluate the performance of the proposed architecture. For each eNodeB, we calculate the aggregate cell traffic, as described in Section II-B.

We choose the Normalized Root Mean Square Error (NRMSE) as the metric to measure the accuracy of the prediction algorithm, which is given as

NRMSE = \frac{1}{\bar{x}} \sqrt{\frac{1}{N} \sum_{t=1}^{N} (\tilde{x}_t - x_t)^2}    (2)

where N is the total number of points, \tilde{x}_t and x_t are the predicted value and its corresponding observation at time t, and \bar{x} is the mean of the observations. The same metric is used to compare the accuracy of the proposed architecture with the one obtained using other predictive algorithms.
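A direct NumPy implementation of Eq. (2) could look like the following (a sketch; the authors' evaluation scripts are not shown in the paper).

import numpy as np

def nrmse(y_pred, y_true):
    """Normalized Root Mean Square Error as in Eq. (2): the RMSE of the
    predictions divided by the mean of the observed values."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / np.mean(y_true)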
The implementation of the mobile traffic prediction algorithm is done in Python, using Keras with TensorFlow as the backend. The chosen hyperparameters are reported in Table I. The number of hidden layers is fixed to 5: this is one of the hyperparameters that need to be selected, and it affects the tradeoff between the prediction accuracy and the time needed to train the network. A higher number of layers may increase the precision of the prediction, but we want to focus on the relationship between the number of past observed values and the precision of the multi-step prediction, which determines the quantity of information that needs to be memorized and utilized by the network. For the same reason, we fix the number of epochs to 100. Three weeks of data are used to train and to validate the architecture; the results that follow refer to the last week. We use the Adam optimization [13] to update the network weights iteratively based on the training data.

TABLE I: Training Hyperparameters

    Initial Learning Rate        0.001
    Num. of Epochs               100
    LSTM Hidden States           64
    LSTM Hidden Layers           5
    Feedforward Hidden Layers    1
    Optimization Algorithm       Adam
    Loss Function                MAE
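With the hyperparameters of Table I, the training step could be configured roughly as follows. This is a sketch: build_lstm_predictor is the helper outlined in Section III above, and X_train, y_train are assumed to hold windowed training samples of shape (samples, K, F) drawn from the three training weeks; none of these names come from the paper.

from tensorflow.keras.optimizers import Adam

# Hyperparameters from Table I; the train/validation handling is an assumption.
model = build_lstm_predictor(K=10, F=1, hidden=64, lstm_layers=5)
model.compile(optimizer=Adam(learning_rate=0.001), loss="mae")
model.fit(X_train, y_train,
          epochs=100,
          validation_split=0.25,  # keep part of the three training weeks for validation
          verbose=1)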
B. Results Analysis

Next, we present the results of the multi-step prediction, that is, the case in which the output is delayed by a fixed number of timeslots and the prediction is performed for later time instants. We show how the accuracy decreases when we try to predict the traffic data further into the future. Furthermore, we analyze the effect of the number of observations that the LSTM network can see and of the timeslot duration T: these are design parameters that need to be estimated, since they determine the memory length of the LSTM network and how much traffic information needs to be stored for an accurate prediction.
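To make these two parameters concrete, the following sketch shows one plausible way to build the supervised samples from the traffic series aggregated over timeslots of duration T: each input window contains K past values and the target lies a chosen number of timeslots after the window (horizon = 1 for the one-step case, larger values for the delayed, multi-step case). This is an illustrative reconstruction, not the authors' preprocessing code.

import numpy as np

def make_windows(series, K=10, horizon=1):
    """Build (X, y) pairs from the aggregated traffic series S(T): X[i] holds K
    consecutive observations, y[i] is the traffic value `horizon` timeslots
    after the window (horizon=1 corresponds to one-step prediction)."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:                # univariate aggregate traffic
        series = series[:, np.newaxis]  # shape (time, 1)
    X, y = [], []
    for start in range(len(series) - K - horizon + 1):
        X.append(series[start:start + K])             # shape (K, F)
        y.append(series[start + K + horizon - 1, 0])  # traffic value to predict
    return np.asarray(X), np.asarray(y)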
In Fig. 5, we show the results of the mobile traffic prediction for two cells: since they are located in two different areas, the monitored eNodeBs present two distinct traffic profiles in terms of shape and magnitude. We can see that the prediction is precise for the whole week, despite the oscillating behaviour of the traffic. In this case, the prediction is one step ahead, which means that we use a fixed number of past values (K = 10) to predict the traffic for the next timeslot.

In Fig. 6 and Fig. 7, we evaluate the prediction error as a function of the number of past observed values. It is also relevant to consider different values of the timeslot duration T, which affects the calculation of the aggregated traffic S(T) from the raw LTE traces. The figures refer to the first eNodeB (results are comparable for eNodeB 2). In Fig. 7, we see that the NRMSE is larger for longer durations T and, as expected, the error decreases with a larger number of observations. To emphasize the effect of the number of past observations, we plot the accuracy increase (with respect to observing only one past value) for different values of T. We observe that the largest percentage increase is obtained for the larger values of T. For 10 past observed timesteps, the accuracy can increase by more than 40%.
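In spirit, the sweep behind Fig. 6 can be reproduced with a loop of the following shape, reusing the helpers sketched above (make_windows, build_lstm_predictor, nrmse). The tested K values, the train/test split and series_T, the aggregate traffic for one timeslot duration T, are illustrative assumptions rather than the paper's exact settings.

# Illustrative sweep over the number of past observations K (one-step prediction).
results = {}
for K in [1, 2, 5, 10]:                          # example values, not the paper's grid
    X, y = make_windows(series_T, K=K, horizon=1)
    split = int(0.75 * len(X))                   # roughly: three weeks train, last week test
    model = build_lstm_predictor(K=K, F=X.shape[-1], hidden=64, lstm_layers=5)
    model.compile(optimizer="adam", loss="mae")  # Adam's default rate matches Table I (0.001)
    model.fit(X[:split], y[:split], epochs=100, verbose=0)
    results[K] = nrmse(model.predict(X[split:]).ravel(), y[split:])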
Fig. 6: Prediction error versus number of past observed values.
Fig. 10: Traffic prediction obtained with different models and errors.