This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3016469, IEEE Access


Prediction of Traffic Congestion Based on LSTM through Correction of Missing Temporal and Spatial Data
Dong-Hoon Shin1, Kyungyong Chung2, Roy C. Park3
1 Department of Computer Science, Kyonggi University, Suwon-si, Gyeonggi-do, 16227, South Korea
2 Division of Computer Science and Engineering, Kyonggi University, Suwon-si, Gyeonggi-do, 16227, South Korea
3 Department of Information Communication Software Engineering, Sangji University, Wonju-si, Gangwon-do, 26339, South Korea
Corresponding author: Roy C. Park
This work is supported by the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and
Transport (Grant 20CTAP-C157011-01).

ABSTRACT With the rapid increase in vehicle use during the fourth Industrial Revolution, road resources
have reached their supply limit. Active studies have therefore been conducted on intelligent transportation
systems (ITSs) to realize traffic management systems utilizing fewer resources. As part of an ITS, real-time
traffic services are provided to improve user convenience. Such services are applied to prevent traffic
congestion and disperse existing traffic. Therefore, these services focus on immediacy at the expense of
accuracy. As these services typically rely on measured data, the accuracy of the models is contingent on the
data collection. Therefore, this study proposes a long short-term memory (LSTM)-based traffic congestion
prediction approach based on the correction of missing temporal and spatial values. Before making
predictions, the proposed prediction method applies pre-processing that consists of outlier removal using the
median absolute deviation of the traffic data and the correction of temporal and spatial values using temporal
and spatial trends and pattern data. In previous studies, data with time-series features have not been
appropriately learned. To address this problem, the proposed prediction method uses an LSTM model for
time-series data learning. To evaluate the performance of the proposed method, the mean absolute percentage
error (MAPE) was calculated for comparison with other models. The MAPE of the proposed method was
found to be the best of the compared models, at approximately 5%.

INDEX TERMS Long Short-Term Memory, Traffic, Intelligent Transportation System, Deep Learning,
Missing Data Correction, Big Data-based AI

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

I. INTRODUCTION
Based on the core technologies of the fourth industrial revolution, smart vehicles are being produced in diverse forms [1]. The role of the automobile has been extended from a simple means of transportation to a living space and, finally, to a type of infotainment system that provides new forms of user convenience [2], [3]. With the increase in the demand for smart automobiles, it is extremely important to collect and process traffic information to enable smooth traffic management. Furthermore, it is necessary to take a qualitative rather than a quantitative approach [4]. To this end, research has been conducted on intelligent transportation systems (ITSs) developed in concert with conventional traffic management systems and information technology [5]-[7]. To improve user convenience, studies on the problems that have arisen with the increased demand for road resources have focused on traffic welfare. Traffic welfare consists of factors including operation service costs, passage of time, accident costs, parking costs, punctuality, and accessibility, with the most important being traffic congestion. As a part of an ITS, traffic surfaces can be put in place to collect traffic information on all roads in real time in order to provide users with information including which regions are congested, traffic volumes, and the locations of traffic accidents. In this way, an ITS can improve the functionality of a road traffic network. An ITS can also provide a real-time traffic-information service. By suggesting an optimal path to each driver, such a service decreases road congestion and disperses traffic. An ITS thus focuses on immediacy but achieves relatively low accuracy. To solve this problem, active research has been
conducted on real-time traffic pattern predictions based on deep learning models and multiple prediction modes, with a particular focus on traffic predictions based on time-series data.
Weilin et al. [8] proposed a multi-resolution support vector regression (SVR) traffic flow prediction model based on wavelet decomposition and topological space reconstruction. For their experiment, the researchers utilized data collected from January to December 2011 by performance measurement systems, which collect data in 5-min intervals. The mean absolute percentage error (MAPE) of their model was 12.8%.
Filmon et al. [9] proposed a nonparametric, data-centric methodology to achieve short-term traffic predictions based on the identification of similar traffic patterns through an improved K-nearest neighbor (K-NN) algorithm. Recently, the weighted Euclidean distance has also been used as a similarity measurement for K-NN. For their experiment, the researchers used 12 datasets from highways in the UK and 24 datasets from highways in the US. A MAPE of 22% was achieved.
Zhang [10] proposed a short-term traffic prediction model based on a convolutional neural network (CNN) deep learning framework. In the proposed framework, the optimal input data time delay and amount of spatial data are determined based on the space-time feature selection algorithm. The selected space-time traffic feature is then transformed into a two-dimensional matrix after being extracted from the actual data. The function is learned by the CNN, and a prediction model is constructed. According to a performance analysis, the MAPE was approximately 8.3% on average.
The methods described above tend to achieve higher prediction accuracies than those focusing on immediacy. The prediction models used in these studies are based on one of three approaches: SVR [11], [12], CNNs [13], [14], and K-NN [15], [16]. Because these models fail to consider the features of time-series data, they may be inappropriate. For prediction, this study therefore utilizes the long short-term memory (LSTM) model, which provides accurate predictions and makes it possible to account for the time-series features of traffic data. The LSTM model solves the problem of the long-term dependence inherent in recurrent neural network (RNN) models [17]-[19]. With the LSTM model, the result of a hidden layer is passed to the same hidden layer as an input. Owing to the recursive construction of hidden layers, it is possible to consider sequential or temporal aspects. For this reason, this model is conducive to learning the time-series features of traffic data. Traffic data include outliers or missing values due to unexpected traffic variables. Outliers and missing values lower model performance and therefore should be corrected when designing an accurate prediction model. The correction can be achieved by removing outliers, correcting missing temporal and spatial values, and applying pattern data. Then a system can be established to provide the predicted traffic information to users. With more accurate data, it is possible to increase the accuracy of predictions and to provide a smooth flow of traffic information to users [20], [21].
This paper is organized into the following sections. Section II describes the relevant studies on ITSs and ITS-based traffic predictions. Section III details the data collection process, data pre-processing, and model design for traffic congestion prediction. In Section IV, an experiment conducted to evaluate the model performance and its results are described. In addition, a comparison of different methods used to verify the performance and a description of the system implementation are also provided. Finally, Section V presents the concluding remarks regarding this study.

II. RELATED WORK

A. Research on traffic congestion prediction
In traffic data, outliers and missing values negatively influence traffic control and traffic congestion prediction in intelligent traffic systems. To address this problem, many missing value correction methods have been proposed. Conventional methods of missing value correction focus on the correction of individual missing values. Although these methods provide a simple and fast estimate for the missing value, they often produce biased results. To resolve this, historical imputation methods (HIMs) that provide multiple estimation values for one missing value have been proposed [22], [23]. In these methods, a missing value is replaced by the mean value of multiple data points collected at the same position and date. Correction methods based on nearest neighbor imputation (NIM) use the mean value from the neighboring roads to estimate a missing value [24], [25]. However, such methods cannot be applied when there is no data from neighboring roads. The missing value correction method proposed in this study makes it possible to correct a missing value, and thereby to design complete data, using past data patterns even when there is no information from neighboring roads. In addition, machine learning and deep learning are applied to model more complicated data for traffic prediction. The deep learning model exhibits better performance since it has more functions and a more complicated architecture than the conventional model.
Sun et al. [26] proposed a traffic prediction method using GPS trajectory data based on an RNN. Their method used the missing values from existing road speed data to estimate the average speeds on stretches of road with GPS trajectory data. However, because an RNN fails to memorize past data features and deletes them with a lapse in time, it has problems dealing with long-term dependency. Accordingly, traffic prediction based on the LSTM model, which resolves problems associated with RNNs, is actively being researched. Mou et al. [27] proposed the temporal information enhancement (T-LSTM) model to predict the traffic flow on a single stretch of road. In consideration of the similar features exhibited each day by traffic flows at a given time and place, the model extracted the unique correlation between the traffic flow and time information, thereby improving the prediction accuracy. Yu et al. [28] proposed
STGCN to solve the problem of previous studies that had ignored spatial and temporal attributes in traffic prediction. They argued that the method was able to obtain a faster training speed with a smaller number of parameters since it formalized the problem on a graph and established a model with a complete convolution structure, rather than applying regular convolution and recurrent units. Many researchers have tried to increase the accuracy of traffic predictions and reduce the calculation time through their theories and experiments.

B. Research on ITS-based traffic prediction
The University of Southern California Information Lab has established spatial and temporal data using sensors for road measurements and traffic information (e.g., CCTV and GPS) and uses real-time data and past traffic data to predict on-road traffic [29]. An important task is to evaluate the extent to which a prediction model built on past data depends on the current state of real-time traffic. It is necessary to overcome the limitation of previous data becoming irrelevant in the model over time. To this end, in the USC model, current traffic information is learned in real time and is used as historical data. The framework can predict traffic at an accuracy comparable to that of the most effective prediction-trained model. Fig. 1 shows the transportation prediction system architecture of USC media. The artificial intelligence (AI)-based transportation prediction system offered by Blue Signal in the Republic of Korea provides road map information and predicts traffic flows and accident risk through big data analysis [30]. An AI-based transportation prediction engine was also developed based on transportation theory. Whereas a conventional GPS service provides information such as routes around traffic jams, the shortest travel time, and the shortest path, the prediction engine of Blue Signal predicts the safest and most convenient route. This engine can achieve 98% accuracy for traffic accident prediction on domestic highways.

FIGURE 1. Transportation prediction system architecture of USC media

As shown in Fig. 1, data are received in real time by the User Interface and Data Interface. Through the adaptive segmentation of the Context Space, the effect of each base predictor is efficiently estimated. In this way, it is possible to predict traffic conditions in diverse situations.

III. PREDICTION OF TRAFFIC CONGESTION BASED ON LSTM THROUGH CORRECTION OF MISSING TEMPORAL AND SPATIAL DATA
The congestion prediction method developed to provide traffic information to users consists of data collection, correction of missing data, and prediction modeling. In this study, the collected data include node/link and traffic speed data provided by an ITS. The node/link data represent a road region or road connection point. The traffic speed data from the ITS are collected by traffic information collectors installed on the roads or along the roadsides. The traffic data include missing values and outliers. An outlier may be generated by an information collection failure, by errors in the collectors, or by shaded zones without automobiles travelling in them. The traffic data also include time-series features. For this reason, a missing value makes it difficult to extract the feature values when a deep learning model is used for prediction. Therefore, pre-processing of the outliers and missing values is required [31], [32]. During the data pre-processing, outliers are first identified and filtered out using the median absolute deviation [33], [34]. Missing data
are corrected using spatial trends, temporal trends, and pattern utilization. With the pre-processed data, an LSTM model is used to predict traffic congestion. Fig. 2 shows the entire process of LSTM-based traffic congestion prediction through the correction of temporal and spatial data.

FIGURE 2. Process of LSTM-based traffic congestion prediction through time-space correction

A. Outliers, types of missing values, and correction methods according to traffic data features
Traffic speed data include missing values and outliers that distort the average traffic speed. An outlier represents a value that is either too small or too large in the context of the average traffic flow on each road. Such values are removed to avoid influencing the feature values at the time of prediction. There are two types of missing values. The first type is missing temporal values that occur when not all of the traffic data (which are collected every 5 min) are gathered. The second type is missing spatial values that occur when data are not collected at each road in a given collection interval. Fig. 3 shows a time-by-link matrix with examples of an outlier and each type of missing value.

FIGURE 3. Outliers and types of missing values from traffic data

To correct for outliers and missing values in the traffic data, the outlier removal process is first applied. There are a variety of typical outlier removal methods that use, for example, the median absolute deviation, truncated mean, or Winsorized mean. Methods may be combined depending on the features of the roads and traffic data. This study applies the median absolute deviation to identify and remove outliers. That is, the median value of the collected data is used to detect whether a value is abnormally large or small. When a value is identified as an outlier, it is removed. Algorithm 1 shows the outlier removal algorithm. Fig. 4 shows the outlier removal process using the median absolute deviation. The missing value correction is an algorithm-based filtering process that also fills in the values removed after being identified as outliers. The missing-value correction methods include the application of spatial trends using data from regions with a similar traffic pattern, the use of temporal trends to correct the value in question using past data, and the use of pattern data. Each method corrects missing data values from a temporal or spatial perspective.

Algorithm 1: Outlier Removal Algorithm
Input: [x1, x2, …, xn]
def Detection of Outlier
    MED ← Median([x1, x2, …, xn])
    for xi in [x1, x2, …, xn]
        do x'i = |xi - MED|
    MAD ← Median([x'1, x'2, …, x'n])
    for x'i in [x'1, x'2, …, x'n]
        if 0.6457 * x'i / MAD > threshold
            then outlier_set ← [outlier_set, xi]
Output: [x1, x2, …, xn] - outlier_set

FIGURE 4. Outlier removal process using median absolute deviation

The spatial trends are used to correct missing values in regions with similar traffic patterns under the assumption that the traffic flow of the upper regions influences that of the lower regions. Algorithm 2 is a missing value correction method based on the use of spatial trends. For instance, if detector $x_b$ has a problem and its data are missing, the mean of the adjoining link data from $x_a$ and $x_c$ is used for the correction. The spatial trend-based missing value correction process is shown in Fig. 5.
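The outlier filter of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `remove_outliers_mad`, the default threshold of 3.5, and the sample speeds are assumptions. The scaling constant defaults to 0.6745, the value commonly used for the modified z-score; Algorithm 1 lists 0.6457, which can be passed through `scale` if that value is intended.

```python
from statistics import median

def remove_outliers_mad(speeds, threshold=3.5, scale=0.6745):
    """Split speeds into (kept, outliers) using the median absolute deviation."""
    med = median(speeds)
    abs_dev = [abs(x - med) for x in speeds]
    mad = median(abs_dev)
    if mad == 0:
        # More than half of the values equal the median: nothing can be scored.
        return list(speeds), []
    kept, outliers = [], []
    for x, dev in zip(speeds, abs_dev):
        score = scale * dev / mad  # modified z-score of this observation
        (outliers if score > threshold else kept).append(x)
    return kept, outliers

# Hypothetical 5-min speeds (km/h) for one link, with two implausible readings.
speeds_kmh = [42.0, 40.5, 43.0, 41.0, 120.0, 39.5, 2.0, 42.5]
kept, outliers = remove_outliers_mad(speeds_kmh)
print(outliers)  # [120.0, 2.0]
```

Because both the center and the spread are medians, the two extreme readings do not inflate the scale used to judge them, which is the usual reason for preferring this filter over a mean and standard deviation rule.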

Algorithm 2: Spatial Data Correction Algorithm
Input: Xa (Adjoining Northbound Link), Xb (Target Link), Xc (Adjoining Southbound Link)
def Spatial Data Correction
    if Xa = Exist && Xb = None && Xc = Exist
        then Xb = (Xa + Xc) / 2
    else if Xa = None && Xb = None && Xc = Exist
        then Xb = Xc
    else if Xa = Exist && Xb = None && Xc = None
        then Xb = Xa
Output: Xb (Target Link)

FIGURE 5. Spatial-trend based correction procedure

If missing data occur at three continuous points, the spatial trend correction is not possible because no adjoining link data are available. In this case, the temporal trend is applied. The temporal method calculates the mean of the n previous observations at the location of the missing observation. Equation (1) shows the correction equation using the temporal trend. In the equation, $F_t$ is the missing value at the current time $t$ that is to be estimated, $A_{t-k}$ is the detected data at time $t-k$, and $n$ is the number of past detected observations. In Fig. 6, the use of the temporal trend-based correction procedure is illustrated. Using the temporal trend, the missing values in the traffic data can be fully corrected. Nevertheless, if there are many sequential missing values, the estimated temporal values become constant when the method is applied, and the data pattern disappears, as shown in Fig. 7.

FIGURE 6. Time-trend based correction procedure

$$F_t = \frac{A_{t-1} + A_{t-2} + \cdots + A_{t-n}}{n} \tag{1}$$

Therefore, if the temporal trend is not useful, the pattern data are applied. This final method estimates the missing values by applying data collected in the connected parts, such as the data entrance and the entrance access parts. For the pattern data generation procedure, the data from previous days are checked to find the passage features of each day, and the data are saved as one of six types: a special day, Sunday, Saturday, Monday, weekday (Tuesday through Thursday), or Friday. The pattern data of each type are generated every 5 min and are updated by applying a weight to the current collection speed.

FIGURE 7. Data pattern collapse due to continuous temporal data
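The first two correction stages, Algorithm 2 and Equation (1), can be sketched together as follows. This is an illustrative sketch under stated assumptions: `None` marks a missing value, and the function names, the window size `n=3`, and the sample speeds are hypothetical. When fewer than `n` past values exist, the sketch averages whatever is available, which the text does not specify.

```python
def spatial_correct(xa, xb, xc):
    """Algorithm 2: fill the target link xb from the adjoining links xa and xc."""
    if xb is not None:
        return xb
    if xa is not None and xc is not None:
        return (xa + xc) / 2
    if xc is not None:
        return xc
    return xa  # still None when all three links are missing

def temporal_correct(history, n=3):
    """Equation (1): mean of the n most recent available observations, or None."""
    recent = [v for v in history if v is not None][-n:]
    if not recent:
        return None
    return sum(recent) / len(recent)

def fill_series(series, n=3):
    """Fill gaps in one link's series, feeding each estimate back as history."""
    filled = []
    for v in series:
        filled.append(v if v is not None else temporal_correct(filled, n))
    return filled

print(spatial_correct(50.0, None, 40.0))  # 45.0
print(spatial_correct(None, None, 40.0))  # 40.0
print(fill_series([30.0, 36.0, 42.0, None, None, None, None]))
```

In the last call the rising trend of the observed values is lost: the successive estimates are averages of averages and settle toward a nearly constant value, which is the pattern collapse that motivates the fall-back to pattern data.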

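The pattern data used as the final correction stage can be sketched as a table keyed by day type and 5-min slot. Much of this is assumed for illustration: the names `day_type`, `slot_index`, and `update_pattern`, the weight of 0.2, the blending rule `(1 - weight) * old + weight * current`, and the special-day calendar are hypothetical, since the text states only that a weight is applied to the current collection speed.

```python
from datetime import date, datetime

SPECIAL_DAYS = {date(2018, 11, 15)}  # hypothetical special-day calendar

def day_type(d):
    """Map a date to one of the six pattern types listed in the text."""
    if d in SPECIAL_DAYS:
        return "special"
    wd = d.weekday()  # Monday is 0, Sunday is 6
    if wd == 6:
        return "sunday"
    if wd == 5:
        return "saturday"
    if wd == 0:
        return "monday"
    if wd == 4:
        return "friday"
    return "weekday"  # Tuesday through Thursday

def slot_index(ts):
    """Index of the 5-min slot within the day (0 to 287)."""
    return (ts.hour * 60 + ts.minute) // 5

def update_pattern(patterns, ts, speed, weight=0.2):
    """Blend the current speed into the stored pattern value for this slot."""
    key = (day_type(ts.date()), slot_index(ts))
    old = patterns.get(key)
    patterns[key] = speed if old is None else (1 - weight) * old + weight * speed
    return patterns[key]

patterns = {}
update_pattern(patterns, datetime(2018, 11, 5, 8, 0), 30.0)   # first Monday, 08:00
update_pattern(patterns, datetime(2018, 11, 12, 8, 0), 40.0)  # next Monday, 08:00
print(patterns[("monday", 96)])
```

A missing value at 08:00 on a later Monday would then be read from `patterns[("monday", 96)]`, so the daily shape is preserved even across long gaps.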

B. LSTM-based traffic congestion prediction
For traffic speed prediction, we use a time series-based deep learning model, long short-term memory (LSTM), for modeling [35], [36]. The data used for prediction are pre-processed using the method described in the previous section. The input data used for modeling are the mean speeds from 10 min and 5 min earlier, the current speed, and the speed of the adjoining upper region. The output data is the predicted speed 5 min after the current time. Fig. 8 shows the LSTM-based traffic prediction process proposed in this study. An LSTM cell consists of a memory cell and gates. Input information is saved in the memory cell, and a gate controls the saved information. The parameters of the proposed LSTM model are shown in Table 1. The learning rate is a hyperparameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function [37]. Dropout is used to prevent overfitting, which can occur during the learning process [38]. In other words, dropout is used when a model cannot be generalized because of overfitting, which means that the error is small when testing with the learning data and large when testing with the test data. The batch size represents the number of training samples input to the model at once. The optimization function is an algorithm for updating the weights. The number of hidden neurons and layers, the number of epochs, and the loss function, all of which affect performance, are frequently changed to induce improved performance [39], [40].

FIGURE 8. LSTM-based traffic prediction process

TABLE 1. Hyperparameter values
Hyperparameter   Value
Learning rate    0.001
Dropout          0.005
Batch size       100
Optimization     RMSprop
Epoch            500

IV. EXPERIMENT AND RESULTS
The LSTM-based traffic congestion prediction method proposed in this study was implemented using the following hardware and operating system: Windows 10 Pro, an AMD Ryzen 5 1600 6-Core processor, an NVIDIA GeForce GTX 1070, and 16 GB of RAM. In terms of software, a TensorFlow back-end engine and the deep learning library Keras were used in the design. The traffic speed data used in this study were collected in Gangnam-gu, Seoul, over one month, November 2018. There are a total of 1,630 links in Gangnam-gu, and data were collected at each link [41], [42]. A total of 8,640 observations were collected at each link, given the collection cycle and period (every 5 min for 30 days, that is, 288 observations per day). Some data were missing; data may have failed to be collected due to a sensor or software error in the process of data collection. The average missing rate of the Gangnam-gu traffic speed data is approximately 33%.

A. Implementation of traffic congestion prediction system
In this study, a system for pre-processing traffic data and a traffic congestion prediction model were established. Fig. 9 shows the pre-processing system for the traffic data. The table in Fig. 9 shows an example of the speed data for all regions in Gangnam-gu. By entering a LINK_ID in the Setting field at the bottom right, selecting a pre-processing method (outlier removal, correction of missing spatial or temporal values, or the use of pattern data), and clicking the Start button, data pre-processing is applied. The pre-processed region data appear in the bottom left of Fig. 9. It is possible to save the pre-processed data by clicking the Save button. A region is selected in the region selection window in the top-left of the prediction system. In the LINK overview window, the description of the selected region (LINK_ID, LINK_NAME, Velocity) is displayed. In the data collection window, the speed data collected in the selected region are shown for the date provided. When ‘15 min later’ is selected and the Predict button is clicked in the Status window, the overall congestion results for the Gangnam-gu region of Seoul are displayed. The traffic congestion criteria differ depending on the road type. For general roads, ‘smooth’ refers to speeds of 30 km/h or higher, ‘congested’ to speeds from 15 km/h to less than 30 km/h, and ‘very congested’ to less than 15 km/h. For highways, ‘smooth’ refers to speeds of 70 km/h or higher, ‘congested’ to speeds from 40 km/h to less than 70 km/h, and ‘very congested’ to less than 40 km/h. These criteria are suggested by the Ministry of Land, Infrastructure and Transport [41]. Numerical information for the expected congestion region is provided in the table below the simulation map. Fig. 10 shows the LSTM-based traffic congestion prediction system [43], [44].
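The congestion criteria above reduce to a threshold lookup per road type. A minimal sketch follows; the function name `congestion_level` and the road-type labels `"general"` and `"highway"` are chosen here for illustration, and the handling of speeds exactly at a boundary (assigned to the higher class) is an assumption.

```python
# (smooth at or above, congested at or above) in km/h, taken from the criteria in the text
THRESHOLDS_KMH = {
    "general": (30.0, 15.0),
    "highway": (70.0, 40.0),
}

def congestion_level(speed_kmh, road_type="general"):
    """Classify a predicted speed as smooth, congested, or very congested."""
    smooth_min, congested_min = THRESHOLDS_KMH[road_type]
    if speed_kmh >= smooth_min:
        return "smooth"
    if speed_kmh >= congested_min:
        return "congested"
    return "very congested"

print(congestion_level(28.0))             # congested
print(congestion_level(28.0, "highway"))  # very congested
```

The same predicted speed maps to different labels on different road types, which is why the road type has to accompany each link's prediction.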

FIGURE 9. Pre-processing system for LSTM-based traffic congestion prediction

FIGURE 10. LSTM-based traffic congestion prediction system
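The input and output described in Section III-B can be assembled from pre-processed speed series as sketched below. This is one possible reading, stated as an assumption: each sample holds the speeds 10 min and 5 min earlier, the current speed, and the current speed of the adjoining upper link, and the target is the speed one 5-min step later. The text refers to mean speeds for the two earlier steps; the sketch uses the stored 5-min values directly. The name `make_samples` and the sample values are hypothetical.

```python
def make_samples(link_speeds, upstream_speeds):
    """Build (features, targets) from two aligned 5-min speed series."""
    if len(link_speeds) != len(upstream_speeds):
        raise ValueError("series must be aligned and of equal length")
    features, targets = [], []
    # Each index t needs two earlier steps (t-2, t-1) and one later step (t+1).
    for t in range(2, len(link_speeds) - 1):
        features.append([
            link_speeds[t - 2],   # speed 10 min earlier
            link_speeds[t - 1],   # speed 5 min earlier
            link_speeds[t],       # current speed
            upstream_speeds[t],   # current speed on the adjoining upper link
        ])
        targets.append(link_speeds[t + 1])  # speed 5 min after the current time
    return features, targets

X, y = make_samples([30, 32, 31, 29, 28, 27], [40, 41, 39, 38, 37, 36])
print(len(X), X[0], y[0])  # 3 [30, 32, 31, 39] 29
```

Six observations yield three samples, because the first two steps lack history and the last step lacks a target.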

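A Keras model using the values in Table 1 might look like the sketch below. It is not the authors' network: the number of LSTM units (64), the single-layer layout, and the input shape of one time step with four features are assumptions, since the text does not state them. TensorFlow is imported inside the function so that the hyperparameter table can be inspected without it.

```python
# Values from Table 1.
HYPERPARAMS = {
    "learning_rate": 0.001,
    "dropout": 0.005,
    "batch_size": 100,
    "epochs": 500,
}

def build_model(n_features=4, time_steps=1, units=64):
    """Single-layer LSTM regressor compiled with RMSprop and a MAPE loss."""
    from tensorflow import keras  # requires TensorFlow to be installed

    model = keras.Sequential([
        keras.Input(shape=(time_steps, n_features)),
        keras.layers.LSTM(units, dropout=HYPERPARAMS["dropout"]),
        keras.layers.Dense(1),  # predicted speed 5 min ahead
    ])
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=HYPERPARAMS["learning_rate"]),
        loss="mean_absolute_percentage_error",
    )
    return model

# Training call, assuming X has shape (samples, time_steps, n_features):
# build_model().fit(X, y, batch_size=HYPERPARAMS["batch_size"], epochs=HYPERPARAMS["epochs"])
```

Using the MAPE as the loss matches the choice described for the performance evaluation, at the cost of instability when actual speeds are close to zero.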

B. Comparative evaluation of performance according to by subtracting the actual value from the predicted value and
the missing value correction method dividing the result by the actual value; this quantity is summed
If a model learns on data that includes missing values, the for all of the observations, and the sum is dividing by 𝑛. The
predication ability can be diminished. For this reason, it is lower a MAPE value is, the higher the model accuracy is.
necessary to correct missing values, and the model accuracy
may change according to the correction method.
𝑛
We therefore evaluated the performance of our correction 100 𝐴𝑖 −𝐹𝑖
𝑀𝐴𝑃𝐸 = ∑ | | (2)
𝑛 𝐴𝑖
methods through repeated experiments varying the missing 𝑖=1
rate. In the experiments, historical imputation methods (HIM)
and nearest neighbor imputation (NIM) are used as In addition, the data used in the experiment is the traffic data
conventional missing value correction methods for of a day. The data includes data on an urban area with high
comparison with the missing value correction method congestion and a suburban area with relatively low congestion.
proposed in this study. The performance comparison was The performance of the LSTM model for congestion
conducted through the data missing rate based MAPE. The prediction was evaluated using uninterrupted and interrupted
data missing rate ranged from 10% to 90% in increments of flow regions. An uninterrupted flow region has no external
10%. Fig. 11 shows the results of the performance evaluation influences that control the traffic flow. An interrupted flow
for each of the missing value correction methods. region refers to a region with interrupted traffic flow that has
crossroads and trunk lines that cause interruptions due to
traffic signals or traffic control facilities. An example of an
uninterrupted flow region is a suburban area with highways,
while an example of an interrupted flow region is an urban
area with traffic signals and traffic control facilities. Fig. 12
shows a graph of the MAPE results for suburban and urban
areas. For the suburban areas, three regions were extracted,
and the northbound and southbound speeds were predicted. As
shown in the graphs of the prediction results, the average
MAPE was approximately 4.297%. As for the suburban areas,
three regions were extracted from the urban areas, and the
northbound and southbound speeds were predicted. The
average MAPE for the urban areas was approximately 6.087%.
The urban areas showed a somewhat lower accuracy than the
FIGURE 11. Results of performance evaluation according to missing value
correction method suburban areas, and the reasons for this were analyzed. The
suburban areas included fewer surrounding buildings and no
As shown in Fig. 11, the proposed method performed better in terms of MAPE than the conventional missing value correction methods. HIM corrects temporal missing values but fails to correct spatial missing values. In addition, its performance deteriorates when a large proportion of the data is missing. Unlike the HIM, the NIM cannot correct temporal missing values, but it can correct the data when the data in the neighboring space is not recorded. In contrast, the data correction method proposed in this study is able to correct both spatial and temporal data and exhibits excellent performance in terms of MAPE.

C. Performance evaluation of prediction model
For the performance evaluation and loss function of the model used in this study, the MAPE was used [45], [46]. The MAPE can be applied to overcome the effect of size-dependent error and represents the mean of the absolute percentage error between the actual and predicted values. It was used for the loss function because it is sensitive to small values in low-speed regions such as congested areas. It was also used for the performance evaluation of the proposed method. The MAPE can be calculated by Equation (2), where 𝐴𝑖 is the actual value and 𝐹𝑖 is the predicted value. The MAPE is expressed as a percentage.

traffic signals, and the speed limit within these regions was higher than in the urban areas. By contrast, the urban areas included numerous buildings, the large influence of a floating population other than drivers, traffic signals at crossroads, and numerous variables interrupting the traffic flow. For these reasons, it is more difficult to predict the traffic flow in urban areas. In addition, Fig. 13 shows the results of a comparative analysis of the actual and predicted values in terms of the MAPE for three sections of the city center, while Fig. 14 shows the same comparison for three sections on the outskirts of the city. However, the denominator of the MAPE shrinks as the actual measurement approaches 0. This results in a significant increase in the absolute percentage error (APE) even if the absolute error is small, which biases the value when the average is taken. Therefore, the root mean squared error (RMSE) and the mean absolute error (MAE) are also used to measure performance in order to prevent distortion of the overall prediction performance. The MAE evaluates results by an identical standard in different circumstances. The RMSE takes the square root of the mean squared error (MSE), which reduces the size-dependent distortion that is the problem of the MSE, and displays the average magnitude of the errors intuitively.
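
The behavior of the three error measures discussed in this section can be illustrated with a minimal sketch. It uses the standard textbook definitions of MAPE, MAE, and RMSE; the function names and the sample speeds are assumptions made for this illustration and are not the authors' implementation or data. Both sample sets have the same absolute error of 2 km/h per point, so the MAE and RMSE are identical, while the MAPE grows sharply when the actual speeds are close to 0.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined (division by zero) if any actual value is exactly 0.
    """
    n = len(actual)
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, predicted)) / n


def mae(actual, predicted):
    """Mean absolute error, in the same unit as the data."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the data."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / n)


# Hypothetical speeds in km/h (illustrative only, not from the paper).
free_flow_actual = [80.0, 82.0, 78.0]
free_flow_pred = [78.0, 84.0, 76.0]
congested_actual = [4.0, 5.0, 2.0]
congested_pred = [6.0, 3.0, 4.0]

for name, a, f in [
    ("free flow", free_flow_actual, free_flow_pred),
    ("congested", congested_actual, congested_pred),
]:
    print(f"{name}: MAPE={mape(a, f):.2f}%  MAE={mae(a, f):.2f}  RMSE={rmse(a, f):.2f}")
```

Under these assumed values, the percentage-based measure penalizes the low-speed case far more heavily than the high-speed case even though the absolute errors are equal, which is why reporting the MAE and RMSE alongside the MAPE gives a less distorted picture.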

FIGURE 12. MAPE results of suburban and urban areas

FIGURE 13. Analysis of prediction results for urban areas

FIGURE 14. Analysis of prediction results for suburban areas

FIGURE 15. RMSE results of suburban and urban areas

FIGURE 16. MAE results of suburban and urban areas

In this study, the prediction performance is compared between urban and suburban areas. Figures 15 and 16 show the results of the performance evaluation through the RMSE and MAE for urban and suburban areas. In the RMSE results in Figure 15, the southbound section of Seocho-daero shows the best performance, at 1.543. The northbound section of National Route 47 shows relatively low performance, at 5.524. The RMSE over the 12 routes is 3.27 on average. The MAE in Figure 16 also shows the best performance for the southbound section of Seocho-daero, as with the RMSE, and the northbound section of Teheran-ro shows the lowest performance, at 3.83. The MAE over the 12 routes is 2.24 on average. The MAPE results show that congestion prediction performs poorly in urban areas. However, the RMSE and MAE results show that the performance for some suburban areas is poorer than that for urban areas. This is because the MAE and RMSE do not depend on the speed values or on the differing conditions of highly congested urban areas and less congested suburban areas, but are calculated by an identical standard. Therefore, when the three performance evaluation indexes are analyzed comprehensively, the prediction performance for urban areas, except the northbound section of Teheran-ro, is mostly better than that for suburban areas.

D. Evaluation of model goodness-of-fit in comparison with different models
To demonstrate the reliability of the model proposed in this study, the goodness-of-fit of the model was evaluated. The proposed model was compared with other models presented in relevant studies. The data used for the comparison were preprocessed by the method proposed in this study. The performance index used for the comparison was the MAPE. The models used for comparison are RNN, LSTM, and STGCN models. Table 2 presents the prediction results for the different data and models in the comparison. As shown in Table 2, in terms of the MAPE, the proposed method had better goodness-of-fit than the other methods. The RNN performed worse than the LSTM models. This is because the RNN has the problem of long-term dependency. According to the comparison, the proposed model improves the MAPE by 0.97 over that of Mou et al. [27]. The LSTM model used in this study is therefore well suited to traffic congestion prediction, since it accounts for temporal features.

TABLE 2. Evaluation of model goodness-of-fit in comparison with different models
Reference             Model    MAPE
Sun et al. [26]       RNN      7.22
Mou et al. [27]       LSTM     6.09
Yu et al. [28]        STGCN    6.43
Ranjan et al. [47]    LSTM     6.81
Zheng et al. [48]     LSTM     6.72
Zhao et al. [49]      LSTM     9.70
Current work          LSTM     5.12

VII. CONCLUSION
In this study, an LSTM-based traffic congestion prediction method using a correction for missing temporal and spatial data was proposed. The experimental results showed that outliers and missing values in the traffic data influenced the prediction results. To improve the model performance, the outliers were removed, and the data were pre-processed using spatial and temporal trends and pattern data. As a predictive model, LSTM was applied. It is derived from the RNN model and solves the problem of long-term dependency. In the LSTM model, the result of a hidden layer is passed into the same hidden layer as an input. Because the model considers sequential or temporal aspects, it can be applied to learn the time-series features of traffic data. In an experiment to evaluate the model performance, suburban areas were used as an example of uninterrupted flow regions and urban areas as an example of interrupted flow regions. The suburban areas were less affected by external interference with the traffic flow than the urban areas, and therefore had fewer variables at the time of prediction. The model thus demonstrated higher prediction accuracy for suburban areas. In comparison with relevant models, the proposed method was found to achieve better performance with a difference in the
MAPE of 3%–17%. As a future study, we plan to increase the accuracy of the traffic congestion prediction in low-speed regions and urban areas and to establish a model with better user performance.

REFERENCES
[1] Mihyun Chung and Jaehyoun Kim, "The internet information and technology research directions based on the fourth industrial revolution," KSII Transactions on Internet & Information Systems, vol. 10, no. 3, 2016.
[2] R. Coppola and M. Morisio, "Connected Car: Technologies Issues Future Trends," ACM Computing Surveys, vol. 49, no. 3, pp. 46, 2016.
[3] D. Gunnarsson, S. Kuntz, G. Farrall, A. Iwai and R. Ernst, "Trends in Automotive Embedded Systems," CASES'12: Proceedings of the 2012 ACM International Conference on Compilers Architectures and Synthesis for Embedded Systems, 2012.
[4] S. Oh, and K. C, "Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals," Wireless Personal Communications, vol. 98, no. 4, 2018.
[5] D. George and P. Demestichas, "Intelligent transportation systems," IEEE Veh. Technol. Mag., vol. 5, no. 1, pp. 77-84, Mar. 2010.
[6] S.-H. An, B.-H. Lee and D.-R. Shin, "A survey of intelligent transportation systems," Proc. Int. Conf. Comput. Intell., pp. 332-337, Jul. 2011.
[7] M. Böhm, S. Fuchs, R. Pfliegl and R. Kolbl, "Driver behavior and user acceptance of cooperative systems based on infrastructure-to-vehicle communication," Proc. Transp. Res. Rec., pp. 136-144, 2009.
[8] Ge, W., Cao, Y., Ding, Z., Guo, L., "Forecasting Model of Traffic Flow Prediction Model Based on Multi-resolution SVR," In Proc. of the 2019 International Conference on Innovation in Artificial Intelligence, pp. 1-5, Mar. 2019.
[9] F. G. Habtemichael and M. Cetin, "Short-term traffic flow rate forecasting based on identifying similar traffic patterns," Transp. Res. C Emerg. Technol., vol. 66, pp. 61-78, May 2016.
[10] W. Zhang, Y. Yu, Y. Qi, F. Shu and Y. Wang, "Short-term traffic flow prediction based on spatio-temporal analysis and CNN deep learning," Transportmetrica A Transp. Sci., vol. 15, pp. 1688-1711, 2019.
[11] D. H. Hong and C. H. Hwang, "Support vector fuzzy regression machines," Fuzzy Sets Syst., vol. 138, no. 2, pp. 271-281, 2003.
[12] G. Santamaría-Bonfil, A. Reyes-Ballesteros and C. Gershenson, "Wind speed forecasting for wind farms: A method based on support vector regression," Renew. Energy, vol. 85, pp. 790-809, Jan. 2016.
[13] A. Krizhevsky, I. Sutskever and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Proc. Neural Information and Processing Systems, 2012.
[14] T. N. Sainath, A.-R. Mohamed, B. Kingsbury and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 8614-8618, 2013.
[15] K. Weinberger and L. Saul, "Distance Metric Learning for Large Margin Nearest Neighbor Classification," The J. Machine Learning Research, vol. 10, pp. 207-244, 2009.
[16] M. Mejdoub and C. Ben Amar, "Classification improvement of local feature vectors over the KNN algorithm," Multimedia Tools and Applications, vol. 64, no. 1, pp. 197-218, 2013.
[17] M. Schuster and K. K. Paliwal, "Bidirectional Recurrent Neural Networks," IEEE Transactions on Signal Processing, vol. 45, pp. 2673-2681, 1997.
[18] H. Sak, A. Senior and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), 2014.
[19] X. Ma, Z. Tao, Y. Wang et al., "Long short-term memory neural network for traffic speed prediction using remote microwave sensor data," Transport. Res. C Emerging Technol., vol. 54, pp. 187-197, 2015.
[20] J. C. Kim and K. Chung, "Prediction model of user physical activity using data characteristics-based long short-term memory recurrent neural networks," KSII Trans. Internet Inf. Syst., vol. 13, no. 4, pp. 2060-2077, Apr. 2019.
[21] J.-C. Kim and K. Chung, "Associative feature information extraction using text mining from health big data," Wireless Pers. Commun, vol. 105, no. 2, pp. 691-707, Mar. 2019.
[22] D. Ni, J. D. Leonard, A. Guin and C. Feng, "Multiple imputation scheme for overcoming the missing values and variability issues in ITS data," J. Transp. Eng., vol. 131, no. 12, pp. 931-938, Dec. 2005.
[23] X. Luo, X. Meng, W. Gan and Y. Chen, "Traffic Data Imputation Algorithm Based on Improved Low-Rank Matrix Decomposition," Journal of Sensors, 2019.
[24] J. Chen and J. Shao, "Nearest neighbour imputation for survey data," J. Off. Stat., vol. 16, no. 2, pp. 113-131, 2000.
[25] L. Beretta and A. Santaniello, "Nearest neighbor imputation algorithms: a critical evaluation," BMC medical informatics and decision making, pp. 74, 2016.
[26] S. Sun, J. Chen and J. Sun, "Traffic congestion prediction based on GPS trajectory data," Int. J. Distrib. Sensor Netw, vol. 15, no. 5, 2019.
[27] L. Mou, P. Zhao, H. Xie and Y. Chen, "T-LSTM: A long short-term memory neural network enhanced by temporal information for traffic flow prediction," IEEE Access, vol. 7, pp. 98053-98060, 2019.
[28] B. Yu, H. Yin and Z. Zhu, "Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting," Proc. Int. Joint Conf. Artif. Intell, 2018.
[29] USC Infolab. Accessed: Jun. 12, 2020. [Online]. Available: https://infolab.usc.edu/.
[30] Blue Signal. Accessed: Jun. 12, 2020. [Online]. Available: https://www.bluesignal.co.kr/.

[31] A. Famili, W. Shen, R. Weber and E. Simoudis, "Data Preprocessing and Intelligent Data Analysis," Intelligent Data Analysis, vol. 1, pp. 3-23, 1997.
[32] S. Garcia, J. Luengo and F. Herrera, Data Preprocessing in Data Mining, Springer, 2015.
[33] T. Pham-Gia and T. L. Hung, "The mean and median absolute deviations," Math. Comput. Modeling, vol. 34, no. 7-8, pp. 921-936, 2001.
[34] P. J. Rousseeuw and C. Croux, "Alternatives to the median absolute deviation," J. Amer. Statist. Assoc., vol. 88, no. 424, pp. 1273-1283, Dec. 1993.
[35] Z. Zhao, W. Chen, X. Wu, P. C. Y. Chen and J. Liu, "LSTM network: A deep learning approach for short-term traffic forecast," IET Intell. Transp. Syst., vol. 11, no. 2, pp. 68-75, Jan. 2017.
[36] Luo Xianglong, Jiao Qinqin and Niu Liyao, "Short-term traffic flow prediction based on deep learning," Computer application research, vol. 34, no. 1, pp. 91-97, Jan. 2017.
[37] R. A. Jacobs, "Increased Rates of Convergence Through Learning Rate Adaptation," Neural Networks, vol. 1, no. 4, pp. 295-308, 1988.
[38] Y. Gal, "A theoretically grounded application of dropout in recurrent neural networks," Adv. Neural Inform. Process. Syst., pp. 1019-1027, 2016.
[39] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow and A. Y. Ng, "On optimization methods for deep learning," Proc. 28th Int. Conf. Machine Learning, pp. 265-272, 2011.
[40] M. Denil, B. Shakibi, L. Dinh, N. de Freitas and M. Ranzato, "Predicting parameters in deep learning," Proc. Adv. Neural Inf. Process. Syst., pp. 2148-2156, 2013.
[41] Ministry of Land, Infrastructure and Transport. Accessed: Jun. 12, 2020. [Online]. Available: http://openapi.its.go.kr/.
[42] Intelligent Traffic System Standard Node Link Management System. Accessed: Jun. 12, 2020. [Online]. Available: http://nodelink.its.go.kr/.
[43] J.-C. Kim and K. Chung, "Emerging Risk Forecast System using Associative Index Mining Analysis," Cluster Comput., vol. 20, no. 1, pp. 547-558, Mar 2017.
[44] K. Chung and R. C. Park, "Cloud based u-healthcare network with QoS guarantee for mobile health service," Cluster Comput., vol. 22, no. 1, pp. 2001-2015, Jan. 2019.
[45] A. De Myttenaere, B. Golden, B. Le Grand and F. Rossi, "Mean absolute percentage error for regression models," Neurocomputing, vol. 192, pp. 38-48, 2016.
[46] U. Khair, H. Fahmi, S. A. Hakim and R. Rahim, "Forecasting error calculation with mean absolute deviation and mean absolute percentage error," Proc. J Phys.: Conf. Ser., 25-26 Aug. 2017.
[47] N. Ranjan, S. Bhandari, H. P. Zhao, H. Kim and P. Khan, "City-Wide Traffic Congestion Prediction Based on CNN, LSTM and Transpose CNN," in IEEE Access, vol. 8, pp. 81606-81620, 2020.
[48] Y. Zheng, L. Liao, F. Zou, M. Xu and Z. Chen, "PLSTM: Long Short-Term Memory Neural Networks for Propagatable Traffic Congested States Prediction," In International Conference on Genetic and Evolutionary Computing, pp. 399-406, 2019.
[49] J. Zhao, Y. Gao, Z. Bai, H. Wang and S. Lu, "Traffic Speed Prediction Under Non-Recurrent Congestion: Based on LSTM Method and BeiDou Navigation Satellite System Data," in IEEE Intelligent Transportation Systems Magazine, vol. 11, no. 2, pp. 70-81, 2019.

Dong-Hoon Shin received the B.S. degree from the Department of Computer Engineering, Dongseo University, South Korea, in 2019. He is currently pursuing a master's degree in the Department of Computer Science, Kyonggi University, Suwon, South Korea. He has been a researcher at the Data Mining Lab., Kyonggi University. His research interests include Data Mining, Artificial Intelligence, Healthcare, Biomedical and Health Informatics, Knowledge System, VR/AR, and Deep Learning.

Kyungyong Chung received the B.S., M.S., and Ph.D. degrees in 2000, 2002, and 2005, respectively, all from the Department of Computer Information Engineering, Inha University, South Korea. He has worked for the Software Technology Leading Department, Korea IT Industry Promotion Agency (KIPA). From 2006 to 2016, he was a professor in the School of Computer Information Engineering, Sangji University, South Korea. Since 2017, he has been a professor in the Division of Computer Science and Engineering, Kyonggi University, Suwon, South Korea. His research interests include Data Mining, Artificial Intelligence, Healthcare, Biomedical and Health Informatics, Knowledge System, HCI, and Recommendation System.

Roy C. Park received the B.S. degree from the Dept. of Industry Engineering, and the M.S. and Ph.D. degrees from the Dept. of Computer Information Engineering, Sangji University, South Korea, in 2010 and 2015. From 2015 to 2018, he was a professor in the Division of Computing Engineering, Dongseo University, Korea. Since 2019, he has been a professor in the Department of Information Communication Software Engineering, Sangji University, Wonju, South Korea. His research interests include WLAN System, Heterogeneous Network, Ubiquitous Network Service, Human-Inspired Artificial Intelligence and Computing, Health Informatics, Knowledge System, Peer-to-Peer, and Cloud Network.
