Sag Heer 2019
Zidong Wang
Accepted Manuscript
PII: S0925-2312(18)31163-9
DOI: https://doi.org/10.1016/j.neucom.2018.09.082
Reference: NEUCOM 20015
Please cite this article as: Alaa Sagheer, Mostafa Kotb, Time Series Forecasting of
Petroleum Production using Deep LSTM Recurrent Networks, Neurocomputing (2018), doi:
https://doi.org/10.1016/j.neucom.2018.09.082
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Abstract
Time series forecasting (TSF) is the task of predicting future values of a given sequence using historical data. Recently, this task has attracted the attention of researchers in the area of machine learning to address the limitations of traditional forecasting methods, which are time-consuming and complex. With the increasing availability of extensive amounts of historical data, along with the need for accurate production forecasting, a powerful forecasting technique that infers the stochastic dependency between past and future values is highly needed. In this paper, we propose a deep learning approach capable of addressing the limitations of traditional forecasting approaches and producing accurate predictions. The proposed approach is a deep long short-term memory (DLSTM) architecture, an extension of the traditional recurrent neural network. A genetic algorithm is applied in order to optimally configure the DLSTM architecture. For evaluation purposes, two case studies from the petroleum industry domain are carried out using the production data of two actual oilfields. Toward a fair evaluation, the performance of the proposed approach is compared with several standard methods, either statistical or soft computing. Using different measurement criteria, the empirical results show that the proposed DLSTM model outperforms other standard approaches.
Keywords: Time Series Forecasting, Deep Neural Networks, Recurrent Neural Networks, Long-Short Term
1. Introduction
the past, the TSF problem has been influenced by lin-
tion and have established themselves as serious contenders to statistical methods in the forecasting community after they showed better prediction accuracies [5]. Given the several ANN algorithms available, identifying a specific ANN algorithm for a forecasting task should be based on a compromise among three aspects, namely, the complexity of the solution, the desired prediction accuracy, and the data characteristics [5]. Considering the first two aspects, i.e. precision and complexity, the best results are obtained by the Feed Forward NN (FFNN) predictor, in which the information goes through the network in the forward direction only. However, once the third aspect, i.e. the data characteristics, is added, the Recurrent Neural Network (RNN) is found to be more suitable than the FFNN [6].
In an RNN, the activations from each time step are stored in the internal state of the network in order to provide a temporal memory property [7]. However, the major weakness of the RNN appears when learning long-range time dependencies is required [7, 8]. To overcome this drawback, Hochreiter et al. [9] developed the Long Short-Term Memory (LSTM) algorithm as an extension to the RNN [8, 10]. Despite the advantages cited for the LSTM and its predecessor RNN, their performance on the TSF problem is not satisfactory. Such shallow architectures cannot efficiently represent the complex features of time series data, particularly when attempting to process highly nonlinear and long-interval time series datasets [8, 11].
In this paper, we propose that a Deep LSTM (DLSTM) architecture can adapt to learning the nonlinearity and complexity of time series data. The proposed deep model is an extension of the original LSTM model, where it includes multiple LSTM layers such that each layer contains multiple cells. The proposed model makes more effective use of the parameters of each LSTM layer in order to train the forecasting model efficiently. It works as follows: each LSTM layer operates at a different time scale and, thereby, processes a certain part of the desired task and, subsequently, passes it on to the next layer until finally the last layer generates the output [12, 13].
Thus, we can attribute the benefit of stacking more than one LSTM layer to the recurrent connections between the units in the same layer, and to the feed-forward connections between units in an LSTM layer and the LSTM layer above it [13, 14]. This ensures improved learning with more sophisticated conditional distributions of any time series data. Also, it can perform hierarchical processing on difficult temporal tasks and, more naturally, capture the structure of data sequences [11].
Toward a fair evaluation, in this study we train and validate the DLSTM model in more than one scenario, where we use the genetic algorithm in order to optimally design and configure the best DLSTM architecture and parameters. Concurrently, we compare the DLSTM's performance with the performance of other reference models using the same datasets and the same experimental conditions, via different error measures. The reference models range over statistical methods, neural network (shallow and deep) methods, and hybrid (statistical and neural network) methods.
The remainder of the paper is organized as follows: Section 2 describes the TSF problem and associated works in the oil and petroleum industry. The proposed DLSTM model is presented in section 3. Section 4 shows the experiment settings of this paper. The experimental results of two case studies are shown in section 5. Discussion and analysis of the results are provided in section 6 and, finally, the paper is concluded in section 7.
unsolved problem with numerous potential applications [1]. For this reason, time series forecasting is considered one of the top ten challenging problems in data mining due to its unique properties [15]. In this paper, we focus on the TSF problem of petroleum field production.
2.1. Overview of Petroleum TSF
Forecasting of petroleum production is a very pertinent task in the petroleum industry, where the accurate estimation of petroleum reserves involves
2.2. Related Works
Several approaches have been developed to overcome the aforementioned petroleum TSF challenges; however, the key to successful forecasting lies in choosing the right representation among these approaches [11]. These approaches can be classified into two broad categories, namely, statistical approaches and soft computing approaches. One of the most common traditional statistical methods is the Autoregressive Integrated Moving Average (ARIMA) [22].
methods are based mainly on the analysis of subjective data types. In other words, they pick the proper slope, subsequently tune the parameters of the numerical simulation model in such a way that reasonable values are retained, and finally provide interpretations of the oilfield's geology [25]. But the geology and fluid properties of oilfields are highly nonlinear and heterogeneous in nature, thus yielding time series data that represent a long memory process. Certainly, these properties represent big challenges for traditional approaches, which are still far from accurately estimating the future production of petroleum [17] [11].
Over the past decade, considerable efforts have been published in the literature presenting the use of soft computing methods for different forecasting activities in a number of petroleum engineering applications. In 2011, Berneti et al. presented an imperialist competitive algorithm using ANN to predict the oil flow rate of oil wells [26]. In 2012, Zhidi et al. combined the wavelet transformation with ANN in order to establish a production-predicting model that used drill stem to test production and wavelet coefficients [27]. In 2013, Chakra et al. presented an innovative higher-order NN model [18].
More recently in 2016, Aizenberg et al. presented a multilayer NN with multi-valued neurons
bines the kernel trick with the Arps exponential decline equation. It is worth mentioning that in this paper we conduct a comparison with both the Chakra [18] and Ma [28] approaches, since both of these contributions apply the same case studies as described in this paper.
2.3. Motivation
Although the soft computing methods that employ different ANN algorithms are used to overcome the aforementioned limitations of the statistical methods and yield more accurate forecasts, they are observed to still face some challenges. It has been demonstrated that traditional ANNs with shallow architectures lack sufficient capacity to accurately model the aforementioned complexity aspects of time series data, such as high nonlinearity, longer intervals, and big heterogeneous properties [28, 29]. This reason, among others, motivated us to solve the TSF problem using a Deep Neural Network (DNN) architecture instead of shallow NN architecture models. DNN models are termed deep because they are constructed by stacking multiple layers of nonlinear operations on top of one another, with several hidden layers [30].
Prior to introducing the proposed model, it is essential to describe briefly the original LSTM as in
RNN is the network delay recursion, which enables it to describe the dynamic performance of systems [6]. The signal delay recursion makes the output of the network at time t depend not only on the input at time t but also on recursive signals before time t, as shown in Fig. 1. Despite its capability to process short-term sequential data, the weakness of the RNN appears when learning long-range dependencies, or long-term context memorization, is demanded in time series forecasting applications [8, 10].
RNN variants, the Long Short-Term Memory (LSTM) model is the elegant RNN variant, which uses the purpose-built LSTM memory cell in order to represent the long-term dependencies in time series data [31]. In addition, LSTM is introduced
ory cell), which looks like a conveyor belt. It runs straight down the entire chain with the ability to add or remove information to the cell state, carefully regulated by structures called gates. The gates are ways to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation, as depicted in Fig. 2. The input at time step t, $X_t$, and the hidden state from the previous time step, $S_{t-1}$, are introduced to the LSTM block, and then the hidden state $S_t$ is computed as follows:
• The first step in LSTM is to decide what information is going to be thrown away from the cell state. This decision is made by the
This step has two parts: first, the input gate ($i_t$) layer decides which values are to be updated; second, a tanh layer creates a vector of new candidate values $\tilde{C}_t$. These two parts can be described as follows:
Fig. 2. LSTM block, where $f_t$, $i_t$, $o_t$ are the forget, input, and output gates, respectively
cell state are going to be produced as output. Then, the cell state goes through a tanh layer (to push the values to be between -1 and 1) and is multiplied by the output gate as follows:
$$o_t = \sigma(X_t U^o + S_{t-1} W^o + b^o) \qquad (5)$$
shown in Fig. 3, one after another connected in a deep recurrent network fashion to combine the advantages of a single LSTM layer. The goal of stacking multiple LSTM layers in such a hierarchical architecture is to build the features at the lower layers that will disentangle the factors of variations in the input
2. Recurrent weights: $W^f$, $W^i$, $W^o$, $W^c$.
3. Bias: $b^f$, $b^i$, $b^o$, $b^c$.
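To make the roles of the input weights U, recurrent weights W, and biases b concrete, the following is a minimal NumPy sketch of one LSTM step and of a stack of LSTM layers. It is not the authors' implementation. Only the output gate follows Eq. (5) directly; the forget gate, input gate, candidate, and cell-state updates are assumed to take the standard LSTM form with the same notation. The names `init_params`, `lstm_step`, and `stacked_lstm_step` are hypothetical, as is the small random initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, rng):
    """Hypothetical helper: input weights U, recurrent weights W, biases b
    for the forget (f), input (i), output (o) gates and the candidate (c)."""
    U = {g: rng.normal(0.0, 0.1, (n_in, n_hidden)) for g in "fioc"}
    W = {g: rng.normal(0.0, 0.1, (n_hidden, n_hidden)) for g in "fioc"}
    b = {g: np.zeros(n_hidden) for g in "fioc"}
    return U, W, b

def lstm_step(x_t, s_prev, c_prev, params):
    """One LSTM step: returns the new hidden state S_t and cell state C_t."""
    U, W, b = params
    f_t = sigmoid(x_t @ U["f"] + s_prev @ W["f"] + b["f"])      # forget gate (assumed standard form)
    i_t = sigmoid(x_t @ U["i"] + s_prev @ W["i"] + b["i"])      # input gate (assumed standard form)
    c_tilde = np.tanh(x_t @ U["c"] + s_prev @ W["c"] + b["c"])  # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                          # updated cell state
    o_t = sigmoid(x_t @ U["o"] + s_prev @ W["o"] + b["o"])      # output gate, Eq. (5)
    s_t = o_t * np.tanh(c_t)                                    # hidden state, in (-1, 1)
    return s_t, c_t

def stacked_lstm_step(x_t, states, layers):
    """One time step through a stack of LSTM layers.
    states: list of (s_prev, c_prev), one per layer; layers: list of params."""
    new_states = []
    layer_input = x_t
    for (s_prev, c_prev), params in zip(states, layers):
        s_t, c_t = lstm_step(layer_input, s_prev, c_prev, params)
        new_states.append((s_t, c_t))
        layer_input = s_t  # the hidden state goes upward to the next layer
    return layer_input, new_states
```

Each layer keeps its own hidden and cell state across time steps, while the hidden state of one layer serves as the input of the layer above it; the output of the top layer would then feed a final regression layer, which is omitted here.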
the input at time t, $X_t$, is introduced to the first LSTM block along with the previous hidden state $S_{t-1}^{(1)}$, where the superscript (1) refers to the first LSTM. The hidden state at time t, $S_t^{(1)}$, is computed as shown in section 3.1 and goes forward to the next time step and also upward to the second LSTM block. The second LSTM uses the hidden state $S_t^{(1)}$ along with the previous hidden state $S_{t-1}^{(2)}$ to compute $S_t^{(2)}$, which goes forward to the next time step and upward to the third LSTM block, and so on, until the last LSTM block in the stack.
The benefit of such a stacked architecture is that each layer can process some part of the desired task and subsequently pass it on to the next layer until finally the last accumulated layer provides the output. Another benefit is that such an architecture allows the hidden state at each level to operate at a different timescale. These two benefits have great impact in scenarios involving data with long-term dependency or in case of handling multivariate time series
In this section, we will show in detail all the ex-
4.1.1. Data preprocessing
The data used in this paper are the raw production data of two actual oilfields, so it is highly possible that they include noise as an influencing factor. As such, it is not appropriate to use the raw production data in the learning of NN because NN requires extremely low learning rates. Thus, a preprocessing scenario consisting of four steps has been incorporated before the use of the raw production data in the experiments of this paper.
Step 1: Reduce noise from raw data
To smooth the raw data and remove any possible noise, we use the moving average filter as a type of low-pass filter, in a way analogous to that described in [18]. Specifically, this filter provides a weighted average of past data points in the time series production data within a time span of five points to generate a smoothed estimate of the time series. This step is impera-
Step 2: Transform raw data to stationary data
previous time step (t-1) is subtracted from the current observation (t) [21].
Step 3: Transform data into supervised learning
We use a one-step-ahead forecast, where the next time step (t+1) is predicted. We divide the time series into input (x) and output (y) using the lag time method; specifically, in this study we have used lag sizes from lag 1 to lag 6.
Step 4: Transform data into the problem scale
Like other neural networks, DLSTM expects data to be within the scale of the activation function used by the network. The default activation function for LSTM is the hyperbolic tangent (tanh), whose output values lie between -1 and 1. This is the preferred range for the time series data. Later on, we transformed the scaled data back in order to return the fore-
4.1.3. Training of DLSTM
In the training phase of the DLSTM experiments, we use the Genetic Algorithm (GA) to infer the optimal selection of the proposed model's hyper-parameters. We implemented the GA using the Distributed Evolutionary Algorithms in Python (DEAP) library [34]. The number of hyper-parameters depends on the implementation scenario. For the static scenario, there are three hyper-parameters, namely, the number of epochs, the number of hidden neurons, and the lag size. For the dynamic scenario, there are four hyper-parameters: the same three hyper-parameters of the static scenario, plus the number of updates, which is the number of times we update our forecasting model at each time step when new observations from the testing data are inserted. This methodology is also adopted in the experiments of the other neural networks in the reference models.
4.2. Reference Models
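The preprocessing steps of section 4.1.1 (smoothing, differencing, lag-based supervised framing, and scaling to the tanh range) can be sketched as below. This is a minimal NumPy sketch and not the authors' code: all function names are hypothetical, the five-point filter is assumed to use equal weights over trailing points, and Step 4 is assumed to be min-max scaling to [-1, 1].

```python
import numpy as np

def moving_average(series, span=5):
    """Step 1: trailing moving average over `span` points (equal weights assumed)."""
    series = np.asarray(series, dtype=float)
    out = np.empty_like(series)
    for t in range(len(series)):
        start = max(0, t - span + 1)   # shorter window at the start of the series
        out[t] = series[start:t + 1].mean()
    return out

def difference(series):
    """Step 2: subtract the previous observation (t-1) from the current one (t)."""
    return np.diff(np.asarray(series, dtype=float))

def to_supervised(series, lag):
    """Step 3: inputs x hold `lag` past values, output y is the next value (t+1)."""
    series = np.asarray(series, dtype=float)
    x = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return x, y

def scale_to_tanh_range(series):
    """Step 4: min-max scaling to [-1, 1] (assumed); also returns the bounds."""
    series = np.asarray(series, dtype=float)
    lo, hi = float(series.min()), float(series.max())
    scaled = 2.0 * (series - lo) / (hi - lo) - 1.0
    return scaled, (lo, hi)

def inverse_scale(scaled, bounds):
    """Map scaled values back to the original scale of the differenced data."""
    lo, hi = bounds
    return (np.asarray(scaled, dtype=float) + 1.0) / 2.0 * (hi - lo) + lo
```

A typical chain would be `moving_average`, then `difference`, then `scale_to_tanh_range`, then `to_supervised` with a lag between 1 and 6. In practice the scaling bounds would normally be taken from the training portion only, and forecasts would be passed through `inverse_scale` and then un-differenced to return to production units.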
of these parameters [36]. The AIC measures how well a model fits the data in consideration of the overall model complexity.
2. The Vanilla RNN model
The comparison with the vanilla RNN model represents a machine learning-based comparison. The original RNN is already covered briefly in section 3.1. For comparison purposes, we will implement two RNN reference models, one with a single hidden layer and the other with multiple hidden layers [13, 14].
3. The DGRU model
The comparison with the Deep Gated Recurrent Unit (DGRU) model represents a deep learning-based comparison, where DGRU is a counterpart of DLSTM. The GRU model is similar to the original LSTM model, with the exception that GRU includes only two gates rather than three [37]. The experiments of DGRU are similar to those of DLSTM.
for linear Arps decline (NEA) model represents a hybrid-based comparison. NEA is a hybrid method
forward multilayer neural network model that employs what is called higher-order synaptic operations (HOSO). The HOSO of HONN embraces the linear correlation (conventional synaptic operation) as well as the higher-order correlation of neural inputs with synaptic weights [18]. For comparison purposes, we will rely on the results reported by the authors of [18], using the second case study exclusively, since they did not apply their method to the first case study described in this paper.
4.3. Forecasting accuracy measures
In the literature, two kinds of errors are usually measured in order to estimate the forecasting precision and evaluate the performance of the forecasts, namely, scale-dependent errors and percentage errors.
(1) Scale-dependent errors
These errors are on the same scale as the data itself. Therefore, as a limitation, the accuracy measures that are based directly on this error cannot be used
mean square error (RMSE) [38], which can be given as follows:
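For reference, the error measures used in this paper can be computed with the short sketch below. It assumes the standard definition of RMSE and follows the forms of Eq. (8) for RMSPE and Eq. (9) for MAPE. The function names are hypothetical, and the observed values are assumed to be nonzero so that the percentage errors are defined.

```python
import numpy as np

def rmse(y_obs, y_pred):
    """Root mean square error (scale-dependent); standard definition assumed."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_obs) ** 2)))

def rmspe(y_obs, y_pred):
    """Root mean square percentage error, in percent, as in Eq. (8)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(((y_pred - y_obs) / y_obs) ** 2)) * 100.0)

def mape(y_obs, y_pred):
    """Mean absolute percentage error, in percent, as in Eq. (9)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_obs) / y_obs)) * 100.0)
```

Because RMSE keeps the units of the data, it is only comparable within one dataset, whereas RMSPE and MAPE are relative and can be compared across the two case studies.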
scaled datasets. The most commonly used measure is the root mean square percentage error (RMSPE) [38], which can be given as follows:
$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i^{pred} - y_i^{obs}}{y_i^{obs}}\right)^{2}} \times 100 \qquad (8)$$
It is clear that both measures are calculated by comparing the target values of the time series with the corresponding time series predictions. The results obtained using the two metrics differ in their calculated values, but the significance of each metric is similar in measuring the performance of the prediction models. Notably, since the production data
indicate the performance of the corresponding model in the testing data rather than training data. This has
China [28]. The dataset of this oilfield contains 227 observations of the oil production data, in which the first 182 observations (80% of the dataset) have been used to build, or train, the forecasting models, and the remaining 45 observations (20% of the dataset) have been used for testing the performance of the forecasting models.
The best performance results of the proposed DLSTM static scenario, DLSTM dynamic scenario, single-RNN, Multi-RNN, and DGRU are shown separately in Tables 1, 2, 3, 4, and 5, respectively. Each of these five tables shows the values of each hyper-parameter, which has been optimally selected using the GA as described in section 4.1.3. The relation
Basin oil field in India
As in the previous case study, we examined the pro-
No. of layers   No. of hidden units   No. of epochs   Lag   Update   RMSE    RMSPE
1               [3]                   1352            3     1        0.267   3.783
2               [4,5]                 1187            5     1        0.219   3.124
3               [4,3,3]               403             5     2        0.257   3.637

No. of layers   No. of hidden units   No. of epochs   Lag   RMSE   RMSPE
Table 4: Best results of Multi-RNN
history. The authors in [28] and [18] considered only the cumulative oil production data from five of these eight wells. This implies the availability of five input series, corresponding to the monthly production of the five oil wells, plus an output series corresponding to the cumulative production of this oilfield. The relationship between the five input series and the output series has been reported to be highly nonlinear [18].
Accordingly, and toward a fair evaluation, in the experiments of this case study we also consider the same cumulative data of the same five wells. We will follow the same experimental scenario described in [28] and [18] by dividing the production dataset into two sets, i.e. a first set (70% of the dataset) used to build the forecasting models, and a second set (30% of the dataset) used for testing the performance of the forecasting models. The results of each model shown in this section are based on the testing data.
Fig. 4. Production data vs. prediction using DLSTM-static (training data and testing data)
Fig. 5. Production data vs. prediction using DLSTM-dynamic (training data and testing data)
The best performance results of the proposed DLSTM static scenario, DLSTM dynamic scenario, single-RNN, Multi-RNN, and DGRU are shown separately in Tables 7, 8, 9, 10, and 11, respectively. Each of these five tables shows the values of each hyper-parameter, which has been optimally selected using the GA as described in section 4.1.3. The relation between the original production data and their
among these five models along with the best parameter combinations of the ARIMA method and the best performance results of NEA reported in [28] using the same dataset. The NEA results shown in Table 12 are reproduced as given by the authors of [28], who did not consider the RMSE measure.
This case study provides an extra comparison, where we compare the proposed DLSTM model with the HONN model [18] described in section 4.2. In their paper, the authors used three measures to evaluate their model, namely MSE, RMSE, and MAPE. In the current paper, we have used the RMSE (the root of the MSE) as described in section 4.3. Subsequently, in this comparison we also calculate the MAPE measure for our model in order to compare with the MAPE results of HONN shown in [18]. The MAPE, as a percentage error measure, can be computed as follows:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i^{pred} - y_i^{obs}\right|}{y_i^{obs}} \times 100 \qquad (9)$$
Table 13 shows the comparison between the HONN model and the proposed DLSTM model based on the
different lags in their experiments, and the best result, as highlighted by them, was obtained using lag 1 [18] (see Table 3 in [18]), which is included in Table 13.
6. Results Analysis and Discussion
In this paper, we tried to ensure a genuine evaluation for the proposed model against five different
No. of layers   No. of hidden units   No. of epochs   Lag   Update   RMSE    RMSPE
1               [5]                   1259            6     4        0.029   4.219
2               [2,5]                 1500            6     3        0.028   4.060
3               [4,4,5]               1400            6     4        0.032   4.482

Table 9: Best results of Single-RNN
No. of hidden units   No. of epochs   Lag   RMSE    RMSPE
[1]                   1551            4     0.029   4.095
[2]                   1115            1     0.029   4.133
[1]                   953             2     0.030   4.174

No. of layers   No. of hidden units   No. of epochs   Lag   RMSE    RMSPE
2               [5,1]                 1514            5     0.027   3.731
2               [2,4]                 1551            5     0.028   4.125
2               [2,2]                 787             3     0.030   4.196
3               [1,1,3]               953             4     0.029   4.112
Fig. 6. Production data vs. prediction using DLSTM-static
Fig. 7. Production data vs. prediction using DLSTM-dynamic
types of comparison with state-of-the-art techniques using two real-world datasets. More than one standard optimality criterion is used to assess the performance of each model. It is widely demonstrated in the literature that percentage error measures are the most appropriate tool to assess the performance of different forecasting models. The percentage error is also capable of estimating the relative error between different models, particularly when the samples of the time series data have different scales [39]. Accordingly, in this section we discuss and analyze the results shown in the previous section, focusing on the percentage error measure of each model.
6.1. Case 1 versus Case 2
are the global minimum values amongst the other reference models. In Tables 7, 8, 10, 11, 12, and 13 of the other case study, we notice the same pattern for all models, again with a superiority of the DLSTM over the other models.
Although the DLSTM is the optimum among its counterparts, it shows a slight variation in the hyper-parameter values, particularly the parameter "number of layers". In our opinion, this variation in the best hyper-parameter values between the two case studies may be attributed to the larger number of data samples in case study 1 than in case study 2. In other words, DLSTM does not require a large number of layers when the dataset size is not large. Of course, as the number of data samples is going to be bigger, essentially the performance of DLSTM going
model shows more efficiency than the ARIMA model in predicting future oil production and in describing the typical tendency of the oil production, as shown in Figs. 4, 5, 6, and 7. In contrast, the values predicted by ARIMA are quite far from the oil production points, where the difference between both contenders approaches 2 points in the first case. The poor performance of ARIMA can be attributed to its linear nature, whereas the relationship between inputs and outputs in such production data is not linear. As a nonlinear model, DLSTM could describe smoothly the nonlinear relationship between inputs and outputs.
6.3. DLSTM versus Other Recurrent NNs
against 3.7 for Multi-RNN and 4.0 for DGRU. Approximately the same rates are achieved in case study 2 and Table 12. The error differences among the three contenders are not large, since all of them have a typical deep architecture, but the proposed DLSTM model still shows better performance than the others. Of course, as the size of the data grows, the performance of DLSTM is expected to be much better than RNN but may be similar to DGRU.
outperforms the NEA model with a difference approaching one point in case study 1. Namely, DLSTM achieved 2.9 against 4.2 achieved by NEA, whereas in case study 2, the DLSTM achieved 3.4 against 4.2 achieved by NEA. This indicates that the DLSTM model is more accurate than the NEA model in predicting future oil production.
Superiority in performance is not the only advantage of DLSTM over NEA; the NEA performance is also shown to be highly dependent on the selection of several parameters, as explained by the authors of [28]. Among these parameters, the most important ones, which may affect the NEA performance, include: (i) the regularized parameter (γ),
diction of oil production, several experiments should be conducted in order to find improved and suitable combinations of these parameters.
Furthermore, the effect of these parameters in the training phase is totally reversed in the testing phase. For example, the training errors grow with larger (σ), whereas the testing errors decrease. The converse holds for the (γ) parameter, where training errors decrease with larger (γ) but the testing errors remain monotonic. If the
that this model is similar to the traditional multilayer feed-forward neural network. The difference here is that HONN employs what is called Higher-Order Synaptic Operations (HOSO). The HOSO of HONN embraces the linear correlation (conventional synaptic operation) as well as the higher-order correlation of neural inputs with synaptic weights. In the paper of [18], different HOSO have been applied up to third order, where the first-order, the second-order, and the third-order synaptic operations are called Linear Synaptic Operation (LSO), Quadratic Synaptic Operation (QSO), and Cubic Synaptic Operation (CSO), respectively [18]. The authors stated that the best HOSO operation is the third one (CSO).
It seems that the computation of HONN is complex, since the calculation of the activation function of the model is a combination of the conventional linear synaptic function plus the cubic synaptic operation. In addition, most of the parameters, such as the time lag and the number of neurons in the hidden layer, are adjusted manually or based on trial and error. This means that the parameter selection should be adjusted carefully to ensure accurate oil production forecasting.
Nevertheless, in Table 13 DLSTM continues to show better performance than HONN via the three error measures, particularly for the percentage error measure. Namely, through the MAPE measure the DLSTM achieved 2.8 against 3.4 for HONN. In our perspective, the optimality of DLSTM's performance can be attributed to the recursive nature of DLSTM, against the feedforward nature of HONN. Indeed, the recursive property ensures more accurate prediction, particularly when the dataset size is large.
7. Conclusion
In this paper, we developed a promising prediction model that can be used in the majority of time series forecasting problems. However, in this paper, it is tested specifically in the case of petroleum time series applications. The proposed model is a deep architecture of the Long-Short Term Memory (LSTM) recurrent network, which we denote as DLSTM. The paper gives empirical evidence that stacking more LSTM layers overcomes the limitations of shallow neural network architectures, particularly when long-interval time series datasets are used. In addition, the proposed deep model can describe the nonlinear relationship between the system inputs and outputs, particularly given that petroleum time series data are heterogeneous and full of complexity and missing parts.
Notably, in the two case studies described in this paper, the proposed model outperformed its counterparts, deep RNN and deep GRU. In addition, the performance of the proposed DLSTM is observed to be much better than the statistical ARIMA model. The most important comparisons are those conducted with two recently reported machine learning approaches, denoted as NEA and HONN, where DLSTM outperformed both of them with a noticeable difference on the scale of two different percentage error measures.
The accurate prediction and learning performance shown in the paper indicate that the proposed deep LSTM model, and other deep neural network models, are eligible to be applied to nonlinear forecasting problems in the petroleum industry. In our future research plans, we will investigate the performance of DLSTM in other forecasting problems, especially when the problem includes multivariable (multivariate) time series data.
Acknowledgements
The authors of this paper would like to express their thanks and gratitude to the "Deanship of Scientific Research" at King Faisal University, Saudi
Arabia for their moral and financial support to this work under the research grant number (170069).
References
[1] De Gooijer, J.G., Hyndman, R.J., 25 years of time series forecasting, Int. J. Forecast. 22(3) (2006) 443-473.
[2] Poskitt, D.S., Tremayne, A.R., The selection and use of linear and bilinear time series models, Int. J. Forecast. 2(1) (1986) 101-114.
[3] Tong, H., Non-linear Time Series: A Dynamical System Approach, Oxford University Press, (1990).
[4] Engle, R.F., Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50(4) (1982) 987-1007.
[5] Guoqiang Zhang, B. Eddy Patuwo, Michael Y. Hu, Forecasting with artificial neural networks: The state of the art, Int. J. Forecast. 14 (1998) 35-62.
[6] M. Hüsken, P. Stagge, Recurrent neural networks for time series classification, Neurocomputing 50 (2003) 223-235.
[7] Bayer, Justin Simon, Learning Sequence Representations, Diss., Technische Universität München, München, 2015.
unsupervised feature learning and deep learning for time-series modeling, Pattern Recognit. Lett. 42 (2014) 11-24.
[12] Michiel Hermans, Benjamin Schrauwen, Training and ana-
research, Int. J. Inf. Technol. Decis. Making 5 (2006) 597-604.
[16] Rajesh Mehrotra, Regilal Gopalan, Factors influencing strategic decision-making process for the oil/gas industries of UAE - A study, Int. J. Mark. Financial Management 5 (2017) 62-69.
[17] R. B. C. Gharbi and G. A. Mansoori, An introduction to artificial intelligence applications in petroleum exploration and production, J. Pet. Sci. Eng. 49 (2005) 93-96.
[18] N. Chithra Chakra, Ki-Young Song, Madan M. Gupta, Deoki N. Saraf, An innovative neural forecast of cumulative oil production from a petroleum reservoir employing higher-order neural networks (HONNs), J. Pet. Sci. Eng. 106 (2013) 18-33.
[19] R. Nyboe, Fault detection and other time series opportunities in the petroleum industry, Neurocomputing 73(10-12) (2010) 1987-1992.
[20] Luis Martí, Nayat Sanchez-Pi, José Molina, and Ana Garcia, Anomaly Detection Based on Sensor Data in Petroleum Industry Applications, Sensors 15 (2015) 2774-2797.
[21] Jonathan D. Cryer, Kung-Sik Chan, Time Series Analysis, 2nd edition, Springer Texts in Statistics, Springer, New York, 2008.
Alireza Bahadori, Decline curve based models for predicting natural gas well performance, Pet. 3 (2017) 242-248.
[25] Igor Aizenberg, Leonid Sheremetov, Luis Villa-Vargas,
Neural Network, Advances in Computer Science and Information Engineering. Advances in Intelligent and Soft Computing, 168 (2012).
[28] Xin Ma, Predicting the oil production using the novel multivariate nonlinear model based on Arps decline model and kernel method, Neural Comput. Applic. 29 (2016) 1-13.
[29] S. Ben Taieb, G. Bontempi, A. F. Atiya, A. Sorjamaa, A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition, Expert Syst. Applic. 39 (2012) 7067-7083.
[30] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Deep learning, Nature 521 (2015) 436-444.
[31] Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber, LSTM: A Search Space Odyssey, IEEE Trans. Neural Netw. Learn. Syst. 28 (2017) 2222-2232.
[32] Michiel Hermans, Benjamin Schrauwen, Training and Analysing Deep Recurrent Neural Networks, in: Proceedings of the 26th International Conference on Neural Information Processing Systems NIPS 1, Dec 2013, pp. 190-198.
[33] Stephan Spiegel, Julia Gaebler, Andreas Lommatzsch
tian Gagné, DEAP: Evolutionary Algorithms Made Easy, J. Mach. Learn. Res. 13 (2012) 2171-2175.
[39] R. J. Hyndman, Measuring forecast accuracy, in: M. Gilliland, L. Tashman, U. Sglavo, Business Forecasting: Practical Problems and Solutions, John Wiley & Sons, 2016, pp. 177-183.
Dr. Alaa Sagheer received his B.Sc. and M.Sc. in Mathematics from Aswan University, Egypt. He received his Ph.D. in Computer Engineering in the area of Intelligent Systems from the Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan in 2007. After receiving his Ph.D., he served as an Assistant Professor at Aswan University. In 2010, Dr. Sagheer established and directed the Center for Artificial Intelligence and Robotics (CAIRO) at Aswan University. He also served as the Principal Investigator in CAIRO in several research and academic projects funded by different Egyptian governmental organizations. In 2013, Dr. Sagheer and his team won the first prize, in a programming competition organized by the Ministry of Communication and Information Technology (MCIT) Egypt, for their system entitled Mute and Hearing Impaired Education via an Intelligent Lip Reading System. In 2014, he was appointed as an Associate Professor at Aswan University. In the same year, Dr. Sagheer joined the Department
ber of IEEE and IEEE Computational Intelligence society. He is a reviewer for some journals and conferences related to his research interests.