The Use of NARX Neural Networks To Predict Chaotic Time Series
Eugen Diaconescu
University of Pitesti
Abstract:
The prediction of chaotic time series with neural networks is a long-standing practical problem in the study of dynamic systems. This paper does not propose a new model or a new methodology; rather, it studies carefully and thoroughly several aspects of a model for which few experimental data have been reported, and derives conclusions of practical interest. Recurrent neural network (RNN) models are important not only for the forecasting of time series but also, more generally, for the control of dynamical systems. An RNN with a sufficiently large number of neurons is a nonlinear autoregressive and moving average (NARMA) model, with "moving average" referring to the inputs. Prediction can be regarded as the identification of a dynamic process. This paper analyzes an RNN architecture with embedded memory, the "Nonlinear AutoRegressive model with eXogenous input" (NARX), which shows promising qualities for dynamic system applications. The performance of the NARX model is verified for several types of chaotic or fractal time series applied as input to the neural network, in relation to the number of neurons, the training algorithm and the dimensions of its embedded memory. In addition, this work attempts to use classical statistical methodologies (R/S rescaled range analysis and the Hurst exponent) to obtain new methods for improving the efficiency of chaotic time series prediction with NARX.
Key-Words: - Chaotic Time Series, Hurst Exponent, Prediction, Recurrent Neural Networks, NARX Model
… unpredictable, and the prediction of chaotic time series is a difficult task.
From a historical point of view, before the 1980s the prediction of time series used linear parametric autoregressive (AR), moving-average (MA) or autoregressive moving-average (ARMA) models introduced by Box and Jenkins [11][13]. An obvious drawback is that these algorithms are linear, and so are not able to cope with certain non-stationary signals, nor with signals whose underlying model is nonlinear or chaotic. On the other hand, neural networks (NN) are powerful when applied to problems whose solutions require knowledge that is difficult to specify, but for which there is an abundance of examples.
The prediction of chaotic processes implies finding the interdependences between time series components. These dependences are minimal in a random time series and maximal in a completely deterministic process. Random and deterministic are, however, only the margins of the large set of chaotic time series: signals with weak dependences between components over the short or long term. A special case is represented by fractal time series, characterized by self-similarity or non-periodic cycles.
1.1 Prediction with neural networks
After 1980 there was a resurgence in the field of time series prediction, when it became clear that this type of prediction is a suitable application for a neural network predictor.
The NN approach to time series prediction is non-parametric, in the sense that it is not necessary to know any information regarding the process that generates the signal. It has been shown that a recurrent NN (RNN) with a sufficiently large number of neurons is a realization of the nonlinear ARMA (NARMA) process [1][12][5].
NN-based prediction has been explored from the very beginning of neural network development, because of the approximation and generalization properties of NNs. Many research papers have been published in the scientific literature, and some commercial companies claim or market so-called advanced statistical programs that use neural networks for modeling and prediction.
However, some difficulties and limitations remain of current relevance and motivate research on new NN models and learning techniques [4][5][7][8]:
- Outliers make it difficult for NNs (and other prediction models) to model the true underlying functional relationship. Although NNs have been shown to be universal approximators, it has been found that they have difficulty modeling seasonal patterns in time series; when a time series contains significant seasonality, the data need to be deseasonalized.
- The number of samples in the time series: researchers have found that increasing the observation frequency does not always help to improve the accuracy of prediction.
- Stationarity: the classical techniques for time series prediction require a stationary time series, while most real time series are not stationary (stationarity refers to a stochastic process whose mean, variances and covariances, i.e. first- and second-order moments, do not change in time). Since the introduction of NNs, the original time series can be used directly as forecasting targets; a minimal differencing sketch is given after this list.
- The problem of long time dependencies, which is related to the problem of vanishing gradients or forgetting behavior.
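Where deseasonalization or differencing is required before modeling, a minimal sketch is given below; the use of NumPy, the seasonal period s and the synthetic illustrative series are assumptions for the example, not settings taken from this paper:

```python
import numpy as np

def deseasonalize(y, s=12):
    """Remove seasonality by seasonal differencing: d(k) = y(k) - y(k - s).

    y : 1-D array containing the raw series
    s : seasonal period (e.g. 12 for monthly data with a yearly cycle)
    """
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]

def difference(y):
    """First-order differencing, a common way to reduce non-stationarity in the mean."""
    y = np.asarray(y, dtype=float)
    return np.diff(y)

# Example: a trend plus a yearly cycle plus noise becomes roughly stationary
k = np.arange(240)
y = 0.05 * k + np.sin(2 * np.pi * k / 12) + 0.1 * np.random.randn(240)
stationary = difference(deseasonalize(y, s=12))
```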
Time series prediction is the same as system identification; this paper shows that the dynamics of the nonlinear system that produces a complex time series can be captured in a model system. The model system is an artificial RNN. The main idea of RNNs is to provide weighted feedback connections between layers of neurons, thereby adding a notion of time to the entire artificial NN. Therefore, RNNs are the most suitable networks for time series analysis.

1.2 Chaotic time series

1.2.1 A short characterization of some examples of chaotic time series
To evaluate the prediction capability of the proposed algorithms, the best data are chaotic time series [14], generated by nonlinear dynamical systems. The degree of irregularity differs from one type of series to another, depending on the sort of iterated difference equation, chaotic map or flow. In this paper, four time series were tested: logistic, Mackey-Glass, fractal Weierstrass and the BET index.
The logistic time series (4) is generated by a chaotic map with an extremely short memory length. It is a difficult test for a prediction algorithm. It does not exhibit cycles, as are sometimes seen in practice.
y′(t) = a ⋅ y(t) ⋅ (1 − y(t))        (4)
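For illustration only, a minimal generator treating (4) as the iterated logistic map; the parameter a = 4.0 (a commonly used chaotic regime) and the initial value y0 are assumptions, since this excerpt does not specify them:

```python
import numpy as np

def logistic_series(n, a=4.0, y0=0.2):
    """Iterate the logistic map y(t+1) = a * y(t) * (1 - y(t)) for n steps."""
    y = np.empty(n)
    y[0] = y0
    for t in range(n - 1):
        y[t + 1] = a * y[t] * (1.0 - y[t])
    return y

series = logistic_series(1000)   # chaotic, with very short memory, for a = 4.0
```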
The Mackey-Glass equation (5) is classified as a simpler system generating chaotic flow. This type of chaotic time series is a relatively easy task for prediction algorithms. Non-periodic cycles appear due to the delay included in the equation. Chaotic …
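Equation (5) itself is not reproduced in this excerpt; the sketch below therefore uses the standard Mackey-Glass delay differential equation with commonly quoted parameters (β = 0.2, γ = 0.1, exponent 10, τ = 17) and a simple Euler integration step, all of which are assumptions rather than the paper's settings:

```python
import numpy as np

def mackey_glass(n, tau=17, beta=0.2, gamma=0.1, power=10, dt=1.0, x0=1.2):
    """Generate a Mackey-Glass series from the standard delay equation
    dx/dt = beta * x(t - tau) / (1 + x(t - tau)**power) - gamma * x(t),
    integrated with a simple Euler scheme (assumed parameters)."""
    delay = int(tau / dt)
    x = np.full(n + delay, x0)
    for t in range(delay, n + delay - 1):
        x_tau = x[t - delay]
        x[t + 1] = x[t] + dt * (beta * x_tau / (1.0 + x_tau ** power) - gamma * x[t])
    return x[delay:]

series = mackey_glass(3000)
```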
… from t_i to t_{i+1}, or a decrease will be followed by a decrease.
A Hurst exponent of 0 < H < 0.5 indicates antipersistent behavior: after a period of decreases, a period of increases tends to follow. Antipersistent behavior has a rather high fractal dimension, corresponding to a very "noisy", profile-like curve (one that largely fills the plane). This behavior is sometimes called "mean reversion" [10].
In principle, the fractal dimension D and the Hurst coefficient H are independent of each other, because D is considered a local property while long-memory dependence is a global characteristic [22]. For self-affine processes (e.g. fractals), the local properties are reflected in the global ones, and the relationship D + H = n + 1 can hold, where n is the dimension of the self-affine space [22]. Long-memory dependence (persistence) is linked with the case 0.5 < H < 1 and is a feature of surfaces with low fractal dimensions. Antipersistent processes are linked with surfaces of higher fractal dimension (rougher), with 0 < H < 0.5.
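A simplified rescaled-range (R/S) estimate of the Hurst exponent can be sketched as follows; the window sizes, the non-overlapping segmentation and the log-log fit are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def rescaled_range(x):
    """R/S statistic of one window: range of the cumulative mean-adjusted sum over the std."""
    x = np.asarray(x, dtype=float)
    z = np.cumsum(x - x.mean())
    r = z.max() - z.min()
    s = x.std()
    return r / s if s > 0 else np.nan

def hurst_rs(x, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate the Hurst exponent H as the slope of log(R/S) versus log(window size)."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        if n > len(x):
            continue
        rs_values = [rescaled_range(x[i:i + n]) for i in range(0, len(x) - n + 1, n)]
        rs_values = [v for v in rs_values if np.isfinite(v)]
        if rs_values:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_values)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

# H > 0.5 suggests persistence; H < 0.5 suggests antipersistent ("mean reverting") behavior
h = hurst_rs(np.random.randn(2048))   # close to 0.5 for white noise
```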
2 NARX networks
In this paper, the architectural approach proposed to deal with chaotic time series is based upon the "Nonlinear Autoregressive model with eXogenous input" (NARX model); the resulting networks are therefore called NARX recurrent neural networks [1][4][5]. This is a powerful class of models which has been demonstrated to be well suited for modeling nonlinear systems, and especially time series. One principal application of NARX dynamic neural networks is in control systems. NARX networks are also a class of models computationally equivalent to Turing machines [1]. Some important qualities of NARX networks trained with gradient-descent learning algorithms have been reported: (1) learning is more effective in NARX networks than in other neural networks (gradient descent performs better in NARX), and (2) these networks converge much faster and generalize better than other networks [4][5].
The simulation results show that NARX networks are often much better at discovering long time dependences than conventional recurrent neural networks. An explanation of why output delays can help with long-term dependences can be found by considering how gradients are calculated using the back-propagation-through-time (BPTT) algorithm. Recently, several empirical studies have shown that, when using gradient-descent learning algorithms, it might be difficult to learn simple temporal behavior with long time dependencies [7][9]; in other words, those problems for which the output of a system at time instant k depends on network inputs presented at times r << k. Researchers have analyzed learning algorithms for systems with long time dependencies and showed that, for gradient-based training algorithms, the information about the gradient contribution m steps in the past vanishes for large m. This effect is referred to as the problem of vanishing gradients, and it partially explains why gradient-descent algorithms are not very suitable for estimating systems and signals with long time dependencies. For instance, common recurrent neural networks encounter problems when learning information with long time dependencies, a problem met in the prediction of nonlinear and non-stationary signals. The vanishing gradients problem makes the learning of long-term dependencies in gradient-based training algorithms difficult, if not virtually impossible, in certain cases [1].
A state-space representation of recurrent NARX neural networks can be expressed as [12]:

z_i(k+1) = Φ(u(k), z(k))    for i = 1
z_i(k+1) = z_{i-1}(k)       for i = 2, 3, …, N        (11)

where the output is y(k) = z_1(k) and z_i, i = 1, 2, …, N, are the state variables of the recurrent neural network. The recurrent network exhibits forgetting behavior if

lim_{m→∞} ∂z_i(k) / ∂z_j(k − m) = 0,   ∀ k, m ∈ K, i ∈ O, j ∈ I        (12)

where z is a state variable, I denotes the set of input neurons, O denotes the set of output neurons and K denotes the time index set.
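A minimal sketch of the state-space recursion (11), with Φ realized as a single tanh hidden layer; the weight shapes, the tanh nonlinearity and the random initialization are illustrative assumptions, not the networks used in the experiments:

```python
import numpy as np

def narx_step(u_k, z, W_in, W_z, w_out, b):
    """One update of the NARX state-space form (11):
    z_1(k+1) = Phi(u(k), z(k));  z_i(k+1) = z_{i-1}(k) for i = 2..N.
    Phi is realized here as a one-hidden-layer tanh network (illustrative choice)."""
    hidden = np.tanh(W_in @ np.atleast_1d(u_k) + W_z @ z + b)
    z_next = np.empty_like(z)
    z_next[0] = float(w_out @ hidden)   # Phi(u(k), z(k))
    z_next[1:] = z[:-1]                 # shift register holding the delayed values
    return z_next                       # the network output is read as y = z[0]

# tiny usage with random weights (hypothetical sizes: N = 5 states, H = 8 hidden units)
rng = np.random.default_rng(0)
N, H = 5, 8
W_in, W_z = rng.normal(size=(H, 1)), 0.1 * rng.normal(size=(H, N))
w_out, b = rng.normal(size=H), np.zeros(H)
z = np.zeros(N)
for u_k in np.sin(0.1 * np.arange(50)):
    z = narx_step(u_k, z, W_in, W_z, w_out, b)
    y_k = z[0]
```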
Several approaches have been suggested to get around the problem of vanishing gradients in training RNNs. Most of them rest on including embedded memory in the neural network, whereas several others propose improved learning algorithms, such as the extended Kalman filter algorithm, Newton-type algorithms, annealing algorithms, etc.
Embedded memory is particularly significant in recurrent NARX and NARMAX neural networks. This embedded memory can help to speed up the propagation of gradient information, and hence help to reduce the effect of vanishing gradients. There are various methods of introducing memory and temporal information into neural networks. These include creating a spatial representation of the temporal pattern, putting time delays into the neurons or their connections, employing recurrent connections, using neurons with activations that sum inputs over time, etc.
3 Architecture and learning

3.1 The NARX models
The NARX model for the approximation of a function Γ can be implemented in many ways, but the simplest seems to be a feedforward neural network with embedded memory (a first tapped delay line), as shown in figure 2, plus a delayed connexion from the output of the second layer back to the input.
For the NN models used in this work, with two layers (layer 1, called the input layer, and layer 2, the output layer), the general prediction equation for computing the next value of the time series y(k+1) (the output) with the model of figure 2, using the past observations u(k), u(k−1), …, u(k−du) (or, with two simultaneous exogenous inputs, u1 and u2 with delays du1 and du2) and the past outputs y(k), y(k−1), …, y(k−dy) as inputs, may be written in the form:

y(k+1) = Φ( Σ_{h=1}^{N} w_h ⋅ φ( Σ_{i1=0}^{du1} w_{i1h} ⋅ u1(k−i1) + Σ_{i2=0}^{du2} w_{i2h} ⋅ u2(k−i2) + Σ_{j=0}^{dy} w_{jh} ⋅ y(k−j) ) )        (14)

where N is the number of neurons in the input layer and φ, Φ are the activation functions of the two layers.
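The one-step-ahead form (14) can be fitted in the series-parallel manner by building tapped-delay regressors from the measured series and training a small two-layer network on them. The sketch below only illustrates that construction: it uses a single exogenous input, and scikit-learn's MLPRegressor stands in for whatever training tool was actually used in the experiments; du, dy and N follow the notation of the tables.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_narx_regressors(u, y, du, dy):
    """Stack tapped-delay inputs [u(k), ..., u(k-du), y(k), ..., y(k-dy)] -> target y(k+1)."""
    u, y = np.asarray(u, float), np.asarray(y, float)
    start = max(du, dy)
    X, t = [], []
    for k in range(start, len(y) - 1):
        row = np.concatenate([u[k - du:k + 1], y[k - dy:k + 1]])
        X.append(row)
        t.append(y[k + 1])
    return np.array(X), np.array(t)

# hypothetical setup: predict a series from its own past (u = y) with du = dy = 8, N = 10 neurons
y = np.sin(0.05 * np.arange(2000)) + 0.05 * np.random.randn(2000)
X, t = make_narx_regressors(y, y, du=8, dy=8)
net = MLPRegressor(hidden_layer_sizes=(10,), activation='tanh', max_iter=2000, random_state=0)
net.fit(X[:1500], t[:1500])
pred = net.predict(X[1500:])
```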
Table 5

N     du    dy    R
6     30    2     0.8207
7     30    2     0.9299
8     30    2     0.8281
11    45    2     0.8933
11    45    3     0.8690
12    30    2     0.4266
12    50    3     0.4266
15    50    3     0.7904

Fig. 5: Weierstrass fractal function, easy to predict: H = 0.82, a = 1.2, b = 1.4, ω = 10, R = 0.99971. The original and predicted time series are approximately the same, NN(8, 2; 1).
Fig. 8: BET index volume, 2005–03.2008.

Table 6
N     du1   du2   dy    R
10    25    25    3     0.8147
12    6     6     2     0.5607
12    25    25    3     0.8093
14    14    14    2     0.5916
15    25    25    3     0.7145

Table 7
N     du1   du2   dy    R
10    25    25    3     0.2432
12    6     6     2     0.6143
12    25    25    3     0.5320
14    14    14    2     0.0301
15    25    25    3     0.7308

Fig. 9: NARX RNN with simultaneous inputs: BET index value and BET index volume.

Fig. 10: NARX RNN with simultaneous inputs: BET index value and the BET index value lagged by 15 days.

5 Conclusions
In this paper, the prediction performance for different time series was tested using a NARX dynamic recurrent neural network. Comparative experiments with real and artificial chaotic time series from diverse domains have been made.
The first conclusion of this paper is that NARX recurrent neural networks have the potential to capture the dynamics of nonlinear dynamic systems such as those in the examples shown, e.g. the Mackey-Glass system with different delays. This statement is based on the fact that the correlation coefficient R estimated between the original and the generated (1000-point) time series is close to 1 in many cases; the prediction can be considered of real interest or significance if R > 0.98.
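R here is the Pearson correlation between the original and the predicted series; a minimal way to compute it (with toy stand-in data, since the experimental series are not reproduced here) is:

```python
import numpy as np

def correlation_r(original, predicted):
    """Pearson correlation coefficient R between the original and the predicted series."""
    return np.corrcoef(original, predicted)[0, 1]

# toy check with a nearly perfect prediction (hypothetical data)
t = np.linspace(0, 10, 1000)
original = np.sin(t)
predicted = np.sin(t) + 0.01 * np.random.randn(t.size)
print(f"R = {correlation_r(original, predicted):.4f}")   # close to 1; R > 0.98 counts as significant here
```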
The paper has also attempted to use traditional statistical methodologies (R/S rescaled range analysis, from which the Hurst coefficient is derived) to obtain indications for making the prediction of chaotic time series with RNNs more efficient. To a certain extent, the Hurst coefficient may give a clue, otherwise vague, about the existence of long-term memory in the analyzed time series. The prediction may nevertheless fail even when the values of the Hurst coefficient are encouraging according to R/S theory.
The second conclusion is that nonlinear NARX models are not without problems: they have limitations in learning long time dependences due to the "vanishing gradient", they are, like any dynamical system, affected by instability, and they lack a procedure for optimizing the embedded memory.
The last conclusion, and the most important, is that the architecture of the tested RNN model affects the performance of prediction. The most favorable behavior of the NARX model depends upon the dimensions of the embedded memory of input and output and on the number of neurons in the input layer. Determining these architectural elements in an optimal way is a critical and difficult task for the NARX model, and remains an objective for future work.
The following lines contain several directions to explore:
- Avoiding saturation and over-fitting caused by too many neurons in the network. Too many hidden neurons lead to poor prediction. A solution is to include penalizing terms in NARX model selection, such as the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC), a process named regularization (a minimal sketch of these criteria follows this list).
- Data input analysis may show how the length of the sequences and their correlation influence the interval and value of predictability (selection of the length of the lags).
- Finding training algorithms other than back-propagation for the NARX models. Consecutive restarts of the program with the back-propagation training function give different results, indicating that the iteration process ends in local optima.
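As a sketch of the penalizing terms mentioned in the first direction above, AIC and BIC can be computed from the residual sum of squares of a fitted network and its parameter count; the Gaussian-residual form used below is a standard textbook version, not a formula taken from this paper:

```python
import numpy as np

def information_criteria(residuals, n_params):
    """AIC and BIC under a Gaussian residual assumption:
    AIC = n*ln(SSE/n) + 2k,  BIC = n*ln(SSE/n) + k*ln(n)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    sse = float(np.sum(residuals ** 2))
    aic = n * np.log(sse / n) + 2 * n_params
    bic = n * np.log(sse / n) + n_params * np.log(n)
    return aic, bic

# when comparing NARX architectures, the candidate with the lowest AIC/BIC is preferred;
# residuals come from the fitted network, n_params from its weight and bias count
aic, bic = information_criteria(residuals=np.random.randn(500) * 0.1, n_params=150)
```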
References:
[1] S. Haykin, Neural Networks, Second Edition, Pearson Education, 1999.
[2] G. Dorffner, Neural Networks for Time Series Processing, Neural Network World, Vol. 6, No. 4, 1996, pp. 447-468.
[3] J. Faraway, C. Chatfield, Time Series Forecasting with Neural Networks: A Comparative Study Using the Airline Data, Applied Statistics, Vol. 47, No. 2, 1998, pp. 231-250.
[4] T. Lin, B. G. Horne, P. Tino, C. L. Giles, Learning Long-Term Dependencies in NARX Recurrent Neural Networks, IEEE Transactions on Neural Networks, Vol. 7, No. 6, 1996, pp. 1329-1351.
[5] Y. Gao, M. J. Er, NARMAX Time Series Model Prediction: Feedforward and Recurrent Fuzzy Neural Network Approaches, Fuzzy Sets and Systems, Vol. 150, No. 2, 2005, pp. 331-350.
[6] M. T. Hagan, O. D. Jesus, R. Schultz, Training Recurrent Networks for Filtering and Control, in L. R. Medsker, L. C. Jain (eds.), Recurrent Neural Networks – Design and Applications, CRC Press, 2001.
[7] T. Lin, C. L. Giles, B. G. Horne, S. Y. Kung, A Delay Damage Model Selection Algorithm for NARX Neural Networks, IEEE Transactions on Signal Processing, Special Issue on Neural Networks, Vol. 45, No. 11, 1997, pp. 2719-2730.
[8] H. T. Siegelmann, B. G. Horne, C. L. Giles, Computational Capabilities of Recurrent NARX Neural Networks, IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 27, No. 2, 1997, pp. 208-215.
[9] J. Yao, C. L. Tan, A Case Study on Using Neural Networks to Perform Technical Forecasting of Forex, Neurocomputing, Vol. 34, 2000, pp. 79-98.
[10] E. E. Peters, Fractal Market Analysis, John Wiley & Sons, 2001.
[11] D. S. G. Pollock, A Handbook of Time-Series Analysis, Signal Processing and Dynamics, Academic Press, 1999.
[12] D. P. Mandic, J. A. Chambers, Recurrent Neural Networks for Prediction, John Wiley & Sons, 2001.
[13] M. Tertisco, P. Stoica, T. Petrescu, Modeling and Forecasting of Time Series, Publishing House of the Romanian Academy, 1985.
[14] G. P. Williams, Chaos Theory Tamed, Joseph Henry Press, 1999.
[15] H. B. Demuth, M. Beale, Users' Guide for the Neural Network Toolbox for Matlab, The MathWorks, Natick, MA, 1998.
[16] L. F. Mingo, J. Castellanos, G. Lopez, F. Arroyo, Time Series Analysis with Neural Network, WSEAS Transactions on Business and Economics, Issue 4, Volume 1, October 2004, ISSN 1109-9526, pp. 303-310.
[17] A. C. Tsakoumis, P. Fessas, V. M. Mladenov, N. E. Mastorakis, Application of Neural Networks for Short Term Electric Load Prediction, WSEAS Transactions on Systems, Issue 3, Volume 2, July 2003, ISSN 1109-2777, pp. 513-516.
[18] A. C. Tsakoumis, P. Fessas, V. M. Mladenov, N. E. Mastorakis, Application of Chaotic Time Series for Short-Time Load Prediction, WSEAS Transactions on Systems, Issue 3, Volume 2, July 2003, ISSN 1109-2777, pp. 517-523.
[19] T. D. Popescu, New Method for Time Series Forecasting, WSEAS Transactions on Circuits and Systems, Issue 3, Volume 2, July 2003, ISSN 1109-2734, pp. 582-587.
[20] O. Valenzuela, I. Rojas, I. Marquez, M. Pasados, A Novel Approach to ARMA Time Series Model Identification by Neural Network, WSEAS Transactions on Circuits and Systems, Issue 2, Volume 3, April 2004, ISSN 1109-2737, pp. 342-347.
[21] W. Huang, S. Wang, L. Yu, Y. Bao, L. Wang, A New Computational Method of Input Selection for Stock Market Forecasting with Neural Networks, International Conference on Computational Science, 2006, pp. 308-315.
[22] T. Gneiting, M. Schlather, Stochastic Models That Separate Fractal Dimension and Hurst Effect, SIAM Review, Vol. 46, 2004, pp. 269-282.
[23] G. P. Zhang, Neural Networks in Business Forecasting, Idea Group Inc., 2003.