Extending Machine Learning Prediction Capabilities by Explainable AI in Financial Time
Series Prediction
Taha Buğra ÇELİK (a), Özgür İCAN (b), Elif BULUT (c)
(a) Research Assistant, Faculty of Economics and Administrative Sciences, Department of Business
Administration, Ondokuz Mayıs University, Samsun, Turkey. E-Mail: tahabugra.celik@omu.edu.tr
(b) Assistant Professor, Faculty of Economics and Administrative Sciences, Department of International
Trade and Logistics, Ondokuz Mayıs University, Samsun, Turkey. E-Mail: ozgur.ican@omu.edu.tr
(c) Assistant Professor, Faculty of Economics and Administrative Sciences, Department of Business
Administration, Ondokuz Mayıs University, Samsun, Turkey. E-Mail: elif@omu.edu.tr
Corresponding author: Taha Buğra ÇELİK. Phone number: (+90) 543 245 62 91.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Data Availability
The datasets used to support the findings of this study are available from the corresponding author
upon request.
Abstract
Higher prediction accuracy is vital in stock market prediction. Recently, a considerable number of
machine learning techniques that successfully predict stock market price direction have been proposed.
No matter how successful a proposed prediction model is, two major drawbacks arguably stand in the
way of further increases in prediction accuracy. First, because machine learning methods are black
boxes by nature, the source of inference behind their predictions cannot be explained. Second, due to
the complex characteristics of the predicted time series, no matter how sophisticated the techniques
employed, it is very difficult to achieve a marginal increase in accuracy that meaningfully offsets the
additional computational burden. For these two reasons, instead of chasing incremental accuracy
gains, we propose an eXplainable Artificial Intelligence (XAI) approach for assessing the reliability of
predictions, which allows the decision maker to abstain from the poor decisions that are responsible
for the decrease in overall prediction performance. Given a measure of how sure the prediction model
is of each prediction, predictions with relatively higher reliability can be used to make a decision,
while lower-quality decisions can be avoided. In this study, a novel two-stage stacking ensemble model
for stock market direction prediction based on machine learning (ML), empirical mode decomposition
(EMD) and XAI is proposed. Our experiments show that the proposed prediction model, supported by
local interpretable model-agnostic explanations (LIME), achieved its highest accuracy of 0.9913 with
trusted predictions on the KOSPI dataset.
Keywords: Stock market prediction, machine learning, deep learning, empirical mode decomposition,
explainable machine learning, local interpretable model-agnostic explanations.
1. Introduction
Financial markets, in particular stock markets, allow investors and traders (practitioners who aim to
earn returns from short-term price movements) to earn capital gains if they make the right
decisions. However, price movements in stock markets are highly nonlinear, and it is difficult
to make the right decisions consistently. As a result of the rapid developments in machine learning and
deep learning in recent years, great progress has been made in the field of stock market prediction.
Since many new methods and techniques are applied simultaneously in the stock market prediction
literature, it is very difficult to classify these studies in terms of applied methods and
Electronic copy available at: https://ssrn.com/abstract=4170455

techniques. Therefore, rather than imposing a definite classification, the relevant research can be
grouped according to the methods or techniques that each study proposes to directly increase
prediction success. Studies that focus on determining or optimizing the feature space used by the
model in order to increase prediction success are common in the literature. Zhou et al. [46] employ
multiple data sources including historical transaction data, technical indicators, stock posts, news and
the Baidu index to predict stock market direction with a support vector machine (SVM). In
feature-oriented studies, besides single machine learning or deep learning methods, novel prediction
models hybridized with other methods are frequently used. These studies focus on improving the
features used by the model in order to increase its prediction success. Persistent homology, a method
used in topological data analysis, is employed by Ismail et al. [14] to obtain new and useful inputs for
a stock market prediction model. Yun et al. [43] propose a hybrid GA-XGBoost prediction model and
utilize genetic algorithms (GA) for feature selection. Efforts to increase the predictive success of the
model by focusing on the inputs are generally called feature engineering [33]. In order to learn
the latent feature representation from stock prices, Zhang et al. [44] proposed a deep belief network
(DBN). Thakkar and Chaudhari [34] utilize term frequency–inverse document frequency to derive a
feature weight matrix from historical stock market data, together with a backpropagation neural
network (BPNN). Hao and Gao [10] propose multiple time scale feature learning to predict the price
trend of the stock market index. In order to learn complementary features from different sources of
historical price and text data, Liu et al. [20] proposed a recurrent convolutional neural kernel. Yang et
al. [39] combine a convolutional neural network (CNN) for feature extraction with a long short-term
memory (LSTM) network for prediction.
Another feature-focused approach is dimensionality reduction, for example with variational
autoencoders or principal components, which improves the computational efficiency of prediction
models by reducing the complexity of the feature set [9,8]. In addition, the use of technical analysis
indicators as inputs to the prediction model is quite common in the literature. Yang et al. [40] combine
technical analysis with group penalized logistic regressions to predict up and down trends of stock
prices. Patel et al. [28] represent ten technical indicators as trend deterministic data. Nabipour et al.
[23] use ten technical indicators as continuous data, then convert these indicators to binary data, and
compare the results. Lee et al. [18] make predictions with an LSTM fed by technical analysis indicators.
Besides dimensionality reduction, decomposing a complex time series into more manageable
sub-components simplifies the work of the prediction model. One of the most commonly used
decomposition approaches is empirical mode decomposition (EMD) [13]. EMD has been applied by
Xu and Tan [38] to decompose the stock price, with the sub-components predicted by a temporal
attention LSTM. Zhou et al. [45] introduce an EMD and factorization machine based neural network to
predict the stock market trend. Jin et al. [17] proposed sentiment analysis combined with LSTM and
EMD. A more advanced decomposition method developed as an alternative to EMD is variational mode
decomposition (VMD) [7]. The use of VMD along with deep learning and machine learning models
for stock market prediction has been reported to give successful results [24,2,42]. Other decomposition
methods, such as singular spectrum analysis, empirical wavelet transform and ensemble EMD, have
also been employed along with prediction models [37,19,41].
In the studies mentioned above, in order to increase the prediction performance of a single prediction
model, additional methods and techniques are hybridized. Utilizing multiple prediction models
emerges as an alternative approach. Ensemble learning is a meta approach that combines multiple
prediction models in order to produce a better composite predictive model. Ensemble methods are
divided into three main categories: bagging, stacking, and boosting. Bagging, or bootstrap
aggregation, involves a diverse group of prediction models which are trained on different training
subsets generated by random sampling. The predictions made by the ensemble members are then
given to a combination scheme (such as voting, averaging or any set of rules) to produce a final
prediction value [6]. The stacking approach, on the other hand, feeds the outputs of a group of
prediction models as inputs to another prediction model in order to achieve higher prediction accuracy
[6]. Although multiple layers of models can be utilized, a two-level hierarchy is more common. In
boosting ensemble models, the training data is iteratively changed to focus on the instances
misclassified in the previous fits. In summary, the boosting approach is based on the idea of correcting
prediction errors [27, 1].
Many methods and techniques have been proposed to increase the success of stock market prediction
models, and new ones are constantly being proposed. The findings of the studies mentioned above
indicate that the prediction success of the proposed models is high. Regardless of the
method used, it becomes more and more difficult, or even impossible, to exceed the prediction
accuracy rates claimed in these studies [43]. The high pattern recognition capacities of machine
learning methods can be used to discover patterns in time series. However, the fact that different
combinations of new or existing techniques report similar prediction success indicates that it is
increasingly difficult to make further progress. In this case, instead of pursuing new techniques that
increase the computational burden and the complexity of interpretation, increasing the reliability of
the predictions made by existing techniques comes to the fore as an alternative. For example, an
approach that allows a prediction model to avoid the 20% of predictions in which it fails, rather than
striving to raise its accuracy from 80% to 85%, will indirectly raise overall prediction accuracy to
significantly higher levels by making only reliable or trustworthy predictions.
The only drawback of this approach is that predictions cannot be made for every period, because
unreliable predictions are avoided. This is the price to be paid for making predictions with a
very high accuracy rate. In the contemporary literature, such efforts are examined under
the name of explainable artificial intelligence (XAI). Recently, two different approaches have come to
the fore within the scope of XAI. One of them is local interpretable model-agnostic explanations
(LIME) [31], which allows any model to be interpreted by describing each prediction on an
instance-by-instance basis. The other is SHapley Additive exPlanations (SHAP) [21], which assigns
each feature an importance level for each prediction.
In this study, a two-stage stacking ensemble prediction model is first developed in order to predict
the daily stock market closing price direction. Instead of feature engineering, we prefer a
decomposition approach based on the EMD technique. The reason for preferring data decomposition is
explained in later sections by means of experimental results, which indicate that EMD facilitates the
operation of the prediction model. The decomposed series, known as intrinsic mode functions
(IMFs), are simply sub-components of the original time series. In the first stage, each IMF is predicted
with two distinct ANN models. The first one predicts each IMF's real value, that is, its quantitative
value (regression prediction), so it is called ANN regression (ANNR) for short. The second one
classifies the direction (upward or downward); since this ANN model is designed as a classifier, it is
named ANNC. For all IMFs, these two ANN models have been trained separately and the model objects
have been saved for predictions. The reason for employing these two distinct ANN models (regression
prediction and classification) is that preliminary experiments showed ANNC to be superior for
predicting certain IMFs and ANNR for the rest. For this reason, the predictions of these two models
have been combined in the first stage in order to exploit their relative strengths, and their combined
predictions have been fed to a third prediction model in the second stage. The third model in the
second stage has been selected based on the comparative performance of two algorithms, namely
random forest (RF) and extreme gradient boosting (XGBoost). This third prediction model is
used as a classifier (upward and downward direction prediction) to predict the direction of the
original time series. To sum up, the overall architecture is one of many possible stacking ensemble
model configurations, and our proposed ensemble model is named EMD-ANN-RF.
In the second stage of the proposed prediction procedure, which constitutes the most important part of
the study, the LIME algorithm has been integrated into the RF model. By giving the outputs of ANNR
and ANNC as inputs to the RF model, the upward and downward movement direction of six major stock
market indices is predicted. The main motivation of this study emerges at this stage. During
exploratory research, it was discovered that the RF model makes more successful predictions
for certain values of the inputs. For instance, the outputs of the ANNC model take values in the range
[0,1]. It has been observed that a downward prediction by the RF model (i.e. class 0) is quite
successful if the ANNC output is close to zero. Similarly, when the ANNC output is close to one, an
upward forecast by the RF model is more successful. The same scenario applies to the output of the
ANNR model. In order to better understand and explain this behavior of the RF prediction model, we
examine the model's predictions instance by instance with an explainable machine learning approach.
In the XAI literature, LIME is an approach that tries to explain each prediction (instance by instance)
independently of the prediction model being used. LIME explains the predictions of
any classifier in an interpretable and faithful manner. It offers explanations by learning an
interpretable model locally around the prediction [31]. Thus, the probabilities of each class prediction
made by the RF model can be calculated with LIME. In the usual process, when any input instance is
given to the RF model, it predicts either 0 or 1. However, after the LIME algorithm is implemented,
a probability is assigned to each of the class labels 0 and 1. For example, suppose that for some input
set, a probability of 0.10 is obtained for class 0 and 0.90 for class 1. These class probabilities
are obtained for each prediction in the test set. Here, the class probabilities
are used as the reliability level of the predictions. If one of the class probabilities is high enough
for the decision maker, then he/she trusts that prediction and makes a decision based on the outcome.
In the previous example, if 0.80 is high enough for the decision maker to trust a prediction,
then he/she decides in favor of class label 1. Here, 0.80 is the reliability level of the
decision maker. On the other hand, if the decision maker set the reliability level to 0.91, say,
then he/she would refrain from making any decision since 0.90 < 0.91; in other words, the reliability
condition is not satisfied. If the reliability level is set to 0.50, we obtain simply the predictions
made by the RF algorithm alone without LIME. However, as the reliability level is increased, some
predictions will be avoided because they do not meet the reliability condition, while a higher
accuracy is expected for the trusted predictions. In other words, decisions cannot be made for all of
the predictions of the RF model in the test set, since those with low reliability levels are avoided. In
order to test this idea, the increase in the accuracy rate and the decrease in the number of predictions
(the trade-off between accuracy and number of trusted predictions) have been measured experimentally
for all reliability levels ranging from 50% to 100%. To the best of our knowledge, the
implementation of the LIME algorithm in a prediction model as we propose here has not previously
been proposed in the relevant literature. The remainder of the paper is organized as follows.
Sections 2.1 to 2.3 describe the base methods and techniques used in the forecasting model. Section
2.4 presents the details of the proposed model. The experimental results and evaluations are given in
Section 3, and the conclusion and future directions in Section 4.
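The reliability-level rule described above can be illustrated with a minimal Python sketch. The function names and the sample probabilities are hypothetical and serve only as an illustration; the sketch assumes that a class-1 probability is already available for every test instance.

```python
def trusted_predictions(class1_probs, reliability_level):
    """Return (index, label) pairs whose top class probability meets the level."""
    trusted = []
    for i, p1 in enumerate(class1_probs):
        label = 1 if p1 > 0.5 else 0
        confidence = p1 if label == 1 else 1.0 - p1
        if confidence >= reliability_level:
            trusted.append((i, label))
    return trusted


def accuracy_and_coverage(class1_probs, true_labels, reliability_level):
    """Accuracy on trusted predictions and the share of instances that are trusted."""
    trusted = trusted_predictions(class1_probs, reliability_level)
    if not trusted:
        return None, 0.0  # nothing is trusted at this level
    hits = sum(1 for i, label in trusted if label == true_labels[i])
    return hits / len(trusted), len(trusted) / len(true_labels)


# Hypothetical class-1 probabilities and true directions, for illustration only.
probs = [0.90, 0.55, 0.10, 0.48, 0.97, 0.30]
truth = [1, 0, 0, 1, 1, 0]
for level in (0.50, 0.80, 0.91):
    print(level, accuracy_and_coverage(probs, truth, level))
```

At a level of 0.50 every prediction is kept. Raising the level removes the low-confidence instances, so the accuracy on the remaining predictions can rise while fewer predictions are made.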
2. Prediction Methods and Framework
2.1. Empirical mode decomposition

Empirical mode decomposition (EMD) is a method for analyzing nonlinear and non-stationary
data, introduced in [13]. Any complicated dataset can be decomposed into a finite number of
‘intrinsic mode functions’ (IMFs), and the decomposition is based on the local characteristic time scale
of the data, so it is applicable to nonlinear and non-stationary processes [13]. Decomposition is the
separation of a signal into different components, and it is usually performed in data analysis to
extract information that cannot be obtained when considering the data as a whole.
Decomposing the data therefore allows the newly obtained components to be analyzed to gain new
insight into the features inherent in the data. Each mode function represents a portion of the complete
signal. For a given signal 𝑋(𝑡), the EMD algorithm is implemented as follows:
1. Determine the envelopes of 𝑋(𝑡), which are defined by the local maxima 𝑋𝑢𝑝 and the local minima
𝑋𝑙𝑜𝑤 separately.
2. Once the extrema are identified, connect all the local maxima by a cubic spline line, 𝑋𝑢𝑝(𝑡), as
the upper envelope.
3. Repeat the procedure for the local minima to produce the lower envelope, 𝑋𝑙𝑜𝑤(𝑡).
4. Let 𝑚𝑡 be the mean of the upper and lower envelopes, such that

\[ m_t = \frac{X_{up}(t) + X_{low}(t)}{2} \tag{1} \]
5. Then the first component, ℎ1(𝑡), is
\[ h_1(t) = X(t) - m_t \tag{2} \]
6. After the first component is obtained, replace 𝑋(𝑡) with ℎ1(𝑡) and repeat the whole procedure
until the stopping criterion is satisfied. The stopping criterion is
\[ \sum_t \frac{(h_i(t) - X(t))^2}{X(t)^2} < \epsilon \tag{3} \]
Here 𝜖 is a threshold value, usually close to 0.2. The IMFs admit well-behaved Hilbert
transforms. This decomposition method is adaptive, which means that it is not based on a
predetermined, well-defined mathematical basis; instead, the data itself dictates the decomposition,
which makes the method highly efficient. Since the decomposition is based on the local characteristic
time scale of the data, it is applicable to non-linear and non-stationary processes. With the Hilbert
transform, the IMFs yield instantaneous frequencies as functions of time that give sharp
identifications of embedded structures. The final presentation of the results is an
energy-frequency-time distribution, designated as the Hilbert spectrum. In [13], classical non-linear
system models are used to illustrate the roles played by non-linear and non-stationary effects in the
energy-frequency-time distribution, with examples including the Duffing equation, the Rössler
equation, and non-linear wind wave data.
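One sifting pass (steps 1 to 5) and the stopping value of equation (3) can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the study: the envelopes are built by piecewise-linear interpolation instead of the cubic splines described above, the envelopes are held flat beyond the first and last extremum, and all function names are hypothetical.

```python
def local_extrema(x):
    """Indices of strict interior local maxima and minima of a sequence."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices.

    Assumption: linear interpolation stands in for the cubic spline,
    and the envelope is held flat outside the first and last knot.
    """
    env = []
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            for a, b in zip(knots, knots[1:]):
                if a <= t <= b:
                    w = (t - a) / (b - a)
                    env.append((1 - w) * x[a] + w * x[b])
                    break
    return env


def sift_once(x):
    """One sifting pass: h(t) = X(t) - m_t, with m_t the mean of the envelopes."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None  # no oscillation left to extract
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]
    return [xi - m for xi, m in zip(x, mean)]


def stopping_value(h, x):
    """Left-hand side of equation (3); terms with X(t) = 0 are skipped (assumption)."""
    return sum((hi - xi) ** 2 / xi ** 2 for hi, xi in zip(h, x) if xi != 0)


signal = [0, 1, 0, -1, 0, 1, 0, -1, 0]
component = sift_once(signal)
print(component, stopping_value(component, signal))
```

In a full decomposition the pass is repeated with 𝑋(𝑡) replaced by the latest component until the stopping value falls below 𝜖.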
2.2. Artificial neural networks
The origin of artificial neural networks (ANN) goes back to the computational model for neural
networks called threshold logic, developed by [22]. However, the algorithm that forms the basis of
multi-layer neural networks and enables the training of artificial neural networks in the
current sense is the backpropagation algorithm. ANN is a supervised machine learning algorithm that
learns the mapping between an input set and an output set. An ANN has three types of layers, and each
layer consists of nodes. The first layer is the input layer, and each of its nodes refers to an input. The
second is the hidden layer, which may comprise more than one layer. The last layer is called the output
layer, and it produces the output of the model for each input instance. Depending on the design, nodes
are connected to each other, but not all nodes are necessarily connected. The model is then trained
with a sample of data to capture the relationship between inputs and outputs. In Figure 1, a
representative ANN model is depicted.
[Figure 1: an input layer with nodes 1 to 𝑚, hidden layers 1 to 𝑛, and an output layer.]
Figure 1. General architecture of ANN models.
Both regression prediction (prediction of continuous numeric values) and classification (prediction of
a class label) can be performed with ANN. In this study, both capabilities of artificial neural networks
are used. The regression version of the ANN is named ANNR and the classifier version ANNC. Both the
ANNR and ANNC models have two hidden layers with 100 nodes and use the ReLU activation function.
Both models are trained for 300 epochs with a mini-batch size of 30 samples. The output layer of ANNR
has a single node with a hyperbolic tangent activation function for predicting a numeric value, and the
model is trained to minimize the mean squared error (MSE) loss function using the Adam version of
stochastic gradient descent. ANNC, on the other hand, has a sigmoid activation function in the output
layer and uses the binary cross entropy loss function.
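The two network shapes can be sketched in plain Python to show how ANNR and ANNC differ only in the output activation and the loss. This is an untrained forward pass with random weights, not the authors' implementation: training with Adam is omitted, and the assumptions are 100 nodes in each hidden layer, four lagged inputs, the weight initialization, and all function names.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_inputs, rng, hidden_nodes=100):
    """Two hidden layers followed by a single output node."""
    return [
        init_layer(n_inputs, hidden_nodes, rng),
        init_layer(hidden_nodes, hidden_nodes, rng),
        init_layer(hidden_nodes, 1, rng),
    ]


def dense(inputs, weights, biases):
    return [sum(w * a for w, a in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(layers, inputs, output_activation):
    """ReLU on the hidden layers, `output_activation` on the output node."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = [max(0.0, v) for v in dense(activations, weights, biases)]
    weights, biases = layers[-1]
    return output_activation(dense(activations, weights, biases)[0])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mse(targets, outputs):
    """Loss minimized by ANNR."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def binary_cross_entropy(targets, outputs):
    """Loss minimized by ANNC."""
    terms = [t * math.log(o) + (1 - t) * math.log(1 - o) for t, o in zip(targets, outputs)]
    return -sum(terms) / len(targets)


rng = random.Random(0)
lags = [0.12, -0.05, 0.08, 0.01]  # hypothetical lagged IMF values
annr = build_network(len(lags), rng)
annc = build_network(len(lags), rng)
value_prediction = forward(annr, lags, math.tanh)  # ANNR output lies in (-1, 1)
up_probability = forward(annc, lags, sigmoid)      # ANNC output lies in (0, 1)
print(value_prediction, up_probability)
```

Because the hyperbolic tangent output is bounded, a sketch like this presupposes that the IMF values have been scaled into that range.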
2.3. Local interpretable model-agnostic explanations (LIME)
Ribeiro et al. [31] proposed the LIME algorithm to provide explanations for individual
predictions, allowing some degree of reliability to be attached to the predictions of any classifier or
regressor. LIME provides interpretations of predictions locally. SHAP, on the other hand, calculates
for a given prediction the marginal contribution of a feature to the model by means of the Shapley
value of that feature. The main approach of LIME therefore differs from that of SHAP. LIME decides
whether a model is locally faithful regardless of the model, and verifies how a model represents the
features around a prediction. This property of the LIME algorithm is known as local fidelity [32]. The
explanation produced by LIME is obtained as follows [31]:
\[ \text{explanation}(x) = \operatorname*{argmin}_{g \in G} \; \mathcal{L}(f, g, \Pi_x(z)) + \Omega(g) \tag{4} \]
Here 𝑥 represents an instance, and the interpretable representation of an instance is a binary vector
$x \in \{0,1\}^{d'}$. Let 𝐺 be a set of potentially interpretable models and 𝑔 ∈ 𝐺, where 𝑔 represents
a machine learning model. The domain of 𝑔 is $\{0,1\}^{d'}$. Ω(𝑔) is the complexity of the
interpretable model 𝑔. For classification, 𝑓(𝑥) is the probability that 𝑥 belongs to a class. The loss
ℒ measures how unfaithful 𝑔 is in approximating 𝑓 in the locality defined by Π𝑥 [31]. Π𝑥(𝑧) is a
proximity measure between an instance 𝑧 and 𝑥, which defines the locality around 𝑥.
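The idea behind equation (4) can be sketched in Python as a locally weighted linear fit. This is a simplified illustration and not the LIME library: it perturbs the raw feature values instead of using a binary interpretable representation, it uses an exponential kernel as the proximity measure, and it controls complexity only by restricting 𝑔 to linear models. The function names, kernel width, noise level and the black-box function are all assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_surrogate(f, x, n_samples=500, width=0.75, noise=0.5, seed=0):
    """Fit g(z) = b0 + b1*z1 + ... to f around x by proximity-weighted least squares.

    Returns [b0, b1, ..., bd]. The weighted squared error plays the role of
    the loss in equation (4), and the kernel plays the role of the proximity.
    """
    rng = random.Random(seed)
    d = len(x)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, noise) for xi in x]      # perturbed instance
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        weight = math.exp(-dist2 / width ** 2)            # proximity of z to x
        row = [1.0] + z
        target = f(z)                                     # black-box output
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * target
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    return solve(xtwx, xtwy)


def black_box(z):
    """Hypothetical classifier probability for class 1, for illustration only."""
    return 1.0 / (1.0 + math.exp(-(3.0 * z[0] + 4.0 * (z[1] - 0.5))))


coefficients = local_surrogate(black_box, [0.1, 0.8])
print(coefficients)
```

The fitted coefficients describe how the black-box probability responds to each input near the explained instance, which is the sense in which the surrogate is locally faithful.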
2.4. Proposed model: Two-stage ensemble EMD-ANN-RF
Two major prediction approaches exist in the stock market price prediction literature. The first
approach is to predict the actual price levels of the time series (also known as regression prediction)
and to compare the results with the observed values using well-known evaluation metrics such as root
mean squared error (RMSE), mean squared error (MSE), mean absolute error (MAE) and mean
absolute percentage error (MAPE). Recently, directional prediction accuracy (also known as hit rate)
has also become more common for evaluating prediction performance, since accurate
movement direction prediction is vital for successful stock market prediction. Accuracy is measured as
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} \]
where 𝑇𝑃 denotes true positive predictions, 𝑇𝑁 true negative predictions, 𝐹𝑃 false positive predictions
and 𝐹𝑁 false negative predictions. Other performance evaluation metrics commonly employed in the
literature and in this paper are precision, recall, F1-score and area under the receiver operating
characteristic curve (ROC-AUC). These evaluation metrics are defined as follows:
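\[ \text{Precision} = \frac{TP}{TP + FP} \tag{6} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{7} \]
\[ \text{F1-score} = 2 \, \frac{(\text{Precision})(\text{Recall})}{\text{Precision} + \text{Recall}} \tag{8} \]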
\[ \text{ROC AUC} = \int_0^1 \frac{TP}{TP + FN} \, d\!\left(\frac{FP}{FP + TN}\right) \tag{9} \]
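As a worked illustration of the accuracy, precision, recall and F1-score definitions, the following Python sketch computes the four count-based metrics from true and predicted direction labels. The function name and the sample labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than part of the definitions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score from binary labels (1 = upward)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example there are three true positives, three true negatives, one false positive and one false negative, so all four metrics equal 0.75.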
The second approach to predicting stock market data is time series directional
prediction, also known as time series classification. In order to conduct stock market direction
classification, the target data is commonly labeled as 1 and 0, standing for upward and downward
movements respectively. The prediction procedure then turns into a classification problem, and a
classification model is trained on an input set and a target set consisting of 1s and 0s. These
two major prediction approaches have their own advantages and disadvantages. In regression
prediction, actual data values are predicted, so the prediction results offer “more information”
than classification. On the other hand, the results of time series classification are easier to interpret
than those of regression prediction. Moreover, classification is easier for the prediction model to
handle, as long as the target data and the feature set are related, since the target set is
Boolean in contrast to the continuous target sets of regression prediction. In this paper, in order to
predict stock market price data, a novel two-stage stacked ensemble prediction framework is designed
by exploiting the advantages of both approaches.
Before building our proposed ensemble model, the daily closing prices of six selected major stock
market indices, namely the Standard & Poor's 500 Index (SP500), Nikkei 225 Index (NI225), Borsa
Istanbul 100 Index (XU100), Korea Composite Stock Price Index (KOSPI), Deutscher Aktien Index
(DAX) and Financial Times Stock Exchange 100 Index (FTSE100), are preprocessed and scaled. As the
first step of building the ensemble model, the time series data is decomposed with the EMD technique
in order to reduce the complexity of the original series, so that the data sets are decomposed into more
manageable sub-components (i.e. IMFs). Experiments are then conducted for the prediction of each of
the IMF data sets. Using ANNR and ANNC, regression and classification predictions, respectively, are
made for each IMF. The results are summarized in Table 1 below.
Table 1. Prediction accuracies of each of the IMFs with ANNR and ANNC
Index    Model  IMF0    IMF1    IMF2    IMF3    IMF4    IMF5    IMF6    IMF7    IMF8
SP500    ANNR   0.7330  0.8945  0.9736  0.9744  0.9936  0.9968  0.9960  1.0000  NA
SP500    ANNC   0.8826  0.8610  0.8754  0.9169  0.9361  0.7995  0.4657  1.0000  NA
NI225    ANNR   0.8248  0.8725  0.9729  0.9926  0.9942  0.9984  0.9959  0.9984  NA
NI225    ANNC   0.8924  0.8915  0.9441  0.9499  0.9745  0.9573  0.8094  0.9326  NA
XU100    ANNR   0.7894  0.8991  0.9792  0.9888  0.9920  0.9984  0.9960  0.9976  NA
XU100    ANNC   0.8928  0.8888  0.9240  0.9400  0.9048  0.8776  0.8472  0.9960  NA
KOSPI    ANNR   0.8103  0.8896  0.9657  0.9926  0.9943  0.9943  0.9984  1.0000  NA
KOSPI    ANNC   0.8840  0.8856  0.9371  0.9322  0.9175  0.7042  0.4608  1.0000  NA
DAX      ANNR   0.8211  0.8847  0.9833  0.9873  0.9944  0.9976  0.9968  1.0000  NA
DAX      ANNC   0.9039  0.8864  0.9055  0.9436  0.9357  0.9325  0.6958  1.0000  NA
FTSE100  ANNR   0.7645  0.8789  0.9680  0.9924  0.9933  0.9958  0.9975  0.9975  1.0000
FTSE100  ANNC   0.8840  0.8950  0.9168  0.9303  0.9529  0.9815  0.6395  0.3782  1.0000
The prediction results show that the accuracy of the classification predictions for the first IMF,
ℎ0(𝑡), is significantly better than that of the regression predictions. Since ℎ0(𝑡) is the hardest
component for regression prediction, an improvement in prediction accuracy for ℎ0(𝑡) may contribute
to the overall success of the prediction model for the original price series. Moreover, the regression
predictions of the other IMFs are slightly better than their classification predictions. Therefore,
predicting all IMFs except ℎ0(𝑡) with the regression-based prediction model and predicting ℎ0(𝑡) with
the classification-based prediction model is considered appropriate for increasing overall prediction
accuracy. However, a problem arises at this stage. The prediction of an EMD-ANN model is
traditionally made by summing the predicted continuous-valued IMFs, so that the aggregated IMF
predictions represent the prediction of the original price series. The classification predictions of
ℎ0(𝑡), on the other hand, are outputs of the sigmoid function 𝜎(𝑥) (continuous values ranging between
0 and 1). The ANN classification model labels a prediction as 1 if 𝜎(𝑥) > 0.5 and as 0 if 𝜎(𝑥) ≤ 0.5.
For this reason, summing the classification predictions of ℎ0(𝑡) with the regression predictions of the
remaining IMFs is inappropriate and meaningless. To overcome this obstacle, a novel prediction
experiment design is proposed.

Accordingly, as described before, the ℎ0(𝑡) series is predicted by ANNC. ANNC is trained with four
lagged daily values of ℎ0(𝑡), such that ℎ0(𝑡 ‒ 1),…,ℎ0(𝑡 ‒ 4) are fed as inputs and the movement
direction of ℎ0(𝑡), denoted 𝐷𝑡(ℎ0(𝑡)), where
\[ D_t(h_0(t)) = \begin{cases} 0, & \text{if } h_0(t) - h_0(t-1) \le 0 \\ 1, & \text{if } h_0(t) - h_0(t-1) > 0 \end{cases} \tag{10} \]
is the output. Since ANNC is a binary classifier, its predictions, 𝐶𝑦,𝑡, are class labels, that is,
𝐶𝑦,𝑡 ∈ {0,1}, as implied by the classifier design of the artificial neural network. The output activation
function of ANNC is the sigmoid function, also called a squashing function in machine learning
terminology and denoted by 𝜎𝑡(𝑥), which is a special form of the logistic function and ranges between
0 and 1. ANNC classifies inputs based on the predicted values of 𝜎𝑡(𝑥) such that
\[ C_{y,t} = \begin{cases} 0, & \text{if } \sigma_t(x) \le 0.5 \\ 1, & \text{if } \sigma_t(x) > 0.5 \end{cases} \tag{11} \]
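The construction of the ANNC training pairs and the labeling rules of equations (10) and (11) can be sketched in Python as follows. The function names and the sample series are hypothetical; the four-lag window follows the description above.

```python
def direction_labels(series):
    """Equation (10): 1 if the series rises from t-1 to t, otherwise 0 (for t >= 1)."""
    return [1 if b - a > 0 else 0 for a, b in zip(series, series[1:])]


def lagged_dataset(series, n_lags=4):
    """Pair the inputs [h(t-1), ..., h(t-n_lags)] with the direction label of h(t)."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append([series[t - k] for k in range(1, n_lags + 1)])
        targets.append(1 if series[t] - series[t - 1] > 0 else 0)
    return inputs, targets


def to_class(sigmoid_output):
    """Equation (11): threshold the sigmoid output at 0.5."""
    return 1 if sigmoid_output > 0.5 else 0


# Hypothetical IMF0 values, for illustration only.
imf0 = [0.1, 0.3, 0.2, 0.2, 0.5, 0.4]
features, labels = lagged_dataset(imf0)
print(direction_labels(imf0))
print(features, labels)
```

Note that a sigmoid output of exactly 0.5 is assigned to class 0 and a zero difference is labeled 0, as in the two equations.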
Exploratory experiments for the final prediction step have shown that predictions with a sigmoid
function output close to one or zero predict the upward and downward movements of the original time
series, respectively, more successfully. For this reason, 𝜎𝑡(𝑥) is kept as the output value of
ANNC; in other words, no class label conversion is made. In addition, all IMFs are also
predicted by the ANNR model and summed in the conventional way to predict the original price
series for each time period 𝑡. Let 𝑦𝑡 be the prediction at time step 𝑡. The difference series, denoted
𝑑𝑡(𝑦), where
\[ d_t(y) = y_t - y_{t-1} \tag{12} \]
is obtained for each day. Here 𝑑𝑡(𝑦) gives the direction and magnitude of the ANNR prediction, that
is, the severity of the increase or decrease between successive predictions. The motivation behind
obtaining 𝑑𝑡(𝑦) is the same as for 𝜎𝑡(𝑥): as 𝑑𝑡(𝑦) diverges from zero, the upward and downward
movement direction prediction accuracies are expected to increase. Stage 1 of the proposed model
ends here, and the outputs of this stage, 𝑑𝑡(𝑦) and 𝜎𝑡(𝑥), initiate stage 2. It is helpful to summarize
the procedure and operations performed in stage 1 in terms of prediction models, datasets and
input-output pairings. The ANNR and ANNC models are trained on the training set, and these two
models are then used to make predictions on the validation set to obtain 𝑑𝑡(𝑦) and 𝜎𝑡(𝑥). At this point,
stage 1 is complete and two models (ANNR and ANNC) are available to produce the inputs that feed
the prediction model of stage 2. Stage 2 starts on the validation set, where another machine learning
classification model (such as random forest (RF) or extreme gradient boosting (XGBoost)) is trained
with 𝑑𝑡(𝑦) and 𝜎𝑡(𝑥) as inputs and the movement direction of the original price series, 𝐷𝑡(𝑦), at
time step 𝑡 as output, where
\[ D_t(y_t) = \begin{cases} 0, & \text{if } y_t - y_{t-1} \le 0 \\ 1, & \text{if } y_t - y_{t-1} > 0 \end{cases} \tag{12} \]
Thus, all models needed to predict the movement direction of the original price series in the test set
are trained. In the final prediction step of stage 2, in order to test the aforementioned hypotheses,
𝑑𝑡(𝑦) and 𝜎𝑡(𝑥) are obtained from the predictions of ANNR and ANNC, respectively, on the test set.
Finally, 𝑑𝑡(𝑦) and 𝜎𝑡(𝑥) are given to the random forest or XGBoost classifier to predict the movement
direction of the original price series for each time step 𝑡, yielding 𝐶𝑦,𝑡 ∈ {0,1} (the predicted
movement direction of the original price series at time step 𝑡). The results are then compared with the
single EMD-ANNR prediction model in terms of accuracy, precision, recall, F1-score and ROC-AUC. The
entire prediction procedure is depicted in Figure 2. In a nutshell, in the final version of the
prediction model, the entire dataset is decomposed with EMD; ANNR and ANNC (ANNRC for short) then
complete the first prediction procedure of the ensemble model. Random forest or XGBoost is employed for
the second prediction procedure of the ensemble model, and the resulting model is called EMD-ANNRC-RF
(since the random forest model is preferred for the final experiments).
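For reference, four of the evaluation metrics named above can be computed directly from the confusion counts. The sketch below uses the standard textbook definitions; the function name and the toy labels are illustrative assumptions, and ROC-AUC is omitted because it requires predicted scores rather than class labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = up, 0 = down)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels for illustration only.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```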

[Figure 2 flow diagram. Stage 1: original dataset → scaling → EMD on the scaled original dataset → lagged
values of each IMF (IMF0, IMF1, …, IMFK) applied as model inputs → ANNR1, ANNR2, …, ANNRK predict the IMF
values and ANNC predicts the direction class of IMF0, 𝐶ℎ0(𝑡),𝑡 → the predicted IMF values are summed (Σ)
and differenced (𝑦𝑡 ‒ 𝑦𝑡‒1) to give 𝑑𝑡(𝑦); the sigmoid function outputs of the predicted classes give
𝜎𝑡(𝑥). Stage 2: 𝑑𝑡(𝑦) and 𝜎𝑡(𝑥) → RF predicts 𝐷𝑡(𝑦), the direction class of the scaled original series,
𝐶𝑦,𝑡 → integrate the LIME algorithm into RF → decide the reliability level → make predictions with
trusted suggestions → evaluate the prediction results.]
Figure 2. Schematic layout of proposed two stage EMD-ANNRC-RF-LIME model
The step-by-step prediction procedure of EMD-LIME-ANN is given below.

1. Scale the original dataset.
2. Decompose the entire original dataset and obtain the IMF datasets.
3. Divide each IMF dataset into training, validation and test sets.
4. For each IMF, train an ANN with the 𝑛 days of lagged values of the IMF dataset as inputs and the
most recent day's value as output. The ANN model used here is referred to as ANNR (ANN
regression).
5. Train another ANN with the 𝑛 days of lagged values of the IMF0 dataset, 𝐼𝑀𝐹0𝑡‒1, …, 𝐼𝑀𝐹0𝑡‒𝑛, as
inputs and the most recent day’s movement direction of IMF0, 𝐷𝑡(𝐼𝑀𝐹0), as output. The ANN
employed at this stage is a classifier and is referred to as ANNC.
6. Predict the original dataset in the validation set via ANNR by summing all predicted IMFs, then
obtain 𝑑𝑡(𝑦) for each day as the difference between the predictions for successive time periods
𝑡 and 𝑡 ‒ 1.
7. Predict the directional movements of IMF0 in the validation set via ANNC, and obtain the predicted
sigmoid function output, 𝜎𝑡(𝑥), of each classification for each time period 𝑡.
8. Using 𝑑𝑡(𝑦) and 𝜎𝑡(𝑥) obtained in steps 6 and 7 as inputs and the direction of the scaled
original series, 𝐷𝑡(𝑦), as output, train another classification model such as RF or XGBoost in the
validation set.
9. Apply the LIME classification algorithm to ANNC2 (the second-stage classifier of step 8) in the
validation set with the same input/output setup as described in the previous step and obtain class
probabilities. Determine the class probability reliability level so that a prediction is made only
if the prediction of ANNC2 is reliable enough with respect to this criterion.
10. Finally, in the test set, predict the original dataset with ANNR and predict IMF0 with ANNC1 (the
ANNC of step 5). Then feed the prediction outputs of ANNR and ANNC1 to ANNC2 as inputs to make the
final predictions with respect to the pre-determined reliable class probabilities calculated by the
LIME algorithm.
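The input-output pairing of steps 4 and 5 can be sketched as follows. This is a minimal illustration under assumptions: the helper name `make_lagged`, the column order (most recent lag first) and the toy series are not from the paper, which only states that 𝑛 lagged values are the inputs and the current value (ANNR) or its movement direction (ANNC) is the output.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build lagged inputs and targets from one IMF series.

    Row for target index t holds s[t-1], ..., s[t-n_lags].
    Returns (X, y_value, y_direction); the first n_lags days are lost.
    """
    s = np.asarray(series, dtype=float)
    if len(s) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    X = np.column_stack(
        [s[n_lags - k: len(s) - k] for k in range(1, n_lags + 1)]
    )
    y_value = s[n_lags:]                                       # regression target
    y_direction = (s[n_lags:] - s[n_lags - 1:-1] > 0).astype(int)  # class target
    return X, y_value, y_direction

# Toy series with two lags, for illustration only.
X, y_value, y_direction = make_lagged([1, 2, 4, 3, 5, 6], n_lags=2)
```

With `n_lags=2` and six observations, four input rows remain, which mirrors the data loss caused by lagging that is discussed in the experimental section.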
2.4.1. Integrating LIME algorithm to the proposed model
Higher prediction accuracy is vital in stock market prediction, and a prediction model with a
high accuracy rate yields more successful results. However, when the accuracy of the prediction
model cannot be increased further, another approach can be taken to improve the success of the
predictions: testing the reliability of the results suggested by the prediction model. For instance,
if a measure were available of how sure the prediction model is about each prediction, the decision
maker could act on the predictions with a relatively high probability of occurrence and refrain from
acting on unreliable predictions (i.e. those with a lower probability of occurrence). Since only the
predictions with a high level of reliability are then used in decision-making, an overall increase in
the success of the predictions can be expected. In the machine learning literature, this approach is
generally referred to as model explainability. Model explainability increases trust in a machine
learning model because it allows the model to be interpreted. A model can be interpreted in two ways,
globally and locally [12]. Global interpretation explains the complete behavior of the model, while
local interpretation helps in understanding how the model makes a decision for a single instance and
thus explains individual predictions. LIME and SHAP are common algorithms for local interpretation.
Since the aim here is to explain each prediction made by the prediction model and to consult the
associated measurements in decision making, the local interpretability approach is adopted, and the
LIME algorithm is used for this purpose.
A two-stage ensemble prediction model is designed to increase the accuracy of the base model
(EMD-ANN), and the outputs of the first stage, 𝜎𝑡(𝑥) and 𝑑𝑡(𝑦), are given as inputs to another
prediction model in the second stage. This prediction procedure serves two purposes. The first is to
increase the prediction success of the base model, as can be seen in the results in section 3.2; the
second is to add another aspect to the prediction procedure by using the LIME algorithm, which
constitutes the second novel prediction procedure proposed in this study. The importance of using
𝜎𝑡(𝑥), rather than the class labels directly, as the predictive output of the ANNC model for
predicting 𝐷𝑡(𝑦𝑡), and similarly of using 𝑑𝑡(𝑦) rather than 𝑦𝑡 directly as the output of ANNR, emerges
at this stage. As mentioned in the previous section, a parallelism, albeit partial, is observed between
the values of 𝜎𝑡(𝑥) and the values of 𝐷𝑡(𝑦𝑡) in the preliminary experiments. As the values of 𝜎𝑡(𝑥)
approach 1 and 0, the accuracy of the predictions for the 1 and 0 values of 𝐷𝑡(𝑦𝑡) increases,
respectively. A similar increase in hit rates is observed for 𝑑𝑡(𝑦): as the absolute size of the
increase or decrease in 𝑑𝑡(𝑦) grows, the accuracy of the predictions made for the 1 and 0 values of
𝐷𝑡(𝑦𝑡) increases, respectively. The LIME algorithm is used to make these observed rules more
systematic and useful. The reliability of the prediction model's results can thus be measured for each
value of 𝜎𝑡(𝑥) and 𝑑𝑡(𝑦), and only the predictions that meet a predetermined reliability level are
used.
[Figure 3: seven panels, Random Sample 1 to Random Sample 7, each showing the LIME prediction
probabilities for classes 0 and 1 alongside the corresponding input values.]
Figure 3. Randomly selected predictions with LIME in test set.
By integrating the LIME algorithm into the prediction model, each prediction of the RF model in the
test set is explained. The LIME technique bases these explanations on the behavior of the RF model in
the validation set. As a result, for each prediction of the RF model in the test set, prediction
probabilities, or explanation prediction probabilities (EPP), are calculated for each class label.
When an input instance is given to RF, which is a binary classifier, it produces either 0 or 1 as its
prediction output. With the LIME algorithm integrated, a probability of occurrence can additionally be
calculated for each class label. For example, assume that an input instance is given to the RF model
in the test set. LIME then calculates prediction probabilities such as those in Figure 3, which shows
seven randomly selected prediction samples. In random sample 1, LIME calculates a prediction
probability of 0.97 for class 0 (downward prediction) and 0.03 for class 1 (upward prediction); the
corresponding input values are given on the right-hand side of the figure. This indicates that a
downward movement is expected with probability 0.97 and an upward movement with probability 0.03. For
random sample 2, the prediction probabilities of classes 0 and 1 are 0.39 and 0.61, respectively. If
the decision maker has set the reliability level to 0.70 beforehand, he/she would trust the prediction
in random sample 1, since the downward prediction probability (0.97) is greater than or equal to the
reliability level. In random sample 2, on the other hand, the prediction probabilities of both classes
are below 0.70, so the decision maker would refrain from making a decision.
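The decision rule in this example can be written as a short sketch. The function name `trusted_prediction` is an assumption introduced for illustration; the probabilities and the 0.70 reliability level are the ones quoted above for random samples 1 and 2.

```python
def trusted_prediction(epp, reliability_level):
    """Return the class whose EPP meets the reliability level, else None (abstain).

    epp: sequence of explanation prediction probabilities, indexed by class label.
    """
    label = max(range(len(epp)), key=lambda k: epp[k])
    return label if epp[label] >= reliability_level else None

sample_1 = trusted_prediction([0.97, 0.03], 0.70)  # class 0 (downward) is trusted
sample_2 = trusted_prediction([0.39, 0.61], 0.70)  # neither class reaches 0.70
```

Returning `None` rather than a class label makes the abstention explicit, so that abstained days can be excluded when accuracy is computed.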
3. Experimental results and evaluation

3.1. Data descriptions and experimental environment
All experiments in this study were performed in Python 3.9 on Windows 10 Pro with an AMD Ryzen
5900X processor and 32 GB of RAM. The Python libraries used are Pandas [15], NumPy [11], PyEMD
[25, 26], Scikit-learn [29], Keras [5], Matplotlib [35], xgboost [4] and lime [30]. Ten years of
historical data are used for each dataset. To visualize the test period of each stock market index,
daily closing prices are depicted in Figure 4. Before the experiments are conducted, each dataset is
scaled by the min-max method, such that
s(y_t) = \frac{y_t - \min(y_t)}{\max(y_t) - \min(y_t)} \qquad (13)
where 𝑠(𝑦𝑡) is the scaled dataset, 𝑚𝑖𝑛(𝑦𝑡) is the minimum value of 𝑦𝑡 and 𝑚𝑎𝑥(𝑦𝑡) is the maximum
value of 𝑦𝑡. Data normalization is common in machine learning and deep learning tasks, since
prediction models can work more efficiently with scaled data.
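Equation (13) can be sketched in NumPy as below. The function name and the sample prices are illustrative assumptions; the guard against a constant series is an addition that the paper does not discuss.

```python
import numpy as np

def min_max_scale(y):
    """Scale a series to [0, 1] as in Eq. (13): (y - min) / (max - min)."""
    y = np.asarray(y, dtype=float)
    span = y.max() - y.min()
    if span == 0:
        raise ValueError("constant series cannot be min-max scaled")
    return (y - y.min()) / span

# Hypothetical closing prices, for illustration only.
scaled = min_max_scale([10.0, 15.0, 20.0])
```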
Table 2. Date ranges and sizes of total data, training set, validation set and test set of experiment data
Index    Dates                    Total data  Training set  Validation set  Test set
SP500    2012-01-03 ~ 2021-12-31  2517        1261          625             626
NI225    2012-01-04 ~ 2021-12-30  2445        1224          608             608
XU100    2012-01-02 ~ 2021-12-31  2512        1258          624             625
KOSPI    2012-01-02 ~ 2021-12-30  2460        1232          611             612
DAX      2012-01-02 ~ 2021-12-30  2530        1267          629             629
FTSE100  2012-01-04 ~ 2021-12-31  2529        1267          628             629
Experiments are carried out on six stock market indices: SP500, NI225, XU100, KOSPI, DAX and
FTSE100. In order to train the models and predict the next-day upward or downward movement
direction of each index, the entire dataset is divided into training, validation and test sets of
50%, 25% and 25%, respectively. Since the four days of lagged values of each IMF are used as inputs
for prediction in Stage 1, four days of data are lost. In addition, since the difference series
𝑑𝑡(𝑦) is obtained from the predictions of ANNR, one more day is lost, so that in total five days are
missing from the sum of the training, validation and test sets. The date ranges and the sizes of the
total data, training, validation and test sets are given in Table 2.
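The five-day data loss can be checked against Table 2 with simple arithmetic. The sketch below copies the sizes from Table 2; the function and variable names are illustrative assumptions.

```python
# (total, training, validation, test) sizes copied from Table 2.
TABLE_2 = {
    "SP500": (2517, 1261, 625, 626),
    "NI225": (2445, 1224, 608, 608),
    "XU100": (2512, 1258, 624, 625),
    "KOSPI": (2460, 1232, 611, 612),
    "DAX": (2530, 1267, 629, 629),
    "FTSE100": (2529, 1267, 628, 629),
}

def usable_samples(total, n_lags=4, diff_loss=1):
    """Observations left after losing n_lags days to lagging and one to differencing."""
    return total - n_lags - diff_loss

# For every index, total - 5 should equal training + validation + test.
checks = {
    name: usable_samples(total) == train + val + test
    for name, (total, train, val, test) in TABLE_2.items()
}
```

For SP500, for example, 2517 − 5 = 2512 = 1261 + 625 + 626.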
Figure 4. Test periods of predicted indices
Because market holidays differ between countries, negligible differences occur between the total
lengths and the start and end dates of the datasets. All of the experimental datasets were
downloaded from the TradingView website (https://tr.tradingview.com/).
3.2. Results and final evaluation
The results of the EMD-ANNR, EMD-ANNRC-XGBoost and EMD-ANNRC-RF models are presented in
Table 3 for each of the stock market indices. They reveal that the proposed two-stage ensemble
prediction models are considerably better than EMD-ANNR on all reported performance evaluation
metrics. Furthermore, EMD-ANNRC-RF is slightly better than EMD-ANNRC-XGBoost in terms of accuracy
for all six indices. A possible explanation is that XGBoost has more hyperparameters than RF and
therefore depends more heavily on fine-tuning.
Table 3. Comparison of prediction models
Model               Metric     SP500   NI225   XU100   KOSPI   DAX     FTSE100
EMD-ANNR            Accuracy   0.6784  0.6903  0.7352  0.7328  0.6529  0.6476
                    Precision  0.7112  0.6979  0.7519  0.7715  0.6715  0.6636
                    Recall     0.7409  0.7241  0.8122  0.7514  0.6875  0.6909
                    F1-score   0.7258  0.7108  0.7809  0.7613  0.6794  0.6770
                    ROC-AUC    0.6675  0.6885  0.7203  0.7299  0.6503  0.6443
EMD-ANNRC-XGBoost   Accuracy   0.7796  0.7878  0.7872  0.7810  0.7727  0.7727
                    Precision  0.8101  0.8065  0.8142  0.8257  0.7898  0.7699
                    Recall     0.8056  0.7837  0.8209  0.7781  0.7827  0.8138
                    F1-score   0.8078  0.7949  0.8176  0.8012  0.7862  0.7912
                    ROC-AUC    0.7750  0.7880  0.7807  0.7815  0.7719  0.7701
EMD-ANNRC-RF        Accuracy   0.7987  0.8010  0.7888  0.7925  0.7806  0.7838
                    Precision  0.8287  0.8133  0.8113  0.8313  0.7929  0.8050
                    Recall     0.8194  0.8056  0.8292  0.7954  0.7976  0.7808
                    F1-score   0.8240  0.8094  0.8202  0.8130  0.7953  0.7927
                    ROC-AUC    0.7951  0.8007  0.7810  0.7920  0.7794  0.7840
As a final analysis, the findings obtained by applying the LIME algorithm to the EMD-ANN-RF
model are discussed below; the resulting model is referred to as two-stage EMD-ANN-RF-LIME
from now on. Integrating the LIME algorithm into the prediction model allows the decision maker to
rely only on predictions that satisfy the reliability condition. At this stage, therefore, the value
of the reliability level must be determined. Then, for the chosen reliability level, predictions are
made over the test period, and it is calculated on which days predictions are made and how accurate
they are. It is also necessary to test whether the accuracy rate increases as the reliability level
increases. In Figure 5, the relation between reliability level (horizontal axis) and accuracy rate
(vertical axis) is plotted for the test sets of the six market indices. The reliability level varies
between 0.5 and 1; a reliability level of 0.5 simply means working with the EMD-ANN-RF model, in
other words not using the LIME technique at all. By gradually increasing the reliability level
towards 1, the decision maker demands more trustworthy predictions from the model. As previously
mentioned, because of the trade-off between the number of trusted predictions and the reliability
level, one cannot expect to obtain trustworthy predictions for the whole prediction horizon.
However, it is natural to expect that increasing the reliability level also increases the accuracy
rate, albeit with fewer predictions. For all of the datasets, increasing the reliability level
generally increases the hit rate, with small deviations.
Figure 5. Accuracy rates by reliability levels in test sets.
The number of days for which trusted predictions can be made and the corresponding accuracy rates
are shown in Figure 6. It should be noted that only the trusted predictions are included in the
calculation of accuracy rates. Therefore, if none of the EPP values calculated for the class labels
reaches the determined reliability level, the model abstains from making a prediction, and none of
the TP, TN, FP and FN values occurs for that day.
Figure 6. Number of trusted predictions (horizontal axis) and accuracy rates (vertical axis) trade-off in
test set.
During the test period, the accuracy rates range from 0.7806 (DAX, 629 trusted predictions) to
0.8010 (NI225, 608 trusted predictions) at the lowest reliability level, and from 0.9079 (FTSE100,
76 trusted predictions) to 0.9913 (KOSPI, 115 trusted predictions) at the highest. As the reliability
level is increased, the number of predictions meeting the reliability condition decreases, as
expected. Consequently, the accuracy rate increases as fewer but more trusted predictions are made,
reflecting the implicit correlation between reliability level and accuracy rate.
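The trade-off between reliability level, number of trusted predictions and accuracy can be sketched as follows. This is an illustrative assumption about how such a curve may be computed for a binary problem, where the EPP of class 0 is taken as one minus that of class 1; the function name and the toy data are not from the paper.

```python
import numpy as np

def accuracy_at_reliability(epp_up, y_true, level):
    """Accuracy over trusted days only, plus the number of trusted days.

    epp_up: per-day probability assigned to class 1 (upward).
    A day is trusted when the larger class probability is >= level.
    """
    p = np.asarray(epp_up, dtype=float)
    y = np.asarray(y_true, dtype=int)
    confidence = np.maximum(p, 1.0 - p)
    trusted = confidence >= level
    predicted = (p >= 0.5).astype(int)
    n_trusted = int(trusted.sum())
    if n_trusted == 0:
        return float("nan"), 0  # the model abstains on every day
    accuracy = float((predicted[trusted] == y[trusted]).mean())
    return accuracy, n_trusted

# Toy data for illustration only.
epp_up = [0.97, 0.61, 0.20, 0.55, 0.90, 0.05]
y_true = [1, 0, 0, 1, 1, 0]
acc_all, n_all = accuracy_at_reliability(epp_up, y_true, 0.50)
acc_high, n_high = accuracy_at_reliability(epp_up, y_true, 0.85)
```

At a level of 0.50 every day is trusted, which corresponds to using the classifier without any reliability filter; raising the level shrinks the set of trusted days.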
Figure 7. Percentage of trusted predictions with respect to reliability level and accuracy rate.
For the six stock market indices, the proportion of trusted predictions in the test sets
(𝑡𝑟𝑢𝑠𝑡𝑒𝑑 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑠 %) decreases at almost every reliability level while the accuracy rate
increases, as can be seen in Figure 7. However, once the reliability level exceeds approximately
0.90, the linear structure of the relationship starts to deteriorate: 𝑡𝑟𝑢𝑠𝑡𝑒𝑑 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑠 %
decreases drastically, while an equivalent increase in the average accuracy rate is not observed.
The experimental results of the EMD-ANN-RF-LIME model are summarized in Table 4, which gives
the hit rates and the corresponding numbers of trusted predictions for reliability levels greater
than or equal to 50%, 85%, 95% and 100%. Notice that the number of trading days for
Reliability ≥ 0.50 equals the total test period for all indices, because this setting is equivalent
to not imposing any reliability level; in other words, the entire test set is predicted, exactly as
with the EMD-ANN-RF model. The highest accuracy of EMD-ANN-RF-LIME, 0.9913, is obtained on the
KOSPI dataset, where 115 predictions are made for Reliability = 1. The lowest accuracy at
Reliability = 1, 0.9079 for 76 predictions, is obtained on the FTSE100 index.
Table 4. Prediction result of EMD-ANN-RF-LIME
Index    Reliability = 1        Reliability ≥ 0.95     Reliability ≥ 0.85     Reliability ≥ 0.50
         Accuracy  Trusted      Accuracy  Trusted      Accuracy  Trusted      Accuracy  Trusted
                   predictions            predictions            predictions            predictions
SP500    0.9850    133          0.9585    313          0.9020    408          0.7987    626
NI225    0.9700    100          0.9715    281          0.9266    354          0.8010    608
XU100    0.9655    116          0.9161    286          0.8804    418          0.7888    625
KOSPI    0.9913    115          0.9537    259          0.9158    368          0.7925    612
DAX      0.9636    55           0.9109    247          0.8545    385          0.7806    629
FTSE100  0.9079    76           0.9105    257          0.8924    381          0.7838    629
On another note, although the prediction accuracies of the proposed EMD-ANN-RF model on the six
datasets are close to each other, the accuracy rates begin to diverge as the reliability level
increases (see Figure 5 and Figure 6). Therefore, although the LIME algorithm fulfills its task as
expected, the EMD-ANN-RF-LIME model may need improvement in terms of robustness at higher
reliability levels.
4. Conclusion and future directions
Although EMD, as a decomposition approach, provides very promising results in our experiments,
more recent data decomposition methods such as ensemble empirical mode decomposition (EEMD) and
variational mode decomposition (VMD) exist in the contemporary literature. Parallel experiments were
therefore also carried out with the VMD and EEMD techniques. According to our findings, no
significant difference in hit rates was observed between EMD and these techniques. However, Niu et
al. [24] have shown that VMD is more successful than EMD in stock market direction prediction. In
our case, VMD does not make a difference in the ensemble model proposed here. The reason for this
might lie in the specific configuration of the proposed ensemble model. Therefore, we think that
there is still room for improving the results by employing VMD with different settings in future
studies.
There is another important aspect to consider regarding data decomposition. In the literature,
including this paper, the entire dataset (training, validation and test) is decomposed beforehand,
so the validation and test sets are decomposed together with the training set before any machine
learning technique is employed. In practice, however, only the data observed up to the day to be
predicted can be decomposed before making the prediction. The observed original price data are
therefore decomposed first, and the next-day values of the resulting subcomponents are then
predicted. To be more precise, if the starting point of the original time series is denoted by 0,
the values of the subcomponents at time 𝑡 are predicted after the original time series consisting of
the values of days (0,1,…,𝑡 ‒ 2,𝑡 ‒ 1) is decomposed. Then the time series consisting of the values
of days (1,…,𝑡 ‒ 1,𝑡) is decomposed again, the values of the subcomponents at time 𝑡 + 1 are
predicted, and the procedure proceeds in this way. This prediction procedure is referred to simply
as a sliding window: the decomposition and prediction operations are performed one after the other
for each window. We suggest that financial time series prediction models with data decomposition be
developed with such an experimental design, especially to guide practitioners.
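The sliding-window procedure described above can be sketched as a walk-forward loop. Everything here is an illustrative assumption: `decompose` and `predict_next` are caller-supplied placeholders, and the toy decomposition (mean plus residual) merely stands in for EMD, which is not implemented in this sketch.

```python
def sliding_windows(n_obs, window):
    """Yield (start, end, target): days start..end-1 are decomposed to predict day `target`."""
    for end in range(window, n_obs):
        yield end - window, end, end

def walk_forward(series, window, decompose, predict_next):
    """Re-decompose each window and sum the next-step predictions of its components."""
    predictions = {}
    for start, end, target in sliding_windows(len(series), window):
        components = decompose(series[start:end])  # uses only observed data
        predictions[target] = sum(predict_next(c) for c in components)
    return predictions

def toy_decompose(values):
    """Placeholder for EMD: split a window into its mean level and the residual."""
    mean = sum(values) / len(values)
    return [[mean] * len(values), [v - mean for v in values]]

def persistence(component):
    """Placeholder predictor: the next value equals the last observed value."""
    return component[-1]

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predictions = walk_forward(series, 4, toy_decompose, persistence)
```

The key property is that the decomposition for day 𝑡 never sees data from day 𝑡 or later, unlike a decomposition of the whole dataset performed beforehand.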
Although ANN predicts time series sufficiently well, LSTM has been shown to be more
successful [36]. In our preliminary experiments, however, no significant difference in accuracy was
observed between ANN and LSTM. For future studies, we suggest exploring the LSTM algorithm
further with fine-tuned hyperparameters. The same inference applies to the XGBoost model. Studies
such as [43] have shown that XGBoost can produce more successful results than RF. RF was adopted in
the second stage of our ensemble model because it was observed to be slightly better than XGBoost
in terms of accuracy rates. Moreover, RF has fewer hyperparameters than XGBoost and is therefore
more straightforward to use. A possible reason for XGBoost’s inferior performance may lie in the
default parameter settings, since XGBoost has many more hyperparameters that need to be fine-tuned
than RF does.
To the best of our knowledge, the LIME algorithm has not previously been used in the literature
in the integrated manner suggested here, together with machine learning techniques, to make direct
daily direction predictions. Integrating LIME in such a context is therefore, in our opinion, an
original contribution to the literature. Thanks to the LIME algorithm’s ability to avoid
“unreliable” predictions of the model and to allow relatively fewer but more reliable predictions,
our proposed EMD-ANNRC-RF-LIME framework has proved to be distinctively successful.
A natural use of such a framework is as a decision-making tool for investors in capital markets,
where buy and sell decisions usually occur synchronously. For instance, in a daily trading operation
in a multi-asset setting, the more time series are predicted in a given period, the fewer the days
on which every prediction is withheld. When more than one time series is predicted in the same
period, predictions will be avoided for some assets and made for others. Therefore, instead of
predicting a single stock or stock market index, it is possible to predict several stocks at the
same time and rely on those that satisfy the reliability condition. In each prediction period,
directional predictions are thus made for the stocks that meet the reliability condition,
predictions that do not meet it are avoided, and the set of predicted stocks changes from period to
period. For further studies, we recommend designing an experiment in which the direction of a large
number of assets is predicted in the same period, as described above.
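The multi-asset idea can be illustrated with a minimal sketch that keeps, for one prediction period, only the assets whose EPP meets the reliability level. The asset names, probabilities and function name are hypothetical and serve only to show the selection logic.

```python
def select_trusted_assets(epp_by_asset, reliability_level):
    """Map each asset whose top-class EPP meets the level to its predicted class."""
    selected = {}
    for asset, epp in epp_by_asset.items():
        label = max(range(len(epp)), key=lambda k: epp[k])
        if epp[label] >= reliability_level:
            selected[asset] = label  # 0 = downward, 1 = upward
    return selected

# Hypothetical EPP values [class 0, class 1] for three assets on one day.
day_epp = {
    "asset_A": [0.97, 0.03],
    "asset_B": [0.39, 0.61],
    "asset_C": [0.10, 0.90],
}
trusted_today = select_trusted_assets(day_epp, 0.85)
```

With more assets, the chance that this selection is empty on a given day becomes smaller, which is the intuition behind the proposed multi-asset experiment.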
References
[1] Ampomah, E. K., Qin, Z., & Nyame, G. (2020). Evaluation of tree-based ensemble machine
learning models in predicting stock price direction of movement. Information, 11(6), 332.
[2] Bisoi, R., Dash, P. K., & Parida, A. K. (2019). Hybrid variational mode decomposition and
evolutionary robust kernel extreme learning machine for stock price and movement prediction on daily
basis. Applied Soft Computing, 74, 652-678.
[3] Börjesson, L., & Singull, M. (2020). Forecasting financial time series through causal and dilated
convolutional neural networks. Entropy, 22(10), 1094.
[4] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of
the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp.
785–794). New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785
[5] Chollet, F., & others. (2015). Keras. GitHub. Retrieved from https://github.com/fchollet/keras
[6] Dash, R., Samal, S., Dash, R., & Rautray, R. (2019). An integrated TOPSIS crow search based
classifier ensemble: In application to stock index price movement prediction. Applied Soft Computing,
85, 105784.
[7] Dragomiretskiy, K., & Zosso, D. (2013). Variational mode decomposition. IEEE transactions on
signal processing, 62(3), 531-544.
[8] Ghorbani, M., & Chong, E. K. (2020). Stock price prediction using principal components. Plos
one, 15(3), e0230124.
[9] Gunduz, H. (2021). An efficient stock market prediction model using hybrid feature reduction
method based on variational autoencoders and recursive feature elimination. Financial Innovation,
7(1), 1-24.
[10] Hao, Y., & Gao, Q. (2020). Predicting the trend of stock market index using the hybrid neural
network based on multiple time scale feature learning. Applied Sciences, 10(11), 3961.
[11] Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). Array programming with NumPy.
Nature, 585, 357-362. https://doi.org/10.1038/s41586-020-2649-2
[12] Heuillet, A., Couthouis, F., & Díaz-Rodríguez, N. (2021). Explainability in deep reinforcement
learning. Knowledge-Based Systems, 214, 106685.
[13] Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., ... & Liu, H. H. (1998).
The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time
series analysis. Proceedings of the Royal Society of London. Series A: mathematical, physical and
engineering sciences, 454(1971), 903-995.
[14] Ismail, M. S., Noorani, M. S. M., Ismail, M., Razak, F. A., & Alias, M. A. (2020). Predicting next
day direction of stock price movement using machine learning methods with persistent homology:
Evidence from Kuala Lumpur Stock Exchange. Applied Soft Computing, 93, 106422.
[15] Jeff Reback, Wes McKinney, jbrockmendel, Joris Van den Bossche, Tom Augspurger, Phillip
Cloud, gfyoung, Sinhrks, Adam Klein, Matthew Roeschke, Simon Hawkins, Jeff Tratner, Chang She,
William Ayd, Terji Petersen, Marc Garcia, Jeremy Schendel, Andy Hayden, MomIsBestFriend, …
Mortada Mehyar. (2020). pandas-dev/pandas: Pandas 1.0.3 (v1.0.3). Zenodo.
https://doi.org/10.5281/zenodo.3715232
[16] Jiang, M., Liu, J., Zhang, L., & Liu, C. (2020). An improved Stacking framework for stock index
prediction by leveraging tree-based ensemble models and deep learning algorithms. Physica A:
Statistical Mechanics and its Applications, 541, 122272.
[17] Jin, Z., Yang, Y., & Liu, Y. (2020). Stock closing price prediction based on sentiment analysis
and LSTM. Neural Computing and Applications, 32(13), 9713-9729.
[18] Lee, M. C., Chang, J. W., Yeh, S. C., Chia, T. L., Liao, J. S., & Chen, X. M. (2022). Applying
attention-based BiLSTM and technical indicators in the design and performance analysis of stock
trading strategies. Neural Computing and Applications, 1-13.
[19] Liu, H., & Long, Z. (2020). An improved deep learning model for predicting stock market price
time series. Digital Signal Processing, 102, 102741.
[20] Liu, S., Zhang, X., Wang, Y., & Feng, G. (2020). Recurrent convolutional neural kernel model
for stock price movement prediction. Plos one, 15(6), e0234206.
[21] Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions.
Advances in neural information processing systems, 30.
[22] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous
activity. The bulletin of mathematical biophysics, 5(4), 115-133.
[23] Nabipour, M., Nayyeri, P., Jabani, H., Shahab, S., & Mosavi, A. (2020). Predicting stock market
trends using machine learning and deep learning algorithms via continuous and binary data; a
comparative analysis. IEEE Access, 8, 150199-150212.
[24] Niu, H., Xu, K., & Wang, W. (2020). A hybrid stock price index forecasting model based on
variational mode decomposition and LSTM network. Applied Intelligence, 50(12), 4296-4309.
[25] Pele, O., & Werman, M. (2008). A linear time histogram metric for improved SIFT matching. In
Computer Vision - ECCV 2008, Marseille, France (pp. 495-508).
[26] Pele, O., & Werman, M. (2009). Fast and robust earth mover’s distances. In Proceedings of the
2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan (pp. 460-467).
[27] Padhi, D. K., Padhy, N., Bhoi, A. K., Shafi, J., & Ijaz, M. F. (2021). A Fusion Framework for
Forecasting Financial Market Direction Using Enhanced Ensemble Models and Technical Indicators.
Mathematics, 9(21), 2646.
[28] Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock and stock price index
movement using trend deterministic data preparation and machine learning techniques. Expert systems
with applications, 42(1), 259-268.
[29] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … others.
(2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct),
2825–2830.
[30] Ribeiro, M. T. (2016). Local Interpretable Model-Agnostic Explanations.
https://lime-ml.readthedocs.io/en/latest/index.html
[31] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the
predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on
knowledge discovery and data mining (pp. 1135-1144).
[32] Rothman, D. (2020). Hands-On Explainable AI (XAI) with Python: Interpret, visualize, explain,
and integrate reliable AI for fair, secure, and trustworthy AI apps. Packt Publishing Ltd.
[33] Shen, J., & Shafiq, M. O. (2020). Short-term stock market price trend prediction using a
comprehensive deep learning system. Journal of big Data, 7(1), 1-33.
[34] Thakkar, A., & Chaudhari, K. (2020). Predicting stock trend using an integrated term frequency–
inverse document frequency-based feature weight matrix with neural networks. Applied Soft
Computing, 96, 106684.
[35] Thomas A Caswell, Michael Droettboom, Antony Lee, Elliott Sales de Andrade, Tim Hoffmann,
Jody Klymak, John Hunter, Eric Firing, David Stansby, Nelle Varoquaux, Jens Hedegaard Nielsen,
Benjamin Root, Ryan May, Phil Elson, Jouni K. Seppänen, Darren Dale, Jae-Joon Lee, Damon
McDougall, Andrew Straw, … Paul Ivanov. (2022). matplotlib/matplotlib: REL: v3.5.2 (v3.5.2).
Zenodo. https://doi.org/10.5281/zenodo.6513224
[36] Wu, D., Wang, X., Su, J., Tang, B., & Wu, S. (2020). A labeling method for financial time series
prediction based on trends. Entropy, 22(10), 1162.
[37] Xiao, J., Zhu, X., Huang, C., Yang, X., Wen, F., & Zhong, M. (2019). A new approach for stock
price analysis and prediction based on SSA and SVM. International Journal of Information
Technology & Decision Making, 18(01), 287-310.
[38] Xu, F., & Tan, S. (2021). Deep learning with multiple scale attention and direction regularization
for asset price prediction. Expert Systems with Applications, 186, 115796.
[39] Yang, C., Zhai, J., & Tao, G. (2020). Deep learning for price movement prediction using
convolutional neural network and long short-term memory. Mathematical Problems in Engineering,
2020.
[40] Yang, Y., Hu, X., & Jiang, H. (2022). Group penalized logistic regressions predict up and down
trends for stock prices. The North American Journal of Economics and Finance, 59, 101564.
[41] Yujun, Y., Yimei, Y., & Jianhua, X. (2020). A hybrid prediction method for stock price using
LSTM and ensemble EMD. Complexity, 2020.
[42] Yujun, Y., Yimei, Y., & Wang, Z. (2021). Research on a hybrid prediction model for stock price
based on long short-term memory and variational mode decomposition. Soft Computing, 25(21),
13513-13531.
[43] Yun, K. K., Yoon, S. W., & Won, D. (2021). Prediction of stock price direction using a hybrid
GA-XGBoost algorithm with a three-stage feature engineering process. Expert Systems with
Applications, 186, 115716.
[44] Zhang, X., Gu, N., Chang, J., & Ye, H. (2021). Predicting stock price movement using a DBN-
RNN. Applied Artificial Intelligence, 35(12), 876-892.
[45] Zhou, F., Zhou, H. M., Yang, Z., & Yang, L. (2019). EMD2FNN: A strategy combining
empirical mode decomposition and factorization machine based neural network for stock market trend
prediction. Expert Systems with Applications, 115, 136-151.
[46] Zhou, Z., Gao, M., Liu, Q., & Xiao, H. (2020). Forecasting stock price movements with multiple
data sources: Evidence from stock market in China. Physica A: Statistical Mechanics and its
Applications, 542, 123389.

Electronic copy available at: https://ssrn.com/abstract=4170455
