


Received 19 June 2023, accepted 11 July 2023, date of publication 17 July 2023,
date of current version 27 September 2023.
Digital Object Identifier 10.1109/ACCESS.2023.3296308

A Stock Price Prediction Method Based on BiLSTM and Improved Transformer
SHUZHEN WANG
School of Information Science and Technology, Xiamen University Tan Kah Kee College, Zhangzhou 363105, China
Hongwang Laboratory, School of Information Science and Technology, Xiamen University, Zhangzhou 363105, China
e-mail: wangsz@xujc.com
This work was supported by the Educational Research Project of Young and Middle-Aged Teachers of Fujian Provincial Department of
Education under Grant JAT191086.

ABSTRACT How to maximize shareholder returns has always been a focus of research in the financial
field. To improve the accuracy and stability of stock price prediction, this article proposes a new
method, BiLSTM-MTRAN-TCN. The transformer model is improved and a TCN (Temporal Convolutional
Network) is introduced to construct a new transformer model (MTRAN-TCN) suited to stock price
prediction. The method consists of BiLSTM (Bi-directional Long Short-Term Memory) and MTRAN-TCN,
and can fully utilize the advantages of the three models: BiLSTM, transformer, and TCN. The
transformer is good at capturing information across the full range of distances in a sequence, but
its ability to capture sequence information is weak. BiLSTM can capture bidirectional information
in sequences, while TCN can capture sequence dependencies and improve the model's generalization
ability. The experiments verify both the effect of the transformer improvements and the benefit of
introducing the BiLSTM model, and the method as a whole is validated on 5 index stocks and 14
Shanghai and Shenzhen stocks. Compared with other existing methods in the literature, this method
has the best fit on each index stock, and its R² is the best on 85.7% of the stock datasets. RMSE
decreases by 24.3% to 93.5%, and R² increases by 0.3% to 15.6%. In addition, the method has
relatively stable prediction performance across different time periods and does not suffer from
timeliness issues. The results indicate that the BiLSTM-MTRAN-TCN method performs better in
predicting stock prices, with high accuracy and generalization ability.

INDEX TERMS Transformer, BiLSTM, TCN, stock price prediction, deep learning, hybrid neural network.
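
The abstract reports its results in terms of RMSE and R2 (the coefficient of determination, R²). As a reminder of what these two metrics measure, the following is a minimal sketch in plain Python. The function names and the sample prices are illustrative assumptions, not taken from the article, and the article's own evaluation code may differ.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences (lower is better)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2_score(actual, predicted):
    """Coefficient of determination: 1.0 means the predictions fit the data perfectly."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical closing prices and model predictions, for illustration only.
actual = [10.0, 11.0, 12.0, 13.0]
predicted = [10.5, 10.5, 12.5, 12.5]

print("RMSE:", rmse(actual, predicted))
print("R2:", r2_score(actual, predicted))
```

A lower RMSE and an R2 closer to 1 both indicate a better fit, which is the direction of improvement the abstract describes.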

The associate editor coordinating the review of this manuscript and approving it for publication was Yiqi Liu.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023 104211

I. INTRODUCTION
For decades, stock price prediction has attracted attention from investors and researchers due to
its enormous value [1]. More and more investors pay attention to the changing trend of stock
prices [2]. For economists, predicting stock price changes in advance is a very important task [3],
[4], because it can help investors maximize their investment income. However, due to the high
volatility of the stock market and the impact of random noise, the trend is complex and difficult
to predict [5]. Although financial time series are difficult to predict, they generally show some
predictability, and forecasting them remains an essential task.

Among early approaches, the best known is the autoregressive integrated moving average (ARIMA)
model [6]. Later, Narendra et al. [7] applied the ARIMA model and the GARCH (generalized
autoregressive conditional heteroskedasticity) model to forecast NSE Indian stock market data. In
addition to these two models, there are the Bayesian vector autoregression model and the Kalman
filter model. Although these techniques can be used successfully for short-term prediction, they
are not suitable for nonlinear problems and have poor long-term prediction performance [8]. To
solve this problem, machine learning methods were introduced to analyze time series, and they have
been successfully applied to stock price forecasting [2], [9], [10]. The advantages of machine
learning in processing complex and large amounts of data have overcome many limitations of
traditional methods [9]. Machine learning methods include support vector machine (SVM), decision
tree, naive Bayes, random forest, etc. Wang et al. [11] combined decision trees and SVM models to
predict future price trends. Chen et al. [12] established a feature-weighted SVM and K-nearest
neighbor algorithm to predict the stock market index. Experimental results showed that the model
has good short-term, medium-term, and long-term prediction capabilities. In 2021, Yan Zhengxu et
al. [13] proposed a new combined model that builds on random forest using the Pearson coefficient
to perform short-term regression forecasting of stock prices.

In recent years, deep learning methods that rely solely on datasets have made it possible to
predict stock prices without the need for professional knowledge. Therefore, their application in
the field of stock prediction has gradually become a research hotspot. Deep learning methods
include the gated recurrent unit (GRU), recurrent neural network (RNN), convolutional neural
network (CNN), long short-term memory (LSTM), and bidirectional long short-term memory (BiLSTM). In
2017 and 2018, wavelet neural networks and CNN were successively used for stock price prediction
[14], [15]. Experiments have shown that CNN is effective in predicting time series. In 2018, the
Conv1D-LSTM model was proposed, which combines a one-dimensional CNN and LSTM. It integrates the
advantages of the two networks: CNN can effectively extract features, and LSTM can process sequence
data well. The results indicate that its predictions are more accurate than those of machine
learning prediction models [16]. In 2019, Yang Qing et al. conducted predictive research on global
stock indexes using BiLSTM, indicating that BiLSTM has excellent predictive accuracy and strong
generalization ability [17]. In 2020, one study [10] used an LSTM regression model to forecast
India's NIFTY 50 index. The results showed that LSTM models based on deep learning performed better
than traditional machine learning methods.

In most current studies, the attention mechanism has become the main structure for financial market
forecasting, because it focuses on the key positions that have a greater impact on the results. In
2020, Lu et al. introduced an attention mechanism (AM) on top of CNN and BiLSTM and proposed the
CNN-BiLSTM-AM model, which proved to be more accurate than the existing models [8]. In 2022, a
wavelet transform was used to denoise historical stock data in a method based on LSTM and an
attention mechanism [18]. Moreover, the transformer is the state-of-the-art model based on the
attention mechanism, and it was proposed for sequence modeling [19]. New methods based on the
transformer were proposed to tackle the stock movement prediction task. In 2020, Ding et al.
demonstrated that a transformer-based model enhanced with a Multi-Gaussian prior can be used for
stock movement prediction [20]. In 2021, a transformer neural network based on self-attention was
proposed, which is particularly capable of forecasting time series; electricity consumption and
traffic data were used to validate the proposed model [21]. In 2022, Zhang et al. proposed a novel
transformer encoder-based attention network framework that fuses media text and stock price, and it
was shown to be effective in predicting the rise or fall of stock prices [8]. In 2022, Peng et al.
used a data organization method with LSTM and transformer to predict Chinese bank stock prices
[22]. In 2022, Wang used the latest deep learning framework, the transformer, to predict the stock
market index and demonstrated that the transformer can outperform other classical methods [23].

We can see that different LSTM-based and transformer-based models have been proposed for stock
prediction. So far, however, it is rare to use transformer-based models to predict price without
considering the significance of stock data or social media text. Most transformer-based models are
used to predict whether a stock will go up or down, rather than to predict the stock price itself.
Furthermore, most of the proposed methods target only specific stocks or a single stock index, and
the prediction models have timeliness issues. Therefore, there is still a lot of room for
optimization in terms of accuracy and depth in the network structure of stock prediction models.

To improve the stability and accuracy of stock price prediction, this article proposes a novel
transformer-based method, BiLSTM-MTRAN-TCN. It can predict not only index prices but also
individual stock prices. The method is formed by introducing BiLSTM and TCN (Temporal Convolutional
Network) on the basis of the transformer encoder. The transformer is a mechanism that can extract
deep features of small samples to obtain key information. BiLSTM can capture bidirectional
information in sequences, while TCN can capture sequence dependencies and improve the model's
generalization ability.

The main work completed in this article is as follows:
• The transformer model is improved and TCN is introduced to construct a new transformer model
  (MTRAN-TCN), making it suitable for stock price prediction.
• The method consists of BiLSTM and MTRAN-TCN, which can fully utilize the advantages of the three
  models: BiLSTM, transformer, and TCN.
• A bidirectional stock selection strategy is proposed to select the stocks used as experimental
  subjects.
• The method is compared with other existing models in the literature to verify its effectiveness.
• Experimental results show that the proposed method has good generalization ability and solves the
  problem of timeliness.

II. ALGORITHM INTRODUCTION
A. TRANSFORMER
The transformer is a classic NLP model proposed by Google's team in 2017 [24], and BERT, which is
popular now, is also based on the transformer. The transformer uses the self-attention mechanism
and does not use the RNN sequential structure, so that the model can be parallelized and have
