MSC Thimmaiah K 2024 PDF
Kavya Thimmaiah
Supervised by
January 2024
DECLARATION
I hereby declare that the dissertation I have submitted, “Perform the
Stock Prediction Using the Sentiment Analysis and Time Series Forecasting
Approaches to Determine the Optimal One”, to Dublin Business School for the
award of a master’s degree in Financial Analytics, under the supervision of Mr.
Heikki Laiho, is solely the result of my own work; collaborative contributions have
been acknowledged and are explicitly referenced in the text. This work has not
been submitted to any other university or college for the award of a degree.
ACKNOWLEDGEMENT
Secondly, I would like to thank my parents and the members of my family for
their support throughout my career pursuit.
ABSTRACT
Stock returns are affected by a variety of factors, among which the social media remarks
of public figures are one of the more important influences on stock market trends. The
latest news about a company's products also matters. In this paper, we determine
the sentiment type of public figures' social media remarks from the perspective of textual
sentiment and compare them with the stock chart of the day to analyse the connection
between the two. Specifically, we first construct a dataset of public figures' social media
remarks and classify their sentiment types. We then train the BERT network model so that
it can judge the sentiment type of a new remark when it is input, which serves as a
basis for stock prediction. The experiment shows that a public figure's remarks and the
news have a strong impact on stock trading on the same day, but little impact over the
long term. In addition, the more influential the public figure is, the more obvious
the impact on the stock. The development and wealth of countries depend heavily on the
stock market. Data mining and artificial intelligence methods are required to analyse stock
market data. The financial success of particular businesses is one of the important factors
that has a significant impact on stock price volatility. However, news reports also have a
significant impact on how the stock market moves. In this research, we apply sentiment
classification to non-numerical data, such as financial news articles, to forecast a
company's future stock trend. We seek to cast light on the effect of news reports on the
stock market by analysing the connection between news and stock movement. Our study
seeks to advance knowledge of the function of news sentiment in forecasting stock market
trends. The dataset used in this study consists of news headlines from the financial news
website, Financial Times, and the prediction task is to classify the direction of the stock price
changes as either positive or negative. The purpose of this study is to evaluate the
effectiveness of sentiment analysis for stock prediction and to compare the performance of
different algorithms.
Table of Contents
DECLARATION
ABSTRACT
4. IMPLEMENTATION
5. CONCLUSION
6. FUTURE WORK
Table of Figures
Figure 1: LSTM Model
Figure 2: BERT Algorithm
Figure 3: Prediction Using the Time-Series Approach
Figure 4: Prediction using the Sentiment Analysis Approach
1. THESIS INTRODUCTION
Stocks are credit instruments on the financial market that can be used for
transfer and trading, offering the opportunity to make a profit on the one hand,
and the risk of loss on the other. Yield and liquidity are the two most important
factors of stocks, which reflect the laws of economic functioning and are also one
of the most important indicators of the securities market. Standard financial
theory, based on the efficient market assumption, assumes that markets are fully
transparent and fairly competitive. However, this assumption is far from reality.
Many studies have shown that investors' information acquisition and cognitive
biases in interpreting information affect their decision-making behavior in the stock
market, which in turn affects stock returns and liquidity.
Stock returns and liquidity are affected by various factors. Most individual
investors lack trading experience, judge the market highly subjectively, and are
easily influenced by the public media and the people around them, producing a
very pronounced "herd effect". Public figures in particular express their
views through social media platforms such as Twitter, Snowball and Collingwood,
which can easily cause emotional fluctuations among investors and affect their
judgment of the stock market trend. The social media remarks of public
figures therefore attract a high degree of attention, and it is of great significance
to study the impact of these remarks on the stock market and to predict the trend
of the stock market.
Considering that the comments scattered on the Internet media are a large
amount of unstructured real-time text information, it is natural to think of using
computers to help process them. Text mining techniques can uncover meaningful
and unknown knowledge in large amounts of textual information, and can be used
to analyze the emotions and sentiments of public figures' social media comments.
The use of computers to characterize and process textual information on the Web
has advanced the field of finance. The rise of the internet and social media
platforms has brought about an explosion of user-generated content on the web,
such as blogs, social media posts, forums, and product reviews [1]. As a result,
businesses and organizations face a daunting task of analyzing this unstructured
data to understand their customers' needs, preferences, and sentiments towards
their products or services. Sentiment analysis, also known as opinion mining, is a
technique used to extract subjective information from text data [2].
We will use a dataset of product reviews and news about Apple as the source
of our data. The dataset contains tweets and news content gathered from
different sources. We will preprocess the data to remove noise,
transform the text into numerical features using Count Vectorizer and Tfidf
Vectorizer, and apply classification models to predict the sentiment of
the reviews. Using measures like accuracy, precision, recall, and F1-score, we will
assess the effectiveness of the machine learning algorithms. Furthermore, we will
visualize the results using confusion matrices to gain insights into the performance
of the classifiers. Finally, we will compare our results with existing studies in the
field and draw conclusions on the effectiveness of these algorithms for sentiment
analysis.
1.1 Stakeholder:
The stakeholders of the research application are stock market investors
who would like to predict stock price trends as accurately as possible.
At present, the application performs the prediction only for
Apple stock. However, the dataset could be replaced with data for a different
stock and the prediction performed in the same way as for Apple stock. Any
customization required must be flagged by the stakeholders so that the solution
can be made more generic.
The approach that uses the sentiment analysis model performs more accurate
stock price prediction when compared to the model that considered only the
historical data.
1.3 Aim & Objective:
Predicting the price trend of a specific stock means forecasting whether its
price will rise or fall over a future horizon such as a day, a week or a month. In
other words, taking t as the current day, the objective is to predict whether there
will be an increase or decrease in its price on day t + n. This scenario can be seen
as a binary classification problem, where the goal is to predict one of two classes
based on the input data. The binary predictions are then used to predict the stock
price trend.
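As a minimal sketch of this labelling step, the following Python snippet derives the binary target from a series of closing prices. The function name trend_labels, the sample prices and the use of the closing price as the reference value are illustrative assumptions and not the implementation used in this work.

```python
def trend_labels(closes, n=1):
    """Label day t with 1 if the close on day t + n is higher than on day t, else 0.

    The last n days have no label because their future price is unknown.
    """
    return [1 if closes[t + n] > closes[t] else 0 for t in range(len(closes) - n)]


# Hypothetical closing prices for five consecutive trading days.
closes = [100.0, 101.5, 101.0, 103.2, 102.8]
print(trend_labels(closes, n=1))  # [1, 0, 1, 0]
print(trend_labels(closes, n=2))  # [1, 1, 1]
```

A larger n gives a longer prediction horizon but fewer labelled days, since the final n observations cannot be labelled.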
In this work, we propose an approach for predicting stock price movements
using traditional classifiers, treating each stock as a separate dataset, and
compare it with a sentiment analysis approach. More specifically, we tune
hyperparameters and train a specialized model for each stock. Our methodology follows
a technical analysis approach that uses as features the percentage change of prices
(closing, opening, high, low) and volume relative to their previous-day values.
Additionally, we incorporate the percentage change of these indicators
relative to their 5-day, 10-day, and 15-day moving averages. We claim that
specialized models for each stock, combined with feature extraction and data
preprocessing, play crucial roles in classifier performance.
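The features described above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the moving average is assumed to be a simple trailing average that includes the current day, and the same functions would be applied separately to each price column and to volume.

```python
MA_WINDOWS = (5, 10, 15)  # moving-average windows named in the text


def pct_change(series):
    """Percentage change of each value relative to the previous day's value."""
    return [(series[i] - series[i - 1]) / series[i - 1] * 100.0
            for i in range(1, len(series))]


def moving_average(series, window):
    """Simple trailing moving average; entry k covers `window` consecutive days."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def pct_vs_moving_average(series, window):
    """Percentage deviation of each value from its trailing moving average."""
    averages = moving_average(series, window)
    aligned = series[window - 1:]  # first day that has a full window
    return [(value - avg) / avg * 100.0 for value, avg in zip(aligned, averages)]


# Hypothetical closing prices.
closes = [10.0, 10.0, 10.0, 10.0, 20.0, 18.0]
daily_change = pct_change(closes)
deviation_5d = pct_vs_moving_average(closes, MA_WINDOWS[0])
```

Because the windows differ in length, the resulting feature columns have to be aligned on the latest common start day before they are passed to a classifier.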
changes. Stock market prediction is an important area in finance that is always in
demand. Stock prices are influenced by a wide range of factors such as economic
indicators, company performance and news. One of the key factors that can
influence stock prices is investor sentiment. Sentiment analysis is a very efficient
technique which can be used to analyze the emotions and opinions of investors
towards a particular stock, which can help predict future stock prices. In this report,
we will explore the use of sentiment analysis in stock prediction.
Sentiment analysis with the LSTM and BERT algorithms can be a valuable tool
for stock prediction. By analyzing investor sentiment, we can gain insights into
market trends and make informed investment decisions. The LSTM is an efficient
and powerful machine learning algorithm that can combine sentiment data with
other factors to predict future stock prices with high accuracy. As the fields of NLP
and machine learning continue to advance, we can expect sentiment analysis with
the LSTM algorithm to become an even more powerful tool for stock prediction.
Stock prediction is a challenging task that is of great interest to investors
and financial analysts. Sentiment analysis has emerged
as a popular technique for predicting stock prices, given the increasing availability
of text data such as news, social media posts and articles. The
premise is that the sentiment expressed in these texts can reveal information about
market sentiment and expectations, which can be used to make predictions
about the direction or magnitude of stock price changes.
1.5 Limitations of the Solution:
2. RELATED WORK
The text sentiment classification problem can be addressed with any text
classification algorithm, such as K Nearest Neighbor, Naïve Bayes, Fisher's
Discriminant Criterion, Support Vector Machines and so on. Reference [5] proposed
an improved k Nearest Neighbor classification method and compared it with three
other classification algorithms on experimental data. The experiments
show that the improved method performs the best among the KNN classification
methods, with an accuracy of 11.5% and a precision of 20.3%.
transport capsule network model which is capable of transferring knowledge
obtained at the document level to the aspect level for classification based on the
sentiment detected in the text. The routing approach is extended to group
semantic capsules for use in a transfer learning framework.
analyze the weights of short texts and accurately weights the sums to finally obtain
the category output of the short texts. Given the characteristics of microblog
comments, such as few words and short sentences, this study achieved good
results by improving the lexical model. Experimental results show that the
algorithm proposed in this study outperforms other algorithms in terms of
classification accuracy and recall in the aspect-level sentiment categorization
task. From the above related work, it can be seen that deep learning is considerably
superior to traditional machine learning methods in terms of accuracy in text
categorization, so we choose a deep learning method as the classification model
in this study.
The importance of sentiment research in predicting stock prices has grown.
The creation of several algorithms for predicting stock prices based on sentiment
analysis of news stories and social media posts has been facilitated by recent
developments in natural language processing. In this study, we test the
effectiveness of three components for sentiment analysis on the daily newspaper
headlines of a particular company: the Count Vectorizer and Tfidf Vectorizer
feature extractors and the Naive Bayes classifier. Previous research has looked
into the use of sentiment analysis to forecast
stock values. Li and Li (2011) developed a regression model that beat a benchmark
model using sentiment analysis on data from stock message boards. Zhang et al.
(2011) found that sentiment analysis increased the precision of prediction models
when applied to financial news stories. Machine learning algorithms like Naive
Bayes, Support Vector Machines (SVM), Random Forest, and XGBoost have been
extensively used in recent years to predict stock prices using sentiment analysis.
According to tests by Akita and Kitagawa (2016) on financial news articles, the
most successful machine learning algorithms were Naive Bayes, SVM, and Random
Forest [20].
Common natural language processing methods for feature extraction from text
include Count Vectorizer and Tfidf Vectorizer. The Count Vectorizer turns
text documents into a feature matrix of token counts, whereas the Tfidf Vectorizer
transforms text documents into a feature matrix of term frequency-inverse
document frequency (TF-IDF) values. Overall, sentiment analysis has shown
promise for predicting stock prices, and earlier studies have shown the
effectiveness of Count Vectorizer and Tfidf Vectorizer features combined with
machine learning algorithms like Naive Bayes. By applying these methods to daily
news headlines and offering insights into their efficacy in sentiment analysis for
stock price prediction, this project seeks to advance existing research [21].
3. APPROACH AND METHODOLOGY
Natural language processing (NLP) techniques are used in this project's
methodology and strategy to evaluate sentiment in texts from a variety of sources,
including news articles, social media posts, and financial reports. The research
contrasts various sentiment analysis algorithms, such as lexicon-based methods
and machine learning models. The effect of various sentiment characteristics,
including polarity and subjectivity, on the precision of stock price prediction is also
examined by the writers. The methodology of the research includes data collection
and preprocessing, model training and testing, and evaluation of performance
measures like recall, accuracy, and F1 score. The results of this research can have
significant ramifications for financial decision-making and offer information on how
well various sentiment analysis techniques can forecast stock prices.
3.1 Implemented Components:
• Vectorizer
• CountVectorizer
CountVectorizer is a tool for transforming text data into a table of word frequency
counts. The table has a column for each unique word in the vocabulary and a row
for each document. Each cell holds the number of times a particular
word appears in a given document. This approach builds a vocabulary from the
training data, and each document is represented as a vector whose length equals
the number of terms in the vocabulary and whose entries are the counts of those terms.
The resulting vectors are then used as input to the Random Forest algorithm for
making predictions.
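The count table described above can be illustrated with a small pure-Python sketch. It is a simplification for illustration only: tokens are split on whitespace, the function names and example headlines are hypothetical, and a library vectorizer applies its own tokenization rules.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct lower-cased token to a column index (alphabetical order)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vectorize(documents, vocabulary):
    """Return one row per document; each row has one count per vocabulary term."""
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        rows.append([counts.get(tok, 0) for tok in vocabulary])
    return rows


docs = ["Apple shares rise", "Apple shares fall as shares slide"]
vocab = build_vocabulary(docs)
matrix = count_vectorize(docs, vocab)
print(list(vocab))  # ['apple', 'as', 'fall', 'rise', 'shares', 'slide']
print(matrix)       # [[1, 0, 0, 1, 1, 0], [1, 1, 1, 0, 2, 1]]
```

Every row has the same length as the vocabulary, so the rows can be stacked into a matrix and passed to a classifier.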
• TF-IDF Vectorizer
Tf-idf Vectorizer is another type of vectorizer that converts text data into a matrix
of term frequency-inverse document frequency (TF-IDF) values. This method is
similar to CountVectorizer, but it takes into account the importance of each word
in the document and in the corpus as a whole. Each word is represented
numerically by its TF-IDF weight. Under this technique, a word's significance in a
document is assessed based on both its frequency within the document and its
rarity across the corpus of documents. The resulting vectors are then used as input
to the Random Forest algorithm for making predictions.
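A minimal sketch of the TF-IDF weighting follows, using the textbook formula tf × log(N / df). This is an assumption made for illustration: library implementations typically add smoothing and normalisation, so their numerical values differ, and the function name and example documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token occurs at least once.
    doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            tok: (count / len(tokens)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return weights


docs = ["apple shares rise", "apple shares fall"]
weights = tf_idf(docs)
# "apple" occurs in every document, so its weight is 0; "rise" is specific
# to the first document and receives a positive weight.
```

Words that appear in every document carry no discriminating information under this weighting, which is the intended difference from raw counts.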
• BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained
language model proposed by Google that is often used in natural language
processing tasks such as text classification. It is built on the Transformer encoder
and is pre-trained on unlabeled text with the masked language model and next
sentence prediction tasks, after which it is fine-tuned on labeled data for the
target task. Unlike Naive Bayes, BERT does not assume that the features are
independent of each other: it represents each token in the context of the whole
input sequence. It operates on its own tokenized input rather than on Count
Vector or TF-IDF Vector features.
In summary, there are two feature extraction methods, Count Vector and TF-IDF
Vector, that can be utilized with the LSTM algorithm. This algorithm is powerful and
can be applied to both classification and regression tasks. Additionally, the Naive
Bayes algorithm is a simple probabilistic classifier that can work with various
feature extraction methods, including Count Vector and TF-IDF Vector. This
algorithm applies Bayes' theorem while making strong independence assumptions.
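To illustrate how Bayes' theorem and the independence assumption combine, the following is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The function names, the whitespace tokenization and the four training headlines are hypothetical; it is not the classifier configuration used in this work.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Collect per-class document counts and per-class word counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for doc, label in zip(documents, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def predict_naive_bayes(model, document):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for tok in document.lower().split():
            if tok not in vocabulary:
                continue  # ignore words never seen in training
            # Independence assumption: word likelihoods are simply multiplied
            # (added in log space); +1 is Laplace smoothing.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


headlines = ["shares surge on strong sales", "record profit lifts shares",
             "shares plunge on weak sales", "lawsuit hits profit outlook"]
labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(headlines, labels)
print(predict_naive_bayes(model, "strong profit surge"))  # positive
print(predict_naive_bayes(model, "weak outlook plunge"))  # negative
```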
3.2 Methodology:
Data Collection: The data collection phase of this project is the first stage. In this
instance, the information was gathered from a daily newspaper's headlines about
a specific business. In this step, historical stock prices and relevant news articles or
tweets are collected. The news articles or tweets are preprocessed by cleaning the
text, removing stop words, and stemming or lemmatizing the words. The historical
stock prices are adjusted for splits and dividends, and any missing values are
imputed.
Data Cleaning: The collected data is then cleaned to remove any unwanted
characters, punctuation marks, and stop words [8]. The data is then converted to
lowercase, and any other pre-processing techniques are applied, such as stemming
or lemmatization.
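A minimal sketch of the cleaning step is given here, using only the Python standard library. The stop-word set is a small illustrative subset, the function name is hypothetical, and stemming or lemmatization is omitted because it would require an external NLP library.

```python
import string

# Illustrative subset only; a real stop-word list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "for"}


def clean_text(text):
    """Lower-case the text, strip ASCII punctuation and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


print(clean_text("Apple unveils the new iPhone, and shares rise!"))
# ['apple', 'unveils', 'new', 'iphone', 'shares', 'rise']
```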
occurrences of each word in the document, as opposed to TfidfVectorizer, which
calculates the term frequency-inverse document frequency of each word.
Model Evaluation: Following training, the models are assessed using measures like
accuracy, precision, recall, and F1-score. The best-performing model is then
chosen for further examination. In this step, machine learning models, including
LSTM, are trained on the input features to predict the direction or magnitude of
stock price changes. Models are evaluated using appropriate metrics such as
accuracy, precision, recall, F1-score and AUC-ROC. Models can also be
backtested against historical data to assess their performance in a simulated trading
environment.
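For reference, the four threshold-based metrics named above can be computed from true and predicted labels as in the following sketch. The function name and the example labels are hypothetical, and the convention of returning 0.0 when a denominator is zero is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = price rises, 0 = price falls.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# 3 of 5 predictions are correct; 2 true positives, 1 false positive,
# 1 false negative, so precision and recall are both 2/3.
```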
Results and Interpretation: In this step, the results of the experiments are
presented and interpreted. The performance of the machine learning models is
compared to a baseline such as a random walk or a buy-and-hold strategy. The
contribution of each input feature to the prediction performance is also analyzed
using techniques such as feature importance or partial dependence plots. The
limitations and potential future directions of the approach are also discussed.
Hyperparameter Tuning: In this step, the hyperparameters of the machine
learning models are tuned using techniques such as grid search or
random search. The hyperparameters control the complexity of the
models, which can have a large impact on their performance. The tuning
process is typically performed using a separate validation set to prevent overfitting.
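Grid search with a separate validation set can be sketched as follows. The grid_search helper, the toy threshold model and the data are all hypothetical and serve only to show the mechanics: every combination is fitted on the training split and compared on the validation split.

```python
from itertools import product


def grid_search(param_grid, fit, score, train, validation):
    """Try every hyperparameter combination and keep the best validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit(train, **params)        # fit on the training split only
        current = score(model, validation)  # compare on held-out validation data
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Toy model: predict "up" (1) when a sentiment score exceeds the mean training
# score plus a margin. The margin is the hyperparameter being tuned.
def fit_threshold(train, margin):
    center = sum(x for x, _ in train) / len(train)
    return center + margin


def accuracy(threshold, data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)


train = [(-0.6, 0), (0.4, 1)]
validation = [(-0.3, 0), (0.1, 0), (0.3, 1), (0.8, 1)]
best_params, best_score = grid_search(
    {"margin": [-0.4, 0.1, 0.3, 0.6]}, fit_threshold, accuracy, train, validation
)
print(best_params, best_score)  # {'margin': 0.3} 1.0
```

Random search follows the same pattern but samples combinations instead of enumerating all of them, which is cheaper when the grid is large.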
Sentiment Analysis: The selected model is then used to predict the sentiment of
each headline in the dataset. The sentiment of a headline is determined by the
probability of the headline belonging to a positive, negative, or neutral class. In this
step, sentiment scores are calculated for each news article or tweet using
techniques such as VADER or TextBlob. The sentiment scores can be positive,
negative, or neutral, or they can be on a continuous scale. The sentiment scores
are then combined with other features, such as the stock price changes, trading
volumes, and technical indicators, to create a set of input features for the machine
learning models.
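The idea of a lexicon-based sentiment score on a continuous scale can be sketched as follows. This toy scorer is not VADER or TextBlob: the word lists, the neutral band of ±0.05 and the function names are illustrative assumptions.

```python
# Tiny illustrative lexicons; real tools ship far larger, weighted lexicons.
POSITIVE = {"gain", "surge", "beat", "strong", "record"}
NEGATIVE = {"loss", "plunge", "miss", "weak", "lawsuit"}


def lexicon_score(text):
    """Return a score in [-1, 1]: (positive hits - negative hits) / all hits."""
    tokens = text.lower().split()
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits


def to_label(score, band=0.05):
    """Map a continuous score to positive, negative or neutral."""
    if score > band:
        return "positive"
    if score < -band:
        return "negative"
    return "neutral"


headline = "apple posts record profit after strong sales"
score = lexicon_score(headline)
# The score becomes one column of the model input, next to price-based
# features such as the daily percentage change (hypothetical value here).
feature_row = [score, 1.2]
print(score, to_label(score))  # 1.0 positive
```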
Stock Price Prediction: Finally, the sentiment of the headlines is used to predict the
stock price of the company. This is done by analyzing the correlation between the
sentiment of the headlines and the stock price movement of the company.
Data Analysis and Exploration is an important step in any data science project
as it allows for a thorough understanding of the data and identification of any
potential issues or trends. In this project, the data analysis and exploration will
focus on the stock prices and the sentiment of news articles.
To comprehend the general trend and any fluctuations, the stock prices will
first be visualised using a variety of plots, including line plots, bar plots, and
histograms. The computation of summary statistics like mean, median, and
standard deviation will also help us better understand the distribution of stock
values.
The sentiment of the news stories will then be examined using a variety of
methods, including sentiment analysis, text mining, and word clouds. This will make
it possible to comprehend the general tone of the news stories and any possible
relationships to the stock prices.
Additionally, the correlation between the stock prices and the sentiment of
the news articles will be analyzed. This will be done by calculating the correlation
coefficient between the two variables and visualizing the results using scatter plots.
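The correlation coefficient referred to here is assumed to be the Pearson coefficient. A minimal sketch of its computation follows, with hypothetical daily sentiment values and price changes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily sentiment scores and same-day percentage price changes.
sentiment = [0.8, -0.4, 0.1, 0.5, -0.7]
price_change = [1.5, -0.9, 0.2, 0.7, -1.1]
r = pearson(sentiment, price_change)
```

A value of r close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates none; correlation alone does not show that sentiment causes the price movement.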
In this project, the data will be analyzed using different algorithms such as
Random Forest with Count Vector, LSTM with TF-IDF Vector, and BERT. It is
therefore important to understand how each algorithm performs on the data by
comparing the accuracy, precision, recall and F1-score of each algorithm.
The preprocessed data will then be split into training and test sets. The
feature extraction methods will be applied to the training data in order to
produce the feature vectors that will serve as the
input for the algorithms.
The scikit-learn Python library will then be used to build the algorithm. The
training data will be used to train the algorithm, and the test data will be
used to make predictions. Performance will be assessed using
the algorithm's accuracy, precision, recall, and F1-score. Similarly, the LSTM
algorithm will also be implemented and evaluated using the same metrics.
The Count Vector and TF-IDF Vector feature extraction methods will be used
in this project to build the Random Forest algorithm. This will make it possible
to compare how the algorithm performs with the two distinct techniques and
determine which technique provides greater accuracy. Finally, the results of the
Random Forest algorithm with Count Vector and TF-IDF Vector will be compared
with the results of the LSTM algorithm to see which algorithm gives better accuracy.
4. IMPLEMENTATION
First, a stock is selected for which the models are developed
using the sentiment analysis-based and historical data-based prediction
approaches. The selection of the stock is based on the news and
tweets datasets available online concerning the stock.
4.1 Dataset:
1. To apply the sentiment analysis approach, external factors such as
news and people's opinions from social media sites like Twitter/X are required.
On top of that, a historical stock price dataset is required for model training
and validation.
2. For the historical data-based analysis, the historical stock price
dataset alone is sufficient.
3. The stock price dataset can be downloaded from the Yahoo Finance portal.
News and tweets can be downloaded from sources like Kaggle.
Time series analysis is a statistical tool suited to a significant class of
longitudinal study designs. These designs frequently involve single individuals
or research units that are measured regularly at scheduled times over a sizable
number of observations. Time series analysis can be regarded as one example
of a longitudinal design. It can be used to analyze the results of either
a controlled or an accidental intervention, as well as to understand the underlying
dynamic process and the patterns of change over time. The ability to examine
longitudinal data collected on specific persons or units has advanced
significantly thanks to modern statistical time series analysis and
associated research techniques. Early time series methods depended mainly on
visualization to describe and understand results, particularly when applied to
psychology. The capacity to apply sophisticated mathematical methods to this
sort of data has transformed the field of single-subject research, even though
graphical tools are still helpful and continue to give significant additional insight
into the understanding of a time series process.
The research in this paper mainly uses text mining technology to
capture the social media remarks of public figures in related industries. After
tweet crawling, the word vector space is constructed and quantified, and the
stock price information for the corresponding time is extracted to form a visual
comparison chart. In the next step, a network model is constructed between multiple
historical sensitive word vectors and stock price data curves, and the relationship
between the existing sensitive words and the corresponding stock price fluctuations
is assessed to obtain the best model results. In order to extract the features
accurately, we use the text pre-training model BERT proposed by Google. The
prediction results from the optimal model, i.e., the proportional impact of
sensitive and quasi-sensitive words on stock price fluctuations, are then used to
provide recommendation or warning information for stock trading.
structure, and the paper set new state-of-the-art (SOTA) results on several natural
language processing tasks.
In order to ensure the validity and accuracy of the data, we select well-known
domestic and international websites with high traffic and recognition for data
acquisition, such as Twitter, Sina Weibo and China Economy Network. The
selected time interval is 1 January 2010 to 1 January 2021, and the
public figures captured are Elon Musk, Zuckerberg, Yu Chengdong, Wang Jianlin,
Wang Shi, Pan Shiyi, Buffett, and Shen Nanpeng. The K-Means algorithm is used to
cluster the quantified phrases, dividing the text into 3 categories: science and
innovation, real estate, and finance. Accordingly,
we use a Python crawler to collect the charts of the major stocks in these three sectors
during this time interval. The clustering result can then be used to classify the
text by feature; the non-critical information in the text is removed,
the important information is extracted, and the sentiment of the text is classified
as positive (1), negative (-1), or neutral (0).
4.4 Model Training
The idea of the Masked Language Model (MLM) is that 15% of the word-piece
tokens are selected for masking in order to learn the representation of
the words; each selected word is predicted from its context. A token that is
selected for masking is not always replaced with the [MASK] symbol (otherwise
the model would be trained only on a symbol that it never sees during
fine-tuning). Due to the above
mechanism, only 15% of the words are predicted at a time, so the model converges
slowly. The second pre-training task, Next Sentence Prediction (NSP), is designed
to serve tasks such as question answering, inference, and sentence-pair topic
relations. To generate the training data from the
corpus, in 50% of the cases two consecutive sentences are taken and labeled as
'IsNext'; in the other 50% of the cases, two arbitrary sentences are randomly taken
and labeled as 'NotNext'.
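The two pre-training tasks can be sketched in plain Python as follows. The 80% / 10% / 10% split between the [MASK] symbol, a random token and the unchanged token follows the original BERT paper; the function names, the per-token random selection and the example sentences are simplifying assumptions made for illustration.

```python
import random


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Select about 15% of tokens; of those, 80% become [MASK], 10% a random
    token and 10% stay unchanged. Returns (model_input, prediction_targets)."""
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            targets.append(tok)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                masked.append("[MASK]")
            elif roll < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)
        else:
            targets.append(None)  # position is not predicted
            masked.append(tok)
    return masked, targets


def make_nsp_pair(sentences, index, rng):
    """Return (first, second, label): the true next sentence in half of the
    cases ('IsNext'), otherwise a different sentence ('NotNext')."""
    first = sentences[index]
    if rng.random() < 0.5:
        return first, sentences[index + 1], "IsNext"
    others = [s for i, s in enumerate(sentences) if i not in (index, index + 1)]
    return first, rng.choice(others), "NotNext"


rng = random.Random(0)
tokens = "apple shares rise after strong iphone sales".split()
masked, targets = mask_tokens(tokens, tokens, rng)
sentences = ["Apple launched a phone.", "Shares rose.",
             "Oil fell.", "Rates were held."]
pair = make_nsp_pair(sentences, 0, rng)
```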
1. First, the textual data is cleaned and the target variable is prepared. This
ensures that only the relevant details necessary for model training are
extracted. The following steps are part of it:
a. Tokenization.
b. Stop word removal.
c. Normalization.
d. Punctuation removal.
e. Lemmatization.
f. Apply any other pre-processing steps if required.
2. Split the pre-processed dataset based on the sentiment score which is already
extracted from the dataset, to understand whether the given text is on a positive
or negative note.
4. Finally, create an LSTM model, which performs the stock price prediction, that
accepts the sentiment analysis results from the model developed in the previous
step and correlates the historical stock price data for the prediction of the stock
price in the future.
5. This approach depends on the hypothesis that "the value of the stock varies based
on public opinion and any major news about the product". For example, if
Apple launches a new model that attracts many users, there is a high
chance that Apple stock will rise.
2. The model doesn't use external data like news/tweets to find
correlations with the price pattern. This approach is based purely on the
historical data.
4.4 Application Interface & Obtained Results:
Figure 4: Prediction using the Sentiment Analysis Approach
The pre-training of BERT is costly in terms of both time and hardware,
so we use open-source pre-trained models and code and fine-tune them on our
sample data. Before the training process of the network, each day is taken as a unit,
and the outputs of all samples of the day are averaged as the sentiment index of
the day. The output of the network ranges from -1 to 1. The sentiment indices of
all days are collected and converted into a "Sentiment Index - Time" graph. The
maximum length of the text was 64. Finally, the model was trained using the mean
squared error as the loss function and Adam as the optimizer, with an input batch
size of 16 and a learning rate of 2×10^-4; 300 epochs were run, with the
network saved every 10 epochs.
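The daily sentiment index described above can be sketched as a simple per-day average. The function name, the ISO date strings and the sample outputs are hypothetical; only the averaging rule and the output range of -1 to 1 come from the text.

```python
from collections import defaultdict


def daily_sentiment_index(samples):
    """Average the network outputs of all samples that share a date.

    `samples` is an iterable of (date_string, output) pairs, with each
    output assumed to lie in the range [-1, 1].
    """
    by_day = defaultdict(list)
    for day, output in samples:
        by_day[day].append(output)
    # Sort by date so the result can be plotted as an index-over-time curve.
    return {day: sum(values) / len(values)
            for day, values in sorted(by_day.items())}


samples = [("2020-01-03", 1.0), ("2020-01-02", 0.5), ("2020-01-02", -0.5)]
index = daily_sentiment_index(samples)
print(index)  # {'2020-01-02': 0.0, '2020-01-03': 1.0}
```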
5. CONCLUSION
This paper proposed an approach for predicting stock price movements using
technical indicators based on the percentage change of prices, volume, and related
moving averages. It considers each stock as a distinct dataset and trains a
specialized classifier for each one. We compare the proposed procedure mainly
with state-of-the-art deep learning techniques.
6. FUTURE WORK
As future work, we propose to explore a hybrid approach that leverages an
ensemble of classifiers. Such an approach could combine specialized models for
individual stocks with generic models trained on all stock data. In the next phase
of the implementation, the model would be developed with different algorithms
and its performance compared with that of the model developed
so far. This is essential to ensure that a more optimal algorithm is
used for stock prediction. In addition, the sentiment analysis is currently
based on a model that has been trained with limited data. In the next
phase, the sentiment analysis model could be developed as a hybrid that
makes use of other pre-trained algorithms as well and takes their collective result.
The sentiment analysis layer is important for stock prediction because the data is
plotted based on the results obtained from the sentiment analysis
model.
7. REFERENCES
o Shuchi He, Zhongyue Chen and Xiaoping Chen, "A Position-Sensitive Regression Network for Multi-
Oriented Scene Text Detection", 2021 IEEE 4th International Conference on Computer and
Communication Engineering Technology (CCET), 2021.
o Zhenxuan Zhang, Yuanyuan Li, Sang Won Yoon and Daehan Won, "Chapter 6 Reflow Thermal Recipe
Segment Optimization Model Based on Artificial Neural Network Approach", Springer Science and
Business Media LLC, 2023.
o G Wang and S Y Shin, "An improved text classification method for sentiment classification[J]", Journal
of information and communication convergence engineering, vol. 17, no. 1, pp. 41-48, 2019.
o S Wang, D Li, X Song et al., "A feature selection method based on improved fisher’s discriminant ratio
for text sentiment classification[J]", Expert Systems with Applications, vol. 38, no. 7, pp. 8696-8702,
2011.
o A Onan, S Korukoğlu and H Bulut, "A multiobjective weighted voting ensemble classifier based on
differential evolution algorithm for text sentiment classification[J]", Expert Systems with Applications,
vol. 62, pp. 1-16, 2016.
o Y Li, J Wang, S Wang et al., "Local dense mixed region cutting+ global rebalancing: a method for
imbalanced text sentiment classification[J]", International journal of machine learning and
cybernetics, vol. 10, no. 7, pp. 1805-1820, 2019.
o W Li, P Liu, Q Zhang et al., "An improved approach for text sentiment classification based on a deep
neural network via a sentiment attention mechanism[J]", Future Internet, vol. 11, no. 4, pp. 96, 2019.
o Z Hameed and B Garcia-Zapirain, "Sentiment classification using a single-layered BiLSTM model[J]",
IEEE Access, vol. 8, pp. 73992-74001, 2020.
o Y Du, X Zhao, M He et al., "A novel capsule-based hybrid neural network for sentiment
classification[J]", IEEE Access, vol. 7, pp. 39321-39328, 2019.
o W Li, F Qi, M Tang et al., "Bidirectional LSTM with self-attention mechanism and multi-channel
features for sentiment classification[J]", Neurocomputing, vol. 387, pp. 63-77, 2020.
o S. Chen and H. He, "Stock prediction using convolutional neural network", IOP Conference Series:
Materials Science and Engineering, vol. 435, no. 1, pp. 1-9, 2018.
o J. Bollen, H. Mao and X. Zeng, "Twitter mood predicts the stock market", Journal of Computational
Science, vol. 2, no. 1, pp. 1-8, 2011.
o N. Naik and B. R. Mohan, "Novel Stock Crisis Prediction Technique—A Study on Indian Stock Market",
IEEE Access, vol. 9, pp. 86230-86242, 2021.
o X. Ding et al., "Using Structured Events to Predict Stock Price Movement: An Empirical Investigation",
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP),
pp. 1415-1425, 2014.
o Xiaodong Li, Pangjing Wu and Wenpeng Wang, "Incorporating stock prices and news sentiments for
stock market prediction: A case of Hong Kong", Information Processing & Management, vol. 57, no.
5, 2020, [online] Available: https://doi.org/10.1016/j.ipm.2020.102212.
o Sentiment Analysis of News Headlines For Stock Trend Prediction Gupta O, pp. 13, 2020.