Stock Price Prediction Based On Stock Big Data and Pattern Graph Analysis
Keywords: Stock Price Prediction, Hierarchical Clustering, Pattern Matching, Feature Selection, Artificial Neural
Network.
Abstract: Stock price prediction is extremely difficult owing to the irregularity of stock prices. Because stock prices sometimes show similar patterns and are determined by a variety of factors, we present a novel concept of finding similar patterns in historical stock data for high-accuracy daily stock price prediction, together with potential rules for simultaneously selecting the main factors that have a significant effect on the stock price. Our objective is to propose a new composite methodology that finds, for each stock item, the optimal historical dataset with similar patterns according to various algorithms and provides a more accurate prediction of the daily stock price. First, we use hierarchical clustering to easily find, according to the hierarchical structure, similar patterns in the layer adjacent to the current pattern. Second, we use feature selection to select the determinants that most influence the stock price. Moreover, we generate an artificial neural network model that provides numerous opportunities for predicting the best stock price. Finally, to verify the validity of our model, we use the root mean square error (RMSE) as a measure of prediction accuracy. The forecasting results show that the proposed model can achieve high prediction accuracy for each stock by this measure.
Jeon, S., Hong, B., Kim, J. and Lee, H-j.
Stock Price Prediction based on Stock Big Data and Pattern Graph Analysis.
DOI: 10.5220/0005876102230231
In Proceedings of the International Conference on Internet of Things and Big Data (IoTBD 2016), pages 223-231
ISBN: 978-989-758-183-0
Copyright © 2016 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
Table 1: Example of stock raw data.

  Attribute                           Value
  Date (yyyymmddhhmmss)               20140813090024
  Type                                0
  Completion price (won)              77,500
  Completion amount                   37
  Opening price (won)                 78,900
  High price (won)                    78,900
  Low price (won)                     76,600
  Price just before (won)             77,400
  Accumulated completion amount       475,021
  Accumulated completion price (won)  36,770,000,000

3 DATA SPECIFICATION

In this study, stock data gathered over twelve consecutive months (August 2014 to July 2015) from the Korea Composite Stock Price Index (KOSPI) was used as the input. The stock data was provided by Koscom. A data sample is listed in Table 1; it consists of the date, type, completion price, completion amount, opening price, high price, low price, price just before, accumulated completion amount, and accumulated completion price. Because there are four types (domestic purchase price (0), domestic selling price (1), foreign purchase price (2), and foreign selling price (3)), the stock price is the sum of thirty-two items. The size of each data set was 168 GB, and the data was collected during the one-year period from August 2014 to July 2015.

4 OUTLINE OF PROPOSED MODEL

In this section, we describe the overall process from the perspective of data analysis and processing: data preprocessing for making continuous data, the search for similar pattern data, and the selection of input data through to the generation of the prediction model.

4.1 Aggregation of Stock Data

Because the tick-by-tick data we have are generated per transaction, the completion price at a given time is zero if no transaction is carried out, as shown in Figure 2 (a). In other words, because the data are non-continuous, it is difficult to predict the price. Consequently, we generate aggregated data at five-minute intervals to obtain a continuous flow of data, as shown in Figure 2 (b).

(b) Completion price after aggregation.
Figure 2: The need for aggregation.

4.2 Searching for Similar Patterns

Above all, it is necessary to generate patterns from the aggregated data before searching for similar patterns. Figure 3 shows the process of patterning the aggregated data. The length of a pattern is one day, and patterns are generated at five-minute intervals, e.g., by the sliding window method, so that pattern matching analysis can draw on various patterns; thus, twelve patterns are generated per hour.

Figure 4 shows similar patterns in a graph of real stock prices. Similar patterns can be found by comparing historical patterns with the current pattern. Among the various methods for pattern matching, we use a hierarchical clustering algorithm that can find similar patterns quickly and simultaneously. The patterns are structured by hierarchical clustering, and similar patterns are neighbor or sibling nodes of the current pattern. If there are only a limited number
Table 2: Results of stepwise regression in real stock data of Hyundai Motor Company.

  Variable                       Domestic purchase price  Domestic selling price  Foreign purchase price  Foreign selling price
  Completion price               O                        O                       O                       O
  Completion amount              O                        O                       O                       O
  Opening price                  X                        X                       O                       O
  High price                     O                        O                       O                       O
  Low price                      O                        O                       O                       O
  Price just before              O                        X                       O                       O
  Accumulated completion amount  X                        O                       O                       O
  Accumulated completion price   X                        X                       X                       X
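The O/X choices in Table 2 come from R's step() function. As a rough stdlib-only illustration of the underlying idea, the sketch below performs backward elimination: variables whose removal barely changes the residual sum of squares (RSS) are dropped. The RSS criterion, threshold, and toy data are simplifications of my own, not the paper's p-value-based procedure.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rss(X, y, cols):
    """Fit OLS on the selected columns via normal equations; return residual SS."""
    Xs = [[row[c] for c in cols] for row in X]
    XtX = [[sum(r[i] * r[j] for r in Xs) for j in range(len(cols))]
           for i in range(len(cols))]
    Xty = [sum(r[i] * yy for r, yy in zip(Xs, y)) for i in range(len(cols))]
    beta = solve(XtX, Xty)
    return sum((yy - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yy in zip(Xs, y))

def backward_eliminate(X, y, tol=1e-6):
    """Drop variables whose removal increases RSS by at most tol (X in Table 2)."""
    cols = list(range(len(X[0])))
    while len(cols) > 1:
        base = rss(X, y, cols)
        # candidate: the column whose removal increases RSS the least
        worst = min(cols, key=lambda c: rss(X, y, [k for k in cols if k != c]))
        if rss(X, y, [k for k in cols if k != worst]) - base <= tol:
            cols.remove(worst)  # contributes ~nothing: eliminate it
        else:
            break               # every remaining variable matters (O)
    return cols
```

For example, if y depends exactly on the first two columns of X, the third column is eliminated while columns 0 and 1 survive.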
one or more hidden layers. The learning rate increases with the number of hidden layers. However, the connection between input and output could be lost if there are too many hidden layers, and the learning could be disturbed (Dominic et al., 1991).

We employed up to five hidden layers to ensure that the system can bear the processing load, and we created the final model with the number of hidden layers that shows the highest explanatory power (R-squared value) by performing learning in sequence from hidden layer 1 to hidden layer 5 for each stock item. Table 3 summarizes the explanatory power for each hidden layer; the layer with the highest value is layer 3.

Table 3: Explanatory powers according to hidden layers.

  Hidden layer 1  Hidden layer 2  Hidden layer 3  Hidden layer 4  Hidden layer 5
  37.6%           95.5%           95.9%           94.2%           95.3%

5 SYSTEM ARCHITECTURE FOR STOCK PRICE PREDICTION

This section describes the series of operations that were implemented when generating the final artificial neural network model. All the processes were conducted on a cluster consisting of four connected computers (one master and three slaves) with Hadoop and RHive installed.

5.1 Series of Operations for Generating Predicted Stock Data

We propose the following steps to generate a prediction model with big data processing and analysis tools, as shown in Figure 6.

Step 1 (Stock Data Aggregation and Pattern Generation as Data Preprocessing): We stored the one-year stock data provided by Koscom in the Hadoop distributed file system (HDFS) of the Hadoop-based cluster. Because we could not manually modify the MapReduce source code for extracting the desired data from each HDFS of the Hadoop cluster, we used the RHive tool, which provides HiveQL and facilitates the search for the desired data, e.g., through a select query as in an RDBMS. After the data was extracted, it was aggregated at five-minute intervals using R, based on the tick-by-tick data. Then, patterns were generated from the aggregated data for the concatenation of similar patterns in R on the master computer. The size of a pattern was one day and the generation unit was five minutes. The total number of patterns was 17,323.

Step 2 (Pattern Matching with Hierarchical Clustering): To retrieve patterns similar to the current pattern, we used the hclust function in R, which offers two advantages: it can quickly autodetect similar patterns and, at the same time, freely determine the range of similar patterns. Algorithm 1 describes the procedure for finding similar patterns. After inserting the current pattern into the aggregated patterns as a historical dataset, clustered patterns were generated via the hclust function. Then, similar patterns at the same level as the current pattern could be found.
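The clustering step above relies on R's hclust with average linkage. As a stdlib-only sketch of the same idea, the code below builds clusters by agglomerative average-linkage merging and returns the patterns grouped with the current one; cutting at k clusters is my stand-in for choosing a level in the dendrogram, and the names and the value of k are illustrative.

```python
def dist(p, q):
    """Euclidean distance between two equal-length price patterns."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def average_linkage_clusters(patterns, k):
    """Agglomerative clustering (average linkage), stopping at k clusters.

    Returns a list of clusters, each a list of indices into patterns.
    """
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average pairwise distance between the two clusters
                d = sum(dist(patterns[a], patterns[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

def similar_patterns(patterns, current_idx, k):
    """Indices clustered together with the current pattern (its siblings)."""
    for cluster in average_linkage_clusters(patterns, k):
        if current_idx in cluster:
            return [i for i in cluster if i != current_idx]
```

After appending the current pattern to the historical patterns, its cluster mates play the role of the "neighbor or sibling nodes" described in Section 4.2.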
Figure 6: Dependent and independent variables should be defined in stepwise regression analysis.

Algorithm 1: Algorithm for pattern matching.
  input : Aggregated_patterns, a list of aggregated patterns;
          current_pattern, the current pattern
  output: similar_patterns, a list of similar patterns after clustering
  1  int last = Aggregated_patterns.length() - 1;
  2  foreach count in current_pattern.length() do
  3      Aggregated_patterns[last][count] = current_pattern[count];
  4  run('sink()');
  5  run('hc <- hclust(dist(Aggregated_patterns), method="ave")');
  6  run('sink("out.txt")');
  7  List result_patterns = Read_File('out.txt');
  8  foreach index in result_patterns.length() do
  9      if result_patterns[index] == current_pattern then
 10          similar_patterns = find_SP(index);
 11  return similar_patterns;

Step 3 (Feature Selection using Stepwise Regression): Given several similar patterns of stock price, insignificant variables among all the variables constituting the price were removed. Algorithm 2 describes the steps for feature selection using stepwise regression. Before selecting the variables, the time of similar patterns was determined, and then the variables at that time were retrieved. Variables with a p-value below a specified threshold were judged to be significant.

Algorithm 2: Algorithm for feature selection in stepwise regression.
  input : similar_patterns, a list of similar patterns;
          variables, a list of all variables constituting the price
  output: remainder, a list of variables excluding the insignificant variables
  1  boolean flag = false;
  2  variables = getVariables(similar_patterns.atTime());
  3  while flag == false do
  4      remainder = run('step(variables, direction="both")');
  5      flag = true;
  6      foreach i in remainder.length() do
  7          if remainder[i].p_value > 0.05 then
  8              flag = false;
  9  return remainder;
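The ANN generation and RMSE verification described in Steps 4 and 5 can be sketched as a toy one-hidden-layer tanh network in pure Python. This is a simplified stand-in for R's neuralnet (which the paper actually uses); the network size, learning rate, epoch count, and toy data are illustrative assumptions, not the paper's configuration.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean square error, the paper's accuracy measure (Step 5)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

class TinyTanhNet:
    """One-input, one-output network with a single tanh hidden layer,
    trained by plain stochastic gradient descent on squared error."""

    def __init__(self, hidden=4, seed=1):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return out, hidden

    def train(self, xs, ys, lr=0.05, epochs=500):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                out, hidden = self.forward(x)
                err = out - y  # gradient of 0.5 * (out - y)**2 w.r.t. out
                for i, h in enumerate(hidden):
                    grad_h = err * self.w2[i] * (1.0 - h * h)  # tanh derivative
                    self.w2[i] -= lr * err * h
                    self.w1[i] -= lr * grad_h * x
                    self.b1[i] -= lr * grad_h
                self.b2 -= lr * err
```

In the paper's terms, the training pairs correspond to independent variables at historical time ht and the dependent variable at ht + 1; RMSE before and after training shows the fit improving.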
Step 4 (Predicted Data Generation on an Artificial Neural Network): To create the predicted data, we used an ANN after feature selection. Algorithm 3 describes the steps for generating the predicted data using an ANN. Among the input data, we prepared dependent and independent variables as training data from another time zone, because we would predict the day after the current pattern. Specifically, given the historical time ht of a similar pattern, the time of the dependent variable is ht + 1 and the time of the independent variable is ht. After the independent and dependent variables were bound, we generated an ANN-based model using the neuralnet function provided by R. Then, the independent variables at the current time t were input into the model and the predicted data were generated.

Algorithm 3: Algorithm for generation of predicted data.
  input : tr_dependent, the total completion price at historical time ht + 1;
          tr_independent, the remaining variables, excluding the total
          completion price, at historical time ht;
          te_dependent, the remaining variables, excluding the total
          completion price, at current time t
  output: predicted, a dataset generated by the ANN
  1  run('training <- cbind(tr_dependent, tr_independent)');
  2  run('colnames(training) <- c("output", "input")');
  3  run('ANN_result <- neuralnet(output ~ input, training, hidden=1~5, act.fct="tanh")');
  4  run('predicted <- prediction(ANN_result, te_dependent)');
  5  return predicted;

Step 5 (Verification using RMSE): To verify the validity of the proposed model, we selected RMSE as a measure of prediction accuracy; the function is also provided in R. The measure was computed from comparisons between real and predicted data.

6 EVALUATION

In this section, we describe the one-year test data provided by Koscom and evaluate the accuracy of each stock item by computing the RMSE.

6.1 Dataset and Test Scenario

To prove the effectiveness of the proposed model, we used a real historical stock dataset consisting of various items for the one-year period from August 2014 to July 2015. To measure the prediction accuracy, we prepared three items (Hyundai Motor Company, KIA Motors, and Samsung Electronics) as companies representing the Republic of Korea, with their stock data for August 1, 2014, to July 28, 2015, as the training data, and their stock data for July 29-31, 2015, as the test data. As a test scenario, first, two sets of one-day predicted stock data were generated, one by the proposed model and one by feature selection alone. Then, we checked the prediction accuracy by using the RMSE values to compare the predicted and real stock data.

6.2 Evaluation of Prediction Accuracy

We performed experiments to compute the accuracy of the proposed method. Figure 7 compares the actual data with the two data values predicted by the proposed model and by feature selection alone for July 31, 2015. The x-axis represents the time at five-minute intervals, and the y-axis represents the total completion price, i.e., the stock price over time. First, Figure 7 (a) compares the results for Hyundai Motor Company stock; we can see that the stock movement of the proposed model is closer to the real stock data than that of feature selection alone. This is especially clear in the rising curve of the morning and the declining curve of the afternoon. Figure 7 (b) shows the stock data derived from the real and predicted data for KIA Motors. In contrast to Figure 7 (a), there are slight differences between the stock movement of the proposed model and the real data, and there is no clear view of the rising and declining curves in the graph. Lastly, Figure 7 (c) depicts the stock data derived from the real and predicted data for Samsung Electronics. Compared with the feature-selection-only graph, the stock movement of the proposed model follows the real data closely despite a slight difference in price.

In this study, we selected RMSE as the measure of prediction accuracy to verify the validity of our model because this measure is frequently used in the stock domain. Figure 8 shows the experimental results of the proposed model and of feature selection alone using RMSE. In Figure 8 (a) and (b), we can see that the predictions are good except on July 30, and it is interesting that the affected item is the same. For this reason, we can surmise that there are variables affecting the same theme rather than variables that affect individual stocks; it is necessary to make up for this point. Unlike Figure 8 (a) and (b), Figure 8 (c) shows good prediction for all days. In particular, all the graphs show good predictions on the last day.

(c) Comparison results for Samsung Electronics stock.
Figure 8: RMSE results.

7 CONCLUSIONS

In this paper, we determined that stock prices sparsely show similar patterns and that not all variables have a significant impact on the price. For short-term prediction, we proposed a novel method based on a combination of hierarchical clustering, stepwise regression, and an ANN model in order to find similar historical patterns for each stock item and predict the daily stock price using the optimal significant variables obtained through feature selection. Moreover, we handled the overall process using a big data processing framework based on Hadoop and R. Finally, we demonstrated the prediction accuracy for three stock items using RMSE.

In the future, we plan to enhance the reliability of our model by further investigating big and small pattern matching and analysis. In addition, we will develop a distributed parallel algorithm and predict all the stock items instead of only some of them.

ACKNOWLEDGEMENTS

This work was supported by the Research Program funded by the Korea Centers for Disease Control and Prevention (fund code #2015-E33016-00).