(IJCST-V3I1P21) : S. Padmapriya
RESEARCH ARTICLE
OPEN ACCESS
ABSTRACT
Data mining finds valuable information hidden in large volumes of data and turns it into useful knowledge. It is
designed to deal with the huge amounts of data kept in databases. Data mining is the analysis of data and the use of
software techniques for finding hidden patterns and regularities in sets of data. Knowledge discovery from large data sets
is difficult. The growing demand for finding patterns in huge data sets is met by data mining algorithms
and techniques. Researchers have presented many approaches and algorithms for determining patterns. This paper presents various
data mining algorithms and mining methods for discovering valuable patterns from hidden information.
Keywords:-Data mining, Knowledge Discovery.
I. INTRODUCTION
Data mining is an emerging trend. The information age has
variables, numerical and categorical. A numerical or
say that one day is twice as hot as another day. On the other
hand, data on a ratio scale has a true zero and can be added,
personal data.
is information of
to get familiar with the data, to discover first insights into the
ISSN: 2347-8578
typically the results
www.ijcstjournal.org
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 1, Jan-Feb 2015
data and have a good understanding of any possible data
response
and missing values. Analyzing data that has not been carefully
c. Logistic regression:
variables). Neural nets too can create both
Dataset:
sets are classified into two types: test data and training data.
a. Classification
Classification aims to identify the
standard statistical techniques such as linear
variable.
Algorithm
There are different types of neural networks, but they
sales volumes, stock prices, and product failure rates are all
very difficult to predict because they may depend on complex
interactions of multiple predictor variables. Therefore, more
complex techniques (e.g., logistic regression, decision trees, or
neural nets) may be necessary to forecast future values. The
same model types can often be used for both regression and
weighted sum of its inputs. The new calculated values then
become the new input values that feed the next layer. This
process continues until it has gone through all the layers and
determines the output. A threshold transfer function is
sometimes used to quantify the output of a neuron in the
output layer. Feed-forward networks include Perceptron
(linear and non-linear) and Radial Basis Function networks.
Feed-forward networks are often used in data mining.
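The layer-by-layer computation described above can be illustrated with a minimal sketch. It assumes each layer is given as a list of per-neuron weight lists plus a list of biases, and that the threshold transfer function is a simple step at zero; the names `feed_forward`, `layer_output`, and `step`, and all numeric weights, are hypothetical choices for illustration. Practical networks usually also apply a non-linear activation in the hidden layers, which is omitted here to stay close to the description.

```python
def step(value, threshold=0.0):
    """Threshold transfer function (assumed form): 1 above the threshold, else 0."""
    return 1 if value > threshold else 0


def layer_output(inputs, weights, biases):
    """Compute, for every neuron in a layer, the weighted sum of its inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the inputs through each layer in turn, then threshold the final values."""
    values = inputs
    for weights, biases in layers:
        # The values computed by one layer become the inputs of the next layer.
        values = layer_output(values, weights, biases)
    return [step(v) for v in values]


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
example_layers = [
    ([[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer weights and biases
    ([[1.0, -1.0]], [0.0]),                   # output layer weights and bias
]

print(feed_forward([1.0, 2.0], example_layers))
print(feed_forward([2.0, 0.0], example_layers))
```

Signals travel in one direction only, from the inputs to the output layer, which is what distinguishes this from a feed-back network.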
A feed-back network has feed-back paths, meaning that signals can
travel in both directions using loops. All
possible connections between neurons are allowed. Since
a. Genetic algorithms
Genetic algorithms are not used to find patterns per
interconnected factors.
Machine (SVM) is
activation flows through the network, through hidden layers,
until it reaches the output nodes. The output nodes then reflect
the people into three clusters, when k=2 would provide a more
clustered based upon home state and you called the k-means
be effective.
assigns the new case to the same class to which most of its
neighbors belong.
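The majority-vote rule for nearest neighbors can be sketched as follows. This is a minimal illustration that assumes numeric feature tuples and Euclidean distance; the function name `knn_classify`, the default `k=3`, and the sample points are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def knn_classify(training, new_case, k=3):
    """Assign new_case to the class held by most of its k nearest neighbors.

    training is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort the known cases by Euclidean distance to the new case (assumed metric).
    by_distance = sorted(training, key=lambda item: math.dist(item[0], new_case))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training cases with two classes, "A" and "B".
training_cases = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]

print(knn_classify(training_cases, (1.1, 1.0)))
print(knn_classify(training_cases, (5.2, 4.9)))
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.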
e. Bayesian Algorithms
Bayesian approaches are a fundamentally important DM
calculate the posterior from the prior and the likelihood,
set much better than it fits the test set, overfitting is probably
the cause.
Cross-Validation
Time series
Time series forecasting predicts unknown future
$$P(C, D) = P(C \mid D)\,P(D) = P(D \mid C)\,P(C)$$

$$P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)}$$
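As a worked illustration of Bayes' theorem, the sketch below computes the posterior P(C|D) from a prior P(C) and a likelihood P(D|C). All probabilities are hypothetical numbers chosen for the example, and P(D) is obtained from the law of total probability, which is an added assumption about how the evidence term is calculated.

```python
def evidence(prior_c, likelihood_d_given_c, likelihood_d_given_not_c):
    """P(D) by the law of total probability over C and not-C."""
    return likelihood_d_given_c * prior_c + likelihood_d_given_not_c * (1.0 - prior_c)


def posterior(prior_c, likelihood_d_given_c, evidence_d):
    """Bayes' theorem: P(C|D) = P(D|C) * P(C) / P(D)."""
    return likelihood_d_given_c * prior_c / evidence_d


# Hypothetical inputs: P(C) = 0.01, P(D|C) = 0.9, P(D|not C) = 0.05.
p_c = 0.01
p_d_given_c = 0.9
p_d_given_not_c = 0.05

p_d = evidence(p_c, p_d_given_c, p_d_given_not_c)
p_c_given_d = posterior(p_c, p_d_given_c, p_d)

print(round(p_d, 4))
print(round(p_c_given_d, 4))
```

With these inputs, P(D) = 0.9 × 0.01 + 0.05 × 0.99 = 0.0585, so P(C|D) = 0.009 / 0.0585 ≈ 0.154. Even with a strong likelihood, a small prior keeps the posterior modest.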
V. MODEL EVALUATION
VI. CONCLUSION
data sets.
model performance.
Hold-Out:
In this method, the dataset, which is usually large, is randomly divided into
three subsets: Training set is a subset of the dataset used to
build predictive models. Validation set is a subset of the
dataset used to assess the performance of the model built in the
training phase. It provides a test platform for fine-tuning the
model's parameters and selecting the best-performing model.
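A random hold-out split can be sketched as below. This is a minimal illustration, not a prescribed procedure: the function name `hold_out_split`, the 60/20/20 proportions, and the fixed seed are assumptions, and the third subset is assumed to be a test set kept aside for the final assessment.

```python
import random


def hold_out_split(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly divide records into training, validation, and test subsets."""
    shuffled = list(records)
    # A seeded generator makes the random division reproducible.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]                      # used to build the models
    validation = shuffled[n_train:n_train + n_val]  # used to tune and select models
    test = shuffled[n_train + n_val:]               # assumed third subset, held back
    return train, validation, test


# Hypothetical dataset of 100 record identifiers.
train_set, validation_set, test_set = hold_out_split(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

Each record lands in exactly one subset, so a model is never assessed on the data that was used to build it.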