
International Journal of P2P Network Trends and Technology (IJPTT) – Volume 4 Issue 8 – Sep 2013

Classification Algorithms in Data Mining: An Overview

S. Neelamegam #1, Dr. E. Ramaraj *2
#1 M.Phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi.
*2 Professor, Department of Computer Science and Engineering, Alagappa University, Karaikudi.

Abstract— Data mining is a technique used in various domains to give meaning to the available data. Classification is a data mining (machine learning) technique used to predict group membership for data instances. In this paper, we present the basic classification techniques and survey several major kinds of classification methods, including decision trees, Bayesian networks, k-nearest neighbour classifiers, neural networks and support vector machines. The goal of this paper is to provide a review of different classification techniques in data mining.

Keywords— Data mining, classification, Support vector machine (SVM), K-nearest neighbour (KNN), Decision Tree.

I. INTRODUCTION
The development of information technology has generated a large number of databases and huge volumes of data in various areas. Research in databases and information technology has given rise to approaches for storing and manipulating this precious data for further decision making. Data mining is a process of extracting useful information and patterns from huge data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction or data/pattern analysis [1]. Classification plays one of the major roles in data mining. Basically, classification is a two-step process: the first step is supervised learning on a training data set with predefined class labels; the second step is evaluation of the classification accuracy. Likewise, data prediction is also a two-step process. All the experiments are conducted with the Orange data mining tool [3], and the input data sets are taken from the UCI machine learning repository.

Figure 1. Knowledge discovery process

Data mining is a logical process that is used to search through large amounts of data in order to find useful data. The goal of this technique is to find patterns that were previously unknown. Once these patterns are found, they can further be used to make certain decisions for the development of a business.

Three steps are involved:
 Exploration
 Pattern identification
 Deployment
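The two-step classification process described above (supervised learning on a labelled training set, then accuracy evaluation) can be sketched in plain Python. This is a minimal illustration only: the toy data, helper names and the trivial 1-nearest-neighbour "model" are invented here, not taken from the paper's Orange-based experiments.

```python
# Two-phase classification sketch: train on labelled data, then
# evaluate predictive accuracy on a held-out test set.
# The "model" is a trivial 1-nearest-neighbour rule on a 1-D feature;
# the data points and names are invented for illustration.

def build_model(train):
    """Training phase: this toy model simply memorizes the labelled examples."""
    return list(train)

def predict(model, x):
    """Predict the class of x as the class of its closest training point."""
    nearest = min(model, key=lambda ex: abs(ex[0] - x))
    return nearest[1]

def accuracy(model, test):
    """Testing phase: the true label is only consulted after each prediction."""
    hits = sum(1 for x, true_label in test if predict(model, x) == true_label)
    return hits / len(test)

# Toy 1-D data set: (feature value, class label)
train_set = [(1.0, "spam"), (1.5, "spam"), (8.0, "legitimate"), (9.0, "legitimate")]
test_set = [(1.2, "spam"), (8.5, "legitimate")]

model = build_model(train_set)
print(accuracy(model, test_set))  # 1.0 on this cleanly separable toy data
```

The split into `build_model` and `accuracy` mirrors the training and testing phases discussed in Section II.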

ISSN: 2249-2615 http://www.ijpttjournal.org Page 369


Exploration: In the first step of data exploration, data is cleaned and transformed into another form, and the important variables and the nature of the data, based on the problem, are determined.

Pattern Identification: Once data is explored, refined and defined for the specific variables, the second step is to perform pattern identification: identify and choose the patterns which make the best prediction.

Deployment: Patterns are deployed for the desired outcome.

II. CLASSIFICATION

Data mining algorithms can follow three different learning approaches: supervised, unsupervised, or semi-supervised. In supervised learning, the algorithm works with a set of examples whose labels are known. The labels can be nominal values in the case of the classification task, or numerical values in the case of the regression task. In unsupervised learning, in contrast, the labels of the examples in the dataset are unknown, and the algorithm typically aims at grouping examples according to the similarity of their attribute values, characterizing a clustering task. Finally, semi-supervised learning is usually used when a small subset of labelled examples is available, together with a large number of unlabeled examples.

The classification task can be seen as a supervised technique where each instance belongs to a class, which is indicated by the value of a special goal attribute, or simply the class attribute. The goal attribute can take on categorical values, each of them corresponding to a class. Each example consists of two parts, namely a set of predictor attribute values and a goal attribute value. The former are used to predict the value of the latter. The predictor attributes should be relevant for predicting the class of an instance. In the classification task the set of examples being mined is divided into two mutually exclusive and exhaustive sets, called the training set and the test set. The classification process is correspondingly divided into two phases: training, when a classification model is built from the training set, and testing, when the model is evaluated on the test set. In the training phase the algorithm has access to the values of both the predictor attributes and the goal attribute for all examples of the training set, and it uses that information to build a classification model. This model represents classification knowledge – essentially, a relationship between predictor attribute values and classes – that allows the prediction of the class of an example given its predictor attribute values. For the test set, the class values of the examples are not shown. In the testing phase, only after a prediction is made is the algorithm allowed to see the actual class of the just-classified example. One of the major goals of a classification algorithm is to maximize the predictive accuracy obtained by the classification model when classifying examples in the test set, unseen during training.

Classification is the task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam.

Common algorithms include:
 Decision Tree,
 K-Nearest Neighbor,
 Support Vector Machines,
 Naive Bayesian Classification,
 Neural Networks.

Decision Tree

A Decision Tree Classifier consists of a decision tree generated on the basis of instances. A decision tree is a classifier expressed as a recursive partition of the instance space. The decision tree [4] consists of nodes that form a rooted tree, meaning it is a directed tree with a node called "root" that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is called an internal or test node. All other nodes are called leaves (also known as terminal or decision nodes). In a decision tree, each internal node splits
the instance space into two or more sub-spaces according to a certain discrete function of the input attribute values.

Figure 2: Decision tree model.

The root and the internal nodes are associated with attributes; leaf nodes are associated with classes. Basically, each non-leaf node has an outgoing branch for each possible value of the attribute associated with the node. To determine the class for a new instance using a decision tree, beginning with the root, successive internal nodes are visited until a leaf node is reached. At the root node and at each internal node, a test is applied. The outcome of the test determines the branch traversed, and the next node visited. The class for the instance is the class of the final leaf node.

The estimation criterion [5] in the decision tree algorithm is the selection of an attribute to test at each decision node in the tree. The goal is to select the attribute that is most useful for classifying examples. A good quantitative measure of the worth of an attribute is a statistical property called information gain, which measures how well a given attribute separates the training examples according to their target classification. This measure is used to select among the candidate attributes at each step while growing the tree.

K-Nearest Neighbor Classifiers (KNN)

K-nearest neighbor classifiers are based on learning by analogy. The training samples are described by n-dimensional numeric attributes. Each sample represents a point in an n-dimensional space. In this way, all of the training samples are stored in an n-dimensional pattern space. When given an unknown sample, a k-nearest neighbour classifier searches the pattern space for the k training samples that are closest to the unknown sample. "Closeness" is defined in terms of Euclidean distance, where the Euclidean distance between two points X = (x1, x2, …, xn) and Y = (y1, y2, …, yn) is

d(X, Y) = sqrt( (x1 − y1)^2 + (x2 − y2)^2 + … + (xn − yn)^2 )

Nearest neighbour classifiers are faster at training but slower at classification, since all computation is delayed to that time. Unlike decision tree induction and backpropagation, nearest neighbor classifiers assign equal weight to each attribute. This may cause confusion when there are many irrelevant attributes in the data. Nearest neighbor classifiers can also be used for prediction, that is, to return a real-valued prediction for a given unknown sample. In this case, the classifier returns the average of the real values associated with the k nearest neighbors of the unknown sample. The k-nearest neighbors algorithm is amongst the simplest of all machine learning algorithms. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors. k is a positive integer, typically small. If k = 1, then the object is simply assigned to the class of its nearest neighbor. In binary (two-class) classification problems, it is helpful to choose k to be an odd number, as this avoids tied votes. The same method can be used for regression, by simply assigning the property value for the object to be the average of the values of its k nearest neighbors. It can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones.
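The voting and distance-weighting scheme described above can be sketched in plain Python. The toy data and function names below are invented for illustration; a practical implementation would add the efficient indexing mentioned later in this section.

```python
# k-nearest-neighbour sketch in plain Python (invented toy data):
# majority vote among the k closest training points, with an optional
# inverse-distance weighting so nearer neighbours count for more.
import math
from collections import defaultdict

def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(train, query, k=3, weighted=False):
    # Sort training samples by distance to the query point, keep the k nearest.
    nearest = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        d = euclidean(point, query)
        # Weighted vote: 1/d (small epsilon guards against d == 0); plain vote: 1.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Toy 2-D data: a "minus" cluster near the origin, a "plus" cluster near (1, 1).
train = [((0.0, 0.0), "minus"), ((0.2, 0.1), "minus"),
         ((1.0, 1.0), "plus"), ((0.9, 1.1), "plus"), ((1.2, 0.8), "plus")]

print(knn_classify(train, (0.1, 0.1), k=3))  # "minus": the local cluster wins
print(knn_classify(train, (0.1, 0.1), k=5))  # "plus": too large a k pulls in the other cluster
```

The two calls also illustrate the point made below about choosing k: with k = 5 the query in the middle of the "minus" cluster is outvoted by the surrounding "plus" records.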


The neighbors are taken from a set of objects for which the correct classification (or, in the case of regression, the value of the property) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. In order to identify neighbors, the objects are represented by position vectors in a multidimensional feature space. It is usual to use the Euclidean distance, though other distance measures, such as the Manhattan distance, could in principle be used instead.

Figure 3: k-nearest neighbour model.

The k-nearest neighbour algorithm is sensitive to the local structure of the data. The unknown sample is assigned the most common class among its k nearest neighbors. When k = 1, the unknown sample is assigned the class of the training sample that is closest to it in pattern space. Nearest neighbour classifiers are instance-based or lazy learners in that they store all of the training samples and do not build a classifier until a new (unlabeled) sample needs to be classified. This contrasts with eager learning methods, such as decision tree induction and back propagation, which construct a generalization model before receiving new samples to classify. Lazy learners can incur expensive computational costs when the number of potential neighbours (i.e., stored training samples) with which to compare a given unlabeled sample is great. Therefore, they require efficient indexing techniques. As expected, lazy learning methods are faster at training than eager methods. When using a k-nearest neighbor algorithm, choosing an appropriate k value is significant. If the k value is too small, the algorithm is susceptible to overfitting and will misclassify some traditionally easy-to-classify cases. For example, imagine a cluster of records that all have a class label called "plus" except for one point in the cluster labelled "minus". If a k of one were chosen for an input that is in the cluster but happens to be closest to the minus point, then there is a good chance that the input would be misclassified; if k were 2 or more, the resulting classification would be different. As well as avoiding a k value that is too small, it is important to choose a value that isn't too large, as this can also lead to misclassification. This can be seen in a situation with a cluster of one class surrounded by a cluster of another class: even if the input is right in the middle of the first cluster, if one looks at too many points it is possible that the algorithm starts to count the records from the surrounding cluster as well.

Support Vector Machine (SVM)

SVM was first introduced by Vapnik [6] and has been a very effective method for regression, classification and general pattern recognition. It is considered a good classifier because of its high generalization performance without the need to add a priori knowledge, even when the dimension of the input space is very high. The aim of SVM is to find the best classification function to distinguish between members of the two classes in the training data. The metric for the concept of the "best" classification function can be realized geometrically. For a linearly separable dataset, a linear classification function corresponds to a separating hyperplane f(x) that passes through the middle of the two classes, separating the two. Once this function is determined, a new data instance xn can be classified by simply testing the sign of the function f(xn); xn belongs to the positive class if f(xn) > 0.
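The sign test just described can be sketched in plain Python. The weight vector w and offset b below are assumed for illustration only; a real SVM would learn them from the training data rather than fix them by hand.

```python
# Classifying with a learned linear function f(x) = w.x + b:
# the sign of f(x) decides the class. The values of w and b here are
# assumed for illustration, not the result of actual SVM training.

def f(w, b, x):
    """Linear decision function: dot product of w and x, plus offset b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """x belongs to the positive class if f(x) > 0, otherwise the negative class."""
    return "positive" if f(w, b, x) > 0 else "negative"

w, b = (1.0, -1.0), 0.5                # assumed separating hyperplane

print(classify(w, b, (2.0, 1.0)))      # f = 2.0 - 1.0 + 0.5 = 1.5  -> positive
print(classify(w, b, (0.0, 2.0)))      # f = 0.0 - 2.0 + 0.5 = -1.5 -> negative
```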


Because there are many such linear hyperplanes, SVMs guarantee that the best such function is found by maximizing the margin between the two classes. Intuitively, the margin is defined as the amount of space, or separation, between the two classes as defined by the hyperplane. Geometrically, the margin corresponds to the shortest distance between the closest data points and a point on the hyperplane. To ensure that the maximum-margin hyperplane is actually found, an SVM classifier attempts to minimize the following function with respect to w and b:

L_P = (1/2)||w||^2 − Σ_{i=1}^{t} α_i y_i (w · x_i + b) + Σ_{i=1}^{t} α_i

where t is the number of training examples and the α_i, i = 1, …, t, are non-negative numbers such that the derivatives of L_P with respect to the α_i are zero. The α_i are the Lagrange multipliers and L_P is called the Lagrangian. In this equation, the vector w and the constant b define the hyperplane.

A learning machine, such as the SVM, can be modeled as a function class based on some parameters. Different function classes can have different capacities for learning, which is represented by a parameter h known as the VC dimension. The VC dimension measures the maximum number of training examples for which the function class can still be used to learn perfectly, by obtaining zero error rates on the training data, for any assignment of class labels on these points. It can be proven that the actual error on future data is bounded by the sum of two terms: the first term is the training error, and the second term is proportional to the square root of the VC dimension h. Thus, if we can minimize h, we can minimize the future error, as long as we also minimize the training error. SVM can also easily be extended to perform numerical (regression) calculations.

One of the initial drawbacks of SVM is its computational inefficiency. However, this problem is being solved with great success. One approach is to break a large optimization problem into a series of smaller problems, where each problem only involves a couple of carefully chosen variables, so that the optimization can be done efficiently; the process iterates until all the decomposed optimization problems are solved successfully [2]. A more recent approach is to consider the problem of learning an SVM as that of finding an approximate minimum enclosing ball of a set of instances.

Bayesian Networks

A Bayesian network (BN) consists of a directed, acyclic graph and a probability distribution for each node in that graph given its immediate predecessors [7]. A Bayes network classifier is based on a Bayesian network which represents a joint probability distribution over a set of categorical attributes. It consists of two parts: the directed acyclic graph G, consisting of nodes and arcs, and the conditional probability tables. The nodes represent attributes, whereas the arcs indicate direct dependencies. The density of the arcs in a BN is one measure of its complexity. Sparse BNs can represent simple probabilistic models (e.g., naïve Bayes models and hidden Markov models), whereas dense BNs can capture highly complex models. Thus, BNs provide a flexible method for probabilistic modelling.

Neural Network

An artificial neural network (ANN), often just called a "neural network" (NN), is a mathematical or computational model based on biological neural networks; in other words, it is an emulation of a biological neural system. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.

III. CONCLUSIONS

Data mining offers promising ways to uncover hidden patterns within large amounts of data. These hidden patterns can potentially be used to predict future behaviour. The availability of new data mining algorithms, however, should be met with caution. First of all, these techniques are only as good as the data that has been collected. Good data is the first requirement for good data exploration. Assuming good data is available, the next step is to


choose the most appropriate technique to mine the data. In this paper, we presented the basic classification techniques and surveyed several major kinds of classification methods, including decision tree induction, Bayesian networks, k-nearest neighbour classifiers and neural networks.

REFERENCES
[1] Ed Colet, "Clustering and Classification: Data Mining Approaches".
[2] Léon Bottou and Chih-Jen Lin, "Support Vector Machine Solvers", NEC Labs America, Princeton, NJ, USA, and Department of Computer Science, National Taiwan University, Taipei, Taiwan.
[3] Orange (Biolab) documentation.
[4] Lior Rokach and Oded Maimon, "Decision Trees", Department of Industrial Engineering, Tel-Aviv University.
[5] H. Hamilton, E. Gurak, L. Findlater and W. Olive, "Overview of Decision Trees".
[6] Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition".
[7] Chengwei Su, Angeline Andrew, Margaret Karagas and Mark E. Borsuk, "Overview of Bayesian network approaches to model gene-environment interactions and cancer susceptibility".
