DM-BS-Lec7-Classification - DTI
Classification
Classification and
Prediction
What is classification? What is prediction?
Classification by decision tree induction
Bayesian Classification
Other Classification Methods
Prediction
Classification accuracy
Summary
Classification vs.
Prediction
Classification:
predicts categorical class labels
classifies data (constructs a model) based on the training
set and the values (class labels) in a classifying attribute
and uses it in classifying new data
Prediction:
models continuous-valued functions, i.e., predicts unknown
or missing values
Typical Applications
credit approval
target marketing
medical diagnosis
treatment effectiveness analysis
Classification—A Two-Step
Process
Model construction: describing a set of predetermined
classes
Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute
The set of tuples used for model construction: training set
The model is represented as classification rules, decision trees, or
mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model
The known label of test sample is compared with the classified
result from the model
Accuracy rate is the percentage of test set samples that are
correctly classified by the model
Test set is independent of training set, otherwise over-fitting
will occur
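The accuracy-rate computation described above can be sketched in a few lines of Python (the labels are illustrative, not taken from the lecture data):

```python
# Accuracy rate: the percentage of test-set samples that the model
# classifies correctly, comparing known labels with predicted labels.
def accuracy(true_labels, predicted_labels):
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

y_true = ["yes", "no", "yes", "yes", "no"]   # known labels of the test set
y_pred = ["yes", "no", "no", "yes", "no"]    # labels produced by the model
print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```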
Classification Process
(1): Model Construction
[Diagram: the training data is fed to a classification algorithm, which constructs a classifier (the model); the classifier is then applied to testing data and to unseen data.]

Training data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen sample: (Jeff, Professor, 4). Tenured?
Supervised vs.
Unsupervised Learning
Supervised learning (classification)
Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc. with
the aim of establishing the existence of classes or
clusters in the data
Classification and
Prediction
What is classification? What is prediction?
Classification by decision tree induction
Bayesian Classification
Other Classification Methods
Prediction
Classification accuracy
Summary
Classification Methods
(1) Decision Tree Induction
Classification by Decision
Tree Induction
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers
Use of decision tree: Classifying an unknown sample
Test the attribute values of the sample against the decision tree
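The "test the attribute values against the tree" step can be sketched with a nested-dict tree; the structure and attribute names below are assumptions modeled on the weather example used later in the lecture:

```python
# A decision tree as a nested dict: each internal node maps an
# attribute name to {attribute value: subtree or leaf label}.
tree = {
    "outlook": {
        "sunny": {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"windy": {"false": "yes", "true": "no"}},
    }
}

def classify(node, sample):
    # Descend from the root, following the branch that matches the
    # sample's value for the tested attribute, until a leaf is reached.
    while isinstance(node, dict):
        attribute = next(iter(node))               # attribute tested here
        node = node[attribute][sample[attribute]]  # follow matching branch
    return node                                    # leaf = class label

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```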
Conditions for stopping
partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning
There are no samples left
[Example tree: the root tests age; branches <=30, 31..40, and >40 lead to "no"/"yes" leaves.]
How to Construct Decision
Tree?
Which attribute to select?
(1) Attribute Selection Measure
Information Gain in Decision Tree
Induction
I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
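The expected-information formula can be checked numerically; a minimal sketch, where the 9/5 counts are the yes/no totals of the weather example used later:

```python
from math import log2

def info(p, n):
    """I(p, n): expected information (entropy) of a node holding
    p positive and n negative training examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # 0 * log2(0) is taken as 0
            fraction = count / total
            result -= fraction * log2(fraction)
    return result

print(round(info(9, 5), 3))  # 9 yes / 5 no -> 0.940
```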
(2) Impurity/Entropy
(informal)
I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
How to Draw the Decision
Tree?
Note Down this Table
[The weather training table (outlook, temperature, humidity, windy, play: 9 yes / 5 no) appears here.]
I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
2) What will be the entropy
of attributes?
Gain(outlook) = I(9,5) - E(outlook)
= 0.940 - 0.694
= 0.246
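The Gain(outlook) computation can be reproduced end to end; the per-value class counts below are the standard 14-day weather data (assumed here, since the table itself is not shown):

```python
from math import log2

def info(p, n):
    """Expected information I(p, n) of a p-yes / n-no node."""
    total = p + n
    return sum(-c / total * log2(c / total) for c in (p, n) if c)

# per outlook value: (yes count, no count)
partitions = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}
total = sum(p + n for p, n in partitions.values())  # 14 examples

# E(outlook): weighted average information of the partitions
expected_info = sum((p + n) / total * info(p, n)
                    for p, n in partitions.values())

gain = info(9, 5) - expected_info  # Gain(outlook)
print(round(gain, 3))
```

Rounding the intermediate values to 0.940 and 0.694 gives the commonly quoted 0.246; the unrounded gain is about 0.247.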
Avoid Overfitting in
Classification
The generated tree may overfit the training data
Too many branches, some may reflect anomalies due
to noise or outliers
Results in poor accuracy for unseen samples
Two approaches to avoid overfitting
Prepruning: Halt tree construction early—do not split
a node if this would result in the goodness measure
falling below a threshold
Difficult to choose an appropriate threshold
Postpruning: Remove branches from a “fully grown”
tree—get a sequence of progressively pruned trees
Use a set of data different from the training data
to decide which is the “best pruned tree”
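The final "best pruned tree" selection can be sketched as follows; the tree names and accuracy numbers are illustrative stand-ins:

```python
# Postpruning yields a sequence of progressively pruned trees; keep
# the one scoring highest on data held out from training.
def pick_best_tree(pruned_trees, validate):
    # validate(tree) -> accuracy on the held-out pruning set
    return max(pruned_trees, key=validate)

# toy stand-ins: "trees" are labels, accuracies are fixed
accuracies = {"full": 0.80, "pruned_1": 0.86, "pruned_2": 0.83}
best = pick_best_tree(list(accuracies), accuracies.get)
print(best)  # pruned_1
```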
Approaches to Determine
the Final Tree Size
Separate training (2/3) and testing (1/3) sets
Use cross validation
Use all the data for training
but apply a statistical test (e.g., chi-square) to
estimate whether expanding or pruning a
node may improve the entire distribution
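The chi-square check can be sketched with the standard library; the branch-by-class counts are illustrative. For 1 degree of freedom, the 5% critical value of the chi-square distribution is 3.841:

```python
# Chi-square statistic for a candidate split: compare observed class
# counts in each branch against the counts expected if the split
# carried no information about the class.
def chi_square(counts):
    """counts[branch][class]: class counts produced by the split."""
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    total = sum(row)
    stat = 0.0
    for i, r in enumerate(counts):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# branches of a candidate split x (yes, no) class counts
stat = chi_square([[9, 1], [2, 8]])
print(stat > 3.841)  # True -> the split looks informative; expand the node
```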
Enhancements to basic
decision tree induction
Allow for continuous-valued attributes
Dynamically define new discrete-valued attributes that
partition the continuous attribute value into a discrete
set of intervals
Handle missing attribute values
Assign the most common value of the attribute
Assign probability to each of the possible values
Attribute construction
Create new attributes based on existing ones that are
sparsely represented
This reduces fragmentation, repetition, and replication
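The continuous-attribute enhancement can be sketched with assumed thresholds that mirror the age splits shown earlier:

```python
# Discretize a continuous attribute into a fixed set of intervals,
# turning it into a discrete-valued attribute the tree can test.
# The thresholds are assumptions, not derived from data.
def discretize_age(age):
    if age <= 30:
        return "<=30"
    elif age <= 40:
        return "31..40"
    return ">40"

print([discretize_age(a) for a in (25, 35, 50)])
# ['<=30', '31..40', '>40']
```

In practice the cut points would themselves be chosen to maximize information gain, rather than fixed in advance.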
Classification in Large
Databases
Classification—a classical problem extensively studied by
statisticians and machine learning researchers
Scalability: Classifying data sets with millions of examples
and hundreds of attributes with reasonable speed
Why decision tree induction in data mining?
relatively faster learning speed (than other classification
methods)
convertible to simple, easy-to-understand
classification rules
can use SQL queries for accessing databases
comparable classification accuracy with other methods
Question?
Outlook=overcast
Temperature=hot
Humidity=high
Windy=false
Play=?
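Assuming the tree learned from the standard weather data (outlook at the root, with the overcast branch a pure "yes" leaf), the query can be resolved by walking the tree:

```python
# Assumed tree for the weather data; outlook is the root because it
# has the largest information gain in the worked example above.
tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                    "overcast": "yes",
                    "rainy": {"windy": {"false": "yes", "true": "no"}}}}

sample = {"outlook": "overcast", "temperature": "hot",
          "humidity": "high", "windy": "false"}

def classify(node, sample):
    # Follow the branch matching the sample's value at each test node.
    while isinstance(node, dict):
        attribute = next(iter(node))
        node = node[attribute][sample[attribute]]
    return node

print(classify(tree, sample))  # overcast is a pure leaf -> Play = yes
```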