Decision Trees


Decision-Tree Learning
Decision Tree Functions:
• Data compression
• Prediction
DECISION TREE
• An internal node is a test on an attribute
• A branch represents an outcome of the test, e.g.,
COLOR = Red
• A leaf node represents a class label or class label
distribution
• At each node, one attribute is chosen to split training
examples into distinct classes as much as possible
• A new case is classified by following a matching path to
a leaf node
• The number of leaf nodes equals the number of branches (root-to-leaf paths).
• Each branch corresponds to a classification rule.
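As a small illustration of this path-following classification, here is a sketch in Python; the nested-dict representation and the particular tree (the one commonly derived from the weather/“Play?” data used later in these slides) are illustrative assumptions, not the slides’ own code.

# Minimal sketch: a decision tree as nested dicts. Internal nodes test one
# attribute; each branch is an outcome of the test; leaves hold a class label.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny":    {"attribute": "Humidity",
                     "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",                      # leaf: class label
        "Rainy":    {"attribute": "Windy",
                     "branches": {True: "No", False: "Yes"}},
    },
}

def classify(node, example):
    """Follow the branch matching each attribute test until a leaf is reached."""
    while isinstance(node, dict):               # internal node: apply its test
        value = example[node["attribute"]]
        node = node["branches"][value]          # take the matching branch
    return node                                 # leaf: the predicted class

# A new case is classified by walking a matching root-to-leaf path:
print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Windy": False}))
# -> "Yes"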
Decision Tree Based Classification
• Advantages:
–Inexpensive to construct
–Extremely fast at classifying unknown records
–Easy to interpret for small-sized trees
–Decision trees provide a clear indication of which
features are most important for classification.
–Decision trees perform classification without
requiring much computation.
–Accuracy is comparable to other classification
techniques for many simple data sets
The TDIDT Algorithm
(Top-Down Induction of Decision Trees)
• Many Algorithms:
–Hunt’s Algorithm (one of the earliest)
–CART
–ID3 (Iterative Dichotomiser 3), C4.5, C5.0
–SLIQ, SPRINT
Divide-And-Conquer Algorithms
Weather Data: Play or not Play?
(Table of training examples, with Outlook as the first attribute.)
Example Tree for “Play?”
Building Decision Tree
• Top-down tree construction
–At start, all training examples are at the root
–Partition the examples recursively by
choosing one attribute each time
• Bottom-up tree pruning
–Remove subtrees or branches, in a bottom-up
manner, to improve the estimated accuracy on
new data
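A minimal Python sketch of this top-down, divide-and-conquer construction, under simplifying assumptions (categorical attributes, majority-class leaves); choose_attribute stands in for whatever goodness function is used, and pruning is not shown.

from collections import Counter

def build_tree(examples, attributes, choose_attribute):
    """Top-down construction: all examples start at the current root and are
    partitioned recursively by one chosen attribute at a time (a sketch).
    Each example is a (attribute_dict, class_label) pair."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attributes:                        # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(examples, attributes)   # e.g. highest information gain
    branches = {}
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(a, l) for a, l in examples if a[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, remaining, choose_attribute)
    return {"attribute": best, "branches": branches}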
Choosing the Splitting Attribute
• At each node, available attributes are evaluated
on the basis of separating the classes of the
training examples. A goodness function is used
for this purpose
• Typical goodness functions:
–information gain (ID3/C4.5)
–accuracy
–Gini index
–others (information gain ratio)
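Two of these goodness functions, sketched as impurity measures over the class counts at a node (an assumed interface; the split that most reduces the weighted impurity of the resulting subsets is preferred):

from math import log2

def entropy(counts):
    """Information (in bits) of a class distribution, e.g. entropy([9, 5])."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# For a node with 9 "yes" and 5 "no" examples:
#   entropy([9, 5]) ~= 0.940 bits,  gini([9, 5]) ~= 0.459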
Which attribute to select?
A criterion for attribute selection
• Which is the best attribute?
–The one which will result in the smallest tree
–Heuristic: choose the attribute that produces
the “purest” nodes
• Popular impurity criterion: information gain
–Information gain increases with the average
purity of the subsets that an attribute produces
• Strategy: choose the attribute that results in the
greatest information gain
Computing information
Example: attribute “Outlook”
Computing the information gain
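A worked sketch of the gain computation for the “Outlook” attribute, assuming the standard 14-example weather data (9 “yes” / 5 “no” overall; Outlook splits them into sunny 2/3, overcast 4/0, rainy 3/2) and the entropy function sketched earlier:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_per_branch):
    """gain(A) = info(parent) - weighted average info of the branches A creates."""
    n = sum(parent_counts)
    after = sum(sum(c) / n * entropy(c) for c in child_counts_per_branch)
    return entropy(parent_counts) - after

# Outlook splits the 14 examples (9 yes / 5 no) into:
#   sunny: 2 yes / 3 no,  overcast: 4 yes / 0 no,  rainy: 3 yes / 2 no
gain_outlook = information_gain([9, 5], [[2, 3], [4, 0], [3, 2]])
print(round(gain_outlook, 3))   # ~0.247 bits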
Continuing to split
The final decision tree

• Note: not all leaves need to be pure; sometimes
identical instances have different classes
• Splitting stops when data can’t be split any further
Missing values
Overfitting and Pruning
Prepruning
Early stopping
Postpruning
Subtree replacement
Subtree raising
Model Trees
Stopping Criteria for Tree Induction
• Stop expanding a node when all the records
belong to the same class
• Stop expanding a node when all the records
have similar attribute values
• Early termination
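These criteria can be sketched as a stopping test inside a recursive builder; the min_examples threshold below is an illustrative early-termination (prepruning) parameter, not something specified in the slides.

def should_stop(examples, min_examples=2):
    """examples: list of (attribute_dict, class_label) pairs at the current node."""
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # all records belong to the same class
        return True
    attribute_values = {tuple(sorted(attrs.items())) for attrs, _ in examples}
    if len(attribute_values) == 1:            # identical attribute values, mixed classes
        return True
    return len(examples) < min_examples       # early termination (prepruning)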
Summary
