Decision Trees
Decision-Tree Learning
Decision Tree Functions:
• Data compression
• Prediction
DECISION TREE
• An internal node is a test on an attribute
• A branch represents an outcome of the test, e.g.,
COLOR = Red
• A leaf node represents a class label or class label
distribution
• At each node, one attribute is chosen to split training
examples into distinct classes as much as possible
• A new case is classified by following a matching path to a leaf node (see the sketch after this list)
• The number of leaf nodes equals the number of root-to-leaf branches (paths) in the tree.
• Each branch corresponds to a classification rule.
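As a concrete illustration of these points, the sketch below represents a tree as nested dictionaries and classifies a new case by following the branch that matches each test (e.g. COLOR = Red) until a leaf label is reached. The attributes, values, and labels are made up for illustration; they are not taken from the slides.

```python
# Minimal sketch: a decision tree as nested dicts (hypothetical attributes/values).
# Internal node: {"attribute": ..., "branches": {outcome: subtree, ...}}
# Leaf node:     {"label": ...}

toy_tree = {
    "attribute": "COLOR",
    "branches": {
        "Red":   {"label": "stop"},
        "Green": {"attribute": "SHAPE",
                  "branches": {"Round":  {"label": "go"},
                               "Square": {"label": "stop"}}},
    },
}

def classify(tree, case):
    """Follow the matching path from the root to a leaf and return its label."""
    while "label" not in tree:                # stop when a leaf is reached
        outcome = case[tree["attribute"]]     # test the attribute at this node
        tree = tree["branches"][outcome]      # follow the matching branch
    return tree["label"]

print(classify(toy_tree, {"COLOR": "Green", "SHAPE": "Round"}))   # -> "go"
```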
Decision Tree Based Classification
• Advantages:
–Inexpensive to construct
–Extremely fast at classifying unknown records
–Easy to interpret for small-sized trees
–Decision trees provide a clear indication of which
features are most important for classification.
–Decision trees perform classification without
requiring much computation.
–Accuracy is comparable to other classification
techniques for many simple data sets
The TDIDT Algorithm
(Top-Down Induction of Decision Trees)
• Many Algorithms:
–Hunt’s Algorithm (one of the earliest)
–CART
–ID3 (Iterative Dichotomiser 3), C4.5, C5.0
–SLIQ, SPRINT
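In practice these learners are available in standard libraries; for example, scikit-learn's DecisionTreeClassifier implements an optimized CART-style algorithm. The snippet below is only a sketch on a small, made-up numeric encoding of a data set (the feature codes and names are assumptions, not the slides' data).

```python
# Sketch: training a CART-style tree with scikit-learn (toy, made-up data).
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoded records: [outlook, humidity] with small integer codes.
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = ["no", "yes", "yes", "no", "yes", "yes"]        # class labels

clf = DecisionTreeClassifier(criterion="entropy",   # information-gain-style splits
                             max_depth=3)
clf.fit(X, y)

print(clf.predict([[0, 0]]))                        # classify a new record
print(export_text(clf, feature_names=["outlook", "humidity"]))
```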
Divide-And-Conquer Algorithms
Weather Data: Play or not Play?
[weather data table omitted; first attribute: Outlook]
Example Tree for “Play?”
Building a Decision Tree
• Top-down tree construction
–At start, all training examples are at the root
–Partition the examples recursively by choosing one attribute each time (see the sketch below)
• Bottom-up tree pruning
–Remove subtrees or branches, in a bottom-up manner,
to improve the estimated accuracy on new data
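A minimal sketch of the top-down (divide-and-conquer) construction step, assuming records are plain dicts and a choose_attribute goodness function is supplied; both are illustrative assumptions, and pruning is omitted.

```python
from collections import Counter

def build_tree(records, attributes, target, choose_attribute):
    """Top-down induction: pick an attribute, partition, and recurse."""
    labels = [r[target] for r in records]
    # Stop if the node is pure or no attributes remain: make a leaf
    # labelled with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return {"label": Counter(labels).most_common(1)[0][0]}
    # Choose the attribute that best separates the classes (goodness function).
    best = choose_attribute(records, attributes, target)
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    # Partition the examples by the outcome of the test and recurse.
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        node["branches"][value] = build_tree(subset, remaining, target, choose_attribute)
    return node
```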
Choosing the Splitting Attribute
• At each node, the available attributes are evaluated
on the basis of how well they separate the classes of the
training examples. A goodness function is used for this purpose
• Typical goodness functions:
–information gain (ID3/C4.5)
–accuracy
–Gini index
–others (information gain ratio)
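As a sketch of one such goodness function, the Gini index of a node with class proportions p_i is 1 − Σ p_i², and a split is scored by the weighted Gini of the subsets it produces. The helpers below are illustrative, using the same dict-style records as the earlier sketch.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of one node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(records, attribute, target):
    """Weighted Gini impurity of the subsets produced by splitting on attribute."""
    n = len(records)
    total = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        total += len(subset) / n * gini(subset)
    return total   # lower is better (purer subsets)
```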
Which attribute to select?
A criterion for attribute selection
• Which is the best attribute?
–The one which will result in the smallest tree
–Heuristic: choose the attribute that produces
the “purest” nodes
• Popular impurity criterion: information gain
–Information gain increases with the average
purity of the subsets that an attribute produces
• Strategy: choose attribute that results in greatest
information gain
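A sketch of that strategy, again assuming dict-style records: compute the entropy-based information gain of each attribute and pick the largest. The helper names are illustrative; choose_attribute plugs directly into the build_tree sketch above.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, target):
    """Entropy before the split minus weighted entropy of the resulting subsets."""
    before = entropy([r[target] for r in records])
    n = len(records)
    after = sum(
        len(subset) / n * entropy([r[target] for r in subset])
        for subset in (
            [r for r in records if r[attribute] == v]
            for v in {r[attribute] for r in records}
        )
    )
    return before - after

def choose_attribute(records, attributes, target):
    """Pick the attribute with the greatest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))
```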
Computing information
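The information measure used by ID3/C4.5 is the entropy of a node's class distribution. For class proportions p_1, ..., p_k:

entropy(p_1, \dots, p_k) = -\sum_{i=1}^{k} p_i \log_2 p_i

so for a node with p "yes" and n "no" examples,

\mathrm{info}([p, n]) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} \ \text{bits}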
Example: attribute “Outlook”
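Assuming the standard 14-example weather data usually used with this example (9 yes / 5 no overall), splitting on Outlook gives:

\mathrm{info}([2,3]) = 0.971 \ \text{bits (sunny)}, \quad \mathrm{info}([4,0]) = 0 \ \text{bits (overcast)}, \quad \mathrm{info}([3,2]) = 0.971 \ \text{bits (rainy)}

\mathrm{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) = 0.693 \ \text{bits}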
Computing the information gain
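Information gain is the information required before the split minus the information required after it. With the same assumed weather data:

\mathrm{gain}(A) = \mathrm{info(before\ split)} - \mathrm{info(after\ split)}

\mathrm{gain}(\text{Outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \ \text{bits}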
Computing the information gain (cont’d)
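Repeating this for every attribute of the assumed weather data gives the gains below, so Outlook is selected for the root split:

\mathrm{gain}(\text{Outlook}) = 0.247, \quad \mathrm{gain}(\text{Temperature}) = 0.029, \quad \mathrm{gain}(\text{Humidity}) = 0.152, \quad \mathrm{gain}(\text{Windy}) = 0.048 \ \text{bits}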
Continuing to split
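Within the Outlook = sunny subset of the assumed data, the same computation gives the gains below, so Humidity is chosen for the next split:

\mathrm{gain}(\text{Humidity}) = 0.971, \quad \mathrm{gain}(\text{Temperature}) = 0.571, \quad \mathrm{gain}(\text{Windy}) = 0.020 \ \text{bits}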
The final decision tree
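For reference, under the same assumption about the data set, the tree this process typically ends with can be written in the nested-dict form used earlier; this is a sketch, not a reproduction of the missing figure.

```python
# Assumed final tree for the standard weather data (illustrative sketch).
play_tree = {
    "attribute": "Outlook",
    "branches": {
        "sunny":    {"attribute": "Humidity",
                     "branches": {"high":   {"label": "no"},
                                  "normal": {"label": "yes"}}},
        "overcast": {"label": "yes"},
        "rainy":    {"attribute": "Windy",
                     "branches": {"true":  {"label": "no"},
                                  "false": {"label": "yes"}}},
    },
}
```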