Decision Tree

The document discusses decision tree algorithms. It describes how decision trees are constructed in a top-down recursive manner by splitting the training examples at each node based on an attribute. It discusses stopping conditions for partitioning and describes how decision trees are used for classification. The document also discusses different measures that can be used for attribute selection such as information gain, gain ratio, and Gini index.


Decision Tree Algorithm

• Tree is constructed in a top-down recursive divide-and-conquer manner.
• At start, all the training examples are at the root.
• Attributes are categorical (if continuous-valued, they are discretized in advance).

Conditions for stopping partitioning
• All samples for a given node belong to the same class.
• There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf.
• There are no samples left.
Decision Trees – Testing Phase
• Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree.
• A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
• Decision trees can easily be converted to classification rules.
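As a concrete illustration of this root-to-leaf traversal, here is a minimal Python sketch. The nested-dict tree layout, the attribute names, and the classify() helper are illustrative assumptions, not a representation prescribed by the slides.

# Minimal sketch: classify a tuple by walking a decision tree from root to leaf.
# The tree layout (nested dicts) and attribute names are illustrative assumptions.

def classify(tree, tuple_x):
    """Trace a path from the root to a leaf and return its class prediction."""
    # A leaf is represented directly by its class label (a plain string).
    if not isinstance(tree, dict):
        return tree
    attribute = tree["attribute"]                   # attribute tested at this internal node
    branch = tree["branches"][tuple_x[attribute]]   # follow the branch matching the tuple's value
    return classify(branch, tuple_x)                # recurse until a leaf is reached

# Illustrative tree for the classic weather data:
weather_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "windy",
                 "branches": {True: "no", False: "yes"}},
    },
}

print(classify(weather_tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes

Each internal node tests one attribute and each branch corresponds to one outcome of that test, which is also why such a tree can be read off directly as a set of IF-THEN classification rules.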
Why are decision tree classifiers so popular?
• Construction of decision tree classifiers does not require any domain knowledge or parameter setting.
• Can handle multidimensional data.
• Easy for humans to assimilate.
• Learning and classification are simple and fast.
• In general, decision tree classifiers have good accuracy.
• However, successful use may depend on the data at hand.
• Application areas: medicine, manufacturing and production, financial analysis, astronomy, and molecular biology.
WHICH ATTRIBUTE TO SPLIT ON?
Attribute Selection Measures (Splitting Rules)
• Information Gain
• Gain Ratio
• Gini Index

• The Gini index enforces the resulting tree to be binary.
• Information gain does not, thereby allowing multiway splits (i.e., two or more branches to be grown from a node).
Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain.
• Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|.
• Expected information (entropy) needed to classify a tuple in D:

  Info(D) = − Σ_{i=1}^{m} p_i log2(p_i)

• Information needed (after using A to split D into v partitions) to classify D:

  Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

• Information gained by branching on attribute A:

  Gain(A) = Info(D) − Info_A(D)
• Expected information (entropy) needed to classify a tuple in D: Info(D).
• Information needed (after using A to split D into v partitions) to classify D: Info_A(D).
• Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on A).
Further, the Rain branch of the example is handled in the same way as the Sunny branch.
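A small Python sketch of the three formulas above, assuming the data set D is represented as a list of (attribute-dict, class-label) pairs; the function names and the data layout are assumptions made for illustration.

# Sketch of Info(D), Info_A(D), and Gain(A); D is assumed to be a list of
# (attribute-dict, class-label) pairs.
from collections import Counter
from math import log2

def info(D):
    """Expected information (entropy) needed to classify a tuple in D."""
    total = len(D)
    counts = Counter(label for _, label in D)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_after_split(D, A):
    """Information still needed to classify D after splitting it on attribute A."""
    total = len(D)
    partitions = {}
    for x, label in D:
        partitions.setdefault(x[A], []).append((x, label))
    return sum(len(Dj) / total * info(Dj) for Dj in partitions.values())

def gain(D, A):
    """Information gained by branching on attribute A."""
    return info(D) - info_after_split(D, A)

In ID3, the attribute chosen at a node is then simply the one maximizing gain(D, A) over the remaining attributes.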
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  • Tree is constructed in a top-down recursive divide-and-conquer manner
  • At start, all the training examples are at the root
  • Attributes are categorical (if continuous-valued, they are discretized in advance)
  • Examples are partitioned recursively based on selected attributes
  • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  • All samples for a given node belong to the same class
  • There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
  • There are no samples left
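Below is a minimal sketch of this greedy, top-down, divide-and-conquer loop. It reuses the gain() helper from the earlier information-gain sketch as the heuristic selection measure; as before, the data layout and helper names are assumptions for illustration, not the canonical algorithm from the slides.

# Greedy top-down induction sketch; reuses info()/gain() from the previous sketch.
from collections import Counter

def majority_class(D):
    """Majority voting over the class labels in D."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def build_tree(D, attributes):
    labels = {label for _, label in D}
    # Stopping condition 1: all samples at this node belong to the same class.
    if len(labels) == 1:
        return labels.pop()
    # Stopping condition 2: no remaining attributes -> majority voting at the leaf.
    if not attributes:
        return majority_class(D)
    # Select the test attribute with the highest information gain.
    best = max(attributes, key=lambda A: gain(D, A))
    remaining = [A for A in attributes if A != best]
    # Partition the examples on the selected attribute and recurse on each partition.
    # Stopping condition 3 (no samples left for a branch) cannot arise in this sketch,
    # since branches are created only for attribute values that actually occur in D.
    partitions = {}
    for x, label in D:
        partitions.setdefault(x[best], []).append((x, label))
    return {"attribute": best,
            "branches": {value: build_tree(Dj, remaining)
                         for value, Dj in partitions.items()}}

The resulting nested-dict tree has the same shape as the one used in the earlier classify() sketch, so the two pieces can be combined for training and prediction.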
Gain Ratio

• The information gain measure is biased toward tests with many outcomes.
• It prefers to select attributes having a large number of values.
• C4.5, a successor of ID3, uses an extension of information gain – the gain ratio – to overcome this bias.
• It applies a kind of normalization to information gain using a “split information” value:

  SplitInfo_A(D) = − Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)

• This value represents the potential information generated by splitting the training data set, D, into v partitions, corresponding to the v outcomes of a test on attribute A.
• For each outcome, it considers the number of tuples having that outcome with respect to the total number of tuples in D.
• It differs from information gain, which measures the information with respect to classification that is acquired based on the same partitioning.
• The gain ratio is defined as

  GainRatio(A) = Gain(A) / SplitInfo_A(D)

• The attribute with the maximum gain ratio is selected as the splitting attribute.
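A sketch of the split information and gain ratio computations under the same assumed list-of-(attribute-dict, class-label) data layout, reusing the gain() helper from the information-gain sketch.

# Sketch of SplitInfo_A(D) and GainRatio(A); reuses gain() from the earlier sketch.
from math import log2

def split_info(D, A):
    """Potential information generated by splitting D into v partitions on A."""
    total = len(D)
    counts = {}
    for x, _ in D:
        counts[x[A]] = counts.get(x[A], 0) + 1
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain_ratio(D, A):
    """Normalize information gain by the split information, as in C4.5."""
    si = split_info(D, A)
    return gain(D, A) / si if si > 0 else 0.0  # guard against zero split information

# The splitting attribute is then the one with the maximum gain ratio:
# best = max(attributes, key=lambda A: gain_ratio(D, A))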
Gini Index
• The Gini index measures the impurity of D, a data partition or set of training tuples:

  Gini(D) = 1 − Σ_{i=1}^{m} p_i²

• where p_i is the probability that a tuple in D belongs to class C_i.
• The sum is computed over m classes.
• The Gini index considers a binary split for each attribute.
• The Gini index is used in CART.
• When considering a binary split, we compute a weighted sum of the impurity of each resulting partition.
• For example, if a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is

  Gini_A(D) = (|D1| / |D|) × Gini(D1) + (|D2| / |D|) × Gini(D2)

• The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-valued attribute A is

  ΔGini(A) = Gini(D) − Gini_A(D)

• The attribute that maximizes the reduction in impurity (or, equivalently, has the minimum Gini index) is selected as the splitting attribute.
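A sketch of the Gini computations above under the same assumed data layout; the subset argument, which says which attribute values fall on the D1 side of the binary split, is an illustrative assumption.

# Sketch of Gini(D), the weighted Gini index of a binary split, and the
# reduction in impurity; D is a list of (attribute-dict, class-label) pairs.
from collections import Counter

def gini(D):
    """Gini impurity of a data partition D, summed over the m classes."""
    total = len(D)
    counts = Counter(label for _, label in D)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def gini_of_split(D, A, subset):
    """Weighted Gini index when D is split into D1 (x[A] in subset) and D2 (the rest)."""
    D1 = [(x, y) for x, y in D if x[A] in subset]
    D2 = [(x, y) for x, y in D if x[A] not in subset]
    total = len(D)
    return len(D1) / total * gini(D1) + len(D2) / total * gini(D2)

def gini_reduction(D, A, subset):
    """Reduction in impurity incurred by the binary split; CART maximizes this."""
    return gini(D) - gini_of_split(D, A, subset)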
Additional Terms
• When decision trees are built, many of the branches may reflect noise or outliers in the training data.
• Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.
• Scalability becomes an issue when decision trees must be induced from very large training sets.
• Incremental versions of decision tree induction have also been proposed.
• When given new training data, these restructure the decision tree acquired from learning on previous training data, rather than relearning a new tree from scratch.
