Lecture 4
Machine Learning
A simple example
Other solutions for the same data
Decision tree: the principle
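The principle, in brief: each internal node tests one attribute, the branch matching the example's value is followed, and a leaf assigns the class. A minimal Python sketch (the attribute names and the tree itself are hypothetical, not taken from the slides):

# A tree is either a class label (leaf) or a pair (attribute, {value: subtree}).
tree = ("outlook", {
    "sunny":    ("humidity", {"high": "neg", "normal": "pos"}),
    "overcast": "pos",
    "rain":     ("windy", {"yes": "neg", "no": "pos"}),
})

def classify(tree, example):
    # Follow attribute tests from the root down to a leaf.
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

# classify(tree, {"outlook": "sunny", "humidity": "normal", "windy": "no"})  ->  "pos"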
Small trees better than big trees
Easier to interpret
Lower danger of training-data overfitting
Tendency to eliminate irrelevant and redundant attributes
Cheaper classification when attribute values expensive or difficult to obtain
Induction of small decision trees
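Small trees are usually grown greedily, top-down: pick an informative attribute, split the training set on its values, and recurse on each subset. A sketch of that scheme, assuming examples are stored as (attribute_dict, class) pairs and that a helper best_attribute (which picks the attribute by information gain, sketched under "Searching for the best attribute" below) is available:

def induce_tree(examples, attributes):
    classes = {c for _, c in examples}
    if len(classes) == 1:                      # pure subset: return a leaf
        return classes.pop()
    if not attributes:                         # nothing left to test: majority class
        return max(classes, key=lambda c: sum(1 for _, cl in examples if cl == c))
    best = best_attribute(examples, attributes)   # assumed helper, see later slides
    branches = {}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, c) for x, c in examples if x[best] == value]
        branches[value] = induce_tree(subset, [a for a in attributes if a != best])
    return (best, branches)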
An attribute’s information content in a 2-class domain
Information content of the message “example x is pos”
◦ If most examples are positive, the information content is low
◦ If most examples are negative, the information content is high
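In a 2-class domain where a fraction p of the training examples is positive, the information content of the message “example x is pos” is −log₂ p bits: about 0.42 bits when p = 0.75 (mostly positive, hardly surprising), but 2 bits when p = 0.25 (mostly negative, so the message is surprising).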
Entropy: average information content
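Averaging the information content of both possible messages, weighted by how often each occurs, gives the entropy of the training set; in a 2-class domain, H(T) = −p · log₂ p − (1 − p) · log₂(1 − p). With p = 0.75 this is about 0.81 bits; a 50/50 split gives the maximum of 1 bit, and a pure set gives 0 bits.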
Amount of information contributed by an attribute
Attribute at divides the training set T into subsets 𝑇𝑖, each defined by one value of at
Let 𝑃𝑖 = |𝑇𝑖| / |𝑇|
Information gain:
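Gain(at) = H(T) − Σ𝑖 𝑃𝑖 · H(𝑇𝑖)
i.e. the entropy of the whole set minus the weighted average entropy of the subsets created by at; the attribute with the highest gain is the most informative one.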
Searching for the best attribute
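The search itself is a plain scan: compute the information gain of every candidate attribute and keep the best one. A minimal sketch under the same data-layout assumption as before (examples as (attribute_dict, class) pairs):

from math import log2
from collections import Counter

def entropy(examples):
    counts = Counter(c for _, c in examples)
    total = len(examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(examples, attribute):
    # H(T) minus the weighted average entropy of the subsets T_i.
    subsets = {}
    for x, c in examples:
        subsets.setdefault(x[attribute], []).append((x, c))
    remainder = sum(len(s) / len(examples) * entropy(s) for s in subsets.values())
    return entropy(examples) - remainder

def best_attribute(examples, attributes):
    return max(attributes, key=lambda a: information_gain(examples, a))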
Find the best attribute (example)
Find the best attribute (cont.)
Binary splits of numeric attributes (cont.)
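For a numeric attribute the split is binary: examples with values at or below a threshold go to one branch, the rest to the other. A common way to choose the threshold, sketched here reusing the entropy() helper from the previous sketch, is to try the midpoints between consecutive distinct observed values and keep the one with the highest gain:

def best_binary_split(examples, attribute):
    values = sorted({x[attribute] for x, _ in examples})
    best_gain, best_threshold = -1.0, None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2            # candidate cut between two observed values
        left  = [(x, c) for x, c in examples if x[attribute] <= threshold]
        right = [(x, c) for x, c in examples if x[attribute] >  threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(examples)
        gain = entropy(examples) - weighted
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain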
A decision tree is a set of rules
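Each root-to-leaf path can be read as one if-then rule whose conditions are the attribute tests along the path and whose conclusion is the leaf's class. With the hypothetical tree sketched earlier, one such rule would be: IF outlook = sunny AND humidity = high THEN neg. The complete rule set, one rule per leaf, classifies exactly like the tree.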
Other approaches
• Decision tree recommends an attribute test
• User finds this attribute’s value and presents it to the tree
• Important: We may not know all attribute values; we may have to measure them at great cost!