Machine Learning Functionalities
Outline
Introduction
Mining Association Rules
Classification
Numeric Prediction
Cluster Analysis
Interesting Patterns
References
Introduction
Machine Learning functionalities are used to specify the kind of patterns to be found.
Main functionalities:
Mining Association Rules
Classification
Numeric Prediction
Cluster Analysis
Mining Association Rules
Frequent Patterns
Frequent patterns are patterns that occur frequently in data.
The kinds of frequent patterns:
Frequent itemsets: a frequent itemset refers to a set of items that frequently appear together in a transactional data set, such as milk and bread.
Frequent sequential patterns: a pattern such as customers tending to purchase first a PC, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.
Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
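The idea of a frequent itemset can be illustrated with a minimal sketch. The transactions, the function name `frequent_itemsets`, and the threshold `min_count` are all hypothetical and chosen only for illustration; the code simply counts every itemset of a given size and keeps those that reach the threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; not data from the slides.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]

def frequent_itemsets(transactions, size, min_count):
    """Count every itemset of the given size and keep the frequent ones."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each itemset a canonical tuple form
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

pairs = frequent_itemsets(transactions, size=2, min_count=3)
print(pairs)
```

With these toy baskets, only the pair of bread and milk appears together in at least three transactions. Brute-force counting like this is only practical for small data; it is meant to show the concept, not an efficient miner.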
Association Rules
Suppose, as a marketing manager of AllElectronics, you would like to determine which items are frequently purchased together within the same transactions.
An example of an association rule from the AllElectronics transactional database is:
The rule indicates that of the AllElectronics customers under study, 2% are 20 to 29 years of age with an income of 20,000 to 29,000 and have purchased a CD player at AllElectronics.
There is a 60% probability that a customer in this age and income group will purchase a CD player.
This is an association between more than one attribute.
Support = 4, confidence = 100%
Normally: minimum support and confidence are prespecified, e.g. support >= 2 and confidence >= 95% for weather data
Interpreting association rules
Interpretation is not obvious:
... is not the same as ...
Large number of possible associations
Output needs to be restricted to show only the most predictive associations
Association rules must satisfy both:
A minimum support threshold: number of instances predicted correctly
A minimum confidence threshold: number of correct predictions, as proportion of all instances that rule applies to
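The two thresholds can be made concrete with a short sketch that follows the definitions above: support as the number of instances a rule predicts correctly, and confidence as that number divided by the number of instances the rule applies to. The function name, the attribute names, and the instances are hypothetical.

```python
def support_and_confidence(instances, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support: number of instances the rule predicts correctly.
    confidence: correct predictions as a proportion of the instances
    the rule applies to (those matching the antecedent).
    """
    def matches(instance, conditions):
        return all(instance.get(attr) == value for attr, value in conditions.items())

    applies = [inst for inst in instances if matches(inst, antecedent)]
    correct = [inst for inst in applies if matches(inst, consequent)]
    confidence = len(correct) / len(applies) if applies else 0.0
    return len(correct), confidence

# Hypothetical instances, invented for this example.
instances = [
    {"temperature": "cool", "humidity": "normal"},
    {"temperature": "cool", "humidity": "normal"},
    {"temperature": "cool", "humidity": "high"},
    {"temperature": "hot", "humidity": "high"},
    {"temperature": "mild", "humidity": "normal"},
]

support, confidence = support_and_confidence(
    instances, {"temperature": "cool"}, {"humidity": "normal"}
)
# Keep the rule only if it meets both prespecified thresholds.
keep_rule = support >= 2 and confidence >= 0.95
```

Here the rule applies to three instances and is correct for two of them, so it meets a support threshold of 2 but fails a confidence threshold of 95%.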
Association learning is unsupervised
Association rules usually involve only nonnumeric attributes
Additional statistical analysis can be performed to uncover interesting correlations between associated attribute-value pairs.
Examples of association rule algorithms:
Apriori algorithm
FP-growth algorithm
Tree-Projection algorithm
ECLAT (Equivalence CLAss Transformation) algorithm
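As an illustration of the first algorithm in the list, the following is a simplified sketch of the level-wise Apriori idea: frequent itemsets of size k are built only from frequent itemsets of size k-1, because every subset of a frequent itemset must itself be frequent. The function name, the `min_count` parameter, and the toy data are assumptions; a production implementation would generate candidates and count them far more efficiently.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count({frozenset([item]) for item in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy transactions.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]
result = apriori(baskets, min_count=2)
```

With a minimum count of 2, the three single items milk, bread, and eggs and all three pairs among them are frequent, while the triple is generated as a candidate but appears in only one basket and is discarded.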
Classification
Construct models (functions) that describe and distinguish classes or concepts to predict the class of objects whose class label is unknown
Examples of classification output:
Decision tree
Classification rules
Neural networks
Decision trees
Leaf nodes give classifications
If the attribute is nominal:
Number of children usually equal to the number of possible values
Usually attribute won't get tested more than once
If the attribute is numeric:
Test whether value is greater or less than constant
Attribute may get tested several times
Other possibility: three-way split (or multi-way split)
+ Integer: less than, equal to, greater than
+ Real: below, within, above
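The two kinds of node test can be sketched as follows. The tree, its attribute names, the threshold, and the dictionary layout are hypothetical; the sketch only shows a nominal node with one child per value and a numeric node that compares the value against a constant with a two-way split.

```python
# Hypothetical tree: a nominal test with one child per value, and a numeric
# test that compares the attribute value against a constant.
tree = {
    "attribute": "outlook", "kind": "nominal",
    "children": {
        "sunny": {"attribute": "humidity", "kind": "numeric", "threshold": 75,
                  "below": {"label": "yes"}, "above": {"label": "no"}},
        "overcast": {"label": "yes"},
        "rainy": {"label": "no"},
    },
}

def classify(node, instance):
    """Route an instance from the root down to a leaf and return its label."""
    while "label" not in node:
        value = instance[node["attribute"]]
        if node["kind"] == "nominal":
            # one branch per possible value of the attribute
            node = node["children"][value]
        else:
            # numeric: value <= threshold goes to "below", otherwise "above"
            node = node["below"] if value <= node["threshold"] else node["above"]
    return node["label"]

prediction = classify(tree, {"outlook": "sunny", "humidity": 70})
```

A three-way split would add a third branch to the numeric node (for example an "equal" branch for integers); that variant is left out to keep the sketch short.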
Classification rules
Popular alternative to decision trees
Rules include two parts:
Antecedent or precondition: a series of tests just like the tests at the nodes of a decision tree
+ Tests are usually logically ANDed together
+ All tests must succeed for the rule to fire
Consequent or conclusion:
+ The class or set of classes or probability distribution assigned by rule
One rule for each leaf:
+ Produces clear rules
+ Doesn't matter in which order they are executed
But: a tree cannot easily express disjunction between rules
Example: rules that test different attributes lead to a tree with identical subtrees
Rule set: ordered set of rules (decision list)
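Deriving one rule for each leaf, and then running the result as an ordered set of rules (decision list), might look like the sketch below. The tree layout, attribute names, and both function names are hypothetical, and only nominal tests are handled.

```python
def tree_to_rules(node, conditions=()):
    """Return one (antecedent, consequent) rule per leaf of a nominal-test tree."""
    if "label" in node:
        return [(dict(conditions), node["label"])]
    rules = []
    for value, child in node["children"].items():
        # the antecedent collects every test on the path from root to leaf
        rules += tree_to_rules(child, conditions + ((node["attribute"], value),))
    return rules

def apply_decision_list(rules, instance, default=None):
    """Fire the first rule whose ANDed tests all succeed."""
    for antecedent, consequent in rules:
        if all(instance.get(attr) == value for attr, value in antecedent.items()):
            return consequent
    return default

# Hypothetical tree with nominal tests only.
tree = {
    "attribute": "outlook",
    "children": {
        "sunny": {"attribute": "windy",
                  "children": {"true": {"label": "no"}, "false": {"label": "yes"}}},
        "overcast": {"label": "yes"},
    },
}
rules = tree_to_rules(tree)
decision = apply_decision_list(rules, {"outlook": "sunny", "windy": "false"})
```

Rules derived from a tree never overlap, so their order does not change the outcome here; the first-match loop matters for hand-written or learned decision lists whose rules can overlap.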
Neural Network
A neural network is typically a collection of neuron-like processing units with weighted connections between the units.
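A minimal sketch of such units is shown below. The sigmoid activation, the layer layout, and all weights are assumptions made for illustration; the slides do not specify them. Each unit computes a weighted sum of its inputs and passes it through the activation.

```python
import math

def unit_output(inputs, weights, bias):
    """Weighted sum of the inputs passed through a sigmoid activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def forward(inputs, layers):
    """Feed inputs through layers; each layer is a list of (weights, bias) units."""
    activations = inputs
    for layer in layers:
        activations = [unit_output(activations, w, b) for w, b in layer]
    return activations

# Hypothetical weights, chosen by hand rather than learned.
layers = [
    [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)],  # hidden layer: two units
    [([1.0, -1.0], 0.0)],                      # output layer: one unit
]
output = forward([1.0, 1.0], layers)
```

In practice the weights are learned from training data; this sketch covers only the forward computation.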
Classification Techniques
Examples of classification techniques:
Decision Trees
Classification Rules
Neural Network
Naive Bayesian classification
Support vector machines
k-nearest neighbor classification
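To give one of the listed techniques a concrete shape, here is a minimal sketch of k-nearest neighbor classification. It assumes numeric features, Euclidean distance, and majority voting; the function name and the training points are hypothetical.

```python
from collections import Counter
import math

def knn_classify(training, query, k=3):
    """Predict the majority class among the k training points nearest to query."""
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points: (features, class label).
training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
label = knn_classify(training, (1.1, 1.0))
```

Unlike a decision tree or a rule set, this technique builds no explicit model; the stored training instances themselves are consulted at prediction time.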
Difference between association analysis and classification learning:
Association rules can predict any attribute's value, not just the class
They can predict more than one attribute's value at a time
There are more association rules than classification rules
Numeric Prediction
Numeric prediction: predicting a numeric quantity
Numeric prediction is a variant of classification learning in which the outcome is a numeric value rather than a category.
Learning is supervised
The learning process is provided with the target value
Measure success on test data
Representing numeric prediction:
Linear regression equation: an equation that predicts a numeric quantity
Regression tree: a decision tree where each leaf predicts a numeric quantity
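A linear regression equation can be fitted with ordinary least squares. The sketch below handles a single input attribute; the function name and the data points are hypothetical, and the data were constructed to lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data lying on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Predict a numeric quantity for an unseen input.
predicted = slope * 5.0 + intercept
```

The fitted equation is then used like any other model: given a new input, it returns a number rather than a class label.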
Regression tree
Model tree
Cluster Analysis
Clustering: grouping similar instances into clusters
Clustering is unsupervised
The class of an example is not known
Example: a 2-D plot of customer data with respect to customer locations in a city, showing three data clusters. Each cluster center is marked with a +.
Clustering may be followed by a step of classification learning in which rules are learned that give an intelligible description of how new instances should be placed into the clusters.
Representing clusters
Examples of clustering algorithms:
k-Means algorithm
k-medoids algorithm
AGNES (AGglomerative NESting)
DIANA (DIvisive ANAlysis)
BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies)
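The first algorithm in the list can be sketched in a few lines. This is a plain version of k-means under several assumptions: the initial centers are supplied by the caller, the distance is Euclidean, and the loop runs for a fixed number of iterations instead of testing for convergence. The function name and the points are hypothetical.

```python
import math

def k_means(points, centers, iterations=10):
    """Assign points to the nearest center, then move each center to its cluster mean."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each returned center is the mean of the points assigned to it, which corresponds to the "+" marking a cluster center in a 2-D plot of clustered data.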
Interesting Patterns
Objective measures: based on statistics and structures of patterns, e.g., support, confidence, etc.
Subjective measures: based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.
References
Ian
The end