0% found this document useful (0 votes)

144 views50 pages

DMDW 4th Module

The document discusses classification techniques in data mining. It defines classification as assigning objects to predefined categories based on their attributes. Some common classification applications include spam detection, medical diagnosis, and credit card fraud detection. The document outlines the general approach to solving classification problems, which involves building a classification model from training data and evaluating it on test data. It then describes several specific classification techniques in more detail, including decision trees, rules, nearest neighbors, Bayesian methods, and support vector machines. A key part of the document focuses on decision tree induction, covering how trees are constructed recursively and important design considerations.

Uploaded by

Kavya Gowda

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

144 views50 pages

DMDW 4th Module

Uploaded by

Kavya Gowda

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 50

DATA MINING AND DATA WAREHOUSING

MODULE-4 CLASSIFICATION
4.1 Introduction
4.2 Decision Trees Induction,
4.3 Method for Comparing Classifiers
4.4 Rule Based Classifiers
4.5 Nearest Neighbor Classifiers
4.6 Bayesian Classifiers
4.7 Important Question

4.1 Introduction
Classification: Definition
Classification, which is the task of assigning objects to one of several predefined categories. Given a
collection of records (training set ),Each record contains a set of attributes, one of the attributes is the
class. Find a model for class attribute as a function of the values of other attributes. The input data for a
classification task is a collection of records. Each record,

VTUPulse.com
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into
training and test sets, with training set used to build the model and test set used to validate it.

Applications:
Examples

 Detecting spam email messages based upon the messageheader and content.
 Categorizing cells as malignant or benign based upon the results of MRI scans.
 Classifying galaxies based upon their shapes.
 Categorizing news stories as finance, weather, entertainment, sports, etc
 Classifying credit card transactions as legitimate or fraudulent.
General Approach to Solving a Classification

 A classification technique (or classifier) is a systematic approach to building

 classification models from an input data set.

Dept. of CSE, ATMECE, Mysuru Page 1

DATA MINING AND DATA WAREHOUSING

 Each technique employs a learning algorithm to identify a model that best fits the relationship
between the attribute set and class label of the input data.
 The model generated by a learning algorithm should both fit the input data well and correctly
predict the class labels of records it has never seen before.
 Therefore, a key objective of the learning algorithm is to build models with good generalization
capability; i.e., models that accurately predict the class labels of previously unknown records

VTUPulse.com
Evaluation of the performance of a classification model:
is based on thecounts of test records correctly and incorrectly predicted by the model. Thesecounts are
tabulated in a table known as a confusion matrix.

Dept. of CSE, ATMECE, Mysuru Page 2

DATA MINING AND DATA WAREHOUSING

Most classification algorithms seek models that attain the highest accuracy, or equivalently, the lowest
error rate when applied to the test set.
Classification Techniques:
• Decision Tree based Methods
• Rule-based Methods
• Memory based reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines

VTUPulse.com
4.2 Decision Tree Induction
The tree has three types of nodes:
A root node that has no incoming edges and zero or more outgoing edges.
Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
In a decision tree, each leaf node is assigned a class label. The non terminal nodes, which include the root
and other internal nodes, contain attribute test conditions to separate records that have different
characteristics
In principle, there are exponentially many decision trees that can be constructed from a given set of
attributes.

Dept. of CSE, ATMECE, Mysuru Page 3

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Hunt's Algorithm
In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the training records
into successively purer subsets. Let Di be the set of training records that are associated with

Dept. of CSE, ATMECE, Mysuru Page 4

DATA MINING AND DATA WAREHOUSING

node t and y= {“y1,y2,y3….yc"} be the class labels. The following is a recursive definition of Hunt's
algorithm.
Step 1: If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
Step 2: If Di contains records that belong to more than one class, an attribute test condition is selected to
partition the records into smaller subsets. A child node is created for each outcome of the test condition
and the records in Dt are distributed to the children based on the outcomes. The algorithm is then
recursively applied to each child node.

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 5

DATA MINING AND DATA WAREHOUSING

VTUPulse.com
To illustrate how the algorithm works, consider the problem of predicting whether a loan applicant will
repay her loan obligations or become delinquent, subsequently defaulting on her loan.
The initial tree for the classification problem contains a single node with class label Defaulted = No (see
Figure 4.7a), which means that most of the borrowers successfully repaid their loans. The tree, however,
needs to be refined since the root node contains records from both classes.
The records are subsequently divided into smaller subsets based on the outcomes of the Home Owner test
condition as shown in Figure 4.7(b). The justification for choosing this attribute test condition will be
discussed later. For now, we will assume that this is the best criterion for splitting the data at this point.
Hunt's algorithm is then applied recursively to each child of the root node. From the training set given in
Figure 4.6, notice that all borrowers who are home owners successfully repaid their loans. The left child
of the root is therefore a leaf node labeled Defaulted = No (see Figure 4.7(b)).
For the right child, we need to continue applying the recursive step of Hunt's algorithm until all the
records belong to the same class. The trees resulting from each recursive step are shown in Figures 4.7(c)
and (d).

Design Issues of Decision Tree Induction:

A learning algorithm for inducing decision trees must address the following two issues.
1) Should the training records be split?
Each recursive step of the tree-growing process must select an attribute test condition to divide the records
into smaller subsets. To implement this step, the algorithm must provide a method for specifying the test
condition for different attribute types as well as an objective measure for evaluating the goodness of each
test condition.

Dept. of CSE, ATMECE, Mysuru Page 6

DATA MINING AND DATA WAREHOUSING

3) How should the splitting procedure stop?

A stopping condition is needed to terminate the tree-growing process. A possible strategy is to continue
expanding a node until either all the records belong to the same class or all the records have identical
attribute values. Although both conditions are sufficient to stop any decision tree induction algorithm,
other criteria can be imposed to allow the tree-growing procedure to terminate earlier.

Methods for Expressing Attribute Test Conditions:

Decision tree induction algorithms must provide a method for expressing an attribute test condition and its
corresponding outcomes for different attribute types.
Binary Attributes: The test condition for a binary attribute generates two potential outcomes, as shown
in Figure 4.8.

VTUPulse.com
Nominal Attributes :Since a nominal attribute can have many values, its test condition can be expressed
in two ways, as shown in Figure 4.9. For a multiway split (Figure 4.9(a)), the number of outcomes
depends on the number of distinct values for the corresponding attribute. For example, if an attribute such
as marital status has three distinct values-single, married, or divorced-its test condition will produce a
three-way split.
Figure 4.9(b) illustrates three different ways of grouping the attribute values for marital status into two
subsets.

Dept. of CSE, ATMECE, Mysuru Page 7

DATA MINING AND DATA WAREHOUSING

Ordinal Attributes: Ordinal attributes can also produce binary or multiway splits. Ordinal attribute
values can be grouped as long as the grouping does not violate the order property of the attribute values.
Figure 4.10 illustrates various ways of splitting training records based on the Shirt Size attribute.

VTUPulse.com
The groupings shown in Figures 4.10(a) and (b) preserve the order among the attribute values, whereas
the grouping shown in Figure a.10(c) violates this property because it combines the attribute values Small
and Large into the same partition while Medium and Extra Large are combined into another partition.

Continuous Attributes: For continuous attributes, the test condition can be expressed as a
comparison test (A < V) or (A>=V ,) with binary outcomes, or a range query with outcomes of the
form Vi<=A<Vi+1, for i=1,2…k. The difference between these approaches is shown in Figure 4.11.

Dept. of CSE, ATMECE, Mysuru Page 8

DATA MINING AND DATA WAREHOUSING

How to determine the Best Split:

Greedy approach:
– Nodes with homogeneous class distribution are preferred
Need a measure of node impurity:
Non-homogeneous, High degree of impurity

VTUPulse.com
Homogeneous, Low degree of impurity

Measures of Node Impurity:

• Gini Index
• Entropy
• Misclassification error

Dept. of CSE, ATMECE, Mysuru Page 9

DATA MINING AND DATA WAREHOUSING

Where p(i|t) denote the fraction of records belonging to class i at a given node t and where c is the
number of classes.
The measures developed for selecting the best split are often based on the degree of impurity of the child
nodes. The smaller the degree of impurity, the more skewed the class distribution.

VTUPulse.com
Node N1 has the lowest impurity value, followed by N2 and N3.
To determine how well a test condition performs, we need to compare the degree of impurity of the parent
node (before splitting) with the degree of impurity of the child nodes (after splitting). The larger their
difference, the better the test condition. The gain, is a criterion that can be used to determine the goodness
of a split.

Dept. of CSE, ATMECE, Mysuru Page 10

DATA MINING AND DATA WAREHOUSING

Characteristics of Decision Tree Based Classification:

Advantages :
 Decision tree induction is a nonparametric approach for building classification models. In other
words, it does not require any prior assumptions regarding the type of probability distributions satisfied by
the class and other attributes.
 Finding an optimal decision tree is an NP-complete problem
 Techniques developed for constructing decision trees are computationally inexpensive, making
it possible to quickly construct models even when the training set size is very large. Once a decision tree
has been built, classifying a test record is extremely fast, with a worst- case complexity of O(W), where
,W.irs the maximum depth of the tree.
 Decision trees, especially smaller-sized trees, are relatively easy to interpret.
 Decision tree algorithms are quite robust to the presence of noise.
 The presence of redundant attributes does not adversely affect the accuracy of decision trees.

Disadvantages:
Since most decision tree algorithms employ a top-down, recursive partitioning approach, the number of
records becomes smaller as we traverse down the tree. At the leaf nodes, the number of records may be
too small to make a statistically significant decision about the class representation of the nodes.

A subtree can be replicated multiple times in a decision tree, as illustrated in Figure 4.19. This makes
the decision tree more complex than necessary and perhaps more difficult to interpret. Such a situation
can arise from decision tree implementations that rely on a single attribute test condition at each internal
node.

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 11

DATA MINING AND DATA WAREHOUSING

Exercises:

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 12

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 13

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 14

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 15

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 16

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 17

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Model Over fitting:

The errors committed by a classification model are generally divided into two types: training errors and
generalization errors.

Dept. of CSE, ATMECE, Mysuru Page 18

DATA MINING AND DATA WAREHOUSING

Training error, is the number of misclassification errors committed on training records,

Generalization error is the expected error of the model on test records.
A good model must have low training error as well as low generalization error.

Underfitting : The training and test error rates of the model are large when the size of the tree is very
small. This situation is known as model underfitting.
Underfitting occurs because the model has yet to learn the true structure ofthe data. As a result, it
performs poorly on both the training and the test sets.

Overfitting.: As the number of nodes in the decision tree increases, the tree will have fewer training and
test error . However, once the tree becomes too large, its test error rate begins to increase even though its
training error rate continues to decrease. This phenomenon is known as model over fitting.

Reasons for over fitting:

• Presence of Noise
• Lack of Representative Samples

Figure shows the training and test error rates of the decision tree.

VTUPulse.com
Estimating Generalization Errors:
Generalization errors: error on testing (⅀ e‟(t)) Methods for estimating generalization errors:
1) Optimistic approach: e’(t) = e(t)
2) Pessimistic approach:
o For each leaf node: e‟(t) = (e(t)+0.5)
o
 Ex: For a tree with 30 leaf nodes and 10 errors on training
 (out of 1000 instances): Training error = 10/1000 = 1%
 Generalization error = (10 + 30 0.5)/1000 = 2.5%

Reduced error pruning (REP):

Uses validation data set to estimate generalization error

How to Address Over fitting:

Dept. of CSE, ATMECE, Mysuru Page 19

DATA MINING AND DATA WAREHOUSING

Pre-Pruning (Early Stopping Rule) 

 Stop the algorithm before it becomes a fully-grown tree
 Typical stopping conditions for a node:
o Stop if all instances belong to the same class
o Stop if all the attribute values are the same
o
 More restrictive conditions:
o Stop if number of instances is less than some user-specified threshold
o Stop if class distribution of instances are independent of the available features
(e.g., using x 2 test)
o Stop if expanding the current node does not improve impurity measures (e.g.,
Gini or information gain)

2 Post-pruning
 Grow decision tree to its entirety
 Trim the nodes of the decision tree in a bottom-up fashion
 If generalization error improves after trimming, replace sub-tree by a leaf node.
 Class label of leaf node is determined from majority class of instances in the sub-tree

Exercises:

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 20

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 21

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 22

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 23

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 24

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 25

DATA MINING AND DATA WAREHOUSING

4.4 Rule-Based Classifier

A rule-based classifier is a technique for classifying records using a collection of "if . . .then. . ." rules.
The rules for the model are represented in a disjunctive normal form, . where R is known as the rule set
and r;'s are the classification rules or disjuncts

The left-hand side of the rule is called the rule antecedent or precondition.
The right-hand side of the rule is called the rule consequent, which contains the predicted class yi

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 26

DATA MINING AND DATA WAREHOUSING

A lemur triggers rule R3, so it is classified as a mammal A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules

Rule Coverage and Accuracy

Coverage of a rule:
– Fraction of records that satisfy the antecedent of a rule
Accuracy of a rule:
– Fraction of records that satisfy both the antecedent and consequent of a rule

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 27

DATA MINING AND DATA WAREHOUSING

Mutually Exclusive Rules The rules in a rule set .R are mutually exclusive if no two rules in .R are
triggered by the same record. This property ensures that every record is covered by at most one rule in R.
Exhaustive Rules A rule set -R has exhaustive coverage if there is a rule for each combination of
attribute values. This property ensures that every record is covered by at least one rule in –R Ordered
Rules In this approach, the rules in a rule set are ordered in decreasing order of their priority, which can
be defined in many ways (e.g., based on accuracy, coverage, total description length, or the order in which
the rules are generated). An ordered rule set is also known as a decision list. When a test record is
presented, it is classified by the highest-ranked rule that covers the record. This avoids the problem of
having conflicting classes predicted by multiple classification rules.

Rule-Ordering Schemes
Rule-based ordering
Individual rules are ranked based on their quality

 This approach orders the individual rules by some rule quality measure.
 This ordering scheme ensures that every test record is classified by the "best" rule covering it.
Class-based ordering
Rules that belong to the same class appear together
In this approach, rules that belong to the same class appear together in the rule set R. The rules are then
collectively sorted on the basis of their class information.

VTUPulse.com

Rule Evaluation:

Dept. of CSE, ATMECE, Mysuru Page 28

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 29

DATA MINING AND DATA WAREHOUSING

Characteristics of Rule-Based Classifiers:

A rule-based classifier has the following characteristics:
The expressiveness of a rule set is almost equivalent to that of a decision tree because a decision tree
can be represented by a set of mutually exclusive and exhaustive rules. Both rule-based and decision tree

VTUPulse.com
classifiers create rectilinear partitions of the attribute space and assign a class to each partition.
Nevertheless, if the rule-based classifier allows multiple rules to be triggered for a given record, then a
more complex decision boundary can be constructed.

Rule-based classifiers are generally used to produce descriptive models that are easier to interpret,
but gives comparable performance to the decision tree classifier.

Dept. of CSE, ATMECE, Mysuru Page 30

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 31

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 32

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 33

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 34

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 35

DATA MINING AND DATA WAREHOUSING

4.5 Nearest-Neighbor Classifiers

Requires three things
– The set of stored records
–
– VTUPulse.com
Distance Metric to compute distance between records
The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:

– Compute distance to other training records
– Identify k nearest neighbors
– Use class labels of nearest neighbors to determine the class label of unknown record
– (e.g., by taking majority vote)

Dept. of CSE, ATMECE, Mysuru Page 36

DATA MINING AND DATA WAREHOUSING

K-nearest neighbors of a record x are data points that have the k smallest distance to x Compute distance
between two points:
– Euclidean distance

VTUPulse.com
Determine the class from nearest neighbor list
– take the majority vote of class labels among the k-nearest neighbors Choosing the value of k:
– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from other classes

Characteristics of Nearest-Neighbor Classifiers:

 Nearest-neighbor classification is part of a more general technique known as instance-based
learning, which uses specific training instances to make predictions without having to maintain an
abstraction (or model) derived from data. Instance-based learning algorithms require a proximity measure
to determine the similarity or distance between instances and a classification function that returns the
predicted class of a test instance based on its proximity to other instances.

 Lazy learners such as nearest-neighbor classifiers do not require model building. However,
classifying a test example can be quite expensive because we need to compute the proximity values
individually between the test and training examples.

 Nearest-neighbor classifiers can produce arbitrarily shaped decision boundaries. Such boundaries
provide a more flexible model representation compared to decision tree and rule-based classifiers that are
often constrained to rectilinear decision boundaries.
 Nearest-neighbor classifiers can produce wrong predictions unless the appropriate proximity
measure and data preprocessing steps are taken.

Dept. of CSE, ATMECE, Mysuru Page 37

DATA MINING AND DATA WAREHOUSING

4.6 Bayesian Classifiers

Bayes’ Theorem:
Bayes‟ theorem is a way to figure out conditional probability. Conditional probability is the probability of
an event happening, given that it has some relationship to one or more other events

VTUPulse.com
Bayes’ Theorem Problems Example #1
In a particular pain clinic, 10% of patients are prescribed narcotic pain killers. Overall, five percent of the
clinic‟s patients are addicted to narcotics (including pain killers and illegal substances). Out of all the
people prescribed pain pills, 8% are addicts. If a patient is an addict, what is the probability that they will
be prescribed pain pills?
Step 1: Figure out what your event “A” is from the question. That information is in the italicized part
of this particular question. The event that happens first (A) is being prescribed pain pills. That‟s given as
10%.
Step 2: Figure out what your event “B” is from the question. That information is also in the italicized
part of this particular question. Event B is being an addict. That‟s given as 5%.
Step 3: Figure out what the probability of event B (Step 2) given event A (Step 1). In other words, find
what (B|A) is. We want to know “Given that people are prescribed pain pills, what‟s the probability they
are an addict?” That is given in the question as 8%, or .8.
Step 4: Insert your answers from Steps 1, 2 and 3 into the formula and solve.
P(A|B) = P(B|A) * P(A) / P(B) = (0.08 * 0.1)/0.05 = 0.16
The probability of an addict being prescribed pain pills is 0.16 (16%).

Bayes’ Theorem Problems Example #2

Given:

– A doctor knows that meningitis causes stiff neck 50% of the time

Dept. of CSE, ATMECE, Mysuru Page 38

DATA MINING AND DATA WAREHOUSING

– Prior probability of any patient having meningitis is 1/50,000

– Prior probability of any patient having stiff neck is 1/20

If a patient has stiff neck, what‟s the probability he/she has meningitis?

Using the Bayes Theorem for Classification:

Let X denote the attribute set and Y denote the class variable. If the class variable has a non- deterministic
relationship with the attributes, then we can treat X and Y as random variables and capture their
relationship probabilistically using P(Y/X). This conditional probability is also known as the posterior
probability for Y, as opposed to its prior probability, P(Y).

VTUPulse.com
During the training phase, we need to learn the posterior probabilities P(Y/X) for every combination of X
and Y based on information gathered from the training data.
By knowing these probabilities, a test record X' can be classified by finding the class Y’ that
maximizes the posterior probability, P(Y'/X').
To illustrate this approach, consider the task of predicting whether a loan borrower will default on their
payments.
Figure 5.9 shows a training set with the following attributes: House Owner, Marital Status, and Annual
Income. Loan borrowers who defaulted on their payments are classified as Yes, while those who repaid
their loans are classified as No

Suppose we are given a test record with the following attribute set:

Dept. of CSE, ATMECE, Mysuru Page 39

DATA MINING AND DATA WAREHOUSING

X :(Home Owner : No, Marital Status : Married, Annual Income : $120K).

To classify the record, we need to compute the posterior probabilities P(Yes/X) and P(No/X) based on
information available in the training data.
If P(Yes/X) > P(No/X), then the record is classified as Yes; otherwise, it is classified as No

VTUPulse.com

Estimating Conditional Probabilities for Continuous Attributes:

There are two ways to estimate the class-conditional probabilities for continuous Attributes in naive
Bayes classifiers:

Dept. of CSE, ATMECE, Mysuru Page 40

DATA MINING AND DATA WAREHOUSING

VTUPulse.com
Example of Naïve Bayes Classifier: Consider the training record

Dept. of CSE, ATMECE, Mysuru Page 41

DATA MINING AND DATA WAREHOUSING

VTUPulse.com
M-estimate of Conditional Probability:

where n is the total number of instances from class Yj, nc is the number of training examples from class
Yi that take on the value Xi, m is a parameter known as the equivalent sample size, and p is a user-
specified parameter.

Dept. of CSE, ATMECE, Mysuru Page 42

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 43

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 44

DATA MINING AND DATA WAREHOUSING

VTUPulse.com

Characteristics of Naive Bayes Classifiers:

Naive Bayes classifiers generally have the following characteristics:

Dept. of CSE, ATMECE, Mysuru Page 45

DATA MINING AND DATA WAREHOUSING

 They are robust to isolated noise points because such points are averaged out when estimating
conditional probabilities from data.
Naive Bayes classifiers can also handle missing values by ignoring the example during model building
and classification.
 They are robust to irrelevant attributes.
 Correlated attributes can degrade the performance of naive Bayes classifiers because the
conditional independence assumption no longer holds for such attributes.

Bayesian Belief Networks

Bayesian networks represent an advanced form of general Bayesian probability
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of
interest.

A Bayesian belief network (BBN), or simply, Bayesian network, provides a graphical representation of
the probabilistic relationships among a set of random variables. There are two key elements of a Bayesian
network:
1. A directed acyclic graph (dag) encoding the dependence relationships among a set of variables.
2. A probability table associating each node to its immediate parent nodes.

Consider three random variables,A, B, and C, in which A and B are independent variables and each has a
direct influence on a third variable, C.
The relationships among the variables can be summarized into the directed acyclic graph shown in Figure
5.12(a).

VTUPulse.com
Each node in the graph represents a variable, and each arc asserts the dependence relationship between the
pair of variables. If there is a directed arc from X to Y, then X is the parent of Y and Y is the child of X.
F\rrthermore, if there is a directed path in the network from X to Z, then X is an ancestor of Z, whlle Z is a
descendant of X.
For example, in the diagram shown in Figure 5.12(b), A is a descendant of D and D is an ancestor of
B. Both B and D arc also non-descendants of A.

In the diagram shown in Figure 5.12(b), A is conditionally independent of both B and D given C because
the nodes for B and D are non-descendants of node A.

Dept. of CSE, ATMECE, Mysuru Page 46

DATA MINING AND DATA WAREHOUSING

The conditional independence assumption made by a naive Bayes classifier can also be represented using
a Bayesian network, as shown in Figure 5.12(c), where gr is the target class and {Xt,Xz,...,Xa} is the
attribute set.
Besides the conditional independence conditions imposed by the network topology, each node is also
associated with a probability table.
1. If a node X does not have any parents, then the table contains only the prior probability P(X).
2. If a node X has only one parent, Y, then the table contains the conditional probability P(XIY).
3. If a node X has multiple parents, {Y1,,Y2, . . . ,Yn}, then the table contains the conditional
probability P(XlY1,Y2,. . ., Yn.).

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 47

DATA MINING AND DATA WAREHOUSING

EXAMPLE:
You have a new burglar alarm installed at home.
It is fairly reliable at detecting burglary, but also sometimes responds to minor earthquakes.
You have two neighbors, Ali and Veli, who promised to call you at work when they hear the alarm.
Ali always calls when he hears the alarm, but sometimes confuses telephone ringing with the alarm
and calls too.
Veli likes loud music and sometimes misses the alarm.
Given the evidence of who has or has not called, we would like to estimate the probability of a
burglary.

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 48

DATA MINING AND DATA WAREHOUSING

Characteristics of BBN
Following are some of the general characteristics of the BBN method:
BBN provides an approach for capturing the prior knowledge of a particular domain using a
graphical model. The network can also be used to encode causal dependencies among variables.

VTUPulse.com
Constructing the network can be time consuming and requires a large amount of effort. However,
once the structure of the network has been determined, adding a new variable is quite straightforward.

Bayesian networks are well suited to dealing with incomplete data. Instances with missing attributes
can be handled by summing or integrating the probabilities over all possible values of the attribute.

Because the data is combined probabilistically with prior knowledge, the method is quite robust to
model over fitting.

4.7 Important Questions:

1. What is classification. Explain the general approach for solving a classification problem with an
example.
2. How decision trees are used for classification. Explain decision tree induction algorithm for
classification.
3. Write Hunts algorithm and illustrate it‟s working.
4. Explain the Methods for Expressing Attribute Test Conditions.
5. Explain various measures for selecting the best split with an example.
6. Explain the importance of evaluation criterion for classification methods.
7. Explain the characteristics of decision tree Induction.
8. Explain Model Over fitting. What are the reasons for overfitting? How to address overfitting
problems

Dept. of CSE, ATMECE, Mysuru Page 49

DATA MINING AND DATA WAREHOUSING

9. Explain how to estimate generalization errors.

10. List characteristics of decision tree induction.
11. Give the difference between rule-based ordering and class-based ordering scheme.
12. Explain rule-based classifier and its characteristics.
13. Explain the characteristics of rule based classifier
14. How to improve accuracy of classification. Explain
15. Explain k-nearest neighbor classification algorithm.
16. Explain any characteristics of the nearest neighbor classifier.
17. What is Baye‟s theorem? Show how it is used for classification.
18. Explain with an example how naïve Baye „s algorithm used for classification.
20. Discuss the two common strategies for growing a classification rule.
21. Explain sequential covering algorithm for rule extraction.
22. Explain model building in Bayesian networks.

VTUPulse.com

Dept. of CSE, ATMECE, Mysuru Page 50

decision tree
No ratings yet
decision tree
66 pages
DM Mod4
No ratings yet
DM Mod4
108 pages
BMW M-4
No ratings yet
BMW M-4
108 pages
Module 4DMDW
No ratings yet
Module 4DMDW
45 pages
Computer Networks and Security (18CS52) Notes
67% (9)
Computer Networks and Security (18CS52) Notes
218 pages
Data Minin1
No ratings yet
Data Minin1
104 pages
DWDM Unit 3-Part 1
No ratings yet
DWDM Unit 3-Part 1
14 pages
Classification - Decision Trees
No ratings yet
Classification - Decision Trees
96 pages
Unit 3 - Classification
No ratings yet
Unit 3 - Classification
28 pages
Module 4
No ratings yet
Module 4
41 pages
Decision Trees
No ratings yet
Decision Trees
34 pages
Housing Price Prediction Modeling Using Machine Learning
No ratings yet
Housing Price Prediction Modeling Using Machine Learning
6 pages
BIT150 Assignment 1 - From Introduction To C++ Functions
No ratings yet
BIT150 Assignment 1 - From Introduction To C++ Functions
13 pages
Classification Using Decision Tree: CSE-454: Data Warehousing and Data Mining Sessional
No ratings yet
Classification Using Decision Tree: CSE-454: Data Warehousing and Data Mining Sessional
23 pages
Ls 01 Basic Concept Lecture Notes 1 3
No ratings yet
Ls 01 Basic Concept Lecture Notes 1 3
26 pages
Unit 3
No ratings yet
Unit 3
95 pages
Quizizz - Measuring Angles
No ratings yet
Quizizz - Measuring Angles
5 pages
Unit 3
100% (1)
Unit 3
21 pages
Classification Through Machine Learning Technique: C4.5 Algorithm Based On Various Entropies
No ratings yet
Classification Through Machine Learning Technique: C4.5 Algorithm Based On Various Entropies
8 pages
07.2.decision Trees
No ratings yet
07.2.decision Trees
33 pages
07.2.Decision Trees_ML
No ratings yet
07.2.Decision Trees_ML
32 pages
Data Mining Notes Unit 4
No ratings yet
Data Mining Notes Unit 4
30 pages
Unit-3 Classification
No ratings yet
Unit-3 Classification
28 pages
DM Lect8
No ratings yet
DM Lect8
56 pages
Week 4 - Classification - Decision Tree 1
No ratings yet
Week 4 - Classification - Decision Tree 1
40 pages
Classification&DecisionTree (2)
No ratings yet
Classification&DecisionTree (2)
10 pages
Decision Tree
No ratings yet
Decision Tree
74 pages
Decision Tree
No ratings yet
Decision Tree
22 pages
Data Mining Unit-Iii
No ratings yet
Data Mining Unit-Iii
36 pages
Decision Trees Edited
No ratings yet
Decision Trees Edited
56 pages
Session 5b Classification by Decision Tree Induction (1)
No ratings yet
Session 5b Classification by Decision Tree Induction (1)
42 pages
Hooputra 2004
No ratings yet
Hooputra 2004
16 pages
CSE445 NSU Week_4
No ratings yet
CSE445 NSU Week_4
48 pages
Lecture 3.1.2
No ratings yet
Lecture 3.1.2
27 pages
siv UNIT-3 Classification DWM PART-A
No ratings yet
siv UNIT-3 Classification DWM PART-A
12 pages
3 - Sınıflandırma 2
No ratings yet
3 - Sınıflandırma 2
62 pages
DMDW_MOD4_classification_PPT Updated
No ratings yet
DMDW_MOD4_classification_PPT Updated
128 pages
Unit6 -2 Classification-Decision-Trees_25625586-1bf9-4821-a721-70db2d7805ef
No ratings yet
Unit6 -2 Classification-Decision-Trees_25625586-1bf9-4821-a721-70db2d7805ef
36 pages
Decision Tree
No ratings yet
Decision Tree
14 pages
10.1 Decision Tree
No ratings yet
10.1 Decision Tree
17 pages
m3
No ratings yet
m3
141 pages
DMDW 5th Module
No ratings yet
DMDW 5th Module
28 pages
Understanding The Role of Agility and Responsiveness Capabilities in Achieving Supply Chain Performance: The Case of Manufacturing Companies
No ratings yet
Understanding The Role of Agility and Responsiveness Capabilities in Achieving Supply Chain Performance: The Case of Manufacturing Companies
20 pages
DWDM Asgmnt Prog
No ratings yet
DWDM Asgmnt Prog
51 pages
Decision Trees: Decision Tree Is One of The Most Widely Used and
No ratings yet
Decision Trees: Decision Tree Is One of The Most Widely Used and
53 pages
Aiza Chinchin Angel Concon Aiza Chinchin Angel Concon
No ratings yet
Aiza Chinchin Angel Concon Aiza Chinchin Angel Concon
3 pages
Shear Zones - A Review: Earth-Science Reviews May 2017
No ratings yet
Shear Zones - A Review: Earth-Science Reviews May 2017
23 pages
MEE 212 Computer Project
No ratings yet
MEE 212 Computer Project
2 pages
Lecture Notes For Chapter 4 Introduction To Data Mining: by Tan, Steinbach, Kumar
No ratings yet
Lecture Notes For Chapter 4 Introduction To Data Mining: by Tan, Steinbach, Kumar
35 pages
DMDW_Classification
No ratings yet
DMDW_Classification
18 pages
Relationship Between Self-efficacy and Language Proficiency- A Meta-Analysis
No ratings yet
Relationship Between Self-efficacy and Language Proficiency- A Meta-Analysis
11 pages
Classification and Regression Trees (CART - I) : Dr. A. Ramesh
No ratings yet
Classification and Regression Trees (CART - I) : Dr. A. Ramesh
34 pages
Speech Recognition
No ratings yet
Speech Recognition
4 pages
ML Classification Tree
No ratings yet
ML Classification Tree
36 pages
dm unit 4
No ratings yet
dm unit 4
24 pages
Tutorial Thyme
No ratings yet
Tutorial Thyme
45 pages
DMDW 2nd Module
No ratings yet
DMDW 2nd Module
29 pages
DWDM UNIT 4
No ratings yet
DWDM UNIT 4
80 pages
DM_06-Mar-2025
No ratings yet
DM_06-Mar-2025
13 pages
Venn Diagram
No ratings yet
Venn Diagram
1 page
A Note On The Derivation of FRechet and Gateaux
No ratings yet
A Note On The Derivation of FRechet and Gateaux
7 pages
Analysis of Classification Algorithm in Data Mining
No ratings yet
Analysis of Classification Algorithm in Data Mining
3 pages
Data Mining: Concepts and Techniques
No ratings yet
Data Mining: Concepts and Techniques
59 pages
Knowledge Mining Using Classification Through Clustering
No ratings yet
Knowledge Mining Using Classification Through Clustering
6 pages
Wk. 5.2. Decision Trees (27.10.2020)
No ratings yet
Wk. 5.2. Decision Trees (27.10.2020)
57 pages
Learning Analytics
No ratings yet
Learning Analytics
56 pages
ML Unit II
No ratings yet
ML Unit II
183 pages
AI&Ml-module 4 (Part 1)
No ratings yet
AI&Ml-module 4 (Part 1)
85 pages
AI&Ml-module 4 (Complete)
No ratings yet
AI&Ml-module 4 (Complete)
124 pages
Class Basic
No ratings yet
Class Basic
75 pages
SH5109 - 0 Module Information - Biostatistics Epidemiology
No ratings yet
SH5109 - 0 Module Information - Biostatistics Epidemiology
10 pages
Analysis of Various Decision Tree Algorithms For Classification in Data Mining PDF
No ratings yet
Analysis of Various Decision Tree Algorithms For Classification in Data Mining PDF
5 pages
ANOVA3
No ratings yet
ANOVA3
194 pages
Comprehensive Display of Digital Image Copy-Move Forensics Techniques
No ratings yet
Comprehensive Display of Digital Image Copy-Move Forensics Techniques
6 pages
Module Booklet - Math Week 4
No ratings yet
Module Booklet - Math Week 4
10 pages
Dwdm-Unit-3 R16
No ratings yet
Dwdm-Unit-3 R16
14 pages
FEMA P-1051B Flowcharts
No ratings yet
FEMA P-1051B Flowcharts
31 pages
Sheet#6 - Matrices and Determinants
No ratings yet
Sheet#6 - Matrices and Determinants
11 pages
2 - Decision Tree
No ratings yet
2 - Decision Tree
23 pages
Logarithms Part 3 - V Math
No ratings yet
Logarithms Part 3 - V Math
80 pages
Graphing and Solving Inequalities
No ratings yet
Graphing and Solving Inequalities
9 pages
Pulsewidth Modulation For Electronic Power Conversion: J. Holtz, Fellow, IEEE Wuppertal University - Germany
100% (1)
Pulsewidth Modulation For Electronic Power Conversion: J. Holtz, Fellow, IEEE Wuppertal University - Germany
18 pages
TOS MATH 8 4th Periodical
No ratings yet
TOS MATH 8 4th Periodical
2 pages
Module-1: Data Warehousing & Modelling
No ratings yet
Module-1: Data Warehousing & Modelling
13 pages
Tutorial Regression
No ratings yet
Tutorial Regression
2 pages
06 9MA0 01 9MA0 02 A Level Pure Mathematics Practice Set 6 Mark Scheme
No ratings yet
06 9MA0 01 9MA0 02 A Level Pure Mathematics Practice Set 6 Mark Scheme
22 pages
Pharmacology Midterms Notes For Students
No ratings yet
Pharmacology Midterms Notes For Students
7 pages
Engineering Fluid Mechanics Twelfth Edition by Elger SI Ed 2022
No ratings yet
Engineering Fluid Mechanics Twelfth Edition by Elger SI Ed 2022
15 pages
Math 10 - Q2 - Week 3 - Module 3 - Relations Among Chords, Arcs, Central Angles - For Reproduction
No ratings yet
Math 10 - Q2 - Week 3 - Module 3 - Relations Among Chords, Arcs, Central Angles - For Reproduction
21 pages
Module-3 Association Analysis: Data Mining Association Analysis: Basic Concepts and Algorithms
No ratings yet
Module-3 Association Analysis: Data Mining Association Analysis: Basic Concepts and Algorithms
34 pages
Statistical Classification: Fundamentals and Applications
From Everand
Statistical Classification: Fundamentals and Applications
Fouad Sabry
No ratings yet
Unit-3 DWDM
No ratings yet
Unit-3 DWDM
11 pages
Sample - Solution Manual Introduction To Chemical Engineering Thermodynamics 8th Edition Smith & Van Ness
No ratings yet
Sample - Solution Manual Introduction To Chemical Engineering Thermodynamics 8th Edition Smith & Van Ness
19 pages
Alternating Decision Tree: Fundamentals and Applications
From Everand
Alternating Decision Tree: Fundamentals and Applications
Fouad Sabry
No ratings yet
EXCEL Beginning Practice Activities
100% (1)
EXCEL Beginning Practice Activities
3 pages