
PRESENTATION

Advanced Business
Analytics
1. Mohammed Shaban.
2. Mohammed Mahmoud.
3. Ziad Hamada.
4. Zahraa Sayed.
5. Radwa Ahmed.
6. Tasneem Mohamed-Sadek.
WHAT KIND OF PROBLEM DO I NEED TO SOLVE, AND
HOW DO I SOLVE IT?

This chapter presents K-means clustering, the Apriori algorithm for association
rules, and classification methods with decision trees. These algorithms were
chosen from among the many techniques available to data scientists for solving
business analytics problems.
The problem to solve | Technique | Algorithm
I want to group items by similarity. I want to find structure in the data. | Clustering | K-means clustering
I want to discover relationships between actions or items. | Association Rules | Apriori
I want to determine the relationship between the outcome and the input variables. | Regression | Linear Regression
I want to assign (known) labels to objects. | Classification | Naïve Bayes, Decision Trees
I want to find the structure in a temporal process. I want to forecast the behavior of a temporal process. | Time Series Analysis | ACF, PACF, ARIMA
I want to analyze my text data. | Text Analysis | Regular expressions, document representation (Bag of Words), TF-IDF
CLUSTERING
Clustering is an unsupervised learning technique that groups
similar objects into clusters based on patterns in the data.

Key Points:
1. Similarity vs. Distance: Measures how alike or different items
are.
2. Applications:
Grouping documents by topics.
Customer segmentation for targeted marketing.
3. Characteristics:
Creates cohesive, tight clusters.
Ensures groups are well-separated.
CLUSTERING TASKS AND APPLICATIONS
Clustering is widely used across various fields to uncover patterns:

Biology: Identifying clusters of genes or proteins with similar functions, or
discovering new species groups from genetic data.

Information Retrieval: Grouping documents by topic or clustering similar search
results to improve user experience.

Marketing: Segmenting customers based on purchasing behavior, preferences,
or demographics for personalized marketing strategies.

Earthquake Studies: Mapping earthquake epicenters to identify fault lines or
clustering seismic activities by magnitude and frequency.

City Planning: Categorizing houses or regions based on factors like type, value,
population density, or geographic location.
CLUSTERING
The K-Means Clustering

- Given k, the k-means algorithm is implemented in four steps:

1. Partition objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters of the current
partitioning (the centroid is the center, i.e., mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when the assignments no longer change.

- Partitional clustering approach


- Number of clusters, K, must be specified
- Each cluster is associated with a centroid (center point)
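
To make these steps concrete, here is a minimal sketch using scikit-learn's KMeans (an assumed dependency, not prescribed by the slides); the sample points are invented for illustration.

```python
# Minimal K-means sketch with scikit-learn (assumed dependency).
# The sample points are invented purely for illustration.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],   # one tight group
              [8.0, 8.0], [8.5, 7.8], [7.8, 8.3]])  # another tight group

# n_clusters is the K that must be specified up front.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)  # centroid (mean point) of each cluster
print(kmeans.labels_)           # cluster assignment for each point
```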
CLUSTERING
Evaluating K-means Clusters

The quality of a clustering is commonly measured by the Sum of Squared Errors (SSE):

SSE = Σᵢ₌₁ᴷ Σ_{x ∈ Cᵢ} dist(mᵢ, x)²

where x is a data point in cluster Cᵢ and mᵢ is the representative point for cluster Cᵢ.

- It can be shown that mᵢ corresponds to the center (mean) of the cluster.
- Given two sets of clusters, we prefer the one with the smallest error.
- One easy way to reduce SSE is to increase K, the number of clusters.
- A good clustering with smaller K can have a lower SSE than a poor clustering with
higher K.
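
As a small sketch tying the formula to code, the NumPy snippet below computes SSE directly; the points and cluster assignments are invented.

```python
# Computing SSE by hand with NumPy (points and assignments are invented).
import numpy as np

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.5, 7.8]])
labels = np.array([0, 0, 1, 1])  # cluster assignment of each point
centroids = np.array([X[labels == i].mean(axis=0) for i in (0, 1)])

# SSE: sum over clusters of squared distances from points to their centroid.
sse = sum(((X[labels == i] - centroids[i]) ** 2).sum() for i in (0, 1))
print(sse)  # scikit-learn exposes the same quantity as KMeans.inertia_
```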
WEAKNESSES OF K-MEANS CLUSTERING:
- Works only for continuous numerical data.
  - Use K-modes for categorical data.
  - K-medoids is more flexible and handles a wider range of data.
- Requires predefining the number of clusters, K.
  - Some methods, such as the elbow heuristic, can estimate the best K (see the sketch below).
- Sensitive to noise and outliers.
- Struggles with non-convex cluster shapes.
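
A minimal sketch of that elbow heuristic, assuming scikit-learn and invented blob data: run K-means for several values of K and look for the K where SSE stops dropping sharply.

```python
# Elbow-method sketch: SSE (inertia) versus K (data invented for illustration).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs, so the "elbow" should appear at K = 3.
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in (0, 5, 10)])

for k in range(1, 7):
    sse = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(f"K={k}: SSE={sse:.1f}")  # SSE falls steeply until K=3, then flattens
```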


What are Association Rules?
They identify relationships, patterns, or correlations between items in datasets,
such as transaction records or databases. This method does not make predictions;
it finds connections between items in the data.

Example Questions:
Which products are often bought together?
What do people with similar preferences buy or watch?

Applications:
Market Basket Analysis:
Example: "People who buy milk often buy cookies 60% of the time."
Recommender Systems:
Example: "Customers who bought item A also purchased item B."
Web Usage Patterns:
Example: "76% of users visiting page X click on link Y."
Market-Basket transactions:

Transaction | Items
1 | Bread, Milk
2 | Bread, Diaper, Cheese, Eggs
3 | Milk, Diaper, Cheese, Coke
4 | Bread, Milk, Diaper, Cheese
5 | Bread, Milk, Diaper, Coke

(Market-basket: five transactions)


Example of Association Rules
{Diaper} → {Cheese}
{Milk, Bread} → {Eggs, Coke}
{Cheese, Bread} → {Milk}

Itemset
A collection of one or more items.
Example: {Milk, Bread, Diaper}.
k-itemset
An itemset that contains k items.
Support count (σ)
Frequency of occurrence of an itemset.
Example: σ({Milk, Bread, Diaper}) = 2

Support
Fraction of transactions that contain an itemset.
Example: s({Milk, Bread, Diaper}) = 2/5
Association Rules

Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold.

Association Rule
An implication of the form A → B, where A and B are itemsets. The mining task is to
find all rules that meet minimum support and confidence thresholds.
Example: {Milk, Diaper} → {Cheese}

Rule Evaluation Metrics


{Milk, Diaper} ⇒ {Cheese}

Support (s):
Denotes the frequency of the rule within transactions: the fraction of transactions
that contain both A and B.
s(A ⇒ B) = P(A ∪ B) = σ({A, B}) / |T|, where |T| is the total number of transactions.

Confidence (c):
Denotes the percentage of transactions containing A which also contain B.
Confidence(A ⇒ B [s, c]) = P(B|A) = P(A ∪ B) / P(A) = support({A, B}) / support({A}).
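
To tie these formulas to the five transactions above, the plain-Python sketch below verifies support and confidence for {Milk, Diaper} ⇒ {Cheese}.

```python
# Support and confidence for {Milk, Diaper} => {Cheese}, computed from the
# five market-basket transactions shown earlier.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Cheese", "Eggs"},
    {"Milk", "Diaper", "Cheese", "Coke"},
    {"Bread", "Milk", "Diaper", "Cheese"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

A, B = {"Milk", "Diaper"}, {"Cheese"}
n_AB = sum(1 for t in transactions if (A | B) <= t)  # contain both A and B
n_A = sum(1 for t in transactions if A <= t)         # contain A

support = n_AB / len(transactions)  # 2/5 = 0.4
confidence = n_AB / n_A             # 2/3 ~= 0.67
print(support, confidence)
```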


Apriori Algorithm Simplified
The Apriori algorithm is widely used for association rule mining in datasets, particularly
for tasks like Market Basket Analysis (e.g., identifying items frequently bought together).

Key Concepts:

1. Frequent Itemset: A group of items frequently appearing together in transactions,
meeting a minimum support threshold.
2. Support: The percentage of transactions that include a specific item or itemset.

3. Apriori Property:
- A subset of a frequent itemset is always frequent.
- This principle reduces the search space, iteratively growing itemsets from size 1 to larger sets.
4. Confidence: The percentage of transactions containing item X that also include item Y. Used to
generate association rules (e.g., "If X, then Y").
Purpose:

The algorithm identifies relationships and patterns in transactional data,
enabling actionable insights like recommendations or targeted marketing.
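
As a sketch of the algorithm in practice, the snippet below runs Apriori on the market-basket transactions using the third-party mlxtend library (an assumed dependency, not something the slides prescribe); the minsup and minconf thresholds are chosen arbitrarily.

```python
# Apriori sketch using mlxtend (assumed third-party dependency).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Cheese", "Eggs"],
    ["Milk", "Diaper", "Cheese", "Coke"],
    ["Bread", "Milk", "Diaper", "Cheese"],
    ["Bread", "Milk", "Diaper", "Coke"],
]

# One-hot encode the transactions into a boolean item matrix.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets meeting minsup, then rules meeting minconf.
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```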
Classification
Classification predicts categorical class labels (discrete or nominal). It constructs
a model from a training set whose records carry known class labels, then uses that
model to classify new data.

Typical classification tasks:

Credit/loan approval
Medical diagnosis
Fraud detection
Web page categorization
Task | Attribute set, x | Class label, y
Categorizing email messages | Features extracted from email message header and content | spam or non-spam
Identifying tumor cells | Features extracted from MRI scans | malignant or benign cells
Cataloging galaxies | Features extracted from telescope images | elliptical, spiral, or irregular-shaped galaxies

Model construction:
Describing a set of predetermined classes. Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute. The set of tuples used for model construction is the training set. The
model is represented as classification rules, decision trees, or mathematical formulae.

Model usage: classifying future or unknown objects.

Estimate the accuracy of the model:
The known label of each test sample is compared with the classified result from the model. The accuracy rate
is the percentage of test-set samples that are correctly classified by the model. The test set must be
independent of the training set (otherwise the estimate reflects overfitting). If the accuracy is acceptable,
use the model to classify new data.

The Decision Tree classifier is used for classification where the input variables can be
continuous or discrete; the output is a tree that describes the decision flow.

Leaf nodes return either a probability score or simply a classification. Trees can be converted to a
set of "decision rules".
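
A small sketch with scikit-learn's DecisionTreeClassifier (an assumed dependency; the toy loan-approval data is invented) shows both outputs mentioned above: a probability score at a leaf, and the tree converted to textual decision rules.

```python
# Decision-tree sketch with scikit-learn (toy loan-approval data is invented).
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [income in $1000s, years employed]; labels: approve / deny.
X = [[25, 1], [40, 3], [60, 5], [80, 10], [30, 2], [90, 12]]
y = ["deny", "deny", "approve", "approve", "deny", "approve"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print(tree.predict([[70, 6]]))        # classification for a new applicant
print(tree.predict_proba([[70, 6]]))  # probability score from the leaf
print(export_text(tree, feature_names=["income", "years_employed"]))
```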
Basic Algorithm (Greedy Algorithm)

Algorithm Description:
- A decision tree is built top-down using a divide-and-conquer strategy.
- All training examples start at the root node.
- Attributes are categorized (continuous ones are converted into discrete categories
beforehand).
- Examples are split based on attributes chosen using a statistical measure (e.g.,
information gain; see the sketch after this list).

Stopping Condition:
- Stop partitioning when all samples in a node belong to the same class.
- If no attributes remain for splitting, majority voting is used to classify the leaf.
- If no samples are left, the process stops.
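
As a sketch of that statistical measure, the snippet below computes information gain as the drop in entropy after a split; the labels are invented.

```python
# Information gain: entropy of the parent node minus the weighted entropy
# of the child nodes produced by a split (labels invented for illustration).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Splitting 4 "yes" / 4 "no" labels into two pure halves gives maximal gain.
parent = ["yes"] * 4 + ["no"] * 4
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))  # 1.0
```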
Types of Decision Trees:

- Classification Trees: Used for categorical or binary outcomes, assigning class labels to observations.
- Regression Trees: Used for continuous outcomes (e.g., income), returning average values at each node.

Structure of Decision Trees:

- Leaf Nodes: Located at the ends of the tree branches, leaf nodes represent the final outcomes of
all prior decisions. They provide the class labels, or probability scores, of the segment to which
all observations following that path belong.

- Branches: Represent decision outcomes, shown as connecting lines.

- Internal Nodes: Test points based on a single variable or attribute.
Advantages:

- Inexpensive to construct
- Extremely fast at classifying unknown records
- Easy to interpret for small-sized trees
- Robust to noise (especially when methods to avoid overfitting are employed)
- Can easily handle redundant or irrelevant attributes (unless the attributes are interacting)

Disadvantages:

- Space of possible decision trees is exponentially large.
- Greedy approaches are often unable to find the best tree.
- Does not take into account interactions between attributes.
- Each decision boundary involves only a single attribute.


Thank You
