Advanced Business Analytics
1. Mohammed Shaban.
2. Mohammed Mahmoud.
3. Ziad Hamada.
4. Zahraa Sayed.
5. Radwa Ahmed.
6. Tasneem Mohamed-Sadek.
WHAT KIND OF PROBLEM DO I NEED TO SOLVE, AND HOW DO I SOLVE IT?
Association Rules
I want to discover relationships between actions or items.
Method: Apriori
Text Analysis
I want to analyze my text data.
Methods: Regular expressions, document representation (Bag of Words), TF-IDF
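As a rough illustration of the text-analysis techniques listed above, the sketch below builds a Bag-of-Words representation and TF-IDF weights; it assumes scikit-learn is available, and the three sample sentences are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, purely for illustration
docs = [
    "customers buy milk and bread",
    "customers buy milk and cookies",
    "users visit page x and click link y",
]

# Bag of Words: each document becomes a vector of raw word counts
bow = CountVectorizer()
counts = bow.fit_transform(docs)
print(bow.get_feature_names_out())
print(counts.toarray())

# TF-IDF: re-weights the counts so words common to many documents count for less
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)
print(weights.toarray().round(2))
```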
CLUSTERING
Clustering is an unsupervised learning technique that groups
similar objects into clusters based on patterns in the data.
Key Points:
1. Similarity vs. Distance: Measures how alike or different items
are.
2. Applications:
Grouping documents by topics.
Customer segmentation for targeted marketing.
3. Characteristics:
Creates cohesive, tight clusters.
Ensures groups are well-separated.
CLUSTERING TASKS AND APPLICATIONS
Clustering is widely used across various fields to uncover patterns:
City Planning: Categorizing houses or regions based on factors like type, value,
population density, or geographic location.
CLUSTERING
The K-Means Clustering
Given k, the algorithm proceeds in four steps:
1. Partition the objects into k non-empty subsets.
2. Compute the seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when the assignment does not change.
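A minimal sketch of these four steps in plain Python/NumPy follows; the 2-D points, k = 2, and the random initialization are illustrative assumptions, and in practice a library implementation such as scikit-learn's KMeans would normally be used:

```python
import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means (Lloyd's algorithm), following the four steps above."""
    rng = np.random.default_rng(seed)
    # Step 1 (common variant): start from k randomly chosen points as initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    assignment = np.full(len(points), -1)
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest seed point
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_assignment = distances.argmin(axis=1)
        # Step 4: stop when the assignment does not change
        if np.array_equal(new_assignment, assignment):
            break
        assignment = new_assignment
        # Step 2: recompute each centroid as the mean point of its cluster
        centroids = np.array([points[assignment == j].mean(axis=0) for j in range(k)])
    return assignment, centroids

# Hypothetical 2-D data, purely for illustration
data = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
                 [8.0, 8.5], [7.8, 8.1], [8.2, 7.9]])
labels, centers = k_means(data, k=2)
print(labels)
print(centers)
```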
ASSOCIATION RULES
Example Questions:
Which products are often bought together?
What do people with similar preferences buy or watch?
Applications:
Market Basket Analysis:
Example: "People who buy milk often buy cookies 60% of the time."
Recommender Systems:
Example: "Customers who bought item A also purchased item B."
Web Usage Patterns:
Example: "76% of users visiting page X click on link Y."
Market-Basket transactions:
Transactions Items
1 Bread, Milk
Itemset
A collection of one or more items.
Example: {Milk, Bread, Diaper}.
k-itemset
An itemset that contains k items.
Support count (σ)
Frequency of occurrence of an itemset.
Example: σ({Milk, Bread, Diaper}) = 2
Support
Fraction of transactions that contain an itemset.
Example: s({Milk, Bread, Diaper}) = 2/5
Association Rules
Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold.
Association Rule
An implication of the form A → B, where A and B are itemsets. The mining task is to find all rules that satisfy minimum support and confidence thresholds.
Example: {Milk, Diaper} → {Cheese}
Support (s): Fraction of transactions that contain both A and B: s(A → B) = σ(A ∪ B) / |T|, where |T| is the total number of transactions.
Confidence (c): Fraction of transactions containing A that also contain B: c(A → B) = σ(A ∪ B) / σ(A).
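A minimal sketch of how these two measures are computed is shown below; the toy transaction list and the rule {Milk, Diaper} → {Coke} are illustrative assumptions, not the figures from the slides:

```python
# Computing support and confidence for a rule A -> B over a toy transaction list
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Eggs"},
    {"Milk", "Diaper", "Coke"},
    {"Bread", "Milk", "Diaper"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)   # sigma(itemset)
    return count / len(transactions)                        # sigma / |T|

A, B = {"Milk", "Diaper"}, {"Coke"}
s = support(A | B)                 # s(A -> B) = sigma(A u B) / |T|
c = support(A | B) / support(A)    # c(A -> B) = sigma(A u B) / sigma(A)
print(f"support = {s:.2f}, confidence = {c:.2f}")
```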
Key Concepts:
1. Apriori Property:
- Every subset of a frequent itemset is also frequent (equivalently, if an itemset is infrequent, all of its supersets are infrequent).
- This principle prunes the search space as itemsets are grown iteratively from size 1 to larger sets (sketched below).
2. Confidence: The percentage of transactions containing item X that also include item Y. Used to generate association rules (e.g., "If X, then Y").
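Below is a minimal sketch of the Apriori level-wise generate-and-prune loop over a toy transaction list; the data and the minsup value are illustrative assumptions:

```python
from itertools import combinations

# Toy transaction list (illustrative data, same as the previous sketch)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Eggs"},
    {"Milk", "Diaper", "Coke"},
    {"Bread", "Milk", "Diaper"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

minsup = 0.4  # assumed minimum-support threshold, for illustration only

items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= minsup]  # frequent 1-itemsets
all_frequent = list(frequent)

k = 2
while frequent:
    # Grow candidate k-itemsets from the frequent (k-1)-itemsets
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    # Apriori pruning: a candidate survives only if every (k-1)-subset is frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    frequent = [c for c in candidates if support(c) >= minsup]
    all_frequent.extend(frequent)
    k += 1

print([set(f) for f in all_frequent])
```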
CLASSIFICATION
Purpose: Cataloging galaxies as elliptical, spiral, or irregular-shaped.
Data: Features extracted from telescope images.
Model construction:
Describing a set of predetermined classes. Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute. The set of tuples used for model construction is the training set. The
model is represented as classification rules, decision trees, or mathematical formulae.
Model usage: Classifying future or unknown objects
A Decision Tree classifier is used for classification; input variables can be continuous or discrete, and the output is a tree that describes the decision flow.
Leaf nodes return either a probability score or simply a classification. Trees can be converted to a set of "decision rules", as in the sketch below.
Basic Algorithm (Greedy Algorithm)
Algorithm Description:
- A decision tree is built top-down using a divide-and-conquer strategy.
- All training examples start at the root node.
- Attributes are categorical (continuous-valued attributes are discretized in advance).
- Examples are split based on attributes chosen using a statistical measure (e.g., information gain; sketched after the stopping conditions below).
Stopping Condition:
- Stop partitioning when all samples in a node belong to the same class.
- If no attributes remain for splitting, majority voting is used to classify the leaf.
- If no samples are left, the process stops.
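A minimal sketch of the information-gain measure used to pick the splitting attribute is shown below; the tiny outlook/windy dataset is an illustrative assumption, not data from the lecture:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    parent = entropy(labels)
    children = 0.0
    for v in {row[attribute_index] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute_index] == v]
        children += len(subset) / len(labels) * entropy(subset)
    return parent - children

# Hypothetical toy data: (Outlook, Windy) -> Play?
rows = [("Sunny", "No"), ("Sunny", "Yes"), ("Rain", "No"), ("Rain", "Yes"), ("Overcast", "No")]
labels = ["No", "No", "Yes", "No", "Yes"]

# The greedy algorithm splits on the attribute with the highest gain
print("gain(Outlook) =", round(information_gain(rows, labels, 0), 3))
print("gain(Windy)   =", round(information_gain(rows, labels, 1), 3))
```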
Types of Decision Trees:
- Classification Trees: Used for categorical or binary outcomes, assigning class labels to observations.
- Regression Trees: Used for continuous outcomes (e.g., income), returning average values at each node.
- Leaf nodes, located at the ends of the tree branches, represent the final outcomes of all prior decisions. They indicate the class label or segment to which all observations following that path belong.
Advantages:
Inexpensive to construct
Disadvantages: