
Name of the Student: Bhavana Vovaldasu
Academic Year: 2024 - 2025
Student Registration Number: 232P4R2043
Year & Term: 2nd Year & 1st Term
Study Level:
Class & Section: MCA-DS-B
Name of the Course: Big Data Analytics
Name of the Instructor: Priyanka Guptha
Name of the Assessment: Lab Report - 9
Date of Submission: 28 December 2024

Lab Report - 9

Introduction

In this lab, I applied the K-Means clustering algorithm to group customers based on their income and spending patterns, and the Apriori algorithm to find items that are frequently bought together. The clustering approach allowed for a deeper understanding of customer behavior and enabled the identification of similar groups for targeted marketing. By using the Elbow Method, I determined the optimal number of clusters, ensuring accurate segmentation. Visualizing the clusters made the results easier to interpret, offering clear insights into customer segmentation. Together, these analyses give businesses the ability to design personalized offers, improve customer experiences, and enhance decision-making.

Apriori Algorithm

The Apriori Algorithm is a simple method for finding items that are often bought together in shopping data. It scans transactions for patterns and identifies groups of items that appear together frequently.

What is Support?

- Support measures how often an item or a group of items appears across all the transactions.

For example, the algorithm might show that people often buy "Milk and Bread" together.

Association Rules

Association rules explain the connection between items that are frequently bought together.

Parts of a Rule:

1. Antecedent: The item(s) people buy first (e.g., Milk).
2. Consequent: The item(s) people buy along with the first item (e.g., Bread).
How Do We Measure Rules?

- Support: How often the items appear together in transactions.
- Confidence: How likely people are to buy the second item when they buy the first.
- Lift: How much more likely the second item is to be bought with the first, compared to random chance.
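
To make these measures concrete, here is a small worked example with made-up numbers. Suppose Milk and Bread appear together in 4 out of 10 transactions, so support(Milk, Bread) = 4/10 = 0.4. If Milk appears in 5 transactions, then confidence(Milk → Bread) = 4/5 = 0.8. If Bread on its own appears in 6 of the 10 transactions (support 0.6), then lift = 0.8 / 0.6 ≈ 1.33, meaning Bread is about 33% more likely to be bought alongside Milk than by chance alone.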

Visualization

Scatter plots are used to show the results. They make it easy to spot
the best rules with high support and confidence.

Why is the Apriori Algorithm Useful?

1. Understand Shopping Patterns: Helps businesses learn what customers like to buy together.
2. Recommend Products: Suggests items to customers, like Butter to someone buying Bread.
3. Better Inventory Planning: Helps store owners keep popular combinations of items in stock.

This algorithm makes it easier for businesses to improve sales, offer better recommendations, and manage stock more effectively.
1. Importing Tools

We need some tools to help us:

- pandas: To organize and analyze the data.
- mlxtend.frequent_patterns: To run the Apriori Algorithm and find rules.
- matplotlib.pyplot: To make charts and show the results.
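
A minimal sketch of these imports, assuming the mlxtend package is installed (the exact modules used in the lab code may differ slightly):

    # pandas for tabular data, matplotlib for charts,
    # mlxtend for the Apriori algorithm and rule mining.
    import pandas as pd
    import matplotlib.pyplot as plt
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules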

2. Making a Dataset

We create a small dataset of transactions:

- Each transaction has an ID and a list of items bought.
- Example: Transaction 1: Milk, Bread, Butter
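
For example, a toy transaction list might look like this (the item values are illustrative, not necessarily the report's actual data):

    # Each inner list is one transaction (one customer's basket).
    transactions = [
        ['Milk', 'Bread', 'Butter'],   # Transaction 1
        ['Bread', 'Butter'],           # Transaction 2
        ['Milk', 'Bread'],             # Transaction 3
        ['Milk', 'Butter'],            # Transaction 4
        ['Milk', 'Bread', 'Butter'],   # Transaction 5
    ]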

3. Preparing the Data

We turn the data into a table:

- Each row is a transaction.
- Each column is an item.
- If an item is bought, we mark it as 1; if not, we mark it as 0.
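
One way to build this 1/0 table is mlxtend's TransactionEncoder; the manual marking described above produces an equivalent result:

    # Convert the transaction lists into a True/False (1/0) item table.
    te = TransactionEncoder()
    onehot = te.fit(transactions).transform(transactions)
    basket = pd.DataFrame(onehot, columns=te.columns_)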

4. Running the Apriori Algorithm

The Apriori Algorithm looks for items often bought together:

- It only keeps item groups that appear in at least 40% of the transactions.
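
Continuing the sketch, a 40% support threshold translates to min_support=0.4:

    # Keep only itemsets that appear in at least 40% of transactions.
    frequent_itemsets = apriori(basket, min_support=0.4, use_colnames=True)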

5. Finding Rules

The algorithm creates rules to show how items are related:

- Confidence: Measures how often one item is bought when another is bought.
- Only rules with confidence above 70% are kept.
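
The 70% confidence cutoff maps to min_threshold=0.7 in mlxtend's association_rules:

    # Generate rules and keep those with confidence above 70%.
    rules = association_rules(frequent_itemsets,
                              metric='confidence', min_threshold=0.7)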

6. Showing Results

The results include:

- Frequent Item Groups: Lists of items bought together often.
- Rules: Explains how buying one item predicts another.

7. Making a Chart

We create a scatter plot:

- Support: How often items are bought together.
- Confidence: How likely one item is bought with another.

The chart helps us see the best rules.
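
A minimal version of that plot, continuing the sketch above:

    # Each point is one rule, placed by its support and confidence.
    plt.scatter(rules['support'], rules['confidence'])
    plt.xlabel('Support')
    plt.ylabel('Confidence')
    plt.title('Association Rules: Support vs. Confidence')
    plt.show()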
1. Clustering (K-Means)

K-Means groups similar items together.

- Items in the same group (cluster) are alike.
- Items in different groups are not alike.

Example: Grouping kids in a class based on their favorite sports.

2. StandardScaler

StandardScaler makes sure all data is on the same scale.

- It adjusts numbers so they are fair to compare.

Example: Comparing heights in centimeters and weights in kilograms. StandardScaler makes them easier to work with.
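
A tiny sketch of the idea, using made-up height/weight numbers:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Heights (cm) and weights (kg) live on very different scales.
    data = np.array([[170.0, 65.0], [160.0, 80.0], [180.0, 72.0]])
    scaled = StandardScaler().fit_transform(data)
    # After scaling, each column has mean 0 and standard deviation 1.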

3. Elbow Method

The Elbow Method helps you find the right number of groups (clusters).

- You draw a graph to see how many groups make sense.
- Look for the "elbow" (a bend in the graph). That tells you the best number of groups.

Example: Deciding how many flavors of ice cream to sort into categories based on taste.
1. Import Libraries

We use some tools in this code:

- pandas: To organize and work with data.
- KMeans: To group data into clusters.
- matplotlib.pyplot: To create charts to show results.
- StandardScaler: To make sure all data is on the same scale.
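
A minimal sketch of these imports, assuming scikit-learn is installed:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler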

2. Create a Dataset

We create a small dataset to represent customer details:

- It has three columns:
  1. Customer: The customer ID.
  2. Annual Income (k$): How much the customer earns (in thousands of dollars).
  3. Spending Score (1-100): How much the customer spends, scored between 1 and 100.
- This data is stored in a table using pandas.
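
For illustration, a toy version of such a table (the numbers are made up, not the report's actual values):

    # Hypothetical customer records with the three columns described above.
    df = pd.DataFrame({
        'Customer': [1, 2, 3, 4, 5, 6],
        'Annual Income (k$)': [15, 16, 60, 62, 95, 100],
        'Spending Score (1-100)': [39, 81, 50, 42, 10, 90],
    })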

3. Select Features

We only use two columns, Annual Income and Spending Score, for
clustering because these are the most relevant.
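
Continuing the sketch:

    # Keep only the two columns used for clustering.
    X = df[['Annual Income (k$)', 'Spending Score (1-100)']]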

4. Normalize Features

The data is scaled using StandardScaler so that income and spending score are treated equally.

- Without this step, larger numbers (like income) could dominate the clustering.
- After scaling, each feature has a mean of 0 and a standard deviation of 1, so both are on a consistent scale.
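
In code, this is a single fit_transform call:

    # Scale both features to mean 0 and standard deviation 1.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)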

5. Find the Best Number of Clusters (Elbow Method)

- The code tests different numbers of clusters (from 1 to 5).
- It calculates WCSS (within-cluster sum of squares, a measure of how tightly the data points in a cluster are grouped).
- A graph is created showing WCSS vs. the number of clusters.
- Look for the "elbow" in the graph (a sharp bend). This tells you the best number of clusters.
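
A sketch of that loop, using scikit-learn's inertia_ attribute as the WCSS value:

    # Compute WCSS for k = 1..5 and plot it to find the elbow.
    wcss = []
    for k in range(1, 6):
        km = KMeans(n_clusters=k, n_init=10, random_state=42)
        km.fit(X_scaled)
        wcss.append(km.inertia_)

    plt.plot(range(1, 6), wcss, marker='o')
    plt.xlabel('Number of clusters (k)')
    plt.ylabel('WCSS')
    plt.title('Elbow Method')
    plt.show()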

6. Apply K-Means

- The K-Means algorithm is run with 2 clusters (based on the elbow method).
- The algorithm groups the data and assigns a cluster to each customer.
- These cluster labels are added as a new column, Cluster, in the dataset.
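
Continuing the sketch with 2 clusters:

    # Fit K-Means with 2 clusters and label each customer.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
    df['Cluster'] = kmeans.fit_predict(X_scaled)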

7. Visualize the Clusters

A scatter plot is made to show the clusters:

- X-axis: Scaled Annual Income.
- Y-axis: Scaled Spending Score.
- The points are colored based on their cluster, making it easy to see the groups.
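
A minimal version of that plot:

    # Color each customer by its assigned cluster.
    plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=df['Cluster'])
    plt.xlabel('Annual Income (scaled)')
    plt.ylabel('Spending Score (scaled)')
    plt.title('Customer Clusters')
    plt.show()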

8. Output the Results

Finally, the dataset with cluster assignments is printed.
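
In the sketch above, this is simply:

    # Show each customer together with its cluster assignment.
    print(df)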

Key Takeaways:
1. Standardization: Ensures that income and spending score
contribute equally to clustering.
2. Elbow Method: Helps pick the best number of clusters.
3. K-Means: Groups customers based on their income and
spending.
4. Visualization: Shows how well customers are grouped.

This code helps divide customers into groups based on their spending and income habits. Businesses can use this to create personalized offers or better marketing strategies.
