
Name of the Student: Bhavana Vovaldasu
Academic Year: 2024 - 2025
Student Registration Number: 232P4R2043
Year & Term: 2nd Year & 1st Term
Study Level:
Class & Section: MCA-DS-B
Name of the Course: Big Data Analytics
Name of the Instructor: Priyanka Guptha
Name of the Assessment: Lab Report - 9
Date of Submission: 28 December 2024

Lab Report - 9

Introduction

In this lab, I applied the K-Means clustering algorithm to group customers based on their income and spending patterns, and the Apriori algorithm to find items that are frequently bought together. The clustering approach allowed for a deeper understanding of customer behavior and enabled the identification of similar groups for targeted marketing. By using the Elbow Method, I determined the optimal number of clusters, ensuring accurate segmentation. Visualizing the clusters made the results easier to interpret, offering clear insights into customer segmentation. Together, these analyses give businesses the ability to design personalized offers, improve customer experiences, and enhance decision-making.

Apriori Algorithm

The Apriori Algorithm is a simple method for finding items that are often bought together in shopping data. It scans transactions for patterns and identifies groups of items that appear together frequently.

What is Support?

- Support measures how often an item or a group of items appears across all the transactions.

For example, the algorithm might show that people often buy "Milk and Bread" together.

Association Rules

Association rules explain the connection between items that are frequently bought together.

Parts of a Rule:

1. Antecedent: The item(s) people buy first (e.g., Milk).
2. Consequent: The item(s) people buy along with the first item (e.g., Bread).
How Do We Measure Rules?

- Support: How often the items appear together in transactions.
- Confidence: How likely people are to buy the second item when they buy the first.
- Lift: How much more likely the second item is to be bought with the first, compared to random chance.
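
To make these measures concrete, here is a small worked example with made-up numbers. Suppose Milk and Bread appear together in 4 out of 10 transactions, so support(Milk, Bread) = 4/10 = 0.4. If Milk appears in 5 transactions, then confidence(Milk → Bread) = 4/5 = 0.8. If Bread on its own appears in 6 of the 10 transactions (support 0.6), then lift = 0.8 / 0.6 ≈ 1.33, meaning Bread is about 33% more likely to be bought alongside Milk than by chance alone.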

Visualization

Scatter plots are used to show the results. They make it easy to spot
the best rules with high support and confidence.

Why is the Apriori Algorithm Useful?

1. Understand Shopping Patterns: Helps businesses learn what customers like to buy together.
2. Recommend Products: Suggests items to customers, like Butter to someone buying Bread.
3. Better Inventory Planning: Helps store owners keep popular combinations of items in stock.

This algorithm makes it easier for businesses to improve sales, offer better recommendations, and manage stock more effectively.
1. Importing Tools

We need some tools to help us:

- pandas: To organize and analyze the data.
- mlxtend.frequent_patterns: To run the Apriori Algorithm and find rules.
- matplotlib.pyplot: To make charts and show the results.
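
A minimal sketch of these imports, assuming the mlxtend package is installed (the exact modules used in the lab code may differ slightly):

    # pandas for tabular data, matplotlib for charts,
    # mlxtend for the Apriori algorithm and rule mining.
    import pandas as pd
    import matplotlib.pyplot as plt
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules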

2. Making a Dataset

We create a small dataset of transactions:

- Each transaction has an ID and a list of items bought.
- Example: Transaction 1: Milk, Bread, Butter
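
For example, a toy transaction list might look like this (the item values are illustrative, not necessarily the report's actual data):

    # Each inner list is one transaction (one customer's basket).
    transactions = [
        ['Milk', 'Bread', 'Butter'],   # Transaction 1
        ['Bread', 'Butter'],           # Transaction 2
        ['Milk', 'Bread'],             # Transaction 3
        ['Milk', 'Butter'],            # Transaction 4
        ['Milk', 'Bread', 'Butter'],   # Transaction 5
    ]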

3. Preparing the Data

We turn the data into a table:

- Each row is a transaction.
- Each column is an item.
- If an item is bought, we mark it as 1; if not, we mark it as 0.
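
One way to build this 1/0 table is mlxtend's TransactionEncoder; the manual marking described above produces an equivalent result:

    # Convert the transaction lists into a True/False (1/0) item table.
    te = TransactionEncoder()
    onehot = te.fit(transactions).transform(transactions)
    basket = pd.DataFrame(onehot, columns=te.columns_)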

4. Running the Apriori Algorithm

The Apriori Algorithm looks for items often bought together:

- It only keeps item groups that appear in at least 40% of the transactions.
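
Continuing the sketch, a 40% support threshold translates to min_support=0.4:

    # Keep only itemsets that appear in at least 40% of transactions.
    frequent_itemsets = apriori(basket, min_support=0.4, use_colnames=True)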

5. Finding Rules

The algorithm creates rules to show how items are related:

- Confidence: Measures how often one item is bought when another is bought.
- Only rules with confidence above 70% are kept.
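
The 70% confidence cutoff maps to min_threshold=0.7 in mlxtend's association_rules:

    # Generate rules and keep those with confidence above 70%.
    rules = association_rules(frequent_itemsets,
                              metric='confidence', min_threshold=0.7)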

6. Showing Results

The results include:

- Frequent Item Groups: Lists of items bought together often.
- Rules: Explains how buying one item predicts another.

7. Making a Chart

We create a scatter plot:

- Support: How often items are bought together.
- Confidence: How likely one item is bought with another.

The chart helps us see the best rules.
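
A minimal version of that plot, continuing the sketch above:

    # Each point is one rule, placed by its support and confidence.
    plt.scatter(rules['support'], rules['confidence'])
    plt.xlabel('Support')
    plt.ylabel('Confidence')
    plt.title('Association Rules: Support vs. Confidence')
    plt.show()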
1. Clustering (K-Means)

K-Means groups similar items together.

- Items in the same group (cluster) are alike.
- Items in different groups are not alike.

Example: Grouping kids in a class based on their favorite sports.

2. StandardScaler

StandardScaler makes sure all data is on the same scale.

- It adjusts numbers so they are fair to compare.

Example: Comparing heights in centimeters and weights in kilograms. StandardScaler makes them easier to work with.
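
A tiny sketch of the idea, using made-up height/weight numbers:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Heights (cm) and weights (kg) live on very different scales.
    data = np.array([[170.0, 65.0], [160.0, 80.0], [180.0, 72.0]])
    scaled = StandardScaler().fit_transform(data)
    # After scaling, each column has mean 0 and standard deviation 1.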

3. Elbow Method

The Elbow Method helps you find the right number of groups (clusters).

- You draw a graph to see how many groups make sense.
- Look for the "elbow" (a bend in the graph). That tells you the best number of groups.

Example: Deciding how many flavors of ice cream to sort into categories based on taste.
1. Import Libraries

We use some tools in this code:

- pandas: To organize and work with data.
- KMeans: To group data into clusters.
- matplotlib.pyplot: To create charts to show results.
- StandardScaler: To make sure all data is on the same scale.
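
A minimal sketch of these imports, assuming scikit-learn is installed:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler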

2. Create a Dataset

We create a small dataset to represent customer details:

- It has three columns:
  1. Customer: The customer ID.
  2. Annual Income (k$): How much the customer earns (in thousands of dollars).
  3. Spending Score (1-100): How much the customer spends, scored between 1 and 100.
- This data is stored in a table using pandas.
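
For illustration, a toy version of such a table (the numbers are made up, not the report's actual values):

    # Hypothetical customer records with the three columns described above.
    df = pd.DataFrame({
        'Customer': [1, 2, 3, 4, 5, 6],
        'Annual Income (k$)': [15, 16, 60, 62, 95, 100],
        'Spending Score (1-100)': [39, 81, 50, 42, 10, 90],
    })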

3. Select Features

We only use two columns, Annual Income and Spending Score, for
clustering because these are the most relevant.
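
Continuing the sketch:

    # Keep only the two columns used for clustering.
    X = df[['Annual Income (k$)', 'Spending Score (1-100)']]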

4. Normalize Features

The data is scaled using StandardScaler so that income and spending score are treated equally.

- Without this step, larger numbers (like income) could dominate the clustering.
- After scaling, each feature has a mean of 0 and a standard deviation of 1, so both are on a consistent scale.
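
In code, this is a single fit_transform call:

    # Scale both features to mean 0 and standard deviation 1.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)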

5. Find the Best Number of Clusters (Elbow Method)

- The code tests different numbers of clusters (from 1 to 5).
- It calculates WCSS (within-cluster sum of squares, a measure of how tightly the data points in a cluster are grouped).
- A graph is created showing WCSS vs. the number of clusters.
- Look for the "elbow" in the graph (a sharp bend). This tells you the best number of clusters.
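
A sketch of that loop, using scikit-learn's inertia_ attribute as the WCSS value:

    # Compute WCSS for k = 1..5 and plot it to find the elbow.
    wcss = []
    for k in range(1, 6):
        km = KMeans(n_clusters=k, n_init=10, random_state=42)
        km.fit(X_scaled)
        wcss.append(km.inertia_)

    plt.plot(range(1, 6), wcss, marker='o')
    plt.xlabel('Number of clusters (k)')
    plt.ylabel('WCSS')
    plt.title('Elbow Method')
    plt.show()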

6. Apply K-Means

- The K-Means algorithm is run with 2 clusters (based on the elbow method).
- The algorithm groups the data and assigns a cluster to each customer.
- These cluster labels are added as a new column, Cluster, in the dataset.
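
Continuing the sketch with 2 clusters:

    # Fit K-Means with 2 clusters and label each customer.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
    df['Cluster'] = kmeans.fit_predict(X_scaled)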

7. Visualize the Clusters

A scatter plot is made to show the clusters:

- X-axis: Scaled Annual Income.
- Y-axis: Scaled Spending Score.
- The points are colored based on their cluster, making it easy to see the groups.
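
A minimal version of that plot:

    # Color each customer by its assigned cluster.
    plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=df['Cluster'])
    plt.xlabel('Annual Income (scaled)')
    plt.ylabel('Spending Score (scaled)')
    plt.title('Customer Clusters')
    plt.show()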

8. Output the Results

Finally, the dataset with cluster assignments is printed.
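
In the sketch above, this is simply:

    # Show each customer together with its cluster assignment.
    print(df)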

Key Takeaways:
1. Standardization: Ensures that income and spending score
contribute equally to clustering.
2. Elbow Method: Helps pick the best number of clusters.
3. K-Means: Groups customers based on their income and
spending.
4. Visualization: Shows how well customers are grouped.

This code helps divide customers into groups based on their spending and income habits. Businesses can use this to create personalized offers or better marketing strategies.
