Partitioning Algorithms

Uploaded by

Pradeep ravikumar


Data Mining

(19ADOCN1001)
Mr.M.VijayaKumar, AP/AI&DS

19ADCN1303 - Data Mining 1

Course Outcomes

CO4: Classify data for the given dataset using real-world applications.


UNIT IV – Classification and Clustering
Classification: Basic Concepts – Decision Tree Induction – Bayes Classification Methods – Rule-Based Classification – K-Nearest-Neighbor Classifier – Model Evaluation and Selection – Techniques to Improve Classification Accuracy.
Cluster Analysis: Basic Concepts and Methods – Cluster Analysis – Partitioning Methods – Hierarchical Methods – Density-Based Methods – Grid-Based Methods.


Partitioning Algorithms: Basic Concept
• Partitioning method: Partitioning a database D of n objects into a set of k clusters, such that the sum of squared distances is minimized (where c_i is the centroid or medoid of cluster C_i)

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2$$

• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
• Globally optimal: exhaustively enumerate all partitions (feasible only for very small n)
• Heuristic methods: k-means and k-medoids algorithms
• k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented
by the center of the cluster
• k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster
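As a rough illustration of the criterion E defined above, the following sketch evaluates the sum of squared distances for a small hand-made partition. The function names `sse` and `centroid` and the sample points are assumptions made for this example, and squared Euclidean distance is assumed as the distance.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of points (tuples)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(clusters, representatives):
    """Sum over clusters of squared Euclidean distances to the representative c_i."""
    total = 0.0
    for points, c in zip(clusters, representatives):
        for p in points:
            total += sum((pj - cj) ** 2 for pj, cj in zip(p, c))
    return total


# Two illustrative clusters (hypothetical data).
clusters = [[(1.0, 1.0), (2.0, 1.0)], [(8.0, 8.0), (9.0, 9.0)]]
centroids = [centroid(c) for c in clusters]
E = sse(clusters, centroids)
print(centroids, E)
```

The same `sse` function works when the representatives are medoids instead of centroids, since it only needs one reference point per cluster.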
The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in four steps:
1. Partition the objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when the assignment no longer changes
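The four steps above can be sketched in plain Python as follows. This is a minimal sketch, not a reference implementation: the round-robin initial partition, the `max_iter` cap, the rule of keeping the previous centroid when a cluster becomes empty, and the sample points are all assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Centroid (component-wise mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def k_means(points, k, max_iter=100):
    # Step 1: partition objects into k nonempty subsets
    # (round-robin assignment; assumes len(points) >= k).
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: compute the centroids of the current partitioning.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old centroid if a cluster empties
                centroids[j] = mean_point(members)
        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Step 4: stop when the assignment does not change, else repeat.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


points = [(1, 1), (1, 2), (2, 1), (8, 9), (9, 8), (9, 9)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this data the initial round-robin partition mixes the two groups, and one reassignment pass is enough to separate them.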


An Example of K-Means Clustering


Comments on the K-Means Method
• Strength: Efficient: O(tkn), where n is # objects, k is # clusters, and t is
# iterations. Normally, k, t << n.
• Comparing: PAM: O(k(n-k)^2) per iteration, CLARA: O(ks^2 + k(n-k)), where s is the sample size
• Comment: Often terminates at a local optimum.
• Weakness
• Applicable only to objects in a continuous n-dimensional space
• Using the k-modes method for categorical data
• In comparison, k-medoids can be applied to a wide range of
data
• Need to specify k, the number of clusters, in advance (there are ways to automatically determine the best k; see Hastie et al., 2009)
• Sensitive to noisy data and outliers
• Not suitable to discover clusters with non-convex shapes
Variations of the K-Means Method
• Most variants of k-means differ in

• Selection of the initial k means

• Dissimilarity calculations

• Strategies to calculate cluster means

• Handling categorical data: k-modes

• Replacing means of clusters with modes

• Using new dissimilarity measures to deal with categorical objects
• Using a frequency-based method to update modes of clusters

• A mixture of categorical and numerical data: k-prototype method
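The two k-modes ingredients listed above (a dissimilarity measure for categorical objects and a frequency-based mode update) might look like the sketch below. The simple matching dissimilarity used here (count of mismatching attributes) is one common choice and is an assumption, as are the function names and the sample records.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects):
    """Frequency-based mode: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


# Hypothetical categorical records: (color, size, shape).
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = cluster_mode(cluster)
d = matching_dissimilarity(("blue", "large", "round"), mode)
print(mode, d)  # ('red', 'small', 'round') 2
```

Plugging these two functions into the k-means loop in place of the mean and the squared distance gives the basic shape of k-modes.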


What Is the Problem of the K-Means Method?
• The k-means algorithm is sensitive to outliers!

• An object with an extremely large value may substantially distort the distribution of the data

• K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in the cluster
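A small one-dimensional example makes the outlier effect concrete. The data values are made up for illustration, and the medoid is taken here as the object with the smallest total absolute distance to all others, which is an assumption about the distance used.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def medoid(values):
    """Object with the smallest total absolute distance to all other objects."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


cluster = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
print(mean(cluster))    # 22.0
print(medoid(cluster))  # 3.0
```

The single outlier drags the mean far away from the bulk of the data, while the medoid remains one of the typical objects.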


PAM: A Typical K-Medoids Algorithm


The K-Medoid Clustering Method
• K-Medoids Clustering: Find representative objects (medoids) in clusters

• PAM (Partitioning Around Medoids, Kaufman & Rousseeuw 1987)

• Starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering
• PAM works effectively for small data sets, but does not scale well
for large data sets (due to the computational complexity)

• Efficiency improvement on PAM

• CLARA (Kaufman & Rousseeuw, 1990): PAM on samples

• CLARANS (Ng & Han, 1994): Randomized re-sampling
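The swap idea behind PAM described above can be sketched as follows. This is a simplified variant under stated assumptions: it accepts the first improving swap rather than searching for the best swap in each round, it starts from the first k objects as medoids, and it uses Manhattan distance. The function names and sample points are hypothetical.

```python
from itertools import product


def dist(p, q):
    """Manhattan distance (an assumption; any dissimilarity could be used)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_cost(points, medoids):
    """Total distance of every object to its nearest medoid (medoids are indices)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    medoids = list(range(k))  # assumption: the first k objects are the initial medoids
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        # Try replacing medoid slot i with non-medoid object h.
        for i, h in product(range(k), range(len(points))):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            cost = total_cost(points, candidate)
            if cost < best:  # keep the swap only if total distance improves
                medoids, best, improved = candidate, cost, True
    return sorted(medoids), best


points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
medoids, cost = pam(points, k=2)
print(medoids, cost)  # [0, 3] 4
```

Every candidate swap re-evaluates the cost over all objects, which illustrates why this approach becomes expensive on large data sets and why sampling-based variants such as CLARA and CLARANS were proposed.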


Summary
• Partitioning Methods


Reference
1. Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining:
Concepts and Techniques”, 3rd Edition, Elsevier, 2012.


Thank you
