CLIQUE Algorithm

CLIQUE: A Dimension-Growth Subspace Clustering Method

CLIQUE was the first dimension-growth subspace clustering algorithm.
Clustering starts in single-dimensional subspaces and moves upward toward higher-dimensional subspaces.
The algorithm can be viewed as an integration of density-based and grid-based clustering.
It automatically identifies subspaces of a high-dimensional data space that allow better clustering than the original space.
CLIQUE can therefore be considered both density-based and grid-based:
It partitions each dimension into the same number of equal-length intervals.
It thereby partitions an m-dimensional data space into non-overlapping rectangular units.
A unit is dense if the fraction of total data points contained in the unit exceeds an input model parameter (see the sketch below).
A cluster is a maximal set of connected dense units within a subspace.
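To make the partitioning and density test concrete, here is a minimal sketch for a single dimension, assuming NumPy; the parameter names xi (number of intervals per dimension) and tau (density threshold) follow common usage for CLIQUE, but the function itself is illustrative rather than the original implementation.

```python
import numpy as np
from collections import Counter

def dense_intervals_1d(points, dim, xi=10, tau=0.05):
    """Partition dimension `dim` into `xi` equal-length intervals and
    return the interval indices whose fraction of all points exceeds `tau`."""
    values = points[:, dim]
    lo, hi = values.min(), values.max()
    width = (hi - lo) / xi or 1.0            # guard against a constant dimension
    # interval index of every point; clip so the maximum falls in the last bin
    bins = np.minimum(((values - lo) / width).astype(int), xi - 1)
    counts = Counter(bins.tolist())
    n = len(points)
    return {b for b, c in counts.items() if c / n > tau}
```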
Definitions That Need to Be Known
Unit: after forming a grid structure on the space, each rectangular cell is called a unit.
Dense: a unit is dense if the fraction of total data points contained in the unit exceeds the input model parameter.
Cluster: a cluster is defined as a maximal set of connected dense units.
Informal Problem Statement
Given a large set of multidimensional data points, the data space is usually not uniformly occupied by the data points.
CLIQUE's clustering identifies the sparse and the crowded areas in space (or units), thereby discovering the overall distribution patterns of the dataset.
A unit is dense if the fraction of total data points contained in it exceeds an input model parameter.
In CLIQUE, a cluster is defined as a maximal set of connected dense units.
Formal Problem Statement
Let A = {A1, A2, . . . , Ad} be a set of bounded, totally ordered domains and S = A1 × A2 × · · · × Ad a d-dimensional numerical space.
We refer to A1, . . . , Ad as the dimensions (attributes) of S.
The input consists of a set of d-dimensional points V = {v1, v2, . . . , vm}, where vi = (vi1, vi2, . . . , vid).
The j-th component of vi is drawn from domain Aj.
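As an illustration only (the attributes and values below are invented, not taken from the source), the input V can be stored as an m × d array with one row per point vi:

```python
import numpy as np

# Hypothetical input: m = 6 points in a d = 3 dimensional numerical space S,
# with dimensions (attributes) A1 = age, A2 = salary, A3 = vacation weeks.
V = np.array([
    [25, 40000, 2],
    [27, 42000, 3],
    [26, 41000, 2],
    [55, 90000, 6],
    [57, 88000, 5],
    [40, 60000, 4],
], dtype=float)
m, d = V.shape   # number of points and number of dimensions
```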
The CLIQUE Algorithm (cont.)
3. The minimal description of a cluster C, produced by the above procedure, is the minimum possible union of hyperrectangular regions.
For example, A ∪ B is the minimal cluster description of the shaded region, while C ∪ D ∪ E is a non-minimal cluster description of the same region.
CLIQUE Working: A Two-Step Process
1st step: partition the d-dimensional data space.
2nd step: generate the minimal description of each cluster.
1st step: Partitioning
Partitioning is done for each dimension.
The subspaces representing the dense units found in lower dimensions are intersected to form a candidate search space in which dense units of higher dimensionality may exist.
This approach to selecting candidates is quite similar to the Apriori-Gen process of generating candidates.
The expectation here is that if something is dense in a higher-dimensional space, it cannot be sparse in the lower-dimensional spaces.
More formally:
If a k-dimensional unit is dense, then so are its projections in (k-1)-dimensional space.
Given a k-dimensional candidate dense unit, if any of its (k-1)-dimensional projection units is not dense, then the k-dimensional unit cannot be dense.
So, we can generate candidate dense units in k-dimensional space from the dense units found in (k-1)-dimensional space (a sketch of this step follows below).
The resulting space searched is much smaller than the original space.
The dense units are then examined in order to determine the clusters.
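A possible sketch of this candidate-generation step, assuming each dense unit is represented as a frozenset of (dimension, interval-index) pairs; the representation and function name are illustrative, not the paper's exact procedure.

```python
from itertools import combinations

def generate_candidates(dense_prev, k):
    """Join (k-1)-dimensional dense units (frozensets of (dimension, interval)
    pairs) that share k-2 pairs, then prune any candidate that has a
    non-dense (k-1)-dimensional projection (the Apriori-style check)."""
    dense_prev = set(dense_prev)
    candidates = set()
    for u1, u2 in combinations(dense_prev, 2):
        merged = u1 | u2
        # a valid join yields k pairs spanning k distinct dimensions
        if len(merged) == k and len({dim for dim, _ in merged}) == k:
            # every (k-1)-dimensional projection must itself be dense
            if all(merged - {pair} in dense_prev for pair in merged):
                candidates.add(merged)
    return candidates
```

Units surviving this step are only candidates; their actual density still has to be verified against the data.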
Intersection

Dense units found with respect to age for the dimensions salary
and vacation are intersected in order to provide a candidate
search space for dense units of higher dimensionality.
2nd step: Minimal Description
For each cluster, CLIQUE determines the maximal region that covers the cluster of connected dense units.
It then determines a minimal cover (logic description) for each cluster (a simplified sketch follows below).
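A simplified sketch of this step, assuming the maximal rectangular regions have already been enumerated and that each region is represented simply as the set of dense units it covers; the greedy growth of maximal regions used by the original algorithm is not reproduced here.

```python
def minimal_cover(cluster_units, regions):
    """Greedily pick regions (sets of dense units) until every dense unit of
    the cluster is covered, then drop any region whose units are all covered
    by the regions that remain (a removal pass)."""
    uncovered = set(cluster_units)
    chosen = []
    while uncovered and regions:
        best = max(regions, key=lambda r: len(r & uncovered))
        if not best & uncovered:
            break                        # no candidate region covers anything new
        chosen.append(best)
        uncovered -= best

    minimal = list(chosen)
    for region in chosen:
        others = [r for r in minimal if r is not region]
        if others and region <= set().union(*others):
            minimal.remove(region)       # redundant: fully covered by the rest
    return minimal
```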
Effectiveness of CLIQUE
CLIQUE automatically finds subspaces of the highest dimensionality such that high-density clusters exist in those subspaces.
It is insensitive to the order of input objects.
It scales linearly with the size of the input.
It is easily scalable with the number of dimensions in the data.
GRID-BASED CLUSTERING METHODS

This is the approach in which we quantize the space into a finite number of cells that form a grid structure, on which all of the operations for clustering are performed.
So, for example, assume that we have a set of records that we want to cluster with respect to two attributes; we then divide the related space (a plane) into a grid structure and find the clusters in it.
[Figure: a 2-D grid over the salary (×10,000) vs. age plane.]
Techniques for Grid-Based Clustering

The following are some techniques that are used to perform grid-based clustering:
CLIQUE (CLustering In QUest)
STING (STatistical Information Grid)
WaveCluster
Looking at CLIQUE as an Example

CLIQUE is used for clustering high-dimensional data present in large tables. By high-dimensional data we mean records that have many attributes.
CLIQUE identifies the dense units in the subspaces of the high-dimensional data space, and uses these subspaces to provide more efficient clustering.
How Does CLIQUE Work?

Let us say that we have a set of records that we would like to cluster in terms of n attributes. So, we are dealing with an n-dimensional space.
MAJOR STEPS:
CLIQUE partitions each subspace that has dimension 1 into the same number of equal-length intervals.
Using this as a basis, it partitions the n-dimensional data space into non-overlapping rectangular units (a small counting sketch follows below).
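A minimal sketch of this partitioning, assuming NumPy; it simply counts how many points fall into each rectangular unit of the n-dimensional grid (xi, the number of intervals per dimension, is an illustrative parameter name).

```python
import numpy as np
from collections import Counter

def count_points_per_unit(points, xi=10):
    """Cut every dimension into `xi` equal-length intervals and count how many
    points fall into each non-overlapping rectangular unit of the grid."""
    lo = points.min(axis=0)
    width = (points.max(axis=0) - lo) / xi
    width[width == 0] = 1.0                      # guard constant dimensions
    bins = np.minimum(((points - lo) / width).astype(int), xi - 1)
    # a unit is identified by its tuple of interval indices, one per dimension
    return Counter(map(tuple, bins.tolist()))
```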
CLIQUE: Major Steps (Cont.)
Next, CLIQUE looks for dense units of higher dimensionality. It does this in the following way:
CLIQUE finds dense units of higher dimensionality by first finding the dense units in the lower-dimensional subspaces. So, for example, if we are dealing with a 3-dimensional space, CLIQUE finds the dense units in the 3 related planes (2-dimensional subspaces).
It then intersects the extensions of the subspaces representing these dense units to form a candidate search space in which dense units of higher dimensionality may exist.
CLIQUE: Major Steps (Cont.)
Each maximal set of connected dense units is considered a cluster (a sketch of this connected-components step follows below).
Using this definition, the dense units in the subspaces are examined in order to find clusters in the subspaces.
The information from the subspaces is then used to find clusters in the n-dimensional space.
It must be noted that all cluster boundaries are either horizontal or vertical; this is due to the nature of the rectangular grid cells.
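The maximal sets of connected dense units can be found with a plain graph traversal over the dense units. A sketch, assuming each dense unit is a dict mapping dimension to interval index and that two units are connected when they share a common face (their interval indices differ by one in exactly one dimension); the representation is illustrative.

```python
from collections import deque

def connected_clusters(dense_units):
    """Group the dense units of one subspace into clusters, where a cluster is
    a maximal set of units connected through common faces."""
    def adjacent(u, v):
        if u.keys() != v.keys():           # units must live in the same subspace
            return False
        diffs = [abs(u[d] - v[d]) for d in u]
        return diffs.count(1) == 1 and diffs.count(0) == len(diffs) - 1

    units = list(dense_units)
    unvisited = set(range(len(units)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, queue = [seed], deque([seed])
        while queue:                       # breadth-first search from the seed
            i = queue.popleft()
            for j in [j for j in unvisited if adjacent(units[i], units[j])]:
                unvisited.remove(j)
                component.append(j)
                queue.append(j)
        clusters.append([units[i] for i in component])
    return clusters
```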
Example for CLIQUE
Let us say that we want to cluster a set
of records that have three attributes,
namely, salary, vacation and age.
The data space for this data would be 3-dimensional.
[Figure: a 3-D data space with axes vacation, age, and salary.]
Example (Cont.)

After plotting the data objects, each dimension (i.e., salary, vacation and age) is split into intervals of equal length.
Then we form a 3-dimensional grid on the space, each unit of which would be a 3-D rectangle.
Now, our goal is to find the dense 3-D rectangular units.
Example (Cont.)

To do this, we first find the dense units of the subspaces of this 3-D space.
So, we find the dense units with respect to age and salary; this means that we look at the salary-age plane and find all the 2-D rectangular units that are dense.
We also find the dense 2-D rectangular units for the vacation-age plane.
Example (Cont.)

Now let us try to visualize the dense units of the two planes on the following 3-D figure:
Example (Cont.)
We can extend the dense areas in the
vacation-age plane inwards.
We can extend the dense areas in the
salary-age plane upwards.
The intersection of these two spaces gives us a candidate search space in which 3-dimensional dense units may exist.
We then find the dense units in the
salary-vacation plane and we form an
extension of the subspace that represents
these dense units.
Example (Cont.)
Now, we perform an intersection of
the candidate search space with the
extension of the dense units of the
salary-vacation plane, in order to
get all the 3-d dense units.
So, what was the main idea?
We used the dense units in
subspaces in order to find the dense
units in the 3-dimensional space.
After finding the dense units, it is
very easy to find clusters.
Reflecting upon CLIQUE
Why does CLIQUE confine its search for
dense units in high dimensions to the
intersection of dense units in subspaces?
Because the Apriori property employs
prior knowledge of the items in the search
space so that portions of the space can be
pruned.
The property for CLIQUE says that if a k-dimensional unit is dense, then so are its projections in (k-1)-dimensional space.
Strength and Weakness of CLIQUE
Strength
It automatically finds subspaces of the highest
dimensionality such that high density clusters exist in
those subspaces.
It is quite efficient.
It is insensitive to the order of records in input and
does not presume some canonical data distribution.
It scales linearly with the size of input and has good
scalability as the number of dimensions in the data
increases.
Weakness
The accuracy of the clustering result may be degraded at the expense of the simplicity of the method.
Major Steps of CLIQUE (Summary)
Partition the data space and find the number of points that lie inside each cell of the partition.
Identify the subspaces that contain clusters using the Apriori principle.
Identify clusters:
Determine dense units in all subspaces of interest.
Determine connected dense units in all subspaces of interest.
Generate a minimal description for the clusters:
Determine the maximal regions that cover each cluster of connected dense units.
Determine a minimal cover for each cluster.
(A toy end-to-end sketch of the first passes follows below.)
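Putting the first two of these steps together, here is a toy end-to-end sketch (1-D dense units, then candidate 2-D units obtained by intersection), assuming NumPy; the dataset, the parameter values xi and tau, and the function names are all invented for illustration.

```python
import numpy as np
from itertools import combinations

def clique_first_passes(points, xi=5, tau=0.1):
    """Bin every dimension into `xi` equal-length intervals, keep the 1-D dense
    units, then intersect pairs of 1-D dense units from different dimensions
    and keep the 2-D candidates that are actually dense."""
    n, d = points.shape
    lo = points.min(axis=0)
    width = (points.max(axis=0) - lo) / xi
    width[width == 0] = 1.0                      # guard constant dimensions
    bins = np.minimum(((points - lo) / width).astype(int), xi - 1)

    def is_dense(mask):
        return mask.sum() / n > tau

    # 1-D dense units as (dimension, interval) pairs
    dense_1d = [(dim, b) for dim in range(d) for b in range(xi)
                if is_dense(bins[:, dim] == b)]

    # candidate 2-D units: intersections of 1-D dense units from different dims
    dense_2d = [((d1, b1), (d2, b2))
                for (d1, b1), (d2, b2) in combinations(dense_1d, 2)
                if d1 != d2 and is_dense((bins[:, d1] == b1) & (bins[:, d2] == b2))]
    return dense_1d, dense_2d

# two dense blobs plus scattered noise in an (age, salary, vacation) space
rng = np.random.default_rng(0)
blob_a = rng.normal([25, 4.0, 2.0], [1.0, 0.2, 0.3], size=(40, 3))
blob_b = rng.normal([55, 9.0, 6.0], [1.0, 0.2, 0.3], size=(40, 3))
noise = rng.uniform([20, 3.0, 1.0], [60, 10.0, 7.0], size=(20, 3))
data = np.vstack([blob_a, blob_b, noise])
dense_1d, dense_2d = clique_first_passes(data)
print(len(dense_1d), "dense 1-D units;", len(dense_2d), "candidate-and-dense 2-D units")
```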
[Figure: dense units found with respect to age for the salary (×10,000) and vacation (weeks) dimensions, with a density threshold of 3; the horizontal axis is age (20 to 60) and the vertical axes are salary and vacation intervals 0 to 7.]
