Spectral Approach (BU)


Title of the Talk: - Spectral approach for tabular and graph data clustering

Speaker’s Name and Affiliation: - Dr. Debasis Mohapatra, Assistant Professor, Dept. of CSE,
PMEC, BAM

Short-biography: - Debasis Mohapatra received his Ph.D. degree in Computer Science from
Berhampur University, India. He is currently working as an Assistant Professor in the
Department of CSE, PMEC Berhampur. He received the BPUT Faculty Research Award in 2022,
the UGC NET JRF award in 2012, and the Vice-Chancellor's silver medal in M.Tech. in 2009.
He has published 35 research papers in reputed conferences and journals. His research
interests include graph anonymization, privacy-preserving data publication, machine
learning, complex networks, software engineering, and e-governance. He has acted as a
reviewer for several reputed journals, such as Soft Computing, IEEE Access, Scientific
Reports, and Computers in Biology and Medicine.

Extended abstract of the talk


Clustering is an unsupervised learning approach used for finding similarity among data
points/objects [1, 2, 3]. Applying clustering to high-dimensional data leads to the problem of
sparsity and also increases time complexity. Hence, in such scenarios, dimensionality reduction
is preferable. The spectral approach maps the high-dimensional data to a lower-dimensional
space in which the clustering of the objects is non-sparse [4, 5].
Spectral clustering for tabular data:-
Spectral clustering is a type of clustering algorithm that uses the connectivity between the
data points to form the clusters. It uses the eigenvalues and eigenvectors of a matrix derived
from the data to project the data points into a lower-dimensional space in which they are
clustered. It is based on the idea of a graph representation of data, where the data points are
represented as nodes and the similarity between the data points is represented by weighted
edges [6].
The steps of the algorithm are:
1. Given a dataset of n points, the first step is to construct a similarity matrix whose
entries represent the pairwise similarity between the data points. Common choices
include similarities derived from the Euclidean distance, cosine similarity, and the
Gaussian kernel.
2. In the next step, the similarity matrix is transformed into a graph representation. A
Laplacian matrix [1, 2], which is a measure of the connectivity between the data points,
is obtained from the adjacency matrix. Different types of Laplacian matrices can be
used, such as the unnormalized Laplacian, the random-walk normalized Laplacian, and
the symmetric normalized Laplacian.
3. The eigenvectors and eigenvalues of the Laplacian matrix are then computed; typically
the eigenvectors corresponding to the smallest eigenvalues are retained. The number
of eigenvectors to be considered is a hyperparameter that needs to be tuned.
4. The eigenvectors are arranged into a matrix, and the rows of this matrix are used as
the new feature representations of the data points.
5. Finally, a clustering algorithm such as k-means is applied to the new feature
representations to obtain the final clusters (a minimal sketch of these steps is given
after this list).
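
A minimal sketch of these five steps is given below, assuming NumPy and scikit-learn are
available. The Gaussian-kernel similarity, the symmetric normalized Laplacian, and the helper
name spectral_clustering_sketch are illustrative choices for this sketch, not prescribed by the
talk.

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering_sketch(X, n_clusters, sigma=1.0):
    # Step 1: pairwise Gaussian-kernel similarity matrix.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Step 2: symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ A @ D_inv_sqrt
    # Step 3: eigen-decomposition; keep the eigenvectors of the smallest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(L)
    # Step 4: rows of the eigenvector matrix are the new feature representations
    # (row-normalized, in the spirit of the Ng-Jordan-Weiss variant [2]).
    U = eigvecs[:, :n_clusters]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Step 5: k-means on the embedded points gives the final clustering.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)

For a numeric data matrix X, a call such as spectral_clustering_sketch(X, n_clusters=3)
returns one cluster label per row.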
Spectral clustering for graph data:-
Graph data is generally used in applications like social network analysis, where a network is
represented by a graph G = (V, E); V represents the set of social entities and E represents the
connections between them. Clustering in a social network is called community detection. For
spectral clustering on a graph, the adjacency matrix can be used directly as the similarity
matrix, and the steps of the previous algorithm are applied in the same way, as sketched below.
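
The short sketch below assumes NetworkX is installed and uses Zachary's karate club network
purely as an illustrative example; the choice of two communities is an assumption made for
this dataset, not part of the talk.

import networkx as nx
import numpy as np
from sklearn.cluster import KMeans

G = nx.karate_club_graph()                    # a classic social-network example
A = nx.to_numpy_array(G)                      # adjacency matrix used directly as similarity
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # symmetric normalized Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
U = eigvecs[:, :2]                            # embed with the 2 smallest eigenvectors
U = U / np.linalg.norm(U, axis=1, keepdims=True)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(U)   # 2 communities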
Strength of clustering:-
The strength of a clustering in a graph can be measured by modularity, which can be computed
using the following equation:
Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{K_i K_j}{2m} \right) \delta(C_i, C_j)

where m is the sum of all edge weights in the graph, A_{ij} is the edge weight between node i
and node j, K_i and K_j are the sums of the weights of the edges attached to nodes i and j
respectively, C_i and C_j are the communities of nodes i and j respectively, and \delta(C_i, C_j)
is the Kronecker delta function, which is 1 when C_i = C_j and 0 otherwise.
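As a rough illustration, modularity can be computed directly from the adjacency matrix and
the community labels. The sketch below assumes an undirected (possibly weighted) graph and
reuses the labels array from the previous example.

import numpy as np

def modularity(A, labels):
    m = A.sum() / 2.0                              # total edge weight of the graph
    k = A.sum(axis=1)                              # weighted degree K_i of each node
    delta = (labels[:, None] == labels[None, :])   # Kronecker delta on communities
    return ((A - np.outer(k, k) / (2.0 * m)) * delta).sum() / (2.0 * m)

NetworkX provides an equivalent routine, networkx.algorithms.community.modularity, which
takes the graph and a list of community node sets.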
Future Extension and Applicability of the talk: -
This idea is applicable in various fields. In computer science, the concept is used in social
network analysis, image processing, natural language processing, etc. In biological science, it
is helpful in finding the closeness among various biological entities like organisms, cells, and
proteins. Apart from these two fields, the concept is also applicable to economics, sociology,
political science, etc.
References
1. Jia, H., Ding, S., Xu, X., & Nie, R. (2014). The latest research progress on spectral
clustering. Neural Computing and Applications, 24, 1477-1486.
2. Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an
algorithm. Advances in Neural Information Processing Systems, 14.
3. Kang, Z., Shi, G., Huang, S., Chen, W., Pu, X., Zhou, J. T., & Xu, Z. (2020). Multi-graph fusion
for multi-view spectral clustering. Knowledge-Based Systems, 189, 105102.
4. Khan, A. A., & Mohanty, S. K. (2022). A fast spectral clustering technique using MST based
proximity graph for diversified datasets. Information Sciences, 609, 1113-1131.
5. Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an
algorithm. Advances in Neural Information Processing Systems, 14.
6. Tremblay, N., & Loukas, A. (2020). Approximating spectral clustering via sampling: A
review. Sampling Techniques for Supervised or Unsupervised Tasks, 129-183.
