Does Number of Clusters Effect the Purity and Entropy of Clustering?

Recent Advances on Soft Computing and Data Mining (SCDM 2016)

Cluster analysis automatically partitions data into a number of meaningful groups, or clusters, using clustering algorithms. Every clustering algorithm produces its own type of clusters, so evaluating the clustering is essential for identifying the better algorithm. A number of evaluation measures exist, and they can be broadly divided into internal, external and relative measures. Internal measures, such as cluster cohesion and the number of clusters (NoC), assess the quality of the obtained clusters. External measures, such as purity and entropy, determine the extent to which the clustering structure discovered by an algorithm matches some external structure, while relative measures compare two different clustering results using internal or external measures. An empirical study is conducted to explore the effect of internal evaluation measures, specifically the NoC, on external evaluation measures such as purity and entropy. The idea stems from the fact that the NoC obtained in the clustering process is an indicator of how successful a clustering algorithm is. In this paper, some necessary propositions are formulated, and four previously used test cases are then considered to validate the effect of the NoC on purity and entropy. The proofs and experimental results indicate that purity increases and entropy decreases as the NoC increases.
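The relationship can be illustrated with a minimal Python sketch. It assumes the commonly used definitions of the two measures (purity as the fraction of points that belong to the majority class of their cluster, and entropy as the size-weighted average of the per-cluster class entropy in bits); the paper's own formulations may differ in detail. The function names and the toy labels are hypothetical and are not taken from the paper's test cases.

```python
from collections import Counter
from math import log2


def contingency(clusters, classes):
    """Map each cluster id to a Counter of the class labels it contains."""
    table = {}
    for c, t in zip(clusters, classes):
        table.setdefault(c, Counter())[t] += 1
    return table


def purity(clusters, classes):
    """Fraction of points that carry the majority class of their cluster."""
    table = contingency(clusters, classes)
    return sum(max(cnt.values()) for cnt in table.values()) / len(classes)


def entropy(clusters, classes):
    """Size-weighted average of the class entropy (base 2) of each cluster."""
    table = contingency(clusters, classes)
    n = len(classes)
    total = 0.0
    for cnt in table.values():
        size = sum(cnt.values())
        h = -sum((v / size) * log2(v / size) for v in cnt.values())
        total += (size / n) * h
    return total


# Toy data (hypothetical): six points, two true classes.
classes = ["a", "a", "a", "b", "b", "b"]
one_cluster = [0, 0, 0, 0, 0, 0]    # NoC = 1
two_clusters = [0, 0, 0, 0, 1, 1]   # NoC = 2, obtained by splitting the single cluster
six_clusters = [0, 1, 2, 3, 4, 5]   # NoC = 6, every point is its own cluster

for name, labels in [("NoC=1", one_cluster),
                     ("NoC=2", two_clusters),
                     ("NoC=6", six_clusters)]:
    print(name, round(purity(labels, classes), 3), round(entropy(labels, classes), 3))
```

In this toy example each clustering refines the previous one, and purity rises while entropy falls as the NoC grows. In the extreme case of one cluster per point, purity reaches 1 and entropy reaches 0 regardless of how meaningful the grouping is.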

The authors would like to thank Universiti Tun Hussein Onn Malaysia (UTHM) and the Ministry of Higher Education (MOHE) Malaysia for financially supporting this research under the Fundamental Research Grant Scheme (FRGS), Vote No. 1235.

