Multilevel Techniques for the Clustering Problem

Authors

Noureddine Bouhmala, Vestfold University College, Norway

Abstract

Data mining is concerned with the discovery of interesting patterns and knowledge in data repositories. Cluster analysis, one of the core methods of data mining, is the process of discovering homogeneous groups called clusters. Given a data set and some measure of similarity between data objects, the goal of most clustering algorithms is to maximize both the homogeneity within each cluster and the heterogeneity between different clusters. In this work, two multilevel algorithms for the clustering problem are introduced. The multilevel paradigm treats the clustering problem as a hierarchical optimization process that passes through several levels, moving from a coarse-grained to a fine-grained strategy. The problem is first reduced, level by level, to a coarser problem on which an initial clustering is computed. This clustering is then mapped back level by level, and the intermediate clusterings obtained at the various levels are refined to yield a better clustering of the original problem. A benchmark of data sets collected from a variety of domains is used to compare the effectiveness of the hierarchical approach against its single-level counterpart.
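The coarsen, cluster, project, and refine cycle described above can be illustrated with a small sketch. The abstract does not say how points are merged or how each level is refined, so everything here is an assumption: coarsening greedily pairs each point with its nearest unmatched neighbour and replaces the pair by its weighted centroid, the coarsest level is clustered with weighted K-means from random seeds, and each finer level inherits its coarse labels and is refined with further weighted K-means (Lloyd) iterations. The names multilevel_kmeans, coarsen, refine, within_cluster_sse and the min_size threshold are hypothetical. This is not the author's algorithm, and it does not cover the genetic-algorithm variant suggested by the keywords.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coarsen(points: List[Point], weights: List[float]):
    """One coarsening step (assumed scheme): greedily pair each unmatched point
    with its nearest unmatched neighbour and replace the pair by its weighted
    centroid. Returns (coarse_points, coarse_weights, fine_to_coarse)."""
    n = len(points)
    matched = [False] * n
    fine_to_coarse = [-1] * n
    coarse_pts: List[Point] = []
    coarse_w: List[float] = []
    for i in range(n):
        if matched[i]:
            continue
        matched[i] = True
        best, best_d = -1, float("inf")
        for j in range(n):
            if not matched[j]:
                d = sq_dist(points[i], points[j])
                if d < best_d:
                    best, best_d = j, d
        members = [i]
        if best >= 0:  # a point left without a partner is carried over alone
            matched[best] = True
            members.append(best)
        w = sum(weights[m] for m in members)
        dim = len(points[i])
        centroid = tuple(sum(weights[m] * points[m][t] for m in members) / w for t in range(dim))
        for m in members:
            fine_to_coarse[m] = len(coarse_pts)
        coarse_pts.append(centroid)
        coarse_w.append(w)
    return coarse_pts, coarse_w, fine_to_coarse


def refine(points, weights, labels, k, iters=10):
    """Weighted Lloyd (K-means) iterations starting from the given labels."""
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w, lab in zip(points, weights, labels):
            totals[lab] += w
            for t in range(dim):
                sums[lab][t] += w * p[t]
        # Centroids of the non-empty clusters only.
        centers = {c: tuple(s / totals[c] for s in sums[c]) for c in range(k) if totals[c] > 0}
        new_labels = [min(centers, key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def multilevel_kmeans(points, k, min_size=None, iters=10, seed=0):
    """Coarsen until at most min_size points remain, cluster the coarsest level,
    then project the labels back level by level, refining at each level."""
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= number of points")
    # Never stop coarsening below 2k points, so the coarsest level keeps >= k points.
    min_size = max(min_size or 0, 2 * k)
    levels = [(list(points), [1.0] * len(points))]
    maps = []
    while len(levels[-1][0]) > min_size:
        cp, cw, m = coarsen(*levels[-1])
        levels.append((cp, cw))
        maps.append(m)
    # Initial clustering of the coarsest problem from k random seed points.
    pts, w = levels[-1]
    seeds = random.Random(seed).sample(range(len(pts)), k)
    labels = [min(range(k), key=lambda c: sq_dist(p, pts[seeds[c]])) for p in pts]
    labels = refine(pts, w, labels, k, iters)
    # Uncoarsening: each fine point inherits its coarse label, then is refined.
    for lvl in range(len(maps) - 1, -1, -1):
        fine_pts, fine_w = levels[lvl]
        labels = [labels[c] for c in maps[lvl]]
        labels = refine(fine_pts, fine_w, labels, k, iters)
    return labels


def within_cluster_sse(points, labels, k):
    """Sum of squared distances to each cluster centroid (lower = more homogeneous)."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            centroid = [sum(col) / len(members) for col in zip(*members)]
            total += sum(sq_dist(p, centroid) for p in members)
    return total
```

For example, on two well-separated groups of five 2-D points each, multilevel_kmeans(points, 2) coarsens 10 points to 5 and then to 3, clusters the 3 coarse points, and the refinement during uncoarsening recovers the two groups even though the first coarsening step merged one point from each group. Within-cluster SSE is used here only as a simple stand-in for the "homogeneity within each cluster" criterion; the paper's actual objective is not stated in the abstract.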

Keywords

Clustering Problem, Genetic Algorithm, Multilevel Paradigm, K-Means.

Volume 4, Number 2