REDCLAN - Relative Density Based Clustering and Anomaly Detection

Authors

Diptarka Saha¹, Debanjana Banerjee¹ and Bodhisattwa Prasad Majumder², ¹WalmartLabs, India and ²University of California, USA

Abstract

Cluster analysis and anomaly detection are primary methods for database mining. However, most of today's data, generated from multifarious sources, does not adhere to the assumption of a single or even a known distribution. Finding clusters in such data is arduous, because the clusters have widely differing sizes, densities and shapes and are accompanied by noise and outliers. We therefore propose a relative-KNN-kernel density-based clustering algorithm. The unclustered (noise) points are further classified as anomalous or non-anomalous using a weighted rank-based anomaly detection method. The method works particularly well when the clusters differ in variability and shape: in these cases our algorithm finds not only the “dense” clusters that other clustering algorithms find, but also the low-density clusters that these approaches fail to identify. This more accurate clustering in turn reduces the number of noise points and makes the anomaly detection more accurate.

Keywords

Clustering, Relative KNN-kernel density, Varying density clusters, Anomaly Detection, DBSCAN

CS&IT Conference Proceedings
