Integrating Naive Bayes and K-Means Clustering with Different Initial Centroid Selection Methods in the Diagnosis of Heart Disease Patients

Authors

Mai Shouman, Tim Turner and Rob Stocker, University of New South Wales at the Australian Defence Force Academy, Australia

Abstract

Heart disease has been the leading cause of death worldwide over the past 10 years. Researchers have used several data mining techniques to help health care professionals diagnose heart disease. Naïve Bayes is one such technique and has shown considerable success in this task. K-means clustering is one of the most popular clustering techniques; however, its results are strongly affected by the choice of initial centroids. This paper demonstrates the effectiveness of an unsupervised learning technique, K-means clustering, in improving a supervised learning technique, Naïve Bayes. It investigates integrating K-means clustering with Naïve Bayes in the diagnosis of heart disease patients. It also investigates different methods of initial centroid selection for K-means clustering, namely the range, inlier, outlier, random attribute values, and random row methods. The results show that integrating K-means clustering with Naïve Bayes, using different initial centroid selection methods, can enhance the accuracy of Naïve Bayes in diagnosing heart disease patients. They also show that the random row initial centroid selection method with two clusters can achieve higher accuracy than the other initial centroid selection methods, reaching 84.5%.
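
To make the clustering step concrete, the following is a minimal Python sketch of K-means with two clusters and a "random row" initialization, interpreted here as picking existing data rows at random to serve as the initial centroids. This interpretation is an assumption, since the abstract does not define the method. The function names and the toy data are hypothetical, and the sketch does not cover how the resulting clusters are combined with Naïve Bayes.

```python
import random


def random_row_centroids(rows, k, rng):
    # Assumed reading of "random row": use k distinct data rows as centroids.
    return [list(row) for row in rng.sample(rows, k)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(rows, centroids):
    # Label each row with the index of its nearest centroid.
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))
        for row in rows
    ]


def kmeans(rows, k, rng, max_iter=100):
    centroids = random_row_centroids(rows, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = assign(rows, centroids)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up two-feature rows, for illustration only (not patient data).
rows = [[1.0, 1.0], [2.0, 2.0], [1.5, 1.5], [9.0, 9.0], [10.0, 10.0], [9.5, 9.5]]
labels, centroids = kmeans(rows, k=2, rng=random.Random(0))
print(labels, centroids)
```

Because the initial centroids are drawn at random, different seeds can produce different cluster numbering and, on less clearly separated data, different clusterings, which is the sensitivity to initial centroid selection that the abstract refers to.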

Keywords

Data Mining, Naïve Bayes, K-Means Clustering, Initial Centroid Selection Methods, Heart Disease Diagnosis.

Full Text  Volume 2, Number 5