Towards Reduction of Data Flow in a Distributed Network Using Principal Component Analysis

Authors

Devadatta Sinha (Calcutta University, India) and Anal Acharya (St. Xavier's College, India)

Abstract

Distributed data mining can be performed in two ways. In the first, data from several sources are copied to a data warehouse and the mining algorithms are applied there. In the second, mining is performed at the local sites and the results are aggregated. When the number of features is high, transferring datasets to a centralized location consumes a lot of bandwidth. To reduce this cost, dimensionality reduction can be performed at the local sites: an encoding is applied to the data to obtain a compressed form. The reduced features obtained at the local sites are then aggregated, and data mining algorithms are applied to them. There are several methods of dimensionality reduction; two of the most important are the Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA). This paper presents a detailed study of how PCA can be used to reduce data flow across a distributed network.
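The idea of reducing features at a local site before transfer can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the algorithm details, so the function names (`local_pca`, `reconstruct`), the synthetic six-feature dataset, the choice of k = 2, and the use of power iteration with deflation to find eigenvectors are all assumptions made for illustration. The sketch counts how many numbers a site would send if it transmitted the PCA scores, the component vectors, and the mean instead of the raw rows.

```python
import math
import random


def local_pca(rows, k, iterations=200):
    """Return (mean, components, scores) for the top-k principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _component in range(k):
        # Power iteration from a fixed unit start vector; this assumes the
        # start vector is not orthogonal to the dominant eigenvector.
        start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(d)))
        v = [(i + 1) / start_norm for i in range(d)]
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass
        # converges to the next principal component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    # Project each centered row onto the k components.
    scores = [[sum(c[j] * row[j] for j in range(d)) for c in components]
              for row in centered]
    return mean, components, scores


def reconstruct(mean, components, scores):
    """Approximate the original rows at the central site."""
    d = len(mean)
    return [[mean[j] + sum(s * c[j] for s, c in zip(score, components))
             for j in range(d)]
            for score in scores]


# Hypothetical local dataset: 6 features driven by 2 latent factors plus noise.
rng = random.Random(0)
rows = []
for _row in range(200):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    signal = [a, 2 * a, -a, b, 0.5 * b, a + b]
    rows.append([x + rng.gauss(0, 0.01) for x in signal])

k = 2
mean, components, scores = local_pca(rows, k)
approx = reconstruct(mean, components, scores)

raw_values = len(rows) * len(rows[0])                        # full dataset
sent_values = len(scores) * k + k * len(mean) + len(mean)    # scores + basis + mean
max_error = max(abs(x - y)
                for row, est in zip(rows, approx)
                for x, y in zip(row, est))
print(raw_values, sent_values)
```

With 200 rows, 6 features, and k = 2, the site would send 418 values instead of 1200. The saving grows with the number of features, provided a small k captures most of the variance. How scores from several sites with different local bases should be combined is a separate design question that this sketch does not address.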

Keywords

Distributed Data Mining (DDM), Principal Component Analysis (PCA), Eigenvector, Dimensionality Reduction.

Volume 3, Number 2