High dimensional gene expression data dimension reduction

Shi Chao, Chen Lihui, in Proceedings of IEEE Conference on Cybernetics and Intelligent Systems (CIS), Dec 2004.

Abstract

Gene expression data analysis is a new approach to cancer diagnosis. Feature selection is an important preprocessing step in gene expression data clustering. In this paper, we demonstrate the effectiveness of a feature grouping approach to feature dimension reduction. In our proposed framework, a large number of features are grouped to form several feature subsets. Using clustering accuracy as the criterion, one feature subset is chosen as the candidate subset for further processing by PCA or entropy ranking, and the final feature subset is formed by selecting the top-ranked features. The advantage of the framework is that it considers the discrimination power of both feature subsets and individual features, and it requires little information about the class labels. A prototype of the proposed framework has been implemented and tested on the leukemia data set. The results support the framework.
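
The pipeline described in the abstract (group features, pick one group by clustering accuracy, rank within it, keep the top features) can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: features are grouped by k-means on their standardized profiles, samples fall into two classes, ranking uses the absolute loadings on the first principal component (the entropy ranking variant is not shown), and all function names and parameters (`kmeans`, `two_class_accuracy`, `pca_rank`, `select_features`, `n_groups`, `n_keep`) are hypothetical. The data is synthetic, not the leukemia data set.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def two_class_accuracy(pred, truth):
    """Clustering accuracy for two classes, up to swapping cluster names."""
    agree = np.mean(pred == truth)
    return max(agree, 1.0 - agree)

def pca_rank(X):
    """Order features by absolute loading on the first principal component."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return np.argsort(-np.abs(vt[0]))

def select_features(X, y, n_groups=4, n_keep=5, seed=0):
    """Return (selected feature indices, accuracy of the chosen group)."""
    # Standardize each feature (column) so groups are scale-independent.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Assumption: feature groups come from k-means on feature profiles.
    groups = kmeans(Z.T, n_groups, seed=seed)
    best_acc, best_idx = -1.0, None
    for g in range(n_groups):
        idx = np.flatnonzero(groups == g)
        if idx.size == 0:
            continue  # k-means may leave a group empty
        # Class labels are used only here, to score each candidate group.
        acc = two_class_accuracy(kmeans(Z[:, idx], 2, seed=seed), y)
        if acc > best_acc:
            best_acc, best_idx = acc, idx
    order = pca_rank(Z[:, best_idx])
    return best_idx[order[:n_keep]], best_acc

# Synthetic stand-in data: 40 samples, 30 features, first 6 informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 30))
X[:, :6] += 3.0 * y[:, None]

selected, accuracy = select_features(X, y)
print("selected features:", selected)
print("clustering accuracy of chosen group:", accuracy)
```

In this sketch the class labels enter only when scoring candidate groups, which is one way to read the abstract's remark that the framework needs little class label information. Grouping and ranking are both unsupervised.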