Abstract
Gene microarray data carry genetic information that offers a new approach to the
prediction and diagnosis of disease. However, their high dimensionality, high noise,
high redundancy, and small sample size pose challenges to traditional pattern
recognition methods. Feature selection has therefore become a focus of this research
field, and researchers urgently need to design new pattern recognition methods to
analyze and process this kind of data.
We studied feature selection for gene microarray data and propose a hybrid feature
selection algorithm based on clustering and intelligent optimization. First, we apply a
filter method to the data to remove noisy information. Second, we cluster the genes
with the affinity propagation (AP) clustering algorithm, using a correlation measure to
build the AP similarity matrix. We then remove redundancy from the clustering results
with a redundancy reduction algorithm and combine the remaining genes into the gene
subset space. Third, we select the optimized feature genes from the gene subset space
with a wrapper method.
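
The clustering step can be illustrated with a minimal sketch, which is not the implementation evaluated in this work. It assumes the absolute Pearson correlation as the correlation measure, the median off-diagonal similarity as the AP preference, and each cluster's exemplar as its representative gene; none of these choices is specified above. The function names and the toy expression matrix (rows are genes, columns are samples) are hypothetical, and the filter and wrapper stages are omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def similarity_matrix(genes):
    """Assumed similarity: |Pearson r|; diagonal (preference) = median off-diagonal value."""
    n = len(genes)
    s = [[abs(pearson(genes[i], genes[k])) for k in range(n)] for i in range(n)]
    off = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
    pref = off[len(off) // 2]
    for i in range(n):
        s[i][i] = pref
    return s

def affinity_propagation(s, damping=0.9, iters=200):
    """Standard AP message passing; returns (exemplars, labels). Needs at least 2 items."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            first = max(range(n), key=lambda k: vals[k])
            max1 = vals[first]
            max2 = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                comp = max2 if k == first else max1
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - comp)
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # fallback if no exemplar emerged: take the strongest candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels

# Toy data (hypothetical): genes 0-2 follow one pattern, genes 3-5 another.
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.3, 2.9, 4.2, 5.1, 5.8],
    [0.9, 1.8, 3.2, 3.9, 4.7, 6.3],
    [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    [5.2, 0.8, 4.1, 2.3, 5.7, 3.1],
    [4.9, 1.2, 3.8, 1.9, 6.2, 2.8],
]
exemplars, labels = affinity_propagation(similarity_matrix(genes))
print("exemplar genes:", exemplars)
print("cluster label per gene:", labels)
```

Keeping only the exemplars is one simple way to reduce redundancy within clusters. The resulting candidate set would then be passed to the wrapper stage for the final search.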
To validate the algorithm, we applied it to six public microarray data sets to
demonstrate its improved performance. During validation, we explored how to organize
the components of the system, how to choose the similarity measure, and how to extract
representative genes from the clustering results. We also compared the results with
those of several commonly used methods, which shows the effectiveness of the proposed
hybrid feature selection algorithm.
Keywords: Hybrid Feature Selection; Microarray Data; AP Clustering;
Particle Swarm Optimization; Similarity Measurement