IUP Publications Online
The IUP Journal of Information Technology
Particle Swarm Optimization for Feature Selection: A Study on Microarray Data Classification

DNA microarray technology allows the expression levels of thousands of genes to be monitored and measured simultaneously in a single experiment. Data mining techniques such as classification are widely used on microarray data for medical diagnosis and gene analysis. However, the high dimensionality of the data degrades the performance of classification and prediction. Consequently, a key issue in microarray data analysis is feature selection and dimensionality reduction to achieve better classification and predictive accuracy. Several machine learning approaches are available for feature selection. In this study, the particle swarm optimization technique was used for feature selection, and the classification performance of several popular classifiers was analyzed on a set of microarray datasets. The results indicate that the particle swarm optimization technique provides better results than the genetic algorithm.

 
 

Microarray technology has attracted research attention in recent years. It is a promising tool for simultaneously monitoring and measuring the expression levels of thousands of genes of an organism in a single experiment. A microarray is a glass slide that contains thousands of spots. Each spot may contain a few million copies of identical DNA molecules that uniquely correspond to a gene. Microarray technology is commonly used in medical diagnosis and genetic analysis. For example, genome-wide expression data from cancerous tissues helps in cancer diagnosis and classification (Guyon et al., 2002; and Wahde and Szallasi, 2006). Machine learning techniques have been successfully applied to microarray data for such diagnostic tasks, which involve classification and clustering. A significant number of new discoveries have been made through microarray data analysis.

However, the analysis remains a great challenge to researchers because microarray data is inherently noisy and high dimensional. Natural biological fluctuations cause variations in measurements, which are reflected in the microarray data and have implications for the analysis. Further, a microarray experiment involves complex scientific procedures, materials and instruments. Errors may also be introduced by imperfections and limitations of the instruments, impurities in the materials and human negligence. Microarray data containing thousands of genes is also likely to include many irrelevant and redundant features. Thus, microarray data usually suffers from the curse of dimensionality and poses a hurdle for machine learning algorithms.
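
As a rough illustration of how particle swarm optimization can be used to select features from such high-dimensional data, the sketch below implements a small binary PSO in plain Python. It is not the procedure used in the study: the text here does not state the PSO variant, the fitness function, the parameter values or the classifiers. Everything in the code is therefore an assumption made for illustration, including the function names (`make_toy_data`, `loo_centroid_accuracy`, `binary_pso`), the sigmoid-based binary position update, the nearest-centroid classifier scored by leave-one-out accuracy, the penalty on the number of selected features, and the synthetic data that stands in for a microarray dataset.

```python
import math
import random


def make_toy_data(n_samples=20, n_features=12, n_informative=3, seed=1):
    """Synthetic two-class data (assumption): only the first columns are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


def loo_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    sums, counts = {}, {}
    for row, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(cols))
        for c, j in enumerate(cols):
            s[c] += row[j]
        counts[label] = counts.get(label, 0) + 1
    correct = 0
    for row, label in zip(X, y):
        best_label, best_dist = None, math.inf
        for other, s in sums.items():
            own = other == label
            n = counts[other] - (1 if own else 0)  # hold out the current sample
            if n == 0:
                continue
            dist = 0.0
            for c, j in enumerate(cols):
                total = s[c] - (row[j] if own else 0.0)
                dist += (row[j] - total / n) ** 2
            if dist < best_dist:
                best_label, best_dist = other, dist
        correct += best_label == label
    return correct / len(X)


def binary_pso(X, y, n_particles=10, n_iters=15, w=0.7, c1=1.5, c2=1.5,
               v_max=4.0, penalty=0.1, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(mask):
        # Reward accuracy, lightly penalize large feature subsets (assumed trade-off).
        return loo_centroid_accuracy(X, y, mask) - penalty * sum(mask) / n_features

    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


X, y = make_toy_data()
mask, score = binary_pso(X, y)
print("selected feature indices:", [j for j, bit in enumerate(mask) if bit])
print("penalized fitness:", round(score, 3))
```

In this sketch each particle encodes a candidate gene subset as a bit mask, so the swarm searches over subsets rather than ranking genes one at a time. On real microarray data the fitness would normally be the cross-validated accuracy of whichever classifier is being evaluated, and the penalty weight would need tuning.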

 
 

Information Technology Journal, PSO, Microarray data classification, Feature selection, Microarray data analysis, Evolutionary computation.