Microarray technology has attracted research attention in recent years. It is a promising
tool to simultaneously monitor and measure the expression levels of thousands of genes
of an organism in a single experiment. Basically, a microarray is a glass slide that
contains thousands of spots. Each spot may contain a few million copies of identical
DNA molecules that uniquely correspond to a gene. Microarray technology is commonly
used in medical diagnosis and genetic analysis. For example, genome-wide expression
data from cancerous tissues helps in cancer diagnosis and classification (Guyon et al.,
2002; Wahde and Szallasi, 2006). Machine learning techniques have been
successfully applied to microarray data for such diagnosis, which involves classification
and clustering. A significant number of new discoveries have been made from
microarray data analysis.
However, analysing microarray data remains a great challenge for researchers because
the data is inherently noisy and high dimensional. Natural biological fluctuations
cause variations in measurements, and these variations affect the analysis. Further,
a microarray experiment involves complex scientific procedures, materials and
instruments. Errors may be introduced by the imperfections and limitations of
instruments, impurities in materials and human negligence. Microarray data, which
covers thousands of genes, is also likely to include many irrelevant and redundant
features. Thus, microarray data usually suffers from the curse of dimensionality and
poses a hurdle for machine learning algorithms.
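
To make the notions of irrelevant and redundant features concrete, the following is a minimal sketch in Python, not the method of any work cited here. It assumes two crude proxies: a gene whose expression is nearly constant across samples is treated as uninformative, and a gene that is almost perfectly correlated with an already retained gene is treated as redundant. The function names (`pearson`, `filter_genes`), the thresholds and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_genes(samples, min_variance=0.01, max_abs_corr=0.95):
    """Return the indices of genes kept after two simple filters.

    samples: list of rows, one row per sample, one column per gene.
    Thresholds are illustrative assumptions, not recommended values.
    """
    n_genes = len(samples[0])
    # Transpose so that each entry holds one gene's values across all samples.
    columns = [[row[g] for row in samples] for g in range(n_genes)]
    kept = []
    for g, col in enumerate(columns):
        if pvariance(col) < min_variance:
            continue  # nearly constant across samples: treated as uninformative
        if any(abs(pearson(col, columns[k])) > max_abs_corr for k in kept):
            continue  # almost a copy of a gene already kept: treated as redundant
        kept.append(g)
    return kept


# Toy data (hypothetical): 4 samples, 4 genes.
# Gene 1 is exactly twice gene 0; gene 2 is constant.
samples = [
    [1.0, 2.0, 5.0, 0.3],
    [2.0, 4.0, 5.0, 0.9],
    [3.0, 6.0, 5.0, 0.1],
    [4.0, 8.0, 5.0, 0.7],
]
kept = filter_genes(samples)
print(kept)
```

On this toy input the sketch keeps genes 0 and 3. Real microarray data has far more genes than samples, so filters of this kind are at best a first step before a proper feature selection method.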