It has been observed that scientifically processed data gives rise to good
statistical conclusions. During the preparation of samples, it is possible that the variability
depends on the labeling procedure due to various physical characteristics of the samples.
The process of data cleaning transforms the original dataset by performing tasks
such as removing errors, adjusting outliers, estimating missing values, encoding
categorical variables and standardizing variables. The Transformation (T) of data into a
scale suitable for analysis is to remove as far as possible the effects of systematic sources
of variation. The process of minimizing the effects of systematic sources of variation
is referred to as `Normalization' (N). The sole purpose of Normalization of the data is
to ensure that the variation in the expression values is indeed due to biological
differences and not due to experimental artifacts (noises).
Transforming the data can sometimes help promote an additive structure by
removing interaction effects between the model and error and stabilizing the error variance.
In fact, the assumption of equal variances in the Analysis of Variance (ANOVA)
model corresponds to an assessment of the spread versus level plot.
Essentially, the purpose of T, Standardization and N are to make the data
more accountable for computational aspects and its interpretations. Some common
reasons for using either T, S or N are: (i) There is a spread-level effect across batches
of samples, (ii) The distribution of a variable is strongly skewed, (iii) The
residuals from a fitted model exhibit a systematic pattern and (iv) The data
does not satisfy the assumptions of a statistical procedure. The main difficulty that arises in these
situations is the presence of non-linearity which can substantially increase the complexity
of the statistical analysis. Another problem which can crop up is `multicollinearity'
due to the dependant structure of the observed random variables. By applying a
nonlinear T to the data, we may be able to alleviate these problems to a great extent
and produce meaningful analysis. |