The IUP Journal of Operations Management
A Note on Transformation, Standardization and Normalization

This paper discusses, from a practical point of view, the importance of data cleaning and processing using statistical techniques such as Transformation (T), Standardization (S) and Normalization (N). Unprocessed raw data may lead to poor interpretation because of the effects of systematic variation. A suitable transformation, standardization or normalization can nullify the effect of such variation. This is illustrated using a couple of examples from the literature.

 
 

It has been observed that scientifically processed data gives rise to sound statistical conclusions. During sample preparation, the variability may depend on the labeling procedure because of various physical characteristics of the samples. The process of data cleaning transforms the original dataset by performing tasks such as removing errors, adjusting outliers, estimating missing values, encoding categorical variables and standardizing variables. The purpose of Transformation (T), that is, re-expressing the data on a scale suitable for analysis, is to remove as far as possible the effects of systematic sources of variation. The process of minimizing the effects of systematic sources of variation is referred to as 'Normalization' (N). The sole purpose of normalizing the data is to ensure that the variation in the expression values is indeed due to biological differences and not due to experimental artifacts (noise).
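As a rough illustration of the terms above, the following Python sketch shows one common reading of Standardization (z-scores) and one simple form of Normalization (centering each batch on its median so that a batch-level offset is removed). The note itself does not give formulas, so these definitions, the function names and the sample values are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def standardize(values):
    """Z-scores: subtract the mean, divide by the population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


def median_center(batches):
    """Shift every batch so that its median is zero.

    This removes a constant batch offset, one simple kind of systematic
    variation; it is an assumed stand-in for the normalization discussed
    in the text, not the paper's own procedure.
    """
    centered = {}
    for name, values in batches.items():
        batch_median = median(values)
        centered[name] = [v - batch_median for v in values]
    return centered


# Hypothetical measurements: batch_b carries a constant offset of +10.
batches = {
    "batch_a": [2.0, 4.0, 6.0],
    "batch_b": [12.0, 14.0, 16.0],
}
centered = median_center(batches)
z_scores = standardize([2.0, 4.0, 6.0, 8.0])
```

After centering, the two hypothetical batches become directly comparable, and the standardized variable has mean zero and unit standard deviation.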

Transforming the data can sometimes help promote an additive structure by removing interaction effects between the model and the error and by stabilizing the error variance. In fact, the assumption of equal variances in the Analysis of Variance (ANOVA) model can be assessed by examining a spread-versus-level plot.
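The spread-versus-level idea can be sketched numerically. In the usual exploratory data analysis recipe (assumed here, since the note does not spell it out), the log of each batch's spread (interquartile range) is regressed on the log of its level (median); a slope b suggests a power transformation with exponent 1 - b, where an exponent of 0 is read as the logarithm. The batches below are hypothetical and constructed so that spread grows in proportion to level.

```python
import math
from statistics import median, quantiles


def spread_level_slope(batches):
    """Least-squares slope of log(IQR) on log(median) across batches.

    Assumes every batch has a positive median and a positive IQR.
    """
    xs, ys = [], []
    for values in batches:
        q1, _, q3 = quantiles(values, n=4)
        xs.append(math.log(median(values)))
        ys.append(math.log(q3 - q1))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical batches: the same shape at three different scales.
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
raw_batches = [[scale * v for v in base] for scale in (1.0, 10.0, 100.0)]

slope_raw = spread_level_slope(raw_batches)
suggested_power = 1.0 - slope_raw  # close to 0 here, i.e. take logarithms

# After a log transformation the spread no longer depends on the level.
log_batches = [[math.log(v) for v in batch] for batch in raw_batches]
slope_log = spread_level_slope(log_batches)
```

For these batches the slope on the raw scale is essentially 1, and after taking logarithms it is essentially 0, which is the flat spread-versus-level pattern that the equal-variance assumption requires.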

Essentially, the purpose of T, S and N is to make the data more amenable to computation and interpretation. Some common reasons for using T, S or N are: (i) there is a spread-level effect across batches of samples, (ii) the distribution of a variable is strongly skewed, (iii) the residuals from a fitted model exhibit a systematic pattern and (iv) the data does not satisfy the assumptions of a statistical procedure. The main difficulty that arises in these situations is the presence of non-linearity, which can substantially increase the complexity of the statistical analysis. Another problem that can crop up is 'multicollinearity', due to the dependence structure of the observed random variables. By applying a nonlinear T to the data, we may be able to alleviate these problems to a great extent and produce a meaningful analysis.
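Reason (ii) above can be illustrated with a small sketch: a strongly right-skewed variable is log-transformed and the moment-based skewness coefficient is compared before and after. The data values are hypothetical, and the logarithm is only one possible nonlinear T (it requires strictly positive values); the note does not prescribe a particular transformation.

```python
import math


def skewness(values):
    """Moment-based skewness coefficient: m3 / m2**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample with one large value in the tail.
raw = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 8.0, 13.0, 40.0]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
```

For this sample the skewness of the log-transformed values is markedly smaller than that of the raw values, so the transformed variable is closer to symmetric.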

 
 

Operations Management Journal, Transformation, Standardization, Normalization, Analysis of Variance Model, ANOVA, Exploratory Data Analysis, Visualization Techniques, National Directorate for Rural Development, NDRD, Symmetric Distributions, Global Normalization, Smoothing.