
The IUP Journal of Operations Management

A Note on Transformation, Standardization and Normalization

Article Details

Pub. Date	:	Feb-May, 2010
Product Name	:	The IUP Journal of Operations Management
Product Type	:	Article
Product Code	:	IJOM81002
Author Name	:	K Muralidharan
Availability	:	YES
Subject/Domain	:	Management
Download Format	:	PDF Format
No. of Pages	:	7

Abstract

This paper discusses, from a practical point of view, the importance of data cleaning and processing with statistical techniques such as Transformation (T), Standardization (S) and Normalization (N). On first inspection, unprocessed raw data can lead to poor interpretation because of systematic sources of variation, and a suitable transformation, standardization or normalization can nullify the effect of such variations. This is illustrated with a couple of examples from the literature.

Description

It has been observed that properly processed data leads to sounder statistical conclusions. During sample preparation, variability may depend on the labeling procedure because of the physical characteristics of the samples. Data cleaning transforms the original dataset through tasks such as removing errors, adjusting outliers, estimating missing values, encoding categorical variables and standardizing variables. Transformation (T) re-expresses the data on a scale suitable for analysis, with the aim of removing, as far as possible, the effects of systematic sources of variation. The process of minimizing the effects of these systematic sources of variation is referred to as 'Normalization' (N). The sole purpose of normalization is to ensure that the variation in the expression values is due to genuine biological differences and not to experimental artifacts (noise).
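The paragraph above names standardization and normalization without giving formulas. The following is a minimal Python sketch, assuming z-score standardization (subtract the mean, divide by the sample standard deviation) and a median-based form of global normalization, which is one common choice and not necessarily the variant used in the article. The function names `standardize` and `global_median_normalize` and the batch values are hypothetical, chosen for illustration only.

```python
import statistics

def standardize(values):
    """Z-score standardization: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]

def global_median_normalize(samples):
    """Assumed form of global normalization: shift every sample so that its
    median equals the median of all the sample medians."""
    medians = [statistics.median(s) for s in samples]
    target = statistics.median(medians)
    return [[v - m + target for v in s] for s, m in zip(samples, medians)]

# Standardized values have mean 0 and sample standard deviation 1
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print([round(x, 3) for x in z])

# Two hypothetical batches of log-scale expression values;
# batch_b sits about one unit higher because of a systematic (batch) effect
batch_a = [5.1, 6.0, 7.2, 8.3]
batch_b = [6.0, 7.1, 8.2, 9.4]
norm_a, norm_b = global_median_normalize([batch_a, batch_b])
print(statistics.median(norm_a), statistics.median(norm_b))  # medians now agree
```

Median centering is used here because the median is less sensitive to outlying values than the mean; a mean-based shift would follow the same pattern.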

Transforming the data can sometimes promote an additive structure, by removing interaction effects between the model and the error, and can stabilize the error variance. In fact, the equal-variance assumption of the Analysis of Variance (ANOVA) model can be assessed with a spread-versus-level plot.
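The spread-versus-level idea can be made concrete. The sketch below uses the standard exploratory-data-analysis rule, which the article does not spell out: for each group, plot log(spread) against log(level), fit a straight line with slope b, and take p = 1 − b as the suggested power transformation, where p = 0 is read as the logarithm. Level is taken as the median and spread as the interquartile range. The groups are hypothetical data constructed so that spread grows in proportion to level, and the function names are illustrative.

```python
import math
import statistics

def spread_and_level(group):
    """Level = median; spread = interquartile range (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(group, n=4)
    return statistics.median(group), q3 - q1

def spread_level_slope(groups):
    """Least-squares slope b of log(spread) regressed on log(level)."""
    pts = [spread_and_level(g) for g in groups]
    xs = [math.log(level) for level, _ in pts]
    ys = [math.log(spread) for _, spread in pts]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical batches whose spread grows in proportion to their level
pattern = [0.8, 0.9, 1.0, 1.1, 1.2]
groups = [[level * v for v in pattern] for level in (10, 50, 200)]

b = spread_level_slope(groups)
p = 1 - b  # suggested power; p close to 0 points to a log transformation
print(round(b, 3), round(p, 3))

# After a log transform, the spreads of the three batches are (nearly) equal,
# i.e. the spread-level effect has been removed
logged = [[math.log(v) for v in g] for g in groups]
print([round(spread_and_level(g)[1], 4) for g in logged])
```

With real data the fitted slope is rarely exact, so p is usually rounded to a convenient value such as 1 (no change), 0.5 (square root), 0 (log) or −1 (reciprocal).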

Essentially, the purpose of T, S and N is to make the data more amenable to computation and interpretation. Some common reasons for using T, S or N are: (i) there is a spread-level effect across batches of samples; (ii) the distribution of a variable is strongly skewed; (iii) the residuals from a fitted model exhibit a systematic pattern; and (iv) the data do not satisfy the assumptions of a statistical procedure. The main difficulty in these situations is non-linearity, which can substantially increase the complexity of the statistical analysis. Another problem that can arise is 'multicollinearity', caused by the dependence structure of the observed random variables. Applying a nonlinear T to the data may alleviate these problems to a great extent and produce a meaningful analysis.
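Reason (ii), a strongly skewed variable, is the easiest case to illustrate. The following is a minimal sketch, assuming the moment coefficient of skewness g1 = m3 / m2^1.5 (population moments) as the measure and a base-2 logarithm as the nonlinear T. The measurements are hypothetical, chosen so that each value doubles the previous one, and the `skewness` helper is illustrative rather than taken from the article.

```python
import math

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements (each value doubles the previous one)
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log2(v) for v in raw]  # nonlinear T: log base 2 gives 0, 1, ..., 6

print(round(skewness(raw), 3))     # positive: long right tail
print(round(skewness(logged), 3))  # equally spaced values are symmetric
```

Because these values grow multiplicatively, the logarithm maps them to equally spaced, symmetric values. Data with other shapes may call for a different power, for example one chosen with the spread-versus-level rule.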

Keywords

Operations Management Journal, Transformation, Standardization, Normalization, Analysis of Variance Model, ANOVA, Exploratory Data Analysis, Visualization Techniques, National Directorate for Rural Development, NDRD, Symmetric Distributions, Global Normalization, Smoothing.