Home About IUP Magazines Journals Books Archives
     
A Guided Tour | Recommend | Links | Subscriber Services | Feedback | Subscribe Online
 
The IUP Journal of Computer Sciences :
A New ISODATA Algorithm
:
:
:
:
:
:
:
:
:
 
 
 
 
 
 
 

A new ISODATA (Iterative Self-Organizing Data Analysis Technique) algorithm that makes use of coefficient of variation in the cluster formation process is developed. The performance of the proposed algorithm is evaluated against the original ISODATA algorithm in terms of cluster validity measures, namely Sum of Squares, Purity and F-measure, using four different data sets. The results of the experimental study support the use of the newly proposed ISODATA algorithm for clustering small as well as large data sets.

 
 
 

Clustering algorithms play a vital role in various branches of data mining, especially in image processing, pattern recognition, codebook optimization, etc. Clustering algorithms can be broadly classified into ‘hierarchical’ and ‘nonhierarchical’ algorithms. Some algorithms that come under the category of ‘nonhierarchical’ clustering are: (i) partitioning algorithms, like k-means of MacQueen (1967), and k-medoids of Kaufman and Rousseeuw (1990); (ii) density-based algorithms, like DBSCAN of Ester et al. (1996) and DENCLUE of Hinneburg and Kein (1998); and (iii) model-based algorithm due to Frawley and Raftery (2002). Most of the partitive algorithms partition the given data set into different non-overlapping clusters by computing the distance between the data objects and the centroid of the clusters, and assigning the data objects to clusters based on the minimum distance. This process is carried out iteratively till the convergence in the cluster centroids is obtained. The choice of the distance measure used and the number of clusters required play a crucial role in determining the overall efficiency of these algorithms. Also, if the clusters obtained in any iteration are very large or too small, these algorithms are not capable of splitting or merging the clusters. Thus it will be useful to have algorithms that partition the data set into non-overlapping clusters and at the same time be able to split or merge the clusters based on the representation of data objects. Such algorithms are referred to as hybrid algorithms.

One of the popular hybrid algorithms that makes use of partitive and merging processes in the formation of clusters is the ISODATA (Iterative Self-Organizing Data Analysis Technique). This algorithm makes use of six predefined parameters in order to split or merge the clusters during any iteration. The process of splitting is done by considering the magnitude of the standard deviation of the attributes of the data objects in the clusters. It is to be noted that the attributes may have the same standard deviation but different mean values. Therefore, it would be meaningful to consider the magnitude of the mean as well as the standard deviation of the attributes for splitting the clusters. One such measure that makes use of the mean as well as standard deviation is the coefficient of variation, which is defined as the ratio of the standard deviation to the mean. Thus, it would be prudent to use coefficient of variation instead of standard deviation to split the clusters. Based on the above observation, in this paper, a new ISODATA algorithm has been proposed that makes use of the magnitude of the coefficient of variation as a measure to split the clusters.

 
 
 

Computer Sciences Journal, ISODATA Algorithm, Iterative Self-Organizing Data Analysis Technique, Coefficient of Variation, Clustering Algorithms, Standard Deviation Vector, Density-based Algorithms, Data Mining, Wolfgang’s Approach.