The performance of the hierarchical clustering methods varied with the
distance measure used: squared Euclidean distance performed best among the
five distances in the majority of cases, followed by city block distance. Among the five methods,
Ward's method performed best, with the lowest average percentage probability
of misclassification, followed by the non-hierarchical k-means method, irrespective of
sample size. Among the distance measures used with the hierarchical clustering
methods, squared Euclidean distance showed the lowest average percentage probability
of misclassification, followed by city block distance.
The summarization of large quantities of multivariate data is increasingly
practiced in various branches of agricultural science. A number of multivariate statistical
techniques, namely cluster analysis, principal components analysis and factor analysis, are
widely used for classification purposes. One of the basic problems faced by plant
breeders is to classify a large number of genotypes/lines into fewer manageable
homogeneous groups/clusters. A large number of clustering methods and dissimilarity
measures are available in the literature for forming homogeneous
groups, so one of the main problems faced by the breeder is to choose a suitable
clustering method and dissimilarity measure from among them.
There is hardly any information available in the literature on the performance of
these clustering methods and dissimilarity measures. Researchers commonly use
UPGMA (Unweighted Pair-Group Method using Arithmetic Averages) and Ward's method,
followed by SLINK (Single Linkage) and CLINK (Complete Linkage), among the existing
clustering methods. According to Blashfield (1976), UPGMA, Ward's method and SLINK account for
three-quarters of the published work that used cluster analysis. The less frequently used
clustering methods, which appear occasionally in applications, are the WPGMA (Weighted
Pair-Group Method using Arithmetic Averages) method, the centroid method and the flexible
method (Sneath and Sokal, 1973). Lin (1982), Ramey and Rosielle (1983), and Wahi and Kher
(1991) promoted the application of clustering techniques to group genotypes
or environments, but the number of clusters obtained from these methods is not
unique because of the unrepresentativeness of the groups obtained through
different clustering procedures. The k-means method requires prior knowledge of the number
of clusters, but in unsupervised classification there is usually
no prior idea of the number of clusters. In contrast, hierarchical clustering
methods do not require prior knowledge of the number of clusters, which is a definite
advantage over the k-means method.
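
The contrast between the two families can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the six two-trait "genotypes", the cut height of 1.0, the function names and the naive k-means initialization are all assumptions chosen for illustration. The hierarchical part uses average linkage (UPGMA) with either squared Euclidean or city block distance and builds the whole merge history without being told how many clusters to form; the k-means part cannot start until k is supplied.

```python
from itertools import combinations


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def city_block(a, b):
    """City block (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def upgma_merges(points, dist):
    """Average-linkage (UPGMA) agglomeration.

    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    No number of clusters is needed to build it.
    """
    def avg(a, b):
        # Mean pairwise dissimilarity between the members of two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


def cut(merges, n_points, height):
    """Clusters left after applying only the merges below `height`.

    Relies on average-linkage merge heights being non-decreasing.
    """
    clusters = {(i,) for i in range(n_points)}
    for a, b, h in merges:
        if h < height:
            clusters -= {a, b}
            clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; k must be chosen before clustering starts."""
    n = len(points)
    # Naive initialization (an assumption of this sketch): k evenly spaced points.
    centroids = [list(points[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_euclidean(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: six genotypes measured on two traits.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

tree_sq = upgma_merges(points, sq_euclidean)   # full hierarchy, no k given
tree_cb = upgma_merges(points, city_block)
groups_sq = cut(tree_sq, len(points), height=1.0)
groups_cb = cut(tree_cb, len(points), height=1.0)
labels = kmeans(points, k=2)                   # k has to be fixed up front

print(groups_sq)
print(groups_cb)
print(labels)
```

In the hierarchical case the number of groups follows from where the merge history is cut, so it can be decided after inspecting the tree, whereas changing k in k-means means rerunning the whole procedure.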