The IUP Journal of Computer Sciences :

Similarity Measures for Real World Data Mining

Article Details

Pub. Date	:	January, 2010
Product Name	:	The IUP Journal of Computer Sciences
Product Type	:	Article
Product Code	:	IJCS41001
Author Name	:	Nagalakshmi H S and Suhasini M
Availability	:	YES
Subject/Domain	:	Management
Download Format	:	PDF Format
No. of Pages	:	7

Abstract

In the real world, data is described not only in terms of crisp, single-valued numeric data but also in terms of multiple valued data, where the number of data values varies from entity to entity. For effective data mining, databases that contain such multiple valued data have to be analyzed and aggregated to derive relevant, meaningful information. Analyzing such data is one of the most important and fundamental bases of cluster analysis. The crux of cluster analysis lies in the design of similarity measures that capture the degree of similarity between entities. In this paper, the authors present two such similarity measures for multiple valued data types and deal with the clustering of such data using real-life examples. The dataset considered covers 50 individuals, their areas of study, and their areas of expertise relevant to their fields of study. The degree of similarity perceived between entities, or between the data and a query, denotes the relevance or alikeness in the given data.

Description

Classification of data plays a key role in decision making and data mining. Clustering is the process of organizing the given data into sensible groupings based on the similarities perceived. This makes cluster analysis one of the most fundamental ways of understanding and learning about the pattern groups or classes present within the data.

Data mining is a concept in which the aggregation of data plays a dominant role, thereby supporting effective decision making about the given data. Cluster analysis captures this aggregation through the similarities deduced in the given data, and so acts as an effective tool for data mining.

Pure Multiple Valued Data (PMVD): This type of data records the multiple characteristics, properties, or attributes possessed by an entity. For example, the topics offered may be data mining, data structures, algorithms study, mathematical foundations, etc.
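As an illustration only, a similarity between two PMVD entities can be sketched by treating each entity's values as an unordered set. The excerpt does not state the authors' own measure, so the sketch below substitutes the standard Jaccard coefficient; the function name and the two sample entities are hypothetical.

```python
def jaccard_similarity(a, b):
    """Share of distinct values common to both entities, from 0.0 to 1.0."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Assumption: two entities with no values at all count as identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical entities whose value lists have different sizes.
topics_x = ["data mining", "data structures", "algorithms study"]
topics_y = ["data structures", "algorithms study",
            "mathematical foundations", "DBMS"]

# Two shared values out of five distinct values overall.
print(jaccard_similarity(topics_x, topics_y))
```

Because the measure works on sets, it is unaffected by the order in which values are recorded and by entities having different numbers of values.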

Priority Based Multiple Valued Data (PBMVD): This type of data records the multiple characteristics, properties, or attributes possessed by an entity, together with the priority levels that the entity assigns to them. For example, areas of interest: data structures, DBMS, programming languages, algorithm study, discrete mathematics.
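For PBMVD, a similarity measure would presumably also need to reflect how highly each entity ranks a shared value. The sketch below is not the authors' measure, which the excerpt does not give; it uses a weighted Jaccard (Ruzicka) similarity under two assumptions made here for illustration: values are listed from highest to lowest priority, and the value at position r (counting from 1) gets weight 1/r. All names and sample data are hypothetical.

```python
def priority_weights(values):
    """Map each value to a weight that falls with its position: 1, 1/2, 1/3, ...

    Assumes the values are distinct and listed from highest to lowest priority.
    """
    return {value: 1.0 / (rank + 1) for rank, value in enumerate(values)}


def weighted_jaccard(a, b):
    """Weighted Jaccard (Ruzicka) similarity of two priority-ordered lists."""
    weights_a, weights_b = priority_weights(a), priority_weights(b)
    keys = set(weights_a) | set(weights_b)
    if not keys:
        # Assumption: two entities with no values at all count as identical.
        return 1.0
    shared = sum(min(weights_a.get(k, 0.0), weights_b.get(k, 0.0)) for k in keys)
    total = sum(max(weights_a.get(k, 0.0), weights_b.get(k, 0.0)) for k in keys)
    return shared / total


# Hypothetical entities: the same two top interests, but in swapped order.
interests_p = ["data structures", "DBMS", "programming languages"]
interests_q = ["DBMS", "data structures", "discrete mathematics"]

print(weighted_jaccard(interests_p, interests_q))
```

Unlike the set-based view of PMVD, swapping the order of two values changes the result here, so two entities that list the same interests with different priorities are no longer treated as identical.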

Clustering is the process of organizing the given data into sensible groupings, based on the similarities perceived. Cluster analysis is the formal study of algorithms and methods for grouping data, and is an unsupervised pattern classification technique. Clustering algorithms aim at finding the structure of the data. The representation of objects is based on a set of measurements, and their relationship with other object clusters can be defined in many ways,

Keywords

Computer Sciences Journal, Data Mining, Pure Multiple Valued Data, PMVD, Data Structures, Programming Languages, Discrete Mathematics, Clustering Algorithms, Computer Networks, Clustering Techniques, Decision Making Process, Hierarchical Clustering Method.