The IUP Journal of Information Technology
An Overview of Data Pre-processing in Web Usage Mining

Web mining is an active and emerging area of research in the Internet era. With globalization, a growing number of organizations depend on the World Wide Web. An enormous amount of data is generated automatically by Web servers and collected in server access logs. Since the information available on the Web is heterogeneous and unstructured, the data must be pre-processed to make it easier to mine for knowledge. Web usage mining is a type of Web mining that involves the automatic discovery of user access patterns from one or more Web servers and referer logs. This article discusses in depth the various steps of pre-processing the data before discovering users' navigation patterns in Web access logs.

To remain globally competitive and sustain a position in the e-market, an organization needs a successful presence on the Web. The World Wide Web is an interesting area for data mining because of the abundance of information it holds. Web-based organizations generate and collect large volumes of data in their day-to-day activities. The majority of this data is generated automatically by Web servers and collected in server access logs in an unstructured format.
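To make the idea of a raw server access log concrete, the following is a minimal sketch of how a single log entry might be turned into structured fields before any mining takes place. It assumes the NCSA Common Log Format, which this article does not specify; the pattern `CLF_PATTERN`, the function `parse_log_line`, and the sample line are hypothetical and serve only as an illustration.

```python
import re
from datetime import datetime

# Assumption: entries follow the NCSA Common Log Format
# (host ident user [time] "request" status size).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the textual fields into typed values.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Made-up sample entry, for illustration only.
sample = '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_log_line(sample)
```

Lines that do not match the assumed format yield `None`, so a caller can count or discard malformed entries rather than fail on them.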

Web mining can be defined as the application of data mining techniques to automatically discover and extract useful information from World Wide Web documents and services, in order to better understand and serve the needs of Web-based applications (Etzioni, 1996). It involves the automatic discovery of patterns from one or more Web servers, which helps organizations determine the value of specific customers, cross-marketing strategies across products, and the effectiveness of promotional campaigns.

 
 
 

Keywords: Internet era, globalization, organizations, World Wide Web, information, e-market, specific customers, competitive, cross marketing strategies, products, promotional campaigns, initial raw data, statistical analysis, association rules, sequential patterns, corporate firewalls, transaction file, mining algorithms.