Over the last decade, the proliferation of information on the World Wide Web (WWW,
or simply the web) has produced a large repository of web documents spread across
many websites. The volume and diversity of these resources have created a need for
semiautomatic mining techniques for the web, giving rise to the term web mining
(Hu et al., 2003).
Every website contains multiple webpages. Every webpage has: 1) contents, which
can take any form (e.g., text, graphics, or multimedia); 2) links from one page to
another; and 3) users accessing the webpage. Based on these three components, web
mining can be categorized as follows.
Mining the contents of webpages is called ‘Content Mining’. Mining the links
between webpages is called ‘Structure Mining’. Mining the web access logs is called
‘Web Usage Mining’. Figure 1 describes the categorization of web mining.
Web servers typically maintain detailed log files of user interactions. Whenever a
request for a resource is received, the web server records it in the log file according
to the format specified by the server administrator. The Apache web server is one
example.
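As a minimal sketch of what one such log record might look like and how it could be
read for usage mining, the following Python fragment parses a single line. It assumes
the Apache Common Log Format (host, identity, user, timestamp, request line, status,
response size); the text above does not state which format is configured, and an
administrator may choose a different one. The names CLF_PATTERN, parse_clf_line, and
SAMPLE_LINE are hypothetical, and the sample line is illustrative only.

```python
import re
from datetime import datetime

# Assumption: Apache Common Log Format, i.e.
#   host ident user [timestamp] "request line" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamp looks like 10/Oct/2000:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request line is normally "METHOD path protocol".
    parts = record["request"].split()
    record["method"] = parts[0] if len(parts) >= 2 else None
    record["path"] = parts[1] if len(parts) >= 2 else None
    return record


# Illustrative log line, not taken from any real server.
SAMPLE_LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)

if __name__ == "__main__":
    print(parse_clf_line(SAMPLE_LINE))
```

Each parsed record exposes the requesting host, the time of access, and the requested
path, which are the basic fields a usage-mining step would aggregate. Lines that do
not match the assumed format are returned as None rather than raising an error, so a
differently configured log can be detected early.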