Computer Sciences Journal | Detection of Unsolicited E-mails and Summarization by Keyword Extraction

The IUP Journal of Computer Sciences

Detection of Unsolicited E-mails and Summarization by Keyword Extraction

Article Details

Pub. Date	:	January, 2010
Product Name	:	The IUP Journal of Computer Sciences
Product Type	:	Article
Product Code	:	IJCS11001
Author Name	:	Shanmugasundaram Hariharan
Availability	:	YES
Subject/Domain	:	Management
Download Format	:	PDF Format
No. of Pages	:	9

Abstract

Electronic mail (e-mail) is a popular mode of communication in everyday life. E-mail offers several advantages, such as speed of delivery, low cost, acknowledgment reports, transparent service and a distributed environment. Managing e-mail requires considerable attention because spammers try to inject large amounts of spam, or unsolicited mail. This paper focuses on detecting these unwanted messages, called 'unsolicited mails', received by the user. The mail messages are passed through a filter that identifies spam immediately and generates an alert. These mails are then clustered. The results obtained were promising and provide a platform for further improvements toward building a domain-independent personalizer system. A detailed mechanism to summarize the mail contents is also proposed.

Description

Electronic mail has become a part of everyday life. Personal computer users rely on e-mail to communicate quickly and efficiently with friends, family and colleagues. E-mail acts as an intermediary between communicating clients and, with the help of the underlying network infrastructure, provides secure communication and delivery of content. It has become a vital tool, especially for educationalists, researchers and scientists. The emergence of social network groups and commercial sites has prompted researchers to focus on key issues such as spam or junk mail, information overload and the secure transmission of content. The volume of e-mail that users receive is constantly growing, and the rate of unsolicited (spam) e-mail is also increasing rapidly, driven in part by commercial marketers.

In the past few years, Internet technology has radically changed the way we communicate daily. E-mail makes it possible to communicate with many people simultaneously, easily and cheaply. However, users receive many e-mails that they did not ask for. Spam mail (also called junk mail or bulk mail) is the general name for this type of e-mail. Spam mails are defined as electronic messages posted to thousands of recipients, usually for advertisement or profit. The volume of spam grows day by day, so it has to be dealt with promptly. About 10% of the e-mails arriving at the network were found to be spam.

Though many methods are available to address the above-mentioned issues, none is satisfactory. These methods fall roughly into two broad categories: static methods and dynamic methods. Static methods identify spam mails through user feedback: once a message has been reported, the server treats further mails of that kind as spam. Some servers even collect the addresses reported as belonging to spammers (people who send spam messages) and treat all e-mails coming from them as spam. Dynamic methods are those which attempt to identify the category of mails automatically. This paper focuses solely on detecting such spam mails.
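The static approach described above can be illustrated with a minimal Python sketch. It is not the method used in the paper: the class name `StaticSpamFilter`, the `report_threshold` parameter and the example addresses are all hypothetical, chosen only to show how a server might collect reported sender addresses and treat later mail from them as spam.

```python
from dataclasses import dataclass, field


@dataclass
class StaticSpamFilter:
    """Hypothetical static filter: classifies mail by sender address only."""

    # Number of user reports needed before a sender is blocked (assumed value).
    report_threshold: int = 2
    # Maps a normalized sender address to the number of reports received.
    reports: dict = field(default_factory=dict)

    def report(self, sender: str) -> None:
        """Record one user report that `sender` sent spam."""
        key = sender.strip().lower()
        self.reports[key] = self.reports.get(key, 0) + 1

    def is_spam(self, sender: str) -> bool:
        """Return True once the sender has reached the report threshold."""
        key = sender.strip().lower()
        return self.reports.get(key, 0) >= self.report_threshold


spam_filter = StaticSpamFilter()
spam_filter.report("offers@example.com")
spam_filter.report("Offers@Example.com ")  # same address after normalization

print(spam_filter.is_spam("offers@example.com"))  # True
print(spam_filter.is_spam("friend@example.org"))  # False
```

A filter of this kind never looks at message content, which is why it cannot catch spam from an address that has not yet been reported.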

Though several anti-spam measures are available, they have limited effect, and hence an alternative solution is required. Commercial software to block spam also exists. It is also essential to mine statistics from spam mails, identifying the terms that occur in them, so as to keep the list of spam terms up to date (Cranor and LaMacchia, 1998). Recent research focuses on anti-spam filtering methods using Artificial Neural Networks (ANN) and Bayesian Networks (Levent et al., 2004), managing information overload (David et al., 2007), web log-based mining to detect unwanted mails (Emmanuel, 2007) and effective mining using WordStat (Udoh and Rhoades, 2006).
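As a rough illustration of mining term statistics from spam mails, the following Python sketch counts how often each term occurs across a set of messages already labelled as spam. It is an assumption-laden example rather than the procedure from the paper or the cited works: the function `top_terms`, the small `STOPWORDS` set and the sample messages are all invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "and", "for", "your", "now", "today"}


def top_terms(messages, k=3):
    """Return the k most frequent terms across the given messages."""
    counts = Counter()
    for text in messages:
        # Lowercase the text and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        # Skip stopwords and very short tokens such as "a".
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(k)


# Hypothetical messages that users have already marked as spam.
spam_samples = [
    "Win a free prize now",
    "Free offer: claim your prize today",
    "Limited offer, free shipping",
]

for term, count in top_terms(spam_samples, k=3):
    print(term, count)
```

In this toy data set "free" appears in every message, so it ranks first; re-running the count as new spam is reported is one simple way to keep a list of spam terms current.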

Keywords

Computer Sciences Journal, Electronic Mail, Commercial Sites, Commercial Marketers, Internet Technology, Commercial Softwares, Web Log-Based Mining, Artificial Neural Networks, Open Source Developments, Information Technology, Abstraction Mechanism.