Electronic mail has become a part of everyday life. Personal computer users use
e-mail to communicate quickly and efficiently with friends, family and
colleagues. E-mail systems act as an intermediary between communicating
clients with the help of the underlying network infrastructure, thereby providing
secure communication and delivery of the contents. E-mail has become a
vital tool, especially among educationalists, researchers and scientists.
The emergence of social network groups and commercial sites has
prompted researchers to focus on key issues such as spam or junk mail,
information overload and secure transmission of content. The volume of e-mail that we
receive is constantly growing. The rate of unsolicited (spam) e-mail is also increasing
rapidly, driven in part by commercial marketers.
In the past few years, Internet technology has radically affected our daily
communication style. E-mail technology makes it possible to communicate with many
people simultaneously in a very easy and cheap way. However, users receive many
e-mails that they did not ask for. Spam mail (also called junk mail or bulk mail) is the
general name used to denote these types of e-mail. Spam mails are defined as electronic
messages posted to thousands of recipients, usually for advertisement or profit. The
volume of spam increases day by day, and hence it has to be dealt with promptly. It was
found that about 10% of the incoming e-mails to the network were spam.
Though many methods are available to address the above-mentioned issues, they
are not satisfactory. These methods are roughly grouped into two broad categories:
static methods and dynamic methods. Static methods are those which identify spam
mails through user feedback. When the server encounters such mails, it treats them as
spam. Some servers also collect the addresses of reported spammers (people who
send spam messages) and treat the e-mails coming from them as spam. Dynamic
methods are those which attempt to identify the category of mails automatically. The
paper focuses solely on the detection of such spam mails.
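The static approach described above can be illustrated with a small reported-sender list. This is a minimal sketch only, not the method used in this paper; the class name `ReportedSenderFilter`, its methods and the example addresses are hypothetical and chosen purely for illustration.

```python
class ReportedSenderFilter:
    """Static filter: flags mail whose sender was previously reported by users."""

    def __init__(self):
        # Addresses that users have reported as spammers (hypothetical in-memory store).
        self._reported = set()

    def report(self, sender):
        """Record user feedback that `sender` sends spam."""
        self._reported.add(sender.strip().lower())

    def is_spam(self, sender):
        """Return True if the sender is on the reported list."""
        return sender.strip().lower() in self._reported


spam_filter = ReportedSenderFilter()
spam_filter.report("Offers@Example.com")
known = spam_filter.is_spam("offers@example.com")       # sender was reported
unknown = spam_filter.is_spam("colleague@example.org")  # sender was never reported
```

A list of this kind can only flag senders that have already been reported, so a new spam source passes through until some user reports it.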
Though several anti-spam measures are available, they have limited effect,
and hence an alternative solution is required. Commercial software for blocking spam
also exists. It is also essential to mine the statistics of spam mails,
thereby identifying the terms occurring in the mails so as to keep up to date on spam
terms (Cranor and LaMacchia, 1998). Recent research works focus on anti-spam
filtering methods using Artificial Neural Network (ANN) and Bayesian Network
(Levent et al., 2004), managing information overload (David et al., 2007), web
log-based mining to detect unwanted mails (Emmanuel, 2007) and effective mining
using WordStat (Udoh and Rhoades, 2006).
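As a rough illustration of what mining term statistics from spam mails can involve, the sketch below counts the number of messages in which each term occurs. It is an assumption-laden toy example and not the procedure of any cited work; the function name `term_counts` and the sample messages are hypothetical.

```python
import re
from collections import Counter


def term_counts(messages):
    """Count, for each lowercase term, the number of messages containing it."""
    counts = Counter()
    for text in messages:
        # Use a set so that a term is counted at most once per message.
        terms = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(terms)
    return counts


# Hypothetical spam samples, for illustration only.
spam_samples = [
    "Win a free prize now",
    "Free offer, claim your prize",
    "Limited offer: free trial",
]
top_terms = term_counts(spam_samples).most_common(1)
```

Recomputing such counts as new spam arrives is one simple way to keep a list of frequent spam terms current.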