IJTC201510012-Email With Classification Detection Power


INTERNATIONAL JOURNAL OF TECHNOLOGY AND COMPUTING (IJTC)

ISSN-2455-099X,

Volume 1, Issue 1, OCTOBER 2015.


EMAIL WITH CLASSIFICATION DETECTION POWER

Pallavi a,∗, Manpreet Virk b

a,b Department of CSE, GGS Modern Technology, Kharar

ABSTRACT

This research addresses the classification and filtering of large volumes of email data. Its main
purpose is to reduce the classification error rate and to improve accuracy. Earlier classification
techniques are prone to misclassification; the work presented here modifies the classification
technique to reduce that problem. The result is intended as an enterprise solution for filtering
that improves system performance and gives better results than the earlier algorithm.

Keywords: error rate, Techniques, class labels spam and non-spam.

I. INTRODUCTION

Email filtering is the processing of email to organize it according to specified criteria.
Most often the term refers to the automatic processing of incoming messages, but it is also
used for human intervention that supplements anti-spam techniques. Bayesian spam filtering
is a statistical method of email filtering that uses a naive Bayes classifier to identify spam.
The classifier correlates the use of tokens (typically words, sometimes other things) with
spam and non-spam emails. Bayesian spam filtering is a very powerful technique for dealing
with spam: it can adapt itself to the email needs of individual users, and it gives low
false-positive detection rates that are generally acceptable to users.
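
As a point of reference, the token-based reasoning described above is commonly written with
Bayes' rule. The notation is an assumption introduced here for illustration and does not appear
in the original paper: S denotes spam, H denotes non-spam, and w_1, ..., w_n are the tokens of a
message, treated as conditionally independent given the class (the "naive" assumption).

```latex
P(S \mid w_1,\dots,w_n)
  = \frac{P(S)\prod_{i=1}^{n} P(w_i \mid S)}
         {P(S)\prod_{i=1}^{n} P(w_i \mid S) + P(H)\prod_{i=1}^{n} P(w_i \mid H)}
```

A message is labelled spam when this posterior exceeds a chosen threshold; raising the threshold
trades some missed spam for fewer false positives.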

A. Email Filtering Benefits

Managed Service: Because the service is a managed solution, there are no additional costs for
software or hardware upgrades, Internet bandwidth, or maintenance labor. Removing these
expenses saves money and eases budgeting by eliminating the capital outlay and unexpected
costs that forced upgrades of hardware- or software-based solutions sometimes incur.
Specialized expertise in mail routing, filtering, and blacklists is also provided, allowing
staff to concentrate on the infrastructure.



Improves Efficiency: These services restore staff productivity by eliminating typically 98%
of unwanted email. Current industry estimates indicate that as much as 70% of all email is
unwanted, and employees waste time manually filtering and deleting it. Because a large array
of configuration options is offered, the filtering solution can be tailored to an
organization's needs rather than forcing a change in the way it does business.

Reduces Communications Load: Filtering outside the premises eliminates the need for
infrastructure to handle the messages that are filtered out, saving Internet bandwidth and
server load. This can extend the useful life of assets by deferring capacity issues.

Mitigates Liability: Removing offensive content from the mail stream reduces the chance of
"hostile workplace" lawsuits from employees. Although filtering can never be 100% accurate,
the proactive use of a managed service demonstrates a good-faith attempt to protect workers
to the highest degree that technology permits.

Increases Security: Because filtered email never enters the infrastructure, exposure to
viruses and other "malware" is reduced. Since the service acts as the mail agent, the
organization's own equipment no longer needs to be registered or publicly visible on the
Internet, which reduces the risk of hacking and other malicious activity.

Avoids Investment: Unlike purchased hardware or software solutions, a managed service
requires no investment. The only costs incurred are month-to-month fees, with no
depreciation, no capital outlay, and no maintenance contracts.

Fully Compatible: The services are based on standard Internet protocols and will interoperate
with any new Internet mail infrastructure. This avoids the risk of compatibility issues when
other infrastructure components are replaced or upgraded, unlike software-based solutions
that are supported only with specific mail agents or operating systems.

Improves Reliability: Incoming mail is stored during periods of disruption or local server
unavailability and is re-delivered when service is available again. This avoids needless
instances of mail being returned to the sender because of local problems. Delivering
outgoing mail through the service also avoids server blacklisting caused by dynamic addresses
on cable and DSL networks and by abuse from other subscribers of the same Internet service
provider.

Spammers simply examine the latest anti-spam filtering techniques and find ways to get around
them, usually by changing the message a little. This gave anti-spam developers a new
challenge: to come up with a technique that keeps up with spammers' tactics as they change
over time and that can adapt to the particular organization it is protecting from spam.
There are several email filtering methods.

1) Blacklist: The blacklist is a list-based filter. This spam filtering method attempts to
stop unwanted email by blocking messages from a list of senders. The blacklist contains
records of email addresses. When an incoming message arrives, the spam filter checks whether
its IP or email address is on the blacklist; if it is, the message is considered spam and is
rejected.

2) Whitelist: A whitelist blocks spam using a system almost exactly opposite to that of a
blacklist. An unknown sender's email address is checked against a database; if the sender has
no history of spamming, the message is delivered to the inbox and the sender is added to the
whitelist.

3) Word-based filtering: Word-based filtering is the simplest form of content-based filtering
and is a capable technique for fighting junk email. For example, the filter can be set to stop
all messages containing the word “acbd”. However, spammers often deliberately misspell
keywords to evade word-based filtering, and this is the main problem with this type of
filtering.

4) Bayesian filters: Bayesian filtering is the most advanced content-based technique. It uses
the laws of mathematical probability to decide which messages are legitimate and which are
spam. The filter takes words and phrases found in legitimate mail and adds them to a list.
This method needs a training period before it starts working well.

There are other filtering methods, such as challenge/response systems and collaborative filters.
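
The list-based and word-based methods above can be sketched in a few lines of Python. The
addresses, the blocked word, and the function name classify are hypothetical and chosen only for
illustration, and the whitelist is simplified to a fixed set of trusted senders. The last call
shows the weakness noted for word-based filtering: a deliberately misspelled keyword passes the
check.

```python
# Hypothetical example data; none of these values come from the paper.
BLACKLIST = {"bulk@spam.example"}
WHITELIST = {"colleague@corp.example"}
BLOCKED_WORDS = {"lottery"}


def classify(sender: str, body: str) -> str:
    """Return 'spam' or 'ham' using list-based checks, then a word-based check."""
    sender = sender.lower()
    if sender in BLACKLIST:      # blacklist: reject known bad senders
        return "spam"
    if sender in WHITELIST:      # whitelist: accept known good senders
        return "ham"
    words = set(body.lower().split())
    if words & BLOCKED_WORDS:    # word-based: any blocked word is present
        return "spam"
    return "ham"


print(classify("bulk@spam.example", "hello"))                    # spam
print(classify("stranger@mail.example", "win the lottery now"))  # spam
print(classify("stranger@mail.example", "win the l0ttery now"))  # ham (evaded)
```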

Bayesian spam filtering is the process of using a naive Bayes classifier to identify spam
email. It is based on the principle that most events are dependent and that the probability
of an event occurring in the future can be inferred from previous occurrences of that event.
The same method can be used to classify spam: if some piece of text occurs often in spam but
not in legitimate mail, then it is reasonable to predict that an email containing it is very
likely spam.
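
The principle can be illustrated with a minimal sketch of a naive Bayes filter. This is not the
implementation evaluated in this paper: the training messages, the function names train and
predict, the whitespace tokenizer, and the use of add-one (Laplace) smoothing are all assumptions
made for the example.

```python
import math
from collections import Counter

LABELS = ("ham", "spam")  # "ham" first so that ties fall back to non-spam


def train(messages):
    """Count tokens per class from (text, label) pairs.

    Assumes every label in LABELS occurs at least once in `messages`.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # skip tokens never seen in training
                # Add-one smoothing keeps an unseen (token, class) pair
                # from driving the probability to zero.
                count = word_counts[label][token] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training set, invented for this example.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("win cash prize now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
model = train(TRAINING)
print(predict("buy cheap pills", model))         # spam
print(predict("monday project meeting", model))  # ham
```

Working with log probabilities avoids numeric underflow when many small token probabilities are
multiplied, and the training step is what gives this kind of filter its training period.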

II. LITERATURE REVIEW

Xiaoming Jin, Yuchang Lu et al. (2003): An index structure that enables efficient similarity
queries in high-dimensional space is crucial for many applications. This paper discusses the
indexing problem in datasets composed of partially clustered data, which occur in a number of
applications. Existing index methods are inefficient with partially clustered datasets. The
dynamic and adaptive index structure presented there, called a multi-cluster tree (MC-tree),
consists of a set of height-balanced trees for indexing. This index structure improves
querying efficiency in three ways:

1) Most bounding regions achieve uniform distributions, which results in fewer splits and less
overlap compared with a single indexing tree.
2) The clusters in the dataset are detected dynamically when the index is updated.
3) The query process does not involve a sequential scan.

The MC-tree was shown to perform better than hierarchical and cluster-based indexes on
partially clustered datasets.

The paper presents an index structure for partially clustered datasets, which constitute a
large portion of the data stored in current information systems. The goal was to make the
index respond efficiently to both clustered and uniform data in one database and to perform
queries on it without losing precision and recall. The index structure improves query
efficiency in the following ways:

1) Only the non-clustered data is indexed in the main index, which ensures that the main index
has fewer overlaps and splits than a single indexing tree.
2) The clusters in the dataset are detected dynamically when the index is updated, which keeps
the index adaptive and prevents its performance from degrading.
3) During the query process, each data point is retrieved from a hierarchical index, so
sequential scans are not required.

Uniform data and partially clustered data were used to evaluate the performance of the
MC-tree. The results verified that the MC-tree outperformed common hierarchical indexes and
cluster-based indexes on the partially clustered dataset.

Johan Hovold (2005): This research explores the use of the naive Bayes classifier as the basis
for personalised spam filters. According to the paper, several machine learning algorithms,
including variants of naive Bayes, had already been explored for this task; the author's
proposal instead uses word-position-based attribute vectors, which gave very good results
when tested on several publicly available corpora.

III. RESULTS



Fig. 1: GUI of Work

Fig. 2: Scattering of the dataset on the basis of the class labels spam and non-spam



Fig. 3: Classification plotted using Naive Bayes Kernel

Fig. 4: Plotting the best choice

IV. CONCLUSION

Email is a method of exchanging digital messages from an author to one or more recipients.
Email messages can carry text files, graphic images, and sound files, and are usually encoded
as ASCII text. Spam, or unsolicited email, has become a major problem for companies and
private users. This paper explored the various problems associated with spam and the
different methods and techniques that attempt to deal with it. From the study we found that
many filtering techniques are based on text categorization methods and that no technique can
claim to provide an ideal solution with 0% false positives.
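
The quantities referred to in this paper (error rate, accuracy, false positives) can be computed
from a filter's predictions as in the following sketch. The function name evaluate and the sample
labels are hypothetical, and the false positive rate is taken here as the fraction of non-spam
messages wrongly flagged as spam, which is one common definition rather than one stated in the
paper.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, error rate, and false positive rate for a spam filter."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # spam correctly flagged
            else:
                fp += 1   # legitimate mail wrongly flagged
        else:
            if truth == positive:
                fn += 1   # spam that slipped through
            else:
                tn += 1   # legitimate mail correctly passed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Invented labels: five messages, one false positive and one missed spam.
truth = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "spam", "ham"]
metrics = evaluate(truth, predicted)
print(metrics)
```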

REFERENCES

[1] Han, J., Kamber, M., & Pei, J. (2006). Data mining: Concepts and techniques. Morgan Kaufmann.
[2] Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. San Francisco, CA: Morgan Kaufmann.
[3] Androutsopoulos, I., Koutsias, J., Chandrinos, K. V., & Spyropoulos, C. D. (2000, July). An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (pp. 160-167). ACM.
[4] Androutsopoulos, I., Paliouras, G., Karkaletsis, V., Sakkis, G., Spyropoulos, C. D., & Stamatopoulos, P. (2000). Learning to filter spam e-mail: A comparison of a naive Bayesian and a memory-based approach. arXiv preprint cs/0009009.
[5] Basavaraju, M., & Prabhakar, R. (2010). A novel method of spam mail detection using text based clustering approach. International Journal of Computer Applications, 5(4).
[6] Hovold, J. (2005, July). Naive Bayes spam filtering using word-position-based attributes. In Proceedings of the 2nd Conference on Email and Anti-Spam (CEAS 2005).
[7] Jin, X., Wang, L., Lu, Y., & Shi, C. (2003). MC-tree: Dynamic index structure for partially clustered multi-dimensional database. Tsinghua Science and Technology, 8(2), 174-180.
[8] Liu, P. Y., Zhang, L. W., & Zhu, Z. F. (2009). Research on e-mail filtering based on improved Bayesian. Journal of Computers, 4(3), 271-275.
[9] Rajput, A., & Toshniwal, D. Adaptive spam filtering based on Bayesian algorithm.
[10] Rennie, J. (2000, August). ifile: An application of machine learning to e-mail filtering. In Proc. KDD 2000 Workshop on Text Mining, Boston, MA.
[11] Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998, July). A Bayesian approach to filtering junk e-mail. In Learning for Text Categorization: Papers from the 1998 workshop (Vol. 62, pp. 98-105).
[12] Song, Y., Kołcz, A., & Giles, C. L. (2009). Better Naive Bayes classification for high-precision spam detection. Software: Practice and Experience, 39(11), 1003-1024.
[13] Tian, J., Zhang, S., Zhu, L., & Liu, L. (2005). Improvement and parallelism of k-means clustering algorithm. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China, 10(3), 277-281.
