[IEEE 2014 International Conference on Networks & Soft Computing (ICNSC) - Guntur, Andhra Pradesh, India (2014.8.19-2014.8.20)] 2014 First International Conference on Networks & Soft

Detection and Blocking of Spammers Using SPOT Detection Algorithm

Parvati Bhadre
D. Y. Patil College of Engineering, Akurdi,
University of Pune
Pune, India

Email: [email protected]

Deepali Gothawal
D. Y. Patil College of Engineering, Akurdi,
University of Pune
Pune, India

Email: [email protected]

Abstract: Spammers are one of the key security threats on the Internet today. Attackers can recruit a large number of machines economically through spamming. Spam zombies are compromised machines in a network that are involved in spamming activities, and spammers use them to commit cyber crimes. Spamming also wastes network bandwidth. Identifying and blocking the spammers in a network is therefore a significant challenge for system administrators. The existing spam zombie detection algorithms are PT (Percentage Threshold), CT (Count Threshold), and SPOT. This paper compares these spam zombie detection algorithms; the results show that SPOT performs better than PT and CT. The proposed system assists system administrators in automatically detecting the spammers in their networks in an online manner. For spammer detection, the system uses the SPOT detection algorithm, which is based on a statistical tool known as the Sequential Probability Ratio Test (SPRT). The experimental results show that the SPOT detection algorithm detects spammers effectively. The system blocks the spammers, and a blocked user can reactivate the account by passing a security test. The system also detects and deletes emails with virus files in the attachments. The proposed system focuses only on the detection of spammers and not on prevention.

Keywords: blocking spammers; detection of spammer; spam; SPOT; SPRT

I. INTRODUCTION

Finding compromised machines and spammers on the Internet is a challenging and complicated task. Compromised machines, known as zombie machines, are increasingly used to launch security attacks such as spreading malware and spamming [1] [2]. Attackers can recruit a large number of compromised machines in a network through spamming activity. Email spam is unsolicited, anonymous email sent to a large group of users. The main focus of the proposed system is on detecting users who are involved in spamming activity and detecting email attachments that contain viruses. A spam zombie is a compromised machine that is involved in spamming activity. Spammers carry out various security attacks such as capturing users' secret data, click fraud, and phishing, so it is necessary to identify and block such spammers in a network. The proposed system detects and blocks spammers in a network. Existing spammer detection methods detect spammers in social networks, whereas our aim is to help system administrators detect the spammers in their own networks. The system deletes emails whose attachments contain virus files. To reactivate a blocked email account, the user needs to pass a test. The system uses the SPOT detection algorithm to detect spammers. It assists system administrators in automatically identifying spammers in an online manner, and it also helps in identifying emails that contain viruses.

The SPOT detection algorithm is based on a statistical tool known as the Sequential Probability Ratio Test (SPRT) [3], which Wald developed in his seminal work [4]. The goal of SPRT is to decide which of two hypotheses is correct by comparing a probability ratio against two threshold values. The main advantage of SPRT is that it requires a small number of observations to reach a decision for a given error rate. Spammers can also be detected using other methods such as PT (Percentage Threshold) and CT (Count Threshold) [3]. The PT algorithm calculates the percentage of spam messages sent by the same user in one time window. The CT algorithm counts the number of spam messages sent by the same user in one time window. If the percentage in PT, or the spam count in CT, crosses its predefined threshold value, that user is detected as a spammer. The CT and PT algorithms have two disadvantages. First, selecting the "right" values for their input parameters is a challenging and tricky task. Second, the performance of PT and CT is sensitive to these input parameters. SPOT is another spammer detection algorithm, which detects spammers in an online manner. The SPOT algorithm analyses the total number of messages sent by the machine and the user rather than only the rate at which they are sent.
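The two threshold rules described above can be illustrated with a minimal Python sketch. It is not taken from [3]: the names `WindowCounts`, `ct_flags_spammer`, and `pt_flags_spammer` are hypothetical, "crosses" is assumed to mean "greater than or equal to", and an empty window is assumed never to flag a user.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Messages one user sent during a single time window (hypothetical type)."""
    total: int  # all messages sent in the window
    spam: int   # messages that the spam filter flagged


def ct_flags_spammer(window: WindowCounts, count_threshold: int) -> bool:
    """Count Threshold (CT): flag when the spam count reaches the threshold."""
    return window.spam >= count_threshold


def pt_flags_spammer(window: WindowCounts, percent_threshold: float) -> bool:
    """Percentage Threshold (PT): flag when the spam share reaches the threshold."""
    if window.total == 0:
        return False  # assumption: no messages means nothing to judge
    return 100.0 * window.spam / window.total >= percent_threshold


if __name__ == "__main__":
    window = WindowCounts(total=10, spam=3)
    print(ct_flags_spammer(window, count_threshold=3))
    print(pt_flags_spammer(window, percent_threshold=50.0))
```

The same window can be flagged by one rule and not by the other depending on the chosen thresholds, which is the parameter sensitivity noted above.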

The remainder of the paper is organized as follows. Related work in the area of spammer and spam zombie detection is covered in Section II. The workflow of the system and the module details are given in Section III. The experimental results are given in Section IV. We conclude the paper in Section V.

978-1-4799-3486-7/14/$31.00 ©2014 IEEE

II. RELATED WORK

Previous research has focused on both machine learning and non-machine learning approaches to spam detection. The machine learning approach includes unified model filtering and ensemble filtering to classify received emails as spam or non-spam. Previous works on machine learning techniques have used Artificial Neural Networks (ANN), K-nearest neighbor, Naive Bayesian, and Support Vector Machine (SVM) classifiers. We use a simple content-based spam filtering technique because our main focus is on spammer detection.

The existing approaches for spammer detection are as follows.

• Detection of spammers in social networks: Michael Fire, Gilad Katz, and Yuval Elovici developed a system to detect spammers and fake profiles in social networks [5]. Their algorithm is based on the social network's topology: using anomalies in that topology, it detects users who randomly connect to others.

• Detection of spammers on Twitter: Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida proposed a method for detecting spammers on Twitter [6]. They constructed a large labeled collection of users by manually classifying them into spammers and non-spammers. They then identified a number of characteristics related to tweet content and user social behavior, and used these characteristics as attributes of a machine learning process that classifies users as either spammers or non-spammers. In this approach, each user is represented by a vector of values, one for each attribute. The algorithm learns a classification model from a set of previously labeled (i.e., pre-classified) data and then applies the acquired knowledge to classify new (unseen) users into two classes: spammers and non-spammers. However, this approach requires a large-scale view of the messages.

• Review spammer detection: Guan, Sihong, Bing, and Philip developed a system to detect review spammers of online stores based on a review graph [7]. The system detects review spammers by using an iterative method and a review graph model. Its advantage is that it uses the relationships between reviews, reviewers, and stores instead of only the behavioral characteristics of the reviewer. The method shows how the information in the review graph indicates the causes of spamming and reveals important clues about different types of spammers.

• Detection of spammers in microblogging: Xia, Jiliang, Yanchao, and Huan studied the challenges in detecting social spammers [8]. They proposed a model that effectively integrates both content information and social network information to detect spammers in microblogging, and designed a new framework for Social Spammer Detection in Microblogging (SSDM). The framework employs a directed Laplacian formulation to model the refined social networks, and then integrates the network information into a sparse supervised formulation for modeling the content information.

Previous work on spammer detection focuses on the aggregate characteristics of spammers. These systems are not suited to a network in which the number of messages generated is small.

The existing approaches to detecting spam zombies and spamming botnets are discussed as follows.

• SAS: Yuvaraj and Suguna designed a method that uses a Semantic Aware Statistical (SAS) algorithm. It generates state-transition-graph based signatures using a Hidden Markov Model (HMM) to detect worms. It is the first work to generate worm signatures by combining statistical analysis with semantic analysis [9]. It detects spam zombies and worms, but not spammers.

• SPOT based zombie detection: Duan et al. developed an efficient and effective spam zombie detection tool, SPOT, which is based on SPRT [3]. The existing system detects only spam zombies and not spammers. Our proposed system detects spammers using the SPOT detection algorithm and blocks them in an online manner. To activate a user account again, the system provides a test.

• An approach based on characteristics of spamming botnets: A number of researchers have studied the aggregate global characteristics of networks of compromised machines involved in spamming (known as spamming botnets), such as the spamming patterns and the size of botnets. They studied these characteristics based on sampled spam messages received at a large email service provider [10] [11]. These approaches cannot be applied to small networks, because the outgoing messages generated locally in a network normally cannot provide the aggregate large-scale view of spam messages that the approaches require. A further limitation is that these approaches cannot support online detection in the environment that we consider. Our aim is to develop a system that assists system administrators in detecting the spammers in their networks in an online manner.

• DBSpam: Xie and Wang developed DBSpam, a tool to detect proxy machines that are involved in spamming activities in a network [12]. It relies on the packet symmetry property of spamming activities. The tool identifies the spam proxies that translate and forward non-SMTP packets (for example, HTTP) into SMTP commands to mail servers. Its disadvantage is that it detects only spam proxies and not all compromised machines.

A few general botnet detection systems are discussed here.

• BotHunter: Gu, Porras, Yegneswaran, Fong, and Lee developed a system that uses IDS dialog traces to detect compromised machines in a network [13]. It correlates inbound intrusion alarms with outbound communication patterns to detect infected machines in the network. This system is reliable and scalable.

• BotSniffer: An effective network anomaly-based detection system developed by Guofei, Junjie, and Wenke [14]. It detects both infected hosts and command & control servers in the network. Flows that are connected to a common server form a single group. If the flows within a single group show behavioral similarity, the system identifies the corresponding hosts as compromised.

• BotMiner: Gu, Perdisci, Zhang, and Lee developed a system known as BotMiner, one of the first botnet detection systems that is both structure and protocol independent [15]. In this system, flows are grouped by similar malicious activity patterns and by similar communication patterns. A machine is detected as compromised if it falls in the intersection of these two groups.

All these systems intend to detect general botnets, whereas we aim to detect spam zombies and spammers.

III. IMPLEMENTATION DETAILS

A. Design

The proposed system aims to delete emails with attached viruses. It detects and blocks spammers by using the SPOT detection algorithm, and it provides an account reactivation test. Fig. 1 shows the workflow of the proposed system. The proposed system works as follows:

• The system receives an email message and checks its attachment for viruses. The system deletes any email that has virus files in an attachment. If no virus is found, the message is checked for spam. The system then applies the SPOT detection algorithm to detect spammers.

• For every received message, the algorithm updates the probability ratio of the sending user. If, after a message sent by that user, the probability ratio (which depends on θ0 and θ1) crosses a predefined threshold value (calculated from α and β), the user is identified as a spammer and the system blocks that user's account.

• A blocked user needs to pass a test to reactivate the email account.

Fig. 1. Work flow of the System
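The workflow above can be sketched in Python as a single dispatch function. This is only an illustration under stated assumptions: the `Email` type, the `VIRUS_EXTENSIONS` set, and the function names are hypothetical, the virus check is assumed to compare attachment file extensions against a stored list (as the virus-check module description states), and the spam classifier and the per-user detector are passed in as callables rather than implemented here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

# Hypothetical extension list; the contents of the paper's dataset are not given.
VIRUS_EXTENSIONS: Set[str] = {".exe", ".scr", ".vbs"}


@dataclass
class Email:
    sender: str
    body: str
    attachments: List[str] = field(default_factory=list)  # attachment file names


def has_virus_attachment(email: Email) -> bool:
    """Extension-based check: compare each attachment suffix with the stored list."""
    suffixes = tuple(VIRUS_EXTENSIONS)
    return any(name.lower().endswith(suffixes) for name in email.attachments)


def process_email(email: Email,
                  is_spam: Callable[[str], bool],
                  observe: Callable[[str, bool], bool],
                  blocked: Set[str]) -> str:
    """Return the action taken for one message: 'deleted', 'blocked' or 'accepted'."""
    if has_virus_attachment(email):
        return "deleted"                 # virus mail is dropped before the spam check
    spam = is_spam(email.body)
    if observe(email.sender, spam):      # True once the detector decides "spammer"
        blocked.add(email.sender)
        return "blocked"
    return "accepted"
```

Passing `is_spam` and `observe` as parameters keeps the sketch independent of any particular spam filter or detection rule.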

B. System Overview

The architecture of the proposed system is depicted in Fig. 2. The proposed system has the following modules: 1) virus check; 2) spam check and spam filter; 3) blocking of spammers using SPOT, and recovery.

1) Virus checks: When the user receives an email, the system checks the attachment for viruses by comparing its file extension with a database. The system deletes the email if a virus is detected. Virus-free messages are then checked for spam.

2) Check spam and spam filter: The system uses a content-based spam filtering technique [16]. It removes the stop words from the message body as well as from the spam patterns. The system then performs a stemming operation, which reduces each word to a unique form by removing suffixes such as "ing" and "ly". If a sufficient number of spam words are detected in the message body, the system classifies that email as spam.

Fig. 2. System Architecture
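The content-based check described in this module can be illustrated with a short Python sketch. The stop-word list, the suffix list, the `min_hits` threshold, and all function names are assumptions made for illustration; the paper does not list its stop words, its spam patterns, or what counts as a "sufficient number" of spam words.

```python
import re
from typing import Iterable, Set

# Illustrative values only; none of these lists come from the paper.
STOP_WORDS: Set[str] = {"a", "an", "the", "is", "to", "of", "and", "you", "for"}
SUFFIXES = ("ing", "ly")


def stem(word: str) -> str:
    """Strip one known suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> Set[str]:
    """Lowercase, tokenize, drop stop words, and stem; returns the unique words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {stem(token) for token in tokens if token not in STOP_WORDS}


def is_spam(body: str, spam_patterns: Iterable[str], min_hits: int = 2) -> bool:
    """Flag the message when enough distinct spam words appear in its body."""
    spam_words: Set[str] = set()
    for pattern in spam_patterns:
        spam_words |= normalize(pattern)  # patterns get the same preprocessing
    return len(normalize(body) & spam_words) >= min_hits
```

Applying the same normalization to both the message body and the spam patterns is what lets "winning" in a message match "winning" in a pattern after suffix removal.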

3) Blocking of spammers using SPOT and Recovery: Let $X$ be a Bernoulli random variable with an unknown parameter $\theta$, and let $X_i$, for $i = 1, 2, \ldots$, denote the successive observations of $X$, which correspond to the sequence of email messages sent by user $U$. We let $X_i = 1$ if message $i$ sent by user $U$ is spam, and $X_i = 0$ otherwise. SPRT is used to test a hypothesis $H_0$ that $\theta = \theta_0$ against a single alternative $H_1$ that $\theta = \theta_1$. Here $\theta_0$ and $\theta_1$ are given by [3]

\[ \Pr(X_i = 1 \mid H_0) = 1 - \Pr(X_i = 0 \mid H_0) = \theta_0 \]

\[ \Pr(X_i = 1 \mid H_1) = 1 - \Pr(X_i = 0 \mid H_1) = \theta_1 \]

The logarithm of the probability ratio for any integer $n$ is calculated as follows [3]:

\[ \Lambda_n = \ln \frac{\Pr(X_1, X_2, \ldots, X_n \mid H_1)}{\Pr(X_1, X_2, \ldots, X_n \mid H_0)} \]

• SPOT based spammer detection algorithm:

Input:
– $\alpha$ is the false positive error value
– $\beta$ is the false negative error value
– $\theta_0$ is the probability of a message being spam when the user is normal
– $\theta_1$ is the probability of a message being spam when the user is a spammer

1. For every email, the algorithm gets the email ID of the sender.

2. Let $U$ be a user and $n$ be the index of a message sent by $U$.

3. If message $n$ is spam ($X_n = 1$), the probability ratio for that user is updated as

\[ \Lambda_n \mathrel{+}= \ln \frac{\theta_1}{\theta_0} \]

otherwise it is updated as

\[ \Lambda_n \mathrel{+}= \ln \frac{1 - \theta_1}{1 - \theta_0} \]

4. If $\Lambda_n \ge B$, user $U$ is identified as a spammer and the algorithm terminates for user $U$. User $U$ is then blocked.

5. If $\Lambda_n \le A$, user $U$ is a normal user and the test is reset for $U$ with $\Lambda_n = 0$.

6. In all other cases, the test continues with new observations. The threshold values $A$ and $B$ are calculated as [3]:

\[ A = \ln \frac{\beta}{1 - \alpha}, \qquad B = \ln \frac{1 - \beta}{\alpha} \]

7. A blocked user can reactivate the email account by passing an account activation test. This test contains security-related questions about emails received by that user. If the number of correct answers is below the predefined threshold value, the user remains blocked; otherwise the account is activated.
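Steps 1 to 6 of the algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name `SpotDetector`, the per-user dictionary, and the returned labels are hypothetical, and the parameter values in the usage example are chosen only for demonstration. The reactivation test of step 7 is not modeled.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpotDetector:
    alpha: float   # false positive error value
    beta: float    # false negative error value
    theta0: float  # Pr(message is spam | user is normal)
    theta1: float  # Pr(message is spam | user is a spammer)
    ratios: Dict[str, float] = field(default_factory=dict)  # Lambda_n per user

    def __post_init__(self) -> None:
        self.lower = math.log(self.beta / (1 - self.alpha))   # threshold A
        self.upper = math.log((1 - self.beta) / self.alpha)   # threshold B
        self.spam_step = math.log(self.theta1 / self.theta0)
        self.ham_step = math.log((1 - self.theta1) / (1 - self.theta0))

    def observe(self, user: str, is_spam: bool) -> str:
        """Update the user's log ratio; return 'spammer', 'normal' or 'pending'."""
        ratio = self.ratios.get(user, 0.0)
        ratio += self.spam_step if is_spam else self.ham_step
        if ratio >= self.upper:
            self.ratios.pop(user, None)   # test terminates for this user
            return "spammer"
        if ratio <= self.lower:
            self.ratios[user] = 0.0       # reset and keep monitoring
            return "normal"
        self.ratios[user] = ratio
        return "pending"


if __name__ == "__main__":
    # Illustrative parameters, not values reported in the paper.
    detector = SpotDetector(alpha=0.01, beta=0.01, theta0=0.2, theta1=0.9)
    for _ in range(4):
        print(detector.observe("user@example.com", is_spam=True))
```

With these illustrative parameters, each spam message adds ln 4.5 to the ratio, so the threshold B = ln 99 is crossed on the fourth consecutive spam message. This shows how few observations SPRT needs when the two hypotheses are well separated.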

IV. RESULTS

The proposed system maintains its database on the machine where it runs. Spam patterns are used to detect spam messages. To find emails with virus files, a dataset of unique patterns, namely file extensions, is used. All incoming email messages are scanned against the dataset, which is kept up to date. We assume that the input to the system is a text message or plain email.

• Table I shows the performance of the spam zombie detection algorithms. The results show that SPOT outperforms CT and PT.

TABLE I
COMPARISON BETWEEN SPAM ZOMBIES DETECTION ALGORITHMS

Total IP addresses traced    SPOT    CT    PT
20                           16      8     6

• Table II shows the total number of spam messages detected by the proposed system.

• Table III shows the total number of spammers detected by the proposed system.

• The proposed system effectively deletes email messages with attached virus files.


TABLE II
TOTAL NUMBER OF SPAM MESSAGES DETECTED BY THE SYSTEM

Day       Total email messages traced    Spam detected    Non spam messages
Day 1     39                             15               24
Day 2     36                             16               20
Day 3     43                             27               16
Day 4     20                             11               9
Day 5     30                             10               20
Day 6     25                             14               9
Day 7     17                             15               2
Day 8     45                             30               15
Day 9     55                             24               31
Day 10    62                             30               32

TABLE III
TOTAL NUMBER OF SPAMMERS DETECTED BY THE SYSTEM

Day       Total active users    Total spammers detected    Total normal users
Day 1     5                     3                          2
Day 2     6                     4                          2
Day 3     10                    7                          3
Day 4     5                     3                          2
Day 5     4                     2                          2
Day 6     6                     3                          3
Day 7     3                     2                          1
Day 8     4                     3                          1
Day 9     5                     3                          2
Day 10    6                     4                          2

V. CONCLUSION

The existence of spammers and zombie machines is a key security threat on the Internet. The proposed system uses the SPOT detection algorithm to detect spammers in the network. It can be concluded that the SPOT detection algorithm gives good results for spammer detection. The advantage of SPOT is that it requires a small number of observations to reach a decision. Another advantage is that the performance of the SPOT algorithm is less sensitive to its input parameters. The proposed system not only detects spammers but also blocks the spammer's email account ID. This forces spammers to reactivate or create a new email account repeatedly. The system also deletes emails with attached virus files. It enables system administrators to automatically identify the spammers in their network who waste network bandwidth. At present the system assumes only text messages as input; in future, email messages containing images and videos could also be scanned for spam.

ACKNOWLEDGMENT

The authors would like to thank the researchers and publishers for making their resources available, and the teachers for their guidance.

We also thank the college authority for providing the required infrastructure and support. Finally, we would like to extend our heartfelt gratitude to friends and family members.

REFERENCES

[1] F. C. Freiling, T. Holz, and G. Wicherski, "Botnet tracking: Exploring a root-cause methodology to prevent distributed denial-of-service attacks," ESORICS, 2005.

[2] N. Ianelli and A. Hackworth, "Botnets as a vehicle for online crime," In Proc. of First International Conference on Forensic Computer Science, pp. 1–30, 2006.

[3] Z. Duan, P. Chen, F. Sanchez, Y. Dong, M. Stephenson, and J. Barker, "Detecting spam zombies by monitoring outgoing messages," IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 2, 2012.

[4] A. Wald, "Sequential analysis," John Wiley and Sons, Inc, 1947.

[5] M. Fire, G. Katz, and Y. Elovici, "Strangers intrusion detection - detecting spammers and fake profiles in social networks based on topology anomalies,"

[6] F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida, "Detecting spammers on twitter," Proceedings of the 26th Annual Computer Security Applications Conference, 2010.

[7] G. Wang, S. Xie, B. Liu, and P. S. Yu, "Review graph based online store review spammer detection," IEEE International Conference on Data Mining, pp. 1242–1247, 2011.

[8] X. Hu, J. Tang, Y. Zhang, and H. Liu, "Social spammer detection in microblogging," 23rd International Joint Conference on Artificial Intelligence, pp. 1242–1247, August 3-9, 2013.

[9] M. S. S. Yuvaraj, "An effective defense against compromised machines by sas worm detection," International Journal of Computer Science and Management Research, pp. 33–37, 2013.

[10] Z. Duan, K. Gopalan, and X. Yuan, "Behavioral characteristics of spammers and their network reachability properties," Proceedings of IEEE ICC, 2007.

[11] Y. Xie, F. Yu, K. Achan, R. Panigrahy, G. Hulten, and I. Osipkov, "Spamming botnets: Signatures and characteristics," In Proc. ACM SIGCOMM, pp. 171–182, August 2008.

[12] M. Xie, H. Yin, and H. Wang, "An effective defense against email spam laundering," In ACM Conference on Computer and Communications Security, November 2006.

[13] G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee, "Bothunter: Detecting malware infection through ids-driven dialog correlation," In Proc. 16th USENIX Security Symposium, August 2007.

[14] G. Gu, J. Zhang, and W. Lee, "Botsniffer: Detecting botnet command and control channels in network traffic," Proceedings of The 15th Annual Network and Distributed System Security Symposium (NDSS 2008), February 2008.

[15] G. Gu, R. Perdisci, J. Zhang, and W. Lee, "Botminer: Clustering analysis of network traffic for protocol- and structure-independent botnet detection," In Proc. 17th USENIX Security Symposium, July 2008.

[16] J. Aycock and N. Friess, "Spam zombies from outer space," pp. 1–16, January 2006.
