Research Article

Integrating Traffics with Network Device Logs for Anomaly Detection

Jiazhong Lu,1 Fengmao Lv,2 Zhongliu Zhuo,1 Xiaosong Zhang,1 Xiaolei Liu,1 Teng Hu,1 and Wei Deng2

1 Center for Cyber Security, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, China

Correspondence should be addressed to Fengmao Lv; fengmaolv@126.com and Xiaosong Zhang; cdgbsjfzx@126.com

Received 25 December 2018; Revised 30 April 2019; Accepted 15 May 2019; Published 13 June 2019

Guest Editor: Pelin Angin

Hindawi, Security and Communication Networks, Volume 2019, Article ID 5695021, 10 pages, https://doi.org/10.1155/2019/5695021

Copyright © 2019 Jiazhong Lu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Advanced cyberattacks often involve multiple types, layers, and stages, with the goal of cheating the monitors. Existing anomaly detection systems usually search logs or traffics alone for evidence of attacks but ignore further analysis of the attack processes. For instance, traffic detection methods can only detect the attack flows roughly but fail to reconstruct the attack event process and reveal the current network node status. As a result, they cannot fully model a complex multistage attack. To address these problems, we present Traffic-Log Combined Detection (TLCD), which is a multistage intrusion analysis system. Inspired by multiplatform intrusion detection techniques, we integrate traffics with network device logs through association rules. TLCD correlates log data with traffic characteristics to reflect the attack process and construct a federated detection platform. Specifically, TLCD can discover the process steps of a cyberattack, reflect the current network status, and reveal the behaviors of normal users. Our experimental results over different cyberattacks demonstrate that TLCD works well, with high accuracy and a low false positive rate.

1. Introduction

Cyberattacks usually leave footprints on network devices. Typically, an attacker's attack path jumps through multiple routers or servers and then uploads malicious code (e.g., XSS script), implants viruses (e.g., botnets), and submits Trojaned software or unofficial patches containing malicious payloads [1–7]. Generally, the footprints left by cyberattacks are spatiotemporally dispersed across the logs of different victims' machines [8]. For instance, an XSS script attack may leave evidence in the server's weblog. However, as the logs are dispersed across diverse disconnected sources, piecing together the contextual information of each malicious footprint still needs human involvement. Therefore, directly leveraging the logs for anomaly detection is ineffective. On the other hand, network traffic can also provide complementary evidence for attack-related activities, such as anomalous data about connections from IRC/HTTP/DNS servers to a botnet. However, the network traffic data alone are insufficient to precisely detect attack behaviors and grasp a complete view of attacks.

To date, most existing log-based or traffic-based intrusion detection methods have the following limitations: (1) They only focus on a single or a few logs, lacking context information (especially the contacts in the internal network). (2) The traffic characteristics are not diverse enough to achieve good detection performance. (3) Both the log systems and the traffic systems heavily rely on hefty equipment, which incurs a very large cost overhead [9]. (4) The false positives and false negatives are not satisfactory in a realistic detection process [10]. In this paper, we propose to integrate traffics with network device logs for detecting cyberattacks. Specifically, we collect logs and traffics from the switch, router, firewall, and servers. Then, we use fuzzy association rules to integrate the device logs with traffics to reconstruct the cyberattack.

Overall, the main contributions of this paper are listed as follows: (1) We propose a novel combined detection method to reconstruct the attack process. (2) Our method
can effectively integrate multiple network device logs with traffics by leveraging fuzzy association rules. (3) We conduct an extensive evaluation of TLCD over diverse cyberattacks (e.g., phishing, XSS, and botnet). The experimental results clearly demonstrate the effectiveness of our method.

2. Related Work

In general, network intrusion detection mainly includes signature-based detection and anomaly-based detection [11, 12]. Specifically, signature-based detection relies on existing signature databases to detect malware infections. By using signature-based detection methods, malware can be effectively identified through pattern matching. However, signature-based detection techniques have the fatal disadvantage that a new malware infection cannot be detected if its signature is not contained in the signature database.

Anomaly detection is a technique for detecting abnormal behaviors that deviate from normal behaviors [13, 14]. Specifically, it aims to detect events in the monitored domain that differ from the pattern defined by normal behaviors [13, 15]. Basically, the normal behavior of the network needs to be identified first. Compared to signature-based detection, the main advantage of anomaly-based intrusion detection is the ability to detect new or unknown attacks, since abnormal behaviors can also occur when the signature of new malware is not available. However, due to the complexity of the behaviors from different networks and applications, it is difficult to accurately identify the normal behaviors. The existing anomaly detection methods are usually based on device logs or traffic flows alone [16]. In general, these methods are too simple to achieve satisfactory results [17]. Additionally, they fail to effectively reconstruct the attack conditions [18]. In contrast, our detection method correlates and cross-verifies data from multiple network devices, which can further improve the accuracy and reflect the state of the network environment at the time of the attack.

3. Method

This work proposes to implement anomaly detection by integrating multiple network device logs with traffics. Specifically, the device logs and traffics are integrated through association rules. Our method can effectively improve the detection performance and reconstruct the network attack process, which enables us to grasp a complete view of the entire network environment.

3.1. Method Overview. Due to the diversity of network attacks, the network environment is quite complex. For instance, botnets must first send control commands to each C&C server and then to the controlled hosts, while a worm first needs to upload malware code to a target host and then infect other computers through that host. Therefore, the network device logs and traffic flows can play very important roles in cyberattack detection. To collect our data for anomaly detection, we first obtain the traffic flows with port mirroring and adopt TCPDUMP to extract useful traffic attributes (such as the source port, destination port, protocol number, source IP, destination IP, the number of packets, size, and send time). Additionally, we also extract log information (such as Date, Time, Module, Level, PID, Type, Action, Application, Reason, etc.) from the gateway's internal routers, switches, firewalls, and servers. After that, we attempt to extract the mapping relations between logs and traffics with association rules. Finally, the extracted relationships can be used to generate the time stamps of log records and reconstruct the attack process.

Figure 1: The overview of TLCD. Stages shown: traffic capture and log collection; data filter; feature extraction; fuzzy association; anomaly detection (benign or anomaly); restore attacks.

Figure 2: The deployment of the traffic captures and log collectors (server-1, firewall-1, router-1, and router-2 are deployed in the enterprise; server-2, switch-1, firewall-1, and server-3 are deployed on campus). Traffic capture is set at SERVER1, SERVER2, and SERVER3.

In Figure 1, we display the overview of TLCD. The locations of the traffic captures and log collectors are shown in Figure 2. Traffic capture modules are placed at the university servers and enterprise servers, which are connected to the outside Internet. In this way, we can capture both the inbound and outbound data in real time. The inputs to TLCD are multiple raw data from network devices (e.g., router, switch, firewall, and servers). The detection is specifically designed to be format agnostic for both the traffics and logs. Through a parser plugin, TLCD can handle any input format of traffics and logs. As the detection task usually involves large-scale data, filtering is necessary for reducing the detection cost and time overhead. Therefore, we filter out the irrelevant data in both logs and traffics. In the feature extraction module, we extract 63 traffic features (including 5 new TCP flags) and 16 log features. The log features are mainly used to reveal what happened before and after the cyberattack and to assist in detecting the anomalous behaviors. The fuzzy association module aims at modeling strong associations between traffics and logs. After obtaining the confidence intervals of candidate rules, those with high confidence are used to construct strong association rules. Note that there are many kinds of mappings here. It is possible that multiple traffics relate to one single log or that one traffic relates to several logs.
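As a rough illustration of how candidate rules between traffic items and log items might be scored, the sketch below computes support and confidence over co-occurring item sets and keeps the rules above a confidence threshold. The transaction layout, the item names, and the threshold are assumptions made for illustration; the paper does not specify them, and plain (crisp) counts are used here in place of fuzzy memberships.

```python
from itertools import product


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = support(A and B) / support(A)."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_a


def strong_rules(transactions, traffic_items, log_items, min_conf=0.8):
    """Keep traffic -> log rules whose confidence reaches min_conf."""
    rules = []
    for a, b in product(traffic_items, log_items):
        c = confidence(transactions, {a}, {b})
        if c >= min_conf:
            rules.append((a, b, c))
    return rules


# Hypothetical transactions: each set holds the traffic and log items
# observed together in one time window.
transactions = [
    {"traffic:syn_only", "log:fw_deny"},
    {"traffic:syn_only", "log:fw_deny"},
    {"traffic:syn_only", "log:fw_accept"},
    {"traffic:syn_ack", "log:fw_accept"},
]
rules = strong_rules(
    transactions,
    ["traffic:syn_only", "traffic:syn_ack"],
    ["log:fw_deny", "log:fw_accept"],
    min_conf=0.6,
)
for antecedent, consequent, conf in rules:
    print(f"{antecedent} -> {consequent} (confidence {conf:.2f})")
```

Because one traffic item can map to several log items and vice versa, the sketch scores every traffic-log pair independently rather than forcing a one-to-one mapping.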

To present the mechanisms of TLCD, we list the details as follows:

(1) Phase I: preprocessing. The traffic captures and log collectors first collect the original traffics and logs, which are considered as the inputs to TLCD. Then, these traffic-log inputs are preprocessed via the parser plugin and the filter module. Specifically, the filter module filters out the irrelevant data in both logs and traffics.

(2) Phase II: feature extraction. The feature extraction module consists of five components: the traffic correlation (to analyze the traffic packets for malware), temporal correlation (to obtain the time characteristics of malware), combination correlation (to model the strong relevance of malware traffic), TCP flag (to record the sending and responding of traffic data), and log-information (to record the log information for attack reconstruction) components.

(3) Phase III: fuzzy association module. This module aims to integrate the traffics with logs through association rules.

(4) Phase IV: anomaly detection module. Advanced machine learning techniques are adopted to recognize malicious data as anomalies.

(5) Phase V: attack reconstruction module. The reconstruction module leverages the associations between the logs and traffics to generate the time stamps of log records, which can finally be used to reconstruct the attack process.

Table 1: Details of the TCP flags.

TCP flags
  TCP handshake situation
  ACK, URG, FIN, RST values
  The destination IP repeatedly responds with ACK = 1
  The destination IP only has ACK = 1, SYN = 1, and FIN = 1
  The source IP only has SYN = 1

3.2. Feature Extraction

3.2.1. Network Traffics. To collect the traffic-log data, we have monitored the university-enterprise network for one month. The features used in our framework include temporal correlation features [19], TCP flag features (displayed in Table 1), and log features (displayed in Table 2). Usually, some cyberattacks, such as botnets and phishing emails, need to automatically send commands through programs. These automatic attack commands more or less contain inherent patterns. Specifically, we capture the traffics from 4 common cyberattacks (XSS, HTTP botnet, P2P botnet, and phishing) and find that different network attacks behave differently in the TCP handshake stage. For instance, the phishing mail transmission process adopts the POP3 and IMAP4 protocols, allowing attackers to send different types of files. Also, the corresponding data volume can be very small. Therefore, phishing mails can result in the same TCP handshake states as normal ones, and it can be relatively easy to establish a connection. However, in a botnet, an attacker needs to control the C&C server and send a large number of commands, which will inevitably cause failures in the TCP handshake.

Table 2: Details of the network device logs. The log types (columns) are firewall logs, traffic logs, event logs, network logs, security logs, system logs, Cron logs, mail logs, messages logs, and Mysqld logs. Each row gives an attribute and the number of log types marked (*) as recording it.

Attribute                 Log types marked
Date                      10
Time                      10
Module                    3
Level                     7
PID                       6
Type                      7
Action                    5
Source                    1 (traffic logs)
Destination               1 (traffic logs)
Translated Source         1 (traffic logs)
Translated Destination    1 (traffic logs)
Duration                  1 (traffic logs)
Bytes Sent                1 (traffic logs)
Bytes Received            1 (traffic logs)
Application               2
Reason                    10

In Table 1, we display the details of the TCP flags. Specifically, SYN denotes whether a connection is established, FIN and ACK denote the corresponding responses, and RST denotes the connection reset. Note that the ACK information can be used together with SYN and FIN as evidence for attack detection. For instance, if both SYN and ACK are activated, it means that the connection is established with confirmation. In contrast, if only SYN is activated, we can conclude that the connection is established without confirmation. Usually, most of the unreachable attacks can only activate SYN. Additionally, for the situation with FIN and RST activated and SYN unactivated, the firewalls may still detect the SYN/FIN packet. When such a packet appears, it is most likely that the network has been attacked. As the ACK/FIN packet represents a completed TCP connection, a normal FIN packet is always marked by ACK. A "NULL" packet is a packet not marked by any TCP flags (URG, ACK, PSH, RST, SYN, and FIN are all set to 0). For normal network activities, the TCP stack never generates packets with unreasonable TCP flag combinations; if such packets appear, the network has been attacked. Therefore, the TCP flag features can provide useful information about the network status [19].
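The flag combinations described above can be checked directly on the flag byte of a TCP header. The following is a minimal sketch, assuming the six classic flag bits in their standard header positions; the label names and the `SUSPICIOUS` set are hypothetical and only mirror the rules stated in the text, not the paper's actual feature encoding.

```python
# Standard bit positions of the six classic TCP flags in the header flag byte.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def classify_flags(flags):
    """Return a label for one TCP flag byte, following the rules in the text."""
    flags &= 0x3F  # keep only the six classic flag bits
    if flags == 0:
        return "null-packet"        # no flag set at all
    if flags & SYN and flags & FIN:
        return "syn-fin"            # opening and closing at the same time
    if flags & FIN and not flags & ACK:
        return "fin-without-ack"    # a normal FIN is always marked by ACK
    if flags & SYN and flags & ACK:
        return "syn-ack"            # connection established with confirmation
    if flags & SYN:
        return "syn-only"           # connection attempt without confirmation
    return "other"


# Labels treated as unreasonable combinations (hypothetical grouping).
SUSPICIOUS = {"null-packet", "syn-fin", "fin-without-ack"}

for value in (SYN, SYN | ACK, FIN | ACK, SYN | FIN, 0):
    label = classify_flags(value)
    print(f"{value:#04x} -> {label} (suspicious: {label in SUSPICIOUS})")
```

A "syn-only" label is not flagged as suspicious on its own, since every legitimate connection starts with one; it becomes informative only when counted per source, as in the last row of Table 1.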

3.2.2. Network Device Logs. In Table 2, we display the details of the network device logs. As we can see, different types of network device logs have different characteristics. Specifically:

(1) The firewall logs record the events between the inside and outside of the network, such as port filtering, hazard level, and authentication.

(2) The traffic logs record current traffic conditions, such as packet size, IP address, and duration.

(3) The event logs record events that occur during the execution of the system in order to provide traces for activity monitoring and problem diagnosis.

(4) The network logs record the process of network access, such as data packet requests or uploading.

(5) The security logs mainly record the operations of network devices and the system errors.

(6) The system logs record the hardware and software errors, as well as events that occur in the monitoring system, allowing the user to check the cause of errors and find traces left by the attacker.

(7) The Cron logs record periodic tasks in Linux. Cron reads the configuration files and writes them in memory when Linux starts. As some cyberattacks are cyclical, Cron logs are effective for identifying this type of attack.

(8) The mail logs allow the administrators to get copies of messages processed by the Domino system router. When the mail log is enabled, Domino checks the messages as they go through MAIL.BOX and saves their copies to MAILJRN.NSF for future recovery.

(9) The message logs are plain text files that are checked first for error messages when a problem occurs.

(10) The MySQL logs contain information of log-err, query log, log-slow-queries, log-update, and log-bin. By default, all logs are created in the MySQL directory.

In this work, we extract attributes from ten types of logs from different network devices. Each type of log has its own attributes. For instance, the firewall log has attributes of date, time, module, level, type, and reason, while the traffic logs have attributes of date, time, action, source, destination, translated source, translated destination, duration, bytes sent, bytes received, application, and reason. Although different logs reflect different characteristics of the device status, the shared attributes, such as time, date, and reason, can be effectively used to infer the status of one event in different logs.

3.3. Anomaly Detection

3.3.1. Feature Integration through Association Rules. In daily networks, there are no direct correlations between the log data and the traffic data. However, they can be correlated through shared attributes like time and date. Therefore, we need to model the mutual mappings between traffics and logs by effectively leveraging the correlated attributes. With that, we can obtain the classification boundaries of the log attributes based on the corresponding attributes of traffics.

The discretization of traffic features, which is useful for boundary division, plays an important role in detecting anomalous traffics. Specifically, we use the Fuzzy C-Means (FCM) algorithm to divide the traffic characteristics (including quantitative attributes and Boolean attributes) into several fuzzy sets. Note that the elements and nonelements of each fuzzy set can be mutually transformed in order to achieve the goal of softening features. When processing highly skewed data, the FCM algorithm can effectively model the actual distribution of the data and clearly reveal the boundary between normal data and anomalies.
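To make the discretization step concrete, the sketch below runs the standard Fuzzy C-Means updates on a single scalar feature. It is a minimal illustration only: the one-dimensional setting, the fuzzifier `m = 2`, the fixed iteration count, the initial centers, and the sample values are all assumptions, not the configuration used in the paper.

```python
def memberships_for(values, centers, m=2.0, eps=1e-9):
    """Membership of each value in each fuzzy set; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        # eps avoids a division by zero when a value sits exactly on a center.
        dists = [abs(x - c) + eps for c in centers]
        rows.append(
            [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        )
    return rows


def fuzzy_c_means_1d(values, initial_centers, m=2.0, iters=50):
    """Alternate the FCM membership and center updates for a fixed number of steps."""
    centers = list(initial_centers)
    for _ in range(iters):
        u = memberships_for(values, centers, m)
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            centers[i] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships_for(values, centers, m)


# Hypothetical feature values (e.g., a flow-level measurement) with two regimes.
values = [1.0, 1.2, 0.9, 10.0, 10.5, 9.8]
centers, u = fuzzy_c_means_1d(values, initial_centers=[0.0, 5.0])
print("centers:", [round(c, 2) for c in centers])
print("membership of 1.0:", [round(v, 3) for v in u[0]])
```

Unlike a hard threshold, each value keeps a graded membership in every fuzzy set, which is what allows elements near a boundary to contribute to more than one set.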

In our method, we first extract 29 basic attributes of the traffics, including the five-tuple (source IP address, destination IP address, source port, destination port, and protocol number), the total number of uplink and downlink packets, the total number of uplink and downlink payload packets, the total amount of uplink and downlink load, flow duration, average load, the maximum load, the minimum load, the average time interval between the uplink and downlink data packets, the minimum time interval, the maximum time interval, and so on. Then, we extract 16 basic attributes of the logs, including date, time, module, PID, type, action, source, destination, translated source and destination, duration, bytes sent and received, application, reason, and so on. We assume that each feature comes from a Gaussian distribution. Then, according to the membership function of fuzzy recognition, we can determine the fuzzy numbers of the maximum fuzzy set. Denoting the center of the maximum fuzzy set as $\mu$, the membership degree as $r_i$ ($i = 1, 2, 3, \ldots, n$), and $\sigma$ as the parameter, the Gaussian fuzzy expression can be represented as follows:

$$y = \exp\left[-\frac{(x - \mu)^2}{\sigma^2}\right] \tag{1}$$

To approximate the maximum membership degree, we design the objective function as

$$g(\sigma) = \sum_{i=1}^{n} \left\{ \exp\left[-\frac{(x_i - \mu)^2}{\sigma^2}\right] - r_i \right\}^2 \tag{2}$$

The corresponding membership function is expressed as

$$A_{ij}(x_j) = \begin{cases} 0, & \left|x_j - \bar{x}_j\right| > 2s_j, \\ 1 - \left(\dfrac{x_j - \bar{x}_j}{2s_j}\right)^2, & \left|x_j - \bar{x}_j\right| \le 2s_j, \end{cases} \tag{3}$$

where $\bar{x}_j$ is the center and $2s_j$ is the standard deviation $\sigma$. Finally, we can identify whether a sample is an anomaly based on the principle of the maximum membership.
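Equations (1) to (3) can be evaluated directly. The sketch below is a minimal illustration: the grid search over candidate values of the parameter is an assumption (the paper does not state how the objective is minimized), and the sample points, function names, and candidate list are hypothetical.

```python
import math


def gaussian_membership(x, mu, sigma):
    """Equation (1): y = exp(-(x - mu)^2 / sigma^2)."""
    return math.exp(-((x - mu) ** 2) / sigma ** 2)


def objective(sigma, xs, rs, mu):
    """Equation (2): squared error between Gaussian memberships and degrees r_i."""
    return sum((gaussian_membership(x, mu, sigma) - r) ** 2 for x, r in zip(xs, rs))


def fit_sigma(xs, rs, mu, candidates):
    """Pick the candidate sigma with the smallest objective (assumed grid search)."""
    return min(candidates, key=lambda s: objective(s, xs, rs, mu))


def piecewise_membership(x, center, s):
    """Equation (3): zero outside 2s of the center, quadratic decay inside."""
    if abs(x - center) > 2 * s:
        return 0.0
    return 1.0 - ((x - center) / (2 * s)) ** 2


# Hypothetical data: membership degrees generated with mu = 0 and sigma = 2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
rs = [gaussian_membership(x, 0.0, 2.0) for x in xs]
best = fit_sigma(xs, rs, mu=0.0, candidates=[0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
print("fitted sigma:", best)
print("A(1.0) with center 0 and s = 1:", piecewise_membership(1.0, 0.0, 1.0))
```

Under the maximum-membership principle, a sample would then be assigned to whichever fuzzy set gives it the largest value of the membership function.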

3.3.2. Anomaly Detection. The key point in anomaly detection is to separate anomalies from benign data according to the extracted features. To achieve this, we can adopt supervised learning methods, such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), neural networks, or decision trees, to design the detection module. Basically, supervised learning first needs to establish a training set and then train a classification model over the training set. For this anomaly detection task, our goal is to learn a classifier that can effectively detect the anomalies. In this work, we adopt the Gradient Boosting Decision Tree (GBDT), which is an advanced machine learning technique that models the data with an ensemble of decision trees. To evaluate the performance of our method, 10-fold cross validation is adopted.
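The evaluation protocol can be sketched independently of the classifier. The example below splits the samples into 10 folds and reports false positive and false negative rates over the held-out predictions. It does not implement GBDT: the midpoint-threshold classifier is a deliberately simple stand-in, and the single synthetic feature, the function names, and the label convention (1 = anomaly) are assumptions for illustration.

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def fp_fn_rates(y_true, y_pred):
    """Return (FP / (FP + TN), FN / (FN + TP)) with label 1 meaning anomaly."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


def cross_validate(xs, ys, fit, predict, k=10):
    """Predict every sample from a model trained on the other folds."""
    y_pred = [None] * len(xs)
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            y_pred[i] = predict(model, xs[i])
    return fp_fn_rates(ys, y_pred)


def fit_threshold(xs, ys):
    """Stand-in classifier (not GBDT): midpoint between the two class means."""
    normal = [x for x, y in zip(xs, ys) if y == 0]
    anomalous = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(normal) / len(normal) + sum(anomalous) / len(anomalous)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic, well-separated data with interleaved labels.
ys = [i % 2 for i in range(20)]
xs = [(5.0 if y else 1.0) + 0.1 * (i % 3) for i, y in enumerate(ys)]
print(cross_validate(xs, ys, fit_threshold, predict_threshold, k=10))
```

Any classifier exposing a fit and a predict step, including a GBDT from an external library, could be passed to `cross_validate` in place of the threshold stand-in.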

4. Experiments

4.1. Dataset. In our experiments, we evaluate our method over 4 types of network attack (XSS, HTTP botnet, P2P botnet, and phishing). These cyberattacks are carefully injected into the normal business and do not bring undesirable effects for other business. Both the traffic data and the log data are collected from university servers and enterprise servers. To obtain the traffic data, we have monitored the university-enterprise network for one month. Specifically, we simulate the P2P botnet and HTTP botnet attacks according to the Contagio blog [21] and white paper [22], which provide guidance about how to make botnets evade intrusion detection techniques. To simulate the XSS attacks, we inject malicious code into the web pages of university servers. The simulated phishing emails are sent to both university servers and enterprise servers. Note that in our simulation, the anomalous traffics only account for 0.1% of the total traffic flows, which is close to real situations. As displayed in Table 3, the collected data include 30 normal traffic datasets, 6 traffic datasets for XSS injection attacks, 5 traffic datasets for phishing emails, and 20 for botnets (13 P2P botnets and 7 HTTP botnets). On the other hand, as displayed in Table 4, the log data are collected from 1 switch, 2 routers, 2 firewalls, and 3 servers.

4.2. Experimental Results. To validate the performance of integrating traffics with logs for anomaly detection, we conduct comparison experiments that leverage only the traffic data (or only the log data). As displayed in Tables 5–8 and Figures 3–6, it is clear that neither traffics nor logs can independently achieve desirable results in detecting cyberattacks, which is consistent with [16]. In contrast, when we integrate the traffic flows with network device logs, the detection performance can be significantly improved (both the False Negative (FN) and False Positive (FP) values decrease significantly). Additionally, we also compare the detection performance of different supervised learning methods, including SVM, KNN, and GBDT. As we can see, the compared methods achieve very similar results, with GBDT slightly better than the others. This effectively demonstrates that the features obtained through integrating traffics with logs are robust for our cyberdetection task.

Table 3: The collection details for traffic data.

Type                 Traffic amount   Name
Normal               30               NA
XSS                  6                NA
Phishing             5                NA
HTTP botnets [20]    7                Virut, Sogou
P2P botnets [21]     13               NSIS.ay, SMTP Spam, Zeus (C&C), UDP Storm, Zeus, Zero access, Weasel

Table 4: The collection details for log data.

Device name   Quantity   Brand
Switch        1          Huawei
Router        2          Cisco, Huawei
Firewall      2          Juniper
Server        3          Cisco

Table 5: The detection results over XSS attack.

XSS                                  FP   FN
10-fold KNN for traffics             82   56
10-fold SVM for traffics             86   58
10-fold KNN for logs                 91   99
10-fold SVM for logs                 90   86
10-fold SVM for logs-and-traffics    52   63
10-fold KNN for logs-and-traffics    62   36
TLCD (GBDT)                          43   25

Table 6: The detection results over phishing email.

Phishing                             FP   FN
10-fold KNN for traffics             71   73
10-fold SVM for traffics             65   73
10-fold KNN for logs                 88   83
10-fold SVM for logs                 79   82
10-fold SVM for logs-and-traffics    50   60
10-fold KNN for logs-and-traffics    55   48
TLCD (GBDT)                          53   49

Table 7: The detection results over HTTP botnet.

HTTP botnet                          FP   FN
10-fold KNN for traffics             55   48
10-fold SVM for traffics             53   50
10-fold KNN for logs                 63   59
10-fold SVM for logs                 63   58
10-fold SVM for logs-and-traffics    36   29
10-fold KNN for logs-and-traffics    38   27
TLCD (GBDT)                          25   28

Table 8: The detection results over P2P botnet.

P2P botnet                           FP   FN
10-fold KNN for traffics             45   46
10-fold SVM for traffics             52   50
10-fold KNN for logs                 64   60
10-fold SVM for logs                 60   59
10-fold SVM for logs-and-traffics    29   29
10-fold KNN for logs-and-traffics    33   29
TLCD (GBDT)                          28   26

Figure 3: The F1 value obtained by each method over the XSS attack (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; one curve per method listed in Table 5).

4.3. Attack Reconstructions. In our experiments, we also evaluate the performance of TLCD on attack reconstruction. In particular, for the detected attacks, we first obtain their time horizon and communication address according to the information of the corresponding anomalous traffics, such as date, time, IP, and so on. With that, we can get the corresponding log features, and then the concrete network devices are determined. Finally, we reconstruct the original attack paths based on the abnormal information above.
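The reconstruction step described above amounts to selecting the log records that share an address with an anomalous flow and fall within its time horizon, then ordering them by time. The sketch below illustrates this under stated assumptions: the record fields (`time`, `src`, `dst`, `ips`, `device`, `message`), the 5-minute window, and all sample values are hypothetical and are not taken from the paper's data.

```python
from datetime import datetime, timedelta


def reconstruct_path(anomalous_flows, log_records, window_seconds=300):
    """Return time-ordered (time, device, message) tuples for matching log records."""
    window = timedelta(seconds=window_seconds)
    matched = set()
    for idx, rec in enumerate(log_records):
        for flow in anomalous_flows:
            same_host = bool({flow["src"], flow["dst"]} & set(rec["ips"]))
            close_in_time = abs(rec["time"] - flow["time"]) <= window
            if same_host and close_in_time:
                matched.add(idx)
                break
    ordered = sorted((log_records[i] for i in matched), key=lambda r: r["time"])
    return [(r["time"], r["device"], r["message"]) for r in ordered]


# Hypothetical anomalous flow and device logs (documentation-range addresses).
day = datetime(2019, 1, 1)
flows = [{"time": day.replace(hour=14, minute=55), "src": "203.0.113.7", "dst": "192.0.2.10"}]
logs = [
    {"time": day.replace(hour=14, minute=56, second=12), "device": "SERVER1",
     "ips": ["192.0.2.10"], "message": "shell command executed"},
    {"time": day.replace(hour=14, minute=54, second=1), "device": "ROUTE1",
     "ips": ["203.0.113.7"], "message": "inbound packet accepted"},
    {"time": day.replace(hour=14, minute=55, second=49), "device": "FIREWALL1",
     "ips": ["203.0.113.7", "192.0.2.10"], "message": "inbound TCP connection built"},
    {"time": day.replace(hour=18), "device": "SERVER1",
     "ips": ["192.0.2.10"], "message": "scheduled job"},
    {"time": day.replace(hour=14, minute=55, second=30), "device": "SWITCH1",
     "ips": ["198.51.100.4"], "message": "unrelated port event"},
]
path = reconstruct_path(flows, logs)
for when, device, message in path:
    print(when.time(), device, message)
```

The time window trades recall for precision: a wider window recovers slower attack stages but also pulls in more unrelated log records from the same hosts.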

Figures 7–10 display our attack reconstruction results for the four simulated cyberattacks. Generally, the XSS attack damages the web server by passing through several network devices, including routers, firewalls, switches, and so on. The phishing attack shares a similar attack process with XSS, except that it adopts the IMAP4 protocol and a fixed port. Different from XSS and phishing, the HTTP botnet does not aim to attack the servers but the hosts, which it reaches by passing through the servers. Note that the HTTP botnet attack is completed by web pages and the ICMP protocol. The P2P botnet attack is implemented not only through the servers but also directly over the hosts. As displayed in Figures 7–10, our reconstructed results accurately reveal the attack paths of the corresponding cyberattacks, which demonstrates that our method can effectively reconstruct the attack process.

Figure 4: The F1 value obtained by each method over the phishing email (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1).

Figure 5: The F1 value obtained by each method over the P2P botnet (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1).

Figure 6: The F1 value obtained by each method over the HTTP botnet (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1).

Figure 7: XSS attack reconstruction (timestamped log entries along the path through ROUTE1, SWITCH, and ROUTE2).

Figure 8: Phishing attack reconstruction (timestamped log entries along the path through ROUTE1, SWITCH, and ROUTE2; the connection uses IMAP4).

5 Conclusion

In this paper we propose to integrate traffics with networkdevice logs for detecting cyberattacks Specifically we use

fuzzy association rules to integrate the device logswith trafficsto obtain the features for attack detection and reconstructionThe experiments over four common network attacks clearlydemonstrate that our TLCD method can effectively detectdiverse cyberattacks and reconstruct their event process

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This paper is supported by National Natural Science Founda-tion of China under grant no 6157211

Security and Communication Networks 9

P2P botnet ATTACK ROUTE1

221858-testwsupper p stashtestnet

pgtget lampsupperlamp -sT -p 15-561232916000

wwwourtestcom

221903-Accept 2021121437 10593139 1

TCPUDP

221913-Accept 2021121437

1921681143 1 TCPUDP

222003-bulit inbound TCP connection 365480 for

outside192168114380 to inside 192168239480 222034-block ICMP echo

req1921681143-gt192168239

222109-accept ICMP echo req1921681143-

gt192168239

PC-a PC-b

SWITCHROUTE2

222141-nstlserverPID6935 21569 59871

222221-192168239 80 POSTcatalogserachasp501 748 536 24

wwwourtestcom mozilla40+(compatible+windosws+51)

222243-Accept 192168239 1921682200 TCPUDP

PC 1

PC n

Figure 9 HTTP botnet attack reconstruction

HTTP botnet ATTACKROUTE1

121539-2021121437

121623-Accept 2021121437 192168120

1 TCPUDP

121711-bulit inbound TCP connection 365480 for

outside19216812080 to inside 19216813355480 121721-block ICMP echo

req192168120-gt19216813355

121743-accept ICMP echo req192168120-gt19216813355

121754-NSTL_01 SHELL5CMDtaskvt0

ip19216813355 user

SWITCHROUTE2

121807-Accept 19216813355 1921682200 TCPUDP

121845-nstlserverPID86948 56396

121901-19216813355 80 POSTcatalogserachasp501 748 536 24 wwwourtestcom mozilla40+(compatible+windosws+51)

PC 1

PC n

lowastlowast

Figure 10 P2P botnet attack reconstruction

References

[1] A. Sood and R. Enbody, Targeted Cyber Attacks: Multi-Staged Attacks Driven by Exploits and Malware, Syngress, 2014.

[2] K. L. Chiew, K. S. C. Yong, and C. L. Tan, "A survey of phishing attacks: their types, vectors and technical approaches," Expert Systems with Applications, vol. 106, pp. 1–20, 2018.

[3] B. Caswell, J. Beale, and A. Baker, Snort Intrusion Detection and Prevention Toolkit, Syngress, 2007.

[4] A. Yaar, A. Perrig, and D. Song, "Pi: a path identification mechanism to defend against DDoS attacks," in Proceedings of the 2003 Symposium on Security and Privacy, SP 2003, pp. 93–107, USA, May 2003.

[5] H. Wenhua and Y. Geng, "Identification method of attack path based on immune intrusion detection," Journal of Networks, vol. 9, no. 4, pp. 964–971, 2014.

[6] B. Parno, D. Wendlandt, E. Shi, A. Perrig, B. Maggs, and Y.-C. Hu, "Portcullis: protecting connection setup from denial-of-capability attacks," ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 289–300, 2007.

[7] K. Kleemola, M. Crete-Nishihata, and J. Scott-Railton, Targeted Attacks against Tibetan and Hong Kong Groups Exploiting CVE-2014-4114, Citizen Lab, June 2015.

[8] J. R. Crandall, G. Wassermann, D. A. de Oliveira, Z. Su, S. F. Wu, and F. T. Chong, "Temporal search: detecting hidden malware timebombs with virtual machines," ACM SIGARCH Computer Architecture News, vol. 34, no. 5, pp. 25–36, 2006.

[9] Z. Gu, K. Pei, Q. Wang, L. Si, X. Zhang, and D. Xu, "LEAPS: detecting camouflaged attacks with statistical learning guided by program analysis," in Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 57–68, Rio de Janeiro, Brazil, June 2015.

[10] K. Pei, Z. Gu, B. Saltaformaggio et al., "Hercule: attack story reconstruction via community discovery on correlated log graph," in Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 583–595, ACM, Los Angeles, Calif, USA, December 2016.

[11] H.-S. Chen, W. Gao, and D. G. Daut, "Signature based spectrum sensing algorithms for IEEE 802.22 WRAN," in Proceedings of the 2007 IEEE International Conference on Communications, ICC'07, pp. 6487–6492, IEEE, UK, June 2007.

[12] J. Zhang and M. Zulkernine, "Anomaly based network intrusion detection with unsupervised outlier detection," in Proceedings of the 2006 IEEE International Conference on Communications, ICC 2006, pp. 2388–2393, IEEE, Turkey, July 2006.

[13] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez et al., "Anomaly-based network intrusion detection: techniques, systems and challenges," Journal of Computers and Security, vol. 28, no. 1-2, pp. 18–28, 2009.

[14] F. Gong, "Deciphering detection techniques: Part ii anomaly-based intrusion detection," White Paper, McAfee Security, 2003.

[15] Y. Yu, "A survey of anomaly intrusion detection techniques," Journal of Computing Sciences in Colleges, vol. 28, no. 1, pp. 9–17, 2012.

[16] F. Sonmez, M. Zontul, O. Kaynar, and H. Tutar, "Anomaly detection using data mining methods in it systems: a decision support application," Sakarya University Journal of Science, vol. 22, no. 4, pp. 1109–1123, 2018.

[17] N. Kamiyama and T. Mori, "Simple and accurate identification of high-rate flows by packet sampling," in Proceedings of the IEEE INFOCOM 2006, 25th IEEE International Conference on Computer Communications, pp. 1–13, IEEE, Barcelona, Spain, April 2006.

[18] G. Carl, G. Kesidis, R. R. Brooks, and S. Rai, "Denial-of-service attack-detection techniques," IEEE Internet Computing, vol. 10, no. 1, pp. 82–89, 2006.

[19] J. Lu, K. Chen, Z. Zhuo, and X. Zhang, "A temporal correlation and traffic analysis approach for APT attacks detection," Cluster Computing, pp. 1–12, 2017.

[20] S. García, M. Grill, J. Stiborek, and A. Zunino, "An empirical comparison of botnet detection methods," Journal of Computers and Security, vol. 45, pp. 100–123, 2014.

[21] M. Parkour, Contagio malware database, https://www.mediafire.com/folder/c2az029ch6cke/TRAFFIC PATTE20RNSCOLLECTION, 2013.

[22] S. Siddiqui, M. S. Khan, K. Ferens, and W. Kinsner, "Detecting advanced persistent threats using fractal dimension based machine learning classification," in Proceedings of the 2nd ACM International Workshop on Security and Privacy Analytics (IWSPA '16), pp. 64–69, ACM, 2016.


can effectively integrate multiple network device logs with traffics by leveraging fuzzy association rules. (3) We conduct extensive evaluation of TLCD over diverse cyberattacks (e.g., phishing, XSS, and botnet). The experimental results clearly demonstrate the effectiveness of our method.

2. Related Work

In general, network intrusion detection mainly includes signature-based detection and anomaly-based detection [11, 12]. Specifically, signature-based detection relies on existing signature databases to detect malware infections. By using signature-based detection methods, malware can be effectively identified through pattern matching. However, signature-based detection techniques have the fatal disadvantage that a new malware infection cannot be detected if its signature is not contained in the signature database.

Anomaly detection is a technique for detecting abnormal behaviors that deviate from normal behaviors [13, 14]. Specifically, it aims to detect events in the monitored domain that differ from the pattern defined by normal behaviors [13, 15]. Basically, the normal behavior of the network needs to be identified first. Compared to signature-based detection, the main advantage of anomaly-based intrusion detection is the ability to detect new or unknown attacks, since abnormal behaviors can still be observed when the signature of new malware is not available. However, due to the complexity of the behaviors from different networks and applications, it is difficult to accurately identify the normal behaviors. The existing anomaly detection methods are usually based on device logs or traffic flows alone [16]. In general, these methods are too simple to achieve satisfactory results [17]. Additionally, they fail to effectively reconstruct the attack conditions [18]. In contrast, our detection method correlates and cross-verifies data from multiple network devices, which can further improve the accuracy and reflect the state of the network environment at the time of the attack.

3. Method

This work proposes to implement anomaly detection by integrating multiple network device logs with traffics. Specifically, the device logs and traffics are integrated through association rules. Our method can effectively improve the detection performance and reconstruct the network attack process, which enables us to grasp a complete view of the entire network environment.

3.1. Method Overview. Due to the diversity of network attacks, the network environment is quite complex. For instance, a botnet must first send control commands to each C&C server and then to the controlled hosts, while a worm first needs to upload malicious code to a target host and then infects other computers through that host. Therefore, the network device logs and traffic flows can play very important roles in cyberattack detection. To collect our data for anomaly detection, we first obtain the traffic flows with port mirroring and adopt TCPDUMP to extract useful traffic attributes (such as the source port, destination port, protocol number, source IP, destination IP, the number of packets, packet size, and send time). Additionally, we also extract log information (such as Data, Time, Module, Level, PID, Type, Action, Application, Reason, etc.) from the gateway's internal routers, switches, firewalls, and servers. After that, we attempt to extract the mapping relations between logs and traffics with association rules. Finally, the extracted relationships can be used to generate the time stamps of log records and reconstruct the attack process.

[Figure 1: The overview of TLCD. Flowchart stages: Traffic Capture and Log Collect; Data Filter; Feature Extraction; Fuzzy Association; Anomaly Detection (Benign or Anomaly); Restore Attacks.]

[Figure 2: The deployment of the traffic captures and log collectors (server-1, firewall-1, router-1, and router-2 are deployed in enterprise. Servers-2, switch-1, firewall-1, and servers-3 are deployed in campus). Diagram nodes: PUBLIC NETWORK, ROUTE1, ROUTE2, FIREWALL1, FIREWALL2, SWITCH1, SERVER1 to SERVER3 (each marked "Set traffic capture"), and PCs.]

In Figure 1, we display the overview of TLCD. The locations of the traffic captures and log collectors are shown in Figure 2. Traffic capture modules are placed at the university servers and enterprise servers, which are connected to the external Internet. In this way, we can capture both the inbound and outbound data in real time. The inputs to TLCD are multiple raw data streams from network devices (e.g., router, switch, firewall, and servers). The detection is specifically designed to be format agnostic for both the traffics and logs. Through a parser plugin, TLCD can handle any input format of traffics and logs. As the detection task usually involves large-scale data, filtering is necessary for reducing the detection cost and time overhead. Therefore, we filter out the irrelevant data in both logs and traffics. In the feature extraction module, we extract 63 traffic features (including 5 new TCP flags) and 16 log features. The log features are mainly used to reveal what happened before and after the cyberattack and to assist in detecting anomalous behaviors. The fuzzy association module aims at modeling strong associations between traffics and logs. After obtaining the confidence intervals of the candidate rules, those with high confidence are used to construct strong association rules. Note that many kinds of mappings are possible here: multiple traffics may relate to one single log, or one traffic may relate to several logs.
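The rule-selection step can be illustrated with a small sketch. It uses crisp (non-fuzzy) support and confidence as a simplification of the fuzzy association rules used by TLCD, and it assumes that traffic flows and log records have already been discretized into symbolic items and grouped by time window. The item names (`syn_only`, `fw_block`, and so on) and both thresholds are hypothetical; the paper does not state them.

```python
from collections import Counter
from itertools import product


def strong_rules(transactions, min_support=0.2, min_confidence=0.8):
    """Return rules (traffic_item -> log_item) with high support and confidence.

    Each transaction is a pair (traffic_items, log_items) observed in the
    same time window. Both thresholds are illustrative assumptions.
    """
    n = len(transactions)
    traffic_count = Counter()
    pair_count = Counter()
    for traffic_items, log_items in transactions:
        traffic_set, log_set = set(traffic_items), set(log_items)
        traffic_count.update(traffic_set)
        # Many-to-many mapping: every traffic item is paired with every log item.
        pair_count.update(product(traffic_set, log_set))

    rules = []
    for (t_item, l_item), both in pair_count.items():
        support = both / n                         # P(traffic item and log item)
        confidence = both / traffic_count[t_item]  # P(log item | traffic item)
        if support >= min_support and confidence >= min_confidence:
            rules.append((t_item, l_item, support, confidence))
    return sorted(rules)


# Hypothetical windows: (traffic items, log items).
windows = [
    ({"syn_only"}, {"fw_block"}),
    ({"syn_only"}, {"fw_block"}),
    ({"syn_only", "http_post"}, {"fw_block", "web_access"}),
    ({"http_post"}, {"web_access"}),
    ({"http_post"}, {"fw_accept"}),
]
rules = strong_rules(windows)
for rule in rules:
    print(rule)
```

Only the rule from `syn_only` to `fw_block` survives here, because the other candidate pairs co-occur too rarely to reach the confidence threshold.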

To present the mechanisms of TLCD, we list the details as follows:

(1) Phase I: preprocessing. The traffic captures and log collectors first collect the original traffics and logs, which are considered as the inputs to TLCD. Then, these traffic-log inputs are processed via the parser plugin and the filter module. Specifically, the filter module filters out the irrelevant data in both logs and traffics.

(2) Phase II: feature extraction. The feature extraction module consists of five components: the traffic correlation (to analyze the traffic packets for malware), temporal correlation (to obtain the time characteristics of malware), combination correlation (to model the strong relevance of malware traffic), TCP flag (to record the sending and responding of traffic data), and log-information (to record the log information for attack reconstruction) components.

(3) Phase III: fuzzy association module. This module aims to integrate the traffics with logs through association rules.

(4) Phase IV: anomaly detection module. Advanced machine learning techniques are adopted to recognize malicious data as anomalies.

(5) Phase V: attack reconstruction module. The reconstruction module leverages the associations between the logs and traffics to generate the time stamps of log records, which can finally be used to reconstruct the attack process.

Table 1: Details of the TCP flags.

TCP flags
TCP handshake situation
ACK, URG, FIN, and RST values
The destination IP repeatedly responds with ACK = 1
The destination IP only has ACK = 1, SYN = 1, and FIN = 1
The source IP only has SYN = 1

3.2. Feature Extraction

3.2.1. Network Traffics. To collect the traffic-log data, we have monitored the university-enterprise network for one month. The features used in our framework include temporal correlation features [19], TCP flag features (displayed in Table 1), and log features (displayed in Table 2). Usually, some cyberattacks, such as botnets and phishing emails, need to automatically send commands through programs. These automatic attack commands more or less contain inherent patterns. Specifically, we capture the traffics from 4 common cyberattacks (XSS, HTTP botnet, P2P botnet, and phishing) and find that different network attacks behave differently in the TCP handshake stage. For instance, the phishing mail transmission process adopts the POP3 and IMAP4 protocols, allowing attackers to send different types of files. Also, the corresponding data volume can be very small. Therefore, phishing mails can produce the same TCP handshake states as normal ones, and it is relatively easy for them to establish a connection. However, in a botnet, an attacker needs to control the C&C server and send a large number of commands, which will inevitably cause failures in the TCP handshake.

In Table 1, we display the details of the TCP flags. Specifically, SYN denotes whether a connection is established, FIN and ACK denote the corresponding responses, and RST denotes the connection reset. Note that the ACK information can be used together with SYN and FIN as evidence for attack detection. For instance, if both SYN and ACK are activated, the connection is established with confirmation. In contrast, if only SYN is activated, we can conclude that the connection is established without confirmation. Usually, most of the unreachable attacks can only activate SYN. Additionally, for the situation with FIN and RST activated and SYN unactivated, the firewalls may still detect the SYN/FIN packet. When such a packet appears, it is most likely that the network has been attacked. As the ACK/FIN packet represents a completed TCP connection, a normal FIN packet is always marked by ACK. A "NULL" packet is a packet not marked by any TCP flag (URG, ACK, PSH, RST, SYN, and FIN are all set to 0). For normal network activities, the TCP stack never generates packets with such unreasonable TCP flag combinations; their presence therefore indicates that the network has probably been attacked. Consequently, the TCP flag features can provide useful information about the network status [19].

Table 2: Details of the network device logs.

Log types: Firewall logs, Traffic logs, Event logs, Network logs, Security logs, System logs, Cron logs, Mail logs, Messages logs, Mysqld logs.

Attribute                 Log types marked (of 10)
Data                      10
Time                      10
Module                    3
Level                     7
PID                       6
Type                      7
Action                    5
Source                    1
Destination               1
Translated Source         1
Translated Destination    1
Duration                  1
Bytes Sent                1
Bytes Received            1
Application               2
Reason                    10
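The flag combinations discussed in Section 3.2.1 can be turned into a simple labeling function. The sketch below is illustrative only: the label names and the `SUSPICIOUS` set are hypothetical, and it covers just the combinations named in the text rather than the full set of 63 traffic features.

```python
def classify_tcp_flags(flags):
    """Label a collection of TCP flag names using the combinations from the text."""
    flags = {f.upper() for f in flags}
    if not flags:
        return "null-packet"        # URG, ACK, PSH, RST, SYN, FIN all 0
    if {"SYN", "FIN"} <= flags:
        return "syn-fin"            # open and close requested at once
    if "FIN" in flags and "ACK" not in flags:
        return "fin-without-ack"    # a normal FIN is always marked by ACK
    if flags == {"SYN"}:
        return "syn-only"           # connection attempt without confirmation
    if {"SYN", "ACK"} <= flags:
        return "syn-ack"            # connection established with confirmation
    return "other"


# Combinations that a normal TCP stack should never generate (assumed set).
SUSPICIOUS = {"null-packet", "syn-fin", "fin-without-ack"}

for packet in ([], ["SYN", "FIN"], ["FIN", "RST"], ["SYN"], ["SYN", "ACK"], ["ACK", "FIN"]):
    label = classify_tcp_flags(packet)
    print(packet, label, label in SUSPICIOUS)
```

A SYN-only packet is labeled but not treated as suspicious by itself, since it is also the first packet of every normal handshake; it becomes informative only when counted over a flow.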

3.2.2. Network Device Logs. In Table 2, we display the details of the network device logs. As we can see, different types of network device logs have different characteristics. Specifically, the firewall logs record the events between the inside and outside of the network, such as port filtering, hazard level, and authentication; the traffic logs record current traffic conditions, such as packet size, IP address, and duration; the event logs record events that occur during the execution of the system in order to provide traces for activity monitoring and problem diagnosis; the network logs record the process of network access, such as data packet requests or uploads; the security logs mainly record the operations of network devices and the system errors; the system logs record the hardware and software errors as well as events that occur in the monitoring system, allowing the user to check the cause of errors and find traces left by the attacker; the Cron logs record periodic tasks in Linux (Cron reads the configuration files and writes them in memory when Linux starts; as some cyberattacks are cyclical, Cron logs are effective for identifying this type of attack); the mail logs allow the administrators to get copies of the messages processed by the Domino system router (when the mail log is enabled, Domino checks the messages as they go through MAIL.BOX and saves their copies to MAILJRN.NSF for future recovery); the message logs are plain text files that are checked first for error messages when a problem occurs; and the MySQL logs contain the information of log-err, query log, log-slow-queries, log-update, and log-bin. By default, all logs are created in the MySQL directory.

In this work, we extract attributes from ten types of logs from different network devices. Each type of log has its own attributes. For instance, the firewall log has the attributes of data, time, module, level, type, and reason, while the traffic logs have the attributes of data, time, action, source, destination, translated source, translated destination, duration, bytes sent, bytes received, application, and reason. Although different logs reflect different characteristics of the device status, the shared attributes, such as time, date, and reason, can be effectively used to infer the status of one event in different logs.

3.3. Anomaly Detection

3.3.1. Feature Integration through Association Rules. In daily networks, there are no direct correlations between the log data and the traffic data. However, they can be correlated through shared attributes such as time and date. Therefore, we need to model the mutual mappings between traffics and logs by effectively leveraging the correlated attributes. With that, we can obtain the classification boundaries of the log attributes based on the corresponding attributes of the traffics.
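As a minimal sketch of correlating the two sources through shared attributes, the function below joins each traffic flow to the log records that fall inside its time interval and mention one of its endpoints. The record layout (`start`, `end`, `src`, `dst`, `time`, `ips`) and the 5-second tolerance are assumptions made for illustration; the addresses come from the reserved documentation ranges.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def match_logs_to_flows(flows, logs, tolerance=timedelta(seconds=5)):
    """Map each flow index to the indices of time- and address-matching logs."""
    # Sort log indices by timestamp once so each flow needs only two bisections.
    ordered = sorted(range(len(logs)), key=lambda i: logs[i]["time"])
    times = [logs[i]["time"] for i in ordered]
    matches = {}
    for f_idx, flow in enumerate(flows):
        lo = bisect_left(times, flow["start"] - tolerance)
        hi = bisect_right(times, flow["end"] + tolerance)
        endpoints = {flow["src"], flow["dst"]}
        matches[f_idx] = [
            ordered[k]
            for k in range(lo, hi)
            if endpoints & set(logs[ordered[k]]["ips"])
        ]
    return matches


t0 = datetime(2019, 1, 1, 14, 55, 0)
flows = [{"start": t0, "end": t0 + timedelta(seconds=10),
          "src": "198.51.100.7", "dst": "192.0.2.10"}]
logs = [
    {"time": t0 - timedelta(seconds=60), "ips": ["198.51.100.7"]},               # too early
    {"time": t0 + timedelta(seconds=3), "ips": ["198.51.100.7", "192.0.2.10"]},  # match
    {"time": t0 + timedelta(seconds=12), "ips": ["192.0.2.10"]},                 # within tolerance
    {"time": t0 + timedelta(seconds=4), "ips": ["203.0.113.9"]},                 # unrelated host
]
matches = match_logs_to_flows(flows, logs)
print(matches)
```

One flow can map to several log records and one log record to several flows, which matches the many-to-many mappings mentioned in Section 3.1.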

The discretization of traffic features, which is useful for boundary division, plays an important role in detecting anomalous traffics. Specifically, we use the Fuzzy C-Means (FCM) algorithm to divide the traffic characteristics (including quantitative attributes and Boolean attributes) into several fuzzy sets. Note that the elements and nonelements of each fuzzy set can be mutually transformed in order to achieve the goal of softening the features. When processing highly skewed data, the FCM algorithm can effectively model the actual distribution of the data and clearly reveal the boundary between normal data and anomalies.
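To make the discretization step concrete, the following is a minimal one-dimensional Fuzzy C-Means sketch in plain Python. It is not the authors' implementation: the fuzzifier `m = 2`, the evenly spaced initial centers, the stopping tolerance, and the sample packet sizes are all assumptions chosen for illustration.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Cluster 1-D values into fuzzy sets.

    Returns (centers, memberships); memberships[k][i] is the degree to which
    values[k] belongs to cluster i, computed in the last iteration performed.
    """
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centers evenly spaced over the data range.
    centers = [lo + (hi - lo) * (i + 0.5) / n_clusters for i in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point lying exactly on a center belongs fully to it.
                zeros = dists.count(0.0)
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
            memberships.append(row)
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical packet sizes in bytes: small control packets and large payloads.
packet_sizes = [60, 62, 64, 66, 1400, 1420, 1460, 1500]
centers, memberships = fuzzy_c_means(packet_sizes)
print([round(c, 1) for c in centers])
print([round(u, 3) for u in memberships[0]])
```

Each value receives a membership degree in every fuzzy set rather than a hard label, so a value near a boundary contributes to both sets.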

In our method, we first extract 29 basic attributes of the traffics, including the five-tuple (source IP address, destination IP address, source port, destination port, and protocol number), the total number of uplink and downlink packets, the total number of uplink and downlink payload packets, the total amount of uplink and downlink load, flow duration, average load, the maximum load, the minimum load, average time interval between the uplink and downlink data packets, the minimum time interval, the maximum time interval, and so on. Then, we extract 16 basic attributes of the logs, including data, time, module, PID, type, action, source, destination, translated source and destination, duration, bytes sent and received, application, reason, and so on. We assume that each feature comes from a Gaussian distribution. Then, according to the membership function of fuzzy recognition, we can determine the fuzzy numbers of the maximum fuzzy set. Denoting the center of the maximum fuzzy set as $\mu$, the membership degree as $r_i$ ($i = 1, 2, 3, \ldots, n$), and $\sigma$ as the parameter, the Gaussian fuzzy expression can be represented as follows:

$$y = \exp\left[-\frac{(x - \mu)^2}{\sigma^2}\right]. \tag{1}$$

To approximate the maximum membership degree, we design the objective function as

$$g(\sigma) = \sum_{i=1}^{n} \left\{ \exp\left[-\frac{(x_i - \mu)^2}{\sigma^2}\right] - r_i \right\}^2. \tag{2}$$

The corresponding membership function is expressed as

$$A_{ij}(x_j) = \begin{cases} 0, & \left|x_j - \bar{x}_j\right| > 2 s_j, \\ 1 - \left( \dfrac{x_j - \bar{x}_j}{2 s_j} \right)^2, & \left|x_j - \bar{x}_j\right| \le 2 s_j, \end{cases} \tag{3}$$

where $\bar{x}_j$ is the center and $2 s_j$ is the standard deviation $\sigma$. Finally, we can identify whether a sample is an anomaly based on the principle of the maximum membership.
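Equations (1) to (3) can be exercised numerically as follows. The sketch fits $\sigma$ by minimizing the objective in (2) with a golden-section search, which is an assumption made here (the paper does not say how the minimum is found) and relies on the objective having a single minimum on the search interval. The fuzzy-set names and parameters in `fuzzy_sets` are hypothetical.

```python
import math


def gaussian_membership(x, mu, sigma):
    """Eq. (1): exp(-(x - mu)^2 / sigma^2)."""
    return math.exp(-((x - mu) ** 2) / sigma ** 2)


def objective(sigma, xs, rs, mu):
    """Eq. (2): squared error between Gaussian memberships and target degrees r_i."""
    return sum((gaussian_membership(x, mu, sigma) - r) ** 2 for x, r in zip(xs, rs))


def fit_sigma(xs, rs, mu, lo=1e-3, hi=100.0, iters=200):
    """Minimise Eq. (2) over sigma by golden-section search on [lo, hi]."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    for _ in range(iters):
        if objective(c, xs, rs, mu) < objective(d, xs, rs, mu):
            b = d
        else:
            a = c
        c = b - phi * (b - a)
        d = a + phi * (b - a)
    return (a + b) / 2


def quadratic_membership(x, center, s):
    """Eq. (3): 1 - ((x - center) / (2 s))^2 inside the band, 0 outside it."""
    if abs(x - center) > 2 * s:
        return 0.0
    return 1.0 - ((x - center) / (2 * s)) ** 2


def assign(x, fuzzy_sets):
    """Principle of maximum membership: pick the set with the largest degree."""
    return max(fuzzy_sets, key=lambda name: quadratic_membership(x, *fuzzy_sets[name]))


# Synthetic check: degrees generated with sigma = 2 should be recovered.
xs = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rs = [gaussian_membership(x, 5.0, 2.0) for x in xs]
sigma_hat = fit_sigma(xs, rs, mu=5.0)
print(round(sigma_hat, 4))

fuzzy_sets = {"normal": (60.0, 10.0), "anomaly": (1400.0, 100.0)}  # (center, s)
print(assign(70.0, fuzzy_sets), assign(1350.0, fuzzy_sets))
```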

3.3.2. Anomaly Detection. The key point in anomaly detection is to separate anomalies from benign data according to the extracted features. To achieve this, we adopt supervised learning methods, such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), neural networks, or decision trees, to design the detection module. Basically, supervised learning first needs to establish a training set and then train a classification model over the training set. For this anomaly detection task, our goal is to learn a classifier that can effectively detect the anomalies. In this work, we adopt the Gradient Boosting Decision Tree (GBDT), which is an advanced machine learning technique that models the data with an ensemble of decision trees. To evaluate the performance of our method, 10-fold cross validation is adopted in our work.
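The evaluation protocol can be sketched without any external library. The example below implements 10-fold cross validation and, to stay dependency-free, uses a small KNN classifier as a stand-in for GBDT; it is therefore an illustration of the protocol, not of TLCD's classifier. The synthetic two-cluster data and the function names are assumptions.

```python
import math
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train, query, k=3):
    """Majority label among the k training samples nearest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(samples, k_folds=10, k_neighbors=3):
    """Return the accuracy obtained by holding out each fold in turn."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k_folds):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        correct += sum(
            knn_predict(train, samples[i][0], k_neighbors) == samples[i][1]
            for i in held_out
        )
    return correct / len(samples)


# Synthetic, well-separated feature vectors: (features, label).
samples = [((float(i), 0.0), "benign") for i in range(20)]
samples += [((float(i) + 100.0, 50.0), "anomaly") for i in range(20)]
accuracy = cross_validate(samples)
print(accuracy)
```

Every sample is used for testing exactly once and for training in the other nine folds, so the reported score does not depend on a single train/test split.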

4. Experiments

4.1. Dataset. In our experiments, we evaluate our method over 4 types of network attack (XSS, HTTP botnet, P2P botnet, and phishing). These cyberattacks are carefully injected into normal business traffic and do not adversely affect other services. Both the traffic data and the log data are collected from university servers and enterprise servers. To obtain the traffic data, we have monitored the university-enterprise network for one month. Specifically, we simulate the P2P botnet and HTTP botnet attacks according to the Contagio blog [21] and a white paper [22], which provide guidance about how to make botnets evade intrusion detection techniques. To simulate the XSS attacks, we inject malicious code into the web pages of university servers. The simulated phishing emails are sent to both university servers and enterprise servers. Note that, in our simulation, the anomalous traffics only account for 0.1% of the total traffic flows, which is close to real situations. As displayed in Table 3, the collected data include 30 normal traffic datasets, 6 traffic datasets for XSS injection attacks, 5 traffic datasets for phishing emails, and 20 for botnets (13 P2P botnets and 7 HTTP botnets). On the other hand, as displayed in Table 4, the log data are collected from 1 switch, 2 routers, 2 firewalls, and 3 servers.

Table 3: The collection details for traffic data.

Type                Traffic amount   Name
Normal              30               N/A
XSS                 6                N/A
Phishing            5                N/A
HTTP botnets [20]   7                Virut, Sogou
P2P botnets [21]    13               NSIS.ay, SMTP Spam, Zeus (C&C), UDP Storm, Zeus, Zero access, Weasel

Table 4: The collection details for log data.

Device name   Quantity   Brand
Switch        1          Huawei
Router        2          Cisco/Huawei
Firewall      2          Juniper
Server        3          Cisco

4.2. Experimental Results. To validate the performance of integrating traffics with logs for anomaly detection, we conduct comparison experiments that leverage only the traffic data or only the log data. As displayed in Tables 5–8 and Figures 3–6, it is clear that neither traffics nor logs can independently achieve desirable results in detecting cyberattacks (both the false negative (FN) and false positive (FP) values are clearly higher than those obtained with the combined features), which is consistent with [16]. In contrast, when we integrate the traffic flows with network device logs, the detection performance is significantly improved. Additionally, we also compare the detection performance of different supervised learning methods, including SVM, KNN, and GBDT. As we can see, these methods achieve very similar results on the integrated features, with GBDT slightly better than the others in most cases. This demonstrates that the features obtained through integrating traffics with logs are robust for our detection task.

Table 5: The detection results over XSS attack.

XSS                                  FP (%)   FN (%)
10-fold KNN for traffics             8.2      5.6
10-fold SVM for traffics             8.6      5.8
10-fold KNN for logs                 9.1      9.9
10-fold SVM for logs                 9.0      8.6
10-fold SVM for logs-and-traffics    5.2      6.3
10-fold KNN for logs-and-traffics    6.2      3.6
TLCD (GBDT)                          4.3      2.5

Table 6: The detection results over phishing email.

Phishing                             FP (%)   FN (%)
10-fold KNN for traffics             7.1      7.3
10-fold SVM for traffics             6.5      7.3
10-fold KNN for logs                 8.8      8.3
10-fold SVM for logs                 7.9      8.2
10-fold SVM for logs-and-traffics    5.0      6.0
10-fold KNN for logs-and-traffics    5.5      4.8
TLCD (GBDT)                          5.3      4.9

Table 7: The detection results over HTTP botnet.

HTTP botnet                          FP (%)   FN (%)
10-fold KNN for traffics             5.5      4.8
10-fold SVM for traffics             5.3      5.0
10-fold KNN for logs                 6.3      5.9
10-fold SVM for logs                 6.3      5.8
10-fold SVM for logs-and-traffics    3.6      2.9
10-fold KNN for logs-and-traffics    3.8      2.7
TLCD (GBDT)                          2.5      2.8

Table 8: The detection results over P2P botnet.

P2P botnet                           FP (%)   FN (%)
10-fold KNN for traffics             4.5      4.6
10-fold SVM for traffics             5.2      5.0
10-fold KNN for logs                 6.4      6.0
10-fold SVM for logs                 6.0      5.9
10-fold SVM for logs-and-traffics    2.9      2.9
10-fold KNN for logs-and-traffics    3.3      2.9
TLCD (GBDT)                          2.8      2.6

[Figure 3: The F1 value obtained by each method over the XSS attack. Line plot of F1 (0 to 1) against timestamp delta (0 to 7) for the seven methods listed in Table 5.]
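For reference, the quantities reported in Tables 5–8 and Figures 3–6 can be computed from predictions as sketched below. The paper does not spell out its definitions, so this assumes the standard ones: FP as the false positive rate FP / (FP + TN), FN as the false negative rate FN / (FN + TP), and F1 as 2TP / (2TP + FP + FN). The labels in the example are made up.

```python
def detection_metrics(y_true, y_pred, positive="anomaly"):
    """Return false positive rate, false negative rate, and F1 for one class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"fp_rate": fp_rate, "fn_rate": fn_rate, "f1": f1}


# Hypothetical predictions: 4 anomalies and 6 benign samples.
y_true = ["anomaly"] * 4 + ["benign"] * 6
y_pred = ["anomaly", "anomaly", "anomaly", "benign",
          "anomaly", "benign", "benign", "benign", "benign", "benign"]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

With anomalies making up only a small fraction of the traffic, accuracy alone would look high even for a detector that misses every attack, which is why the rates and F1 are reported instead.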

4.3. Attack Reconstructions. In our experiments, we also evaluate the performance of TLCD on attack reconstruction. In particular, for each detected attack, we first obtain its time horizon and communication addresses from the information of the corresponding anomalous traffics, such as date, time, and IP. With that, we can retrieve the corresponding log features and then determine the concrete network devices involved. Finally, we reconstruct the original attack paths based on this abnormal information.
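The reconstruction procedure can be sketched as a filter followed by a sort. The code below is a simplified illustration under assumed record layouts (`time`, `device`, `ips`, `message`); the 5-minute margin, the device names, and the sample records are hypothetical, and the addresses come from the reserved documentation ranges.

```python
from datetime import datetime, timedelta


def reconstruct_path(anomalous_flow, device_logs, margin=timedelta(minutes=5)):
    """Return time-ordered (time, device, message) records related to the flow.

    A record is kept when it falls inside the flow's time horizon, widened by
    `margin`, and mentions one of the flow's communication addresses.
    """
    start = anomalous_flow["start"] - margin
    end = anomalous_flow["end"] + margin
    addresses = {anomalous_flow["src"], anomalous_flow["dst"]}
    hits = [
        (rec["time"], rec["device"], rec["message"])
        for rec in device_logs
        if start <= rec["time"] <= end and addresses & set(rec["ips"])
    ]
    hits.sort(key=lambda hit: hit[0])
    return hits


def device_sequence(hits):
    """Collapse consecutive records from the same device into one hop."""
    path = []
    for _, device, _ in hits:
        if not path or path[-1] != device:
            path.append(device)
    return path


t = datetime(2019, 1, 1, 14, 55, 0)
flow = {"start": t, "end": t + timedelta(seconds=60),
        "src": "198.51.100.7", "dst": "192.0.2.13"}
device_logs = [
    {"time": t + timedelta(seconds=49), "device": "FIREWALL1",
     "ips": ["198.51.100.7", "192.0.2.13"], "message": "built inbound TCP connection"},
    {"time": t + timedelta(seconds=12), "device": "ROUTE1",
     "ips": ["198.51.100.7"], "message": "Accept"},
    {"time": t + timedelta(seconds=72), "device": "SERVER1",
     "ips": ["192.0.2.13"], "message": "shell command"},
    {"time": t + timedelta(seconds=109), "device": "ROUTE2",
     "ips": ["192.0.2.13"], "message": "Accept"},
    {"time": t + timedelta(seconds=30), "device": "SWITCH1",
     "ips": ["203.0.113.9"], "message": "unrelated host"},
    {"time": t + timedelta(hours=2), "device": "ROUTE1",
     "ips": ["198.51.100.7"], "message": "outside the time horizon"},
]
hits = reconstruct_path(flow, device_logs)
path = device_sequence(hits)
print(path)
```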

Figures 7–10 display our attack reconstruction results for the four simulated cyberattacks. Generally, the XSS attack damages the web server by passing through several network devices, including routers, firewalls, switches, and so on. The phishing attack shares a similar attack process with XSS, except that it adopts the IMAP4 protocol and a fixed port. Unlike XSS and phishing, the HTTP botnet does not aim to attack the servers but rather the hosts behind them. Note that the HTTP botnet attack is completed by web pages and the ICMP protocol. The P2P botnet attack is implemented not only through the servers but also directly over the hosts. As displayed in Figures 7–10, our reconstructed results accurately reveal the attack paths of the corresponding cyberattacks, which demonstrates that our method can effectively reconstruct the attack process.

[Figure 4: The F1 value obtained by each method over the phishing email. Line plot of F1 (0 to 1) against timestamp delta (0 to 7) for the seven methods listed in Table 6.]

[Figure 5: The F1 value obtained by each method over the P2P botnet. Line plot of F1 (0 to 1) against timestamp delta (0 to 7) for the seven methods listed in Table 8.]

[Figure 6: The F1 value obtained by each method over the HTTP botnet. Line plot of F1 (0 to 1) against timestamp delta (0 to 7) for the seven methods listed in Table 7.]

[Figure 7: XSS attack reconstruction. Diagram of the reconstructed path across ROUTE1, SWITCH, and ROUTE2, annotated with timestamped log entries (Accept records over TCP, a "built inbound TCP connection" record, an NSTL_01 shell command record, and nstlserver process IDs).]

[Figure 8: Phishing attack reconstruction. Diagram of the reconstructed path across ROUTE1, SWITCH, and ROUTE2, annotated with timestamped log entries (Accept records over TCP (IMAP4), a "built inbound TCP connection" record, an NSTL_01 shell command record, and nstlserver process IDs).]

5. Conclusion

In this paper, we propose to integrate traffics with network device logs for detecting cyberattacks. Specifically, we use fuzzy association rules to integrate the device logs with traffics to obtain the features for attack detection and reconstruction. The experiments over four common network attacks demonstrate that our TLCD method can effectively detect diverse cyberattacks and reconstruct their event process.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China under grant no. 6157211.

Security and Communication Networks 9

P2P botnet ATTACK ROUTE1

221858-testwsupper p stashtestnet

pgtget lampsupperlamp -sT -p 15-561232916000

wwwourtestcom

221903-Accept 2021121437 10593139 1

TCPUDP

221913-Accept 2021121437

1921681143 1 TCPUDP

222003-bulit inbound TCP connection 365480 for

outside192168114380 to inside 192168239480 222034-block ICMP echo

req1921681143-gt192168239

222109-accept ICMP echo req1921681143-

gt192168239

PC-a PC-b

SWITCHROUTE2

222141-nstlserverPID6935 21569 59871

222221-192168239 80 POSTcatalogserachasp501 748 536 24

wwwourtestcom mozilla40+(compatible+windosws+51)

222243-Accept 192168239 1921682200 TCPUDP

PC 1

PC n

Figure 9 HTTP botnet attack reconstruction

HTTP botnet ATTACKROUTE1

121539-2021121437

121623-Accept 2021121437 192168120

1 TCPUDP

121711-bulit inbound TCP connection 365480 for

outside19216812080 to inside 19216813355480 121721-block ICMP echo

req192168120-gt19216813355

121743-accept ICMP echo req192168120-gt19216813355

121754-NSTL_01 SHELL5CMDtaskvt0

ip19216813355 user

SWITCHROUTE2

121807-Accept 19216813355 1921682200 TCPUDP

121845-nstlserverPID86948 56396

121901-19216813355 80 POSTcatalogserachasp501 748 536 24 wwwourtestcom mozilla40+(compatible+windosws+51)

PC 1

PC n

lowastlowast

Figure 10 P2P botnet attack reconstruction

References

[1] A Sood and R Enbody Targeted Cyber Attacks Multi-StagedAttacks Driven by Exploits and Malware Syngress 2014

[2] K L Chiew K S C Yong and C L Tan ldquoA survey of phishingattacks Their types vectors and technical approachesrdquo ExpertSystems with Applications vol 106 pp 1ndash20 2018

[3] B Caswell J Beale and A Baker Snort Intrusion Detection andPrevention Toolkit Syngress 2007

[4] A Yaar A Perrig and D Song ldquoPi a path identificationmechanism to defend against DDoS attacksrdquo in Proceedings ofthe 2003 Symposium on Security and Privacy SP 2003 pp 93ndash107 USA May 2003

10 Security and Communication Networks

[5] H Wenhua and Y Geng ldquoIdentification method of attack pathbased on immune intrusion detectionrdquo Journal of Networks vol9 no 4 pp 964ndash971 2014

[6] B Parno D Wendlandt E Shi A Perrig B Maggs and Y-C Hu ldquoPortcullis protecting connection setup from denial-of-capability attacksrdquoACM SIGCOMMComputer CommunicationReview vol 37 no 4 pp 289ndash300 2007

[7] K Kleemola M Crete-Nishihata and J Scott-Railton TargetedAttacks against Tibetan and Hong Kong Groups ExploitingCVE-2014-4114 Citizen Lab June 2015

[8] J R Crandall GWassermannD A deOliveira Z Su S FWuand F T Chong ldquoTemporal search detecting hidden malwaretimebombs with virtual machinesrdquo ACM SIGARCH ComputerArchitecture News vol 34 no 5 pp 25ndash36 2006

[9] Z Gu K Pei Q Wang L Si X Zhang and D Xu ldquoLEAPSdetecting camouflaged attacks with statistical learning guidedby program analysisrdquo in Proceedings of the 2015 45th AnnualIEEEIFIP International Conference on Dependable Systems andNetworks (DSN) pp 57ndash68 Rio de Janeiro Brazil June 2015

[10] K Pei Z Gu B Saltaformaggio et al ldquoHercule attack storyreconstruction via community discovery on correlated loggraphrdquo in Proceedings of the the 32nd Annual Conferenceon Computer Security Applications pp 583ndash595 ACM LosAngeles Calif USA December 2016

[11] H-S ChenW Gao andD G Daut ldquoSignature based spectrumsensing algorithms for IEEE 80222 WRANrdquo in Proceedings ofthe 2007 IEEE International Conference on CommunicationsICCrsquo07 pp 6487ndash6492 IEEE UK June 2007

[12] J Zhang andMZulkernine ldquoAnomaly based network intrusiondetection with unsupervised outlier detectionrdquo in Proceedingsof the 2006 IEEE International Conference on CommunicationsICC 2006 pp 2388ndash2393 IEEE Turkey July 2006

[13] P Garcia-Teodoro J Diaz-Verdejo G Macia-Fernandez etal ldquoAnomaly-based network intrusion detection techniquessystems and challengesrdquo Journal of Computers and Security vol28 no 1-2 pp 18ndash28 2009

[14] F Gong ldquoDeciphering detection techniques Part ii anomaly-based intrusion detectionrdquoWhite PaperMcAfee Security 2003

[15] Y Yu ldquoA survey of anomaly intrusion detection techniquesrdquoJournal of Computing Sciences in Colleges vol 28 no 1 pp 9ndash172012

[16] F Sonmez M Zontul O Kaynar and H Tutar ldquoAnomalydetection using data mining methods in it systems a decisionsupport applicationrdquo Sakarya University Journal of Science vol22 no 4 pp 1109ndash1123 2018

[17] N Kamiyama and T Mori ldquoSimple and accurate identificationof high-rate flows by packet samplingrdquo in Proceedings of theProceedings IEEE INFOCOM 2006 25TH IEEE InternationalConference on Computer Communications pp 1ndash13 IEEEBarcelona Spain April 2006

[18] G Carl G Kesidis R R Brooks and S Rai ldquoDenial-of-serviceattack-detection techniquesrdquo IEEE Internet Computing vol 10no 1 pp 82ndash89 2006

[19] J Lu K Chen Z Zhuo and X Zhang ldquoA temporal correlationand traffic analysis approach for APT attacks detectionrdquoClusterComputing pp 1ndash12 2017

[20] S Garcıa M Grill J Stiborek and A Zunino ldquoAn empiricalcomparison of botnet detectionmethodsrdquo Journal of Computersand Security vol 45 pp 100ndash123 2014

[21] M Parkour Contagio malware database httpswwwmedi-afirecomfolderc2az029ch6ckeTRAFFIC PATTE20RNSCOLLECTION 2013

[22] S Siddiqui M S Khan K Ferens and W Kinsner ldquoDetectingadvanced persistent threats using fractal dimension basedmachine learning classificationrdquo in Proceedings of the 2ndACM International Workshop on Security and Privacy Analytics(IWSPA rsquo16) pp 64ndash69 ACM 2016


Security and Communication Networks 3

[Figure: network topology showing the public network, ROUTE1, ROUTE2, FIREWALL1, FIREWALL2, SWITCH1, SERVER1, SERVER2, SERVER3 (each server with a traffic capture), further servers, and PCs.]

Figure 2: The deployment of the traffic captures and log collectors (server-1, firewall-1, router-1, and router-2 are deployed in the enterprise; server-2, switch-1, firewall-1, and server-3 are deployed on the campus).

overhead. Therefore, we filter out the irrelevant data in both logs and traffics. In the feature extraction module, we extract 63 traffic features (including 5 new TCP flags) and 16 log features. The log features are mainly used to reveal what happened before and after the cyberattack and to assist in detecting anomalous behaviors. The fuzzy association module models strong associations between traffics and logs. After obtaining the confidence intervals of the candidate rules, those with high confidence intervals are used to construct strong association rules. Note that several kinds of mappings are possible here: multiple traffics may relate to one single log, or one traffic may relate to several logs.

To present the mechanisms of TLCD, we list the details as follows:

(1) Phase I, reprocessing: the traffic captures and log collectors first collect the original traffics and logs, which are the inputs to TLCD. These traffic-log inputs are then reprocessed via the parser plugin and the filter module. Specifically, the filter module filters out the irrelevant data in both logs and traffics.

(2) Phase II, feature extraction: the feature extraction module consists of five components, namely, the traffic correlation (to analyze the traffic packets for malware), temporal correlation (to obtain the time characteristics of malware), combination correlation (to model the strong relevance of malware traffic), TCP flag (to record the sending and responding of traffic data), and log-information (to record the log information for attack reconstruction) components.

(3) Phase III, fuzzy association module: this module integrates the traffics with logs through association rules.

(4) Phase IV, anomaly detection module: advanced machine learning techniques are adopted to recognize malicious data as anomalies.

(5) Phase V, attack reconstruction module: the reconstruction module leverages the associations between the logs and traffics to generate the time stamps of log records, which can finally be used to reconstruct the attack process.

Table 1: Details of the TCP flags.

TCP flags
TCP handshake situation
ACK, URG, FIN, and RST values
The destination IP repeatedly responds with ACK = 1
The destination IP only has ACK = 1, SYN = 1, and FIN = 1
The source IP only has SYN = 1

3.2. Feature Extraction

3.2.1. Network Traffics. To collect the traffic-log data, we monitored the university-enterprise network for one month. The features used in our framework include temporal correlation features [19], TCP flag features (displayed in Table 1), and log features (displayed in Table 2). Usually, some cyberattacks, such as botnets and phishing emails, need to send commands automatically through programs. These automatic attack commands tend to contain inherent patterns. Specifically, we capture the traffics from 4 common cyberattacks (XSS, HTTP botnet, P2P botnet, and phishing) and find that different network attacks behave differently in the TCP handshake stage. For instance, the phishing mail transmission process adopts the POP3 and IMAP4 protocols, allowing attackers to send different types of files. Also, the


Table 2: Details of the network device logs.

Log types (table columns): firewall logs, traffic logs, event logs, network logs, security logs, system logs, Cron logs, mail logs, messages logs, and Mysqld logs.

Attribute | Number of log types marked with *
Data | 10
Time | 10
Module | 3
Level | 7
PID | 6
Type | 7
Action | 5
Source | 1
Destination | 1
Translated Source | 1
Translated Destination | 1
Duration | 1
Bytes Sent | 1
Bytes Received | 1
Application | 2
Reason | 10

corresponding data volume can be very small. Therefore, phishing mails can produce the same TCP handshake states as normal ones, and a connection is relatively easy to establish. In a botnet, however, an attacker needs to control the C&C server and send a large number of commands, which inevitably causes failures in the TCP handshake. Table 1 displays the details of the TCP flags. Specifically, SYN denotes whether a connection is established, FIN and ACK denote the corresponding responses, and RST denotes a connection reset. Note that the ACK information can be used together with SYN and FIN as evidence for attack detection. For instance, if both SYN and ACK are activated, the connection is established with confirmation. In contrast, if only SYN is activated, we can conclude that the connection is established without confirmation. Usually, most unreachable attacks can only activate SYN. Additionally, when FIN and RST are activated and SYN is not, the firewalls may still detect a SYN/FIN packet. When such a packet appears, it is most likely that the network has been attacked. As the ACK/FIN packet represents a completed TCP connection, a normal FIN packet is always marked by ACK. A "NULL" packet is a packet not marked by any TCP flags (URG, ACK, PSH, RST, SYN, and FIN are all set to 0). In normal network activities, the TCP stack never generates packets with unreasonable TCP flag combinations; if such packets appear, the network has likely been attacked. Therefore, the TCP flag features can provide useful information about the network status [19].
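The flag rules described above can be sketched as a small classifier. This is only an illustration of the reasoning in the text, not the feature extractor used in TLCD; the function name `classify_tcp_flags` and the returned labels are assumptions introduced here.

```python
# Hypothetical helper: labels a packet by its TCP flag combination,
# following the rules described in the surrounding text.
FLAG_NAMES = ("URG", "ACK", "PSH", "RST", "SYN", "FIN")


def classify_tcp_flags(flags):
    """Return a coarse label for a collection of active TCP flag names."""
    active = set(flags)
    unknown = active - set(FLAG_NAMES)
    if unknown:
        raise ValueError(f"unknown TCP flags: {sorted(unknown)}")
    if not active:
        # No flag set at all: the "NULL" packet.
        return "suspicious: NULL packet"
    if {"SYN", "FIN"} <= active:
        # SYN and FIN together is an unreasonable combination.
        return "suspicious: SYN/FIN packet"
    if "FIN" in active and "ACK" not in active:
        # A normal FIN packet is always marked by ACK.
        return "suspicious: FIN without ACK"
    if active == {"SYN"}:
        return "unconfirmed: SYN only"
    if {"SYN", "ACK"} <= active:
        return "established with confirmation"
    return "normal"


for sample in ([], ["SYN"], ["SYN", "ACK"], ["SYN", "FIN"], ["FIN", "ACK"]):
    print(sample, "->", classify_tcp_flags(sample))
```

In a real pipeline, such labels would more plausibly be turned into numeric features per flow rather than used as verdicts on single packets.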

3.2.2. Network Device Logs. Table 2 displays the details of the network device logs. As we can see, different types of network device logs have different characteristics. Specifically, the firewall logs record the events between the inside and outside of the network, such as port filtering, hazard level, and authentication; the traffic logs record current traffic conditions, such as packet size, IP address, and duration; the event logs record events that occur during the execution of the system in order to provide traces for activity monitoring and problem diagnosis; the network logs record the process of network access, such as data packet requests or uploads; the security logs mainly record the operations of network devices and the system errors; the system logs record hardware and software errors as well as events that occur in the monitoring system, allowing the user to check the cause of errors and find traces left by the attacker; the Cron logs record periodic tasks in Linux (Cron reads the configuration files and writes them into memory when Linux starts; as some cyberattacks are cyclical, Cron logs are effective for identifying this type of attack); the mail logs allow the administrators to get copies of the messages processed by the Domino system router (when the mail log is enabled, Domino checks the messages as they go through MAIL.BOX and saves their copies to MAILJRN.NSF for future recovery); the message logs are plain text files that are checked first for error messages when a problem occurs; and the MySQL logs contain the information of log-err, query log, log-slow-queries, log-update, and log-bin. By default, all of these logs are created in the MySQL directory.

In this work, we extract attributes from ten types of logs from different network devices. Each type of log has its own attributes. For instance, the firewall log has the attributes of data, time, module, level, type, and reason, while the traffic logs have the attributes of data, time, action, source, destination, translated source, translated destination, duration, bytes sent, bytes received, application, and reason. Although different logs reflect different characteristics of the device status, the shared attributes, such as time, date, and reason, can be effectively used to infer the status of one event in different logs.

3.3. Anomaly Detection

3.3.1. Feature Integration through Association Rules. In everyday networks, there are no direct correlations between the log data and the traffic data. However, they can be correlated through shared attributes such as time and date. Therefore, we need to model the mutual mappings between traffics and logs by effectively leveraging the correlated attributes. With these mappings, we can obtain the classification boundaries of the log attributes based on the corresponding attributes of the traffics.
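To make the idea of a traffic-to-log association rule concrete, the sketch below computes the standard support and confidence of a rule over time windows in which traffic items and log items co-occur. It is a simplification under stated assumptions: the paper uses fuzzy association rules, whereas this example uses crisp (Boolean) items, and the item names, the `rule_stats` helper, and the sample windows are all hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Windows that contain every antecedent item.
    n_ante = sum(1 for t in transactions if antecedent <= t)
    # Windows that contain both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Each set holds the traffic and log items observed in one shared time window.
windows = [
    {"traffic:syn_only", "log:firewall_block"},
    {"traffic:syn_only", "log:firewall_block"},
    {"traffic:syn_only", "log:firewall_accept"},
    {"traffic:syn_ack", "log:firewall_accept"},
]

support, confidence = rule_stats(windows, {"traffic:syn_only"}, {"log:firewall_block"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule would then be kept as a strong association only if its confidence exceeds a chosen threshold; the threshold value is a design decision that the text does not specify.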

The discretization of traffic features, which is useful for boundary division, plays an important role in detecting anomalous traffics. Specifically, we use the Fuzzy C-Means (FCM) algorithm to divide the traffic characteristics (including quantitative attributes and Boolean attributes) into several fuzzy sets. Note that the elements and nonelements of each fuzzy set can be mutually transformed in order to soften the features. When processing highly skewed data, the FCM algorithm can effectively model the actual distribution of the data and clearly reveal the boundary between normal data and anomalies.
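As a rough illustration of how FCM splits one quantitative traffic attribute into fuzzy sets, the following sketch implements the textbook fuzzy c-means updates for one-dimensional data. The function name, the evenly spread initial centers, the fuzzifier `m = 2`, and the sample values are assumptions made for this example and are not taken from the paper.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Cluster 1-D values into fuzzy sets; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centers evenly over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / n_clusters for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Update each center as the membership-weighted mean of the values.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical attribute values: a dense low group and a dense high group.
values = [1.0, 1.2, 0.9, 1.1, 9.8, 10.1, 10.0]
centers, memberships = fuzzy_c_means(values)
print([round(c, 2) for c in centers])
```

Each row of `memberships` sums to 1, so a value near a boundary belongs partly to both fuzzy sets instead of being forced into one of them.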

In our method, we first extract 29 basic attributes of the traffics, including the five-tuple (source IP address, destination IP address, source port, destination port, and protocol number), the total number of uplink and downlink packets, the total number of uplink and downlink payload packets, the total amount of uplink and downlink load, flow duration, average load, the maximum load, the minimum load, the average time interval between the uplink and downlink data packets, the minimum time interval, the maximum time interval, and so on. Then, we extract 16 basic attributes of the logs, including data, time, module, PID, type, action, source, destination, translated source and destination, duration, bytes sent and received, application, reason, and so on. We assume that each feature comes from a Gaussian distribution. Then, according to the membership function of fuzzy recognition, we can determine the fuzzy numbers of the maximum fuzzy set. Denoting the center of the maximum fuzzy set as $\mu$, the membership degrees as $r_i$ ($i = 1, 2, 3, \ldots, n$), and $\sigma$ as the parameter, the Gaussian fuzzy expression can be represented as follows:

$$ y = \exp\left[-\frac{(x - \mu)^2}{\sigma^2}\right]. \tag{1} $$

To approximate the maximum membership degree, we design the objective function as

$$ g(\sigma) = \sum_{i=1}^{n} \left\{ \exp\left[-\frac{(x_i - \mu)^2}{\sigma^2}\right] - r_i \right\}^2. \tag{2} $$

The corresponding membership function is expressed as

$$ A_{ij}(x_j) = \begin{cases} 0, & \left|x_j - \bar{x}_j\right| > 2s_j, \\ 1 - \left(\dfrac{x_j - \bar{x}_j}{2s_j}\right)^2, & \left|x_j - \bar{x}_j\right| \le 2s_j, \end{cases} \tag{3} $$

where $\bar{x}_j$ is the center and $2s_j$ is the standard deviation $\sigma$. Finally, we can identify whether a sample is an anomaly based on the principle of the maximum membership.
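The three expressions above can be exercised numerically with a short sketch. It evaluates the Gaussian expression (1), minimizes the objective (2) over a small grid of candidate values, and applies the membership function (3) together with the maximum-membership principle. The grid search is an assumption (the text does not say how the objective is minimized), and the fuzzy set names and parameters are hypothetical.

```python
import math


def gaussian_membership(x, mu, sigma):
    """Equation (1): y = exp(-(x - mu)^2 / sigma^2)."""
    return math.exp(-((x - mu) ** 2) / sigma ** 2)


def objective(sigma, xs, rs, mu):
    """Equation (2): squared gap between Gaussian memberships and targets r_i."""
    return sum((gaussian_membership(x, mu, sigma) - r) ** 2 for x, r in zip(xs, rs))


def fit_sigma(xs, rs, mu, candidates):
    """Pick the candidate sigma with the smallest objective (simple grid search)."""
    return min(candidates, key=lambda s: objective(s, xs, rs, mu))


def membership(x, center, s):
    """Equation (3): 0 if |x - center| > 2s, else 1 - ((x - center) / (2s))^2."""
    if abs(x - center) > 2 * s:
        return 0.0
    return 1.0 - ((x - center) / (2 * s)) ** 2


def max_membership_label(x, fuzzy_sets):
    """Assign x to the fuzzy set with the largest membership degree."""
    return max(fuzzy_sets, key=lambda name: membership(x, *fuzzy_sets[name]))


# Hypothetical fuzzy sets for one attribute, given as (center, s).
fuzzy_sets = {"normal": (10.0, 2.0), "anomalous": (40.0, 5.0)}
print(max_membership_label(12.0, fuzzy_sets))
print(max_membership_label(38.0, fuzzy_sets))
```

With these hypothetical parameters, a value of 12 has membership 0.75 in the "normal" set and 0 in the "anomalous" set, so the maximum-membership principle assigns it to "normal".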

3.3.2. Anomaly Detection. The key point in anomaly detection is to distinguish anomalies from benign data according to the extracted features. To achieve this, supervised learning methods such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), neural networks, or decision trees can be used to design the detection module. Basically, supervised learning first needs to establish a training set and then trains a classification model over it. For this anomaly detection task, our goal is to learn a classifier that can effectively detect the anomalies. In this work, we adopt the Gradient Boosting Decision Tree (GBDT), an advanced machine learning technique that models the data with an ensemble of decision trees. To evaluate the performance of our method, we adopt 10-fold cross validation.
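The evaluation protocol can be sketched with the standard library alone. To keep the example dependency-free, it uses a small KNN classifier (one of the supervised methods named above) in place of GBDT, so it illustrates 10-fold cross validation rather than the detector adopted in this work. The function names and the synthetic two-feature samples are assumptions.

```python
import random


def knn_predict(train, x, k=3):
    """Majority label among the k training samples closest to x."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def k_fold_accuracy(samples, n_folds=10, k=3, seed=0):
    """Average KNN accuracy over n_folds folds (needs len(samples) >= n_folds)."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    # Fold i takes every n_folds-th sample, starting at index i.
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on all folds except the held-out one.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        correct = sum(1 for x, y in test_fold if knn_predict(train, x, k) == y)
        scores.append(correct / len(test_fold))
    return sum(scores) / n_folds


# Synthetic, well-separated samples: label 0 is benign, label 1 is anomalous.
benign = [((i * 0.1, i * 0.05), 0) for i in range(20)]
anomalous = [((5 + i * 0.1, 5 + i * 0.05), 1) for i in range(20)]
print(k_fold_accuracy(benign + anomalous))
```

Each sample is used for testing exactly once, so the averaged score does not depend on a single train/test split.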

4. Experiments

4.1. Dataset. In our experiments, we evaluate our method over 4 types of network attacks (XSS, HTTP botnet, P2P botnet, and phishing). These cyberattacks are carefully injected into the normal business and do not bring undesirable effects to other business. Both the traffic data and the log data are collected from university servers and enterprise servers. To obtain the traffic data, we monitored the university-enterprise network for one month. Specifically, we simulate the P2P botnet and HTTP botnet attacks according to the Contagio blog [21] and the white paper [22], which provide guidance on how to make botnets evade intrusion detection techniques. To simulate the XSS attacks, we inject malicious code into the web pages of university servers. The simulated phishing emails are sent to both university servers and enterprise servers. Note that, in our simulation, the anomalous traffics only account for 0.1% of the total traffic flows, which is close to real situations. As displayed in Table 3, the collected data include 30 normal traffic datasets, 6 traffic datasets for XSS injection attacks, 5 traffic datasets for phishing emails, and 20 for botnets (13 P2P botnets and 7 HTTP botnets). On the other hand, as displayed in Table 4, the log data are collected from 1 switch, 2 routers, 2 firewalls, and 3 servers.

4.2. Experimental Results. To validate the performance of integrating traffics with logs for anomaly detection, we conduct comparison experiments that leverage only the traffic data (or only the log data). As displayed in Tables 5–8 and Figures 3–6, it is clear that neither traffics nor logs can independently achieve desirable results in detecting


Table 3: The collection details for traffic data.

Type | Traffic amount | Name
Normal | 30 | NA
XSS | 6 | NA
Phishing | 5 | NA
HTTP botnets [20] | 7 | Virut, Sogou
P2P botnets [21] | 13 | NSIS.ay, SMTP Spam, Zeus (C&C), UDP Storm, Zeus, Zero access, Weasel

Table 4: The collection details for log data.

Device name | Quantity | Brand
Switch | 1 | Huawei
Router | 2 | Cisco, Huawei
Firewall | 2 | Juniper
Server | 3 | Cisco

Table 5: The detection results over XSS attack.

XSS | FP | FN
10-fold KNN for traffics | 82 | 56
10-fold SVM for traffics | 86 | 58
10-fold KNN for logs | 91 | 99
10-fold SVM for logs | 90 | 86
10-fold SVM for logs-and-traffics | 52 | 63
10-fold KNN for logs-and-traffics | 62 | 36
TLCD (GBDT) | 43 | 25

Table 6: The detection results over phishing email.

Phishing | FP | FN
10-fold KNN for traffics | 71 | 73
10-fold SVM for traffics | 65 | 73
10-fold KNN for logs | 88 | 83
10-fold SVM for logs | 79 | 82
10-fold SVM for logs-and-traffics | 50 | 60
10-fold KNN for logs-and-traffics | 55 | 48
TLCD (GBDT) | 53 | 49

Table 7: The detection results over HTTP botnet.

HTTP botnet | FP | FN
10-fold KNN for traffics | 55 | 48
10-fold SVM for traffics | 53 | 50
10-fold KNN for logs | 63 | 59
10-fold SVM for logs | 63 | 58
10-fold SVM for logs-and-traffics | 36 | 29
10-fold KNN for logs-and-traffics | 38 | 27
TLCD (GBDT) | 25 | 28

cyberattacks (both the False Negative (FN) and False Positive (FP) values are noticeably higher), which is consistent with [16]. In contrast, when we integrate the traffic flows with network device logs, the detection performance is significantly improved. Additionally, we also compare

Table 8: The detection results over P2P botnet.

P2P botnet | FP | FN
10-fold KNN for traffics | 45 | 46
10-fold SVM for traffics | 52 | 50
10-fold KNN for logs | 64 | 60
10-fold SVM for logs | 60 | 59
10-fold SVM for logs-and-traffics | 29 | 29
10-fold KNN for logs-and-traffics | 33 | 29
TLCD (GBDT) | 28 | 26

Figure 3: The F1 value obtained by each method over the XSS attack (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; curves: 10-fold KNN for traffics, 10-fold SVM for traffics, 10-fold KNN for logs, 10-fold SVM for logs, 10-fold SVM for logs-and-traffics, 10-fold KNN for logs-and-traffics, and TLCD).

the detection performance of different supervised learning methods, including SVM, KNN, and GBDT. As we can see, these compared methods achieve very similar results, with GBDT slightly better than the others. This demonstrates that the features obtained through integrating traffics with logs are robust for our cyberattack detection task.
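For readers who want to reproduce this kind of comparison, the sketch below shows how a false positive rate, a false negative rate, and an F1 value can be computed from true and predicted labels. The text does not state the exact formulas behind the FP, FN, and F1 values it reports, so the standard definitions are assumed here; the function name and the sample labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return FP rate, FN rate, and F1 for binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        # Share of benign samples wrongly flagged as anomalies.
        "fp_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of anomalies that were missed.
        "fn_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        # Harmonic mean of precision and recall.
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(detection_metrics(y_true, y_pred))
```

Because anomalies are rare in realistic traffic, rates of this kind are usually more informative than plain accuracy.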

4.3. Attack Reconstructions. In our experiments, we also evaluate the performance of TLCD on attack reconstruction. In particular, for the detected attacks, we first obtain their time horizon and communication addresses according to the information of the corresponding anomalous traffics, such as data, time, IP, and so on. With that, we can get the corresponding log features, and then the concrete network devices are determined. Finally, we reconstruct the original attack paths based on the abnormal information above.
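The reconstruction steps just described can be sketched as a join between one detected flow and the device logs. This is a minimal illustration under assumptions: the record fields (`time`, `device`, `src_ip`, `dst_ip`), the `slack` tolerance in seconds, the device names, and the documentation-range IP addresses are all hypothetical and do not reflect the actual TLCD data model.

```python
def reconstruct_path(anomalous_flow, log_records, slack=60):
    """Return (time, device) pairs for logs matching the flow, ordered by time."""
    # Time horizon of the flow, widened by a tolerance for clock differences.
    start = anomalous_flow["start"] - slack
    end = anomalous_flow["end"] + slack
    ips = {anomalous_flow["src_ip"], anomalous_flow["dst_ip"]}
    matched = [
        r for r in log_records
        if start <= r["time"] <= end
        and ips & {r.get("src_ip"), r.get("dst_ip")}
    ]
    matched.sort(key=lambda r: r["time"])
    return [(r["time"], r["device"]) for r in matched]


flow = {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.10", "start": 1000, "end": 1100}
logs = [
    {"time": 1005, "device": "ROUTE1", "src_ip": "198.51.100.7", "dst_ip": "192.0.2.10"},
    {"time": 1050, "device": "FIREWALL1", "src_ip": "198.51.100.7", "dst_ip": "192.0.2.10"},
    {"time": 1020, "device": "SWITCH1", "src_ip": "192.0.2.10"},
    {"time": 5000, "device": "ROUTE2", "src_ip": "198.51.100.7"},  # outside the horizon
    {"time": 1030, "device": "SERVER1", "src_ip": "203.0.113.5"},  # unrelated address
]
print(reconstruct_path(flow, logs))
```

The ordered device list is what gives the attack path; records outside the time horizon or with unrelated addresses are dropped before ordering.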

Figures 7–10 display our attack reconstruction results for the four simulated cyberattacks. Generally, the XSS attack


Figure 4: The F1 value obtained by each method over the phishing email (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; curves: 10-fold KNN for traffics, 10-fold SVM for traffics, 10-fold KNN for logs, 10-fold SVM for logs, 10-fold SVM for logs-and-traffics, 10-fold KNN for logs-and-traffics, and TLCD).

Figure 5: The F1 value obtained by each method over the P2P botnet (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; curves: 10-fold KNN for traffics, 10-fold SVM for traffics, 10-fold KNN for logs, 10-fold SVM for logs, 10-fold SVM for logs-and-traffics, 10-fold KNN for logs-and-traffics, and TLCD).

Figure 6: The F1 value obtained by each method over the HTTP botnet (x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; curves: 10-fold KNN for traffics, 10-fold SVM for traffics, 10-fold KNN for logs, 10-fold SVM for logs, 10-fold SVM for logs-and-traffics, 10-fold KNN for logs-and-traffics, and TLCD).


Figure 7: XSS attack reconstruction (panel labels: XSS ATTACK, ROUTE1, SWITCH, ROUTE2; each node is annotated with the timestamped log entries recorded along the attack path).

Figure 8: Phishing attack reconstruction (panel labels: PHISHING ATTACK, ROUTE1, SWITCH, ROUTE2; each node is annotated with the timestamped log entries recorded along the attack path, including a TCP (IMAP4) connection).

damages the web server by passing through several network devices, including routers, firewalls, switches, and so on. The phishing attack shares a similar attack process with XSS, except that it adopts the IMAP4 protocol and a fixed port. Different from XSS and phishing, the HTTP botnet does not aim to attack the servers but rather the hosts, which it reaches by passing through the servers. Note that the HTTP botnet attack is completed through web pages and the ICMP protocol. The P2P botnet attack is implemented not only through the servers but also directly over the hosts. As displayed in Figures 7–10, our reconstructed results accurately reveal the attack paths of the corresponding cyberattacks, which demonstrates that our method can effectively reconstruct the attack process.

5. Conclusion

In this paper, we propose to integrate traffics with network device logs for detecting cyberattacks. Specifically, we use fuzzy association rules to integrate the device logs with traffics to obtain the features for attack detection and reconstruction. The experiments over four common network attacks clearly demonstrate that our TLCD method can effectively detect diverse cyberattacks and reconstruct their event process.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper is supported by National Natural Science Foundation of China under grant no. 6157211.


Figure 9: HTTP botnet attack reconstruction (panel labels: P2P botnet ATTACK, ROUTE1, SWITCH, ROUTE2, PC-a, PC-b, PC 1 to PC n; each node is annotated with the timestamped log entries recorded along the attack path, including blocked and accepted ICMP echo requests and an HTTP POST request).

Figure 10: P2P botnet attack reconstruction (panel labels: HTTP botnet ATTACK, ROUTE1, SWITCH, ROUTE2, PC 1 to PC n; each node is annotated with the timestamped log entries recorded along the attack path, including blocked and accepted ICMP echo requests and an HTTP POST request).

References

[1] A. Sood and R. Enbody, Targeted Cyber Attacks: Multi-Staged Attacks Driven by Exploits and Malware, Syngress, 2014.

[2] K. L. Chiew, K. S. C. Yong, and C. L. Tan, "A survey of phishing attacks: Their types, vectors and technical approaches," Expert Systems with Applications, vol. 106, pp. 1–20, 2018.

[3] B. Caswell, J. Beale, and A. Baker, Snort Intrusion Detection and Prevention Toolkit, Syngress, 2007.

[4] A. Yaar, A. Perrig, and D. Song, "Pi: a path identification mechanism to defend against DDoS attacks," in Proceedings of the 2003 Symposium on Security and Privacy, SP 2003, pp. 93–107, USA, May 2003.

[5] H. Wenhua and Y. Geng, "Identification method of attack path based on immune intrusion detection," Journal of Networks, vol. 9, no. 4, pp. 964–971, 2014.

[6] B. Parno, D. Wendlandt, E. Shi, A. Perrig, B. Maggs, and Y.-C. Hu, "Portcullis: protecting connection setup from denial-of-capability attacks," ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 289–300, 2007.

[7] K. Kleemola, M. Crete-Nishihata, and J. Scott-Railton, Targeted Attacks against Tibetan and Hong Kong Groups Exploiting CVE-2014-4114, Citizen Lab, June 2015.

[8] J. R. Crandall, G. Wassermann, D. A. de Oliveira, Z. Su, S. F. Wu, and F. T. Chong, "Temporal search: detecting hidden malware timebombs with virtual machines," ACM SIGARCH Computer Architecture News, vol. 34, no. 5, pp. 25–36, 2006.

[9] Z. Gu, K. Pei, Q. Wang, L. Si, X. Zhang, and D. Xu, "LEAPS: detecting camouflaged attacks with statistical learning guided by program analysis," in Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 57–68, Rio de Janeiro, Brazil, June 2015.

[10] K. Pei, Z. Gu, B. Saltaformaggio et al., "Hercule: attack story reconstruction via community discovery on correlated log graph," in Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 583–595, ACM, Los Angeles, Calif, USA, December 2016.

[11] H.-S. Chen, W. Gao, and D. G. Daut, "Signature based spectrum sensing algorithms for IEEE 802.22 WRAN," in Proceedings of the 2007 IEEE International Conference on Communications, ICC'07, pp. 6487–6492, IEEE, UK, June 2007.

[12] J. Zhang and M. Zulkernine, "Anomaly based network intrusion detection with unsupervised outlier detection," in Proceedings of the 2006 IEEE International Conference on Communications, ICC 2006, pp. 2388–2393, IEEE, Turkey, July 2006.

[13] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez et al., "Anomaly-based network intrusion detection: techniques, systems and challenges," Journal of Computers and Security, vol. 28, no. 1-2, pp. 18–28, 2009.

[14] F. Gong, "Deciphering detection techniques: Part ii anomaly-based intrusion detection," White Paper, McAfee Security, 2003.

[15] Y. Yu, "A survey of anomaly intrusion detection techniques," Journal of Computing Sciences in Colleges, vol. 28, no. 1, pp. 9–17, 2012.

[16] F. Sonmez, M. Zontul, O. Kaynar, and H. Tutar, "Anomaly detection using data mining methods in it systems: a decision support application," Sakarya University Journal of Science, vol. 22, no. 4, pp. 1109–1123, 2018.

[17] N. Kamiyama and T. Mori, "Simple and accurate identification of high-rate flows by packet sampling," in Proceedings of the IEEE INFOCOM 2006, 25th IEEE International Conference on Computer Communications, pp. 1–13, IEEE, Barcelona, Spain, April 2006.

[18] G. Carl, G. Kesidis, R. R. Brooks, and S. Rai, "Denial-of-service attack-detection techniques," IEEE Internet Computing, vol. 10, no. 1, pp. 82–89, 2006.

[19] J. Lu, K. Chen, Z. Zhuo, and X. Zhang, "A temporal correlation and traffic analysis approach for APT attacks detection," Cluster Computing, pp. 1–12, 2017.

[20] S. García, M. Grill, J. Stiborek, and A. Zunino, "An empirical comparison of botnet detection methods," Journal of Computers and Security, vol. 45, pp. 100–123, 2014.

[21] M. Parkour, Contagio malware database, https://www.mediafire.com/folder/c2az029ch6cke/TRAFFIC_PATTE%20RNS_COLLECTION, 2013.

[22] S. Siddiqui, M. S. Khan, K. Ferens, and W. Kinsner, "Detecting advanced persistent threats using fractal dimension based machine learning classification," in Proceedings of the 2nd ACM International Workshop on Security and Privacy Analytics (IWSPA '16), pp. 64–69, ACM, 2016.


Table 2: Details of the network device logs.

Log types (table columns): firewall logs, traffic logs, event logs, network logs, security logs, system logs, Cron logs, mail logs, messages logs, and mysqld logs.

Attribute                 Number of log types marked (∗)
Date                      10
Time                      10
Module                    3
Level                     7
PID                       6
Type                      7
Action                    5
Source                    1
Destination               1
Translated Source         1
Translated Destination    1
Duration                  1
Bytes Sent                1
Bytes Received            1
Application               2
Reason                    10

corresponding data volume can be very small. Therefore, phishing mails can produce the same TCP handshake states as normal mails, and a connection is relatively easy to establish. In a botnet, however, an attacker needs to control the C&C server and send a large number of commands, which will inevitably cause failures in the TCP handshake. Table 1 lists the details of the TCP flags. Specifically, SYN denotes whether a connection is established, FIN and ACK denote the corresponding responses, and RST denotes a connection reset. Note that the ACK information can be used together with SYN and FIN as evidence for attack detection. For instance, if both SYN and ACK are set, the connection is established with confirmation. In contrast, if only SYN is set, we can conclude that the connection is established without confirmation. Usually, most unreachable attacks set only SYN. Additionally, when FIN and RST are set and SYN is not, the firewalls may still detect a SYN/FIN packet; when such a packet appears, it is most likely that the network has been attacked. As the ACK/FIN packet represents a completed TCP connection, a normal FIN packet is always marked with ACK. A "NULL" packet is a packet not marked by any TCP flag (URG, ACK, PSH, RST, SYN, and FIN are all set to 0). In normal network activity, the TCP stack never generates packets with unreasonable TCP flag combinations; if such packets appear, the network has likely been attacked. Therefore, the TCP flag features provide useful information about the network status [19].
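The flag rules described above can be illustrated with a small sketch. It is not the authors' implementation: the function name, the label strings, and the order in which the rules are checked are assumptions made for this example only.

```python
# Illustrative sketch only: the rule set mirrors the paragraph above, while the
# function name and label strings are assumptions made for this example.

def classify_tcp_flags(flags):
    """Return a coarse label for the set of TCP flag names carried by a packet."""
    flags = frozenset(f.upper() for f in flags)
    if not flags:
        # No flag set at all: the "NULL" packet described in the text.
        return "suspicious: NULL packet"
    if {"SYN", "FIN"} <= flags:
        # SYN and FIN together is an unreasonable combination.
        return "suspicious: SYN+FIN"
    if "FIN" in flags and "ACK" not in flags:
        # A normal FIN packet is always marked with ACK.
        return "suspicious: FIN without ACK"
    if flags == {"SYN"}:
        # Only SYN set: no confirmation has been observed.
        return "unconfirmed: SYN only"
    if {"SYN", "ACK"} <= flags:
        return "normal: SYN+ACK"
    return "normal"


examples = [[], ["SYN", "FIN"], ["FIN"], ["SYN"], ["SYN", "ACK"], ["FIN", "ACK"]]
labels = [classify_tcp_flags(e) for e in examples]
```

In a real pipeline such labels would be one categorical traffic feature among many, rather than a verdict on their own.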

3.2.2. Network Device Logs. Table 2 lists the details of the network device logs. Different types of network device logs have different characteristics. The firewall logs record the events between the inside and the outside of the network, such as port filtering, hazard level, and authentication. The traffic logs record current traffic conditions, such as packet size, IP address, and duration. The event logs record events that occur during the execution of the system, in order to provide traces for activity monitoring and problem diagnosis. The network logs record the process of network access, such as data packet requests or uploads. The security logs mainly record the operations of network devices and the system errors. The system logs record hardware and software errors as well as events that occur in the monitoring system, allowing the user to check the cause of errors and find traces left by the attacker. The Cron logs record periodic tasks in Linux (Cron reads the configuration files and writes them in memory when Linux starts; as some cyberattacks are cyclical, Cron logs are effective for identifying this type of attack). The mail logs allow administrators to get copies of the messages processed by the Domino system router (when the mail log is enabled, Domino checks the messages as they go through MAIL.BOX and saves copies to MAILJRN.NSF for future recovery). The message logs are plain text files that are checked first for error messages when a problem occurs. The MySQL logs contain the information of log-err, query log, log-slow-queries, log-update, and log-bin; by default, all logs are created in the MySQL directory. In this work, we extract attributes from ten types of logs from different network devices. Each type of log has its own attributes. For instance, the firewall log has the attributes date, time, module, level, type, and reason, while the traffic logs have the attributes date, time, action, source, destination, translated source, translated destination, duration, bytes sent, bytes received, application, and reason. Although different logs reflect different characteristics of the device status, the shared attributes, such as time, date, and reason, can be effectively used to infer the status of one event in different logs.

3.3. Anomaly Detection

3.3.1. Feature Integration through Association Rules. In daily networks, there are no direct correlations between the log data and the traffic data. However, they can be correlated through shared attributes such as time and date. Therefore, we model the mutual mappings between traffics and logs by leveraging the correlated attributes. With these mappings, we can obtain the classification boundaries of the log attributes based on the corresponding attributes of the traffics.
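As a rough illustration of correlating the two sources through shared attributes, the sketch below pairs each traffic flow with the log records that fall within a time window of it. The record layout, the field names, the 60-second window, and the extra endpoint check are all assumptions made for this example; the addresses come from documentation ranges and are not taken from the experiments.

```python
from datetime import datetime, timedelta


def correlate(flows, logs, window_seconds=60):
    """Pair each flow with log records close in time that mention one of its endpoints."""
    window = timedelta(seconds=window_seconds)
    pairs = []
    for flow in flows:
        endpoints = {flow["src"], flow["dst"]}
        for rec in logs:
            close_in_time = abs(rec["time"] - flow["time"]) <= window
            shares_endpoint = rec.get("ip") in endpoints
            if close_in_time and shares_endpoint:
                pairs.append((flow["id"], rec["device"], rec["reason"]))
    return pairs


# Hypothetical records for illustration only.
flows = [
    {"id": "f1", "time": datetime(2019, 1, 1, 14, 55, 12),
     "src": "198.51.100.7", "dst": "192.0.2.10"},
]
logs = [
    {"device": "FIREWALL1", "time": datetime(2019, 1, 1, 14, 55, 49),
     "ip": "192.0.2.10", "reason": "built inbound TCP connection"},
    {"device": "SERVER1", "time": datetime(2019, 1, 1, 18, 0, 0),
     "ip": "192.0.2.10", "reason": "cron job"},
]
pairs = correlate(flows, logs)
```

Only the firewall record is paired with the flow here, because the server record lies hours outside the window. The nested loop is quadratic; sorting both sides by time would be the natural refinement for large captures.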

The discretization of traffic features, which is useful for boundary division, plays an important role in detecting anomalous traffics. Specifically, we use the Fuzzy C-Means (FCM) algorithm to divide the traffic characteristics (including quantitative and Boolean attributes) into several fuzzy sets. Note that the elements and nonelements of each fuzzy set can be mutually transformed, which softens the features. When processing highly skewed data, the FCM algorithm can effectively model the actual distribution of the data and clearly reveal the boundary between normal data and anomalies.
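To make the FCM step concrete, here is a minimal one-dimensional Fuzzy C-Means sketch using the standard membership and center updates. It is an illustration under stated assumptions rather than the configuration used in the paper: the fuzzifier m = 2, the evenly spaced initial centers, the fixed iteration count, and the sample values are all chosen for this example.

```python
def _memberships(values, centers, m):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))."""
    eps = 1e-12  # avoids division by zero when a value coincides with a center
    power = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - c) + eps for c in centers]
        rows.append([1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists])
    return rows


def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=50):
    """Cluster scalar feature values into fuzzy sets (requires n_clusters >= 2)."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: centers evenly spaced over the value range.
    centers = [lo + (hi - lo) * k / (n_clusters - 1) for k in range(n_clusters)]
    for _ in range(n_iter):
        u = _memberships(values, centers, m)
        # Center update: mean of the values weighted by membership^m.
        centers = [
            sum(row[k] ** m * x for row, x in zip(u, values)) / sum(row[k] ** m for row in u)
            for k in range(n_clusters)
        ]
    return centers, _memberships(values, centers, m)


# Hypothetical flow durations forming two groups.
durations = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
centers, u = fuzzy_c_means_1d(durations)
```

Each row of `u` sums to one, so every value belongs partly to every fuzzy set; this is the softening of boundaries mentioned above.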

In our method, we first extract 29 basic attributes of the traffics, including the five-tuple (source IP address, destination IP address, source port, destination port, and protocol number), the total number of uplink and downlink packets, the total number of uplink and downlink payload packets, the total amount of uplink and downlink load, flow duration, average load, maximum load, minimum load, the average time interval between the uplink and downlink data packets, the minimum time interval, the maximum time interval, and so on. Then, we extract 16 basic attributes of the logs, including date, time, module, PID, type, action, source, destination, translated source and destination, duration, bytes sent and received, application, reason, and so on. We assume that each feature comes from a Gaussian distribution. Then, according to the membership function of fuzzy recognition, we can determine the fuzzy numbers of the maximum fuzzy set. Denoting the center of the maximum fuzzy set as $\mu$, the membership degrees as $r_i$ ($i = 1, 2, 3, \ldots, n$), and $\sigma$ as the parameter, the Gaussian fuzzy expression can be represented as follows:

$$y = \exp\left[-\frac{(x - \mu)^2}{\sigma^2}\right]. \tag{1}$$

To approximate the maximum membership degree, we design the objective function as

$$g(\sigma) = \sum_{i=1}^{n} \left\{ \exp\left[-\frac{(x_i - \mu)^2}{\sigma^2}\right] - r_i \right\}^2. \tag{2}$$

The corresponding membership function is expressed as

$$A_{ij}(x_j) = \begin{cases} 0, & |x_j - \bar{x}_j| > 2s_j, \\ 1 - \left(\dfrac{x_j - \bar{x}_j}{2s_j}\right)^2, & |x_j - \bar{x}_j| \le 2s_j, \end{cases} \tag{3}$$

where $\bar{x}_j$ is the center and $2s_j$ is the standard deviation $\sigma$. Finally, we can identify whether a sample is an anomaly based on the principle of maximum membership.
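The three expressions above can be evaluated directly. The sketch below implements the Gaussian membership, the squared-error objective for the width parameter, and the piecewise quadratic membership. The grid search over candidate widths is an assumption made for this example (the text does not state how the objective is minimized), and the sample values are hypothetical.

```python
import math


def gaussian_membership(x, mu, sigma):
    """Gaussian fuzzy expression: y = exp(-(x - mu)^2 / sigma^2)."""
    return math.exp(-((x - mu) ** 2) / sigma ** 2)


def objective(sigma, xs, rs, mu):
    """Objective g(sigma): squared gap between Gaussian membership and target degrees r_i."""
    return sum((gaussian_membership(x, mu, sigma) - r) ** 2 for x, r in zip(xs, rs))


def fit_sigma(xs, rs, mu, candidates):
    """Pick the candidate sigma that minimizes the objective (simple grid search, an assumption)."""
    return min(candidates, key=lambda s: objective(s, xs, rs, mu))


def quadratic_membership(x, center, s):
    """Piecewise membership: zero beyond 2s from the center, quadratic fall-off inside."""
    if abs(x - center) > 2 * s:
        return 0.0
    return 1.0 - ((x - center) / (2 * s)) ** 2


# Hypothetical data: target degrees generated with sigma = 1.5, then recovered by the search.
xs = [0.0, 1.0, 2.0, 3.0]
mu = 1.0
rs = [gaussian_membership(x, mu, 1.5) for x in xs]
best_sigma = fit_sigma(xs, rs, mu, candidates=[0.5, 1.0, 1.5, 2.0])
```

Under the maximum-membership principle, a sample would then be assigned to whichever fuzzy set gives it the largest membership value.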

3.3.2. Anomaly Detection. The key point in anomaly detection is to separate anomalies from benign data according to the extracted features. To achieve this, supervised learning methods such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), neural networks, or decision trees can be used to design the detection module. Supervised learning first establishes a training set and then trains a classification model over it. For this anomaly detection task, our goal is to learn a classifier that can effectively detect the anomalies. In this work, we adopt the Gradient Boosting Decision Tree (GBDT), an advanced machine learning technique that models the data with an ensemble of decision trees. To evaluate the performance of our method, we adopt 10-fold cross validation.
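The evaluation protocol can be sketched as follows. To keep the example dependency-free, a one-nearest-neighbor classifier stands in for GBDT; this substitution, the contiguous fold layout, the pooled FP and FN rates, and the toy feature vectors are all assumptions made for this example and do not reproduce the reported experiments.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training sample closest to x (squared Euclidean distance)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


def cross_validate(xs, ys, k=10):
    """Return (false_positive_rate, false_negative_rate) pooled over k folds; label 1 = anomaly."""
    fp = fn = negatives = positives = 0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in fold:
            pred = nearest_neighbor_predict(train_x, train_y, xs[i])
            if ys[i] == 1:
                positives += 1
                fn += pred == 0
            else:
                negatives += 1
                fp += pred == 1
    return fp / max(negatives, 1), fn / max(positives, 1)


# Toy, well-separated feature vectors (hypothetical), interleaved so each fold holds both classes.
xs, ys = [], []
for i in range(10):
    xs.append((0.1 * i, 0.0)); ys.append(0)        # benign
    xs.append((5.0 + 0.1 * i, 5.0)); ys.append(1)  # anomalous
fp_rate, fn_rate = cross_validate(xs, ys, k=10)
```

With real, highly imbalanced data the folds should be stratified so that every fold contains some anomalous samples; the contiguous split here only works because the toy samples are interleaved.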

4. Experiments

4.1. Dataset. In our experiments, we evaluate our method over 4 types of network attack (XSS, HTTP botnet, P2P botnet, and phishing). These cyberattacks are carefully injected into the normal business and do not adversely affect other business. Both the traffic data and the log data are collected from university servers and enterprise servers. To obtain the traffic data, we monitored the university-enterprise network for one month. Specifically, we simulate the P2P botnet and HTTP botnet attacks according to the Contagio blog [21] and a white paper [22], which provide guidance on how to make a botnet evade intrusion detection techniques. To simulate the XSS attacks, we inject malicious code into the web pages of university servers. The simulated phishing emails are sent to both university servers and enterprise servers. Note that, in our simulation, the anomalous traffics account for only 0.1% of the total traffic flows, which is close to real situations. As displayed in Table 3, the collected data include 30 normal traffic datasets, 6 traffic datasets for XSS injection attacks, 5 traffic datasets for phishing emails, and 20 for botnets (13 P2P botnets and 7 HTTP botnets). On the other hand, as displayed in Table 4, the log data are collected from 1 switch, 2 routers, 2 firewalls, and 3 servers.

4.2. Experimental Results. To validate the benefit of integrating traffics with logs for anomaly detection, we conduct comparison experiments that leverage only the traffic data or only the log data. As displayed in Tables 5–8 and Figures 3–6, neither traffics nor logs alone achieve desirable results in detecting cyberattacks (both the False Positive (FP) and False Negative (FN) rates are clearly higher than with the combined features), which is consistent with [16]. In contrast, when we integrate the traffic flows with network device logs, the detection performance improves significantly. Additionally, we compare the detection performance of different supervised learning methods, including SVM, KNN, and GBDT. These methods achieve very similar results on the integrated features, with GBDT slightly better than the others in most cases. This demonstrates that the features obtained through integrating traffics with logs are robust for our cyberdetection task.

Table 3: The collection details for traffic data.

Type                 Traffic amount   Name
Normal               30               NA
XSS                  6                NA
Phishing             5                NA
HTTP botnets [20]    7                Virut, Sogou
P2P botnets [21]     13               NSIS.ay, SMTP Spam, Zeus (C&C), UDP Storm, Zeus, Zero access, Weasel

Table 4: The collection details for log data.

Device name   Quantity   Brand
Switch        1          Huawei
Router        2          Cisco, Huawei
Firewall      2          Juniper
Server        3          Cisco

Table 5: The detection results over XSS attack.

XSS                                  FP      FN
10-fold KNN for traffics             8.2%    5.6%
10-fold SVM for traffics             8.6%    5.8%
10-fold KNN for logs                 9.1%    9.9%
10-fold SVM for logs                 9.0%    8.6%
10-fold SVM for logs-and-traffics    5.2%    6.3%
10-fold KNN for logs-and-traffics    6.2%    3.6%
TLCD (GBDT)                          4.3%    2.5%

Table 6: The detection results over phishing email.

Phishing                             FP      FN
10-fold KNN for traffics             7.1%    7.3%
10-fold SVM for traffics             6.5%    7.3%
10-fold KNN for logs                 8.8%    8.3%
10-fold SVM for logs                 7.9%    8.2%
10-fold SVM for logs-and-traffics    5.0%    6.0%
10-fold KNN for logs-and-traffics    5.5%    4.8%
TLCD (GBDT)                          5.3%    4.9%

Table 7: The detection results over HTTP botnet.

HTTP botnet                          FP      FN
10-fold KNN for traffics             5.5%    4.8%
10-fold SVM for traffics             5.3%    5.0%
10-fold KNN for logs                 6.3%    5.9%
10-fold SVM for logs                 6.3%    5.8%
10-fold SVM for logs-and-traffics    3.6%    2.9%
10-fold KNN for logs-and-traffics    3.8%    2.7%
TLCD (GBDT)                          2.5%    2.8%

Table 8: The detection results over P2P botnet.

P2P botnet                           FP      FN
10-fold KNN for traffics             4.5%    4.6%
10-fold SVM for traffics             5.2%    5.0%
10-fold KNN for logs                 6.4%    6.0%
10-fold SVM for logs                 6.0%    5.9%
10-fold SVM for logs-and-traffics    2.9%    2.9%
10-fold KNN for logs-and-traffics    3.3%    2.9%
TLCD (GBDT)                          2.8%    2.6%

Figure 3: The F1 value obtained by each method over the XSS attack (x-axis: timestamp delta; y-axis: F1; curves: 10-fold KNN and SVM for traffics, for logs, and for logs-and-traffics, and TLCD).

4.3. Attack Reconstructions. In our experiments, we also evaluate the performance of TLCD on attack reconstruction. In particular, for the detected attacks, we first obtain their time horizon and communication addresses from the information of the corresponding anomalous traffics, such as date, time, and IP. With that, we retrieve the corresponding log features and determine the concrete network devices involved. Finally, we reconstruct the original attack paths based on this abnormal information.
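The reconstruction steps described above can be sketched in a few lines: take the time horizon and endpoints of a detected anomalous flow, select the log records that match them, and order the devices by time. The record layout, the integer timestamps, the device names, and the addresses are hypothetical and serve only to illustrate the idea.

```python
def reconstruct_path(flow, log_records, slack=0):
    """Order the devices whose logs mention the flow's endpoints within its time horizon."""
    start, end = flow["start"] - slack, flow["end"] + slack
    endpoints = {flow["src"], flow["dst"]}
    related = [
        r for r in log_records
        if start <= r["time"] <= end and (r["src"] in endpoints or r["dst"] in endpoints)
    ]
    related.sort(key=lambda r: r["time"])
    path = []
    for r in related:
        # Collapse consecutive records from the same device into one hop.
        if not path or path[-1] != r["device"]:
            path.append(r["device"])
    return path


# Hypothetical anomalous flow and log records (times in seconds, deliberately unordered).
flow = {"start": 100, "end": 200, "src": "198.51.100.7", "dst": "192.0.2.10"}
log_records = [
    {"device": "SERVER1", "time": 150, "src": "192.0.2.10", "dst": "192.0.2.200"},
    {"device": "ROUTE1", "time": 105, "src": "198.51.100.7", "dst": "192.0.2.10"},
    {"device": "SWITCH", "time": 120, "src": "198.51.100.7", "dst": "192.0.2.10"},
    {"device": "FIREWALL", "time": 110, "src": "198.51.100.7", "dst": "192.0.2.10"},
    {"device": "ROUTE2", "time": 130, "src": "203.0.113.5", "dst": "203.0.113.9"},
    {"device": "SERVER1", "time": 500, "src": "198.51.100.7", "dst": "192.0.2.10"},
]
path = reconstruct_path(flow, log_records)
```

For this input the path is ROUTE1, FIREWALL, SWITCH, SERVER1: the ROUTE2 record involves unrelated addresses and the late SERVER1 record falls outside the time horizon.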

Figures 7–10 display our attack reconstruction results for the four simulated cyberattacks. Generally, the XSS attack damages the web server by passing through several network devices, including routers, firewalls, and switches. The phishing attack follows a similar process to XSS, except that it adopts the IMAP4 protocol and a fixed port. Different from XSS and phishing, the HTTP botnet does not aim to attack the servers but the hosts reached through the servers. Note that the HTTP botnet attack is carried out via web pages and the ICMP protocol. The P2P botnet attack is implemented not only through the servers but also directly over the hosts. As displayed in Figures 7–10, our reconstructed results accurately reveal the attack paths of the corresponding cyberattacks, which demonstrates that our method can effectively reconstruct the attack process.

Figure 4: The F1 value obtained by each method over the phishing email (x-axis: timestamp delta; y-axis: F1; curves: 10-fold KNN and SVM for traffics, for logs, and for logs-and-traffics, and TLCD).

Figure 5: The F1 value obtained by each method over the P2P botnet (same axes and curves as Figure 4).

Figure 6: The F1 value obtained by each method over the HTTP botnet (same axes and curves as Figure 4).

Figure 7: XSS attack reconstruction. Devices shown: ROUTE1, SWITCH, ROUTE2. Events shown along the path:
145401-2021121437
145512-Accept 2021121437 1921681100 1 TCP
145549-built inbound TCP connection 7852369 for outside19216811008000 to inside 192168125138000
145612-NSTL_01 SHELL5 CMDtaskvt0 ip19216812513 user
145649-Accept 19216812513 1921682200 1 TCP
145719-nstlserverPID5929 ifconfig 5938 Python 5600 bash ∗∗

Figure 8: Phishing attack reconstruction. Devices shown: ROUTE1, SWITCH, ROUTE2. Events shown along the path:
083326-2021121437
083359-Accept 2021121437 1921681105 1 TCP(IMAP4)
083504-built inbound TCP connection 5869534 for outside1921681105143 to inside 19216812517143
083601-NSTL_01 SHELL5CMDtaskvt0 ip19216812517 user
083656-Accept 19216812517 1921682200 1 TCP
083715-nstlserverPID43318 44062 90600 ∗∗

5. Conclusion

In this paper, we propose to integrate traffics with network device logs for detecting cyberattacks. Specifically, we use fuzzy association rules to integrate the device logs with traffics and obtain the features for attack detection and reconstruction. The experiments over four common network attacks demonstrate that our TLCD method can effectively detect diverse cyberattacks and reconstruct their event process.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China under grant no. 6157211

Figure 9: HTTP botnet attack reconstruction. Panel title as extracted: "P2P botnet ATTACK". Devices shown: ROUTE1, PC-a, PC-b, SWITCH, ROUTE2, PC 1 to PC n. Events shown along the path:
221858-testwsupper p stashtestnet pgtget lampsupperlamp -sT -p 15-561232916000 wwwourtestcom
221903-Accept 2021121437 10593139 1 TCPUDP
221913-Accept 2021121437 1921681143 1 TCPUDP
222003-built inbound TCP connection 365480 for outside192168114380 to inside 192168239480
222034-block ICMP echo req1921681143->192168239
222109-accept ICMP echo req1921681143->192168239
222141-nstlserverPID6935 21569 59871
222221-192168239 80 POSTcatalogserachasp501 748 536 24 wwwourtestcom mozilla40+(compatible+windosws+51)
222243-Accept 192168239 1921682200 TCPUDP

Figure 10: P2P botnet attack reconstruction. Panel title as extracted: "HTTP botnet ATTACK". Devices shown: ROUTE1, SWITCH, ROUTE2, PC 1 to PC n. Events shown along the path:
121539-2021121437
121623-Accept 2021121437 192168120 1 TCPUDP
121711-built inbound TCP connection 365480 for outside19216812080 to inside 19216813355480
121721-block ICMP echo req192168120->19216813355
121743-accept ICMP echo req192168120->19216813355
121754-NSTL_01 SHELL5CMDtaskvt0 ip19216813355 user
121807-Accept 19216813355 1921682200 TCPUDP
121845-nstlserverPID86948 56396
121901-19216813355 80 POSTcatalogserachasp501 748 536 24 wwwourtestcom mozilla40+(compatible+windosws+51) ∗∗

References

[1] A. Sood and R. Enbody, Targeted Cyber Attacks: Multi-Staged Attacks Driven by Exploits and Malware, Syngress, 2014.

[2] K. L. Chiew, K. S. C. Yong, and C. L. Tan, "A survey of phishing attacks: Their types, vectors and technical approaches," Expert Systems with Applications, vol. 106, pp. 1–20, 2018.

[3] B. Caswell, J. Beale, and A. Baker, Snort Intrusion Detection and Prevention Toolkit, Syngress, 2007.

[4] A. Yaar, A. Perrig, and D. Song, "Pi: a path identification mechanism to defend against DDoS attacks," in Proceedings of the 2003 Symposium on Security and Privacy, SP 2003, pp. 93–107, USA, May 2003.

[5] H. Wenhua and Y. Geng, "Identification method of attack path based on immune intrusion detection," Journal of Networks, vol. 9, no. 4, pp. 964–971, 2014.

[6] B. Parno, D. Wendlandt, E. Shi, A. Perrig, B. Maggs, and Y.-C. Hu, "Portcullis: protecting connection setup from denial-of-capability attacks," ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 289–300, 2007.

[7] K. Kleemola, M. Crete-Nishihata, and J. Scott-Railton, Targeted Attacks against Tibetan and Hong Kong Groups Exploiting CVE-2014-4114, Citizen Lab, June 2015.

[8] J. R. Crandall, G. Wassermann, D. A. de Oliveira, Z. Su, S. F. Wu, and F. T. Chong, "Temporal search: detecting hidden malware timebombs with virtual machines," ACM SIGARCH Computer Architecture News, vol. 34, no. 5, pp. 25–36, 2006.

[9] Z. Gu, K. Pei, Q. Wang, L. Si, X. Zhang, and D. Xu, "LEAPS: detecting camouflaged attacks with statistical learning guided by program analysis," in Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 57–68, Rio de Janeiro, Brazil, June 2015.

[10] K. Pei, Z. Gu, B. Saltaformaggio et al., "Hercule: attack story reconstruction via community discovery on correlated log graph," in Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 583–595, ACM, Los Angeles, Calif, USA, December 2016.

[11] H.-S. Chen, W. Gao, and D. G. Daut, "Signature based spectrum sensing algorithms for IEEE 802.22 WRAN," in Proceedings of the 2007 IEEE International Conference on Communications, ICC'07, pp. 6487–6492, IEEE, UK, June 2007.

[12] J. Zhang and M. Zulkernine, "Anomaly based network intrusion detection with unsupervised outlier detection," in Proceedings of the 2006 IEEE International Conference on Communications, ICC 2006, pp. 2388–2393, IEEE, Turkey, July 2006.

[13] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez et al., "Anomaly-based network intrusion detection: techniques, systems and challenges," Journal of Computers and Security, vol. 28, no. 1-2, pp. 18–28, 2009.

[14] F. Gong, "Deciphering detection techniques: Part ii anomaly-based intrusion detection," White Paper, McAfee Security, 2003.

[15] Y. Yu, "A survey of anomaly intrusion detection techniques," Journal of Computing Sciences in Colleges, vol. 28, no. 1, pp. 9–17, 2012.

[16] F. Sonmez, M. Zontul, O. Kaynar, and H. Tutar, "Anomaly detection using data mining methods in it systems: a decision support application," Sakarya University Journal of Science, vol. 22, no. 4, pp. 1109–1123, 2018.

[17] N. Kamiyama and T. Mori, "Simple and accurate identification of high-rate flows by packet sampling," in Proceedings of the IEEE INFOCOM 2006, 25th IEEE International Conference on Computer Communications, pp. 1–13, IEEE, Barcelona, Spain, April 2006.

[18] G. Carl, G. Kesidis, R. R. Brooks, and S. Rai, "Denial-of-service attack-detection techniques," IEEE Internet Computing, vol. 10, no. 1, pp. 82–89, 2006.

[19] J. Lu, K. Chen, Z. Zhuo, and X. Zhang, "A temporal correlation and traffic analysis approach for APT attacks detection," Cluster Computing, pp. 1–12, 2017.

[20] S. García, M. Grill, J. Stiborek, and A. Zunino, "An empirical comparison of botnet detection methods," Journal of Computers and Security, vol. 45, pp. 100–123, 2014.

[21] M. Parkour, Contagio malware database, httpswwwmedi-afirecomfolderc2az029ch6ckeTRAFFIC PATTE20RNSCOLLECTION, 2013.

[22] S. Siddiqui, M. S. Khan, K. Ferens, and W. Kinsner, "Detecting advanced persistent threats using fractal dimension based machine learning classification," in Proceedings of the 2nd ACM International Workshop on Security and Privacy Analytics (IWSPA '16), pp. 64–69, ACM, 2016.


Page 5: Integrating Traffics with Network Device Logs for Anomaly ...downloads.hindawi.com/journals/scn/2019/5695021.pdf · SecurityandCommunicationNetworks ROUTE1 ROUTE2 SERVER1 Set traffic

Security and Communication Networks 5

bytes received application and reason Although differentlogs reflect different characteristics of the device status theshared attributes such as time date and reason can beeffectively used to infer the status of one event in differentlogs

33 Anomaly Detection

331 Feature Integration through Association Rules In dailynetworks there are no direct correlations between the logdata and the traffic data However they can be correlatedthrough the shared attributes like time and date Thereforewe need to model the mutual mappings between trafficsand logs by effectively leveraging the correlated attributesWith that we can obtain the classification boundaries ofthe log attributes based on the corresponding attributes oftraffics

The discretization of traffic features which is useful forboundary division plays an important role in detectinganomaly traffics Specifically we use the Fuzzy-C Means(FCM) algorithm to divide the traffic characteristics (includ-ing quantitative attributes and Boolean attributes) into sev-eral fuzzy sets Note that the elements and nonelements ofeach fuzzy set can be mutually transformed in order toachieve the goal of softening features In the process of highlyskewed data FCM algorithm can effectively model the actualdistribution of data and clearly reveal the boundary betweennormal data and anomalies

In our method we first extract 29 basic attributes ofthe traffics including the five-tuple (source IP addressdestination IP address source port destination port andprotocol number) the total number of uplink and downlinkpackets the total number of uplink and downlink payloadpackets the total amount of uplink and downlink load flowduration average load the maximum load the minimumload average time interval between the uplink and downlinkdata packets the minimum time interval the maximum timeinterval and so on Then we extract 16 basic attributes ofthe logs including data time module PID type actionsource destination translated source and destination dura-tion bytes sent and received application reason and soon We assume that each feature comes from a Gaussiandistribution Then according to the membership function offuzzy recognition we can determine the fuzzy numbers of themaximum fuzzy set Denoting the center of the maximumfuzzy set as 120583 the membership degree as 119903119894 (i = 1 2 3 119899)and 120590 as the parameter the Gaussian fuzzy expression can berepresented as follows

y = exp[minus(119909 minus 120583)21205902 ] (1)

To approximate the maximum membership degree wedesign the objective function as

119892 (120590) = 119899sum119894=1

exp[minus (119909119894 minus 120583)21205902 ] minus 1199031198942

(2)

The corresponding membership function is expressed as

119860 119894119895 (119909119895) =

0 10038161003816100381610038161003816119909119895 minus 11990911989510038161003816100381610038161003816 gt 21199041198951 minus (119909119895 minus 1199091198952119904119895 )

2 10038161003816100381610038161003816119909119895 minus 11990911989510038161003816100381610038161003816 le 2119904119895(3)

where 119909119895 is the center and 2119904119895 is the standard deviation 120590Finally we can identify whether a sample is an anomaly basedon the principle of the maximum membership

332 Anomaly Detection The key point in anomaly detec-tion is to detect anomalies from benign data according tothe extracted features To achieve this we adopt supervisedlearning methods such as K-Nearest Neighbor (KNN) Sup-port Vector Machine (SVM) neural networks or decisiontrees to design the detection module Basically supervisedlearning first needs to establish a training set and thentrain a classification model over the training set For thisanomaly detection task our goal is to learn a classifier thatcan effectively detect out the anomalies In this work weadopt Gradient Boosting Decision Tree (GBDT) which isan advanced machine learning technique and models thedata with an ensemble of decision trees To finally evaluatethe performance of our method 10-fold cross validation isadopted in our work

4 Experiments

41 Dataset In our experiments we evaluate our methodover 4 types of network attack (XSS HTTP botnet P2P bot-net and phishing) These cyberattacks are carefully injectedinto the normal business and will not bring undesirableeffects for other business Both the traffic data and the logdata are collected from university servers and enterpriseservers To obtain the traffic data we have monitored theuniversity-enterprise network for one month Specifically wesimulate the P2P botnet and HTTP botnet attack accordingto the Contagio blog [21] and white paper [22] whichprovide guidance about how to make botnet evade intrusiondetection techniques To simulate the XSS attacks we injectmalicious code into the web pages of university serversThe simulated phishing emails are sent to both universityservers and enterprise servers Note that in our simulationthe anomaly traffics only account for 01 of the total trafficflows which is close to the real situations As displayed inTable 3 the collected data include 30 normal traffic datasets6 traffic datasets for XSS injection attacks 5 traffic datasets forphishing emails and 20 ones for botnets (13 P2P botnets and7 HTTP botnets) On the other hand as displayed in Table 4the log data are collected from 1 switch 2 routers 2 firewallsand 3 servers

4.2. Experimental Results. To validate the benefit of integrating traffics with logs for anomaly detection, we conduct comparison experiments that use only the traffic data or only the log data. As displayed in Tables 5–8 and Figures 3–6, neither traffics nor logs alone achieve desirable results in detecting cyberattacks: both the False Positive (FP) and False Negative (FN) values are generally worse than with the combined features, which is consistent with [16]. In contrast, when we integrate the traffic flows with network device logs, the detection performance improves significantly. Additionally, we compare the detection performance of different supervised learning methods, including SVM, KNN, and GBDT. These methods achieve very similar results, with GBDT slightly better than the others. This suggests that the features obtained through integrating traffics with logs are robust for our detection task.

Table 3: The collection details for traffic data.

Type | Traffic amount | Name
Normal | 30 | NA
XSS | 6 | NA
Phishing | 5 | NA
HTTP botnets [20] | 7 | Virut, Sogou
P2P botnets [21] | 13 | NSIS.ay, SMTP Spam, Zeus (C&C), UDP Storm, Zeus, Zero access, Weasel

Table 4: The collection details for log data.

Device name | Quantity | Brand
Switch | 1 | Huawei
Router | 2 | Cisco, Huawei
Firewall | 2 | Juniper
Server | 3 | Cisco

Table 5: The detection results over XSS attack.

XSS | FP | FN
10-fold KNN for traffics | 82 | 56
10-fold SVM for traffics | 86 | 58
10-fold KNN for logs | 91 | 99
10-fold SVM for logs | 90 | 86
10-fold SVM for logs-and-traffics | 52 | 63
10-fold KNN for logs-and-traffics | 62 | 36
TLCD (GBDT) | 43 | 25

Table 6: The detection results over phishing email.

Phishing | FP | FN
10-fold KNN for traffics | 71 | 73
10-fold SVM for traffics | 65 | 73
10-fold KNN for logs | 88 | 83
10-fold SVM for logs | 79 | 82
10-fold SVM for logs-and-traffics | 50 | 60
10-fold KNN for logs-and-traffics | 55 | 48
TLCD (GBDT) | 53 | 49

Table 7: The detection results over HTTP botnet.

HTTP botnet | FP | FN
10-fold KNN for traffics | 55 | 48
10-fold SVM for traffics | 53 | 50
10-fold KNN for logs | 63 | 59
10-fold SVM for logs | 63 | 58
10-fold SVM for logs-and-traffics | 36 | 29
10-fold KNN for logs-and-traffics | 38 | 27
TLCD (GBDT) | 25 | 28

Table 8: The detection results over P2P botnet.

P2P botnet | FP | FN
10-fold KNN for traffics | 45 | 46
10-fold SVM for traffics | 52 | 50
10-fold KNN for logs | 64 | 60
10-fold SVM for logs | 60 | 59
10-fold SVM for logs-and-traffics | 29 | 29
10-fold KNN for logs-and-traffics | 33 | 29
TLCD (GBDT) | 28 | 26

Figure 3: The F1 value obtained by each method over the XSS attack. (Plot; x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; series: 10-fold KNN for traffics, 10-fold SVM for traffics, 10-fold KNN for logs, 10-fold SVM for logs, 10-fold SVM for logs-and-traffics, 10-fold KNN for logs-and-traffics, TLCD.)
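The FP, FN, and F1 measures used in this comparison can be derived from a confusion matrix. The sketch below shows one common way to do so; the text does not spell out the exact definitions behind the reported values, so treating FP and FN as rates over the benign and anomalous samples, respectively, is an assumption, and the label vectors are made up for the example.

```python
def detection_metrics(y_true, y_pred):
    """Return FP rate, FN rate, and F1 for binary labels (1 = anomaly, 0 = benign)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    # Assumed definitions: FP rate over benign samples, FN rate over anomalies.
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    fn_rate = fn / (fn + tp) if (fn + tp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"fp_rate": fp_rate, "fn_rate": fn_rate, "f1": f1}

# Hypothetical labels: 4 anomalous flows followed by 6 benign flows.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

In a 10-fold setting, these values would typically be computed per fold and then averaged.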

4.3. Attack Reconstructions. In our experiments, we also evaluate the performance of TLCD on attack reconstruction. For each detected attack, we first obtain its time horizon and communication addresses from the corresponding anomalous traffic (date, time, IP, and so on). With that information, we retrieve the corresponding log features and determine the concrete network devices involved. Finally, we reconstruct the original attack path from this abnormal information.
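The reconstruction steps above can be sketched as a simple filter-and-sort over device logs. This is only an illustration under stated assumptions: the `LogEntry` fields, the `slack` parameter, the device names, and the documentation-range IP addresses are hypothetical and are not taken from the TLCD implementation.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    time: int      # seconds since midnight (hypothetical representation)
    device: str    # network device that produced the entry
    src_ip: str
    dst_ip: str
    message: str

def reconstruct_path(flow, logs, slack=30):
    """Select log entries matching the anomalous flow and order them in time."""
    start, end = flow["start"] - slack, flow["end"] + slack
    ips = {flow["src_ip"], flow["dst_ip"]}
    related = [
        e for e in logs
        if start <= e.time <= end and (e.src_ip in ips or e.dst_ip in ips)
    ]
    related.sort(key=lambda e: e.time)
    # Collapse consecutive entries from the same device into one hop.
    path = []
    for entry in related:
        if not path or path[-1] != entry.device:
            path.append(entry.device)
    return path, related

# Hypothetical logs and anomalous flow.
logs = [
    LogEntry(100, "ROUTE1", "198.51.100.7", "192.0.2.10", "accept"),
    LogEntry(130, "FIREWALL", "198.51.100.7", "192.0.2.10", "built inbound TCP connection"),
    LogEntry(150, "SERVER", "192.0.2.10", "192.0.2.20", "shell command"),
    LogEntry(140, "SWITCH", "192.0.2.99", "192.0.2.98", "unrelated traffic"),
    LogEntry(900, "ROUTE2", "198.51.100.7", "192.0.2.10", "outside the time horizon"),
]
flow = {"start": 110, "end": 160, "src_ip": "198.51.100.7", "dst_ip": "192.0.2.10"}
path, related = reconstruct_path(flow, logs)
print(" -> ".join(path))
```

The `slack` window is a design choice for the sketch: device clocks rarely agree exactly, so a small tolerance around the flow's time horizon avoids dropping the first or last hop.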

Figures 7–10 display our attack reconstruction results for the four simulated cyberattacks. Generally, the XSS attack damages the web server after passing through several network devices, including routers, firewalls, and switches. The phishing attack follows a similar process to XSS, except that it adopts the IMAP4 protocol and a fixed port. Unlike XSS and phishing, the HTTP botnet does not target the servers themselves but the hosts behind them, which it reaches by passing through the servers. Note that the HTTP botnet attack is carried out through web pages and the ICMP protocol. The P2P botnet attack is implemented not only through the servers but also directly over the hosts. As displayed in Figures 7–10, our reconstructed results accurately reveal the attack paths of the corresponding cyberattacks, which demonstrates that our method can effectively reconstruct the attack process.

Figure 4: The F1 value obtained by each method over the phishing email. (Plot; x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; same seven methods as in Tables 5–8.)

Figure 5: The F1 value obtained by each method over the P2P botnet. (Plot; x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; same seven methods as in Tables 5–8.)

Figure 6: The F1 value obtained by each method over the HTTP botnet. (Plot; x-axis: timestamp delta, 0 to 7; y-axis: F1, 0 to 1; same seven methods as in Tables 5–8.)

Figure 7: XSS attack reconstruction. (Diagram titled "XSS ATTACK"; nodes: ROUTE1, SWITCH, ROUTE2; annotated with the following log events.)
145401 - 2021121437
145512 - Accept 2021121437 1921681100 1 TCP
145549 - built inbound TCP connection 7852369 for outside 19216811008000 to inside 192168125138000
145612 - NSTL_01 SHELL5 CMDtaskvt0 ip19216812513 user
145649 - Accept 19216812513 1921682200 1 TCP
145719 - nstlserverPID5929 ifconfig 5938 Python 5600 bash

Figure 8: Phishing attack reconstruction. (Diagram titled "PHISHING ATTACK"; nodes: ROUTE1, SWITCH, ROUTE2; annotated with the following log events.)
083326 - 2021121437
083359 - Accept 2021121437 1921681105 1 TCP(IMAP4)
083504 - built inbound TCP connection 5869534 for outside 1921681105143 to inside 19216812517143
083601 - NSTL_01 SHELL5CMDtaskvt0 ip19216812517 user
083656 - Accept 19216812517 1921682200 1 TCP
083715 - nstlserverPID43318 44062 90600

5. Conclusion

In this paper, we propose to integrate traffics with network device logs for detecting cyberattacks. Specifically, we use fuzzy association rules to integrate the device logs with traffics, obtaining features for attack detection and reconstruction. The experiments over four common network attacks demonstrate that our TLCD method can effectively detect diverse cyberattacks and reconstruct their event processes.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China under grant no. 6157211.


Figure 9: HTTP botnet attack reconstruction. (Diagram titled "P2P botnet ATTACK"; nodes: ROUTE1, PC-a, PC-b, SWITCH, ROUTE2, PC 1 to PC n; annotated with the following log events.)
221858 - testwsupper p stashtestnet pgtget lampsupperlamp -sT -p 15-561232916000 wwwourtestcom
221903 - Accept 2021121437 10593139 1 TCP/UDP
221913 - Accept 2021121437 1921681143 1 TCP/UDP
222003 - built inbound TCP connection 365480 for outside 192168114380 to inside 192168239480
222034 - block ICMP echo req 1921681143 -> 192168239
222109 - accept ICMP echo req 1921681143 -> 192168239
222141 - nstlserverPID6935 21569 59871
222221 - 192168239 80 POSTcatalogserachasp501 748 536 24 wwwourtestcom mozilla40+(compatible+windosws+51)
222243 - Accept 192168239 1921682200 TCP/UDP

Figure 10: P2P botnet attack reconstruction. (Diagram titled "HTTP botnet ATTACK"; nodes: ROUTE1, SWITCH, ROUTE2, PC 1 to PC n; annotated with the following log events.)
121539 - 2021121437
121623 - Accept 2021121437 192168120 1 TCP/UDP
121711 - built inbound TCP connection 365480 for outside 19216812080 to inside 19216813355480
121721 - block ICMP echo req 192168120 -> 19216813355
121743 - accept ICMP echo req 192168120 -> 19216813355
121754 - NSTL_01 SHELL5CMDtaskvt0 ip19216813355 user
121807 - Accept 19216813355 1921682200 TCP/UDP
121845 - nstlserverPID86948 56396
121901 - 19216813355 80 POSTcatalogserachasp501 748 536 24 wwwourtestcom mozilla40+(compatible+windosws+51)

References

[1] A. Sood and R. Enbody, Targeted Cyber Attacks: Multi-Staged Attacks Driven by Exploits and Malware, Syngress, 2014.

[2] K. L. Chiew, K. S. C. Yong, and C. L. Tan, “A survey of phishing attacks: Their types, vectors and technical approaches,” Expert Systems with Applications, vol. 106, pp. 1–20, 2018.

[3] B. Caswell, J. Beale, and A. Baker, Snort Intrusion Detection and Prevention Toolkit, Syngress, 2007.

[4] A. Yaar, A. Perrig, and D. Song, “Pi: a path identification mechanism to defend against DDoS attacks,” in Proceedings of the 2003 Symposium on Security and Privacy, SP 2003, pp. 93–107, USA, May 2003.

[5] H. Wenhua and Y. Geng, “Identification method of attack path based on immune intrusion detection,” Journal of Networks, vol. 9, no. 4, pp. 964–971, 2014.

[6] B. Parno, D. Wendlandt, E. Shi, A. Perrig, B. Maggs, and Y.-C. Hu, “Portcullis: protecting connection setup from denial-of-capability attacks,” ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 289–300, 2007.

[7] K. Kleemola, M. Crete-Nishihata, and J. Scott-Railton, Targeted Attacks against Tibetan and Hong Kong Groups Exploiting CVE-2014-4114, Citizen Lab, June 2015.

[8] J. R. Crandall, G. Wassermann, D. A. de Oliveira, Z. Su, S. F. Wu, and F. T. Chong, “Temporal search: detecting hidden malware timebombs with virtual machines,” ACM SIGARCH Computer Architecture News, vol. 34, no. 5, pp. 25–36, 2006.

[9] Z. Gu, K. Pei, Q. Wang, L. Si, X. Zhang, and D. Xu, “LEAPS: detecting camouflaged attacks with statistical learning guided by program analysis,” in Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 57–68, Rio de Janeiro, Brazil, June 2015.

[10] K. Pei, Z. Gu, B. Saltaformaggio et al., “Hercule: attack story reconstruction via community discovery on correlated log graph,” in Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 583–595, ACM, Los Angeles, Calif, USA, December 2016.

[11] H.-S. Chen, W. Gao, and D. G. Daut, “Signature based spectrum sensing algorithms for IEEE 802.22 WRAN,” in Proceedings of the 2007 IEEE International Conference on Communications, ICC’07, pp. 6487–6492, IEEE, UK, June 2007.

[12] J. Zhang and M. Zulkernine, “Anomaly based network intrusion detection with unsupervised outlier detection,” in Proceedings of the 2006 IEEE International Conference on Communications, ICC 2006, pp. 2388–2393, IEEE, Turkey, July 2006.

[13] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez et al., “Anomaly-based network intrusion detection: techniques, systems and challenges,” Journal of Computers and Security, vol. 28, no. 1-2, pp. 18–28, 2009.

[14] F. Gong, “Deciphering detection techniques: Part ii anomaly-based intrusion detection,” White Paper, McAfee Security, 2003.

[15] Y. Yu, “A survey of anomaly intrusion detection techniques,” Journal of Computing Sciences in Colleges, vol. 28, no. 1, pp. 9–17, 2012.

[16] F. Sonmez, M. Zontul, O. Kaynar, and H. Tutar, “Anomaly detection using data mining methods in it systems: a decision support application,” Sakarya University Journal of Science, vol. 22, no. 4, pp. 1109–1123, 2018.

[17] N. Kamiyama and T. Mori, “Simple and accurate identification of high-rate flows by packet sampling,” in Proceedings of the IEEE INFOCOM 2006, 25th IEEE International Conference on Computer Communications, pp. 1–13, IEEE, Barcelona, Spain, April 2006.

[18] G. Carl, G. Kesidis, R. R. Brooks, and S. Rai, “Denial-of-service attack-detection techniques,” IEEE Internet Computing, vol. 10, no. 1, pp. 82–89, 2006.

[19] J. Lu, K. Chen, Z. Zhuo, and X. Zhang, “A temporal correlation and traffic analysis approach for APT attacks detection,” Cluster Computing, pp. 1–12, 2017.

[20] S. García, M. Grill, J. Stiborek, and A. Zunino, “An empirical comparison of botnet detection methods,” Journal of Computers and Security, vol. 45, pp. 100–123, 2014.

[21] M. Parkour, Contagio malware database, https://www.mediafire.com/folder/c2az029ch6cke/TRAFFIC PATTE20RNSCOLLECTION, 2013.

[22] S. Siddiqui, M. S. Khan, K. Ferens, and W. Kinsner, “Detecting advanced persistent threats using fractal dimension based machine learning classification,” in Proceedings of the 2nd ACM International Workshop on Security and Privacy Analytics (IWSPA ’16), pp. 64–69, ACM, 2016.



Page 7: Integrating Traffics with Network Device Logs for Anomaly ...downloads.hindawi.com/journals/scn/2019/5695021.pdf · SecurityandCommunicationNetworks ROUTE1 ROUTE2 SERVER1 Set traffic

Security and Communication Networks 7

0010203040506070809

1

0 1 2 3 4 5 6 7

Phishing attack

10-fold KNN for traffics

10-fold SVM for traffics

10-fold KNN for logs

10-fold SVM for logs

10-fold SVM for logs-and-traffics

10-fold KNN for logs-and-traffics 2

TLCD

Timestamp delta

F1

Figure 4 The F1 value obtained by each method over the phishing email

0010203040506070809

1

0 1 2 3 4 5 6 7

P2P botnet attack

10-fold KNN for traffics

10-fold SVM for traffics

10-fold KNN for logs

10-fold SVM for logs

10-fold SVM for logs-and-traffics

10-fold KNN for logs-and-traffics 2

TLCD

F1

Timestamp delta

Figure 5 The F1 value obtained by each method over the P2P botnet

0010203040506070809

1

0 1 2 3 4 5 6 7

HTTP botnet attack

10-fold KNN for traffics

10-fold SVM for traffics

10-fold KNN for logs

10-fold SVM for logs

10-fold SVM for logs-and-traffics

10-fold KNN for logs-and-traffics 2

TLCD

F1

Timestamp delta

Figure 6 The F1 value obtained by each method over the HTTP botnet

8 Security and Communication Networks

XSS ATTACKROUTE1

SWITCHROUTE2

145401-2021121437145512-Accept 2021121437

1921681100 1 TCP145549-bulit inbound TCP

connection 7852369 for

outside19216811008000 to

inside 192168125138000

145612-NSTL_01 SHELL5

CMDtaskvt0 ip19216812513

user

145649-Accept 19216812513 1921682200 1

TCP

145719-nstlserverPID5929 ifconfig 5938

Python 5600 bash

lowastlowast

Figure 7 XSS attack reconstruction

PHISHING ATTACKROUTE1

SWITCHROUTE2

083326-2021121437

083359-Accept 2021121437

1921681105 1 TCP(IMAP4) 083601-NSTL_01

SHELL5CMDtaskvt0

ip19216812517 user

083504-bulit inbound TCP

connection 5869534 for

outside1921681105143 to

inside 19216812517143

083656-Accept 19216812517 1921682200 1

TCP

083715-nstlserverPID43318 44062 90600

lowastlowast

Figure 8 Phishing attack reconstruction

damages the web server through passing several networkdevices including routers firewalls switches and so onThe phishing attack shares similar attack process to XSSexcept that it adopts the IMP4 protocol and a fixed portDifferent from XSS and phishing the HTTP botnet doesnot aim to attack the servers but the hosts through passingthe servers Note that the HTTP botnet attack is completedby web pages and the ICMP protocol The P2P botnetattack is implemented not only through the servers butalso directly over the hosts As displayed in Figures 7ndash10our reconstructed results have accurately revealed the attackpaths of the corresponding cyberattacks which demon-strates that our method can effectively reconstruct the attackprocess

5 Conclusion

In this paper we propose to integrate traffics with networkdevice logs for detecting cyberattacks Specifically we use

fuzzy association rules to integrate the device logswith trafficsto obtain the features for attack detection and reconstructionThe experiments over four common network attacks clearlydemonstrate that our TLCD method can effectively detectdiverse cyberattacks and reconstruct their event process

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This paper is supported by National Natural Science Founda-tion of China under grant no 6157211

Security and Communication Networks 9

P2P botnet ATTACK ROUTE1

221858-testwsupper p stashtestnet

pgtget lampsupperlamp -sT -p 15-561232916000

wwwourtestcom

221903-Accept 2021121437 10593139 1

TCPUDP

221913-Accept 2021121437

1921681143 1 TCPUDP

222003-bulit inbound TCP connection 365480 for

outside192168114380 to inside 192168239480 222034-block ICMP echo

req1921681143-gt192168239

222109-accept ICMP echo req1921681143-

gt192168239

PC-a PC-b

SWITCHROUTE2

222141-nstlserverPID6935 21569 59871

222221-192168239 80 POSTcatalogserachasp501 748 536 24

wwwourtestcom mozilla40+(compatible+windosws+51)

222243-Accept 192168239 1921682200 TCPUDP

PC 1

PC n

Figure 9 HTTP botnet attack reconstruction

HTTP botnet ATTACKROUTE1

121539-2021121437

121623-Accept 2021121437 192168120

1 TCPUDP

121711-bulit inbound TCP connection 365480 for

outside19216812080 to inside 19216813355480 121721-block ICMP echo

req192168120-gt19216813355

121743-accept ICMP echo req192168120-gt19216813355

121754-NSTL_01 SHELL5CMDtaskvt0

ip19216813355 user

SWITCHROUTE2

121807-Accept 19216813355 1921682200 TCPUDP

121845-nstlserverPID86948 56396

121901-19216813355 80 POSTcatalogserachasp501 748 536 24 wwwourtestcom mozilla40+(compatible+windosws+51)

PC 1

PC n

lowastlowast

Figure 10 P2P botnet attack reconstruction

References

[1] A Sood and R Enbody Targeted Cyber Attacks Multi-StagedAttacks Driven by Exploits and Malware Syngress 2014

[2] K L Chiew K S C Yong and C L Tan ldquoA survey of phishingattacks Their types vectors and technical approachesrdquo ExpertSystems with Applications vol 106 pp 1ndash20 2018

[3] B Caswell J Beale and A Baker Snort Intrusion Detection andPrevention Toolkit Syngress 2007

[4] A Yaar A Perrig and D Song ldquoPi a path identificationmechanism to defend against DDoS attacksrdquo in Proceedings ofthe 2003 Symposium on Security and Privacy SP 2003 pp 93ndash107 USA May 2003

10 Security and Communication Networks

[5] H Wenhua and Y Geng ldquoIdentification method of attack pathbased on immune intrusion detectionrdquo Journal of Networks vol9 no 4 pp 964ndash971 2014

[6] B Parno D Wendlandt E Shi A Perrig B Maggs and Y-C Hu ldquoPortcullis protecting connection setup from denial-of-capability attacksrdquoACM SIGCOMMComputer CommunicationReview vol 37 no 4 pp 289ndash300 2007

[7] K Kleemola M Crete-Nishihata and J Scott-Railton TargetedAttacks against Tibetan and Hong Kong Groups ExploitingCVE-2014-4114 Citizen Lab June 2015

[8] J R Crandall GWassermannD A deOliveira Z Su S FWuand F T Chong ldquoTemporal search detecting hidden malwaretimebombs with virtual machinesrdquo ACM SIGARCH ComputerArchitecture News vol 34 no 5 pp 25ndash36 2006

[9] Z Gu K Pei Q Wang L Si X Zhang and D Xu ldquoLEAPSdetecting camouflaged attacks with statistical learning guidedby program analysisrdquo in Proceedings of the 2015 45th AnnualIEEEIFIP International Conference on Dependable Systems andNetworks (DSN) pp 57ndash68 Rio de Janeiro Brazil June 2015

[10] K Pei Z Gu B Saltaformaggio et al ldquoHercule attack storyreconstruction via community discovery on correlated loggraphrdquo in Proceedings of the the 32nd Annual Conferenceon Computer Security Applications pp 583ndash595 ACM LosAngeles Calif USA December 2016

[11] H-S ChenW Gao andD G Daut ldquoSignature based spectrumsensing algorithms for IEEE 80222 WRANrdquo in Proceedings ofthe 2007 IEEE International Conference on CommunicationsICCrsquo07 pp 6487ndash6492 IEEE UK June 2007

[12] J Zhang andMZulkernine ldquoAnomaly based network intrusiondetection with unsupervised outlier detectionrdquo in Proceedingsof the 2006 IEEE International Conference on CommunicationsICC 2006 pp 2388ndash2393 IEEE Turkey July 2006

[13] P Garcia-Teodoro J Diaz-Verdejo G Macia-Fernandez etal ldquoAnomaly-based network intrusion detection techniquessystems and challengesrdquo Journal of Computers and Security vol28 no 1-2 pp 18ndash28 2009

[14] F Gong ldquoDeciphering detection techniques Part ii anomaly-based intrusion detectionrdquoWhite PaperMcAfee Security 2003

[15] Y Yu ldquoA survey of anomaly intrusion detection techniquesrdquoJournal of Computing Sciences in Colleges vol 28 no 1 pp 9ndash172012

[16] F Sonmez M Zontul O Kaynar and H Tutar ldquoAnomalydetection using data mining methods in it systems a decisionsupport applicationrdquo Sakarya University Journal of Science vol22 no 4 pp 1109ndash1123 2018

[17] N Kamiyama and T Mori ldquoSimple and accurate identificationof high-rate flows by packet samplingrdquo in Proceedings of theProceedings IEEE INFOCOM 2006 25TH IEEE InternationalConference on Computer Communications pp 1ndash13 IEEEBarcelona Spain April 2006

[18] G Carl G Kesidis R R Brooks and S Rai ldquoDenial-of-serviceattack-detection techniquesrdquo IEEE Internet Computing vol 10no 1 pp 82ndash89 2006

[19] J Lu K Chen Z Zhuo and X Zhang ldquoA temporal correlationand traffic analysis approach for APT attacks detectionrdquoClusterComputing pp 1ndash12 2017

[20] S Garcıa M Grill J Stiborek and A Zunino ldquoAn empiricalcomparison of botnet detectionmethodsrdquo Journal of Computersand Security vol 45 pp 100ndash123 2014

[21] M Parkour Contagio malware database httpswwwmedi-afirecomfolderc2az029ch6ckeTRAFFIC PATTE20RNSCOLLECTION 2013

[22] S Siddiqui M S Khan K Ferens and W Kinsner ldquoDetectingadvanced persistent threats using fractal dimension basedmachine learning classificationrdquo in Proceedings of the 2ndACM International Workshop on Security and Privacy Analytics(IWSPA rsquo16) pp 64ndash69 ACM 2016



Figure 7: XSS attack reconstruction. [Diagram; panel title "XSS ATTACK"; nodes ROUTE1, SWITCH, and ROUTE2; annotated with device log entries timestamped 14:54:01 to 14:57:19, including Accept records (TCP), a built inbound TCP connection, an NSTL_01 shell command record, and a server process record listing ifconfig, Python, and bash.]

Figure 8: Phishing attack reconstruction. [Diagram; panel title "PHISHING ATTACK"; nodes ROUTE1, SWITCH, and ROUTE2; annotated with device log entries timestamped 08:33:26 to 08:37:15, including Accept records over TCP (IMAP4), a built inbound TCP connection, an NSTL_01 shell command record, and a server process record.]

damages the web server by passing through several network devices, including routers, firewalls, switches, and so on. The phishing attack shares a similar attack process with XSS, except that it adopts the IMAP4 protocol and a fixed port. Different from XSS and phishing, the HTTP botnet does not aim to attack the servers but the hosts, which it reaches by passing through the servers. Note that the HTTP botnet attack is completed by web pages and the ICMP protocol. The P2P botnet attack is implemented not only through the servers but also directly over the hosts. As displayed in Figures 7–10, our reconstructed results have accurately revealed the attack paths of the corresponding cyberattacks, which demonstrates that our method can effectively reconstruct the attack process.
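To make the idea of path reconstruction concrete, the following is a minimal sketch of how timestamped entries from several devices could be chained into a hop-by-hop path. It is an illustration under stated assumptions, not the TLCD implementation: the `LogEvent` record, the `reconstruct_path` function, the device names, and all addresses (taken from documentation ranges) are hypothetical, and the simple "source equals the node reached so far" matching rule stands in for the association-rule correlation described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One normalized log entry (hypothetical schema, not the paper's)."""
    time: str    # "HH:MM:SS", zero-padded so string order equals time order
    device: str  # device that recorded the entry
    src: str     # source address of the recorded connection
    dst: str     # destination address of the recorded connection


def reconstruct_path(events, entry_point):
    """Chain time-ordered events whose source is the node reached so far."""
    path = [entry_point]
    steps = []
    current = entry_point
    for event in sorted(events, key=lambda e: e.time):
        # Follow an event only if it continues from the current node
        # and leads to a node that is not already on the path.
        if event.src == current and event.dst not in path:
            steps.append(event)
            path.append(event.dst)
            current = event.dst
    return path, steps


# Made-up entries; the addresses are illustrative, not values from the figures.
events = [
    LogEvent("14:56:49", "ROUTE2", "10.0.0.5", "10.0.0.9"),
    LogEvent("14:55:12", "ROUTE1", "203.0.113.7", "10.0.0.1"),
    LogEvent("14:55:49", "FIREWALL", "10.0.0.1", "10.0.0.5"),
    LogEvent("14:56:00", "SWITCH", "10.0.0.8", "10.0.0.9"),  # unrelated traffic
]

path, steps = reconstruct_path(events, "203.0.113.7")
print(" -> ".join(path))
print([step.device for step in steps])
```

Sorting by timestamp before chaining is what lets entries scattered across devices be read as one ordered sequence; the unrelated switch entry is skipped because its source is not on the path.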

5. Conclusion

In this paper, we propose to integrate traffics with network device logs for detecting cyberattacks. Specifically, we use fuzzy association rules to integrate the device logs with traffics to obtain the features for attack detection and reconstruction. The experiments over four common network attacks demonstrate that our TLCD method can effectively detect diverse cyberattacks and reconstruct their event process.
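As a rough illustration of what a fuzzy association rule between a traffic feature and a log feature looks like, the sketch below computes the commonly used fuzzy support (mean of the minimum membership across items) and confidence for one rule. It is an assumption-laden example rather than the authors' method: the field names `packets_per_s` and `denied_per_min`, the `ramp` membership function, its thresholds, and the sample records are all hypothetical.

```python
def ramp(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_support(records, itemset):
    """Mean over records of the minimum membership across the itemset."""
    total = 0.0
    for record in records:
        total += min(membership(record[field]) for field, membership in itemset)
    return total / len(records)


def fuzzy_confidence(records, antecedent, consequent):
    """support(antecedent and consequent) divided by support(antecedent)."""
    base = fuzzy_support(records, antecedent)
    if base == 0.0:
        return 0.0
    return fuzzy_support(records, antecedent + consequent) / base


# Hypothetical fuzzy items: (field name, membership function).
high_traffic = [("packets_per_s", lambda x: ramp(x, 100, 900))]
many_denies = [("denied_per_min", lambda x: ramp(x, 0, 10))]

# Made-up joined records: one traffic feature and one log feature per window.
records = [
    {"packets_per_s": 900, "denied_per_min": 8},
    {"packets_per_s": 500, "denied_per_min": 5},
    {"packets_per_s": 100, "denied_per_min": 0},
    {"packets_per_s": 300, "denied_per_min": 10},
]

support = fuzzy_support(records, high_traffic + many_denies)
confidence = fuzzy_confidence(records, high_traffic, many_denies)
print(f"support={support:.4f} confidence={confidence:.4f}")
```

A rule such as "high traffic implies many denied connections" would then be kept as a correlation feature only if its support and confidence exceed chosen thresholds; those thresholds are a tuning decision not specified here.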



Page 10: Integrating Traffics with Network Device Logs for Anomaly ...downloads.hindawi.com/journals/scn/2019/5695021.pdf · SecurityandCommunicationNetworks ROUTE1 ROUTE2 SERVER1 Set traffic

10 Security and Communication Networks

[5] H Wenhua and Y Geng ldquoIdentification method of attack pathbased on immune intrusion detectionrdquo Journal of Networks vol9 no 4 pp 964ndash971 2014

[6] B Parno D Wendlandt E Shi A Perrig B Maggs and Y-C Hu ldquoPortcullis protecting connection setup from denial-of-capability attacksrdquoACM SIGCOMMComputer CommunicationReview vol 37 no 4 pp 289ndash300 2007

[7] K Kleemola M Crete-Nishihata and J Scott-Railton TargetedAttacks against Tibetan and Hong Kong Groups ExploitingCVE-2014-4114 Citizen Lab June 2015

[8] J R Crandall GWassermannD A deOliveira Z Su S FWuand F T Chong ldquoTemporal search detecting hidden malwaretimebombs with virtual machinesrdquo ACM SIGARCH ComputerArchitecture News vol 34 no 5 pp 25ndash36 2006

[9] Z Gu K Pei Q Wang L Si X Zhang and D Xu ldquoLEAPSdetecting camouflaged attacks with statistical learning guidedby program analysisrdquo in Proceedings of the 2015 45th AnnualIEEEIFIP International Conference on Dependable Systems andNetworks (DSN) pp 57ndash68 Rio de Janeiro Brazil June 2015

[10] K Pei Z Gu B Saltaformaggio et al ldquoHercule attack storyreconstruction via community discovery on correlated loggraphrdquo in Proceedings of the the 32nd Annual Conferenceon Computer Security Applications pp 583ndash595 ACM LosAngeles Calif USA December 2016

[11] H-S ChenW Gao andD G Daut ldquoSignature based spectrumsensing algorithms for IEEE 80222 WRANrdquo in Proceedings ofthe 2007 IEEE International Conference on CommunicationsICCrsquo07 pp 6487ndash6492 IEEE UK June 2007

[12] J Zhang andMZulkernine ldquoAnomaly based network intrusiondetection with unsupervised outlier detectionrdquo in Proceedingsof the 2006 IEEE International Conference on CommunicationsICC 2006 pp 2388ndash2393 IEEE Turkey July 2006

[13] P Garcia-Teodoro J Diaz-Verdejo G Macia-Fernandez etal ldquoAnomaly-based network intrusion detection techniquessystems and challengesrdquo Journal of Computers and Security vol28 no 1-2 pp 18ndash28 2009

[14] F Gong ldquoDeciphering detection techniques Part ii anomaly-based intrusion detectionrdquoWhite PaperMcAfee Security 2003

[15] Y Yu ldquoA survey of anomaly intrusion detection techniquesrdquoJournal of Computing Sciences in Colleges vol 28 no 1 pp 9ndash172012

[16] F Sonmez M Zontul O Kaynar and H Tutar ldquoAnomalydetection using data mining methods in it systems a decisionsupport applicationrdquo Sakarya University Journal of Science vol22 no 4 pp 1109ndash1123 2018

[17] N Kamiyama and T Mori ldquoSimple and accurate identificationof high-rate flows by packet samplingrdquo in Proceedings of theProceedings IEEE INFOCOM 2006 25TH IEEE InternationalConference on Computer Communications pp 1ndash13 IEEEBarcelona Spain April 2006

[18] G Carl G Kesidis R R Brooks and S Rai ldquoDenial-of-serviceattack-detection techniquesrdquo IEEE Internet Computing vol 10no 1 pp 82ndash89 2006

[19] J Lu K Chen Z Zhuo and X Zhang ldquoA temporal correlationand traffic analysis approach for APT attacks detectionrdquoClusterComputing pp 1ndash12 2017

[20] S Garcıa M Grill J Stiborek and A Zunino ldquoAn empiricalcomparison of botnet detectionmethodsrdquo Journal of Computersand Security vol 45 pp 100ndash123 2014

[21] M Parkour Contagio malware database httpswwwmedi-afirecomfolderc2az029ch6ckeTRAFFIC PATTE20RNSCOLLECTION 2013

[22] S Siddiqui M S Khan K Ferens and W Kinsner ldquoDetectingadvanced persistent threats using fractal dimension basedmachine learning classificationrdquo in Proceedings of the 2ndACM International Workshop on Security and Privacy Analytics(IWSPA rsquo16) pp 64ndash69 ACM 2016

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom
