

Research Article
An Anomaly Detection Based on Data Fusion Algorithm in Wireless Sensor Networks

Xingfeng Guo,1 Dianhong Wang,2 and Fenxiong Chen2

1 Institute of Geophysics & Geomatics, China University of Geosciences, Wuhan 430074, China
2 Faculty of Mechanical & Electronic Information, China University of Geosciences, Wuhan 430074, China

Correspondence should be addressed to Xingfeng Guo; rongx0118@163.com

Received 16 November 2014; Revised 30 April 2015; Accepted 3 May 2015

Academic Editor: Lisimachos Kondi

Copyright © 2015 Xingfeng Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In recent years, wireless sensor networks (WSN) have been applied in more and more areas. However, energy consumption and outlier detection remain persistent challenges in WSN. To address both problems, this paper proposes a timely anomaly detection algorithm based on data fusion. The algorithm first employs piecewise aggregate approximation (PAA) to compress the original data so that energy consumption is reduced. It then combines an improved unsupervised K-Means detection algorithm with an artificial immune system (AIS) to classify the compressed data into normal and abnormal data. Experiments on synthetic and real sensor datasets show that the algorithm achieves a high outlier detection rate while keeping the false alarm rate low. In addition, because it operates on fused (compressed) data, the algorithm can effectively prolong battery life.

1. Introduction

With the development of digital electronics and wireless communications, a new breed of tiny embedded systems known as wireless sensor nodes has emerged in the past decade. A wireless sensor network (WSN), which consists of a large number of cheap, tiny, battery-powered sensor nodes equipped with limited on-board processing, storage, and radio capabilities, supports a variety of applications, such as health monitoring, scientific data collection, environmental monitoring, and military operations [1].

Event-driven WSN applications [2] require timely data analysis and assessment in order to facilitate (near) real-time, efficient, and accurate critical decision making and situation awareness. However, stringent resource constraints such as battery life, computational capacity, and communication load may lead to unreliable and inaccurate sensor data, especially when battery power is exhausted. What is worse, sensor nodes are usually deployed in harsh and unattended environments, where the data may be unreliable because of noise, missing values, redundant data, and so on. Therefore, ensuring the accuracy of sensor data and finding unreliable data in a timely manner has become an urgent task in WSN applications [3].

To identify unreliable data or malicious intrusion, outlier detection is used in WSN, and several outlier detection algorithms have been proposed [4–8], but almost none of them consider energy consumption during detection. In this paper, we propose an anomaly detection algorithm based on data fusion. To reduce the communication load and prolong battery life, we first use the lightweight PAA algorithm to compress the time series data of each node. Then, based on the PAA result, we combine an improved unsupervised K-Means detection algorithm with AIS to classify normal and abnormal data effectively. The improved K-Means algorithm classifies the compressed data, and the AIS offsets the drawbacks of K-Means so that the detection result is globally optimal. Simulations with two synthetic datasets and several real environmental datasets adopted in [9] show that our adaptive outlier detection technique achieves high detection accuracy and a low false alarm rate while saving network energy through data compression.

Hindawi Publishing Corporation, International Journal of Distributed Sensor Networks, Volume 2015, Article ID 943532, 10 pages, http://dx.doi.org/10.1155/2015/943532


The rest of the paper is organized as follows. In Section 2, we review the characteristics of outlier detection and some existing outlier detection algorithms in WSN. In Section 3, we describe in detail our improved K-Means and AIS algorithm, which operates on compressed data. In Section 4, we present experimental results that validate our algorithm on two synthetic datasets and several real datasets. Finally, we conclude the paper and discuss directions for future work in Section 5.

2. Related Works

In this section, we first analyze the challenges of outlier detection in WSN compared with traditional outlier detection techniques. In Section 2.2, we analyze the advantages and disadvantages of some recent outlier detection algorithms for WSN.

2.1. Challenges of Outlier Detection in WSN. An outlier in WSN, also known as an anomaly, can be defined as "those measurements that significantly deviate from the normal pattern of sensed data" [13]. This definition indicates that an effective way to detect outliers in WSN is to define the normal behavior of sensor data and to treat observations that deviate from it as outliers. However, conventional outlier detection techniques might not be suitable for sensor data in WSNs, where outlier detection faces additional challenges [3]:

(i) Resource constraints: as noted above, a WSN has stringent resource constraints such as battery life, computational capacity, and communication load. Most traditional techniques are computationally expensive and require much memory for data analysis and storage, so minimizing the communication load of the WSN needs more attention.

(ii) Distributed streaming data: during data collection, the underlying phenomenon being measured may change, so the sensor data form a nonstationary stream. Most traditional outlier techniques, which analyze stationary offline data, are therefore unsuitable for WSN, and algorithms that need a priori knowledge of the data distribution are also infeasible. A key requirement of outlier detection in WSN is thus to process distributed streaming data online.

(iii) Large-scale deployment: massive numbers of sensor nodes may be deployed in a WSN. This requires constructing an accurate normal profile that represents the normal behavior of the sensor data, so that outlier detection techniques can maintain a high detection rate while keeping the false alarm rate low.

According to the above analysis, an effective and efficient outlier detection technique for WSN should identify outliers in a distributed and online manner with high detection accuracy and a low false alarm rate, while satisfying WSN constraints in terms of communication, computational, and memory complexity [4].

2.2. Analysis of Several Previous Algorithms. Recently, many detection techniques have been proposed. Sun et al. [6] proposed a mechanism based on the extended Kalman filter to detect falsely injected data. The paper further combines cumulative summation and the generalized likelihood ratio to increase detection sensitivity. This algorithm is practical on resource-constrained hardware, but its heavy computation fails to ensure real-time detection. Salem et al. [5] presented an algorithm that combines the Mahalanobis distance (MD) and a kernel density estimator. The former is used for spatial analysis, and the latter identifies abnormal patterns. This technique does not require a priori knowledge of the data distribution and achieves good detection accuracy with a low false alarm rate. A remaining problem is its strong dependence on the predefined MD threshold: an appropriate threshold is difficult to determine, and a single threshold may not suit outlier detection in multidimensional data. Zhang et al. [4] took into account the correlation among sensor data attributes and proposed two distributed and online outlier detection techniques based on a hyperellipsoidal one-class support vector machine (SVM). The algorithm exploits spatiotemporal correlation to identify outliers and updates the ellipsoidal SVM-based model, which represents the changing normal behavior of sensor data, for further outlier identification. However, this technique ignores the communication capability of each node, since the spatial correlation is calculated by exchanging parameters between neighbors. Bhargava and Raghuvanshi [7] proposed an anomaly detection algorithm based on the S-transform and a one-class SVM. To reduce the data size, the S-transform is applied to extract the significant components of the time series data, and then the one-class SVM classifies the data as normal or anomalous. This algorithm applies a compressive transformation before outlier detection, but it needs to classify every extracted component for the final decision, which generates heavy computation. Some other outlier detection techniques for WSN have also been proposed [14–16].

The above algorithms can detect outliers effectively, but they do not fully consider the characteristics of outlier detection in WSN, such as constrained resources and dynamic data flow. To address these problems while keeping detection accuracy high, we propose an outlier detection algorithm based on data fusion. First, we use an improved PAA to save sensor energy and to reduce the amount of computation in the subsequent outlier detection. Then we use an improved K-Means algorithm, which needs no a priori knowledge of the data distribution, to distinguish normal from abnormal data. Finally, the AIS algorithm is used to make the K-Means detection result globally optimal and the detection accuracy high. The advantages of PAA, K-Means, and AIS, and the reasons for selecting them, are presented in detail in the next section.

3. Our Proposed Algorithm

In this section, we present our outlier detection approach in detail. Figure 1 shows the diagram of our proposed algorithm. First, we use PAA to compress the original sensor data. Then the improved K-Means algorithm classifies the compressed data, and the AIS offsets the drawbacks of K-Means so that the detection result is globally optimal. The two phases, compression and outlier detection, are discussed in detail below.

Figure 1: Flow diagram of our proposed algorithm. [Flowchart blocks: original sensor data; compress data with PAA; classify with improved K-Means algorithm; determine whether the classification requirement is met; if no, optimize by AIS; if yes, output detection result.]

3.1. Data Compression with PAA. To save sensor energy, data compression techniques have been used in WSN, because most energy is consumed in data transmission [17–19]. However, due to resource constraints, a suitable compression algorithm for WSN should be efficient and simple. As a dimensionality reduction algorithm, PAA is more intuitive and simpler than techniques such as Fourier transforms and wavelets [20]. Moreover, the degree of compression can be changed by adjusting the compression ratio parameter. Thus, PAA is adopted to compress the data in this paper.

A time series $C$ of length $n$ can be represented in a $w$-dimensional space by the vector

$$\bar{C} = \bar{c}_1, \ldots, \bar{c}_w. \tag{1}$$

The $i$th element of $\bar{C}$ is calculated as follows:

$$\bar{c}_i = \frac{1}{k} \sum_{j=k(i-1)+1}^{k \cdot i} c_j, \qquad k = \frac{n}{w}, \tag{2}$$

where $k$ is defined as the compression ratio and $c_j$ is the $j$th element of the original data; the bar distinguishes a compressed element $\bar{c}_i$ from an original element $c_j$. More information about PAA is presented in [20].

The compression process of PAA is simple and fast, but it also introduces a problem: the technique compresses the data directly, without considering the characteristics of the data or the correlation between data points, which may lead to mistakes in the subsequent handling of the compressed data and thus in outlier detection. For example, suppose that a value greater than 10 is considered abnormal and that $C_1 = \{2, 4, 9\}$ and $C_2 = \{1, 1, 13\}$ are two original data series. After PAA with $k = 3$, $\bar{c}_1 = \bar{c}_2 = 5$, so the difference between the two series is hidden and the outlier in $C_2$ cannot be detected. To expose the differences between sequences, the variance of each segment is added as a second output of PAA. The variance associated with the $i$th element of $\bar{C}$ is defined as

$$\mathrm{Var}_i = \frac{1}{k} \sum_{j=k(i-1)+1}^{k \cdot i} \left( c_j - \bar{c}_i \right)^2. \tag{3}$$
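To make the two outputs concrete, the following minimal Python sketch computes the segment mean of (2) and the segment variance of (3) for each window of length k. The function name `paa_with_variance` is hypothetical, and the sketch assumes that the series length is a multiple of k; it illustrates the formulas and is not the authors' implementation.

```python
def paa_with_variance(series, k):
    """Compress `series` into (mean, variance) pairs over windows of length k."""
    if k <= 0 or len(series) % k != 0:
        raise ValueError("len(series) must be a multiple of a positive k")
    pairs = []
    for start in range(0, len(series), k):
        window = series[start:start + k]
        mean = sum(window) / k                           # segment mean, Eq. (2)
        var = sum((c - mean) ** 2 for c in window) / k   # segment variance, Eq. (3)
        pairs.append((mean, var))
    return pairs


# The two example series from the text, compressed with k = 3.
c1 = paa_with_variance([2, 4, 9], k=3)
c2 = paa_with_variance([1, 1, 13], k=3)
print(c1, c2)
```

For the two example series, both windows have the same mean (15/3 = 5), but their variances differ (26/3 versus 32), which is the information the later clustering stage can use to separate them.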

Because acquiring neighbor data consumes energy, compressing data with PAA across space (between nodes) is not encouraged. In contrast, every sensor has a cache that stores its previous data, so it is convenient to compress time series data with PAA within each node. A sliding window of length $k$, where $k$ is the compression ratio of PAA, selects the data to compress. According to (2) and (3), a two-element array $X_j = (\bar{c}_{ij}, \mathrm{Var}_{ij})$ is defined to represent the $i$th compressed datum of the $j$th sensor node.
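One way a node could realize this windowed compression on a stream of readings is sketched below in Python. The class name `NodeCompressor` and its `push` method are hypothetical, and the window is assumed to advance by k readings at a time (non-overlapping segments, as in PAA); the source does not specify the buffering details.

```python
class NodeCompressor:
    """Hypothetical per-node cache that emits one (mean, variance) pair per k readings."""

    def __init__(self, k):
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.cache = []  # readings of the current, not yet full, window

    def push(self, reading):
        """Store a reading; return (mean, variance) when the window is full, else None."""
        self.cache.append(reading)
        if len(self.cache) < self.k:
            return None
        window, self.cache = self.cache, []              # start a fresh window
        mean = sum(window) / self.k                      # Eq. (2)
        var = sum((c - mean) ** 2 for c in window) / self.k  # Eq. (3)
        return (mean, var)


node = NodeCompressor(k=3)
emitted = [node.push(r) for r in [1, 1, 13, 2, 4, 9, 7]]
pairs = [p for p in emitted if p is not None]  # only these pairs would be transmitted
print(pairs)
```

With seven readings and k = 3, two pairs are emitted and the seventh reading stays in the cache until its window fills.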

3.2. Outlier Detection on Compressed Data. Based on the PAA compression result of the preceding subsection, we combine an improved unsupervised K-Means detection algorithm with AIS to classify normal and abnormal compressed sensor data effectively. Because the original data need not be reconstructed, the method has good real-time performance. After all sensors have sent their data to the sink node, the improved K-Means algorithm, which needs no prior knowledge of the data distribution, classifies the compressed data, and the AIS offsets the drawbacks of K-Means so that the detection result is globally optimal. The outlier detection process is described in detail below.

3.2.1. Classification by the K-Means Algorithm. According to the definition of an outlier in WSN, the key to outlier detection is to separate outliers from normal data effectively. Classification is the process of assigning sampled data to different classes or clusters. Some classification techniques, such as SVM and artificial neural networks (ANN), have been used for outlier detection in WSN [21–23]. As one of the simplest classification algorithms, K-Means has also proved effective for outlier detection in other areas [24–26]. It groups $N$ data points into $k'$ clusters, where $k'$ is the number of clusters; the prime distinguishes it from the compression ratio $k$. In addition, compared with other classification techniques, K-Means-based outlier detection in WSN has the following advantages:

(i) K-Means does not need knowledge of the data distribution, so it is suitable for dynamic data.

(ii) Its computation is simpler than that of other classification techniques such as SVM and ANN.

However, every compressed datum contains two elements, so we modify the original K-Means algorithm so that it can be combined with PAA. Suppose that the number of sensors in a WSN is $N$; the steps of the improved K-Means in our algorithm are as follows:

(1) Select $k'$ random instances $M_1 = (M_{11}, M_{12}), M_2 = (M_{21}, M_{22}), \ldots, M_{k'} = (M_{k'1}, M_{k'2})$ from all the compressed sensor data as the initial centroids of the clusters $C_1, C_2, \ldots, C_{k'}$.

(2) For every training datum $X_j = (\bar{c}_{ij}, \mathrm{Var}_{ij})$:

(a) Calculate the Euclidean distances

$$Y_{nj1} = \left\| \bar{c}_{ij} - M_{n1} \right\|^2, \qquad Y_{nj2} = \left\| \mathrm{Var}_{ij} - M_{n2} \right\|^2, \qquad n = 1, 2, \ldots, k', \tag{4}$$

where $Y_{nj1}$ is the distance between the first element of $X_j$ and the first element of the $n$th cluster centroid, and $Y_{nj2}$ is the distance between the second element of $X_j$ and the second element of the $n$th cluster centroid. To reduce the influence of different orders of magnitude, $Y_{nj1}$ and $Y_{nj2}$ are normalized to $Y'_{nj1}$ and $Y'_{nj2}$, and the distance between $X_j$ and $M_n$ is defined as

$$D(X_j, n) = m \cdot Y'_{nj1} + (1 - m) \cdot Y'_{nj2}, \qquad n = 1, 2, \ldots, k', \tag{5}$$

where $m$ is the weight parameter that adjusts the proportion of the two factors. The distance represents the correlation between a sensor datum and a cluster center, and it also determines whether the datum belongs to the outlier class. Finally, find the cluster $C_q$ that is closest to $X_j$.

(b) Assign $X_j$ to $C_q$ and update the centroid $(M_{q1}, M_{q2})$ of $C_q$ (the centroid of a cluster is the arithmetic mean of the instances in the cluster).

(3) Repeat step (2) until the centers no longer change. The algorithm aims at minimizing the squared error function $J$:

$$J = \sum_{j=1}^{N} \left\| D\!\left( X_j^{(i)} \right) \right\|^2, \qquad i \in \{1, 2, \ldots, k'\}, \tag{6}$$

where $D(X_j^{(i)})$ represents the distance between $X_j$ and the center of the cluster $C^{(i)}$ in which $X_j$ is located.
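The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the source does not say how the component distances are normalized, so the sketch divides each one by the squared range of that component over the data; it also updates centroids once per pass (batch style) rather than after every single assignment. The names `weighted_distance` and `improved_kmeans` are hypothetical.

```python
def weighted_distance(x, centroid, m, scale):
    """Eq. (5): m * Y'_1 + (1 - m) * Y'_2, with each Y divided by an assumed scale."""
    y1 = (x[0] - centroid[0]) ** 2   # Eq. (4), mean component
    y2 = (x[1] - centroid[1]) ** 2   # Eq. (4), variance component
    return m * y1 / scale[0] + (1 - m) * y2 / scale[1]


def improved_kmeans(points, init_centroids, m=0.5, max_iter=100):
    """Cluster (mean, variance) pairs; returns (labels, centroids, J)."""
    # Assumed normalization: squared range of each component (1.0 if it is constant).
    scale = []
    for d in (0, 1):
        values = [p[d] for p in points]
        scale.append((max(values) - min(values)) ** 2 or 1.0)
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Step (2a): assign every point to its closest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda n: weighted_distance(p, centroids[n], m, scale))
                  for p in points]
        # Step (2b): each centroid becomes the arithmetic mean of its members.
        updated = []
        for n, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == n]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(old)  # keep the old centroid of an empty cluster
        if updated == centroids:     # step (3): stop when the centers no longer change
            break
        centroids = updated
    # Eq. (6): sum of squared distances to the assigned centroid.
    J = sum(weighted_distance(p, centroids[lab], m, scale) ** 2
            for p, lab in zip(points, labels))
    return labels, centroids, J


points = [(5.0, 0.1), (5.1, 0.2), (4.9, 0.15), (5.0, 32.0)]
labels, centroids, J = improved_kmeans(points, init_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1]
```

In this toy input all four means are close to 5, so the mean component alone could not isolate the last pair; the variance component of the weighted distance places it in its own cluster.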

After classification, according to the definition of an outlier, the ideal result is that the abnormal data are assigned to one cluster while the normal data are assigned to the other clusters, because outliers deviate from the normal data. Therefore, the smaller $J$ is, the more precise the classification is. In addition, abnormal data are far fewer than normal data, so the data in the cluster with the fewest members are identified as outliers.
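The smallest-cluster rule can be written in a few lines of Python. The function name `outlier_indices` is hypothetical; ties between equally small clusters are resolved in favor of the cluster seen first, and clusters with no members are ignored, both of which are choices of this sketch rather than of the source.

```python
from collections import Counter


def outlier_indices(labels):
    """Return the indices of the data that fall in the least populated cluster."""
    counts = Counter(labels)                       # cluster label -> number of members
    outlier_cluster = min(counts, key=counts.get)  # cluster with the fewest members
    return [j for j, lab in enumerate(labels) if lab == outlier_cluster]


print(outlier_indices([0, 0, 0, 1, 2, 2]))  # [3]
```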

3.2.2. Classification Improvement with AIS. The K-Means algorithm is known to depend on the initial centroids of the clusters and to fall easily into a local optimum, so these problems must still be solved to make the classification more precise. The AIS algorithm used here, also known as the clonal selection algorithm (CSA), is a global optimization search algorithm and is considered appropriate to offset these drawbacks of K-Means.

The clonal selection algorithm is an emerging intelligent algorithm inspired by the immune system. It uses the diversity of the immune system to maintain population diversity, so that it can avoid the "premature convergence" problem of general optimization and reach the global optimum [27–29]. A detailed description of this algorithm is given in [30].

Given the defects of the K-Means algorithm, the purpose of CSA in this paper is to find the best initial centroids of the clusters, which ensures that the classification result is the global optimum and that our outlier detection rate is high. The application of CSA in our paper can be described as follows:

(1) In our K-Means algorithm, the squared error function $J$ in (6) is the criterion for judging the classification result, so we choose $J$ as the objective function and as the affinity.

(2) Because our purpose is to find the best initial centroids of the clusters, we define a set of cluster centroids as an antibody and randomly initialize multiple arrays of centroids as the initial antibody group $T$:

$$T = \begin{bmatrix} M_1^1 & M_2^1 & \cdots & M_{k'}^1 \\ M_1^2 & M_2^2 & \cdots & M_{k'}^2 \\ \vdots & \vdots & & \vdots \\ M_1^Q & M_2^Q & \cdots & M_{k'}^Q \end{bmatrix}, \tag{7}$$

where $(M_1^Q, M_2^Q, \ldots, M_{k'}^Q)$ is the $Q$th set of initial cluster centroids.

(3) For every antibody in $T$, we classify the compressed data with the improved K-Means and record the affinity $J$. Then we sort the affinity sequence by size.

(4) According to the sorted affinity sequence, we select the antibodies whose affinity ranks at the top of the sequence as the parent antibody group, which serves as the initial antibody group in the next round, because, as in genetics, good genes are more likely to be propagated to the next generation. Then these antibodies are cloned according to their affinities.

(5) Determine whether the classification result obtained with the antibody corresponding to the minimum $J$ meets the end condition, namely, that $J$ is small enough. If it does, this antibody is the best set of initial cluster centroids, which ensures that the K-Means classification result is the best. Otherwise, continue with the following steps.

(6) Apply clone, crossover, and mutation operations to the antibodies selected in step (4) to form a new, diverse generation of antibodies.

(7) If the maximum number of iterations has been reached, the process also ends; otherwise, return to step (3).
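Steps (1) to (7) can be sketched as a small clonal selection loop in Python. Everything here is an assumption made for illustration: the population size, the number of parents, the clone counts, the uniform crossover, the Gaussian mutation width `sigma`, and the use of a plain squared Euclidean K-Means (`kmeans_J`) as the affinity in place of the weighted distance of (5). The source fixes none of these details.

```python
import random


def kmeans_J(points, centroids, iters=20):
    """Plain K-Means from the given start; returns the final squared error J."""
    cents = [tuple(c) for c in centroids]
    d2 = lambda p, c: (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    for _ in range(iters):
        groups = [[] for _ in cents]
        for p in points:
            groups[min(range(len(cents)), key=lambda n: d2(p, cents[n]))].append(p)
        cents = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
                 for g, c in zip(groups, cents)]
    return sum(min(d2(p, c) for c in cents) for p in points)


def clonal_selection(points, k_prime, pop=8, parents=3, generations=10,
                     j_target=1e-9, sigma=0.5, seed=0):
    """Search for initial centroids with a small J; returns (J, initial centroids)."""
    rng = random.Random(seed)
    # Step (2): each antibody is one set of k' initial centroids drawn from the data.
    T = [rng.sample(points, k_prime) for _ in range(pop)]
    best = None
    for _ in range(generations):                     # step (7): iteration limit
        # Step (3): score every antibody by its affinity J and sort (smallest first).
        scored = sorted(((kmeans_J(points, ab), ab) for ab in T), key=lambda s: s[0])
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        if best[0] <= j_target:                      # step (5): end condition
            break
        elite = [ab for _, ab in scored[:parents]]   # step (4): parent antibody group
        T = list(elite)
        for rank, ab in enumerate(elite):            # step (6): clone, crossover, mutate
            for _ in range(parents - rank):          # better rank -> more clones
                mate = rng.choice(elite)
                child = [ab[n] if rng.random() < 0.5 else mate[n] for n in range(k_prime)]
                child = [(c[0] + rng.gauss(0, sigma), c[1] + rng.gauss(0, sigma))
                         for c in child]
                T.append(child)
    return best


points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 0), (10.1, 0), (10, 0.1)]
J, init_centroids = clonal_selection(points, k_prime=3)
print(J, init_centroids)
```

Keeping the parent group in the next population means the best recorded J never gets worse from one generation to the next, which mirrors the role of the parent antibody group in step (4).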

After the above process, we can find better initial cluster centroids and obtain a more accurate classification than the original K-Means algorithm. As a result, normal and abnormal sensor data are separated into different clusters more effectively, so our algorithm achieves higher detection accuracy and a lower false alarm rate. However, an effective outlier detection algorithm for WSN must not only have a high detection rate but also respect the characteristics of sensor data and the constrained resources. The advantages of our algorithm over other algorithms are shown in Table 1.

Table 1: The comparison of characteristics.

| Algorithm | Online | Based on compressed data | No a priori knowledge of data |
|---|---|---|---|
| Our algorithm | Yes | Yes | Yes |
| Salem et al. [5] | Yes | No | Yes |
| Bhargava and Raghuvanshi [7] | Yes | Yes | No |
| Zhang et al. [4] | Yes | No | No |

Table 2: The experimental datasets.

| Name of dataset | Length of dataset | Source of dataset |
|---|---|---|
| Ma Data | 1200 | Synthetic dataset in [10] |
| Keogh Data | 1200 | Synthetic dataset in [10] |
| chfdb chf01 | 3600 | ECG [11] |
| chfdb chf13 | 3600 | ECG [11] |
| stdb 308 | 3600 | ECG [11] |
| Synthetic control | 600 | UCI [12] |

As shown in Table 1, compared with other algorithms, our algorithm satisfies more of the requirements of outlier detection in WSN.

3.3. Pseudocode of Our Algorithm. Based on the preceding description of each part of our algorithm, Pseudocode 1 describes the whole process.

4. Experimental Evaluation

To evaluate the performance of our proposed algorithm, experiments were carried out on two synthetic anomaly datasets and several real medical datasets commonly used in anomaly detection. The names and sources of these datasets are shown in Table 2. For comparison, the K-Means algorithm without AIS is used as the baseline.

4.1. Experimental Setup and Evaluation Metrics. Our simulation is conducted in MATLAB. We assume that knowledge of sensor node locations is available at the base station. We do not assume any specific routing or medium access protocol in the network, nor do we make any assumptions about the node density, because our algorithm is not designed for a particular application scenario.

Detection rate (DR) and false alarm rate (FAR) are used to evaluate the performance of our outlier detection algorithm. The detection rate is the ratio of the number of correctly detected outliers to the total number of outliers. The false alarm rate is the ratio of the number of normal data incorrectly detected as outliers to the total number of normal data.

Abnormal data are generally very few, so we select the cluster with the fewest data as the outlier cluster and calculate DR and FAR from the data in that cluster.
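The two metrics follow directly from their definitions, as the short Python sketch below shows. The function name `detection_metrics` and the boolean-list input format are assumptions of this example; the source evaluates its experiments in MATLAB and does not give code.

```python
def detection_metrics(is_outlier_true, is_outlier_pred):
    """Return (DR, FAR) from ground-truth and predicted outlier flags."""
    pairs = list(zip(is_outlier_true, is_outlier_pred))
    true_pos = sum(1 for t, p in pairs if t and p)         # outliers correctly detected
    false_pos = sum(1 for t, p in pairs if (not t) and p)  # normal data flagged as outliers
    n_outliers = sum(1 for t, _ in pairs if t)
    n_normal = len(pairs) - n_outliers
    dr = true_pos / n_outliers if n_outliers else float("nan")
    far = false_pos / n_normal if n_normal else float("nan")
    return dr, far


truth = [False, False, False, True, False, True, False, False, False, False]
pred = [False, False, False, True, False, False, True, False, False, False]
dr, far = detection_metrics(truth, pred)
print(dr, far)  # 0.5 0.125
```

In this toy case one of the two true outliers is found (DR = 1/2) and one of the eight normal data is flagged by mistake (FAR = 1/8).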


Input: original data from every sensor
Output: abnormal sensors and outlier data
For the original data from every sensor:
(1) Compress the data with the improved PAA algorithm in each node and get the compressed data $(\bar{c}_{ij}, \mathrm{Var}_{ij})$
(2) All nodes send the compressed data of the same time to the base station
(3) Initialize the antibody group $T$ and set the initial number of iterations to 0
(4) While (the number of iterations has not reached the specified value)
(5)     For each antibody in $T$
(6)         Classify all the compressed data with the improved K-Means algorithm and calculate the affinity $J$
(7)     End the classification
(8)     Sort all affinities by size and choose the antibodies at the top of the sorted sequence as the parent antibody group; then clone these antibodies according to their affinities
(9)     Determine whether the minimum $J$ meets the end condition
(10)    If (it does)
(11)        Jump out of the optimization loop and end the classification
(12)    Else
(13)        Process the parent antibody group with clone, crossover, and mutation operations to form a new, diverse generation of antibodies
(14)        Increase the number of iterations by one
(15) End the optimization and select the antibody in the first row as the best initial centroids of the clusters
(16) Classify the compressed data with the selected antibody and pick out the abnormal compressed data
(17) Judge the abnormal type based on the location of the abnormal data

Pseudocode 1: Pseudocode of our algorithm.

4.2. Experimental Results

4.2.1. Decision of Related Parameters. As shown in the description of our algorithm in Section 3, the compression ratio $k$ and the number of clusters $k'$ are two key parameters that have a great impact on the classification result. Therefore, experiments were carried out to determine appropriate ranges of $k$ and $k'$ so that the subsequent detection results are robust and accurate. To present the results clearly and concisely, we show only the results on the stdb 308 dataset, because the results on the other datasets are similar. The stdb 308 dataset is shown in Figure 2; data whose absolute value is greater than 0.5 are abnormal. The DR and FAR results under different $k$ and $k'$ are shown in Tables 3 and 4, respectively. To reduce error, we take the average over 100 experiments under the same parameters as the final result.

According to the described algorithm in last chapter thesmaller 119870 is the less the data information after compressingloses and the more accurate the detection is On the otherhand the larger 119870 is the better the effect of compression isand the more the energy is saved For the parameter 1198961015840 theless it is the less the calculation amount is but the worsethe effect of classification is and the performance is oppositewhen 1198701015840 increases

As shown in Tables 3 and 4, the results are in accordance with the theoretical analysis. To balance energy saving and effective outlier detection while also considering the amount of calculation, we set $K$ to 4 and $k'$ to 3.
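As an illustration of how DR and FAR values such as those in Tables 3 and 4 can be obtained, the following is a minimal sketch. It assumes the usual definitions (DR: percentage of truly abnormal points that are flagged; FAR: percentage of normal points that are wrongly flagged), which may differ in detail from the evaluation used here. The function name and the toy data are hypothetical; only the 0.5 threshold comes from the stdb 308 rule stated above.

```python
def detection_metrics(true_labels, predicted_labels):
    """Return (DR, FAR) in percent. Labels are True for 'abnormal'.

    Assumed definitions (not taken from the paper):
      DR  = flagged abnormal points / all abnormal points
      FAR = flagged normal points   / all normal points
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_pos = sum(1 for t, p in pairs if (not t) and p)
    n_abnormal = sum(1 for t in true_labels if t)
    n_normal = len(true_labels) - n_abnormal
    dr = 100.0 * true_pos / n_abnormal if n_abnormal else 0.0
    far = 100.0 * false_pos / n_normal if n_normal else 0.0
    return dr, far


# Toy signal: ground truth follows the |x| > 0.5 rule used for stdb 308.
signal = [0.1, -0.7, 0.2, 0.9, -0.3, 0.4]
truth = [abs(x) > 0.5 for x in signal]
# Hypothetical detector output: one hit, one miss, one false alarm.
predicted = [False, True, False, False, True, False]

dr, far = detection_metrics(truth, predicted)
print(f"DR = {dr:.1f}%, FAR = {far:.1f}%")
```

Averaging these two numbers over repeated runs, as done for the 100 experiments above, smooths out the randomness of the initial centroids.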

4.2.2. Results of Outlier Detection. With $K$ and $k'$ selected in the previous subsection, relevant experiments were

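Table 3: DR (%) under different $K$ and $k'$.

| $k'$ | $K = 6$ | $K = 5$ | $K = 4$ | $K = 3$ | $K = 2$ |
|---|---|---|---|---|---|
| 2 | 96.12 | 96.45 | 97.05 | 97.12 | 97.15 |
| 3 | 96.77 | 97.12 | 97.23 | 97.30 | 97.28 |
| 4 | 97.65 | 97.15 | 97.20 | 97.32 | 97.35 |
| 5 | 97.50 | 97.89 | 97.91 | 97.86 | 97.85 |
| 6 | 97.62 | 97.50 | 98.03 | 98.11 | 98.10 |
| 7 | 97.59 | 97.85 | 98.00 | 98.08 | 98.12 |
| 8 | 97.65 | 97.79 | 98.05 | 98.12 | 98.14 |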

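Table 4: FAR (%) under different $K$ and $k'$.

| $k'$ | $K = 6$ | $K = 5$ | $K = 4$ | $K = 3$ | $K = 2$ |
|---|---|---|---|---|---|
| 2 | 5.12 | 4.45 | 3.87 | 3.12 | 2.98 |
| 3 | 5.11 | 4.32 | 3.58 | 3.09 | 2.78 |
| 4 | 4.85 | 4.21 | 3.56 | 2.98 | 2.77 |
| 5 | 4.90 | 3.87 | 3.34 | 3.01 | 2.65 |
| 6 | 4.85 | 3.75 | 2.98 | 2.65 | 2.56 |
| 7 | 4.83 | 3.68 | 2.55 | 2.34 | 2.45 |
| 8 | 4.75 | 3.57 | 2.14 | 1.98 | 1.53 |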

conducted to compare the performance of our algorithm (OA) with that of the $K$-Means algorithm without AIS (KWA); the results are shown in Table 5.

As shown in Table 5, the performance of our algorithm is clearly better on every database: DR is higher

Figure 2: The database of stdb 308 (horizontal axis from 0 to 3500; vertical axis from −1.5 to 1.5).

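Table 5: The experimental results.

| Name of dataset | DR with OA (%) | DR with KWA (%) | FAR with OA (%) | FAR with KWA (%) |
|---|---|---|---|---|
| Ma Data | 97.50 | 81.12 | 3.34 | 8.23 |
| Keogh Data | 98.10 | 80.45 | 3.15 | 7.78 |
| Chfdb chf01 | 98.86 | 81.34 | 3.22 | 7.12 |
| Chfdb chf13 | 97.13 | 78.65 | 1.97 | 8.02 |
| Stdb 308 | 97.23 | 80.67 | 3.58 | 7.84 |
| Synthetic control | 98.34 | 78.12 | 2.78 | 8.45 |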

and FAR is lower than for the algorithm without AIS. The reason is that our algorithm adopts the AIS algorithm to ensure that the result is globally optimal; in contrast, the other algorithm depends on the initial cluster centers, so the result of each experiment is a local optimum and unstable. In addition, although the original data are compressed before outlier detection, the DR of our algorithm in every row is higher than 95% while the FAR is lower than 5%, so our algorithm is an effective outlier detection system for WSN that ensures a high detection rate while keeping the false alarm rate low.

However, an actual application environment always contains some noise, which alters the original signal, so we artificially add Gaussian white noise to the original signals and repeat the experiments to test the robustness of our algorithm. We take the Ma Data (Ma), Stdb 308 (Std), and Synthetic control (Syn) databases as examples; the results are shown in Figures 3 and 4. Figure 3 shows the DR of these databases at different SNRs, and Figure 4 shows the FAR at different SNRs.

As shown in Figures 3 and 4, we test the robustness of our algorithm under Gaussian white noise at several signal-to-noise ratios (SNR). In both figures, DR shows a slight downward trend and FAR rises slightly as the SNR decreases, because a smaller SNR has a greater influence on the original data. However, the DR remains above 85% and the FAR below 15% at every noise level, so our algorithm is robust against noise.
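The noise experiment can be reproduced in spirit with a short routine that adds Gaussian white noise at a target SNR. The sketch below is an assumption about how such noise might be generated (noise power derived from the mean signal power); the function names are hypothetical, and the sine wave merely stands in for a real sensor trace.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Stand-in signal; the SNR values span the 10-40 dB range of Figures 3 and 4.
clean = [math.sin(2.0 * math.pi * i / 50.0) for i in range(20000)]
for snr in (10, 20, 30, 40):
    noisy = add_white_noise(clean, snr, random.Random(0))
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

Each noisy series would then be passed through the same compression and detection pipeline to obtain one point of the DR and FAR curves.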

4.2.3. Analysis of Energy Saving and Real-Time Performance. Compared to other outlier detection algorithms, another advantage of our algorithm is that it can prolong the network lifetime, because the PAA algorithm reduces the amount of data sent. For example, for the stdb 308 database, the number of data values sent drops from 3600 to 1800 when $K$ is 4, so the network saves half of the energy, and it saves even more when $K$ is larger.
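The figure of 1800 values follows if each window of $K$ samples is replaced by two outputs, the mean and the variance defined in Section 3.1. The following sketch spells out that arithmetic; the function names are hypothetical, and treating the saved energy as proportional to the number of values not sent is a simplifying assumption.

```python
def transmitted_values(n_samples, k, outputs_per_window=2):
    """Values sent after PAA: one (mean, variance) pair per window of k samples."""
    windows = n_samples // k
    return windows * outputs_per_window


def saved_fraction(n_samples, k, outputs_per_window=2):
    """Fraction of transmissions avoided, assuming energy scales with values sent."""
    return 1.0 - transmitted_values(n_samples, k, outputs_per_window) / n_samples


for k in (2, 3, 4, 5, 6):
    sent = transmitted_values(3600, k)
    print(f"K={k}: {sent} values sent, {100 * saved_fraction(3600, k):.1f}% saved")
```

Under this model $K = 2$ saves nothing, because two outputs replace two samples, which is one reason the choice of $K$ matters beyond accuracy.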

Some outlier detection methods that incorporate a data fusion algorithm reconstruct the original data before detection. However, the reconstruction process of a data fusion algorithm usually takes a long time, which leads to poor real-time performance. Our detection algorithm deals directly with the compressed data, so it achieves better real-time performance.

4.2.4. Comparison of Our Algorithm with Other Outlier Detection Algorithms in WSN. To show the effectiveness of our algorithm more clearly, we compare it with the algorithms listed in Table 1. The stdb 308 database is selected for the relevant experiments. For every algorithm, we choose the optimal parameter settings according to its original paper. In addition, we take the average of 100 experiments as the final result to ensure reliability. The comparison is shown in Table 6.

As shown in Table 6, the DR and FAR of our algorithm are both the best among the above algorithms. Because we combine the AIS with $K$-Means to optimize the classification results, it detects outliers effectively. Our algorithm also saves the most energy compared with the other algorithms, because PAA effectively prolongs the lifetime of the network. Because the algorithm proposed by Zhang et al. [4] involves no data compression, its time consumption is less than that of our algorithm. However, because PAA, $K$-Means, and AIS are all lightweight methods, the time consumption

Figure 3: DR (%) with different SNR (dB) for the Ma, Std, and Syn databases (horizontal axis from 10 to 40 dB; vertical axis from 80 to 100%).

Figure 4: FAR (%) with different SNR (dB) for the Ma, Std, and Syn databases (horizontal axis from 10 to 40 dB; vertical axis from 0 to 20%).

of our algorithm is less than that of the other two algorithms.

5. Conclusion and Future Work

In this paper, we propose an outlier detection algorithm for detecting abnormal compressed data in WSN. In the first phase, we utilize the PAA algorithm to compress the time series data in each node, so that the communication overload is reduced and the battery life is prolonged. Based on the result of PAA, we then combine the improved unsupervised detection algorithm of $K$-Means with the AIS to effectively separate normal and abnormal data. The major advantages of this algorithm are that it is based on the compressed data, so

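Table 6: The comparison of several algorithms.

| Algorithm | DR (%) | FAR (%) | Time consumption (s) | Saved energy consumption (%) |
|---|---|---|---|---|
| Our algorithm | 97.23 | 3.58 | 1.15 | 50 |
| Salem et al. [5] | 93.45 | 6.15 | 1.17 | 0 |
| Bhargava and Raghuvanshi [7] | 92.17 | 7.28 | 1.28 | 35 |
| Zhang et al. [4] | 95.19 | 5.62 | 1.06 | 0 |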

that the energy consumption is reduced, and that it achieves a high detection rate while the false alarm rate is low. Relevant experiments on virtual and real data


demonstrate the effectiveness of our algorithm in detecting outliers and resisting noise.

A potential limitation of this approach is that the time cost increases markedly when the data volume is huge, because every data point needs to be reclassified with the new cluster centroids until the classification ends, and the search for the optimal initial centroids takes a long time. Another limitation is that our detection algorithm is based on univariate data, while the data are more complex in some WSN applications. Therefore, our future work is to improve the algorithm so that it has better real-time performance and is better suited to the detection of multivariate data.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61271274), key project of Natural Science Foundation of Hubei Province of China (2011CDA069), general project of Natural Science Foundation of Hubei Province of China (2010CDB04203, 2011CDB339), and Key Science and Technology of Hubei Province of China (2012BAA02003, 2011BAB042). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References

[1] D. Bri, M. Garcia, J. Lloret, and P. Dini, "Real deployments of wireless sensor networks," in Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM '09), pp. 415–423, June 2009.

[2] C. F. Garcia-Hernandez, P. H. Ibarguengoytia, J. Garcia-Hernandez, and J. A. Perez-Diaz, "Wireless sensor networks and applications: a survey," International Journal of Computer Science and Network Security, vol. 7, pp. 264–273, 2007.

[3] Y. Zhang, N. Meratnia, and P. Havinga, "Outlier detection techniques for wireless sensor networks: a survey," IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[4] Y. Zhang, N. Meratnia, and P. J. M. Havinga, "Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine," Ad Hoc Networks, vol. 11, no. 3, pp. 1062–1074, 2013.

[5] O. Salem, Y. Liu, and A. Mehaoua, "Anomaly detection in medical wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 272–284, 2013.

[6] B. Sun, X. Shan, K. Wu, and Y. Xiao, "Anomaly detection based secure in-network aggregation for wireless sensor networks," IEEE Systems Journal, vol. 7, no. 1, pp. 13–25, 2013.

[7] A. Bhargava and A. S. Raghuvanshi, "Anomaly detection in wireless sensor networks using S-transform in combination with SVM," in Proceedings of the 5th International Conference on Computational Intelligence and Communication Networks (CICN '13), pp. 111–116, IEEE, Mathura, India, September 2013.

[8] M. Moshtaghi, C. Leckie, S. Karunasekera, and S. Rajasegarar, "An adaptive elliptical anomaly detection model for wireless sensor networks," Computer Networks, vol. 64, pp. 195–207, 2014.

[9] Q.-L. Zhong and Z.-X. Cai, "Symbolic algorithm for time series data based on statistic feature," Chinese Journal of Computers, vol. 31, no. 10, pp. 1857–1864, 2008.

[10] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '00), pp. 93–104, June 2000.

[11] http://physionet.org/physiobank/database/.

[12] http://archive.ics.uci.edu/ml/datasets.html.

[13] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[14] M. S. Sisodia and V. Raghuwanshi, "Anomaly base network intrusion detection by using random decision tree and random projection: a fast network intrusion detection technique," Network Protocols and Algorithms, vol. 3, no. 4, pp. 93–107, 2011.

[15] V. S. Samparthi and H. K. Verma, "Outlier detection of data in wireless sensor networks using kernel density estimation," International Journal of Computer Applications, vol. 5, no. 6, pp. 28–32, 2010.

[16] P. K. Sahoo, "Efficient security mechanisms for mHealth applications using wireless body sensor networks," Sensors, vol. 12, no. 9, pp. 12606–12633, 2012.

[17] D. Macii, A. Colombo, P. Pivato, and D. Fontanelli, "A data fusion technique for wireless ranging performance improvement," IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 1, pp. 27–37, 2013.

[18] R. Tan, G. Xing, B. Liu, J. Wang, and X. Jia, "Exploiting data fusion to improve the coverage of wireless sensor networks," IEEE/ACM Transactions on Networking, vol. 20, no. 2, pp. 450–462, 2012.

[19] C.-T. Cheng, H. Leung, and P. Maupin, "A delay-aware network structure for wireless sensor networks with in-network data fusion," IEEE Sensors Journal, vol. 13, no. 5, pp. 1622–1631, 2013.

[20] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, "A symbolic representation of time series, with implications for streaming algorithms," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD '03), pp. 2–11, June 2003.

[21] X. Cheng, J. Xu, J. Pei, and J. Liu, "Hierarchical distributed data classification in wireless sensor networks," Computer Communications, vol. 33, no. 12, pp. 1404–1413, 2010.

[22] Y. Li, Y. Wang, and G. He, "Clustering-based distributed support vector machine in wireless sensor networks," Journal of Information & Computational Science, vol. 9, no. 4, pp. 1083–1096, 2012.

[23] S. Siripanadorn, W. Hattagam, and N. Teaumroog, "Anomaly detection in wireless sensor networks using self-organizing map and wavelets," International Journal of Communication, vol. 4, pp. 74–83, 2010.

[24] Y. Yasser, K. Siavash, and J. Arash, "An unsupervised network anomaly detection approach by K-means," in Proceedings of the IEEE Symposium on Computers and Communications (ISCC '08), pp. 398–403, 2008.

[25] H. Li, "Research of K-MEANS algorithm based on information entropy in anomaly detection," in Proceedings of the 4th International Conference on Multimedia and Security (MINES '12), pp. 71–74, November 2012.

[26] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-Means+ID3: a novel method for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[27] M. Y. El-Sharkh, "Clonal selection algorithm for power generators maintenance scheduling," International Journal of Electrical Power and Energy Systems, vol. 57, pp. 73–78, 2014.

[28] Y. Li and Z. X. Sun, "Generative tracking of 3D human motion in latent space by sequential clonal selection algorithm," Multimedia Tools and Applications, vol. 69, no. 1, pp. 79–109, 2014.

[29] J. Feng, L. C. Jiao, X. Zhang, and T. Sun, "Hyperspectral band selection based on trivariate mutual information and clonal selection," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 7, pp. 4092–4115, 2014.

[30] L. N. de Castro and F. J. Von Zuben, "Learning and optimization using the clonal selection principle," IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.


The rest of the paper is organized as follows. In Section 2, we review the characteristics of outlier detection and some existing outlier detection algorithms in WSN. In Section 3, we describe in detail our improved $K$-Means and AIS algorithm, which is based on compressed data. In Section 4, we present our experimental results to validate the capabilities of our algorithm on a synthetic dataset and several real environmental datasets. Finally, we conclude the paper and discuss directions for future work in Section 5.

2. Related Works

In this section, we first analyze the challenges of outlier detection in WSN compared with traditional outlier detection techniques. In Section 2.2, we analyze the advantages and disadvantages of some recent outlier detection algorithms in WSN.

2.1. Challenges of Outlier Detection in WSN. An outlier in WSN, also known as an anomaly, can be defined as "those measurements that significantly deviate from the normal pattern of sensed data" [13]. This definition indicates that an effective way to detect outliers in WSN is to define the normal behavior of sensor data and to treat sensor observations that deviate from it as outliers. However, conventional outlier detection techniques might not be suitable for sensor data in WSNs, where outlier detection faces additional challenges [3].

(i) Resource constraints: as said before, WSN has stringent resource constraints such as battery life, computational capacity, and communication overload. Most traditional techniques are computationally expensive and require much memory for data analysis and storage, so minimizing the communication load of the WSN needs more attention.

(ii) Distributed streaming data: during the lifetime of data collection, the underlying phenomenon being measured may change, so the sensor data form a nonstationary streaming distribution. This suggests that most traditional outlier techniques, which analyze stationary offline data, are not suitable for WSN, and algorithms that need a priori knowledge of the data distribution are also infeasible. Thus a key to outlier detection in WSN is how to detect outliers in distributed streaming data online.

(iii) Large-scale deployment: massive numbers of sensor nodes may be deployed in the WSN. This requires the construction of an accurate normal profile that represents the normal behavior of sensor data, so that the outlier detection techniques can still maintain a high detection rate while keeping the false alarm rate low.

According to the above analysis, an effective and efficient outlier detection technique for WSN should be able to identify outliers in a distributed and online manner with high detection accuracy and a low false alarm rate, while satisfying the WSN constraints in terms of communication, computational, and memory complexity [4].

2.2. Analysis of Several Previous Algorithms. Recently, many detection techniques have been proposed. Sun et al. [6] proposed a mechanism based on the extended Kalman filter to detect falsely injected data. The paper further applies an algorithm combining cumulative summation and the generalized likelihood ratio to increase detection sensitivity. This algorithm is practical on resource-constrained hardware, but its heavy calculation fails to ensure real-time detection. Salem et al. [5] presented an algorithm that combines the Mahalanobis distance (MD) and the kernel density estimator. The former is used for spatial analysis and the latter completes the identification of abnormal patterns. This technique does not require a priori knowledge of the data distribution and can achieve good detection accuracy with a low false alarm rate. A remaining problem is its high dependency on the predefined MD threshold: an appropriate threshold is quite difficult to determine, and a single threshold may not be suitable for outlier detection in multidimensional data. Zhang et al. [4] took into account the correlation among sensor data attributes and proposed two distributed and online outlier detection techniques based on a hyperellipsoidal one-class support vector machine (SVM). The algorithm takes advantage of spatiotemporal correlation to identify outliers and updates the ellipsoidal SVM-based model representing the changed normal behavior of sensor data for further outlier identification. However, this technique ignores the communication capability of each node, even though the spatial correlation is calculated by exchanging parameters between neighbors. Bhargava and Raghuvanshi [7] proposed an anomaly detection algorithm based on the $S$-transform and one-SVM. To reduce the data size, the $S$-transform is applied to extract the significant components of the time series data, and then one-SVM is used to classify the data as original or anomalous. This algorithm applies a compression transformation before outlier detection, but it needs to classify every extracted component for the final decision, which generates heavy calculation. Some other outlier detection techniques in WSN have also been proposed [14–16].

The above algorithms can detect outliers effectively, but they have not fully considered the characteristics of outlier detection in WSN, such as constrained resources and dynamic data flow. To solve these problems while keeping the detection accuracy high, we propose an outlier detection algorithm based on data fusion. First, we use an improved PAA to save sensor energy and reduce the amount of calculation in the subsequent outlier detection. Then we use an improved $K$-Means algorithm, which needs no a priori knowledge of the data distribution, to distinguish normal from abnormal data. Finally, the AIS algorithm is used to make the $K$-Means detection result globally optimal and the detection accuracy high. The advantages of PAA, $K$-Means, and AIS and the reasons for selecting them for our outlier algorithm are described in detail in the next section.

3. Our Proposed Algorithm

In this section, we present our outlier detection approach in detail. Figure 1 shows the diagram of our proposed

Figure 1: Flow diagram of our proposed algorithm (original sensor data; compress data with PAA; classify with improved $K$-Means algorithm; determine whether the requirement of classification is met; if no, optimize by AIS; if yes, output detection result).

algorithm. First, we utilize PAA to compress the original sensor data. Then the improved $K$-Means algorithm is used to classify the compressed data, and the AIS offsets the drawbacks of $K$-Means so that the detection result is globally optimal. The two phases of compression and outlier detection are discussed in detail below.

3.1. Data Compression with PAA. Because most energy is consumed in data transmission, data compression techniques have been used in WSN to save sensor energy [17–19]. However, due to resource constraints, a suitable compression algorithm for WSN should be efficient and simple. As a dimensionality reduction algorithm, PAA is more intuitive and simpler than techniques such as Fourier transforms and wavelets [20]. Moreover, the degree of compression can be changed by adjusting the compression ratio parameter. Thus the PAA technique is adopted to compress the data in this paper.

A time series $C$ of length $n$ can be represented in a $w$-dimensional space by the vector

$$\bar{C} = \bar{c}_1, \ldots, \bar{c}_w. \tag{1}$$

The $i$th element of $\bar{C}$ is calculated as follows:

$$\bar{c}_i = \frac{1}{k} \times \sum_{j=k(i-1)+1}^{k \cdot i} c_j, \qquad k = \frac{n}{w}, \tag{2}$$

where $k$ (written $K$ in the experiments) is defined as the compression ratio and $c_j$ is the $j$th element of the original data. More information about PAA is presented in [20].

The compression process of PAA is simple and fast, but it also introduces a problem: the technique compresses the data directly without considering the characteristics of the data or the correlation between data, which may lead to mistakes in the subsequent handling of the compressed data. This problem also affects the outlier detection that follows. For example, suppose that a data value greater than 10 is considered abnormal, and $C_1 = \{2, 4, 9\}$ and $C_2 = \{1, 1, 13\}$ are two original data series. After PAA with $k = 3$, $\bar{c}_1 = \bar{c}_2 = 5$, so the difference between the two series is hidden and the outlier cannot be detected. To preserve the differences between sequences, the variance of the sequence is added as the second output of PAA. The variance associated with the $i$th element of $\bar{C}$ is defined as

$$\mathrm{Var}_i = \frac{1}{k} \times \sum_{j=k(i-1)+1}^{k \cdot i} \left( c_j - \bar{c}_i \right)^2. \tag{3}$$

Because acquiring neighbor data consumes energy, compressing the data with PAA across space is not encouraged. In contrast, every sensor has a cache that stores previous data, so it is convenient to compress time series data with PAA within each node. A sliding window of length $k$ is used to select the data, where $k$ represents the compression ratio of PAA. According to (2) and (3), a two-element array $X_j = (\bar{c}_{ij}, \mathrm{Var}_{ij})$ is defined to represent the $i$th compressed datum of the $j$th sensor node.
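The per-node compression step can be sketched as follows. This is a minimal illustration of the window mean and window variance defined above, not the authors' implementation; the function name is hypothetical, and it assumes non-overlapping windows and a series length that is a multiple of $k$.

```python
def paa_with_variance(series, k):
    """Compress a series into (mean, variance) pairs, one per window of k samples.

    Assumes non-overlapping windows; len(series) must be a multiple of k.
    """
    if k <= 0 or len(series) % k != 0:
        raise ValueError("series length must be a positive multiple of k")
    compressed = []
    for start in range(0, len(series), k):
        window = series[start:start + k]
        mean = sum(window) / k                           # window mean
        var = sum((x - mean) ** 2 for x in window) / k   # window variance
        compressed.append((mean, var))
    return compressed


# The two series from the example: identical means, very different variances.
c1 = [2, 4, 9]
c2 = [1, 1, 13]
print(paa_with_variance(c1, 3))
print(paa_with_variance(c2, 3))
```

Both windows share the same mean, but the variance of the second series (which contains the value 13) is several times larger, so the pair $(\bar{c}, \mathrm{Var})$ separates the two cases where the mean alone cannot.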

3.2. Outlier Detection on Compressed Data. Based on the compression result of PAA in the preceding subsection, we combine an improved unsupervised detection algorithm of $K$-Means with AIS to effectively separate normal and abnormal compressed sensor data. Because reconstruction of the initial data is not needed, the approach has good real-time performance. After all sensors have sent their data to the Sink node, the improved $K$-Means algorithm, which needs no prior knowledge of the data distribution, is used to classify the compressed data, and the AIS offsets the drawbacks of $K$-Means so that the detection result is globally optimal. The process of outlier detection is introduced in detail below.

3.2.1. Classification by the $K$-Means Algorithm. According to the definition of an outlier in WSN, the key to outlier detection is how to effectively separate the outliers from normal data. Classification is the process of assigning sampled data to different classes or clusters. Some classification techniques, such as SVM and artificial neural networks (ANN), have been utilized for outlier detection in WSN [21–23]. As one of the simplest classification algorithms, the $K$-Means technique has also been proved effective for outlier detection in other areas [24–26]; it groups $N$ data points into $k'$ clusters, where $k'$ represents the number of clusters and is written this way to distinguish it from the compression ratio $K$. In addition, compared to other classification techniques, the $K$-Means algorithm has the following advantages for outlier detection in WSN:

(i) $K$-Means does not need knowledge of the data distribution, and it is suitable for dynamic data.

(ii) The calculation is simpler than in other classification techniques such as SVM and ANN.

However, every compressed datum contains two elements, so we modify the original $K$-Means algorithm to ensure that it can be combined with the PAA algorithm. Suppose that the number of sensors in a WSN is $N$; the improved steps of $K$-Means in our algorithm are as follows.

(1) Select $k'$ random instances $M_1 (M_{11}, M_{12}), M_2 (M_{21}, M_{22}), \ldots, M_{k'} (M_{k'1}, M_{k'2})$ from all the compressed sensor data as the initial centroids of the clusters $C_1, C_2, \ldots, C_{k'}$.

(2) For every training datum $X_j = (\bar{c}_{ij}, \mathrm{Var}_{ij})$:

(a) Calculate the Euclidean distance

$$Y_{nj1} = \left\| \bar{c}_{ij} - M_{n1} \right\|^2, \qquad Y_{nj2} = \left\| \mathrm{Var}_{ij} - M_{n2} \right\|^2, \qquad n = 1, 2, \ldots, k', \tag{4}$$

where $Y_{nj1}$ is the distance between the first element of $X_j$ and the first element of the $n$th cluster, and $Y_{nj2}$ is the distance between the second element of $X_j$ and the second element of the $n$th cluster. To reduce the influence of different orders of magnitude, $Y_{nj1}$ and $Y_{nj2}$ are normalized to $Y'_{nj1}$ and $Y'_{nj2}$, and the distance between $X_j$ and $M_n$ is defined as

$$D(X_j, n) = m \cdot Y'_{nj1} + (1 - m) \cdot Y'_{nj2}, \qquad n = 1, 2, \ldots, k', \tag{5}$$

where $m$ is the weighting parameter that adjusts the proportion of the two factors. The distance represents the correlation between the sensor data and the cluster center, and it also determines whether the data belong to the outlier class. Finally, find the cluster $C_q$ that is closest to $X_j$.

(b) Assign $X_j$ to $C_q$ and update the centroid $(M_{q1}, M_{q2})$ of $C_q$ (the centroid of a cluster is the arithmetic mean of the instances in the cluster).

(3) Repeat step (2) until the centers no longer change. Overall, the algorithm aims at minimizing the squared error function $J$:

$$J = \sum_{j=1}^{N} \left\| D\left( X_j^{(i)} \right) \right\|^2, \qquad i = 1, 2, \ldots, k', \tag{6}$$

where $D(X_j^{(i)})$ represents the distance between $X_j$ and the center of the cluster $C^{(i)}$ in which $X_j$ is located.

After classification, according to the definition of an outlier, the ideal result is that the abnormal data are assigned to one cluster while the normal data are assigned to the other clusters, because the outliers deviate from the normal data. Therefore, the smaller $J$ is, the more precise the classification is. In addition, abnormal data are relatively few compared to normal data, so the data in the cluster with the fewest members are identified as outliers.
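The clustering steps above can be sketched as follows. This is an illustrative reading rather than the authors' code: the normalization of the two distance components is not specified in the text, so the sketch assumes each component is divided by its largest value in the current iteration; centroids are updated in batch after all assignments; and the function names and toy data are hypothetical.

```python
def weighted_kmeans(points, centroids, m=0.5, max_iter=100):
    """Cluster (mean, variance) pairs; returns (labels, centroids, J)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Squared differences of each component to every centroid, as in (4).
        y1 = [[(p[0] - c[0]) ** 2 for c in centroids] for p in points]
        y2 = [[(p[1] - c[1]) ** 2 for c in centroids] for p in points]
        # ASSUMED normalisation: scale each component by its maximum value.
        s1 = max(max(row) for row in y1) or 1.0
        s2 = max(max(row) for row in y2) or 1.0
        # Weighted distance, as in (5).
        dist = [[m * a / s1 + (1.0 - m) * b / s2 for a, b in zip(r1, r2)]
                for r1, r2 in zip(y1, y2)]
        labels = [min(range(len(centroids)), key=row.__getitem__) for row in dist]
        updated = []
        for n, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == n]
            if members:  # centroid = arithmetic mean of the cluster members
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:        # keep the centroid of an empty cluster unchanged
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    # Squared error in the spirit of (6), from the last computed distances.
    j = sum(dist[i][labels[i]] ** 2 for i in range(len(points)))
    return labels, centroids, j


def smallest_cluster(labels, n_clusters):
    """Index of the non-empty cluster with the fewest members (the outliers)."""
    counts = [labels.count(n) for n in range(n_clusters)]
    return min((n for n in range(n_clusters) if counts[n]), key=counts.__getitem__)


# Toy compressed data: four quiet windows and one window with a large variance.
data = [(0.10, 0.010), (0.12, 0.020), (0.09, 0.010), (0.11, 0.015), (0.10, 0.900)]
labels, cents, j = weighted_kmeans(data, [data[0], data[4]], m=0.5)
print(labels, smallest_cluster(labels, 2))
```

In this toy case the means are nearly identical across windows, so the separation comes entirely from the variance term, which is the situation the weight $m$ is meant to balance.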

3.2.2. Classification Improvement with AIS. It is known that the $K$-Means algorithm depends on the initial centroids of the clusters and easily falls into a local optimum, so we still need to solve these problems to make the classification more precise. The AIS algorithm, also known as the clonal selection algorithm (CSA), is a global optimal searching algorithm and is considered appropriate to offset the drawbacks of the $K$-Means algorithm in this paper.

The clonal selection algorithm is an emerging intelligent algorithm inspired by the immune system. It uses the diversity of the immune system to maintain population diversity, so that it can avoid the "premature problem" of general optimization and reach the global optimum [27–29]. A detailed description of this algorithm is presented in [30].

Given the defects of the $K$-Means algorithm, the purpose of CSA in this paper is to find the best initial centroids of the clusters, which ensures that the classification result is the global optimum and that our outlier detection rate is high. The application of CSA in our paper can be described as follows.

(1) In our 119870-Means algorithm the squared error func-tion 119869 in (6) is the judgment standard of classificationresult so we choose 119869 as the objective function andthe affinity

(2) Because our purpose is to find the best initial cen-troids of clusters we define centroids of the clustersas antibody and randomly initialize multiple array ofcentroids as initial antibody group 119879

$$T = \begin{bmatrix} M_1^1 & M_2^1 & \cdots & M_{k'}^1 \\ M_1^2 & M_2^2 & \cdots & M_{k'}^2 \\ \vdots & \vdots & & \vdots \\ M_1^Q & M_2^Q & \cdots & M_{k'}^Q \end{bmatrix}, \tag{7}$$

where $(M_1^Q, M_2^Q, \ldots, M_{k'}^Q)$ is the $Q$th set of initial cluster centroids.

(3) For every antibody in $T$, we classify the compressed data with the improved K-Means and record the affinity $J$. Then we sort the affinity sequence by size.

(4) According to the reordered affinity sequence, we select the antibodies whose affinity ranks at the top of the sequence as the parent antibody group, which serves as the initial antibody group in the next round, because, as in genetics, good genes are more likely to be propagated to the next generation. Then we clone these antibodies according to the size of their affinities.

(5) Determine whether the classification result calculated with the antibody corresponding to the minimum $J$ meets the end condition, namely, that $J$ is small enough. If it does, this antibody gives the best initial centroids of the clusters, which ensures that the K-Means classification result is the best. Otherwise, continue with the following steps.

(6) Process the antibodies selected in step (4) with the clone, crossover, and mutation operations to form a new, diverse generation of the antibody group.

(7) If the specified number of iterations has been reached, the process also finishes; otherwise, return to step (3).
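The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the affinity uses plain Euclidean K-Means rather than the weighted distance of (5), crossover is omitted, mutation is Gaussian jitter, and the names `kmeans_sse`, `clonal_selection_init`, `pop_size`, `n_parents`, and `tol` are hypothetical.

```python
import numpy as np

def kmeans_sse(data, centroids, n_iter=20):
    """Run plain K-Means from the given centroids and return the final squared error J."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(len(centroids)):
            members = data[labels == c]
            if len(members):                      # an empty cluster keeps its centroid
                centroids[c] = members.mean(axis=0)
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()

def clonal_selection_init(data, n_clusters, pop_size=10, n_parents=4,
                          n_generations=15, tol=1e-6, seed=0):
    """Search for initial centroids with a small J (steps (2) to (7), simplified)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step (2): each antibody is one set of initial centroids drawn from the data.
    population = [data[rng.choice(len(data), n_clusters, replace=False)]
                  for _ in range(pop_size)]
    scale = data.std(axis=0)
    best, best_j = None, np.inf
    for _ in range(n_generations):                # step (7): iteration budget
        # Step (3): affinity J of every antibody, sorted so that smaller J comes first.
        scored = sorted(((kmeans_sse(data, ab), ab) for ab in population),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_j:
            best_j, best = scored[0][0], scored[0][1]
        if best_j < tol:                          # step (5): J is small enough
            break
        parents = scored[:n_parents]              # step (4): keep the best antibodies
        population = [ab for _, ab in parents]
        for rank, (_, ab) in enumerate(parents):
            # Assumed cloning rule: a better rank yields more clones.
            for _ in range(n_parents - rank):
                # Step (6), mutation only: jitter each clone slightly.
                population.append(ab + rng.normal(0.0, 0.1 * scale, size=ab.shape))
    return best, best_j

# Toy data: 200 "normal" pairs and 10 "abnormal" pairs in (mean, variance) space.
rng = np.random.default_rng(1)
normal = rng.normal([0.0, 0.1], 0.05, size=(200, 2))
abnormal = rng.normal([1.0, 0.8], 0.05, size=(10, 2))
data = np.vstack([normal, abnormal])
centroids, j = clonal_selection_init(data, n_clusters=2)
```

Because the parents are carried over unchanged into the next generation, the best affinity found never gets worse from one round to the next, which is the property the selection step relies on.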

After the above process, we can find better initial centroids and obtain a more accurate classification than the original K-Means algorithm. As a result, normal and abnormal sensor data are assigned to different clusters more effectively, so our algorithm achieves a higher detection accuracy with a lower false alarm rate. However, an effective outlier detection algorithm in WSN not only needs a high detection rate but must also suit the characteristics of sensor data and the constrained resources. The advantages of our algorithm compared with other algorithms are shown in Table 1.

Table 1: The comparison of characteristics.

Algorithm                       Online   Based on compressed data   No a priori knowledge of data
Our algorithm                   Yes      Yes                        Yes
Salem et al. [5]                Yes      No                         Yes
Bhargava and Raghuvanshi [7]    Yes      Yes                        No
Zhang et al. [4]                Yes      No                         No

Table 2: The experimental datasets.

Name of dataset      Length of dataset   Source of dataset
Ma Data              1200                Synthetic dataset in [10]
Keogh Data           1200                Synthetic dataset in [10]
chfdb chf01          3600                ECG [11]
chfdb chf13          3600                ECG [11]
stdb 308             3600                ECG [11]
Synthetic control    600                 UCI [12]

As shown in Table 1, compared with other algorithms, our algorithm satisfies more of the requirements of outlier detection in WSN.

3.3. Pseudocode of Our Algorithm. Based on the preceding introduction of each part of our algorithm, Pseudocode 1 describes the whole process.

4. Experimental Evaluation

In order to evaluate the performance of our proposed algorithm, experiments were carried out on two synthetic anomaly datasets and several real medical datasets commonly used in anomaly detection. The names and sources of these datasets are shown in Table 2. For comparison, the K-Means algorithm without AIS is used as the baseline.

4.1. Experimental Setup and Evaluation Metrics. Our simulation is conducted in MATLAB. We assume that knowledge of sensor node locations is available at the base station. We do not assume any specific routing or medium access protocol in this network, nor do we make any assumptions about the node density of the network, because our algorithm is not designed for a particular application scenario.

Detection rate (DR) and false alarm rate (FAR) are used to evaluate the performance of our outlier detection algorithm. The detection rate is the ratio of the number of correctly detected outliers to the total number of outliers. The false alarm rate is the ratio of the number of normal data incorrectly detected as outliers to the total number of normal data.

Abnormal data are generally very few, so we select the cluster with the fewest members as the outlier cluster and calculate the DR and FAR from the data in that cluster.
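The two metrics and the smallest-cluster rule can be sketched in Python as below. The function name `detection_metrics` and the toy labels are assumptions made for illustration, and the sketch presumes that the ground truth contains at least one outlier and at least one normal datum.

```python
import numpy as np

def detection_metrics(labels, is_outlier):
    """Return (DR, FAR) in percent, treating the smallest cluster as the outlier cluster."""
    labels = np.asarray(labels)
    truth = np.asarray(is_outlier, dtype=bool)
    cluster_ids, counts = np.unique(labels, return_counts=True)
    outlier_cluster = cluster_ids[np.argmin(counts)]   # cluster with the fewest members
    flagged = labels == outlier_cluster
    dr = 100.0 * np.sum(flagged & truth) / np.sum(truth)      # detected outliers / all outliers
    far = 100.0 * np.sum(flagged & ~truth) / np.sum(~truth)   # false alarms / all normal data
    return dr, far

# Twelve compressed data: nine normal, three abnormal (the last three).
labels = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 1]
truth = [0] * 9 + [1] * 3
dr, far = detection_metrics(labels, truth)
```

In this toy case cluster 2 is the smallest; it holds two of the three outliers and one normal datum, so DR is 2/3 (about 66.7%) and FAR is 1/9 (about 11.1%).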


Input: Original data from every sensor
Output: Abnormal sensor and outlier data
For original data from every sensor:
(1) Compress the data with the improved PAA algorithm in each node and get the compressed data $(\bar{c}_{ij}, \mathrm{Var}_{ij})$
(2) All nodes send the compressed data of the same time to the base station
(3) Initialize the antibody group $T$ and set the initial number of iterations to 0
(4) While (the number of iterations has not reached the specified value)
(5)     For each antibody in $T$
(6)         Classify all the compressed data with the improved K-Means algorithm and calculate the affinity $J$
(7)     End the classification
(8)     Sort all the affinities by size and choose the antibodies at the top of the reordered sequence as the parent antibody group; then clone these antibodies based on the size of their affinities
(9)     Determine whether the minimum $J$ meets the end condition
(10)    If (it does)
(11)        Jump out of the optimization loop and end the classification
(12)    Else
(13)        Process the parent antibody group with the clone, crossover, and mutation operations to form a new, diverse generation of the antibody group
(14)        Increase the number of iterations by one
(15) End the optimization and select the antibody in the first row as the best initial centroids of the clusters
(16) Classify the compressed data with the selected antibody and pick out the abnormal compressed data
(17) Judge the abnormal type based on the location of the abnormal data

Pseudocode 1: Pseudocode of our algorithm.

4.2. Experimental Results

4.2.1. Decision of Related Parameters. As shown in the description of our algorithm in Section 3, the compression ratio $K$ and the number of clusters $k'$ are two key parameters that have a great impact on the classification result. Therefore, experiments were carried out to determine appropriate ranges of $K$ and $k'$, so that the subsequent detection results are robust and accurate. To present the results clearly and concisely, we only report the results on the stdb 308 database, because the results on the other databases are similar. The stdb 308 database is shown in Figure 2, and data whose absolute value is greater than 0.5 are abnormal. The DR and FAR under different $K$ and $k'$ are shown in Tables 3 and 4, respectively. To reduce the error, we take the average over 100 experiments with the same parameters as the final result.

According to the algorithm described in the previous section, the smaller $K$ is, the less information is lost in compression and the more accurate the detection is. On the other hand, the larger $K$ is, the stronger the compression and the more energy is saved. As for the parameter $k'$, the smaller it is, the lower the computational cost but the worse the classification; the opposite holds when $k'$ increases.

As shown in Tables 3 and 4, the results are in accordance with the theoretical analysis. To balance energy saving against effective outlier detection while keeping the computational cost in check, we set $K = 4$ and $k' = 3$.

4.2.2. Results of Outlier Detection. With $K$ and $k'$ selected as in the previous subsection, experiments were conducted to compare the performance of our algorithm (OA) with that of the K-Means algorithm without AIS (KWA), and the results are shown in Table 5.

Table 3: DR (%) under different K and k'.

k' \ K    6       5       4       3       2
2         96.12   96.45   97.05   97.12   97.15
3         96.77   97.12   97.23   97.30   97.28
4         97.65   97.15   97.20   97.32   97.35
5         97.50   97.89   97.91   97.86   97.85
6         97.62   97.50   98.03   98.11   98.10
7         97.59   97.85   98.00   98.08   98.12
8         97.65   97.79   98.05   98.12   98.14

Table 4: FAR (%) under different K and k'.

k' \ K    6       5       4       3       2
2         5.12    4.45    3.87    3.12    2.98
3         5.11    4.32    3.58    3.09    2.78
4         4.85    4.21    3.56    2.98    2.77
5         4.90    3.87    3.34    3.01    2.65
6         4.85    3.75    2.98    2.65    2.56
7         4.83    3.68    2.55    2.34    2.45
8         4.75    3.57    2.14    1.98    1.53

As shown in Table 5, the performance of our algorithm is clearly better on every database: DR is higher and FAR is lower than for the algorithm without AIS. The reason is that our algorithm adopts the AIS algorithm so that the result is globally optimal, whereas the other algorithm depends on the initial cluster centers, so the result of each experiment is a local optimum and is unstable. In addition, although the original data are compressed before outlier detection, the DR of our algorithm in every row is higher than 95% while the FAR is lower than 5%, so our algorithm is an effective outlier detection system in WSN that ensures a high detection rate with a low false alarm rate.

Figure 2: The database of stdb 308 (sample index from 0 to 3500 on the horizontal axis; amplitude from -1.5 to 1.5 on the vertical axis).

Table 5: The experimental results.

Name of dataset      DR with OA (%)   DR with KWA (%)   FAR with OA (%)   FAR with KWA (%)
Ma Data              97.50            81.12             3.34              8.23
Keogh Data           98.10            80.45             3.15              7.78
Chfdb chf01          98.86            81.34             3.22              7.12
Chfdb chf13          97.13            78.65             1.97              8.02
Stdb 308             97.23            80.67             3.58              7.84
Synthetic control    98.34            78.12             2.78              8.45

However, environmental noise is always present to some degree in real applications and alters the original data, so we artificially add Gaussian white noise to the original signals and repeat the experiments to test the robustness of our algorithm. We take the Ma Data (Ma), Stdb 308 (Std), and Synthetic control (Syn) databases as examples, and the results are shown in Figures 3 and 4. Figure 3 shows the DR of these databases under different SNR, and Figure 4 shows the FAR under different SNR.

As shown in Figures 3 and 4, we test the robustness of our algorithm under Gaussian white noise with different signal-to-noise ratios (SNR). In both figures, DR shows a slight downward trend and FAR rises slightly as the SNR decreases, because a smaller SNR has a greater influence on the original data. However, the DR remains above 85% and the FAR below 15% at every noise level, so our algorithm is robust to noise.
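For readers who want to reproduce this kind of robustness test, the noise injection can be sketched in Python as below. The paper does not state how the SNR is computed, so this sketch assumes the usual definition, the ratio of mean signal power to noise power expressed in dB; the name `add_white_noise` is hypothetical.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Return the signal plus zero-mean Gaussian white noise at the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(signal, dtype=float)
    signal_power = np.mean(x ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

# A long sine wave stands in for a sensor trace; 20 dB is one point in the tested range.
clean = np.sin(np.linspace(0.0, 20.0 * np.pi, 20000))
noisy = add_white_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
```

Sweeping `snr_db` over the range shown on the horizontal axes of Figures 3 and 4 and rerunning the detector at each level would yield curves of the same kind.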

4.2.3. Analysis of Energy Saving and Real-Time Performance. Compared with other outlier detection algorithms, another advantage of our algorithm is that it prolongs the network lifetime, because the PAA algorithm reduces the amount of data sent. For example, for the stdb 308 database, the number of values sent falls from 3600 to 1800 when $K$ is 4 (900 windows, each represented by a mean and a variance), so the network saves half of the energy, and it saves more when $K$ is larger.
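The figure quoted above follows from a short count. As a sketch, assume that transmission energy is proportional to the number of values sent and that each window of $K$ samples is replaced by one mean and one variance; the symbol $N_{\text{sent}}$ is introduced here only for illustration.

```latex
N_{\text{sent}} = \frac{2n}{K}, \qquad
\text{saving} = 1 - \frac{N_{\text{sent}}}{n} = 1 - \frac{2}{K}
% stdb 308: n = 3600, K = 4  =>  N_sent = 1800, saving = 1 - 2/4 = 50%
```

Under these assumptions the saving grows with $K$, and compression only pays off for $K > 2$, since at $K = 2$ two values are sent for every two samples.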

Some outlier detection methods that are combined with a data fusion algorithm reconstruct the original data before detection. However, the reconstruction process of a data fusion algorithm usually takes a long time, which leads to poor real-time performance. Our detection algorithm works directly on the compressed data, so it has better real-time performance.

4.2.4. The Comparison of Our Algorithm with Other Outlier Detection Algorithms in WSN. To show the effectiveness of our algorithm more clearly, we compare it with the algorithms listed in Table 1. The stdb 308 database is selected for these experiments. For every algorithm, we choose the optimal parameter settings according to its original paper. In addition, we take the average over 100 experiments as the final result to ensure the reliability of the experiments. The comparison is shown in Table 6.

As shown in Table 6, the DR and FAR of our algorithm are both the best among the above algorithms. Because we combine the AIS with K-Means to optimize the classification results, it detects outliers effectively. Our algorithm also saves the most energy compared with the other algorithms, because PAA effectively prolongs the lifetime of the network. Because the algorithm proposed by Zhang et al. [4] has no data compression step, its time consumption is less than that of our algorithm. However, because PAA, K-Means, and AIS are lightweight methods, the time consumption of our algorithm is less than that of the other two algorithms.

Figure 3: DR with different SNR (horizontal axis: SNR (dB), 10 to 40; vertical axis: DR (%), 80 to 100; curves: Ma, Std, Syn).

Figure 4: FAR with different SNR (horizontal axis: SNR (dB), 10 to 40; vertical axis: FAR (%), 0 to 20; curves: Ma, Std, Syn).

Table 6: The comparison of several algorithms.

Algorithm                       DR (%)   FAR (%)   Time consumption (s)   Saved energy consumption (%)
Our algorithm                   97.23    3.58      1.15                   50
Salem et al. [5]                93.45    6.15      1.17                   0
Bhargava and Raghuvanshi [7]    92.17    7.28      1.28                   35
Zhang et al. [4]                95.19    5.62      1.06                   0

5. Conclusion and Future Work

In this paper, we propose an outlier detection algorithm for detecting abnormal compressed data in WSN. In the first phase, we utilize the PAA algorithm to compress the time series data in each node, so that the communication overhead is reduced and the battery life is prolonged. Based on the result of PAA, we then combine the improved unsupervised detection algorithm of K-Means and the AIS to effectively classify the normal and abnormal data. The major advantages of this algorithm are that it works on the compressed data, so the energy consumption is reduced, and that it achieves a high detection rate with a low false alarm rate. Experiments on synthetic and real data demonstrate the effectiveness of our algorithm in detecting outliers and resisting noise.

A potential limitation of this approach is that the time cost increases markedly when the data volume is huge, because every datum must be reclassified against the new cluster centroids until the classification ends, and the search for the optimal initial centroids takes a long time. Another limitation is that our detection algorithm is based on univariate data, while the data are more complex in some WSN applications. Therefore, our future work is to improve the algorithm so that it has better real-time performance and is better suited to the detection of multivariate data.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61271274), the key project of the Natural Science Foundation of Hubei Province of China (2011CDA069), the general project of the Natural Science Foundation of Hubei Province of China (2010CDB04203, 2011CDB339), and the Key Science and Technology of Hubei Province of China (2012BAA02003, 2011BAB042). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References

[1] D. Bri, M. Garcia, J. Lloret, and P. Dini, "Real deployments of wireless sensor networks," in Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM '09), pp. 415–423, June 2009.

[2] C. F. Garcia-Hernandez, P. H. Ibarguengoytia, J. Garcia-Hernandez, and J. A. Perez-Diaz, "Wireless sensor networks and applications: a survey," International Journal of Computer Science and Network Security, vol. 7, pp. 264–273, 2007.

[3] Y. Zhang, N. Meratnia, and P. Havinga, "Outlier detection techniques for wireless sensor networks: a survey," IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[4] Y. Zhang, N. Meratnia, and P. J. M. Havinga, "Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine," Ad Hoc Networks, vol. 11, no. 3, pp. 1062–1074, 2013.

[5] O. Salem, Y. Liu, and A. Mehaoua, "Anomaly detection in medical wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 272–284, 2013.

[6] B. Sun, X. Shan, K. Wu, and Y. Xiao, "Anomaly detection based secure in-network aggregation for wireless sensor networks," IEEE Systems Journal, vol. 7, no. 1, pp. 13–25, 2013.

[7] A. Bhargava and A. S. Raghuvanshi, "Anomaly detection in wireless sensor networks using S-transform in combination with SVM," in Proceedings of the 5th International Conference on Computational Intelligence and Communication Networks (CICN '13), pp. 111–116, IEEE, Mathura, India, September 2013.

[8] M. Moshtaghi, C. Leckie, S. Karunasekera, and S. Rajasegarar, "An adaptive elliptical anomaly detection model for wireless sensor networks," Computer Networks, vol. 64, pp. 195–207, 2014.

[9] Q.-L. Zhong and Z.-X. Cai, "Symbolic algorithm for time series data based on statistic feature," Chinese Journal of Computers, vol. 31, no. 10, pp. 1857–1864, 2008.

[10] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '00), pp. 93–104, June 2000.

[11] http://physionet.org/physiobank/database/.

[12] http://archive.ics.uci.edu/ml/datasets.html.

[13] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[14] M. S. Sisodia and V. Raghuwanshi, "Anomaly base network intrusion detection by using random decision tree and random projection: a fast network intrusion detection technique," Network Protocols and Algorithms, vol. 3, no. 4, pp. 93–107, 2011.

[15] V. S. Samparthi and H. K. Verma, "Outlier detection of data in wireless sensor networks using kernel density estimation," International Journal of Computer Applications, vol. 5, no. 6, pp. 28–32, 2010.

[16] P. K. Sahoo, "Efficient security mechanisms for mHealth applications using wireless body sensor networks," Sensors, vol. 12, no. 9, pp. 12606–12633, 2012.

[17] D. Macii, A. Colombo, P. Pivato, and D. Fontanelli, "A data fusion technique for wireless ranging performance improvement," IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 1, pp. 27–37, 2013.

[18] R. Tan, G. Xing, B. Liu, J. Wang, and X. Jia, "Exploiting data fusion to improve the coverage of wireless sensor networks," IEEE/ACM Transactions on Networking, vol. 20, no. 2, pp. 450–462, 2012.

[19] C.-T. Cheng, H. Leung, and P. Maupin, "A delay-aware network structure for wireless sensor networks with in-network data fusion," IEEE Sensors Journal, vol. 13, no. 5, pp. 1622–1631, 2013.

[20] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, "A symbolic representation of time series, with implications for streaming algorithms," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD '03), pp. 2–11, June 2003.

[21] X. Cheng, J. Xu, J. Pei, and J. Liu, "Hierarchical distributed data classification in wireless sensor networks," Computer Communications, vol. 33, no. 12, pp. 1404–1413, 2010.

[22] Y. Li, Y. Wang, and G. He, "Clustering-based distributed support vector machine in wireless sensor networks," Journal of Information & Computational Science, vol. 9, no. 4, pp. 1083–1096, 2012.

[23] S. Siripanadorn, W. Hattagam, and N. Teaumroog, "Anomaly detection in wireless sensor networks using self-organizing map and wavelets," International Journal of Communication, vol. 4, pp. 74–83, 2010.

[24] Y. Yasser, K. Siavash, and J. Arash, "An unsupervised network anomaly detection approach by K-means," in Proceedings of the IEEE Symposium on Computers and Communications (ISCC '08), pp. 398–403, 2008.

[25] H. Li, "Research of K-MEANS algorithm based on information entropy in anomaly detection," in Proceedings of the 4th International Conference on Multimedia and Security (MINES '12), pp. 71–74, November 2012.

[26] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-Means+ID3: a novel method for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[27] M. Y. El-Sharkh, "Clonal selection algorithm for power generators maintenance scheduling," International Journal of Electrical Power and Energy Systems, vol. 57, pp. 73–78, 2014.

[28] Y. Li and Z. X. Sun, "Generative tracking of 3D human motion in latent space by sequential clonal selection algorithm," Multimedia Tools and Applications, vol. 69, no. 1, pp. 79–109, 2014.

[29] J. Feng, L. C. Jiao, X. Zhang, and T. Sun, "Hyperspectral band selection based on trivariate mutual information and clonal selection," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 7, pp. 4092–4115, 2014.

[30] L. N. de Castro and F. J. Von Zuben, "Learning and optimization using the clonal selection principle," IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.


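Figure 1: Flow diagram of our proposed algorithm. The original sensor data are compressed with PAA and classified with the improved K-Means algorithm; the result is then checked against the requirement of classification. If the requirement is not met (No), the classification is optimized by AIS and repeated; if it is met (Yes), the detection result is output.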

algorithm. Firstly, we utilize PAA to compress the original sensor data. Then the improved K-Means algorithm is used to classify the compressed data, and the AIS offsets the drawbacks of K-Means so that the detection result is globally optimal. The two phases of compression and outlier detection are discussed below in detail.

3.1. Data Compression with PAA. To save sensor energy, data compression techniques have been used in WSN, because most energy is consumed in data transmission [17–19]. However, due to resource constraints, a suitable compression algorithm for WSN should be efficient and simple. As a dimensionality reduction algorithm, PAA is more intuitive and simpler than techniques such as Fourier transforms and wavelets [20]. Moreover, the degree of compression can be changed by adjusting the compression ratio parameter. Thus, the PAA technique is adopted to compress the data in this paper.

A time series $C$ of length $n$ can be represented in a $w$-dimensional space by a vector

$$\bar{C} = \bar{c}_1, \ldots, \bar{c}_w. \tag{1}$$

The $i$th element of $\bar{C}$ is calculated as follows:

$$\bar{c}_i = \frac{1}{K} \sum_{j=K(i-1)+1}^{K \cdot i} c_j, \qquad K = \frac{n}{w}, \tag{2}$$

where $K$ is defined as the compression ratio and $c_j$ is the $j$th element of the original data. More information about PAA is given in [20].

The compression process of PAA is simple and fast, but it also introduces a problem: the technique compresses the data directly, without considering the characteristics of the data or the correlation between data, which may lead to mistakes in the subsequent handling of the compressed data. This problem also affects the outlier detection that follows. For example, suppose that a value greater than 10 is considered abnormal and that $C_1 = \{2, 4, 9\}$ and $C_2 = \{1, 1, 13\}$ are two original data series. After PAA with $K = 3$, $\bar{c}_1 = \bar{c}_2 = 5$, so the difference between the two series is hidden and the outlier (13) cannot be detected. To preserve the differences between sequences, the variance of each segment is added as the second output of PAA. The variance associated with the $i$th element of $\bar{C}$ is defined as

$$\mathrm{Var}_i = \frac{1}{K} \sum_{j=K(i-1)+1}^{K \cdot i} \left( c_j - \bar{c}_i \right)^2. \tag{3}$$

Because acquiring neighbor data consumes energy, compressing data with PAA across space (that is, between nodes) is not encouraged. In contrast, every sensor has a cache that stores previous data, so it is convenient to compress the time series data with PAA in each node. A sliding window of length $K$ is used to select the data, where $K$ represents the compression ratio of PAA. According to (2) and (3), a two-element array $X_j = (\bar{c}_{ij}, \mathrm{Var}_{ij})$ is defined to represent the $i$th compressed datum of the $j$th sensor node.
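The compression with both outputs can be sketched in Python as below, reusing the two example series from the text. The function name `paa_with_variance` is hypothetical, and dropping trailing samples that do not fill a whole window is an assumption of this sketch rather than something the paper specifies.

```python
import numpy as np

def paa_with_variance(series, ratio):
    """Compress a 1-D series into per-window (mean, variance) pairs.

    `ratio` plays the role of the compression ratio: each output pair
    summarizes `ratio` consecutive samples. Trailing samples that do not
    fill a whole window are dropped (an assumption of this sketch).
    """
    x = np.asarray(series, dtype=float)
    n_windows = len(x) // ratio
    windows = x[: n_windows * ratio].reshape(n_windows, ratio)
    means = windows.mean(axis=1)
    variances = windows.var(axis=1)   # population variance (divide by the window length)
    return np.column_stack([means, variances])

c1 = paa_with_variance([2, 4, 9], ratio=3)    # mean 5, variance 26/3
c2 = paa_with_variance([1, 1, 13], ratio=3)   # mean 5, variance 32
```

The two series share the same mean, so the mean alone cannot tell them apart, while the variance (about 8.67 against 32) exposes the series that contains the abnormal value.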

3.2. Outlier Detection on Compressed Data. Based on the compression result of PAA in the preceding subsection, we combine an improved unsupervised detection algorithm of K-Means and AIS to effectively classify the normal and abnormal compressed sensor data. Because reconstruction of the original data is not needed, the method has good real-time performance. After all sensors have sent their data to the sink node, the improved K-Means algorithm, which needs no prior knowledge of the data distribution, is used to classify the compressed data, and the AIS offsets the drawbacks of K-Means so that the detection result is globally optimal. The process of outlier detection is introduced in detail below.

3.2.1. Classify by the Algorithm of K-Means. According to the definition of an outlier in WSN, the key to outlier detection is how to effectively separate the outliers from the normal data. Classification is the process of assigning sampled data to different classes or clusters. Some classification techniques, such as SVM and artificial neural networks (ANN), have been utilized for outlier detection in WSN [21–23]. As one of the simplest classification algorithms, the K-Means technique has also proved effective for outlier detection in other areas [24–26]. It groups $N$ data points into $k'$ clusters, where the number of clusters is written $k'$ to distinguish it from the compression ratio $K$. In addition, compared with other classification techniques, K-Means-based outlier detection in WSN has the following advantages:

(i) K-Means does not need knowledge of the data distribution, and it is suitable for dynamic data.

(ii) The calculation is simpler than that of other classification techniques such as SVM and ANN.

However, every compressed datum contains two elements, so we modify the original K-Means algorithm so that it can be combined with the PAA algorithm. Suppose that the number of sensors in a WSN is $N$; the improved steps of K-Means in our algorithm are as follows.

(1) Select $k'$ random instances $M_1 = (M_{11}, M_{12}), M_2 = (M_{21}, M_{22}), \ldots, M_{k'} = (M_{k'1}, M_{k'2})$ from all the compressed sensor data as the initial centroids of the clusters $C_1, C_2, \ldots, C_{k'}$.

(2) For every training datum $X_j = (\bar{c}_{ij}, \mathrm{Var}_{ij})$:

(a) Calculate the Euclidean distances

$$Y_{nj1} = \left\| \bar{c}_{ij} - M_{n1} \right\|^2, \qquad Y_{nj2} = \left\| \mathrm{Var}_{ij} - M_{n2} \right\|^2, \qquad n = 1, 2, \ldots, k', \tag{4}$$

where $Y_{nj1}$ is the distance between the first element of $X_j$ and the first element of the $n$th cluster centroid and $Y_{nj2}$ is the distance between the second element of $X_j$ and the second element of the $n$th cluster centroid. To reduce the influence of different orders of magnitude, $Y_{nj1}$ and $Y_{nj2}$ are normalized to $Y'_{nj1}$ and $Y'_{nj2}$, and the distance between $X_j$ and $M_n$ is defined as

$$D(X_j, n) = m \, Y'_{nj1} + (1 - m) \, Y'_{nj2}, \qquad n = 1, 2, \ldots, k', \tag{5}$$

where $m$ is the weighting parameter that adjusts the proportion of the two factors. The distance represents the correlation between a sensor datum and a cluster center, and it also determines whether the datum belongs to the outlier class. Finally, find the cluster $C_q$ that is closest to $X_j$.

(b) Assign $X_j$ to $C_q$ and update the centroid $(M_{q1}, M_{q2})$ of $C_q$ (the centroid of a cluster is the arithmetic mean of the instances in the cluster).

(3) Repeat step (2) until the centers no longer change. Finally, the algorithm aims at minimizing the squared error function $J$:

$$J = \sum_{j=1}^{N} \left\| D\left(X_j^{(i)}\right) \right\|^2, \qquad i = 1, 2, \ldots, k', \tag{6}$$

where $D(X_j^{(i)})$ represents the distance between $X_j$ and the center of the cluster $C_i$ in which $X_j$ is located.
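The steps above can be sketched in Python as follows. Several choices here are assumptions, since the text does not fix them: each distance component is normalized by dividing by its maximum, centroids are updated once per pass over the data rather than after every single assignment, and the names `weighted_distances` and `improved_kmeans` are hypothetical.

```python
import numpy as np

def weighted_distances(X, M, m=0.5):
    """Distance (5) between every (mean, variance) pair in X and every centroid in M."""
    y1 = (X[:, None, 0] - M[None, :, 0]) ** 2   # squared distance on the mean, as in (4)
    y2 = (X[:, None, 1] - M[None, :, 1]) ** 2   # squared distance on the variance
    # Assumed normalization: scale each component to [0, 1] by its maximum.
    y1 = y1 / y1.max() if y1.max() > 0 else y1
    y2 = y2 / y2.max() if y2.max() > 0 else y2
    return m * y1 + (1.0 - m) * y2

def improved_kmeans(X, M0, m=0.5, max_iter=100):
    """Cluster compressed data from initial centroids M0; return labels, centroids, and J."""
    X = np.asarray(X, dtype=float)
    M = np.asarray(M0, dtype=float).copy()
    for _ in range(max_iter):
        labels = weighted_distances(X, M, m).argmin(axis=1)   # closest cluster per datum
        new_M = M.copy()
        for q in range(len(M)):
            if np.any(labels == q):
                new_M[q] = X[labels == q].mean(axis=0)        # centroid = arithmetic mean
        if np.allclose(new_M, M):                             # centers no longer change
            break
        M = new_M
    D = weighted_distances(X, M, m)
    labels = D.argmin(axis=1)
    J = float((D[np.arange(len(X)), labels] ** 2).sum())      # squared error, as in (6)
    return labels, M, J

# Four ordinary compressed data and one with an unusually large variance.
X = np.array([[5.0, 8.7], [5.1, 8.0], [4.9, 9.1], [5.0, 8.5], [5.0, 32.0]])
labels, M, J = improved_kmeans(X, M0=X[[0, 4]])
outlier_cluster = np.bincount(labels).argmin()   # the smallest cluster is flagged
```

In this toy case the means are nearly identical, so the variance term alone separates the last datum into its own cluster, which is then the smallest cluster and is flagged as the outlier.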

After the process of classification according to the def-inition of outlier the ideal result is that the abnormal datawill be assigned to the same cluster while the normal datawill be assigned to the same cluster because these outliers aredeviated from the normal dataTherefore the less the result 119869is themore precise the classification is In addition comparedto the normal data the number of abnormal data is relativelyless so these data in the cluster where the number of data isthe least is the identified outlier

322 Classification Improvement with AIS As it is knownthat the 119870-Means algorithm depends on the initial centroidsof the clusters and it is easy to fall into local optimum so westill need to solve these problems to make the classificationmore precise The AIS algorithm which is also knownas clonal selection algorithm (CSA) and a global optimalsearching algorithm is considered appropriate to offset thedrawbacks of119870-Means algorithm in the paper

Clonal selection algorithm is an emerging intelligentalgorithmwhich is inspired by the immune system It uses thediversity of immune system to maintain population diversityso that it can avoid the ldquopremature problemrdquo in general opti-mization and get the global optimization [27ndash29] The detaildescription of this algorithm is presented in this paper [30]

Given these defects of the $K$-Means algorithm, the purpose of CSA in this paper is to find the best initial centroids of the clusters, so that the classification result is the global optimum and the outlier detection rate is high. The application of CSA in this paper can be described as follows.

(1) In our $K$-Means algorithm, the squared error function $J$ in (6) is the criterion for judging the classification result, so we choose $J$ as the objective function and the affinity.

(2) Because our purpose is to find the best initial centroids of the clusters, we define a set of cluster centroids as an antibody and randomly initialize multiple arrays of centroids as the initial antibody group $T$:

$$T = \begin{bmatrix} M_1^1 & M_2^1 & \cdots & M_{k'}^1 \\ M_1^2 & M_2^2 & \cdots & M_{k'}^2 \\ \vdots & \vdots & & \vdots \\ M_1^Q & M_2^Q & \cdots & M_{k'}^Q \end{bmatrix} \tag{7}$$

where $(M_1^Q, M_2^Q, \ldots, M_{k'}^Q)$ is the $Q$th set of initial centroids of the clusters.

(3) For every antibody in $T$, we classify the compressed data with the improved $K$-Means and record the affinity $J$. Then we sort the affinity sequence by size.

(4) According to the reordered affinity sequence, we select the antibodies whose affinities are at the top of the sequence as the parent antibody group, which serves as the initial antibody group in the next round, because good genes are more likely to be propagated to the next generation. Then we clone these antibodies based on the size of their affinities.

(5) Determine whether the classification result calculated with the antibody corresponding to the minimum $J$ meets the end condition, namely, that $J$ is small enough. If it does, this antibody gives the best initial centroids of the clusters, which ensures that the classification result of $K$-Means is the best. Otherwise, continue with the following steps.

(6) Process the antibodies selected in step (4) with the clone, crossover, and mutation operations to form a new, diverse generation of the antibody group.

(7) If the specified number of iterations has been reached, the process also finishes; otherwise, return to step (3).
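The selection loop in steps (3) to (7) can be sketched as below. This is a simplified illustration under stated assumptions: `clonal_selection`, `toy_affinity`, and `toy_mutate` are hypothetical names, the crossover operation is omitted, and the affinity is a toy squared error on one-dimensional data rather than the $J$ obtained by running the improved $K$-Means. The parents are carried over to the next generation so that the best affinity found never gets worse.

```python
import random


def clonal_selection(population, affinity, mutate, n_parents=3, n_clones=4,
                     max_generations=50, target=1e-3):
    for _ in range(max_generations):                # step (7): iteration limit
        ranked = sorted(population, key=affinity)   # step (3): smallest J first
        parents = ranked[:n_parents]                # step (4): parent group
        if affinity(parents[0]) <= target:          # step (5): end condition
            return parents[0]
        # Steps (4) and (6): better-ranked parents get more clones, and each
        # clone is mutated. Parents are kept so the best J never gets worse.
        population = list(parents)
        for rank, parent in enumerate(parents):
            for _ in range(max(1, n_clones - rank)):
                population.append(mutate(parent))
    return min(population, key=affinity)


# Toy usage: an antibody is a pair of 1-D centroids, and the affinity is the
# squared error of assigning every point to its nearest centroid.
rng = random.Random(0)
points = [0.0, 0.1, 0.2, 5.0, 5.1]


def toy_affinity(antibody):
    return sum(min((x - c) ** 2 for c in antibody) for x in points)


def toy_mutate(antibody):
    return tuple(c + rng.gauss(0.0, 0.3) for c in antibody)


initial = [tuple(rng.uniform(0.0, 6.0) for _ in range(2)) for _ in range(8)]
best = clonal_selection(initial, toy_affinity, toy_mutate)
```

In the paper's setting, the affinity callable would instead run the improved $K$-Means from the antibody's centroids and return the resulting $J$.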

After the above process, we can find better initial cluster centroids and obtain a more accurate classification than with the original $K$-Means algorithm. As a result, the normal and abnormal sensor data are separated into different clusters more effectively, so our algorithm achieves a higher detection accuracy with a lower false alarm rate. However, an effective outlier detection algorithm in WSN not only needs a high detection rate but also needs to suit the characteristics of sensor data and the constrained resources. The advantages of our algorithm compared to other algorithms are shown in Table 1.

Table 1: The comparison of characteristics.

| Algorithm | Online | Based on compressed data | No a priori knowledge of data |
|---|---|---|---|
| Our algorithm | Yes | Yes | Yes |
| Salem et al. [5] | Yes | No | Yes |
| Bhargava and Raghuvanshi [7] | Yes | Yes | No |
| Zhang et al. [4] | Yes | No | No |

Table 2: The experimental datasets.

| Name of dataset | Length of dataset | Source of dataset |
|---|---|---|
| Ma Data | 1200 | Synthetic dataset in [10] |
| Keogh Data | 1200 | Synthetic dataset in [10] |
| chfdb chf01 | 3600 | ECG [11] |
| chfdb chf13 | 3600 | ECG [11] |
| stdb 308 | 3600 | ECG [11] |
| Synthetic control | 600 | UCI [12] |

As shown in Table 1, our algorithm satisfies more of the requirements of outlier detection in WSN than the other algorithms.

3.3. Pseudocode of Our Algorithm. Based on the previous introduction of every part of our algorithm, Pseudocode 1 describes the whole process.

4. Experimental Evaluation

In order to evaluate the performance of our proposed algorithm, experiments were carried out on two synthetic anomaly datasets and several real medical datasets commonly used in anomaly detection. The names and detailed sources of these datasets are shown in Table 2. For comparison, the $K$-Means algorithm without AIS is used as the baseline.

4.1. Experimental Setup and Evaluation Metrics. Our simulation is conducted in MATLAB. We assume that knowledge of sensor node locations is available at the base station. We do not assume any specific routing or medium access protocol in this network or make any assumptions about the node density of the network, because our algorithm is not designed for a particular application scenario.

Detection rate (DR) and false alarm rate (FAR) are used to evaluate the performance of our outlier detection algorithm. The detection rate is the ratio of the number of correctly detected outliers to the total number of outliers. The false alarm rate is the ratio of the number of normal data incorrectly detected as outliers to the total number of normal data.

Abnormal data are generally very few, so we select the cluster with the fewest data as the outlier cluster and calculate the DR and FAR from the data in this cluster.
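The two metrics follow directly from their definitions. A minimal sketch is given below; `detection_metrics` is a hypothetical helper, and the flags are assumed to be available as one Boolean per datum (predicted: the datum lies in the outlier cluster; truth: the datum is a labeled outlier).

```python
def detection_metrics(predicted, truth):
    # predicted[i] / truth[i] are True when datum i is flagged / is an outlier.
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    n_outliers = sum(1 for t in truth if t)
    n_normal = len(truth) - n_outliers
    dr = tp / n_outliers if n_outliers else 0.0   # detection rate
    far = fp / n_normal if n_normal else 0.0      # false alarm rate
    return dr, far


truth = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]
dr, far = detection_metrics(predicted, truth)
```

Here two of the three outliers are detected and one of the three normal data is flagged by mistake, so DR is 2/3 and FAR is 1/3.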


Input: Original data from every sensor
Output: Abnormal sensor and outlier data
For original data from every sensor
(1) Compress the data with the improved PAA algorithm in each node and get the compressed data $(c_{ij}, \mathrm{Var}_{ij})$
(2) All nodes send the compressed data of the same time to the base station
(3) Initialize the antibody group $T$ and set the initial number of iterations to 0
(4) While (the number of iterations has not reached the specified value)
(5)     For each antibody in $T$
(6)         Classify all the compressed data with the improved $K$-Means algorithm and calculate the affinity $J$
(7)     End the classification
(8)     Sort all affinities by size and choose the antibodies at the top of the reordered sequence as the parent antibody group. Then clone these antibodies based on the size of their affinities
(9)     Determine whether the minimum $J$ meets the end condition
(10)    If (it does)
(11)        Jump out of the optimization loop and end the classification
(12)    Else
(13)        Process the parent antibody group with the clone, crossover, and mutation operations to form a new, diverse generation of the antibody group
(14)        The number of iterations plus one
(15) End the optimization and select the antibody in the first line as the best initial centroids of the clusters
(16) Classify the compressed data with the selected antibody and pick out the abnormal compressed data
(17) Judge the abnormal type based on the location of the abnormal data

Pseudocode 1: Pseudocode of our algorithm.

4.2. Experimental Results

4.2.1. Decision of Related Parameters. As shown in the description of our algorithm in Section 3, the compression ratio $K$ and the number of clusters $k'$ are two key parameters that have a great impact on the classification result. Therefore, experiments were carried out to determine an appropriate range of $K$ and $k'$, so that the subsequent detection results are robust and accurate. To present the results clearly and concisely, we only report the results on the stdb 308 database, because the results on the other databases are similar. The stdb 308 database is shown in Figure 2; data whose absolute value is greater than 0.5 are abnormal. The DR and FAR under different $K$ and $k'$ are shown in Tables 3 and 4, respectively. To reduce the error, we take the average over 100 experiments with the same parameters as the final result.

According to the algorithm described in the last section, the smaller $K$ is, the less information is lost after compression and the more accurate the detection is. On the other hand, the larger $K$ is, the stronger the compression is and the more energy is saved. As for the parameter $k'$, the smaller it is, the smaller the amount of calculation is but the worse the classification is, and the opposite holds when $k'$ increases.

As shown in Tables 3 and 4, the results are in accordance with the theoretical analysis. To balance energy saving against effective outlier detection while taking the amount of calculation into account, we set $K$ to 4 and $k'$ to 3.

Table 3: DR (%) under different $K$ and $k'$.

| $k'$ | $K = 6$ | $K = 5$ | $K = 4$ | $K = 3$ | $K = 2$ |
|---|---|---|---|---|---|
| 2 | 96.12 | 96.45 | 97.05 | 97.12 | 97.15 |
| 3 | 96.77 | 97.12 | 97.23 | 97.30 | 97.28 |
| 4 | 97.65 | 97.15 | 97.20 | 97.32 | 97.35 |
| 5 | 97.50 | 97.89 | 97.91 | 97.86 | 97.85 |
| 6 | 97.62 | 97.50 | 98.03 | 98.11 | 98.10 |
| 7 | 97.59 | 97.85 | 98.00 | 98.08 | 98.12 |
| 8 | 97.65 | 97.79 | 98.05 | 98.12 | 98.14 |

Table 4: FAR (%) under different $K$ and $k'$.

| $k'$ | $K = 6$ | $K = 5$ | $K = 4$ | $K = 3$ | $K = 2$ |
|---|---|---|---|---|---|
| 2 | 5.12 | 4.45 | 3.87 | 3.12 | 2.98 |
| 3 | 5.11 | 4.32 | 3.58 | 3.09 | 2.78 |
| 4 | 4.85 | 4.21 | 3.56 | 2.98 | 2.77 |
| 5 | 4.90 | 3.87 | 3.34 | 3.01 | 2.65 |
| 6 | 4.85 | 3.75 | 2.98 | 2.65 | 2.56 |
| 7 | 4.83 | 3.68 | 2.55 | 2.34 | 2.45 |
| 8 | 4.75 | 3.57 | 2.14 | 1.98 | 1.53 |

4.2.2. Results of Outlier Detection. With $K$ and $k'$ selected as in the previous subsection, experiments were conducted to compare the performance of our algorithm (OA) and the $K$-Means algorithm without AIS (KWA). The results are shown in Table 5.

Figure 2: The database of stdb 308 (horizontal axis from 0 to 3500; vertical axis from −1.5 to 1.5).

Table 5: The experimental results.

| Name of dataset | DR with OA (%) | DR with KWA (%) | FAR with OA (%) | FAR with KWA (%) |
|---|---|---|---|---|
| Ma Data | 97.50 | 81.12 | 3.34 | 8.23 |
| Keogh Data | 98.10 | 80.45 | 3.15 | 7.78 |
| Chfdb chf01 | 98.86 | 81.34 | 3.22 | 7.12 |
| Chfdb chf13 | 97.13 | 78.65 | 1.97 | 8.02 |
| Stdb 308 | 97.23 | 80.67 | 3.58 | 7.84 |
| Synthetic control | 98.34 | 78.12 | 2.78 | 8.45 |

As shown in Table 5, the performance of our algorithm is clearly better on every database: DR is higher and FAR is lower than for the algorithm without AIS. The reason is that our algorithm adopts the AIS algorithm to drive the result toward the global optimum, whereas the other algorithm depends on the initial cluster centers, so the result of each experiment is a local optimum and is unstable. In addition, although the original data are compressed before outlier detection, the DR of our algorithm in every row is higher than 95% while the FAR is lower than 5%, so our algorithm is an effective outlier detection system in WSN that ensures a high detection rate with a low false alarm rate.

However, there is always some environmental noise in practical applications, which changes the original signal, so we artificially add Gaussian white noise to the original signals and repeat the experiments to test the robustness of our algorithm. We take the Ma Data (Ma), Stdb 308 (Std), and Synthetic control (Syn) databases as examples, and the results are shown in Figures 3 and 4. Figure 3 shows the DR of these databases with different SNR, and Figure 4 shows the FAR with different SNR.

As shown in Figures 3 and 4, we test the robustness of our algorithm under Gaussian white noise with different signal-to-noise ratios (SNR). In these two figures, DR shows a slight downward trend and FAR rises slightly as the SNR decreases, because a smaller SNR has a greater influence on the original data. However, the DR is still higher than 85% and the FAR is lower than 15% under the different noise levels, so our algorithm is robust against noise.
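For readers who want to reproduce this kind of robustness test, the noise level can be derived from the target SNR as sketched below. This is an illustration only: `add_white_noise` and `measured_snr_db` are hypothetical helpers, the sine wave is a stand-in signal, and the paper does not state how its noise was generated beyond it being Gaussian white noise.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    # Noise power follows from SNR(dB) = 10 * log10(P_signal / P_noise).
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    # Empirical SNR of the noisy signal relative to the clean one.
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(0)
clean = [math.sin(2 * math.pi * i / 50) for i in range(20000)]
noisy = add_white_noise(clean, snr_db=20, rng=rng)
snr = measured_snr_db(clean, noisy)
```

The measured SNR fluctuates around the requested value because the noise is random; with many samples the deviation is small.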

4.2.3. Analysis of Energy Saving and Real-Time Performance. Compared to other outlier detection algorithms, another advantage of our algorithm is that it can prolong the network lifetime, because the PAA algorithm reduces the amount of data sent. For example, for the stdb 308 database, the number of data sent is reduced from 3600 to 1800 when $K$ is 4, since each segment of $K = 4$ samples is replaced by the two values $(c_{ij}, \mathrm{Var}_{ij})$. The network therefore saves half of the energy, and it saves more when $K$ is larger.
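The count of 1800 can be checked with a short sketch. The assumptions here are that each compressed datum is the pair (segment mean, segment variance), as the notation $(c_{ij}, \mathrm{Var}_{ij})$ suggests, and that the population variance is used; `paa_mean_var` and `values_sent` are hypothetical names, and the paper's improved PAA may differ in detail.

```python
def paa_mean_var(series, K):
    # One (mean, variance) pair per segment of K consecutive samples.
    pairs = []
    for start in range(0, len(series), K):
        seg = series[start:start + K]
        mean = sum(seg) / len(seg)
        var = sum((v - mean) ** 2 for v in seg) / len(seg)  # population variance
        pairs.append((mean, var))
    return pairs


def values_sent(n_samples, K):
    # Each segment (including a shorter trailing one) contributes two values.
    return 2 * -(-n_samples // K)


pairs = paa_mean_var([1, 1, 1, 1, 2, 4, 6, 8], K=4)
sent = values_sent(3600, K=4)
```

With 3600 samples and $K = 4$ there are 900 segments and therefore 1800 transmitted values, which matches the figure quoted above. Under the same assumption, $K = 6$ would give 1200 values.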

Some outlier detection methods that are combined with a data fusion algorithm reconstruct the original data before detection. However, the reconstruction process of a data fusion algorithm usually takes a long time, which leads to poor real-time performance. Our detection algorithm deals directly with the compressed data, so it has better real-time performance.

4.2.4. The Comparison of Our Algorithm with Other Outlier Detection Algorithms in WSN. To show the effectiveness of our algorithm more clearly, we compare it with the algorithms listed in Table 1. The stdb 308 database is selected for the experiments. For every algorithm, we choose the optimal parameter settings according to the original paper. In addition, we take the average over 100 experiments as the final result to ensure reliability. The comparison result is shown in Table 6.

As shown in Table 6, the DR and FAR of our algorithm are both the best among the above algorithms. Because we combine the AIS with $K$-Means to optimize the classification result, it can detect outliers effectively. Our algorithm also saves the most energy compared with the other algorithms, because PAA ensures that the lifetime of the network is prolonged effectively. Because there is no data compression process in the algorithm proposed by Zhang et al. [4], its time consumption is less than that of our algorithm. However, because PAA, $K$-Means, and AIS are lightweight methods, the time consumption of our algorithm is less than that of the other two algorithms.

Figure 3: DR with different SNR (x-axis: SNR (dB), from 10 to 40; y-axis: DR (%), from 80 to 100; curves: Ma, Std, Syn).

Figure 4: FAR with different SNR (x-axis: SNR (dB), from 10 to 40; y-axis: FAR (%), from 0 to 20; curves: Ma, Std, Syn).

Table 6: The comparison of several algorithms.

| Algorithm | DR (%) | FAR (%) | Time consumption | Saved energy consumption (%) |
|---|---|---|---|---|
| Our algorithm | 97.23 | 3.58 | 115 s | 50 |
| Salem et al. [5] | 93.45 | 6.15 | 117 s | 0 |
| Bhargava and Raghuvanshi [7] | 92.17 | 7.28 | 128 s | 35 |
| Zhang et al. [4] | 95.19 | 5.62 | 106 s | 0 |

5. Conclusion and Future Work

In this paper, we propose an outlier detection algorithm for detecting abnormal compressed data in WSN. In the first phase, we use the PAA algorithm to compress the time series data in each node, so that the communication overload is reduced and the battery life is prolonged. Based on the result of PAA, we then combine the improved unsupervised detection algorithm of $K$-Means and the AIS to effectively classify the normal and abnormal data. The major advantages of this algorithm are that it is based on the compressed data, so that the energy consumption is reduced, and that it achieves a high detection rate while the false alarm rate is low. Relevant experiments on virtual and real data demonstrate the effectiveness of our algorithm in detecting outliers and resisting noise.

A potential limitation of this approach is that the time cost increases markedly when the data volume is huge, because every datum needs to be reclassified with the new centroids of the clusters until the classification ends, and the search for the optimal initial centroids of the clusters takes a long time. Another limitation is that our detection algorithm is based on univariate data, while the data are more complex in some applications of WSN. Therefore, our future work is to improve the algorithm so that it has better real-time performance and is more suitable for the detection of multivariate data.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61271274), the key project of the Natural Science Foundation of Hubei Province of China (2011CDA069), the general project of the Natural Science Foundation of Hubei Province of China (2010CDB04203, 2011CDB339), and the Key Science and Technology of Hubei Province of China (2012BAA02003, 2011BAB042). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References

[1] D. Bri, M. Garcia, J. Lloret, and P. Dini, "Real deployments of wireless sensor networks," in Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM '09), pp. 415–423, June 2009.

[2] C. F. Garcia-Hernandez, P. H. Ibarguengoytia, J. Garcia-Hernandez, and J. A. Perez-Diaz, "Wireless sensor networks and applications: a survey," International Journal of Computer Science and Network Security, vol. 7, pp. 264–273, 2007.

[3] Y. Zhang, N. Meratnia, and P. Havinga, "Outlier detection techniques for wireless sensor networks: a survey," IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[4] Y. Zhang, N. Meratnia, and P. J. M. Havinga, "Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine," Ad Hoc Networks, vol. 11, no. 3, pp. 1062–1074, 2013.

[5] O. Salem, Y. Liu, and A. Mehaoua, "Anomaly detection in medical wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 272–284, 2013.

[6] B. Sun, X. Shan, K. Wu, and Y. Xiao, "Anomaly detection based secure in-network aggregation for wireless sensor networks," IEEE Systems Journal, vol. 7, no. 1, pp. 13–25, 2013.

[7] A. Bhargava and A. S. Raghuvanshi, "Anomaly detection in wireless sensor networks using S-transform in combination with SVM," in Proceedings of the 5th International Conference on Computational Intelligence and Communication Networks (CICN '13), pp. 111–116, IEEE, Mathura, India, September 2013.

[8] M. Moshtaghi, C. Leckie, S. Karunasekera, and S. Rajasegarar, "An adaptive elliptical anomaly detection model for wireless sensor networks," Computer Networks, vol. 64, pp. 195–207, 2014.

[9] Q.-L. Zhong and Z.-X. Cai, "Symbolic algorithm for time series data based on statistic feature," Chinese Journal of Computers, vol. 31, no. 10, pp. 1857–1864, 2008.

[10] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '00), pp. 93–104, June 2000.

[11] http://physionet.org/physiobank/database.

[12] http://archive.ics.uci.edu/ml/datasets.html.

[13] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[14] M. S. Sisodia and V. Raghuwanshi, "Anomaly base network intrusion detection by using random decision tree and random projection: a fast network intrusion detection technique," Network Protocols and Algorithms, vol. 3, no. 4, pp. 93–107, 2011.

[15] V. S. Samparthi and H. K. Verma, "Outlier detection of data in wireless sensor networks using kernel density estimation," International Journal of Computer Applications, vol. 5, no. 6, pp. 28–32, 2010.

[16] P. K. Sahoo, "Efficient security mechanisms for mHealth applications using wireless body sensor networks," Sensors, vol. 12, no. 9, pp. 12606–12633, 2012.

[17] D. Macii, A. Colombo, P. Pivato, and D. Fontanelli, "A data fusion technique for wireless ranging performance improvement," IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 1, pp. 27–37, 2013.

[18] R. Tan, G. Xing, B. Liu, J. Wang, and X. Jia, "Exploiting data fusion to improve the coverage of wireless sensor networks," IEEE/ACM Transactions on Networking, vol. 20, no. 2, pp. 450–462, 2012.

[19] C.-T. Cheng, H. Leung, and P. Maupin, "A delay-aware network structure for wireless sensor networks with in-network data fusion," IEEE Sensors Journal, vol. 13, no. 5, pp. 1622–1631, 2013.

[20] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, "A symbolic representation of time series, with implications for streaming algorithms," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD '03), pp. 2–11, June 2003.

[21] X. Cheng, J. Xu, J. Pei, and J. Liu, "Hierarchical distributed data classification in wireless sensor networks," Computer Communications, vol. 33, no. 12, pp. 1404–1413, 2010.

[22] Y. Li, Y. Wang, and G. He, "Clustering-based distributed support vector machine in wireless sensor networks," Journal of Information & Computational Science, vol. 9, no. 4, pp. 1083–1096, 2012.

[23] S. Siripanadorn, W. Hattagam, and N. Teaumroog, "Anomaly detection in wireless sensor networks using self-organizing map and wavelets," International Journal of Communication, vol. 4, pp. 74–83, 2010.

[24] Y. Yasser, K. Siavash, and J. Arash, "An unsupervised network anomaly detection approach by K-means," in Proceedings of the IEEE Symposium on Computers and Communications (ISCC '08), pp. 398–403, 2008.

[25] H. Li, "Research of K-MEANS algorithm based on information entropy in anomaly detection," in Proceedings of the 4th International Conference on Multimedia and Security (MINES '12), pp. 71–74, November 2012.

[26] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-Means+ID3: a novel method for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[27] M. Y. El-Sharkh, "Clonal selection algorithm for power generators maintenance scheduling," International Journal of Electrical Power and Energy Systems, vol. 57, pp. 73–78, 2014.

[28] Y. Li and Z. X. Sun, "Generative tracking of 3D human motion in latent space by sequential clonal selection algorithm," Multimedia Tools and Applications, vol. 69, no. 1, pp. 79–109, 2014.

[29] J. Feng, L. C. Jiao, X. Zhang, and T. Sun, "Hyperspectral band selection based on trivariate mutual information and clonal selection," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 7, pp. 4092–4115, 2014.

[30] L. N. de Castro and F. J. Von Zuben, "Learning and optimization using the clonal selection principle," IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.



and (4), a binary array $X_j = (c_{ij}, \mathrm{Var}_{ij})$ is defined to represent the $i$th compressed datum of the $j$th sensor node.

3.2. Outlier Detection on Compressed Data. Based on the compression result of PAA in the preceding subsection, we combine an improved unsupervised detection algorithm of $K$-Means and AIS to effectively classify the normal and abnormal compressed sensor data. Because reconstruction of the original data is not needed, the algorithm has good real-time performance. After all sensors have sent their data to the sink node, the improved $K$-Means algorithm, which needs no prior knowledge of the data distribution, is used to classify the compressed data, and the AIS offsets the drawbacks of $K$-Means so that the detection result is globally optimal. We introduce the process of outlier detection in detail in the following.

3.2.1. Classify by the Algorithm of $K$-Means. According to the definition of an outlier in WSN, the key to outlier detection is how to effectively separate the outliers from the normal data. Classification is the process of assigning sampled data to different classes or clusters. Some classification techniques, such as SVM and artificial neural networks (ANN), have been used for outlier detection in WSN [21–23]. As one of the simplest classification algorithms, the $K$-Means technique has also proved effective for outlier detection in other areas [24–26]. It groups $N$ data points into $k'$ clusters, where $k'$ denotes the number of clusters and is written this way to distinguish it from the compression ratio $K$. In addition, compared to other classification techniques, outlier detection based on the $K$-Means algorithm in WSN has the following advantages.

(i) $K$-Means does not need knowledge of the data distribution, and it is suitable for dynamic data.

(ii) The calculation is simpler than that of other classification techniques such as SVM and ANN.

However, every compressed datum contains two elements, so we modify the original $K$-Means algorithm so that it can be combined with the PAA algorithm. Suppose that the number of sensors in a WSN is $N$; the improved steps of $K$-Means in our algorithm are as below.

(1) Select $k'$ random instances $M_1 (M_{11}, M_{12}), M_2 (M_{21}, M_{22}), \ldots, M_{k'} (M_{k'1}, M_{k'2})$ from all the compressed sensor data as the initial centroids of the clusters $C_1, C_2, \ldots, C_{k'}$.

(2) For every training datum $X_j = (c_{ij}, \mathrm{Var}_{ij})$:

(a) Calculate the Euclidean distance

$$Y_{nj1} = \left\| c_{ij} - M_{n1} \right\|_2, \qquad Y_{nj2} = \left\| \mathrm{Var}_{ij} - M_{n2} \right\|_2, \qquad n = 1, 2, \ldots, k' \tag{4}$$

where $Y_{nj1}$ is the distance between the first element of $X_j$ and the first element of the $n$th cluster centroid, and $Y_{nj2}$ is the distance between the second element of $X_j$ and the second element of the $n$th cluster centroid. To reduce the influence of different orders of magnitude, $Y_{nj1}$ and $Y_{nj2}$ are normalized to $Y'_{nj1}$ and $Y'_{nj2}$, and the distance between $X_j$ and $M_n$ is defined as

$$D\left(X_j^{n}\right) = m \cdot Y'_{nj1} + (1 - m) \cdot Y'_{nj2}, \qquad n = 1, 2, \ldots, k' \tag{5}$$

where $m$ is the weighting parameter that adjusts the proportion of the two factors. The distance represents the correlation between the sensor data and the cluster center, and it also determines whether the data belong to the outlier class. Finally, find the cluster $C_q$ that is closest to $X_j$.

(b) Assign 119883119895to 119862119902

and update the centroid(11987211990211198721199022) of 119862119902(the centroid of a cluster is the

arithmetic mean of the instances in the cluster)

(3) Repeat steps (2) until the centers no longer changeFinally the algorithm aims at minimizing the squarederror function 119869

119869 =

119873

sum

119895=1

10038171003817100381710038171003817119863 (119883119895

(119894))10038171003817100381710038171003817

2

119895 = 1 2 sdot sdot sdot 1198961015840 (6)

where 119863(119883119895

(119894)) represents the distance between 119883

119895

and the center value of cluster 119862(119894)119895

in which 119883119894is

located

After the process of classification according to the def-inition of outlier the ideal result is that the abnormal datawill be assigned to the same cluster while the normal datawill be assigned to the same cluster because these outliers aredeviated from the normal dataTherefore the less the result 119869is themore precise the classification is In addition comparedto the normal data the number of abnormal data is relativelyless so these data in the cluster where the number of data isthe least is the identified outlier

322 Classification Improvement with AIS As it is knownthat the 119870-Means algorithm depends on the initial centroidsof the clusters and it is easy to fall into local optimum so westill need to solve these problems to make the classificationmore precise The AIS algorithm which is also knownas clonal selection algorithm (CSA) and a global optimalsearching algorithm is considered appropriate to offset thedrawbacks of119870-Means algorithm in the paper

Clonal selection algorithm is an emerging intelligentalgorithmwhich is inspired by the immune system It uses thediversity of immune system to maintain population diversityso that it can avoid the ldquopremature problemrdquo in general opti-mization and get the global optimization [27ndash29] The detaildescription of this algorithm is presented in this paper [30]

According to the defects of the 119870-Means algorithm thepurpose of CSA in this paper is to find the best initialcentroids of the clusters which ensure that the classification

International Journal of Distributed Sensor Networks 5

result is the global optimum and our outliersrsquo detection rateis high The application of CSA applied in our paper can bedescribed as follows

(1) In our 119870-Means algorithm the squared error func-tion 119869 in (6) is the judgment standard of classificationresult so we choose 119869 as the objective function andthe affinity

(2) Because our purpose is to find the best initial cen-troids of clusters we define centroids of the clustersas antibody and randomly initialize multiple array ofcentroids as initial antibody group 119879

119879 =

[[[[[[[[[[[

[

1198721

11198721

2sdot sdot sdot sdot sdot sdot 119872

1

1198701015840

1198722

11198722

2sdot sdot sdot sdot sdot sdot 119872

2

1198701015840

119872119876

1119872119876

2sdot sdot sdot sdot sdot sdot 119872

119876

1198701015840

]]]]]]]]]]]

]

(7)

where (1198721198761119872119876

2sdot sdot sdot119872119876

1198701015840) is the 119876th initial centroids

of clusters(3) For every antibody in 119879 we classify the compressed

data with the improved 119870-Means and record theaffinity 119869 Then we sort the affinity sequence by size

(4) According to the reordered affinity sequence, we select the antibodies whose affinities are at the top of the sequence as the parent antibody group, which serves as the initial antibody group in the next round, because good genes are more likely to be propagated to the next generation according to genetics. Then we clone these antibodies based on the size of their affinities.

(5) Determine whether the classification result calculated with the antibody corresponding to the minimum $J$ meets the end condition that $J$ is small enough. If it does, this antibody is the best set of initial cluster centroids, which ensures that the classification result of $K$-Means is the best. Otherwise, continue with the following steps.

(6) Process the antibodies selected in step (4) with the operations of clone, crossover, and mutation to form a new, diverse generation of the antibody group.

(7) If the specified number of iterations has been reached, the process is also finished; otherwise, return to step (3).
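Steps (1) to (7) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: it uses plain $K$-Means rather than the improved variant, and the function names, the rank-weighted cloning, the single-point crossover, the Gaussian mutation, and all default parameter values are assumptions made only for illustration.

```python
import random

def assign(points, centroids):
    """Index of the nearest centroid for every point (squared Euclidean distance)."""
    return [min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points]

def kmeans(points, centroids, iters=20):
    """Plain K-Means from the given initial centroids; returns (J, labels)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    labels = assign(points, centroids)
    j = sum(sum((p - q) ** 2 for p, q in zip(pt, centroids[lab]))
            for pt, lab in zip(points, labels))
    return j, labels

def clonal_selection_init(points, k, pop=8, n_parents=3, generations=10,
                          j_stop=1e-9, sigma=0.1, seed=0):
    """Search for initial centroids (one antibody) that minimise the squared error J."""
    rng = random.Random(seed)

    def error(antibody):
        return kmeans(points, antibody)[0]

    group = [rng.sample(points, k) for _ in range(pop)]   # step (2): antibody group T
    for _ in range(generations):                          # step (7): iteration limit
        ranked = sorted(group, key=error)                 # step (3): sort by affinity J
        parents = ranked[:n_parents]                      # step (4): smallest J first
        if error(parents[0]) <= j_stop:                   # step (5): end condition
            break
        group = list(parents)                             # parents survive unchanged
        weights = list(range(n_parents, 0, -1))           # better rank, more clones
        while len(group) < pop:                           # step (6)
            a = rng.choices(parents, weights=weights)[0]  # clone
            b = rng.choice(parents)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(a[:cut]) + list(b[cut:])         # crossover
            child = [tuple(x + rng.gauss(0.0, sigma) for x in c)
                     for c in child]                      # mutation
            group.append(child)
    best = min(group, key=error)
    j, labels = kmeans(points, best)
    return best, j, labels

# Toy univariate data: five values near 0 and two values near 5.
data = [(0.0,), (0.1,), (-0.1,), (0.2,), (-0.2,), (5.0,), (5.1,)]
best, j, labels = clonal_selection_init(data, k=2)
print(labels)
```

Keeping the parents in the next generation means the best $J$ found so far can never get worse from one iteration to the next, which is one simple way to realise the selection in step (4).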

After the above process, we can find better initial cluster centroids and obtain a more accurate classification than the original $K$-Means algorithm. As a result, the normal and abnormal sensor data are separated into different clusters more effectively, so our algorithm achieves a higher detection accuracy and a lower false alarm rate. However, an effective outlier detection algorithm in WSN not only needs a high detection rate but also needs to suit the characteristics of sensor data and the constrained resources. Compared to other algorithms, the advantages of our algorithm are shown in Table 1.

Table 1: The comparison of characteristics.

| Algorithm | Online | Based on compressed data | No a priori knowledge of data |
|---|---|---|---|
| Our algorithm | Yes | Yes | Yes |
| Salem et al. [5] | Yes | No | Yes |
| Bhargava and Raghuvanshi [7] | Yes | Yes | No |
| Zhang et al. [4] | Yes | No | No |

Table 2: The experimental datasets.

| Name of dataset | Length of dataset | Source of dataset |
|---|---|---|
| Ma Data | 1200 | Synthetic dataset in this paper [10] |
| Keogh Data | 1200 | Synthetic dataset in this paper [10] |
| chfdb chf01 | 3600 | ECG [11] |
| chfdb chf13 | 3600 | ECG [11] |
| stdb 308 | 3600 | ECG [11] |
| Synthetic control | 600 | UCI [12] |

As shown in Table 1, compared to other algorithms, our algorithm satisfies more of the requirements of outlier detection in WSN.

3.3. Pseudocode of Our Algorithm. Based on the previous introduction of every part of our algorithm, Pseudocode 1 describes the whole process of our algorithm.

4. Experimental Evaluation

In order to evaluate the performance of our proposed algorithm, experiments were carried out on two synthetic anomaly datasets and several real medical datasets commonly used in anomaly detection. The names and detailed sources of these datasets are shown in Table 2. For comparison, the $K$-Means algorithm without AIS is used as the baseline.

4.1. Experimental Setup and Evaluation Metrics. Our simulation is conducted in MATLAB. We assume that knowledge of sensor node locations is available at the base station. We do not assume any specific routing or medium access protocol in this network or make any assumptions on the node density of the network, because our algorithm is not designed for a particular application scenario.

Detection rate (DR) and false alarm rate (FAR) are used to evaluate the performance of our outlier detection algorithm. The detection rate is the ratio of the number of correctly detected outlier data to the total number of outlier data. The false alarm rate is the ratio of the number of normal data incorrectly detected as outlier data to the total number of normal data.

The number of abnormal data is generally very small, so we select the cluster containing the fewest data as the outlier cluster and calculate the DR and FAR from the data in this cluster.
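The two metrics and the smallest-cluster rule can be written down as a short Python sketch. The function name `detection_metrics` and the toy labels are illustrative only; the ground-truth flags are assumed to be available, as they are for the labelled datasets used in the experiments.

```python
from collections import Counter

def detection_metrics(cluster_labels, is_outlier):
    """Treat the smallest cluster as the outlier cluster; return (DR, FAR) in percent."""
    counts = Counter(cluster_labels)
    outlier_cluster = min(counts, key=counts.get)          # cluster with the fewest data
    flagged = [label == outlier_cluster for label in cluster_labels]
    true_pos = sum(f and o for f, o in zip(flagged, is_outlier))
    false_pos = sum(f and not o for f, o in zip(flagged, is_outlier))
    n_outliers = sum(is_outlier)
    n_normal = len(is_outlier) - n_outliers
    dr = 100.0 * true_pos / n_outliers      # correctly detected outliers / all outliers
    far = 100.0 * false_pos / n_normal      # normal data flagged as outliers / all normal data
    return dr, far

# Twelve points: cluster 1 (four points) is the smallest, so it is the outlier cluster.
labels = [0] * 8 + [1] * 4
truth = [False] * 7 + [True, False, True, True, True]
print(detection_metrics(labels, truth))  # (75.0, 12.5)
```

In the toy example, three of the four true outliers fall in the outlier cluster (DR = 75%) and one of the eight normal points is flagged by mistake (FAR = 12.5%).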


Input: Original data from every sensor
Output: Abnormal sensor and outlier data
For original data from every sensor
(1) Compress data with the improved PAA algorithm in each node and get the compressed data $(c_{ij}, \mathrm{Var}_{ij})$
(2) All nodes send the compressed data of the same time to the base station
(3) Initialize the antibody group $T$ and set the initial number of iterations to 0
(4) While (the number of iterations has not reached the specified value)
(5)     For each antibody in $T$
(6)         Classify all the compressed data with the improved $K$-Means algorithm and calculate the affinity $J$
(7)     End the classification
(8)     Sort all affinities by size and choose the antibodies at the top of the reordered sequence as the parent antibody group. Then clone these antibodies based on the size of their affinities
(9)     Determine whether or not the minimum $J$ meets the end condition
(10)    If (it does)
(11)        Jump out of the optimization loop and end the classification
(12)    Else
(13)        Process the parent antibody group with the operations of clone, crossover, and mutation to form a new, diverse generation of the antibody group
(14)        The number of iterations plus one
(15) End the optimization and select the antibody in the first line as the best initial centroids of the clusters
(16) Classify the compressed data with the selected antibody and pick out the abnormal compressed data
(17) Judge the abnormal type based on the location of the abnormal data

Pseudocode 1: Pseudocode of our algorithm.
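Step (1) of Pseudocode 1 can be illustrated with a minimal Python sketch. It assumes that the compressed pair produced for each segment consists of the segment mean and the segment variance, which is one reading of the two values named in step (1); the function name `paa_compress` and the choice of the population variance are illustrative and are not taken from the paper.

```python
from statistics import fmean, pvariance

def paa_compress(series, segment_len):
    """Compress a series into (mean, variance) pairs, one pair per segment."""
    if segment_len < 1:
        raise ValueError("segment_len must be at least 1")
    compressed = []
    for start in range(0, len(series), segment_len):
        segment = series[start:start + segment_len]   # the last segment may be shorter
        compressed.append((fmean(segment), pvariance(segment)))
    return compressed

readings = [1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 2.0]
print(paa_compress(readings, 4))  # [(1.0, 0.0), (1.0, 1.0)]
```

Both segments in the example have the same mean, so a mean-only PAA could not tell them apart; the variance is what separates the flat segment from the oscillating one.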

4.2. Experimental Results

4.2.1. Decision of Related Parameters. As shown in the process of our algorithm in Section 3, the compression ratio $K$ and the number $k'$ of clusters are two key parameters that have a great impact on the classification result. Therefore, relevant experiments were carried out to determine appropriate ranges of $K$ and $k'$, so that the following detection results are robust and accurate. In order to show the results clearly and concisely, we only present the results on the database stdb 308, because the results on the other databases are similar. The database stdb 308 is shown in Figure 2, and the data whose absolute value is more than 0.5 are abnormal. The results of DR and FAR under different $K$ and $k'$ are shown in Tables 3 and 4, respectively. In order to reduce the error, we take the average value of 100 experiments under the same parameters as our final result.

According to the algorithm described in the last section, the smaller $K$ is, the less information is lost in compression and the more accurate the detection is. On the other hand, the larger $K$ is, the better the compression and the more energy is saved. For the parameter $k'$, the smaller it is, the smaller the amount of calculation but the worse the classification; the effect is the opposite when $k'$ increases.

As shown in Tables 3 and 4, the results are in accordance with the theoretical analysis. To balance energy saving and effective outlier detection while considering the amount of calculation, we set $K$ to 4 and $k'$ to 3.

4.2.2. Results of Outlier Detection. With $K$ and $k'$ selected in the previous subsection, relevant experiments were conducted to compare the performance of our algorithm (OA) and the $K$-Means algorithm without AIS (KWA), and the results are shown in Table 5.

Table 3: DR (%) under different $K$ and $k'$.

| $k'$ (rows), $K$ (columns) | 6 | 5 | 4 | 3 | 2 |
|---|---|---|---|---|---|
| 2 | 96.12 | 96.45 | 97.05 | 97.12 | 97.15 |
| 3 | 96.77 | 97.12 | 97.23 | 97.30 | 97.28 |
| 4 | 97.65 | 97.15 | 97.20 | 97.32 | 97.35 |
| 5 | 97.50 | 97.89 | 97.91 | 97.86 | 97.85 |
| 6 | 97.62 | 97.50 | 98.03 | 98.11 | 98.10 |
| 7 | 97.59 | 97.85 | 98.00 | 98.08 | 98.12 |
| 8 | 97.65 | 97.79 | 98.05 | 98.12 | 98.14 |

Table 4: FAR (%) under different $K$ and $k'$.

| $k'$ (rows), $K$ (columns) | 6 | 5 | 4 | 3 | 2 |
|---|---|---|---|---|---|
| 2 | 5.12 | 4.45 | 3.87 | 3.12 | 2.98 |
| 3 | 5.11 | 4.32 | 3.58 | 3.09 | 2.78 |
| 4 | 4.85 | 4.21 | 3.56 | 2.98 | 2.77 |
| 5 | 4.90 | 3.87 | 3.34 | 3.01 | 2.65 |
| 6 | 4.85 | 3.75 | 2.98 | 2.65 | 2.56 |
| 7 | 4.83 | 3.68 | 2.55 | 2.34 | 2.45 |
| 8 | 4.75 | 3.57 | 2.14 | 1.98 | 1.53 |

Figure 2: The database of stdb 308 (horizontal axis from 0 to 3500; vertical axis from −1.5 to 1.5).

Table 5: The experimental results.

| Name of dataset | DR (%) with OA | DR (%) with KWA | FAR (%) with OA | FAR (%) with KWA |
|---|---|---|---|---|
| Ma Data | 97.50 | 81.12 | 3.34 | 8.23 |
| Keogh Data | 98.10 | 80.45 | 3.15 | 7.78 |
| Chfdb chf01 | 98.86 | 81.34 | 3.22 | 7.12 |
| Chfdb chf13 | 97.13 | 78.65 | 1.97 | 8.02 |
| Stdb 308 | 97.23 | 80.67 | 3.58 | 7.84 |
| Synthetic control | 98.34 | 78.12 | 2.78 | 8.45 |

As shown in Table 5, the performance of our algorithm on every database is clearly better: DR is higher and FAR is lower than for the algorithm without AIS. The reason is that our algorithm adopts the AIS algorithm to ensure that the result is globally optimal; in contrast, the other algorithm depends on the initial cluster centers, so the result of each experiment is a local optimum and unstable. In addition, although the original data are compressed before outlier detection, the DR of our algorithm in every row is higher than 95% while the FAR is lower than 5%, so our algorithm is an effective outlier detection system for WSN that ensures a high detection rate with a low false alarm rate.

However, there is always some environmental noise in actual applications, which alters the original signal, so we artificially add Gaussian white noise to the original signals and repeat the experiments to test the robustness of our algorithm. We take the Ma Data (Ma), Stdb 308 (Std), and Synthetic control (Syn) databases as examples, and the results are shown in Figures 3 and 4. Figure 3 shows the DR of these databases under different SNR, and Figure 4 shows the FAR under different SNR.

As shown in Figures 3 and 4, we test the robustness of our algorithm under Gaussian white noise with different signal-to-noise ratios (SNR). In these two figures, DR shows a slight downward trend and FAR rises slightly as the SNR decreases, because a smaller SNR has a greater influence on the original data. However, the DR is still above 85% and the FAR below 15% under the different noise levels, so our algorithm has good robustness against noise.
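The noise experiment can be reproduced in outline with the following Python sketch, which adds zero-mean Gaussian white noise at a target SNR. The paper does not state how the SNR is computed, so the definition used here (mean signal power over noise power, in dB) is an assumption, as are the function name `add_white_noise` and the sine test signal.

```python
import math
import random

def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus Gaussian white noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))   # SNR = 10 * log10(Ps / Pn)
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

clean = [math.sin(0.01 * i) for i in range(20000)]
noisy = add_white_noise(clean, snr_db=20)

# Check the SNR that was actually achieved on this realisation.
ps = sum(x * x for x in clean) / len(clean)
pn = sum((n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
print(round(10 * math.log10(ps / pn), 1))
```

Lower values of `snr_db` give a larger noise standard deviation, which matches the trend reported for Figures 3 and 4: the smaller the SNR, the more the original data are disturbed.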

4.2.3. Analysis of Energy Saving and Real-Time Performance. Compared to other outlier detection algorithms, another advantage of our algorithm is that it can prolong the network life, because the PAA algorithm reduces the amount of sent data. For example, for the database stdb 308, the number of sent data is reduced from 3600 to 1800 when $K$ is 4, so the network can save half of the energy, and the network can save more energy when $K$ is larger.
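The arithmetic behind this example can be checked with a short Python sketch. It assumes that each segment of $K$ samples is sent as two values, which is consistent with 3600 samples becoming 1800 sent values at $K = 4$, and that the transmission energy is proportional to the number of values sent. Both assumptions and the function names are illustrative.

```python
def sent_values(n_samples, k, values_per_segment=2):
    """Number of values transmitted after compressing n_samples with segment length k."""
    segments = -(-n_samples // k)          # ceiling division
    return segments * values_per_segment

def saved_fraction(n_samples, k, values_per_segment=2):
    """Fraction of transmissions saved relative to sending every raw sample."""
    return 1.0 - sent_values(n_samples, k, values_per_segment) / n_samples

print(sent_values(3600, 4))      # 1800
print(saved_fraction(3600, 4))   # 0.5
print(sent_values(3600, 6))      # 1200
```

Under these assumptions the saved fraction is $1 - 2/K$, so $K = 2$ saves nothing and every larger $K$ saves more, at the price of the information loss discussed in Section 4.2.1.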

Some outlier detection methods that are combined with a data fusion algorithm reconstruct the original data before detection. However, the reconstruction process of a data fusion algorithm usually takes a long time, which leads to poor real-time performance. Our detection algorithm deals directly with the compressed data, so it has better real-time performance.

4.2.4. The Comparison of Our Algorithm with Other Outlier Detection Algorithms in WSN. In order to show the effectiveness of our algorithm more clearly, we compare it with the algorithms listed in Table 1. The database stdb 308 is selected for the experiments. For every algorithm, we choose the optimal parameter settings according to its original paper. In addition, we take the average value of 100 experiments as the final result to ensure the reliability of the experiments. The comparison result is shown in Table 6.

As shown in Table 6, the DR and FAR of our algorithm are both the best among the compared algorithms. Because we combine AIS with $K$-Means to optimize the classification results, it detects outliers effectively. Our algorithm also saves the most energy among the compared algorithms, because PAA ensures that the lifetime of the network is prolonged effectively. Because there is no data compression process in the algorithm proposed by Zhang et al. [4], its time consumption is less than that of our algorithm. However, because PAA, $K$-Means, and AIS are lightweight methods, the time consumption of our algorithm is less than that of the other two algorithms.

Table 6: The comparison of several algorithms.

| Algorithm | DR (%) | FAR (%) | Time consumption | Saved energy consumption (%) |
|---|---|---|---|---|
| Our algorithm | 97.23 | 3.58 | 1.15 s | 50 |
| Salem et al. [5] | 93.45 | 6.15 | 1.17 s | 0 |
| Bhargava and Raghuvanshi [7] | 92.17 | 7.28 | 1.28 s | 35 |
| Zhang et al. [4] | 95.19 | 5.62 | 1.06 s | 0 |

Figure 3: DR with different SNR (horizontal axis: SNR (dB), 10 to 40; vertical axis: DR (%), 80 to 100; series: Ma, Std, Syn).

Figure 4: FAR with different SNR (horizontal axis: SNR (dB), 10 to 40; vertical axis: FAR (%), 0 to 20; series: Ma, Std, Syn).

5. Conclusion and Future Work

In this paper, we propose an outlier detection algorithm for detecting abnormal compressed data in WSN. In the first phase, we use the PAA algorithm to compress the time series data in each node so that the communication overload is reduced and the battery life is prolonged. Based on the result of PAA, we then combine the improved unsupervised detection algorithm of $K$-Means and the AIS to effectively classify the normal and abnormal data. The major advantages of this algorithm are that it is based on compressed data, so that the energy consumption is reduced, and that it achieves a high detection rate while the false alarm rate is low. Relevant experiments on virtual and real data demonstrate the effectiveness of our algorithm in detecting outliers and resisting noise.

A potential limitation of this approach is that the time cost increases noticeably when the data volume is huge, because every datum needs to be reclassified with the new cluster centroids until the classification ends, and the search for the optimal initial cluster centroids takes a long time. Another limitation is that the detection algorithm in this paper is based on univariate data, while the data are more complex in some WSN applications. Therefore, our future work is to improve the algorithm so that it has better real-time performance and is more suitable for the detection of multivariate data.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61271274), key project of Natural Science Foundation of Hubei Province of China (2011CDA069), general project of Natural Science Foundation of Hubei Province of China (2010CDB04203, 2011CDB339), and Key Science and Technology of Hubei Province of China (2012BAA02003, 2011BAB042). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References

[1] D. Bri, M. Garcia, J. Lloret, and P. Dini, “Real deployments of wireless sensor networks,” in Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM ’09), pp. 415–423, June 2009.

[2] C. F. Garcia-Hernandez, P. H. Ibarguengoytia, J. Garcia-Hernandez, and J. A. Perez-Diaz, “Wireless sensor networks and applications: a survey,” International Journal of Computer Science and Network Security, vol. 7, pp. 264–273, 2007.

[3] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[4] Y. Zhang, N. Meratnia, and P. J. M. Havinga, “Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine,” Ad Hoc Networks, vol. 11, no. 3, pp. 1062–1074, 2013.

[5] O. Salem, Y. Liu, and A. Mehaoua, “Anomaly detection in medical wireless sensor networks,” Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 272–284, 2013.

[6] B. Sun, X. Shan, K. Wu, and Y. Xiao, “Anomaly detection based secure in-network aggregation for wireless sensor networks,” IEEE Systems Journal, vol. 7, no. 1, pp. 13–25, 2013.

[7] A. Bhargava and A. S. Raghuvanshi, “Anomaly detection in wireless sensor networks using S-transform in combination with SVM,” in Proceedings of the 5th International Conference on Computational Intelligence and Communication Networks (CICN ’13), pp. 111–116, IEEE, Mathura, India, September 2013.

[8] M. Moshtaghi, C. Leckie, S. Karunasekera, and S. Rajasegarar, “An adaptive elliptical anomaly detection model for wireless sensor networks,” Computer Networks, vol. 64, pp. 195–207, 2014.

[9] Q.-L. Zhong and Z.-X. Cai, “Symbolic algorithm for time series data based on statistic feature,” Chinese Journal of Computers, vol. 31, no. 10, pp. 1857–1864, 2008.

[10] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’00), pp. 93–104, June 2000.

[11] http://physionet.org/physiobank/database/.

[12] http://archive.ics.uci.edu/ml/datasets.html.

[13] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[14] M. S. Sisodia and V. Raghuwanshi, “Anomaly base network intrusion detection by using random decision tree and random projection: a fast network intrusion detection technique,” Network Protocols and Algorithms, vol. 3, no. 4, pp. 93–107, 2011.

[15] V. S. Samparthi and H. K. Verma, “Outlier detection of data in wireless sensor networks using kernel density estimation,” International Journal of Computer Applications, vol. 5, no. 6, pp. 28–32, 2010.

[16] P. K. Sahoo, “Efficient security mechanisms for mHealth applications using wireless body sensor networks,” Sensors, vol. 12, no. 9, pp. 12606–12633, 2012.

[17] D. Macii, A. Colombo, P. Pivato, and D. Fontanelli, “A data fusion technique for wireless ranging performance improvement,” IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 1, pp. 27–37, 2013.

[18] R. Tan, G. Xing, B. Liu, J. Wang, and X. Jia, “Exploiting data fusion to improve the coverage of wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 20, no. 2, pp. 450–462, 2012.

[19] C.-T. Cheng, H. Leung, and P. Maupin, “A delay-aware network structure for wireless sensor networks with in-network data fusion,” IEEE Sensors Journal, vol. 13, no. 5, pp. 1622–1631, 2013.

[20] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A symbolic representation of time series, with implications for streaming algorithms,” in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD ’03), pp. 2–11, June 2003.

[21] X. Cheng, J. Xu, J. Pei, and J. Liu, “Hierarchical distributed data classification in wireless sensor networks,” Computer Communications, vol. 33, no. 12, pp. 1404–1413, 2010.

[22] Y. Li, Y. Wang, and G. He, “Clustering-based distributed support vector machine in wireless sensor networks,” Journal of Information & Computational Science, vol. 9, no. 4, pp. 1083–1096, 2012.

[23] S. Siripanadorn, W. Hattagam, and N. Teaumroog, “Anomaly detection in wireless sensor networks using self-organizing map and wavelets,” International Journal of Communication, vol. 4, pp. 74–83, 2010.

[24] Y. Yasser, K. Siavash, and J. Arash, “An unsupervised network anomaly detection approach by K-means,” in Proceedings of the IEEE Symposium on Computers and Communications (ISCC ’08), pp. 398–403, 2008.

[25] H. Li, “Research of K-MEANS algorithm based on information entropy in anomaly detection,” in Proceedings of the 4th International Conference on Multimedia and Security (MINES ’12), pp. 71–74, November 2012.


[26] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-Means+ID3: a novel method for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learning methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[27] M. Y. El-Sharkh, “Clonal selection algorithm for power generators maintenance scheduling,” International Journal of Electrical Power and Energy Systems, vol. 57, pp. 73–78, 2014.

[28] Y. Li and Z. X. Sun, “Generative tracking of 3D human motion in latent space by sequential clonal selection algorithm,” Multimedia Tools and Applications, vol. 69, no. 1, pp. 79–109, 2014.

[29] J. Feng, L. C. Jiao, X. Zhang, and T. Sun, “Hyperspectral band selection based on trivariate mutual information and clonal selection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 7, pp. 4092–4115, 2014.

[30] L. N. de Castro and F. J. Von Zuben, “Learning and optimization using the clonal selection principle,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.


Page 5: Research Article An Anomaly Detection Based on …downloads.hindawi.com/journals/ijdsn/2015/943532.pdfan anomaly detection algorithm based on -transform and one-SVM (support vector

International Journal of Distributed Sensor Networks 5

result is the global optimum and our outliersrsquo detection rateis high The application of CSA applied in our paper can bedescribed as follows

(1) In our 119870-Means algorithm the squared error func-tion 119869 in (6) is the judgment standard of classificationresult so we choose 119869 as the objective function andthe affinity

(2) Because our purpose is to find the best initial cen-troids of clusters we define centroids of the clustersas antibody and randomly initialize multiple array ofcentroids as initial antibody group 119879

119879 =

[[[[[[[[[[[

[

1198721

11198721

2sdot sdot sdot sdot sdot sdot 119872

1

1198701015840

1198722

11198722

2sdot sdot sdot sdot sdot sdot 119872

2

1198701015840

119872119876

1119872119876

2sdot sdot sdot sdot sdot sdot 119872

119876

1198701015840

]]]]]]]]]]]

]

(7)

where (1198721198761119872119876

2sdot sdot sdot119872119876

1198701015840) is the 119876th initial centroids

of clusters(3) For every antibody in 119879 we classify the compressed

data with the improved 119870-Means and record theaffinity 119869 Then we sort the affinity sequence by size

(4) According to the reorder affinity sequence we selectsome antibody whose affinity is at the top of thesequence as parent antibody group which is as theinitial antibody group in the next round because thegood genes are more likely to be propagated to thenext generation according to the genetics Then clonethese antibodies based on the size of affinities

(5) Determine whether or not the classification resultcalculated with the antibody corresponding to theminimum 119869 meets the end condition that the 119869 isenough small If meets this antibody is the bestinitial centroids of clusters which can ensure theclassification result with 119870-Means is best Otherwiseit will continue the following steps

(6) Process these antibodies selected in step (4) with theoperation of clone crossover and mutation to formata new diversity generation of antibodies group

(7) If the number of iteration has been arrived then theprocess is also finished otherwise turn to step (3)

After the above process we can find the more idealinitial cluster heads and get the more accurate classificationthan initial 119870-Means algorithm As a result the normal andabnormal sensor data will be selected to different clustermore effectively so our algorithm can have higher detectionaccuracy while the false alarm is lower However an effectiveoutlier detection algorithm in WSN not only need to havehigh detection rate but also need to satisfy the characteristicof sensor data and constrained resource Compared to otheralgorithms the advantages of our algorithm are shown inTable 1

Table 1 The comparison of characteristics

OnlineBased on

compresseddata

No a prioriknowledgeof data

Our algorithm Yes Yes YesSalem et al [5] Yes No YesBhargava andRaghuvanshi [7] Yes Yes No

Zhang et al [4] Yes No No

Table 2 The experimental datasets

Name ofdataset

Length ofdataset Source of dataset

Ma Data 1200 Synthetic dataset in this paper [10]Keogh Data 1200 Synthetic dataset in this paper [10]chfdb chf01 3600 ECG [11]chfdb chf13 3600 ECG [11]stdb 308 3600 ECG [11]Synthetic control 600 UCI [12]

As shown in Table 1 compared to other algorithms ouralgorithm can satisfy more requirements of outlier detectionin WSN

33 Pseudocode of Our Algorithm Based on the previousintroduction of every part in our algorithm Pseudocode 1 isused to describe the whole process of our algorithm

4 Experimental Evaluation

In order to evaluate the performance of our proposed algo-rithm experiments were carried out based on two syntheticanomaly datasets and some realmedicine datasets commonlyused in anomaly detection The name and detail source ofthese datasets are shown in Table 2 For comparison the 119870-Means algorithm without AIS is used as baseline

41 Experimental Setup and Evaluation Metrics Our simula-tion is conducted inMATLABWe assume that knowledge ofsensor node locations is available at the base station We donot assume any specific routing or medium access protocolin this network ormake any assumptions on the node densityof the network because our algorithm is not for a particularapplication scenario

Detection rate (DR) and false alarm rate (FAR) are used toevaluate the performance of our outlier detection algorithmDetection rate is the ratio of correctly detected outlier datato the total number of outlier data The false alarm rate isthe ratio of the number of normal data which is incorrectlydetected as outlier data to the total number of normal data

The number of abnormal data is very few in general so wewill select the cluster where the number of data is minimumas outlier cluster and calculate the DR and FAR depending onthese data in the outlier cluster

6 International Journal of Distributed Sensor Networks

Input Original data from every sensorOutput abnormal sensor and outlier dataFor original data from every sensor(1) Compress data with improved PAA algorithm in each node and get compressed data (119888

119894119895 Var119894119895)

(2) All nodes send the compressed data of the same time to the base station(3) Initialize the antibody group 119879 and set the initial number of iteration as 0(4) While (The number of iteration has not been reached the specified value)(5) For each antibody in 119879(6) Classify all the compressed data with improved119870-Means algorithm and calculate the affinity 119869(7) End the classification(8) Sort the all affinity by size and choose these antibody at the top of the reorder sequence

as patient antibody group Then clone these antibodies based on the size of affinities(9) Determine whether or not the minimum 119869meets the end condition(10) If (it does)(11) Jump out of the loop of optimization and end the classification(12) Else(13) Process the patient antibody group with the operation of clone crossover and

mutation to format a new diversity generation of antibodies group(14) The number of iteration plus one(15) End the processing of optimization and select the antibody in first line as the best initial centroids of clusters(16) Classify the compressed data with the selected antibody and pick out the abnormal compression data(17) Judge the abnormal type based on the location of the abnormal data

Pseudocode 1 Pseudocode of our algorithm

42 Experimental Results

421 Decision of Related Parameters As shown in the pro-cess of our algorithm in the chapter 3 the compression ratio119870 and the number 1198961015840 of clusters are two key parameters whichhave a great impact on the classification result Thereforerelevant experiments have been completed to determine theappropriate scope of119870 and 1198961015840 whichmeans that the followingdetection results are robust and accurate In order to showthe results clearly and concisely we will only present theresults on database of stdb 308 because the results on otherdatabases are similar The database of stdb 308 is shown inFigure 2 and the data whose absolute value is more than 05is abnormalThe results of DR and FARunder different119870 and1198961015840 are separately shown in Tables 3 and 4 In order to reduce

the error we take the average value of 100 times experimentsunder the same parameters as our finial result

According to the described algorithm in last chapter thesmaller 119870 is the less the data information after compressingloses and the more accurate the detection is On the otherhand the larger 119870 is the better the effect of compression isand the more the energy is saved For the parameter 1198961015840 theless it is the less the calculation amount is but the worsethe effect of classification is and the performance is oppositewhen 1198701015840 increases

As shown in Tables 3 and 4 the results are in accordancewith theoretical analysis However in order to balance theenergy saving and effective outlier detection while consider-ing the calculation amount we determine that 119870 is 4 and 1198961015840is 3

422 Results of Outlier Detection With 119870 and 1198961015840 selectedin the previous subsection relevant experiments were

Table 3 DR () under different 119870 and 1198961015840

1198961015840 119870

6 5 4 3 22 9612 9645 9705 9712 97153 9677 9712 9723 9730 97284 9765 9715 9720 9732 97355 9750 9789 9791 9786 97856 9762 9750 9803 9811 98107 9759 9785 9800 9808 98128 9765 9779 9805 9812 9814

Table 4 FAR () under different119870 and 1198961015840

1198961015840 119870

6 5 4 3 22 512 445 387 312 2983 511 432 358 309 2784 485 421 356 298 2775 490 387 334 301 2656 485 375 298 265 2567 483 368 255 234 2458 475 357 214 198 153

conducted to compare the performance of our algorithm(OA) and the algorithmof119870-Meanswithout AIS (KWA) andthe results are shown in Table 5

As shown in the Table 5 the performance of our algo-rithm with every database is obviously better DR is higher

International Journal of Distributed Sensor Networks 7

0 500 1000 1500 2000 2500 3000 3500

0

05

1

15

minus15

minus1

minus05

Figure 2 The database of stdb 308

Table 5 The experimental results

Name of dataset DRwith OA

DRwith KWA

FARwith OA

FARwith KWA

Ma Data 9750 8112 334 823Keogh Data 9810 8045 315 778Chfdb chf01 9886 8134 322 712Chfdb chf13 9713 7865 197 802Stdb 308 9723 8067 358 784Synthetic control 9834 7812 278 845

and FAR is lower than the algorithmwithout AISThe reasonis that our algorithm adopts the AIS algorithm to ensure thatthe result is global optimization on the contrary the otheralgorithm is depended on the initial cluster centers so thatthe result of every experiment is local optimum and unstableIn addition although the original data is compressed beforeoutlier detection the DR of our algorithm in every line ishigher than 95 while the FAR is lower than 5 so ouralgorithm is an ideal and effective outlier detection systeminWSN which can ensure high detection rate while the falsealarm rate is low

However there is more or less environmental noise in theactual application situation whichmakes the original changeso we artificially add Gaussian white noise in the original sig-nals and conduct the experiments again to test the robustnessof our algorithmWe take the Ma Data (Ma) Stdb 308 (Std)and Synthetic control (Syn) databases as examples to carryout the experiments and results are shown in Figures 3 and 4Figure 3 shows the DA of these databases with different SNRand Figure 4 shows the FRA with different SNR

As shown in Figures 3 and 4, we test the robustness of our algorithm under Gaussian white noise at several signal-to-noise ratios (SNR). In both figures, DR decreases slightly and FAR rises slightly as the SNR decreases, because a smaller SNR distorts the original data more strongly. However, the DR remains above 85% and the FAR below 15% at every noise level tested, so our algorithm is robust to noise.
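The noise injection used in this robustness test can be sketched as follows. This is a minimal illustration, assuming that the SNR is defined as the ratio of mean signal power to noise variance; the function names and the sine-wave test signal are hypothetical and stand in for the actual databases.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at `snr_db` dB."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained, as a sanity check."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: a sine wave standing in for a sensor time series.
clean = [math.sin(2 * math.pi * i / 50) for i in range(5000)]
noisy = add_awgn(clean, snr_db=20, rng=random.Random(0))
print(round(measured_snr_db(clean, noisy), 1))
```

Sweeping `snr_db` over the range plotted in Figures 3 and 4 (10 dB to 40 dB) and rerunning the detector on each noisy copy would reproduce the structure of the experiment, though not necessarily its exact numbers.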

4.2.3. Analysis of Energy Saving and Real-Time Performance. Compared with other outlier detection algorithms, another advantage of our algorithm is that it prolongs the network lifetime, because the PAA algorithm reduces the amount of data sent. For example, for the stdb 308 database, the number of values sent falls from 3600 to 1800 when K is 4, so the network saves half of the energy, and it saves more when K is larger.
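The arithmetic behind this example can be written out as follows. This is a sketch under two assumptions that the text does not state explicitly: each PAA segment of K samples is transmitted as two values, matching the (c_ij, Var_ij) pairs of Pseudocode 1, and the energy spent is proportional to the number of values sent. N denotes the number of original samples.

```latex
N_{\text{sent}} = \frac{2N}{K},
\qquad
\text{saving} = 1 - \frac{N_{\text{sent}}}{N} = 1 - \frac{2}{K}

% Example from the text: N = 3600, K = 4
N_{\text{sent}} = \frac{2 \cdot 3600}{4} = 1800,
\qquad
\text{saving} = 1 - \frac{2}{4} = 50\%
```

Under the same assumptions, the saving grows with K, which agrees with the statement that a larger K saves more energy.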

Some outlier detection methods that are combined with a data fusion algorithm reconstruct the original data before detection. However, the reconstruction step of a data fusion algorithm usually takes a long time, which leads to poor real-time performance. Our detection algorithm works directly on the compressed data, so it can achieve better real-time performance.

4.2.4. The Comparison of Our Algorithm with Other Outlier Detection Algorithms in WSN. To show the effectiveness of our algorithm more clearly, we compare it with the algorithms listed in Table 1. The stdb 308 database is used for these experiments. For every algorithm, we choose the optimal parameter settings given in its original paper. In addition, we take the average over 100 runs as the final result to ensure the reliability of the experiments. The comparison is shown in Table 6.

As shown in Table 6, both the DR and the FAR of our algorithm are the best among the compared algorithms. Because we combine the AIS with K-Means to optimize the classification results, it detects outliers effectively. Our algorithm also saves the most energy, because PAA effectively prolongs the lifetime of the network. The algorithm proposed by Zhang et al. [4] involves no data compression, so its time consumption is lower than that of our algorithm. However, because PAA, K-Means, and AIS are all lightweight methods, the time consumption of our algorithm is lower than that of the other two algorithms.

Figure 3: DR with different SNR (plot not reproduced; horizontal axis SNR (dB) from 10 to 40, vertical axis DR (%) from 80 to 100; curves for Ma, Std, and Syn).

Figure 4: FAR with different SNR (plot not reproduced; horizontal axis SNR (dB) from 10 to 40, vertical axis FAR (%) from 0 to 20; curves for Ma, Std, and Syn).

Table 6: The comparison of several algorithms.

Algorithm                        DR (%)   FAR (%)   Time consumption   Saved energy consumption (%)
Our algorithm                    97.23     3.58         115 s                  50
Salem et al. [5]                 93.45     6.15         117 s                   0
Bhargava and Raghuvanshi [7]     92.17     7.28         128 s                  35
Zhang et al. [4]                 95.19     5.62         106 s                   0

5. Conclusion and Future Work

In this paper, we propose an outlier detection algorithm that detects abnormal data in compressed form in WSN. In the first phase, we use the PAA algorithm to compress the time series data in each node, so that the communication overhead is reduced and battery life is prolonged. Based on the result of PAA, we then combine the improved unsupervised K-Means detection algorithm with the AIS to effectively separate normal from abnormal data. The major advantages of this algorithm are that it operates on compressed data, so energy consumption is reduced, and that it achieves a high detection rate with a low false alarm rate. Relevant experiments on virtual and real data demonstrate the effectiveness of our algorithm in detecting outliers and resisting noise.

A potential limitation of this approach is that the time cost increases markedly when the data volume is large, because every data point must be reclassified against the new cluster centroids until the classification ends, and the search for the optimal initial centroids takes a long time. Another limitation is that our detection algorithm handles only univariate data, whereas the data in some WSN applications is more complex. Therefore, our future work is to improve the algorithm so that it has better real-time performance and is better suited to detecting outliers in multivariate data.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61271274), the key project of the Natural Science Foundation of Hubei Province of China (2011CDA069), the general project of the Natural Science Foundation of Hubei Province of China (2010CDB04203, 2011CDB339), and the Key Science and Technology of Hubei Province of China (2012BAA02003, 2011BAB042). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References

[1] D. Bri, M. Garcia, J. Lloret, and P. Dini, “Real deployments of wireless sensor networks,” in Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM ’09), pp. 415–423, June 2009.

[2] C. F. Garcia-Hernandez, P. H. Ibarguengoytia, J. Garcia-Hernandez, and J. A. Perez-Diaz, “Wireless sensor networks and applications: a survey,” International Journal of Computer Science and Network Security, vol. 7, pp. 264–273, 2007.

[3] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[4] Y. Zhang, N. Meratnia, and P. J. M. Havinga, “Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine,” Ad Hoc Networks, vol. 11, no. 3, pp. 1062–1074, 2013.

[5] O. Salem, Y. Liu, and A. Mehaoua, “Anomaly detection in medical wireless sensor networks,” Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 272–284, 2013.

[6] B. Sun, X. Shan, K. Wu, and Y. Xiao, “Anomaly detection based secure in-network aggregation for wireless sensor networks,” IEEE Systems Journal, vol. 7, no. 1, pp. 13–25, 2013.

[7] A. Bhargava and A. S. Raghuvanshi, “Anomaly detection in wireless sensor networks using S-transform in combination with SVM,” in Proceedings of the 5th International Conference on Computational Intelligence and Communication Networks (CICN ’13), pp. 111–116, IEEE, Mathura, India, September 2013.

[8] M. Moshtaghi, C. Leckie, S. Karunasekera, and S. Rajasegarar, “An adaptive elliptical anomaly detection model for wireless sensor networks,” Computer Networks, vol. 64, pp. 195–207, 2014.

[9] Q.-L. Zhong and Z.-X. Cai, “Symbolic algorithm for time series data based on statistic feature,” Chinese Journal of Computers, vol. 31, no. 10, pp. 1857–1864, 2008.

[10] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’00), pp. 93–104, June 2000.

[11] http://physionet.org/physiobank/database/

[12] http://archive.ics.uci.edu/ml/datasets.html

[13] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[14] M. S. Sisodia and V. Raghuwanshi, “Anomaly base network intrusion detection by using random decision tree and random projection: a fast network intrusion detection technique,” Network Protocols and Algorithms, vol. 3, no. 4, pp. 93–107, 2011.

[15] V. S. Samparthi and H. K. Verma, “Outlier detection of data in wireless sensor networks using kernel density estimation,” International Journal of Computer Applications, vol. 5, no. 6, pp. 28–32, 2010.

[16] P. K. Sahoo, “Efficient security mechanisms for mHealth applications using wireless body sensor networks,” Sensors, vol. 12, no. 9, pp. 12606–12633, 2012.

[17] D. Macii, A. Colombo, P. Pivato, and D. Fontanelli, “A data fusion technique for wireless ranging performance improvement,” IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 1, pp. 27–37, 2013.

[18] R. Tan, G. Xing, B. Liu, J. Wang, and X. Jia, “Exploiting data fusion to improve the coverage of wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 20, no. 2, pp. 450–462, 2012.

[19] C.-T. Cheng, H. Leung, and P. Maupin, “A delay-aware network structure for wireless sensor networks with in-network data fusion,” IEEE Sensors Journal, vol. 13, no. 5, pp. 1622–1631, 2013.

[20] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A symbolic representation of time series with implications for streaming algorithms,” in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD ’03), pp. 2–11, June 2003.

[21] X. Cheng, J. Xu, J. Pei, and J. Liu, “Hierarchical distributed data classification in wireless sensor networks,” Computer Communications, vol. 33, no. 12, pp. 1404–1413, 2010.

[22] Y. Li, Y. Wang, and G. He, “Clustering-based distributed support vector machine in wireless sensor networks,” Journal of Information & Computational Science, vol. 9, no. 4, pp. 1083–1096, 2012.

[23] S. Siripanadorn, W. Hattagam, and N. Teaumroog, “Anomaly detection in wireless sensor networks using self-organizing map and wavelets,” International Journal of Communication, vol. 4, pp. 74–83, 2010.

[24] Y. Yasser, K. Siavash, and J. Arash, “An unsupervised network anomaly detection approach by K-means,” in Proceedings of the IEEE Symposium on Computers and Communications (ISCC ’08), pp. 398–403, 2008.

[25] H. Li, “Research of K-MEANS algorithm based on information entropy in anomaly detection,” in Proceedings of the 4th International Conference on Multimedia and Security (MINES ’12), pp. 71–74, November 2012.

[26] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-Means+ID3: a novel method for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learning methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[27] M. Y. El-Sharkh, “Clonal selection algorithm for power generators maintenance scheduling,” International Journal of Electrical Power and Energy Systems, vol. 57, pp. 73–78, 2014.

[28] Y. Li and Z. X. Sun, “Generative tracking of 3D human motion in latent space by sequential clonal selection algorithm,” Multimedia Tools and Applications, vol. 69, no. 1, pp. 79–109, 2014.

[29] J. Feng, L. C. Jiao, X. Zhang, and T. Sun, “Hyperspectral band selection based on trivariate mutual information and clonal selection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 7, pp. 4092–4115, 2014.

[30] L. N. de Castro and F. J. Von Zuben, “Learning and optimization using the clonal selection principle,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.



Input: original data from every sensor
Output: abnormal sensors and outlier data
For the original data from every sensor:
(1) Compress the data with the improved PAA algorithm in each node to obtain the compressed data (c_ij, Var_ij)
(2) All nodes send the compressed data of the same time to the base station
(3) Initialize the antibody group T and set the iteration count to 0
(4) While (the iteration count has not reached the specified value)
(5)     For each antibody in T
(6)         Classify all the compressed data with the improved K-Means algorithm and calculate the affinity J
(7)     End For
(8)     Sort all affinities by size and choose the antibodies at the top of the sorted sequence as the parent antibody group; then clone these antibodies according to their affinities
(9)     Determine whether the minimum J meets the end condition
(10)    If (it does)
(11)        Exit the optimization loop and end the classification
(12)    Else
(13)        Apply clone, crossover, and mutation to the parent antibody group to form a new, diverse generation of the antibody group
(14)        Increase the iteration count by one
(15) End the optimization and select the antibody in the first position as the best initial cluster centroids
(16) Classify the compressed data with the selected antibody and pick out the abnormal compressed data
(17) Judge the abnormal type based on the location of the abnormal data

Pseudocode 1: Pseudocode of our algorithm.
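Steps (3) to (16) of Pseudocode 1 can be sketched in Python as follows. This is a simplified illustration and not the authors' implementation: it uses plain Lloyd-style K-Means rather than the improved variant, a fixed number of generations instead of the unspecified end condition on J, and it treats the smallest cluster as the abnormal one. The names `kmeans` and `clonal_kmeans`, all parameter values, and the toy data are assumptions made for the example.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points, centroids, max_iter=50):
    """Lloyd iterations from given initial centroids; returns (centroids, labels, J)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    cost = sum(sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
               for p, lab in zip(points, labels))
    return centroids, labels, cost


def clonal_kmeans(points, k, pop=10, keep=3, generations=10, sigma=0.05, seed=0):
    """Search for good initial centroids with a clonal-selection-style loop."""
    rng = random.Random(seed)
    # Each antibody is one candidate set of k initial centroids.
    antibodies = [rng.sample(points, k) for _ in range(pop)]
    best_cost, best_antibody = float("inf"), None
    for _ in range(generations):
        # Affinity J: K-Means cost reached from this antibody (lower is better).
        scored = sorted(((kmeans(points, ab)[2], ab) for ab in antibodies),
                        key=lambda item: item[0])
        if scored[0][0] < best_cost:
            best_cost, best_antibody = scored[0]
        parents = [ab for _, ab in scored[:keep]]
        antibodies = list(parents)                    # parents survive unchanged
        for rank, parent in enumerate(parents):
            for _ in range(keep - rank):              # better rank, more clones
                mate = rng.choice(parents)
                cut = rng.randrange(k + 1)
                child = list(parent[:cut]) + list(mate[cut:])    # one-point crossover
                antibodies.append([tuple(x + rng.gauss(0.0, sigma) for x in c)
                                   for c in child])              # Gaussian mutation
        while len(antibodies) < pop:                  # top up with random newcomers
            antibodies.append(rng.sample(points, k))
    return kmeans(points, best_antibody)


# Toy compressed data: (mean, variance) pairs, 40 normal and 4 abnormal segments.
normal = [(0.01 * (i % 5), 0.001 * (i % 3)) for i in range(40)]
abnormal = [(1.0 + 0.02 * i, 0.3) for i in range(4)]
points = normal + abnormal
centroids, labels, cost = clonal_kmeans(points, k=2)
# Assumption for this sketch: the smallest cluster holds the abnormal data.
smallest = min(set(labels), key=labels.count)
flagged = [i for i, lab in enumerate(labels) if lab == smallest]
print(flagged)
```

Evaluating each antibody by the K-Means cost it leads to is what lets the search escape a poor initialization: a single K-Means run keeps whatever local optimum its starting centroids lead to, while the population keeps only the starting points that reach the lowest cost.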

4.2. Experimental Results

4.2.1. Decision of Related Parameters. As shown in the description of our algorithm in Section 3, the compression ratio K and the number of clusters k′ are two key parameters that strongly affect the classification result. Therefore, relevant experiments were carried out to determine appropriate ranges for K and k′, so that the subsequent detection results are robust and accurate. To present the results clearly and concisely, we report only the results on the stdb 308 database, because the results on the other databases are similar. The stdb 308 database is shown in Figure 2; data whose absolute value is greater than 0.5 is abnormal. The DR and FAR under different K and k′ are shown in Tables 3 and 4, respectively. To reduce the error, we take the average over 100 runs with the same parameters as our final result.

According to the algorithm described in the previous section, the smaller K is, the less information is lost in compression and the more accurate the detection is. On the other hand, the larger K is, the stronger the compression and the more energy is saved. As for the parameter k′, the smaller it is, the lower the computational cost but the worse the classification; the opposite holds as k′ increases.
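The effect of K on information loss can be illustrated with a minimal PAA sketch. It assumes that K is the number of consecutive samples merged into one segment and that each segment is summarized by its mean and variance, as the pair (c_ij, Var_ij) in Pseudocode 1 suggests; the function name `paa_compress` and the toy series are hypothetical.

```python
from statistics import fmean, pvariance


def paa_compress(series, K):
    """Summarize each run of K consecutive samples as a (mean, variance) pair."""
    if K < 1:
        raise ValueError("K must be a positive integer")
    pairs = []
    for start in range(0, len(series), K):
        segment = series[start:start + K]   # the last segment may be shorter
        pairs.append((fmean(segment), pvariance(segment)))
    return pairs


# Toy series with a single spike of 1.2 at index 6.
series = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 1.2, 0.1, 0.0, 0.1, 0.0, 0.1]
for K in (2, 4, 6):
    pairs = paa_compress(series, K)
    peak = max(mean for mean, _ in pairs)
    # Fewer pairs are sent as K grows, but the spike is averaged away.
    print(K, len(pairs), round(peak, 3))
```

In this toy series the largest segment mean shrinks as K grows, because the spike is averaged with more normal samples; this is the loss of detail that the paragraph above weighs against the energy saved.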

As shown in Tables 3 and 4, the results agree with the theoretical analysis. However, to balance energy saving against effective outlier detection while keeping the computational cost in check, we set K to 4 and k′ to 3.

Table 3: DR (%) under different K and k′.

k′ \ K      6       5       4       3       2
2        96.12   96.45   97.05   97.12   97.15
3        96.77   97.12   97.23   97.30   97.28
4        97.65   97.15   97.20   97.32   97.35
5        97.50   97.89   97.91   97.86   97.85
6        97.62   97.50   98.03   98.11   98.10
7        97.59   97.85   98.00   98.08   98.12
8        97.65   97.79   98.05   98.12   98.14

4.2.2. Results of Outlier Detection. With K and k′ selected as in the previous subsection, relevant experiments were conducted to compare the performance of our algorithm (OA) with that of the K-Means algorithm without AIS (KWA); the results are shown in Table 5.


Page 7: Research Article An Anomaly Detection Based on …downloads.hindawi.com/journals/ijdsn/2015/943532.pdfan anomaly detection algorithm based on -transform and one-SVM (support vector

International Journal of Distributed Sensor Networks 7

Figure 2: The database of stdb 308 (plot omitted; horizontal axis from 0 to 3500, vertical axis from -1.5 to 1.5).

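Table 5: The experimental results.

| Name of dataset | DR with OA (%) | DR with KWA (%) | FAR with OA (%) | FAR with KWA (%) |
|---|---|---|---|---|
| Ma Data | 97.50 | 81.12 | 3.34 | 8.23 |
| Keogh Data | 98.10 | 80.45 | 3.15 | 7.78 |
| Chfdb chf01 | 98.86 | 81.34 | 3.22 | 7.12 |
| Chfdb chf13 | 97.13 | 78.65 | 1.97 | 8.02 |
| Stdb 308 | 97.23 | 80.67 | 3.58 | 7.84 |
| Synthetic control | 98.34 | 78.12 | 2.78 | 8.45 |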

and FAR is lower than that of the algorithm without AIS. The reason is that our algorithm adopts the AIS algorithm to ensure that the result is globally optimal; in contrast, the other algorithm depends on the initial cluster centers, so the result of each experiment is only a local optimum and is unstable. In addition, although the original data is compressed before outlier detection, the DR of our algorithm in every row is higher than 95% while the FAR is lower than 5%, so our algorithm is an effective outlier detection system for WSN that ensures a high detection rate while the false alarm rate is low.
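For reference, the two metrics compared above can be computed from per-sample labels as in the following minimal sketch. It assumes the standard definitions (DR as the share of true outliers that are flagged, FAR as the share of normal samples that are wrongly flagged); the function name `detection_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
def detection_metrics(truth, predicted):
    """Return (DR, FAR) in percent for boolean label lists (True = outlier)."""
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)          # outliers caught
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)      # outliers missed
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)      # false alarms
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)  # normal, kept
    dr = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    far = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    return dr, far


# Toy labels (assumed data): 4 true outliers, 6 normal samples.
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
dr, far = detection_metrics(truth, predicted)
print(f"DR = {dr:.1f}%, FAR = {far:.1f}%")
```

With these definitions a detector can only score well by raising DR and lowering FAR at the same time, which is why the two columns are reported together.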

However, there is always some environmental noise in actual applications, which alters the original signal, so we artificially add Gaussian white noise to the original signals and repeat the experiments to test the robustness of our algorithm. We take the Ma Data (Ma), Stdb 308 (Std), and Synthetic control (Syn) databases as examples; the results are shown in Figures 3 and 4. Figure 3 shows the DR of these databases at different SNRs, and Figure 4 shows the FAR at different SNRs.
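The noise injection described above can be sketched as follows. This is only an illustration under stated assumptions: the SNR is taken as the ratio of mean signal power to noise power in dB, the sine wave stands in for a real record, and the helper names `add_white_noise` and `measured_snr_db` are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return a noisy copy of `signal` whose expected SNR is `snr_db` decibels."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)  # standard deviation of the zero-mean noise
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [math.sin(2.0 * math.pi * i / 100.0) for i in range(3600)]  # stand-in record
for snr_db in (10, 20, 30, 40):
    noisy = add_white_noise(clean, snr_db, rng)
    print(snr_db, round(measured_snr_db(clean, noisy), 2))
```

The measured value only approximates the target because the noise is random; averaging the detection results over repeated draws reduces that variation.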

As shown in Figures 3 and 4, we test the robustness of our algorithm under Gaussian white noise at several signal-to-noise ratios (SNR). In both figures, DR shows a slight downward trend and FAR rises slightly as the SNR decreases, because a smaller SNR has a greater influence on the original data. However, the DR remains above 85% and the FAR below 15% at all tested noise levels, so our algorithm is robust to noise.

4.2.3. Analysis of Energy Saving and Real-Time Performance. Compared to other outlier detection algorithms, another advantage of our algorithm is that it can prolong the network lifetime, because the PAA algorithm reduces the amount of data sent. For example, for the stdb 308 database, the number of values sent is reduced from 3600 to 1800 when K is 4, so the network can save half of the energy, and it can save more energy when K is larger.
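The data reduction can be illustrated with a plain PAA sketch in which each block of consecutive samples is replaced by its mean. The parameter `segment_len` and the synthetic `readings` are hypothetical, and how `segment_len` corresponds to the paper's K is an assumption: the text reports 3600 values shrinking to 1800 for K = 4, whereas in this plain variant that reduction corresponds to `segment_len = 2`. The printed fraction is a proxy for saved energy only if radio cost scales with the number of values sent.

```python
def paa(series, segment_len):
    """Piecewise aggregate approximation: one mean per block of `segment_len` samples."""
    if segment_len < 1:
        raise ValueError("segment_len must be a positive integer")
    compressed = []
    for start in range(0, len(series), segment_len):
        block = series[start:start + segment_len]
        compressed.append(sum(block) / len(block))  # the last block may be shorter
    return compressed


readings = [float(i % 50) for i in range(3600)]  # stand-in for a 3600-sample record
for segment_len in (2, 4, 8):
    compressed = paa(readings, segment_len)
    not_sent = 1.0 - len(compressed) / len(readings)
    print(segment_len, len(compressed), f"{not_sent:.1%}")
```

Longer segments send fewer values but also smooth out short spikes, so the segment length trades energy against the detector's ability to see brief anomalies.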

Some outlier detection methods that are combined with a data fusion algorithm reconstruct the original data before detection. However, the reconstruction step of a data fusion algorithm usually takes a long time, which leads to poor real-time performance. Our detection algorithm operates directly on the compressed data, so it achieves better real-time performance.

4.2.4. The Comparison of Our Algorithm with Other Outlier Detection Algorithms in WSN. To show the effectiveness of our algorithm more clearly, we compare it with the algorithms listed in Table 1. The stdb 308 database is selected for the experiments. For every algorithm, we choose the optimal parameter settings according to its original paper. In addition, we take the average over 100 runs as the final result to ensure the reliability of the experiments. The comparison is shown in Table 6.

As shown in Table 6, the DR and FAR of our algorithm are both the best among the above algorithms. Because we combine the AIS with K-Means to optimize the classification results, it can detect outliers effectively. Our algorithm also saves the most energy compared with the other algorithms, because PAA reduces the amount of data sent and thus prolongs the network lifetime. Because there is no data compression step in the algorithm proposed by Zhang et al. [4], its time consumption is less than that of our algorithm. However, because PAA, K-Means, and AIS are lightweight methods, the time consumption of our algorithm is less than that of the other two algorithms.

Table 6: The comparison of several algorithms.

| Algorithm | DR (%) | FAR (%) | Time consumption | Saved energy consumption (%) |
|---|---|---|---|---|
| Our algorithm | 97.23 | 3.58 | 115 s | 50 |
| Salem et al. [5] | 93.45 | 6.15 | 117 s | 0 |
| Bhargava and Raghuvanshi [7] | 92.17 | 7.28 | 128 s | 35 |
| Zhang et al. [4] | 95.19 | 5.62 | 106 | 0 |

Figure 3: DR with different SNR (plot omitted; horizontal axis: SNR (dB), 10 to 40; vertical axis: DR (%), 80 to 100; series: Ma, Std, Syn).

Figure 4: FAR with different SNR (plot omitted; horizontal axis: SNR (dB), 10 to 40; vertical axis: FAR (%), 0 to 20; series: Ma, Std, Syn).

5. Conclusion and Future Work

In this paper, we propose an outlier detection algorithm for detecting abnormal compressed data in WSN. In the first phase, we utilize the PAA algorithm to compress the time series data in each node so that the communication overload is reduced and the battery life is prolonged. Based on the result of PAA, we then combine the improved unsupervised detection algorithm of K-Means and the AIS to effectively classify the normal and abnormal data. The major advantages of this algorithm are that it works on the compressed data, so that the energy consumption is reduced, and that it achieves a high detection rate while the false alarm rate is low. Relevant experiments on virtual and real data demonstrate the effectiveness of our algorithm in detecting outliers and resisting noise.

A potential limitation of this approach is that the time cost increases noticeably when the data volume is large, because every data point must be reclassified against the new cluster centroids until the classification converges, and the search for the optimal initial centroids takes a long time. Another limitation is that our detection algorithm is based on univariate data, while the data in some WSN applications is more complex. Therefore, our future work is to improve the algorithm so that it has better real-time performance and is more suitable for detecting outliers in multivariate data.
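The reclassification cost mentioned above can be seen in a minimal sketch of plain K-Means on univariate data. This is the textbook Lloyd iteration, not the improved variant proposed in the paper, and it omits the AIS-based search for initial centroids; the function name `kmeans_1d`, the toy values, and the deliberately poor starting centroids are all hypothetical.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain Lloyd iterations on scalar data; returns (centroids, labels, passes)."""
    centroids = list(centroids)
    labels = []
    for passes in range(1, max_iter + 1):
        # Each pass compares every value with every current centroid.
        labels = [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        updated = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else old)
        if updated == centroids:  # centroids stopped moving, so labels are final
            return centroids, labels, passes
        centroids = updated
    return centroids, labels, max_iter


values = [0.9, 1.0, 1.1, 1.2, 0.8, 5.0, 5.2, 4.9]
centroids, labels, passes = kmeans_1d(values, centroids=[0.0, 1.0])
print(centroids, labels, passes)
```

Each pass costs time proportional to the number of values times the number of clusters, and the number of passes depends on where the centroids start, which is why both data volume and the initial-centroid search affect the running time.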

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61271274), key project of Natural Science Foundation of Hubei Province of China (2011CDA069), general project of Natural Science Foundation of Hubei Province of China (2010CDB04203, 2011CDB339), and Key Science and Technology of Hubei Province of China (2012BAA02003, 2011BAB042). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References

[1] D. Bri, M. Garcia, J. Lloret, and P. Dini, “Real deployments of wireless sensor networks,” in Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM ’09), pp. 415–423, June 2009.

[2] C. F. Garcia-Hernandez, P. H. Ibarguengoytia, J. Garcia-Hernandez, and J. A. Perez-Diaz, “Wireless sensor networks and applications: a survey,” International Journal of Computer Science and Network Security, vol. 7, pp. 264–273, 2007.

[3] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[4] Y. Zhang, N. Meratnia, and P. J. M. Havinga, “Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine,” Ad Hoc Networks, vol. 11, no. 3, pp. 1062–1074, 2013.

[5] O. Salem, Y. Liu, and A. Mehaoua, “Anomaly detection in medical wireless sensor networks,” Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 272–284, 2013.

[6] B. Sun, X. Shan, K. Wu, and Y. Xiao, “Anomaly detection based secure in-network aggregation for wireless sensor networks,” IEEE Systems Journal, vol. 7, no. 1, pp. 13–25, 2013.

[7] A. Bhargava and A. S. Raghuvanshi, “Anomaly detection in wireless sensor networks using S-transform in combination with SVM,” in Proceedings of the 5th International Conference on Computational Intelligence and Communication Networks (CICN ’13), pp. 111–116, IEEE, Mathura, India, September 2013.

[8] M. Moshtaghi, C. Leckie, S. Karunasekera, and S. Rajasegarar, “An adaptive elliptical anomaly detection model for wireless sensor networks,” Computer Networks, vol. 64, pp. 195–207, 2014.

[9] Q.-L. Zhong and Z.-X. Cai, “Symbolic algorithm for time series data based on statistic feature,” Chinese Journal of Computers, vol. 31, no. 10, pp. 1857–1864, 2008.

[10] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’00), pp. 93–104, June 2000.

[11] http://physionet.org/physiobank/database

[12] http://archive.ics.uci.edu/ml/datasets.html

[13] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[14] M. S. Sisodia and V. Raghuwanshi, “Anomaly base network intrusion detection by using random decision tree and random projection: a fast network intrusion detection technique,” Network Protocols and Algorithms, vol. 3, no. 4, pp. 93–107, 2011.

[15] V. S. Samparthi and H. K. Verma, “Outlier detection of data in wireless sensor networks using kernel density estimation,” International Journal of Computer Applications, vol. 5, no. 6, pp. 28–32, 2010.

[16] P. K. Sahoo, “Efficient security mechanisms for mHealth applications using wireless body sensor networks,” Sensors, vol. 12, no. 9, pp. 12606–12633, 2012.

[17] D. Macii, A. Colombo, P. Pivato, and D. Fontanelli, “A data fusion technique for wireless ranging performance improvement,” IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 1, pp. 27–37, 2013.

[18] R. Tan, G. Xing, B. Liu, J. Wang, and X. Jia, “Exploiting data fusion to improve the coverage of wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 20, no. 2, pp. 450–462, 2012.

[19] C.-T. Cheng, H. Leung, and P. Maupin, “A delay-aware network structure for wireless sensor networks with in-network data fusion,” IEEE Sensors Journal, vol. 13, no. 5, pp. 1622–1631, 2013.

[20] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A symbolic representation of time series, with implications for streaming algorithms,” in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD ’03), pp. 2–11, June 2003.

[21] X. Cheng, J. Xu, J. Pei, and J. Liu, “Hierarchical distributed data classification in wireless sensor networks,” Computer Communications, vol. 33, no. 12, pp. 1404–1413, 2010.

[22] Y. Li, Y. Wang, and G. He, “Clustering-based distributed support vector machine in wireless sensor networks,” Journal of Information & Computational Science, vol. 9, no. 4, pp. 1083–1096, 2012.

[23] S. Siripanadorn, W. Hattagam, and N. Teaumroog, “Anomaly detection in wireless sensor networks using self-organizing map and wavelets,” International Journal of Communication, vol. 4, pp. 74–83, 2010.

[24] Y. Yasser, K. Siavash, and J. Arash, “An unsupervised network anomaly detection approach by K-means,” in Proceedings of the IEEE Symposium on Computers and Communications (ISCC ’08), pp. 398–403, 2008.

[25] H. Li, “Research of K-MEANS algorithm based on information entropy in anomaly detection,” in Proceedings of the 4th International Conference on Multimedia and Security (MINES ’12), pp. 71–74, November 2012.

[26] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-Means+ID3: a novel method for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learning methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[27] M. Y. El-Sharkh, “Clonal selection algorithm for power generators maintenance scheduling,” International Journal of Electrical Power and Energy Systems, vol. 57, pp. 73–78, 2014.

[28] Y. Li and Z. X. Sun, “Generative tracking of 3D human motion in latent space by sequential clonal selection algorithm,” Multimedia Tools and Applications, vol. 69, no. 1, pp. 79–109, 2014.

[29] J. Feng, L. C. Jiao, X. Zhang, and T. Sun, “Hyperspectral band selection based on trivariate mutual information and clonal selection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 7, pp. 4092–4115, 2014.

[30] L. N. de Castro and F. J. Von Zuben, “Learning and optimization using the clonal selection principle,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Research Article An Anomaly Detection Based on …downloads.hindawi.com/journals/ijdsn/2015/943532.pdfan anomaly detection algorithm based on -transform and one-SVM (support vector

8 International Journal of Distributed Sensor Networks

10 15 20 25 30 35 4080

82

84

86

88

90

92

94

96

98

100

SNR (dB)

DR

()

MaStd

Syn

Figure 3 DA with different SNR

10 15 20 25 30 35 400

2

4

6

8

10

12

14

16

18

20

SNR (dB)

FAR

()

MaStd

Syn

Figure 4 FAR with different SNR

of our algorithm is less than the time consumption of theother two algorithms

5 Conclusion and Future Work

In this paper we propose an outlier detection algorithm fordetecting abnormal compressed data in WSN In the firstphase we utilize the PAA algorithm to compress the timeseries data in each node so that the communication overloadcan be reduced and the life of battery is prolonged Based onthe result of PAA we then combine the improved unsuper-vised detection algorithm of 119870-Means and the AIS to effect-ively classify the normal and abnormalThemajor advantagesof this algorithm are that it is based on the compressed data so

Table 6 The comparison of several algorithms

DR FAR Timeconsumption

Saved energyconsumption

Our algorithm 9723 358 115 s 50Salem et al [5] 9345 615 117 s 0Bhargava andRaghuvanshi [7] 9217 728 128 s 35

Zhang et al [4] 9519 562 106 0

that the energy consumption is reduced and our algorithmcan achieve a high detection rate while the false alarmrate is low Relevant experiments on virtual and real data

International Journal of Distributed Sensor Networks 9

demonstrate the effectiveness of our algorithm in detectingthe outlier and resisting noise

Apotential limitation of this approach is that the time costincreased obviously when the data volume is huge becauseevery data need be reclassified with the new centroids ofthe clusters until the classification ends and the search timeof optimal initial centroids of the clusters is long Anotherlimitation is that in this paper our detection algorithm isbased on univariate data while the data is more complexin some application of WSN Therefore in the future ourwork is to improve our algorithm so that it has a better real-time performance and can be more suitable for detection ofmultivariate data

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (Grant no 61271274) key projectof Natural Science Foundation of Hubei Province ofChina (2011CDA069) general project of Natural ScienceFoundation of Hubei Province of China (2010CDB042032011CDB339) and Key Science and Technology of HubeiProvince of China (2012BAA02003 2011BAB042) Theauthors also gratefully acknowledge the helpful commentsand suggestions of the reviewers which have improved thepresentation

References

[1] D Bri M Garcia J Lloret and P Dini ldquoReal deployments ofwireless sensor networksrdquo inProceedings of the 3rd InternationalConference on Sensor Technologies and Applications (SENSOR-COMM rsquo09) pp 415ndash423 June 2009

[2] C F Garcia-Hernandez P H Ibarguengoytia J Garcia-Her-nandez and J A Perez-Diaz ldquoWireless sensor networks andapplications a surveyrdquo International Journal of Computer Sci-ence and Network Security vol 7 pp 264ndash273 2007

[3] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys ampTutorials vol 12 no 2 pp 159ndash170 2010

[4] Y Zhang NMeratnia and P JMHavinga ldquoDistributed onlineoutlier detection in wireless sensor networks using ellipsoidalsupport vector machinerdquo Ad Hoc Networks vol 11 no 3 pp1062ndash1074 2013

[5] O Salem Y Liu and A Mehaoua ldquoAnomaly detection inmedical wireless sensor networksrdquo Journal of Computing Scienceand Engineering vol 7 no 4 pp 272ndash284 2013

[6] B Sun X Shan K Wu and Y Xiao ldquoAnomaly detection basedsecure in-network aggregation for wireless sensor networksrdquoIEEE Systems Journal vol 7 no 1 pp 13ndash25 2013

[7] A Bhargava and A S Raghuvanshi ldquoAnomaly detection inwireless sensor networks using S-transform in combinationwith SVMrdquo in Proceedings of the 5th International Conferenceon Computational Intelligence and Communication Networks(CICN rsquo13) pp 111ndash116 IEEE Mathura India September 2013

[8] M Moshtaghi C Leckie S Karunasekera and S RajasegararldquoAn adaptive elliptical anomaly detection model for wirelesssensor networksrdquo Computer Networks vol 64 pp 195ndash2072014

[9] Q-L Zhong and Z-X Cai ldquoSymbolic algorithm for time seriesdata based on statistic featurerdquo Chinese Journal of Computersvol 31 no 10 pp 1857ndash1864 2008

[10] M M Breunig H P Kriegel R T Ng and J Sander ldquoLOFidentifying density-based local outliersrdquo in Proceedings of theACM SIGMOD International Conference on Management ofData (SIGMOD rsquo00) pp 93ndash104 June 2000

[11] httpphysionetorgphysiobankdatabase[12] httparchiveicsuciedumldatasetshtml[13] V Chandola A Banerjee and V Kumar ldquoAnomaly detection a

surveyrdquo ACM Computing Surveys vol 41 no 3 article 15 2009[14] M S Sisodia and V Raghuwanshi ldquoAnomaly base network

intrusion detection by using random decision tree and randomprojection a fast network intrusion detection techniquerdquo Net-work Protocols and Algorithms vol 3 no 4 pp 93ndash107 2011

[15] V S Samparthi and H K Verma ldquoOutlier detection of datain wireless sensor networks using kernel density estimationrdquoInternational Journal of Computer Applications vol 5 no 6 pp28ndash32 2010

[16] P K Sahoo ldquoEfficient security mechanisms for mHealth appli-cations using wireless body sensor networksrdquo Sensors vol 12no 9 pp 12606ndash12633 2012

[17] D MacIi A Colombo P Pivato and D Fontanelli ldquoA datafusion technique for wireless ranging performance improve-mentrdquo IEEE Transactions on Instrumentation andMeasurementvol 62 no 1 pp 27ndash37 2013

[18] R Tan G Xing B Liu J Wang and X Jia ldquoExploiting datafusion to improve the coverage of wireless sensor networksrdquoIEEEACM Transactions on Networking vol 20 no 2 pp 450ndash462 2012

[19] C-T Cheng H Leung and P Maupin ldquoA delay-aware networkstructure for wireless sensor networks with in-network datafusionrdquo IEEE Sensors Journal vol 13 no 5 pp 1622ndash1631 2013

[20] J Lin E Keogh S Lonardi and B Chiu ldquoA symbolic rep-resentation of time series with implications for streamingalgorithmsrdquo in Proceedings of the 8th ACM SIGMODWorkshopon Research Issues in Data Mining and Knowledge Discovery(DMKD rsquo03) pp 2ndash11 June 2003

[21] X Cheng J Xu J Pei and J Liu ldquoHierarchical distributed dataclassification in wireless sensor networksrdquo Computer Commu-nications vol 33 no 12 pp 1404ndash1413 2010

[22] Y Li Y Wang and G He ldquoClustering-based distributed sup-port vector machine in wireless sensor networksrdquo Journal ofInformation amp Computational Science vol 9 no 4 pp 1083ndash1096 2012

[23] S Siripanadorn W Hattagam and N Teaumroog ldquoAnomalydetection inwireless sensor networks using self-organizingmapand waveletsrdquo International Journal of Communication vol 4pp 74ndash83 2010

[24] Y Yasser K Siavash and J Arash ldquoAn unsupervised networkanomaly detection approach by K-meansrdquo in Proceedings ofthe IEEE Symposium on Computers and Communications (ISCCrsquo08) pp 398ndash403 2008

[25] H Li ldquoResearch of K-MEANS algorithm based on informationentropy in anomaly detectionrdquo in Proceedings of the 4th Interna-tional Conference on Multimedia and Security (MINES rsquo12) pp71ndash74 November 2012

10 International Journal of Distributed Sensor Networks

[26] S R GaddamVV Phoha andK S Balagani ldquoK-Means+ID3 anovelmethod for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learningmethodsrdquo IEEETransactions on Knowledge and Data Engineering vol 19 no 3pp 345ndash354 2007

[27] M Y El-Sharkh ldquoClonal selection algorithm for power genera-torsmaintenance schedulingrdquo International Journal of ElectricalPower and Energy Systems vol 57 pp 73ndash78 2014

[28] Y Li and Z X Sun ldquoGenerative tracking of 3D human motionin latent space by sequential clonal selection algorithmrdquoMulti-media Tools and Applications vol 69 no 1 pp 79ndash109 2014

[29] J Feng L C Jiao X Zhang and T Sun ldquoHyperspectral bandselection based on trivariate mutual information and clonalselectionrdquo IEEETransactions onGeoscience and Remote Sensingvol 52 no 7 pp 4092ndash4115 2014

[30] L N de Castro and F J Von Zuben ldquoLearning and optimizationusing the clonal selection principlerdquo IEEE Transactions on Evo-lutionary Computation vol 6 no 3 pp 239ndash251 2002

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 9: Research Article An Anomaly Detection Based on …downloads.hindawi.com/journals/ijdsn/2015/943532.pdfan anomaly detection algorithm based on -transform and one-SVM (support vector

International Journal of Distributed Sensor Networks 9

demonstrate the effectiveness of our algorithm in detectingthe outlier and resisting noise

Apotential limitation of this approach is that the time costincreased obviously when the data volume is huge becauseevery data need be reclassified with the new centroids ofthe clusters until the classification ends and the search timeof optimal initial centroids of the clusters is long Anotherlimitation is that in this paper our detection algorithm isbased on univariate data while the data is more complexin some application of WSN Therefore in the future ourwork is to improve our algorithm so that it has a better real-time performance and can be more suitable for detection ofmultivariate data

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (Grant no 61271274) key projectof Natural Science Foundation of Hubei Province ofChina (2011CDA069) general project of Natural ScienceFoundation of Hubei Province of China (2010CDB042032011CDB339) and Key Science and Technology of HubeiProvince of China (2012BAA02003 2011BAB042) Theauthors also gratefully acknowledge the helpful commentsand suggestions of the reviewers which have improved thepresentation

References

[1] D Bri M Garcia J Lloret and P Dini ldquoReal deployments ofwireless sensor networksrdquo inProceedings of the 3rd InternationalConference on Sensor Technologies and Applications (SENSOR-COMM rsquo09) pp 415ndash423 June 2009

[2] C F Garcia-Hernandez P H Ibarguengoytia J Garcia-Her-nandez and J A Perez-Diaz ldquoWireless sensor networks andapplications a surveyrdquo International Journal of Computer Sci-ence and Network Security vol 7 pp 264ndash273 2007

[3] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys ampTutorials vol 12 no 2 pp 159ndash170 2010

[4] Y Zhang NMeratnia and P JMHavinga ldquoDistributed onlineoutlier detection in wireless sensor networks using ellipsoidalsupport vector machinerdquo Ad Hoc Networks vol 11 no 3 pp1062ndash1074 2013

[5] O Salem Y Liu and A Mehaoua ldquoAnomaly detection inmedical wireless sensor networksrdquo Journal of Computing Scienceand Engineering vol 7 no 4 pp 272ndash284 2013

[6] B Sun X Shan K Wu and Y Xiao ldquoAnomaly detection basedsecure in-network aggregation for wireless sensor networksrdquoIEEE Systems Journal vol 7 no 1 pp 13ndash25 2013

[7] A Bhargava and A S Raghuvanshi ldquoAnomaly detection inwireless sensor networks using S-transform in combinationwith SVMrdquo in Proceedings of the 5th International Conferenceon Computational Intelligence and Communication Networks(CICN rsquo13) pp 111ndash116 IEEE Mathura India September 2013

[8] M Moshtaghi C Leckie S Karunasekera and S RajasegararldquoAn adaptive elliptical anomaly detection model for wirelesssensor networksrdquo Computer Networks vol 64 pp 195ndash2072014

[9] Q-L Zhong and Z-X Cai ldquoSymbolic algorithm for time seriesdata based on statistic featurerdquo Chinese Journal of Computersvol 31 no 10 pp 1857ndash1864 2008

[10] M M Breunig H P Kriegel R T Ng and J Sander ldquoLOFidentifying density-based local outliersrdquo in Proceedings of theACM SIGMOD International Conference on Management ofData (SIGMOD rsquo00) pp 93ndash104 June 2000

[11] httpphysionetorgphysiobankdatabase[12] httparchiveicsuciedumldatasetshtml[13] V Chandola A Banerjee and V Kumar ldquoAnomaly detection a

surveyrdquo ACM Computing Surveys vol 41 no 3 article 15 2009[14] M S Sisodia and V Raghuwanshi ldquoAnomaly base network

intrusion detection by using random decision tree and randomprojection a fast network intrusion detection techniquerdquo Net-work Protocols and Algorithms vol 3 no 4 pp 93ndash107 2011

[15] V S Samparthi and H K Verma ldquoOutlier detection of datain wireless sensor networks using kernel density estimationrdquoInternational Journal of Computer Applications vol 5 no 6 pp28ndash32 2010

[16] P K Sahoo ldquoEfficient security mechanisms for mHealth appli-cations using wireless body sensor networksrdquo Sensors vol 12no 9 pp 12606ndash12633 2012

[17] D MacIi A Colombo P Pivato and D Fontanelli ldquoA datafusion technique for wireless ranging performance improve-mentrdquo IEEE Transactions on Instrumentation andMeasurementvol 62 no 1 pp 27ndash37 2013

[18] R Tan G Xing B Liu J Wang and X Jia ldquoExploiting datafusion to improve the coverage of wireless sensor networksrdquoIEEEACM Transactions on Networking vol 20 no 2 pp 450ndash462 2012

[19] C-T Cheng H Leung and P Maupin ldquoA delay-aware networkstructure for wireless sensor networks with in-network datafusionrdquo IEEE Sensors Journal vol 13 no 5 pp 1622ndash1631 2013

[20] J Lin E Keogh S Lonardi and B Chiu ldquoA symbolic rep-resentation of time series with implications for streamingalgorithmsrdquo in Proceedings of the 8th ACM SIGMODWorkshopon Research Issues in Data Mining and Knowledge Discovery(DMKD rsquo03) pp 2ndash11 June 2003

[21] X Cheng J Xu J Pei and J Liu ldquoHierarchical distributed dataclassification in wireless sensor networksrdquo Computer Commu-nications vol 33 no 12 pp 1404ndash1413 2010

[22] Y Li Y Wang and G He ldquoClustering-based distributed sup-port vector machine in wireless sensor networksrdquo Journal ofInformation amp Computational Science vol 9 no 4 pp 1083ndash1096 2012

[23] S Siripanadorn W Hattagam and N Teaumroog ldquoAnomalydetection inwireless sensor networks using self-organizingmapand waveletsrdquo International Journal of Communication vol 4pp 74ndash83 2010

[24] Y Yasser K Siavash and J Arash ldquoAn unsupervised networkanomaly detection approach by K-meansrdquo in Proceedings ofthe IEEE Symposium on Computers and Communications (ISCCrsquo08) pp 398ndash403 2008

[25] H Li ldquoResearch of K-MEANS algorithm based on informationentropy in anomaly detectionrdquo in Proceedings of the 4th Interna-tional Conference on Multimedia and Security (MINES rsquo12) pp71ndash74 November 2012

10 International Journal of Distributed Sensor Networks

[26] S R GaddamVV Phoha andK S Balagani ldquoK-Means+ID3 anovelmethod for supervised anomaly detection by cascading k-Means clustering and ID3 decision tree learningmethodsrdquo IEEETransactions on Knowledge and Data Engineering vol 19 no 3pp 345ndash354 2007

[27] M Y El-Sharkh ldquoClonal selection algorithm for power genera-torsmaintenance schedulingrdquo International Journal of ElectricalPower and Energy Systems vol 57 pp 73ndash78 2014

[28] Y. Li and Z. X. Sun, “Generative tracking of 3D human motion in latent space by sequential clonal selection algorithm,” Multimedia Tools and Applications, vol. 69, no. 1, pp. 79–109, 2014.

[29] J. Feng, L. C. Jiao, X. Zhang, and T. Sun, “Hyperspectral band selection based on trivariate mutual information and clonal selection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 7, pp. 4092–4115, 2014.

[30] L. N. de Castro and F. J. Von Zuben, “Learning and optimization using the clonal selection principle,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.
