Lowering False Alarm Rates in Motion Detection Scenarios using Machine Learning
TIM LENNERYD
Master of Science Thesis, Stockholm, Sweden 2012



2D1021, Master's Thesis in Computer Science (30 ECTS credits)
Degree Programme in Computer Science and Engineering, 270 credits
Royal Institute of Technology, year 2012
Supervisor at CSC was Hedvig Kjellström
Examiner was Danica Kragic
TRITA-CSC-E 2012:024
ISRN-KTH/CSC/E--12/024--SE
ISSN-1653-5715
Royal Institute of Technology
School of Computer Science and Communication
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.kth.se/csc


Abstract

Camera motion detection is a form of intruder detection that may cause high false alarm rates, especially in home environments where movements from, for example, pets and windows may be the cause. This article explores the subject of reducing the frequency of such false alarms by applying machine learning techniques, for the specific scenario where only data regarding the detected motion is available, instead of the full image. This article introduces two competitive unsupervised learning algorithms: the first a vector quantization algorithm for filtering false alarms from window sources, the second a self-organizing map for filtering out smaller events such as pets by scaling based on the distance to the camera.

Initial results show that the two algorithms can provide the functionality needed, but that they need to be more robust to be used well in an unsupervised live situation. The majority of the results have been obtained using simulated data rather than live data, due to issues with obtaining such live data at the time of the project; tests on live data remain as future work.


Referat

Reducing False Alarms in Motion Detection through the Use of Machine Learning

Motion detection with a camera is a form of burglar alarm that can give rise to a high frequency of false alarms, especially in home environments where pets and windows can be contributing causes. This article explores the possibility of reducing the false alarm frequency through the use of machine learning techniques. The specific situation examined is one where only data about the detected motion is used, instead of the full image. This article introduces two algorithms based on unsupervised competitive learning. The first is a vector quantization algorithm for filtering false alarms from window sources, and the second is a self-organizing map for filtering events based on their size, where the size is scaled according to the distance from the camera.

Initial results show that the algorithms can provide the desired functionality, but that they need to be more robust in order to be used well without supervision in real situations. The majority of the results have been obtained from simulated data rather than real data, since obtaining real data proved difficult during the course of the project. Tests with real data therefore remain an important item for future work.


Contents

1 Introduction
   1.1 The Scenario
   1.2 Anomaly Detection
   1.3 Classification
   1.4 Related Work

2 Theory
   2.1 Preliminaries
      2.1.1 Deriving the Distance between the Pet and Camera
      2.1.2 Deriving the Diagonal Length of the Pet's Bounding Box as a Limit
   2.2 Competitive Learning

3 Method and Implementation
   3.1 Simulation
   3.2 Visualizing the Results
   3.3 Keeping Track of Windows using Vector Quantization
   3.4 A Self-Organizing Map as a Height Map for Pet Size Thresholds

4 Results and Conclusions
   4.1 Window Adjustment Filter
   4.2 Pet Filtering
   4.3 Conclusions

5 Future Work

Bibliography

Appendices

A Other Considered Methods
   A.1 One Class and Two Class Support Vector Machines
   A.2 Clustering
      A.2.1 Computational Complexity
      A.2.2 Advantages and Disadvantages
   A.3 Nearest Neighbor
      A.3.1 Computational Complexity
      A.3.2 Advantages and Disadvantages
   A.4 Neural Networks
      A.4.1 Supervised and Semi-supervised Neural Networks
      A.4.2 Computational Complexity
      A.4.3 Advantages and Disadvantages


Chapter 1

Introduction

1.1 The Scenario

A company is providing intrusion detection alarms for houses and apartments. These alarms are motion based, with cameras taking pictures and using motion detection algorithms to provide a bounding box around the detected motion. This box, together with some supporting information, is then sent to the algorithms developed and presented in this article. The constraint below, decided on by the author and the company in cooperation, limits the focus of the algorithms to be developed.

By not using the full picture, the algorithms have to make do with less information than if the picture was available, something that influences both the choice of algorithms and the results. There are a number of reasons why this decision was made, but the most important were to get a lower dimensional feature space, to reduce privacy concerns about machine analysis of private pictures, and to reduce the computational complexity.

Constraint 1 The algorithms may not assume that they have access to any of the pictures taken by the camera; the only data available will be the surrounding data and the bounding box information.

Currently the company only uses very simplistic filtering within the routers connected to the cameras to avoid the most obvious false alarms. This filter consists of a few tests, as can be seen below:

• Disregard movement if the object has very inconsistent movements, that is, when the velocity of the detected movement changes very rapidly between images.

• Disregard movement if the size of the object changes rapidly and inconsistently between images.

• Disregard movement if the velocity or size of the object is far too small to be anything but a false alarm.
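The three tests above can be sketched in code roughly as follows. All names and threshold values here are illustrative assumptions; the company's actual filter settings are not given in this article.

```python
# Hedged sketch of the router-side filter described above.
# All thresholds are hypothetical, chosen only for illustration.
MAX_VELOCITY_JUMP = 5.0   # max allowed frame-to-frame speed change (px/frame)
MAX_SIZE_RATIO = 2.0      # max allowed frame-to-frame size ratio
MIN_SIZE = 25.0           # minimum bounding-box area (px^2) worth alarming on
MIN_VELOCITY = 0.5        # minimum speed (px/frame) worth alarming on

def is_false_alarm(track):
    """track: list of (speed, size) tuples, one per captured image."""
    for (v0, s0), (v1, s1) in zip(track, track[1:]):
        if abs(v1 - v0) > MAX_VELOCITY_JUMP:
            return True  # inconsistent movement between images
        ratio = max(s0, s1) / max(min(s0, s1), 1e-9)
        if ratio > MAX_SIZE_RATIO:
            return True  # size changes rapidly and inconsistently
    if all(s < MIN_SIZE or v < MIN_VELOCITY for v, s in track):
        return True      # far too small or slow to be anything but noise
    return False
```

Note that each test only disregards movement; nothing in the filter confirms an alarm, which is why it can only catch the most obvious false alarm sources.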


They hope with this project to include more intelligent detection of false alarms by applying learning algorithms to the different situations present in the camera environments. They currently neither use nor log the bounding box motion data on the server, but by sending the motion data on to the server together with the pictures taken, they will have data for the learning algorithms to work on.

Since this is a situation where mis-classifying a break-in attempt as a false alarm is very damaging, the system generated will need to minimize the number of such mis-classifications while still lowering the number of false alarms from the main factors. More formally, the task is to minimize the type I errors (false positives), that is, minimize the number of false alarms, while keeping the type II errors (false negatives) at a reasonable level. Since the current filters used by the company are so basic, this condition holds for those filters: there is very little risk of mis-classifying a break-in attempt as a false alarm, since there is very little risk of humans appearing small enough to be disregarded, or of humans having inconsistent movements between camera pictures. The current filters, however, only manage to catch very specific types of false alarms, such as quick light effects, camera distortions and the like. By applying more filters with more computing power, the hope is that this can be vastly improved upon.

The company has noted that there are two separate sources providing high numbers of false alarms, with a third possibly providing a lower but still significant amount. The first, windows within the vision of a camera, provides false alarms because any movement outside of the window will register as detected movement, and there is currently no way for the camera to distinguish between these movements and ones occurring within the protected house, thus it has to raise an alarm.

Pets make up the second source of false alarms, since any detected movement of a pet may very well cause a false alarm, depending on its velocity and size. As such, the company cannot yet sell their product to clients with pets, since the false alarm rate would be too high.

The third, lesser source is that of movements from outside windows translating into movements inside the room through shadows and light. Examples of this include shadows from trees moving in the wind, cars driving by outside that bounce light into the room, and other quick light phenomena. The actual movement of the sun, clouds and such does not fit in here, because such movements are too slow to get past the threshold mentioned above.

While the prototype algorithms realistically will not completely solve the above issues, the prototype should strive to minimize false alarms from these sources while still keeping the type II errors minimized.

Since some algorithms cope better with certain types of data while others have big problems, it is useful to spend some time considering the nature of the data the scenario provides. The system provides some data, such as the time, which camera, the start position, the end position and the velocity vector of the bounding box containing the movement. From this, it is possible to derive certain other values that can be of importance, such as the size, by calculating the area of the bounding box using the


start and the end positions. This size value is then important in the consideration of whether the event is a false alarm or not, since small boxes could indicate pets within the house, or something else that is just too small to be a human.

All cameras will be trained individually, so there is no reason to add information about which camera sent the event to the anomaly detection algorithm; it would not give the algorithm any more information to work with. However, the rest of the information can be of importance. The data is multivariate, since one data instance holds a number of different values, both scalars and vectors.

Time Time of detection (Scalar)

Start position The top left corner of the bounding box (2D Vector)

End position The bottom right corner of the bounding box (2D Vector)

Size The size of the box, either as a diagonal or the area (Scalar)

Velocity The current velocity of the detected object (2D Vector)

The scalars are one-dimensional and the vectors above lie on a two-dimensional plane, that of the picture taken, so the total number of dimensions used by one data instance is eight. This is regardless of how many dimensions the algorithms presented in this article actually use, since there is always the option of completely ignoring some dimensions in the feature space.
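As a concrete illustration, one data instance could be represented as below. The class and field names are hypothetical, but the dimensions match the list above: time (1), start position (2), end position (2), size (1) and velocity (2), for eight in total, with size derived from the start and end positions.

```python
import math
from dataclasses import dataclass

@dataclass
class MotionEvent:
    # One data instance as described above; names are illustrative only.
    time: float                    # time of detection (scalar)
    start: tuple                   # top left corner of bounding box (2D vector)
    end: tuple                     # bottom right corner of bounding box (2D vector)
    velocity: tuple                # current velocity of detected object (2D vector)

    def size_area(self):
        """Derived size: area of the bounding box."""
        return abs(self.end[0] - self.start[0]) * abs(self.end[1] - self.start[1])

    def size_diagonal(self):
        """Derived size: diagonal of the bounding box."""
        return math.hypot(self.end[0] - self.start[0], self.end[1] - self.start[1])

    def features(self):
        """Flatten into the eight-dimensional feature vector:
        time, start (2), end (2), size, velocity (2)."""
        return [self.time, *self.start, *self.end, self.size_area(), *self.velocity]
```

An algorithm that ignores, say, the time dimension would simply drop that entry of the feature vector.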

1.2 Anomaly Detection

Anomaly detection, also called outlier detection, is a heavily researched subject with many widely differing proposed algorithms for both general use and very specific situations. There are also a couple of notable definitions quoted by Hodge and Austin [21], first presented by Grubbs (1969) and later extended by Barnett and Lewis (1994).

Grubbs: “An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.”

Barnett and Lewis: “An observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.”

By using a few simple assumptions, seen below, the task defined in section 1.1 can be framed as an anomaly detection problem.

1. The probability of a break-in is much smaller than that of a false alarm, regardless of the source of the false alarm.

2. Any false alarm due to movement seen outside of the window will by necessity be confined to the edges of the window.


3. Movements by pets generally conform to certain patterns; for example, pets generally keep to the floor or certain preferred furniture while at home alone.

If the class of normal everyday events that should be considered false alarms is based on these assumptions, then movements differing in perceivable ways can be found with algorithms that detect anomalies. This article will refer both to anomaly detection and outlier detection, but within the context of this article we do not in any way differentiate between the definition or the function of the two expressions.

To help with separating the different algorithm classes, Hodge and Austin [21] define three approaches, described below, depending on what is to be modeled and what knowledge is available.

Type 1

“Determine the outliers with no prior knowledge of the data. This is essentially a learning approach analogous to unsupervised clustering. The approach processes the data as a static distribution, pinpoints the most remote points and flags them as potential outliers.”

It is noted also that this approach requires that all data is available before processing. As part of a type 1 approach, two different techniques, diagnosis and accommodation, are commonly employed. Diagnosis detects the outlying points in the data and may remove them from future iterations, gradually pruning the data and fitting the model until no outliers are found. Accommodation incorporates outliers and employs a robust classification method that can withstand such isolated outliers [21].

Type 2

“Model both normality and abnormality. This approach is analogous to supervised classification and requires pre-labeled data, tagged as normal or abnormal.”

Hodge and Austin [21] continue by referring to this type of approach as a normal/abnormal classification, using either one normal class or several depending on what is needed. They also note that these classifiers are best suited to static data unless an incremental classifier, such as an evolutionary neural network, is used, since the classification needs to be rebuilt if the distribution shifts.

Type 3

“Model only normality or in few cases model abnormality. Authors generally name this technique novelty detection or novelty recognition. It is analogous to a semi-supervised recognition or detection task and can be considered semi-supervised as the normal class is taught but the algorithm learns to recognize abnormality. The approach needs pre-classified data but only learns data marked normal.”


Figure 1.1: Example of Point Anomalies. Points marked by circles have been classified as normal, while points marked by a square are classified as point anomalies.

While type 3 approaches may seem similar to type 2 approaches, the difference lies in the fact that by only labeling the normal class, one can avoid the corner cases where it is uncertain whether a data instance belongs to the normal class or not. Instead of a normal/abnormal separation, type 3 approaches may present a separation between the normal class and those data instances the approach cannot reliably classify as normal.

While the above definition of approaches is very useful, there is also a need to define different types of anomalies. Chandola et al. [13] define and refer to three different categories of anomalies, which we describe briefly below:

Point Anomalies

“If an individual data instance can be considered as anomalous with respect to the rest of the data, then that instance is termed a point anomaly.”

Point anomalies are the simplest type of anomaly and the focus of much anomaly detection research. This is also the type of anomaly detection we will be focusing on in this article, with the scenario and the prototype. Figure 1.1 shows a simple point anomaly example where the squares are classified as anomalous since, when looking at the whole dataset, they are the few points that are markedly different in position from the rest.
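A minimal sketch of this idea, flagging points whose mean distance to the rest of the dataset is markedly larger than typical, might look as follows. This is only a toy illustration of the point-anomaly notion in figure 1.1, not one of the algorithms introduced later in this article, and the cutoff factor is an arbitrary assumption.

```python
import math

def point_anomalies(points, factor=2.0):
    """Flag indices of points whose mean distance to all other points
    exceeds `factor` times the average of those mean distances.
    A toy illustration of point anomaly detection."""
    def mean_dist(i):
        return sum(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i) / (len(points) - 1)
    scores = [mean_dist(i) for i in range(len(points))]
    avg = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > factor * avg]
```

On a tight cluster with one remote point, only the remote point scores far above the average and is flagged.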

Contextual Anomalies

“If a data instance is anomalous in a specific context, but not otherwise, then it is termed a contextual anomaly (or conditional anomaly).”


Figure 1.2: Example of Collective Anomalies. Points marked by circles have been classified as normal, while points marked by squares are classified as anomalies. If a single point had been at the position of the squares, that point would not have been deemed a collective anomaly, but since there are several of them on a line, this is deemed anomalous.

If time is a contextual attribute (an attribute orienting the data instance within the dataset) in a dataset, then an event or occurrence at an unusual time might be a contextual anomaly, if the occurrence would be normal at other times. For example, the safe of a bank being opened in the middle of the night when the bank is closed, as opposed to at very specific times during the day when the bank is open and procedures are followed.

Collective Anomalies

“If a collection of related data instances is anomalous with respect to the entire data set, it is termed a collective anomaly. The individual data instances in a collective anomaly may not be anomalies by themselves, but their occurrence together as a collection is anomalous.”

The easiest way to show collective anomalies is with an example. Figure 1.2 shows how a single point in the center would not be classified as anomalous, but with the concentrated distribution of points in the center differing from the rest of the dataset, the classifier classifies the collection as anomalous.

In this article, the focus lies almost exclusively on point anomalies; contextual and collective anomalies will be completely disregarded when choosing algorithms. The reason for this is that the point anomaly definition fits well with what the scenario is looking to achieve. While collective or contextual anomaly detection could very well be used to find and distinguish between false alarms and real alarms, the algorithms easily become more complex without necessarily finding the simpler point anomalies.


1.3 Classification

The process of classification uses a model (classifier) that takes labeled data instances as training data, and adjusts the model to correctly classify as many of the training instances as possible into one of the available data classes [13]. After these adjustments have been made, data similar to that used for training is used to test how well the system generalizes to data it has not seen. Anomaly detection by classification operates similarly, by first training the model using one or several normal classes and then testing the system by asking it whether particular data instances can be classified as one of the normal classes, or are anomalous.

Chandola et al. [13] present an assumption that anomaly detection algorithms based on classification operate under:

“Assumption: A classifier that can distinguish between the normal and anomalous classes can be learned in the given feature space.”

Within multi-class anomaly detection, it is assumed that a data instance is defined as anomalous only if it cannot be reliably placed in one of the available normal classes. In one-class anomaly detection, a boundary around the normal class is formed within the given feature space, and any data instance that does not appear within that boundary is classified as an anomaly. It is essentially the same in the multi-class case, except that the data instance is deemed anomalous only if it does not appear within the boundary of any of the classes.

There exists a reduction from the outlier detection problem to that of classification [1], which allows the use of active learning techniques with outlier detection problems. While a formal reduction is in many cases not needed to apply traditional machine learning techniques, as well as those detailed later in this article, it is in any case useful to note its existence.

1.4 Related Work

Research done on the subject of anomaly detection can be separated into a number of sections, based on focus. There are several surveys, articles and books that discuss a number of different techniques with widely differing foundations. These broader reviews consider a large number of algorithms together with a number of domains. Chandola et al. touch on Classification, Clustering, Nearest Neighbor, Statistical, Information Theoretic and Spectral algorithms, and consider them for each of the domains Cyber-Intrusion, Fraud Detection, Medical Anomaly Detection, Image Processing, Textual Anomaly Detection and Sensor Networks [13].

Hodge and Austin present a survey similar to the above, but with a slightly slimmer scope, in that it does not go through the various application domains for anomaly detection techniques, but focuses instead on the techniques themselves and their variants [21]. Hodge and Austin define three fundamental approach types to the problem of outlier detection, based on what knowledge is available as well as what is being modeled.


Markou and Singh have published a very extensive review of statistical approaches that introduces a number of principles useful in novelty detection and related problems. Among the considered statistical approaches are Hidden Markov Models (HMM), k-Nearest Neighbor (kNN) and k-means clustering [32]. Markou and Singh have also reviewed neural networks extensively, discussing Multi-Layer Perceptron, Support Vector Machine, Auto-Associator, Hopfield Network and Radial Basis Function approaches among others to give a good outline of available algorithms within the neural network class of algorithms [33].

Naturally, there are also a number of articles focusing on the individual techniques mentioned by the broader surveys, many of them used as sources by the surveys. Stefano et al. consider the use of an added reject option for a one-class neural classifier, with the reject option depending on a reliability evaluator tied to the classifier's architecture [36]. This reject option allows the system to reject a sample rather than classifying it with low reliability (essentially refusing to choose rather than chancing it). Abe et al. have reviewed the idea of reducing the problem of outlier detection to a classification problem, which can then be solved using active learning techniques [1].

Gwadera et al. consider the use of machine learning together with sliding windows to detect suspicious sequences of events in an event stream, where they set up dynamic thresholds for the number of suspicious events that are allowed before an alarm is raised [17]. Ma and Perkins also consider temporal sequences, as they present an on-line novelty detection framework for temporal sequences using Support Vector Machines [29] [30]. Also on the subject of SVMs, Mika et al. discuss how to use SVMs to create a boosting algorithm, showing by equivalent mathematical programs that this can be done [34].

Kohonen has presented a very extensive book focusing on Self-Organizing Maps that details the variants of the algorithm and mathematical considerations among other things, and that has been well received and referred to by all the wider surveys considering self-organizing maps [28].

Ando has presented an information theoretic analysis on the subject of minority and outlier detection [5]. This analysis is abstract for the most part, and focuses on clustering; an algorithm is also presented and evaluated in the analysis. Aggarwal and Yu discuss challenges specific to high dimensional data, such as distance measures not being meaningful, and present some solutions to the problems raised [3].

There are a number of articles dealing with the domain of Wireless Sensor Networks (WSNs). Much of this research discusses the specific challenges of WSNs. Branch et al. [9] and Janakiram et al. [24], for example, discuss limited battery power, limited computational power and high error probability, and how such things influence the choice of algorithms.


Chapter 2

Theory

2.1 Preliminaries

The preliminary theory consists of deriving a constant that we may call d0, and a formula that can be used to scale the diagonal of a bounding box depending on where in the image the box occurs. These derivations depend on the assumption that there exists a horizontal base-plane in the image that acts as a floor. By assuming this base-plane exists, all detected movements will follow this floor plane, and the size changes in the bounding box and the diagonal will therefore be predictable.

Assumption 1 There exists a ground-plane within the image defined as the floor,on which any movement will occur.

2.1.1 Deriving the Distance between the Pet and Camera

Figure 2.1 shows in detail our efforts to derive the distance d between the camera and the pet, or in other words, to find the ratio h/d to use as a scaling factor. The values below are assumed to have been provided, either by the camera or by the user through some interface outside of the scope of this article.

α camera tilt angle

β camera field of view

h camera height from the floor, in cm

l pet length in cm

p picture height in pixels (vertical resolution)

Pd height from bottom to detected movement box in pixels


Figure 2.1: The defined angles used in the derivation of a bounding box scaling factor, with the variables defined in section 2.1.1. The camera image plane can be seen, as well as how the pet is projected onto the plane.

To find the distance d, the angle v can be used, but the angle must first be derived. If Pd = 0, that is, if the pet is detected at the very bottom of the picture, then the angle v is simply α + (β/2). But whenever Pd is greater than zero, an appropriate value will need to be subtracted from v to account for the small slice that should not be counted. The angle that should be subtracted is β × (Pd/p), where Pd/p is the ratio at which the β angle should be divided to give v the correct value. This can be easily visualized if one considers Pd to be 1/2 of p: this means that β is split up in two equal pieces, exactly as is done by the focus line shown in figure 2.1, and v would then be identical to α. By this reasoning, the angle v will be:

$$v = \alpha + \frac{\beta}{2} - \beta\,\frac{P_d}{p} = \alpha + \beta\left(\frac{1}{2} - \frac{P_d}{p}\right) \qquad (2.1)$$

By using the definition of the sine function, equation 2.2 can be defined as below. This is done by using equation 2.1 for v as the angle and the height h as the opposite side in the triangle, leaving the distance d between the pet and the camera as the hypotenuse.

$$d = \frac{h}{\sin(v)} = \frac{h}{\sin\left(\alpha + \beta\left(\frac{1}{2} - \frac{P_d}{p}\right)\right)} \qquad (2.2)$$


The ratio between height and distance (h/d) then becomes:

$$f = \frac{h}{d} = \sin\left(\alpha + \beta\left(\frac{1}{2} - \frac{P_d}{p}\right)\right) \qquad (2.3)$$

This scaling factor f can then be used to scale the diagonal for the respective position in the image, and we do not need to perform any further calculations here.
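As a sketch, equations 2.1 through 2.3 translate directly into code. The function names are illustrative, and angles are assumed to be given in radians:

```python
import math

def scaling_factor(alpha, beta, p_d, p):
    """f = h/d = sin(alpha + beta*(1/2 - P_d/p)), equation 2.3.
    alpha: camera tilt angle (radians), beta: field of view (radians),
    p_d: pixels from the image bottom to the detected movement box,
    p: picture height in pixels (vertical resolution)."""
    v = alpha + beta * (0.5 - p_d / p)  # angle v, equation 2.1
    return math.sin(v)

def distance_to_camera(h, alpha, beta, p_d, p):
    """d = h / sin(v), equation 2.2. h: camera height from the floor, in cm."""
    return h / scaling_factor(alpha, beta, p_d, p)
```

As a sanity check, when the movement box sits exactly at mid-height (Pd = p/2), v collapses to the tilt angle α, as argued in the derivation above.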

2.1.2 Deriving the Diagonal Length of the Pet's Bounding Box as a Limit

The given values remain the same as in the previous section, and figure 2.1 will again be of interest. The height h, together with a field of view length we call b, allows for the formulation of equation 2.4 below, by way of figure 2.2. The definition of the tangent function is used, with h being the adjacent side, b/2 being the opposite side and β/2 the given angle.

$$\tan\left(\frac{\beta}{2}\right) = \frac{b}{2h} \;\Rightarrow\; b = 2h\tan\left(\frac{\beta}{2}\right) \qquad (2.4)$$

Under the assumption that we are using a camera based on the pinhole principle [12], the ratio of the pet length to the field of view length b will remain the same both within the picture and outside in the real room space. This can be used to our advantage by defining Ph to be the length of the pet in pixels, giving us equation 2.5:

Length of Pet / Field of View Length = l/b = Ph/p ⇒ Ph = (l × p)/b  (2.5)

Since we have already defined a formula for b in 2.4, we can simply plug this formula in to get equation 2.6:

Figure 2.2: Represents the relationship between the height of the camera and the horizontal length that can be seen with the field of view angle β. The camera is pointed straight down, placing the image plane parallel to the ground.

Ph = (l × p) / (2 × h × tan(β/2))  (2.6)

Assumption If the pet length is l, then the diagonal of the box can be approximated as √2 × l. While this is a crude approximation, it gives a starting point and can be modified later if it is deemed too crude.

With this approximation and by using the expression for Ph instead of l, the diagonal of the box in pixels within the picture can be expressed as:

d0 = (√2 × l × p) / (2 × h × tan(β/2))  (2.7)

This holds for a box positioned directly under the camera, regardless of whether the camera can actually see it; if the box could be seen, it would have a diagonal of d0 pixels. This gives a constant value d0 that can be used to scale the diagonal to any height in the picture, provided we can apply the derivation in the previous section to find the distance between the camera and the pet.

When the position Pd > 0, the diagonal will be scaled by multiplying expression 2.7 with the ratio found in formula 2.3 in the previous section, giving equation 2.8:

dr = d0 × f = d0 × sin(α + β(1/2 − Pd/p))  (2.8)

Any calculated diagonals can then be compared with dr to see whether they are small enough to be considered pets and therefore be ignored, or if they should be cause for alarm. That is:

f(d) = 1 if d ≥ dr; −1 if d < dr  (2.9)
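Equations 2.7 through 2.9 can be combined into a single threshold test. The Python sketch below is illustrative only; the names are assumptions, not the thesis implementation:

```python
import math

def base_diagonal(l, p, h, beta):
    """d0 of equation 2.7: diagonal in pixels of a pet of length l
    positioned directly under the camera, with the diagonal
    approximated as sqrt(2) * l."""
    return math.sqrt(2) * l * p / (2 * h * math.tan(beta / 2))

def classify(diag, d0, alpha, beta, p_d, p):
    """Equation 2.9: +1 (cause for alarm) if the observed diagonal is
    at least the scaled threshold dr, -1 (ignore as pet) otherwise."""
    dr = d0 * math.sin(alpha + beta * (0.5 - p_d / p))  # equation 2.8
    return 1 if diag >= dr else -1
```

For example, a bounding box diagonal much larger than dr at its image position classifies as +1, a real alarm.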

2.2 Competitive Learning

The competitive learning paradigm is generally used with artificial neural networks. It can be used for any of the approaches described in 1.2. In this paradigm, nodes compete for the right to represent a particular input, and whichever node is closest earns the right to learn from the input. The learning in this case usually consists of moving the winning node slightly closer to the input in terms of the feature space.

In two or three dimensions this simply means that the winning node is moved closer to the x, y, z position of the input. For non-categorical data, the Euclidean distance (equation 2.10 below) is often used to measure the distance between the input x and a node y; comparing these distances identifies a winner. There are also a number of other distance measures used for different

situations, such as the computationally expensive Mahalanobis distance (equation A.2) mentioned in section A.3, and the Manhattan distance. Manhattan distance is often called taxicab geometry since it measures distance along Cartesian axes, just as a taxicab would measure the distance between a point x and a point y in a city.

distance = √( ∑ᵢ₌₁ⁿ (xᵢ − yᵢ)² )  (2.10)
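The distance measure and the competitive winner-selection step can be sketched as follows; this is an illustrative snippet, not code from the thesis:

```python
import math

def euclidean(x, y):
    """Equation 2.10: Euclidean distance between vectors x and y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def winner(nodes, event):
    """Competitive step: the node closest to the input wins the right
    to learn from it."""
    return min(nodes, key=lambda node: euclidean(node, event))
```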

Two of the most widely used algorithms, the Vector Quantization algorithm for neural networks and the Self-Organizing Map, operate unsupervised and can therefore be classified as type 1 approaches. The Self-Organizing Map was first introduced by Teuvo Kohonen, using the vector quantization algorithm with unsupervised learning to produce a low-dimensional representation of the input space [28]. Hastie explains the behavior of the Self-Organizing Map as follows [20]:

“Constrained version of K-means clustering, in which the prototypes are encouraged to lie in a one- or two-dimensional manifold in the feature space”

Since Self-Organizing Maps use a neighborhood function, that is, allow the nodes close to the winning node to learn a little from the input as well, the map created by the SOM algorithm preserves the topology of the input data. If this neighborhood function is not used, that is, if a winner-takes-all strategy is used, then the system will according to Hastie be analogous to a k-means clustering system [20].

The shape of the neighborhood function determines how the topology is preserved and which nodes get updates. In many cases, the neighborhood function will return a wide neighborhood to start with, to give the whole map its general shape. By starting with a wide neighborhood, the chance is small that a part of the map is completely void of updates and remains in its start state. Gradually, as learning goes on, the function returns a smaller neighborhood, which translates to a more finely tuned topology over a smaller section of the map at a time. This can be easily visualized by considering a three-dimensional surface, where billowing hills are the results of a wider neighborhood being used and smaller, sharper peaks are the results of a smaller neighborhood.
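A common way to realize such a shrinking neighborhood is a Gaussian whose width decays over the training epochs. The sketch below is illustrative; the constants sigma0 and tau are assumed defaults, not values from the thesis:

```python
import math

def neighborhood_weight(dist, epoch, sigma0=3.0, tau=20.0):
    """Gaussian neighborhood that shrinks as training proceeds: wide
    early on (whole-map shaping), narrow later (local fine-tuning).

    dist:  grid distance from the winning node
    epoch: current training epoch
    """
    sigma = sigma0 * math.exp(-epoch / tau)  # shrink the neighborhood
    return math.exp(-dist ** 2 / (2 * sigma ** 2))
```

A node at grid distance 2 from the winner thus receives a large update early in training and a vanishing one late in training.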

The concept of a learning rate δ, often used in machine learning, controls how long the system takes to converge. If a high δ value is used, for example δ ≥ 1.0, fluctuations and divergence may occur, but the lower the δ value, the slower the convergence. In essence, the δ value controls how much the system may learn from a single training pattern. Depending on the complexity of the network, as well as the availability of data, a system might be run for a number of iterations (epochs) through the training set to reach convergence. In the case of on-line classification, there might be no need to run through several iterations, since convergence might not be needed.

Kohonen [28] mentions some ways of speeding up the SOM calculations using pointers to tentative winners, which reduce the number of comparison operations from quadratic, when performing learning through exhaustive search, to linear when using the pointers. While this speeds up the SOM, it is still not as quick as the Hopfield net, due to the learning procedure involved in training a SOM as well as in querying the system [21].


Chapter 3

Method and Implementation

3.1 Simulation

There are several ways of obtaining the data needed for a learning system; the most effective method from the system's point of view is to use real-world data. In many cases this is infeasible, however, due to the impossibility of collecting the amount of data needed as well as the cost of acquiring such data. Simulating data is a cheaper solution if real-world data is not available, but the actual simulation requires some work if the generated data is to be accurate to any degree.

To provide data that is in any way useful for the scenario outlined in this article, the simulation needs to be defined both by a number of parameters and by a number of rules and assumptions. Any assumptions used will separate the generated data somewhat from real-world data, but they also lower the time and complexity of programming the simulation, which is direly needed in this case to allow the focus of the article to lie on the learning algorithms rather than the simulation.

The actual programming of this simulation will happen in several steps, consisting first of defining the basic assumptions used and thereafter stepwise improving the assumptions to provide a better modeling of the data.

Assumption 1 The three-dimensional space used will have a right-handed basis, meaning that when x points to the right, y will point straight up and z will point straight out of the paper.

Assumption 2 Input parameters to the simulation will be given in three dimensions, with the camera placed in the Cartesian coordinate system position (0,0,0), and it will point straight ahead along the z axis for simplified calculations.

Assumptions one and two mainly define how the conversion from three-dimensional space to two-dimensional screen space will occur. Carlbom and Paciorek present information about how to project the three dimensions down to two in the same way that certain cameras do [12]. The fact that cameras use

similar techniques allows the simulation to be defined in three dimensions while still achieving the important perspective effects: something appearing smaller further away from the camera despite being the same size in three dimensions, and shapes changing when projected down depending on their position relative to the camera. Similar calculations to those presented by Carlbom and Paciorek can be found in many books and lecture notes dealing with computer graphics, since the perspective projection transformation is so vital to that field of computer science.

Since the cameras used in the live situation will provide coordinates in two dimensions with proper perspective and positioning, the simulation will need to provide two-dimensional points as well; otherwise the differences between the simulation and the live situation will be too large to present any meaningful data.
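The perspective projection the simulation needs can be sketched in a few lines. This is a generic pinhole-style projection, not the thesis code; the focal parameter is an illustrative assumption:

```python
def project(point, focal=1.0):
    """Pinhole perspective projection of a 3D camera-space point
    (x, y, z), camera at the origin looking along +z, onto an image
    plane at distance `focal` from the camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    # divide by depth: objects further away project to smaller coordinates
    return (focal * x / z, focal * y / z)
```

Doubling a point's z halves its projected coordinates, giving exactly the "smaller further away" effect described above.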

Assumption 3 Movements outside defined windows will primarily be parallel to the window, with few exceptions. The size of the shapes moving will be arbitrary.

Assumption three mainly provides a starting point for the learning algorithms. The assumption is that the better part of any movements recorded are of people and cars walking and driving by the window, and thus these movements are usually parallel to the window. With this basic assumption made, changes can be made later on to provide a more accurate representation of such movements.

Assumption 4 Movements defined by pets within the room will have an arbitrary direction and velocity. The movement events will be defined by the size of the pet.

There are a number of ways that pets can move within a house, with varying speed, positions of rest and general movement. These cannot all be simulated, and even simulating a single one of these continuous movements realistically is time-consuming and complex. Therefore the first basic assumption is that movements recorded from pets are not connected, and that the pets move more or less randomly. This does not fit very well with reality, but if such arbitrary movements can be classified with some reliability, then more regular movements should hopefully be easier to classify. Regardless, this assumption can be improved upon at a later date if the simulation is kept.

The simulation generates the events used by the system, and it employs some programmatic techniques (mainly inheritance) that provide easier implementation of events, both events needed by the system and ones not within the scope of this article. By employing a time step and querying a normal distribution for a random value, the simulation checks whether a new event should be generated. Each event type to be simulated has a slightly different distribution connected to it, and that distribution decides the ratio of events and at what times during the day the events should be focused. When an event has been generated, it is appended to an output file that is used as input by the learning system.
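The time-stepped generation scheme can be sketched as follows. This is a hedged sketch only: the Gaussian daily activity profile, the profile numbers and the function names are assumptions for illustration, not the thesis simulation:

```python
import math
import random

def generate_events(n_steps, event_types, seed=0):
    """At every time step, draw a random value and compare it against
    each event type's Gaussian activity profile over the day
    (mean hour, spread, base rate) to decide whether an event fires."""
    rng = random.Random(seed)
    events = []
    for step in range(n_steps):
        hour = (step * 24.0 / n_steps) % 24.0
        for name, (mean_h, spread, rate) in event_types.items():
            # event intensity peaks around mean_h and falls off with spread
            intensity = rate * math.exp(-((hour - mean_h) ** 2)
                                        / (2 * spread ** 2))
            if rng.random() < intensity:
                events.append((step, name))
    return events
```

Each generated (time step, event type) pair would then be written to the output file consumed by the learning system.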

3.2 Visualizing the Results

Data from the live scenario consists of either scalars or two-dimensional data, which means that it can be easily visualized using two-dimensional graphics in any mathematical program. The simulation can also return data in three dimensions: the points before they have been projected into the two-dimensional screen space. Therefore it can be useful to provide a top-down view of a defined room for these three-dimensional points. In the top-down projection the y axis is simply ignored, giving an R³ → R² projection. Figure 3.1 shows an example of how the top-down projection looks with some example data. Since the reverse projection, R² → R³, is not easily done, the top-down view only works for simulated data.

To provide a useful data visualization for both the simulation and the live scenario, we might wish to present the two-dimensional screen projection of the bounding boxes defining a detected movement event. Adding windows to this screen space projection is done by simply projecting the three-dimensional coordinates of the window in the simulation case, or using given two-dimensional screen space coordinates (see section 3.4). This visualization, as shown by figure 3.2, is especially useful for reviewing the effect of window event classifications, since it shows windows within the two-dimensional screen space used by the live scenario. Other events, such as pet events, may not be as apparent since they are not as constrained by the room geometry.

Figure 3.1: Top-down view example using only the window classifier. Data points are the center points of detected movement bounding boxes, with black circles being considered normal and red squares anomalous.

Figure 3.2: 2D perspective projection example using only the window classifier. Red squares show unrelated events that are there merely to sidetrack the window filter, black circles are successfully classified window false alarm events, and red circles are window events that the window filter has not managed to classify as such.

While the above visualizations lay the foundation by showing training and test data and the classifications for those sets, additions allow for a more informative visualization. To appropriately show the effect and behavior of the competitive learning algorithm used by the window classifier (see section 3.3), the nodes used in the algorithm need to be visualized. Since the nodes work only in two dimensions, the screen space projection visualization above is useful for adding the nodes. Figure 3.2 also shows the starting positions of the competitive learning nodes as dots and the end positions of the nodes after training on a set as stars.

For the window classifier, visualizing the individual nodes makes sense, but the nodes used by the Self-Organizing Map for pet sizes in section 3.4 never move in the two dimensions visualized by the figures above. The nodes do however keep a value of the bounding box diagonal that can be used as a third dimension, making a height map a powerful visualization tool for the pet classifier. If the scale of the map is chosen to be the same as in the screen space projection, the previous visualization (figure 3.2) and the height map can be presented together to show a more complete picture, as in figure 3.3.

Figure 3.3: 2D perspective projection with corresponding SOM height map. The height map shows how the scaled pet sizes have been modified from the default scaling by the algorithm to accommodate pets above the floor plane. Lighter areas allow a larger pet.

3.3 Keeping track of windows using Vector Quantization

Assumption 1 Windows are the only sources of detected movement that can cause false alarms, and only from movements outside the windows, such as pedestrians and cars.

Assumption 2 Initial window coordinates are given to the system either by the user or by the system in some way not within the scope of this article.

For the sake of discussion, let us assume that the above assumptions hold. Then the easiest option for eliminating false alarms would be to simply ignore any events contained within any windows, provided that the window positions are known. By having the user fill in where in the image the windows are, events within the defined areas could then be ignored, and as long as the camera angle and position remain the same, the system would know which events to ignore.

Figure 3.4: The window events have no offset, as can be seen by the fact that the windows have not moved, and most events are classified as false alarms (black circles). Red squares represent events that cannot reliably be classified as false alarms.

Figure 3.5: The events have been offset by a change in camera angle, and due to this the window filters have moved to accommodate the change. The start positions of the filters are represented by red dots, and the end positions by blue stars.

There is a major problem with this naive approach, namely the assumption that the camera angle will remain constant. While cameras may not move drastically from day to day, the company has stated that the cameras turn the lens toward the ceiling when deactivated, to preserve privacy. The angle they return to may then differ somewhat, which in turn means that the filter initially provided by the user may be slightly misplaced, possibly resulting in false alarms from the windows. If an identical event occurs before and after a camera adjustment, then with the filter properly positioned the event will be properly classified as a false alarm and ignored. But after the adjustment, the filter may be slightly misplaced and the system may classify the event as a real alarm, despite the event being identical to a previously classified event.

To solve this inconsistency with camera movement, one option is to try to track the position of the window using the events generated by it. To effectively track a given window, only movement events close to the window should be considered,

something that can be done by applying some distance limit parameters. This should automatically remove events from windows other than the chosen window, provided that the distance limits are small enough and that the camera movement is not too large.

To perform simple tracking, the filters may be pulled somewhat in the right direction, always trying to keep the filter in the center of the closest window, provided one filter exists for every window. Since only the filter closest to any specific window should be moved for an input, many of the different competitive learning algorithms can be applied to this problem by thinking of the windows as isolated nodes. This works since competitive learning algorithms, as mentioned in section 2.2, compete for the right to process and learn from a subset of the possible input space. By using competitive learning with a winner-takes-all strategy, as has been done in figures 3.4 and 3.5, the nodes will only learn from events generated by their own specific window, if the distance limit parameters are appropriate.

An issue with tracking the window in this way occurs if the first assumption above does not hold. If there are unrelated events, say from pets, they have the ability to influence the behavior of the window tracking. Such influence would cause the window filter to become unreliable, since any event, at any distance from the window or the current nodes, would pull the nodes away from the window, spreading and warping the filter, as can be seen in figure 3.6. The severity of this issue can be reduced in various ways, the simplest being to create some additional requirements for when the window nodes may learn from an event.

The system should always be able to classify an event, regardless of whether it is allowed to learn from it or not. Setting a maximum allowed Euclidean distance (defined in equation 2.10) between the original window center and the event lets the filter stay reasonably close to the original window position, while still allowing flexibility and limiting the effects of interference from unrelated events.

Obviously there are other solutions available for this particular problem, for example filtering with other filters beforehand so that the assumption above does

Figure 3.6: Without constraining the filter using some maximum distance limits, the filters may use inputs from too many unrelated events and as such become unstable and unreliable. The middle window filter has moved far away from its position. Black circles are false alarms, red squares are unrelated or real alarms.

Figure 3.7: If the limiting parameters are used, then the results after training on the data set used in figure 3.6 are greatly improved. The filters stay reasonably close to the original positions, and while the trainer still has some problems with the rightmost window, this is expected due to the event density.

hold for most reasonable cases, but the solution proposed above is far simpler and less computationally expensive. If the results are acceptable in this situation, then using a simpler solution is often the best choice.

Following is a list of the parameters used in the implementation of the Vector Quantization algorithm defined in pseudo-code below (algorithm 1).

numNodes Number of nodes used by the learning system. In a winner-takes-all strategy, each node will represent a window within the image. If learning is allowed for nodes close to the winner as well, a number of nodes could together represent a window.

delta The learning rate of the system. This changes how much a single data point influences the system.

singleWinner Whether the winner-takes-all strategy should be used. If the single winner strategy is not used, all nodes in an area may well converge at a specific point, which may or may not be useful depending on the situation.

neighborhoodSize Only relevant if the single winner strategy is not used. This variable then describes the size of the neighborhood around any winning node that also gets updates.

maxDist The maximum Euclidean distance from a node at which a data point can affect it, even if the node is the winner. Used together with maxDistWin to constrain the filter to a window.

maxDistWin The maximum Euclidean distance that a node can move from a window centroid, calculated from the initial positions defined by the user. Used together with maxDist to constrain the filter to a window.

Algorithm 1 Vector Quantization algorithm
{Initiating the nodes}
for all node in nodes do
    window ← random(windows)
    node ← randomPosWithin(window)
end for
{Learning and classifying events}
for all event in events do
    closestNode ← minEucDist(event, nodes)
    D ← eucDist(closestNode, event)
    closestWin ← minEucDist(event, windows)
    Dwin ← eucDist(closestWin, event)
    diff ← event − closestNode
    if D < maxDist and Dwin < maxDistWin then
        closestNode ← closestNode + diff × delta
        if not singleWinner then
            for all node in neighborhood(closestNode) do
                node ← node + (diff × delta × D) / eucDist(node, event)
            end for
        end if
    end if
    {Classifying events}
    if event isWithinWindow(closestNode) then
        return true
    else
        return false
    end if
end for
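For concreteness, the winner-takes-all branch of algorithm 1 can be sketched in runnable Python. This is an illustrative sketch under simplifying assumptions (fixed axis-aligned window boxes, illustrative parameter values), not the thesis implementation:

```python
import math

def euc(a, b):
    """Euclidean distance between two 2D points (equation 2.10)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def vq_window_filter(events, windows, delta=0.15, max_dist=50.0,
                     max_dist_win=80.0, half_size=(40.0, 30.0)):
    """One node per window, initialised at the window centre and pulled
    a fraction delta toward nearby events; events inside the tracked
    window box classify as false alarms (True)."""
    nodes = [list(w) for w in windows]
    labels = []
    for ev in events:
        # competitive step: the closest node wins
        i = min(range(len(nodes)), key=lambda k: euc(nodes[k], ev))
        d = euc(nodes[i], ev)
        d_win = min(euc(w, ev) for w in windows)
        # learn only if the event is close to both node and window
        if d < max_dist and d_win < max_dist_win:
            nodes[i][0] += (ev[0] - nodes[i][0]) * delta
            nodes[i][1] += (ev[1] - nodes[i][1]) * delta
        # classify: inside the (tracked) window box -> false alarm
        inside = (abs(ev[0] - nodes[i][0]) <= half_size[0] and
                  abs(ev[1] - nodes[i][1]) <= half_size[1])
        labels.append(inside)
    return labels, nodes
```

An event near a window centre is both learned from and flagged as a window false alarm, while a distant unrelated event neither moves the node nor gets filtered out.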

3.4 A Self-Organizing Map as a Height Map for Pet Size Thresholds

After the window events have been dealt with, it is of interest to consider pet events, since they make up the second largest false alarm source in some situations. Pet events occur when pets move within the vision of the cameras and therefore get detected. A reasonable starting assumption is that only pet events will cause a false alarm, just as was assumed in section 3.3.

Assumption 1 The only sources of false alarms are from pets being detected within the vision of the camera.

Assumption 2 Pets generally move along the floor plane, but they may move in an arbitrary but predictable manner. They may for example have favorite spots diverging from the floor plane, such as on top of a sofa or table, depending on the pet.

Assumption 3 The length of the pet is given to the system either by the user or by the system in some way not within the scope of this article.

In general, the main difference between pets and their owners is size. Pets do have a different shape than humans, but this shape may not always be visible in the picture due to angles and positions, and therefore the bounding box may not be that different from the bounding box of a moving human. Pets are in general smaller, which can be used to differentiate between humans and pets. A naive approach to filtering out pet related events would then be to simply classify any events where the bounding box has a size smaller than or equal to that of a pet as false alarms.

While this would work for some events, it would cause more problems than it solves, due to the simple fact that without applying any scaling, a box only just fitting a cat could just as well be a human further away from the camera. Therefore the problem becomes two-fold, provided that the user supplies some data about the camera and the pet. First, the event needs to be scaled depending on where in the image the movement takes place. After the scaling has been done, the values received can be compared with threshold values to decide whether they should be classified as pet related false alarms or as real alarms.

Sections 2.1.1 and 2.1.2 detail the math behind the scaling operation. The scaling operation uses the length of the pet, the height of the camera, and the tilt and field of view angles of the camera. With this information, a scaled diagonal of the bounding box can be calculated depending on the height in the picture at which the bottom corners of the box are positioned.

Since self-organizing maps are topology preserving, they can create height maps where the height corresponds to the threshold values. Further, since scaling the event position coordinates (x, y) to fit the map can be done in constant time, the time complexity of classifying an event is O(1). During training, if the pet has

Figure 3.8: Training the SOM using 200 simulated training points. Black circles close to the lower edge are already small enough to be allowed due to the scaling, and as such do not cause any changes to the map. The red squares represent data points not conforming to the scaling, and as such the map tries to accommodate these anomalies.

a spot it likes, on for example a couch, the corresponding nodes in the map will learn to allow larger diagonal sizes to accommodate the difference from the floor plane, which is the norm. The self-organizing map will in essence become a height map, as can be seen in figure 3.8, where the height value used is the scaled diagonal allowed at that position.

The initial normal class consists of the scaled diagonals allowed at the different nodes, but after training and accommodation the normal class also includes the changes made to the map to accommodate the anomalies.

Below, pseudo-code of the proposed algorithm has been included for completeness.

Algorithm 2 Self-Organizing Map for pet size thresholds
{Initiate Map with default threshold}
for all col in columns do
    for all row in rows do
        x ← scale(col)
        y ← scale(row)
        matrix(x, y) ← scaledDiagonal(x, y)
    end for
end for
{Learning and Classification phase}
for all event in events do
    diagonal ← event.diag
    x ← scale(event.x)
    y ← scale(event.y)
    storedDiag ← scaledDiag(x, y)
    diff ← diagonal − storedDiag
    if diff > 0 then
        if not singleWinner then
            for all node in neighborhood(x, y) do
                dist ← √((x − node.x)² + (y − node.y)² + 1)
                change ← (diff × delta) / (dist + ε)
                matrix(node.x, node.y) ← matrix(node.x, node.y) + change
            end for
        else
            matrix(x, y) ← storedDiag + diff × delta
        end if
    end if
    if diff > 0 then
        return true
    else
        return false
    end if
end for
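A runnable sketch of algorithm 2 is given below. It is illustrative only: a uniform base diagonal stands in for the per-position scaledDiagonal() of sections 2.1.1 and 2.1.2, the neighborhood spans the whole grid with distance-decayed updates, and all parameter values are assumptions:

```python
import math

def som_pet_filter(events, grid=(8, 6), delta=0.2, base_diag=40.0,
                   single_winner=False, eps=1e-6):
    """Grid of nodes holding the allowed (scaled) pet diagonal.
    events: (x, y, diag) triples, x and y already scaled to grid
    indices. Returns per-event labels (True = too large, raise alarm)
    and the trained height map."""
    cols, rows = grid
    matrix = [[base_diag for _ in range(rows)] for _ in range(cols)]
    labels = []
    for x, y, diag in events:
        diff = diag - matrix[x][y]
        if diff > 0:
            if single_winner:
                matrix[x][y] += diff * delta
            else:
                # update every node, weighted by distance to the winner
                for nx in range(cols):
                    for ny in range(rows):
                        dist = math.sqrt((x - nx) ** 2
                                         + (y - ny) ** 2 + 1)
                        matrix[nx][ny] += diff * delta / (dist + eps)
        labels.append(diff > 0)
    return labels, matrix
```

Repeated oversized events at one position (a favorite spot on a couch, say) gradually raise the allowed diagonal there, producing exactly the height map behavior described above.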


Chapter 4

Results and Conclusions

Before going further into the individual and combined results of this project, one thing must be clearly mentioned. Due to factors outside of the author's direct control, no live data was available for training and testing, which was not originally intended. All of the results and conclusions are therefore based on data provided by the simulation detailed in section 3.1. This affects both the available results and the discussion regarding them, as well as what conclusions can be drawn and the suggestions for future work.

4.1 Window Adjustment Filter

Even though the window filter has some naive elements in its implementation, the following figures show that the filter can handle both skewed distributions and fairly large window offsets well, despite the naive distance limits explained previously. Datasets that consist of only window related events, with no unrelated events such as pet events or other random events, can be seen in figures 3.4 and 3.5. There it can be seen that the error rate is close to zero for the training case, and that the test case for that distribution has similar results. What is of most importance is the result on the testing set, that is, the result on data that has not yet been seen by the system. To accurately measure the results, the test and training sets should have few discernible differences in terms of distribution. With this in mind we will mostly be considering the testing sets.

Favored Distributions A distribution favoring a classifier is one where unrelated events make up a smaller part of the dataset than related events, allowing for less interference from such events. If a distribution is not favored, the opposite is true: the classifier has to work with fewer relevant events and has to cope with more interference from unrelated events.

Figures 4.1b and 4.1c show that, in general, on distributions favoring the window classifier, a learning rate (δ) between 0.1 and 0.2 seems to be most effective, with a success rate of ≈ 97% for the slightly offset windows. Something that


(a) No window offset (b) Slight window offset

(c) Heavy window offset

Figure 4.1: Success rates using data sets favoring the window classifier. The three diagrams show how the learning rate influences the resulting classifications. In sets where the windows are offset, like 4.1c, a very low learning rate will cause a low success rate since the system cannot react to the change quickly enough. Note the difference in scales between the diagrams.

also fits well with what is generally known: the learning rate should not be too high, and the default learning rate that many people use lies within 0.1 to 0.2. Since in this scenario the only time there is a need for learning is when the windows have been offset, it makes sense that figure 4.1a shows that the best value for δ is very low (0.02). This simply means that the system is already in the best state it can be in, and further learning will only cause the system to overtrain.
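
The learning-rate behavior discussed above can be illustrated with a minimal sketch of a winner-take-all update of the kind assumed to underlie the window filter's vector quantization. The function names, coordinates, and default δ are illustrative assumptions, not the thesis implementation:

```python
# Sketch of a winner-take-all (competitive learning) update: the node
# closest to an incoming event is nudged toward it by the learning
# rate delta; all other nodes are left untouched.

def closest_node(nodes, event):
    """Return the index of the node nearest to the event (2D squared distance)."""
    return min(range(len(nodes)),
               key=lambda i: (nodes[i][0] - event[0]) ** 2
                           + (nodes[i][1] - event[1]) ** 2)

def update(nodes, event, delta=0.14):
    """Move only the winning node toward the event by a fraction delta."""
    i = closest_node(nodes, event)
    x, y = nodes[i]
    nodes[i] = (x + delta * (event[0] - x), y + delta * (event[1] - y))
    return i

# With a small delta the node tracks a slowly drifting window center;
# with delta near 1 it jumps to every event and never stabilizes.
nodes = [(0.0, 0.0), (10.0, 0.0)]
update(nodes, (1.0, 1.0), delta=0.5)   # moves the first node to (0.5, 0.5)
```

This also makes concrete why δ = 0 learns nothing (the node never moves) while a very high δ lets a single unrelated event drag a node far from its window.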

Even with heavily offset windows, such as those in figure 4.2, the system manages a degree of success, topping out at 72%. As can be seen in the figure, the two leftmost windows have fewer problems than the rightmost window, which has to do with the fact that the rightmost window is on another wall in the simulated house.


Figure 4.2: A classification using δ = 0.14, the best choice of learning rate from figure 4.1c, showing heavily offset windows. Red squares show unrelated events that are there merely to sidetrack the window filter, black circles are successfully classified window false alarm events, and red circles are window events that the window filter has not managed to classify as such.

By being on the other wall, the angle to the camera is different, and therefore events may be positioned differently in the two-dimensional space. Add to that the fact that since the window is on the other wall, it appears thinner than the other windows, which also affects the filter. This is reflected in similar datasets as well, whenever there is a heavy offset in that direction.

When using neutral or non-favoring distributions with the window classifier, it can be seen that the filters are negatively affected in some cases. Without any limiting constraints, the filter may end up in the situation shown in figure 3.6, but this is a very extreme case. A more realistic example is given by figures 4.4 and 4.5, showing the projection view and the top-down view of a non-favored distribution classified by the window classifier. In the first figure, it can be seen that the filter has problems classifying the rightmost points of the rightmost window. This is most likely a result of nearby unrelated events influencing the node, coupled with the difficulty of the rightmost window.

While the result of classifying the distribution discussed above is good (87%) for the chosen learning rate (δ = 0.08), figure 4.3 shows that this result is highly dependent on the learning rate chosen. The success rate takes a sharp dive right after the top value before settling at a very stable success rate of ≈ 75%.


Figure 4.3: Diagram showing the effect of different learning rates (δ) on a dataset not favoring the window classifier. There is a heavy offset, which can be seen in figures 4.4 and 4.5, which use the same dataset.

Figure 4.4: For visibility, the unrelated events have been removed from this plot. The events are still there to affect the classifier, and can be seen in figure 4.5, which is the top-down view of the same data set. Black circles represent correctly classified window events, and red circles represent incorrectly classified window events.


Figure 4.5: A top-down view showing the window classification on a heavily offset, unfavored dataset. Here black circles correspond to successfully classified events, and red squares correspond to what has been classified as unrelated events.

To avoid completely cluttering this section of the article with figures, the best results from the various datasets and classifications have been combined into table 4.1. As can be seen, a high success rate can be achieved for both slight and heavy offsets, depending on the chosen learning rate. The table does, however, confirm the fact mentioned above: the window filter has difficulties with unfavorable distributions in certain situations. These problems tend to occur for certain windows and window configurations, such as the one used in all the figures in this section.

Some possible solutions to these problems will be discussed later, in section 4.3 and chapter 5.

Data Set Distribution            δ-value    Success Rate (%)
Window favored, no offset         0.00       100
Window favored, no offset         0.20        90
Window favored, slight offset     0.16        97
Window favored, heavy offset      0.14        72
Pet favored, no offset            1.40        90
Pet favored, slight offset        0.06        66
Pet favored, heavy offset         0.08        87
Pet favored, heavy offset         0.12        75

Table 4.1: Window classifier results from various dataset distributions, each at a specific δ value.


4.2 Pet Filtering

Due to the random nature of how the simulation generates pet-related events, it is hard to create scenarios that are truly lifelike. The focus of this section will therefore be to show how the filter looks after training, without deeply considering the dataset the filter is trained on. Depending on whether the calculated default value is used or the height map starts at zero, the resulting map varies widely.

For all the figures within this section, a learning rate of δ = 0.3 has been used. This decision was made to allow the filter to learn somewhat speedily, lowering the need for an extended training period and large datasets. Since the system learns in only one direction (raising the map), there is no chance of a node fluctuating between two points. The only adverse effect a high learning rate might have is that the system might learn too much from a single occurrence. While this is an important factor, as it might lead to over-training, given the lack of real and lifelike data we are considering other factors and leaving this to be revisited once real data has been obtained.
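
The one-directional learning described above can be sketched as follows. The grid size, the neighborhood weights, and the exact update rule are illustrative assumptions rather than the actual pet filter implementation:

```python
# Minimal sketch of a raise-only height-map update: each cell stores the
# largest pet bounding-box diagonal considered normal at that image
# position, and cells are only ever raised, so no cell can oscillate
# between two values regardless of the learning rate.

def raise_map(height_map, col, row, diagonal, delta=0.3):
    """Raise the hit cell (and, half as strongly, its 4-neighbors)
    toward the observed diagonal, never lowering any cell."""
    rows, cols = len(height_map), len(height_map[0])
    for dc, dr, w in [(0, 0, 1.0), (1, 0, 0.5), (-1, 0, 0.5),
                      (0, 1, 0.5), (0, -1, 0.5)]:
        c, r = col + dc, row + dr
        if 0 <= r < rows and 0 <= c < cols:
            gap = diagonal - height_map[r][c]
            if gap > 0:                      # raise only, never lower
                height_map[r][c] += delta * w * gap

grid = [[0.0] * 4 for _ in range(3)]
raise_map(grid, 1, 1, 10.0)
# the hit cell rises toward 10.0, its four neighbors half as far
```

Because the update is monotone, repeated iterations over the same data simply push the map toward the largest diagonals seen, which is exactly why too many iterations couple the map tightly to the training set.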

All the simulated pet-favored datasets have a high concentration of events along the lower edge, as seen in figure 4.6, with a lower concentration in the rest of the image. This is an effect of the camera projection and some simplifications made for it in the simulation. At the bottom of the image the largest diagonal sizes are allowed, since the closer a pet is, the larger the bounding box becomes, and as seen along the lower edge these events are allowed (classified as false alarms) after only one iteration. As the iterations continue, the height difference between these normal cases and the anomalies higher up in the image increases to the point where only a few anomalies are shown and only one point is left that cannot yet be classified as a pet-related false alarm.

There are few cases in the live scenario where forty iterations would be used, since so many iterations would cause a very high coupling between the specific training set and the SOM output, which would degrade the ability to generalize on unseen but similar data. Therefore, for further tests and comparisons only five iterations are used, which might slightly hurt the raw success rate of the test with no default values, but will show how the different cases perform under identical conditions.

If the default values are used during training, convergence will be much quicker, since in theory there is no need to teach the system what diagonal lengths should be allowed along the floor plane. Instead, only anomalies need to be learned, for when pets move on top of furniture or stairs and thus leave the normality of the floor plane.

The default values may also add robustness, since the resulting map will most likely be much smoother than a map starting from a flat default value of zero. Examples of this can be seen by comparing figures 4.7 and 4.8, where the latter is much more jagged, with obvious dips wherever the training data points do not reach. The former, by contrast, is very smooth, with only a few peaks for anomalous points. The two figures have been created with the same axis values


(a) Iterations: 1 (b) Iterations: 40

Figure 4.6: At early iterations much of the figure appears light, since the differences between the peaks are small. The figure darkens considerably, with only a few lighter peaks, as the iterations continue, meaning that those events are highly anomalous. Black circles represent events classified as false alarms, and red squares represent real alarms.

to allow for a better comparison. By comparing the curve in figure 4.7 created by the default values with the more jagged curve in figure 4.8, it can be seen that they are somewhat similar, showing that the default values can in fact provide improved generalization on datasets similar to those used for the pet filter. Further views on this similarity can be seen in figure 4.9, where the raised maps have been placed in a ninety-degree sideways view for ease of comparison.

While this result is by no means a complete proof of the effectiveness of this filter, it can be seen as a proof of concept for certain situations, and it remains to be shown whether the default values can provide similar results in a live situation with a camera in any reasonable position. This initial result rests on the assumptions mentioned in section 3.1, and as such cannot be taken as fact by the company until verified in the various live situations that can occur. The author has therefore decided not to add a table showing exact results for different iteration values, since the values would have little meaning in the live scenario.


Figure 4.7: A raised view of the map shown in figure 4.6, showing the points of the map raised beyond the normal case after five iterations.

Figure 4.8: If a flat default value of zero is used, the filter can still create a workable map, but this map will take longer to reach a similar success rate and will most likely end up with a much more jagged look. This figure uses the same dataset as 4.6, but with five iterations.


(a) Default values used (b) Flat default value of zero.

Figure 4.9: A 90° sideways view of the map, showing the effect of the curving created by the use of the calculated default values versus a flat default value of zero. Five iterations have been used on the same dataset as in previous figures. Also note the differing y-axis scaling.

4.3 Conclusions

Both filters show some positive results, such as they are. The lack of data from a live situation unfortunately prevents many concrete conclusions regarding the effectiveness of the filters. This also means there is no effective way to measure the type II error rate (real alarms being misclassified), something that was wished for early on (section 1.1). This needs to be remedied if either of the algorithms is to be used in the live situation described. Integrating code of any complexity into a working application without thorough testing may very well lead to unforeseen consequences, such as a higher type II error rate than expected. Below the author provides a (possibly incomplete) list of possible consequences:

• Higher type II error rate than expected due to misclassifying real alarms as false alarms.

• Long training phase due to low event detection frequency in the live situation.

• Overlearning or lack of ability to generalize after either a long training periodor continuous learning.

• Incorrect assumptions about the nature of the live scenario, such as the derived formulas in sections 2.1.1 and 2.1.2.

• Camera and image related issues affecting the ability to provide the needed values for the filters, such as the α angle.

• Time and memory requirements to successfully train on live data might be higher than expected.
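
Once labeled live data exists, the error-rate bookkeeping called for above is straightforward. The following is a hedged sketch under the assumption that events can be manually annotated as real alarms or false alarms; the function and label names are illustrative:

```python
# Compute type I rate (false alarms that a filter lets through as real)
# and type II rate (real alarms that a filter suppresses) from labeled
# events. Each event is a pair (is_real_alarm, filter_says_real_alarm).

def error_rates(events):
    type1 = sum(1 for real, said in events if not real and said)
    type2 = sum(1 for real, said in events if real and not said)
    n_false = sum(1 for real, _ in events if not real) or 1  # avoid /0
    n_real = sum(1 for real, _ in events if real) or 1
    return type1 / n_false, type2 / n_real

rates = error_rates([(True, True), (True, False),
                     (False, False), (False, False)])
# -> (0.0, 0.5): no surviving false alarms, half the real alarms missed
```

Tracking these two numbers separately matters here, because a filter can trivially achieve a 0% type I rate by suppressing everything, at the cost of a 100% type II rate.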


The window filter is simple to implement and to modify, but its versatility is relatively low. For the moment the filter only detects rectangles, but this could easily be changed in the future to include arbitrary shapes. Without arbitrary shapes, the versatility of the window filter is too low to provide all the functionality one might need in a filter for motion detection, such as the ability to define an area that should not be included in the detection.

While the window filter will try to keep track of these shapes through the events given to it, if many unrelated events occur or if the camera movement is too large for the filter to accommodate, the filter will not know how to handle the situation. For a more robust filter, better fitting the live scenario, contingency plans would be needed for dealing with such situations, some of which are discussed in chapter 5.

The pet filter, with its more advanced design, is much slower to train, since it needs to learn what is a normal state for the environment, even if the default values help by giving some information about normality. With the default values, the pet filter goes from the slow convergence of most SOMs (and neural networks) to much quicker convergence, since only anomalies need to be trained. However, if the default values are faulty, their introduction may add type II errors to the system, so the default values will at the least need thorough testing to make sure this is not the case.

The fact that the pet filter, with correct training and choice of parameters, may perform the same work as the window filter, in addition to its pet-specific functionality, makes it a more attractive choice for continued development. By putting more time into the pet filter, there might be no reason to implement the window filter at all, focusing development efforts and saving both time and money.

Regarding the choice of algorithms for the two filters, the author feels that competitive learning algorithms have many attractive features for these types of implementations. Many other available algorithms, such as those presented in the appendix, also have attractive features, and standard implementations of these could also yield favorable results. The main feature considered by the author at the time was the ability to handle unsupervised learning: while there are situations where supervised or semi-supervised learning could be used in the live scenario, in many cases this would require human interaction to decide whether or not any particular event is a false alarm. This would obviously make training the system a very tedious chore, making training speed a much more important factor than it otherwise would be.

As the project continued, it became more and more apparent that the initial constraint mentioned in section 1.1 should have been considered more thoroughly.

“The algorithms may not assume they have access to any of the pictures taken by the camera, the only data available will be the surrounding data and the bounding box information.”


Constraining the system in this way was thought to provide both benefits and limitations to the system's capabilities. Among the benefits considered were quicker learning and smaller space requirements due to smaller data instances, easier simulation of data instances, no risk of getting sidetracked deep into the subject of image analysis, and reuse of code already developed by the company. The primary limiting factor considered when the decision was made was that without image data, there was no way to make shape-specific choices in event detection and classification. While this was true, as the project continued it was found that this was not at all the most problematic issue with excluding the image from the data instances.

Without images, the data instances became so simplistic that it was very hard to distinguish between a valid movement and a break-in attempt, and as a result the learning became more erratic. This was explored when testing a support vector solution (more details in A.1): with only the information given in the scenario, there was no way for the SVM to pinpoint the combination of features that should be considered anomalous. That is, the classifier could not distinguish between the normal and anomalous classes in the given feature space.

After the failed SVM implementation, the choice was made to focus on fewer features, which allows more basic assumptions to be made about the data. The resulting implementations are those described in this article, and as can be seen they make very basic assumptions, for example that all events happening within an area defined as a window are to be considered false alarm events, without any consideration of size or velocity. While this means that a window might be opened without an alarm being raised, if the intruder moves away from the window the alarm will trigger, since the detected movement is no longer close to the window.
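
The basic window-area assumption described above amounts to a simple containment test. The rectangle representation below is an illustrative assumption, not the thesis's actual data structure:

```python
# Any event whose center falls inside a learned window rectangle is
# treated as a false alarm, regardless of its size or velocity.

def in_window(event_xy, rect):
    """rect = (x_min, y_min, x_max, y_max) in image coordinates."""
    x, y = event_xy
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1

def is_false_alarm(event_xy, windows):
    return any(in_window(event_xy, r) for r in windows)

windows = [(2.0, 2.0, 5.0, 6.0)]
is_false_alarm((3.0, 4.0), windows)   # True: suppressed as a window event
is_false_alarm((8.0, 4.0), windows)   # False: raises the alarm
```

This also exposes the limitation noted above: an intruder entering through the window is suppressed until the detected movement leaves the rectangle.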

To conclude, the algorithms have shown some positive results that the company might wish to explore and improve upon, but the introductory constraint not to allow image data should have been more carefully considered before the decision was made. With image data, the company might have gotten more immediate practical use from this project. However, since the simulation would likely have been impossible to create with any accuracy if image data was used, this would have required live data to have been collected either during the project or beforehand.


Chapter 5

Future Work

As has already been mentioned in detail above and in earlier sections of this article, the most important task for the future is to run tests on live data. With live data available, the algorithms can be expanded upon and taken in any direction the company wishes. The company could then decide whether they wish to go toward several quick and simple filters that each focus on one aspect of the feature space, or whether they wish to improve upon one or two larger algorithms to provide the full functionality they feel they need from the feature space.

From a more theoretical standpoint, under the assumption that live data is available, consideration should be put into measuring training and test performance and error rates. Not only for the company's specific needs, but to have comparable numbers for use in comparisons with implementations of similar functionality but differing machine learning techniques. Since competitive learning was one option among many, it makes sense to consider alternatives: with live data available, the choice might have been different. To further widen the scope, image data could be added to the feature space. Image data would, however, likely require reconsideration of the machine learning techniques as well, as the current implementations are specifically designed not to have access to such features.

For the window filter, general robustness is of importance for future considerations. The window filter can perform, but it lacks the robustness that would be needed for long continuous and unsupervised use. By implementing measures such as scaling of the limits described in section 3.3, some problematic situations could be prevented, such as a window far away being moved far out of its way, since the allowed movement limit does not take into account the distance between the camera and the window. Implementing such scaling in ways similar to the pet size scaling described in this article has possibilities, but this has not been explored further. To further increase robustness, the window filter needs contingency plans for situations with larger movements than the filter can handle, for example if the camera is intentionally moved by the owner to use a different viewing angle and position.


To increase the functionality of the window filter, arbitrary shapes for windows could be introduced. Most likely such shapes would be defined as a two-dimensional polygon, with points and lines connecting them, but they could also be defined by painting a surface on the camera image. Introducing arbitrary shapes would allow the window filter to provide functionality similar to the pet filter, if on a more basic level. For example, if an animal has a favorite spot above the floor plane, simply remove that position from the motion detection and apply the pet diagonal scaling test to the rest.
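
Replacing the rectangle check with a polygon check is a small change; a standard ray-casting test would suffice. The polygon representation (a list of (x, y) vertices) is an assumption for illustration:

```python
# Ray-casting point-in-polygon test: cast a horizontal ray from the
# point and count how many polygon edges it crosses; an odd count
# means the point is inside.

def in_polygon(pt, poly):
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):   # edge spans the ray's height
            # x coordinate where the edge crosses that height
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 4.0)]
in_polygon((2.0, 1.0), triangle)   # True
in_polygon((5.0, 1.0), triangle)   # False
```

The "painted surface" variant mentioned above could instead store a per-pixel mask, trading memory for a cheaper lookup.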

While the window filter is fairly straightforward in its filtering process (if an event is within an area, ignore it), the pet filter is not. This means that live data is even more important for the pet filter than for the window filter, to be able to visualize the sizes that the pet filter will consider false alarms at any given position in the image. Such a visualization would be beneficial to the company, as it could take the form of an overlay on a presented camera image: with a touch anywhere in the image, the allowed pet size at that position could be displayed on the screen. Creating this would need both graphical user interface development and improvements to the pet filter implementation, but it would be a useful tool for the company commercially and for studying the effectiveness of the implementation.

Currently the pet filter never lowers the height map created by the SOM. Introducing lowering of the height map, if done at a flat rate based on time, can be considered adding a very basic decay to the system. Adding decay is worth exploring, since systems without decay may become over-saturated, and by introducing decay some temporal information is retained: older information still has bearing on decisions, but to a lesser degree than new [33]. Furthermore, there is also the option of allowing correctly classified false alarms to lower the closest node(s) to find a better fit, but doing so may have unintended side effects that will need to be explored if this option is to be considered.
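
The flat time-based decay suggested above could be sketched as follows. The decay rate and the optional per-cell floor (e.g. the calculated default values) are illustrative assumptions:

```python
# Lower every cell of the height map by a small fraction at regular
# intervals, so stale information gradually loses influence, while
# optionally never dropping below a per-cell default floor.

def decay_map(height_map, rate=0.01, floor=None):
    for r, row in enumerate(height_map):
        for c, h in enumerate(row):
            low = floor[r][c] if floor is not None else 0.0
            row[c] = max(low, h * (1.0 - rate))

grid = [[10.0, 0.0], [4.0, 8.0]]
decay_map(grid, rate=0.1)
# lowers the cells to roughly [[9.0, 0.0], [3.6, 7.2]]
```

With a floor supplied, decay erodes only the learned anomalies back toward the defaults, which matches the intent that older observations should count for less without the map ever forgetting basic floor-plane normality.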

Given available technology, the assumptions present in the scaling mechanic could be replaced by a camera with depth capabilities. That way the distance to any object can be reliably calculated, and the object scaled properly, without relying on the floor plane assumptions. Depth could then be added to the feature space, as could other data such as color, infrared and sound, if available.


Bibliography

[1] Naoki Abe, Bianca Zadrozny, and John Langford. Outlier detection by active learning. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.

[2] Shigeo Abe. Multiclass Support Vector Machines. Springer, London, second edition, 2010.

[3] Charu C. Aggarwal and Philip S. Yu. Outlier detection for high dimensional data. Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 37–46, 2001.

[4] S. Albrecht, J. Busch, M. Kloppenburg, F. Metze, and P. Tavan. Generalized radial basis function networks for classification and novelty detection: self-organization of optimal Bayesian decision. Neural Networks, Volume 13, Issue 10, pages 1075–1093, 2000.

[5] Shin Ando. Clustering needles in a haystack: An information theoretic analysis of minority and outlier detection. ICDM 2007, Seventh IEEE International Conference on Data Mining, pages 13–22, 2007.

[6] Stephen D. Bay and Mark Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 29–38, 2003.

[7] C.M. Bishop. Novelty detection and neural network validation. IEEE Proceedings - Vision, Image and Signal Processing, Volume 141, Issue 4, 1994.

[8] Rafal Bogacz, Malcolm W. Brown, and Christophe Giraud-Carrier. High capacity neural networks for familiarity discrimination. ICANN'99, Ninth International Conference on Artificial Neural Networks, Volume 2, pages 773–778, 1999.

[9] Joel Branch, Boleslaw Szymanski, Chris Giannella, Ran Wolff, and Hillol Kargupta. In-network outlier detection in wireless sensor networks. Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06), 2006.


[10] Tom Brotherton, Tom Johnson, and George Chadderdon. Generalized radial basis function networks for classification and novelty detection: self-organization of optimal Bayesian decision. Proceedings of the 1998 IEEE International Joint Conference on Neural Networks, Volume 2, pages 876–879, 1998.

[11] Simon Byers and Adrian E. Raftery. Nearest neighbor clutter removal for estimating features in spatial point processes. Journal of the American Statistical Association, Issue 93, pages 577–584, 1998.

[12] Ingrid Carlbom and Joseph Paciorek. Planar geometric projections and viewing transformations. ACM Computing Surveys, Volume 10, No 4, 1978.

[13] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Computing Surveys, Volume 41, Issue 3, 2009.

[14] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. 2011. Available online at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

[15] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[16] Paul A. Crook, Stephen Marsland, Gillian Hayes, and Ulrich Nehmzow. A tale of two filters - on-line novelty detection. IEEE International Conference on Robotics and Automation, Volume 4, pages 3894–3899, 2002.

[17] Robert Gwadera, Mikhail J. Atallah, and Wojciech Szpankowski. Reliable detection of episodes in event sequences. Third IEEE International Conference on Data Mining, pages 67–74, 2003.

[18] Greg Hamerly and Charles Elkan. Alternatives to the k-means algorithm that find better clusterings. Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 600–607, 2002.

[19] John A. Hartigan. Clustering Algorithms. Wiley, New York, London, 1975.

[20] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer, New York, 2009.

[21] Victoria J. Hodge and Jim Austin. A survey of outlier detection methodologies. Artificial Intelligence Review, Volume 22, Issue 2, 2004.

[22] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A practical guide to support vector classification. 2010. Available online at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.


[23] Byungho Hwang and Sungzoon Cho. Characteristics of autoassociative MLP as a novelty detector. IJCNN'99, International Joint Conference on Neural Networks, Volume 5, pages 3086–3091, 1999.

[24] D. Janakiram, Adi Mallikarjuna Reddy V, and A V U Phani Kumar. Outlier detection in wireless sensor networks using Bayesian belief networks. First International Conference on Communication System Software and Middleware, 2006.

[25] T. Joachims. Making large-scale SVM learning practical. Advances in Kernel Methods - Support Vector Learning, 1998. Software available at http://svmlight.joachims.org/.

[26] Thorsten Joachims. Training linear SVMs in linear time. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.

[27] Tapas Kanungo, David M. Mount, Ruth Silverman, Nathan S. Netanyahu, Angela Y. Wu, and Christine Piatko. The analysis of a simple k-means clustering algorithm. Proceedings of the Sixteenth Annual Symposium on Computational Geometry, pages 100–109, 2006.

[28] Teuvo Kohonen. Self-Organizing Maps. Springer, third edition, 2001.

[29] Junshui Ma and Simon Perkins. Online novelty detection on temporal sequences. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 613–618, 2003.

[30] Junshui Ma and Simon Perkins. Time series novelty detection using one-class support vector machines. Proceedings of the International Joint Conference on Neural Networks, Volume 3, pages 1741–1745, 2003.

[31] Larry M. Manevitz and Malik Yousef. One-class SVMs for document classification. Journal of Machine Learning Research, Issue 2, pages 139–154, 2001.

[32] Markos Markou and Sameer Singh. Novelty detection: a review - part 1: statistical approaches. Signal Processing, Volume 83, Issue 12, 2003.

[33] Markos Markou and Sameer Singh. Novelty detection: a review - part 2: neural network approaches. Signal Processing, Volume 83, Issue 12, 2003.

[34] Sebastian Mika, Gunnar Rätsch, Bernhard Schölkopf, and Klaus-Robert Müller. Constructing boosting algorithms from SVMs: An application to one-class classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1184–1199, 2002.

[35] Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, Issue 13, pages 1443–1471, 2001.


[36] Claudio De Stefano, Carlo Sansone, and Mario Vento. To reject or not to reject: That is the question - an answer in case of neural classifiers. IEEE Transactions on Systems, Man and Cybernetics - Part C: Applications and Reviews, volume 30, number 1, 2000.

[37] Yanxin Wang, Johnny Wong, and Andrew Miner. Anomaly intrusion detection using one class SVM. Proceedings from the Fifth Annual IEEE SMC, pages 358–364, 2004.

[38] Kai Zhang, James T. Kwok, and Bahram Parvin. Prototype vector machine for large scale semi-supervised learning. Proceedings of the 26th International Conference on Machine Learning, 2009.


Appendix A

Other Considered Methods

A.1 One-Class and Two-Class Support Vector Machines

Support Vector Machines (SVMs) are powerful constructs that can be used both for supervised learning, using two-class or multi-class SVMs [2], and for unsupervised learning, using one-class SVMs [35][31]. SVMs utilize the idea that even if a dataset is not linearly separable within its current dimensional space, it will become linearly separable if transformed into a higher-dimensional space [22]. The transformation into that high-dimensional space is performed using one of a number of available kernel functions. These kernels produce somewhat different results and behavior depending on the kernel chosen, and they are still a very active area of study. There are, however, a few kernels that are usually presented by introductory articles and books, and as such are quite widely used.

Given a training set of instance–label pairs (x_i, y_i), i = 1, ..., l, where each training instance x_i ∈ R^n and y ∈ {1, −1}^l, the SVM needs to solve the following quadratic optimization problem in order to maximize the margin between the two classes of data instances [22] [14] [35].

min_{w,b,ξ}  (1/2) w^T w + C ∑_{i=1}^{l} ξ_i

subject to  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0    (A.1)

While equation A.1 is complex and hard for many to understand, full understanding of the theory of SVMs is not required to be able to utilize the algorithms, especially if libraries such as those mentioned below are available for use. In the equation, φ(x_i) transforms the training instance x_i into the higher-dimensional space using the chosen kernel, generally one of those shown below. φ is therefore a feature map from the input space to an inner product space F, that is R^n → F. The kernel function K(x_i, x_j) = (φ(x_i) · φ(x_j)) can then utilize what is commonly known as the kernel trick to avoid explicitly mapping the values, by only using the dot product [2]. The constant C in A.1 is the penalty parameter, since it induces a penalty on the system for each error the system allows. This is done simply by summing all the errors ξ_i


and multiplying the sum by C, before adding it to the expression that the quadratic optimization problem solver will try to minimize.

• Linear: K(x_i, x_j) = x_i^T x_j

• Polynomial: K(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0

• Radial Basis Function: K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²), γ > 0

• Sigmoid: K(x_i, x_j) = tanh(γ x_i^T x_j + r)

In the formulas above, γ, r and d are kernel parameters for the specific kernels. I will not further explain the reasoning behind why these kernels are the most widely used. In general, the Radial Basis Function (RBF) is the most common choice, since the RBF kernel resembles a Gaussian bell curve in the number of dimensions requested, and for reasons outside the scope of this article the Gaussian bell curve is a very useful tool in many situations.
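As a concrete illustration of the kernel trick mentioned above, the degree-2 polynomial kernel can be evaluated directly in input space and compared with an explicit feature map φ into the higher-dimensional space. The minimal sketch below (the feature map ordering and the small test vectors are my own illustrative choices, not taken from this project) shows the two computations agreeing:

```python
import numpy as np

def phi(x):
    # explicit feature map matching the degree-2 polynomial kernel
    # K(x, y) = (x . y + 1)^2 for 2-D input
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, y):
    # the same quantity computed without ever leaving the input space
    return (x @ y + 1.0) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
# poly_kernel(x, y) equals phi(x) @ phi(y), with no explicit 6-D mapping
```

This identity is what lets the optimization problem in A.1 be solved using only kernel evaluations K(x_i, x_j), regardless of the dimensionality of F.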

There exist a number of libraries containing implementations that can be used for convenience, such as LibSVM [15] and SVMlight [25]. These libraries are widely used, since they help alleviate the complexity of solving the quadratic optimization problem efficiently and correctly.

The LibSVM library was first developed in the year 2000, and since then the library has grown to include a number of different algorithm implementations and a great number of extensions and ports to different programming languages. It contains implementations of the following: C-support vector classification (C-SVC), nu-support vector classification (nu-SVC), epsilon-support vector regression (epsilon-SVR), nu-support vector regression (nu-SVR), distribution estimation (one-class SVM) and multi-class classification. Due to the large algorithm support, as well as the test and training sets included with the library, many academic articles and projects use it in a variety of situations [37][31][26][38].

The one-class SVM is an unsupervised approach (type 1) and the C-SVC is a supervised approach (type 3) where the classes are defined as normal and anomalous. It is generally known that it is easier to construct an accurate learner if a supervised approach can be taken, since the learner then has information regarding what the user wants the output to look like (that is, what the correct classification is for any given point). The learner can then go back, change parameters and values, and try to provide a better accuracy percentage. This is not the case with unsupervised approaches, since the system has no way of knowing whether it has classified something correctly or not. To put it in the scope of this scenario, the one-class SVM has a parameter that is proportional to the fraction of the data that the system will classify as outliers. This parameter decides whether 1% of the dataset will be classified as outliers, or 10%, or 50%, depending on the value chosen [35]. Which points will be classified as outliers depends on other parameters, such as the kernel used, any kernel parameters, and the distribution of the training data.
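This behavior can be sketched with the scikit-learn implementation, which wraps LibSVM; there the parameter is called nu, and it upper-bounds the fraction of training points treated as outliers. The 2-D feature values below are synthetic stand-ins, not data from this project:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# synthetic "normal" events in a 2-D feature space
rng = np.random.default_rng(0)
normal_events = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu upper-bounds the fraction of training points labeled as outliers
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(normal_events)

pred = clf.predict(normal_events)          # +1 = normal, -1 = outlier
outlier_fraction = float((pred == -1).mean())
```

With nu = 0.1, at most roughly ten percent of the training set ends up on the outlier side of the boundary; exactly which points those are depends on the kernel, its parameters, and the data distribution, as described above.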

For the one-class SVM to perform well, the outliers detected must be the events that constitute a real alarm, rather than a movement outside a window or any


other sort of false alarm. More mathematically, there needs to be a significant enough difference between these outliers and other, more normal events. Due to the small number of dimensions available, there may very well be occasions where the differences are too small, something that will greatly hinder the application of a mathematical model such as the one-class SVM. The two-class SVM fares somewhat better, but still has similar problems depending on the parameters used and the peculiarities of the training data.

At the start of the project, when the SVM approach was first considered, the idea of scaling the diagonals had not yet been thought of nor implemented. As such, the implementation of SVMs within this project was focused toward detecting anomalous motion detection events rather than detecting anomalous diagonal sizes. An option for future work can therefore be to test the performance of SVMs on the updated scope of the article, that of detecting anomalous diagonal sizes as in section 3.4. As implementation and testing of the support vector machines continued, in both the one-class and the two-class case, another problem was noticed. When training the system, if there exist no data instances showing real alarms, that is, the events the system should find, then the learners might very well find other outliers and thus completely miss the most important events.

The most likely reason for this behavior is that the boundary created by the algorithm has the maximized margin between a number of support vectors, and in the one-class SVM these support vectors will be defined as outliers [35]. Depending on the parameter mentioned above, the system will find a number of support vectors that will be defined as outliers by virtue of being closer to the origin than the general population of data points. Since the support vectors placed during training maintain the boundary and the margin, if there are no real alarms present in that data, then those points will not be among those chosen as support vectors. The hyperplane will then be placed elsewhere, perhaps within what should be considered the normal class, and the classification will not be what we wish for in this specific scenario.

A.2 Clustering

Approaching the problem of anomaly detection by way of clustering can be called a type 1 approach, if one follows the definition mentioned in section 1.2. Hodge and Austin [21] claim that unsupervised clustering algorithms need all data to be available when training is to be done, and that the data needs to be static, since the system is analogous to a batch-processing system. Chandola et al. present two assumptions that help us grasp the nature of clustering-based algorithms [13].

“Assumption 1: Normal data instances belong to a cluster in the data, while anomalies do not belong to any cluster.”

“Assumption 2: Normal data instances lie close to their closest cluster centroid, while anomalies are far away from their closest cluster centroid.”


While the first assumption seems more intuitive than the second at first glance, the first assumption requires that all normal data instances belong to a cluster, something that can, depending on the algorithm chosen, cause problems with clusters of very low density being formed. Many of the more well-known algorithms fall under the second assumption, such as Self-Organizing Maps (which can be considered both a clustering algorithm and a neural network) and k-means clustering [27][19]. Kohonen has thoroughly explored the subject of Self-Organizing Maps as a solution to detecting anomalies in a semi-supervised mode [28]. By first clustering the data, and then measuring the distance to the closest cluster centroid and using that distance as an anomaly score, these algorithms make use of assumption two to provide a ranking of possible outliers.

To use these types of algorithms in a semi-supervised mode, the training data is first clustered as mentioned above using the two steps. After that, instances are taken from the test data, and the anomaly score is calculated in the same way and compared with the anomaly scores of the clusters, giving the system a comparative value for deciding whether the data instance should be considered anomalous or not. Since the training data is labeled as normal, defining normal clusters, the system operates in a semi-supervised mode.
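A minimal sketch of this semi-supervised scheme, using a plain k-means clustering step (Lloyd's algorithm with a simple farthest-point initialization; the synthetic data and helper names are illustrative assumptions, not the thesis implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
# labeled-normal training data forming two well-separated clusters
train = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
                   rng.normal(5.0, 0.3, size=(100, 2))])

def kmeans(X, k, iters=20):
    # farthest-point initialization avoids empty clusters here
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):  # Lloyd iterations: assign, then re-estimate means
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centroids

centroids = kmeans(train, k=2)

def anomaly_score(x):
    # assumption 2: distance to the closest cluster centroid
    return float(np.min(np.linalg.norm(centroids - x, axis=1)))
```

A test instance whose score is far above the scores seen for the training data would then be flagged as anomalous.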

A.2.1 Computational Complexity

The achievable computational complexity of training an anomaly detection algorithm based on clustering depends heavily on the underlying clustering algorithm, since that is where all the essential work lies. If pairwise distance computation between all data pairs is required, then the algorithm will have quadratic training complexity. There are, however, a number of heuristic techniques that can be used to achieve linear or near-linear complexity, for example k-means or algorithms that use approximations [19] [27] [18].

A.2.2 Advantages and Disadvantages

Techniques based on clustering can operate in an unsupervised mode (type 1) as well as a semi-supervised mode, the latter allowing for a faster algorithm as well as a better result in general. The testing phase is also quick with clustering-based algorithms, due to the fact that any data instance need only be tested for membership in a small number of clusters. However, if the chosen clustering algorithm is not able to capture the clusters within the data, then the performance of the anomaly detection algorithm will suffer greatly, due to its high dependency on such clusters being found.

As mentioned above, some clustering algorithms require that every normal data instance belong to a cluster, which can lead to large clusters with low density and large false positive errors. Chandola et al. note that performing the clustering can be expensive, with a complexity of O(n² · m), where n is the number of data instances and m the dimensionality [13].


A.3 Nearest Neighbor

According to Chandola et al., nearest neighbor based anomaly detection techniques work under an assumption similar to those of the other algorithms discussed in this article [13].

“Assumption: Normal data instances occur in dense neighborhoods, while anomalies occur far from their closest neighbors.”

Based on this assumption, it is intuitive to consider some way of measuring the distance between two data instances, to determine whether a data instance is far from another instance. This distance can also be thought of as a similarity measure between two instances: if the distance between them is small, then so should the difference between the two data instances be. In one-dimensional data, this can simply be the difference between the value of the first point and the second point, but in high-dimensional space there is a need for a more complex calculation to determine how similar one instance is to another. For continuous attributes that are not too complex, the Euclidean distance (see equation 2.10) is a well-used distance measure, but there are other choices if the data is a bit more complex, for example the Mahalanobis distance (see equation A.2) mentioned by Hodge and Austin [21].

√((x − µ)^T C^{−1} (x − µ))    (A.2)

The Mahalanobis distance A.2 above calculates the distance from a point to a center µ defined by correlations within the dataset. Such correlations are given by a covariance matrix C, and since correlations are taken into account, the Mahalanobis distance can give a better result in some cases where the Euclidean distance has problems. The Mahalanobis distance is, however, much more computationally expensive than the Euclidean distance, since it requires a pass through the dataset to build the covariance matrix and identify any attribute correlations. The Euclidean distance, as a comparison, only compares one point with another using the vector distance.
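A small numeric sketch of this difference (the correlated data is synthetic, and the covariance matrix is estimated with the pass over the dataset that equation A.2 requires):

```python
import numpy as np

rng = np.random.default_rng(2)
# strongly correlated 2-D data
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

mu = data.mean(axis=0)
C_inv = np.linalg.inv(np.cov(data, rowvar=False))  # pass over the dataset

def mahalanobis(x):
    # equation A.2: sqrt((x - mu)^T C^-1 (x - mu))
    d = x - mu
    return float(np.sqrt(d @ C_inv @ d))

along = np.array([2.0, 2.0])     # follows the correlation
against = np.array([2.0, -2.0])  # violates the correlation
```

Both points are equally far from the origin in Euclidean terms, yet the point that violates the correlation structure receives a much larger Mahalanobis distance, which is precisely why the measure can succeed where the Euclidean distance struggles.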

The algorithm often termed k-nearest neighbor is often viewed as a black box prediction engine, since it is highly unstructured and requires little knowledge about the data. Hastie describes the nearest-neighbor technique as one of the best performers on real data problems, and considers it to work reasonably well on low-dimensional problems, but also mentions that the k-nearest neighbor algorithm does not manage high-dimensional space well, due to a bias-variance trade-off and the time complexity [20]. The basic nearest neighbor algorithm functions as a type 3 approach, that is, a semi-supervised approach where the training set contains the normal class, and any subsequent query is either close enough to its neighbors to be classified as normal, or too far, in which case the point is classified as an outlier. Hodge and Austin [21] do, however, refer to type 1 approaches using variations of the nearest neighbor algorithms [11].
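A minimal type 3 sketch along these lines (the training data, the choice k = 5, and the percentile-based threshold are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
train = rng.normal(0.0, 1.0, size=(300, 2))  # the normal class only

def knn_score(x, k=5):
    # distance to the k-th closest training point (brute force, O(n) per query)
    d = np.linalg.norm(train - x, axis=1)
    return float(np.sort(d)[k - 1])

# calibrate a threshold on the training data itself
# (each training point is its own nearest neighbor at distance zero)
train_scores = np.array([knn_score(p) for p in train])
threshold = np.percentile(train_scores, 99)

def is_outlier(x):
    return knn_score(x) > threshold
```

Queries that land inside the dense training neighborhood fall under the threshold and are classified as normal, while distant queries are flagged as outliers.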


A.3.1 Computational Complexity

The nearest neighbor class of algorithms, being similar to the clustering class, suffers from similar problems when it comes to computational complexity, where the complexity is directly proportional to both the dimensionality m and the number of data instances in the training set n [21]. Many algorithms based on the k-nearest neighbor algorithm end up having a complexity of O(n² · m) when training, and O(n · k) when querying whether a point is part of the class or not. There have, however, been a number of optimizations to the k-nearest neighbor algorithm that claim to bring the average case down to close to linear [6] using pruning. Both Chandola et al. [13] and Hastie [20] also mention other optimizations based on pruning, which allow for optimizing the speed both when training and when querying.

Such pruning also eases the second problem of the nearest neighbor algorithm: the need to keep the training set in memory, or otherwise available, to answer queries. Since the decision boundary in the basic k-nearest neighbor algorithm depends solely on the underlying training set and the distances to the query point, there is a need to keep the training set in memory, which can take considerable space. Pruning the training set allows the system to keep only the points needed to maintain the boundary.

A.3.2 Advantages and Disadvantages

The basic nearest-neighbor algorithms are very effective in many low-dimensional cases, able to function and predict well as a black box without much information and with few parameters to set. The algorithm itself is very simple, and a number of optimizing pruning rules can be applied to it. The downside is that the algorithm has a high time complexity both while training and querying without any optimization, as well as a high space requirement due to the need to keep the training set in memory even after training. The algorithm does not support on-line learning out of the box, but once a sufficient training set has been collected, queries and additions to the training set can be performed in an on-line fashion.

A.4 Neural networks

Neural networks come in many different shapes, and as such have very varied uses. They are in general non-parametric, and they generalize well on patterns they have not seen [21]. There are options for both supervised and unsupervised approaches, thus various neural networks can be used as any of the approaches mentioned in section 1.2. Some supervised and semi-supervised networks will be discussed below, as well as some notes regarding the self-organizing map. The self-organizing map is also discussed in section 2.2.


A.4.1 Supervised and Semi-supervised Neural Networks

The most basic of the supervised neural network approaches, the Multi-Layered Perceptron (MLP), has a number of issues that in many cases make it unfit for anomaly detection. The MLP is a feed-forward network that uses hyper-planes to separate data into classes; it interpolates well, but cannot extrapolate very well, and as such can have problems classifying new data in a region where no data has been seen previously [21], [20]. There are some variations that use the fact that MLPs do not extrapolate well to detect novel data, such as Bishop [7], who uses it to monitor oil pipeline flows. The MLP classifier also requires a number of runs through the training data for the weights in the system to settle before being ready to accept queries, something that can make the system unfit for situations where training performance requirements are high or where the training data set is large.

As an extension of the MLP classifier, there exists a classifier called the Radial Basis Function (RBF) classifier, which uses hyper-ellipsoids instead of hyper-planes to separate the data [4], and therefore tends to converge more quickly due to the more powerful hyper-ellipsoids. The RBF classifier can be used as a type 3 approach to novelty detection, and can also be adapted to provide incremental additions to classes and data [10].

Chandola [13] mentions using a replicator neural network, a multi-layer feed-forward neural network which has been used for one-class anomaly detection. By using the same number of input and output nodes as there are features (dimensions in our case) in the data, and by training the system to compress the data within three hidden layers, the data can then be reconstructed by the system when similar data is presented to the input during testing. If the data is similar enough to a previously trained data instance, then the output value will be the previously trained data instance, and by using the reconstruction error between the input and the output as an anomaly score, it can be decided whether a data instance is close enough to the previously trained data to be considered normal.

This replicator network is similar to the auto-associative neural network (AANN) explored by Hwang and Cho [23] and mentioned by Hodge and Austin [21], among others; another feed-forward perceptron-based network that can be used where type 3 approaches are needed. The AANN functions by decreasing the number of available hidden nodes during training, which introduces a bottleneck. The bottleneck forces the system to rely on as few hidden nodes as possible, which reduces redundancy, making the system focus on the key features in the data. The AANN then tries to recreate the inputs during testing, and if the given input is far from the previously trained data instances, the recreation error will be high and an anomaly will have been found.
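The reconstruction-error idea shared by the replicator network and the AANN can be sketched with a simple linear bottleneck: a one-component PCA projection standing in for the trained hidden layers. This is only a stand-in chosen for brevity (a real replicator network or AANN learns a non-linear mapping), and the data and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
# training data lying close to a 1-D subspace of R^3
t = rng.normal(0.0, 1.0, size=200)
train = np.column_stack([t, 2.0 * t, -t]) + rng.normal(0.0, 0.05, size=(200, 3))

# "bottleneck": keep only the top principal direction
mu = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
P = Vt[:1]                       # 1 x 3, the single retained component

def reconstruction_error(x):
    z = (x - mu) @ P.T           # encode through the bottleneck
    x_hat = z @ P + mu           # decode back to input space
    return float(np.linalg.norm(x - x_hat))

familiar = np.array([1.0, 2.0, -1.0])   # fits the learned structure
novel = np.array([1.0, -2.0, 1.0])      # violates it
```

The familiar input is reconstructed almost perfectly, while the novel one produces a large reconstruction error; that error is exactly the anomaly score described above.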

A third supervised neural network that also has the auto-associative property is the Hopfield network, a fully connected recurrent network using only +1/−1 weights [21][32]. Hopfield nets are based on the way the human brain stores memories and discerns familiarity. Hopfield nets differ from the above nets in that they apply training inputs to all nodes simultaneously, instead of spreading


slowly through several iterations, which shows the computational efficiency of a Hopfield net where training is concerned. Due to this, Hopfield nets can also deal well with high-dimensional inputs, as well as with large training sets to be stored. [16] presents a novelty detection algorithm fit for type 3 approaches using a Hopfield net, where the energy calculated from the net is used to determine whether a data instance is novel, and if so, it is classified as anomalous. This works since Bogacz et al. [8] show that the energy for a pattern which has been learned by the Hopfield network is −(N/2) plus noise, where N is the number of neurons in the network, while the energy for a novel random pattern is zero plus a similar noise term. [16] mentions that, based on this, a threshold of E < −(N/4) is normally used for classification of patterns in this way.
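The energy argument can be checked numerically with a standard Hebbian Hopfield formulation. Note that the weights here are normalized by N with a zero diagonal, a common textbook variant chosen so that a stored pattern's energy lands near −N/2 as stated; the pattern size is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
stored = rng.choice([-1, 1], size=N)

# one-shot Hebbian learning: all weights set simultaneously
W = np.outer(stored, stored) / N
np.fill_diagonal(W, 0.0)

def energy(s):
    # standard Hopfield energy E(s) = -1/2 s^T W s
    return float(-0.5 * s @ W @ s)

novel = rng.choice([-1, 1], size=N)
# energy(stored) is close to -N/2, energy(novel) close to zero,
# so the threshold E < -(N/4) separates the two cases
```

With this normalization the stored pattern's energy is exactly −(N − 1)/2, while the random pattern sits near zero, well above the −(N/4) threshold.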

A.4.2 Computational Complexity

The running time during training for neural networks depends on various parameters, but mainly the learning rate. The learning rate dictates how much a single input during training influences the weights. If the learning rate is high, then fluctuations may occur, but if the learning rate is low, then the system will be slow to converge. Depending on the type of neural network and the number of nodes and layers, the ideal learning rate and number of runs (epochs) for convergence will vary. The MLP is generally relatively slow to converge; the RBF helps the case somewhat with the use of the more powerful hyper-ellipsoids instead of the rigid hyper-planes [20]. The Hopfield net is, as mentioned above, very fast at learning new patterns, since all the weights of the Hopfield network are updated simultaneously when a new pattern is learned.

Kohonen [28] mentions some ways of speeding up the SOM calculations using pointers to tentative winners, which reduce the number of comparison operations from quadratic, when performing learning through exhaustive search, to linear when using the pointers. While that speeds up the SOM, it is still not as quick as the Hopfield net, due to the procedures involved in training a SOM as well as in querying the system [21].

A.4.3 Advantages and Disadvantages

All of the neural networks mentioned above have had novelty detection or anomaly detection algorithms presented for them, and as such all can provide a measure of success. Many of the auto-associative networks have similar advantages, in that they can present a piece of data when given a similar or incomplete input mapping to that data. Replicator networks, Hopfield networks and similar auto-associative networks map certain key features to a specific output, and when given those key features, or similar ones, the system remembers that it has seen the features before and outputs the remembered data, thus giving a good measure for testing the distance between previously trained data and the newly given input. This makes auto-associative networks useful for novelty and anomaly detection, given that one


has information about the normal class (type 3 approach). Neural networks have relatively few parameters to consider; however, those parameters might need to be fine-tuned to provide the best performance.

In many cases, the entire training data set has to be iterated through a number of times for the neural network to converge, one of the exceptions being the Hopfield network, which for this reason is deemed one of the better novelty detection solutions. The fact that the energy calculation of the Hopfield network can be used to perform queries makes Hopfield networks even better in this regard.

Self-Organizing Maps have the big advantage that they can be used as an unsupervised approach (type 1), where they provide a powerful tool due to their ability to reduce high-dimensional inputs to one or two dimensions. SOMs can therefore be used in combination with other algorithms after the dimension reduction has been done. Regardless of this, their computational complexity can limit their uses, especially if a complex distance measure is used.

