Expert Systems with Applications 39 (2012) 4511–4531
Contents lists available at SciVerse ScienceDirect
Expert Systems with Applications
journal homepage: www.elsevier.com/locate/eswa

Mining anomalous events against frequent sequences in surveillance videos from commercial environments

Fahad Anwar a, Ilias Petrounias b,*, Tim Morris c, Vassilis Kodogiannis d

a Wolfson Molecular Imaging Centre, University of Manchester, Manchester M20 3LJ, UK
b Manchester Business School, University of Manchester, Manchester M15 6PB, UK
c School of Computer Science, University of Manchester, Manchester M13 9PL, UK
d School of Electronics and Computer Science, University of Westminster, London W1W 6UW, UK

Article info
Keywords: Knowledge discovery; Data mining; Sequential pattern mining; Periodicity mining; Surveillance videos; Business intelligence; Video mining; Anomalous events mining

Abstract
In the UK alone there are currently over 4.2 million operational CCTV cameras, that is, roughly one camera for every 14 people, and this figure is increasing at a fast rate throughout the world (especially after the tragic events of 9/11 and 7/7) (Norris, McCahill, & Wood, 2004). Security concerns are not the only factor driving the rapid growth of CCTV cameras. Another important reason is access to hidden knowledge extracted from CCTV footage for effective business decision making, such as store design, customer services, product marketing, reducing store shrinkage, etc. Events occurring in observed scenes are one of the most important semantic entities that can be extracted from videos (Anwar & Naftel, 2008). Most of the work presented in the past is based upon finding frequent event patterns or deals with discovering already known abnormal events. In contrast, in this paper we present a framework to discover unknown anomalous events associated with a frequent sequence of events (AEASP), that is, to discover events which are unlikely to follow a frequent sequence of events.

This information can be very useful for discovering unknown abnormal events and can provide early actionable intelligence to redeploy resources to specific areas of view (such as a PTZ camera or the attention of a CCTV user). Discovery of anomalous events against a sequential pattern can also provide business intelligence for store management in the retail sector. The proposed event mining framework is an extension of our previous research work presented in Anwar et al. (2010) and also takes the temporal aspect of anomalous events against a frequent sequence of events into consideration, that is, it discovers anomalous events which hold for a specific time interval only and might not be anomalous events against the frequent sequence of events over the whole time spectrum, and vice versa. To confront the memory-expensive process of searching all the instances of multiple sequential patterns in each data sequence, an efficient dynamic sequential pattern search mechanism is introduced. Different experiments are conducted to evaluate the accuracy and performance of the proposed algorithm for mining anomalous events against frequent sequences of events.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: Ilias.Petrounias@manchester.ac.uk (I. Petrounias).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.09.134

1. Introduction

The proliferation of TV channels and video-based surveillance systems has enabled us to store almost every activity that mirrors our world. This generates huge volumes of data, too much for human operators to process; therefore there is a great need for automated multimedia content analysis (Norris, McCahill, & Wood, 2004). Motivated by the success of sequential pattern mining approaches in analysing transactional data, a significant amount of research effort has been devoted to applying these techniques to multimedia data to unearth hidden, interesting information. In multimedia surveillance videos, sequential patterns are frequent sequences of events with temporal/sequential ordering. We can define the sequential pattern as below.

A sequential pattern is a set of events $SP = (e_1, e_2, e_3, \ldots, e_n)$ with temporal and sequential ordering whose representation in a database exceeds the user-defined parameter min_supp (minimum support) (Agrawal et al., 1995). Support for a sequential pattern is the fraction of data-sequences in the database that support (i.e. contain the events of) the given sequential pattern in the same order:

$$SP = \langle e_1, e_2, \ldots, e_n;\ \mathit{min\_supp} \rangle$$

$$\langle \text{Car in entrance area} \rightarrow \text{Car in parking road} \rightarrow \text{Car turn left};\ 60\% \rangle$$
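The support definition can be illustrated with a short sketch. It assumes the usual convention from Agrawal and Srikant's formulation: a data-sequence supports SP when it contains SP's events in the same order, with other events allowed in between. The function names and the toy database below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def supports(data_sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the events of `pattern` appear in `data_sequence` in the same order (gaps allowed)."""
    remaining = iter(data_sequence)
    # `event in remaining` consumes the iterator up to the first match,
    # so each pattern event must be found after the previous one.
    return all(event in remaining for event in pattern)


def support(database: Sequence[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of data-sequences in the database that support the pattern."""
    if not database:
        return 0.0
    return sum(supports(seq, pattern) for seq in database) / len(database)


# Hypothetical toy database: one data-sequence per observed vehicle.
database: List[List[str]] = [
    ["Car in entrance area", "Car in parking road", "Car turn left"],
    ["Car in entrance area", "Car stopped", "Car in parking road", "Car turn left"],
    ["Car in entrance area", "Car in parking road", "Car turn right", "Car turn left"],
    ["Car in parking road", "Car in entrance area", "Car turn left"],
    ["Car in entrance area", "Car turn left"],
]
sp = ["Car in entrance area", "Car in parking road", "Car turn left"]

print(support(database, sp))  # 0.6
```

Three of the five sequences contain the three events in order, giving the 60% support used in the example above. The fourth sequence fails because its parking-road event occurs before the entrance event, and the fifth lacks the parking-road event altogether.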

The significance of the above-mentioned sequential pattern depends on the time it takes to complete the given sequential pattern. This feature is defined by the user-defined parameter sequence duration (SD). Therefore, for a sequential pattern to be supported by a data-sequence, it has to occur within the user-defined sequence duration.

$$SP = \langle e_1, e_2, \ldots, e_n;\ \mathit{min\_supp};\ SD \rangle$$

$$\langle \text{Car in entrance area} \rightarrow \text{Car in parking road} \rightarrow \text{Car turn left};\ 60\%;\ 2\ \text{minutes} \rangle$$
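The effect of SD on support can be sketched as follows. This is a minimal illustration under stated assumptions: each data-sequence is a time-ordered list of (timestamp in seconds, event label) pairs, and "occurring within SD" is taken to mean that the time from the pattern's first event to its last event is at most SD. The names `occurs_within` and `support_with_sd` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event label); sequences sorted by time


def occurs_within(seq: Sequence[Event], pattern: Sequence[str], sd: float) -> bool:
    """True if some in-order occurrence of `pattern` spans at most `sd` seconds."""
    if not pattern:
        return True
    for i, (t_start, label) in enumerate(seq):
        if label != pattern[0]:
            continue
        # Greedy matching from this start gives the earliest possible end time.
        k, t_end = 1, t_start
        for t, lbl in seq[i + 1:]:
            if k == len(pattern):
                break
            if lbl == pattern[k]:
                k, t_end = k + 1, t
        if k == len(pattern) and t_end - t_start <= sd:
            return True
    return False


def support_with_sd(database: Sequence[Sequence[Event]], pattern: Sequence[str], sd: float) -> float:
    """Fraction of data-sequences containing `pattern` within the sequence duration `sd`."""
    if not database:
        return 0.0
    return sum(occurs_within(seq, pattern, sd) for seq in database) / len(database)


ENTRY, PARK, LEFT = "Car in entrance area", "Car in parking road", "Car turn left"
sp = [ENTRY, PARK, LEFT]
database: List[List[Event]] = [
    [(0, ENTRY), (30, PARK), (70, LEFT)],                  # 70 s
    [(0, ENTRY), (50, PARK), (200, LEFT)],                 # 200 s: too slow for SD = 2 min
    [(0, ENTRY), (100, ENTRY), (150, PARK), (190, LEFT)],  # 90 s from the second entry
    [(10, PARK), (20, LEFT)],                              # pattern not present
    [(0, ENTRY), (60, PARK), (110, LEFT)],                 # 110 s
]

print(support_with_sd(database, sp, sd=120))  # 0.6 (SD = 2 minutes)
print(support_with_sd(database, sp, sd=300))  # 0.8 (the slow sequence now counts)
```

Trying every occurrence of the first event as a start matters: in the third sequence only the second entry event yields an occurrence short enough. For a fixed start, matching each later event as early as possible minimises the end time, so the greedy inner loop is sufficient.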

Most of the work presented in the past was based upon finding sequential patterns to provide a hierarchical structure of events for video retrieval and indexing. Research into surveillance videos mainly deals with discovering already known abnormal events. In contrast, the work presented in this paper is focused on the problem of discovering unknown anomalous events against known sequential patterns (AEASP), that is, events that are unlikely to follow a frequent sequence of events. This information can be very useful for discovering unknown abnormal events and can provide early actionable intelligence to redeploy resources to specific areas of view (such as a PTZ camera or the attention of a CCTV user). Discovery of anomalous events against a sequential pattern can also provide business intelligence for store management in the retail sector. The importance of anomalous events against a frequent sequence of events can be seen from the following examples.

Suppose we have the frequent sequence of events: a vehicle on the road → a vehicle on the parking road → a vehicle in the parking place ⟨sequence duration 03 minutes⟩. This sequence becomes more interesting if it is followed by an unlikely event such as the vehicle on the parking road again; this might be a hit-and-run accident. Therefore, the attention of the CCTV user needs to be diverted to the specific video stream. It is important to note that an anomalous event isolated from the specific sequence of events could be just a normal event (Fig. 1).

In a retail store environment, suppose we have the following frequent sequence of events: section D is crowded → high customer activity in section A → long queues on tills 05 and 06. Although this information provides business intelligence to assign resources to tills 05 and 06, it does not indicate which resources can be spared to supply tills 05 and 06. But if we had the additional information that high activity on tills 1 & 2 is unlikely to follow the above sequence of events, then information about this anomalous event could be used for effective store management (Fig. 2).

In a till scanning process at a retail store, a well-known sequential pattern of events is (scanning of all the items → End Transaction → Total). This normal frequent sequence of events becomes abnormal (an interesting event) if it is followed by the event (scanning an item), as this may indicate attempted fraud. Here again, it is important to note that scanning an item is a perfectly normal event in isolation from the above sequential pattern of events.

We can define the AEASP as an event that is unlikely to follow a specific frequent sequence of events (a sequential pattern).

Definition 1

$$\mathrm{AEASP}:\ SP \rightarrow \mathit{Event}$$

Fig. 1. Anomalous event against frequent sequence of events. [Figure labels: Vehicle; Road, Parking Road, Parking Place; frequent sequence of events (sequential pattern); anomalous event against frequent sequence of events (AEASP).]

Fig. 2. Anomalous event against a frequent sequence of events. [Figure labels: Retail Store; Entrance; Sections A, B, C and D; sequential pattern; AEASP.]

The remainder of the paper is structured as follows. Section 2 presents related work and identifies how it differs from the work presented in this paper. In Section 3 a comprehensive problem definition framework is presented. Section 4 focuses on the discovery process of anomalous events against the given sequential patterns, which is followed by algorithm implementation and validation in Section 5. Section 6 summarises the research efforts carried out in this paper, followed by a discussion on future work. Section 7 lists all the abbreviations in order of presentation for the ease of the reader; a list of the abbreviations and acronyms used in the paper can also be found in the Appendix.

2. Related work

When we see a video, we are basically interested in the visual and audio parts of the video, such as one or more entities in the video that are acting, or sounds produced by some entities, either speaking or making other noises. Moreover, when something is written on the screen, we are interested in the textual part of the video as well. All of these visual, audio and textual strings form multimedia events. The main focus of multimedia event mining approaches is to apply data mining techniques to explore the characteristics and relationships of these different multimedia contents and to discover interesting events in multimedia data (Öztarak and Yazici, 2006).

Due to the ever-growing number of installations of visual surveillance systems to monitor traffic flow on roads, many researchers have developed methods to detect events in such videos. The main objectives of these efforts are to yield unknown knowledge, such as vehicle classification/identification, traffic flow and the spatio-temporal relationships of different objects in the field of view. The techniques presented in Cucchiara, Piccardi, and Mello (2000), Dailey, Cathey, and Pumrin (2000), Huang et al. (1994), Kamijo, Matsushita, Ikeuchi, and Sakauchi (2000) and Dieter et al. (1994) follow a two-step process: first, objects in the videos are segmented through low-level vision algorithms; then the behaviour of these objects is analysed to gain information for important decision making. Most of the work to analyse traffic video in the past was based on low-level feature extraction. However, in recent years researchers have also focused on unsupervised image segmentation and object modelling to capture spatio-temporal relationships among objects (Shu-Ching et al., 2000, 2001a, 2001b). Shu-Ching et al. in Chen et al. (2001) proposed a framework to discover and capture the spatio-temporal relationships among vehicles through unsupervised video segmentation and tracking. A multimedia augmented transition network (MATN) and multimedia input strings were used to model these relationships and explore the hidden information, such as accident events, vehicles making a U-Turn, etc. Fig. 3 provides an example of MATN and multimedia-string-based modelling. Here, G represents a target object and C means car. Hence, G1&C13&C10 represents that the first car is on the top left of G, and C10 means the second car is on the left of the target object/vehicle.

Oh and Bandi (2002) identify the importance of data mining activity for video indexing and propose a framework for indexing unstructured video contents. In the first step of their proposed indexing framework, a background frame is extracted from a given sequence for preprocessing and its colour histogram is computed. Then, for each subsequent frame, the foreground region is identified via background subtraction. Next, these frames are categorised according to the level of motion detected in them. In the last step, temporal information is used to index frames within each category. This process provides a hierarchical structure from a sequence based on these categories, which are not independent from each other. Work presented in Oh and Bandi (2002) was later extended in Jung-Hwan et al. (2003) by introducing a technique for automatic measurement of the overall motion not only in two consecutive frames but also in the entire shot (collection of frames). In their proposed technique, the location of motion in each frame is used for better categorisation of video contents. The Average Motion Matrix (AMM) was calculated for each shot, then a multi-level hierarchical clustering approach was implemented to group segments in terms of their category and motion. The algorithm is implemented in a top-down fashion, where the feature category is utilised at the top level. In other words, the algorithm groups segments into K1 clusters according to the categories. Ghanem et al. proposed a system for mining surveillance video in Ghanem et al. (2004). The main focus was to define a high-level query language for reasoning about spatial and temporal relations of background regions and moving entities, and about human activities. Another useful contribution was a powerful graphical interface where users can formulate visual queries.
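The background-subtraction and motion-categorisation steps described for Oh and Bandi (2002) can be sketched generically. This is not their implementation: the colour-histogram step is omitted, frames are assumed to be small grayscale grids, and the pixel threshold, category boundaries and category names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Frame = List[List[int]]  # grayscale frame, intensities 0-255 (assumed representation)


def foreground_fraction(frame: Frame, background: Frame, pixel_thresh: int = 30) -> float:
    """Share of pixels that differ from the background frame by more than pixel_thresh."""
    total = changed = 0
    for row_f, row_b in zip(frame, background):
        for p_f, p_b in zip(row_f, row_b):
            total += 1
            if abs(p_f - p_b) > pixel_thresh:
                changed += 1
    return changed / total if total else 0.0


def motion_category(fraction: float, low: float = 0.05, high: float = 0.25) -> str:
    """Bucket a frame by its amount of foreground (hypothetical thresholds)."""
    if fraction < low:
        return "static"
    return "moderate" if fraction < high else "high"


def index_by_motion(frames: List[Frame], background: Frame) -> Dict[str, List[int]]:
    """Group frame numbers by motion category, keeping temporal order inside each group."""
    index: Dict[str, List[int]] = defaultdict(list)
    for t, frame in enumerate(frames):
        index[motion_category(foreground_fraction(frame, background))].append(t)
    return dict(index)


background = [[100] * 4 for _ in range(4)]
frames = [[row[:] for row in background] for _ in range(3)]
frames[1][0][0] = frames[1][0][1] = 200  # 2 of 16 pixels change
for r in range(2):
    frames[2][r] = [200] * 4             # 8 of 16 pixels change

print(index_by_motion(frames, background))
# {'static': [0], 'moderate': [1], 'high': [2]}
```

The resulting dictionary is a two-level index: motion category first, then frame numbers in temporal order, which mirrors the category-then-time structure described above.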

In Turaga et al. (2007) a linear dynamic system (LDS) on optical flow features is presented for surveillance action events. The LDS iterates between model learning and sequence segmentation. Their proposed algorithms learn the model parameters from a video stream and then segment a single video sequence into different clusters, where each cluster represents an event; moreover, they present a technique to build affine, view and rate invariance of the activity into the distance metric for clustering. In Hamid et al. (2005, 2007) n-grams and suffix trees are used to mine movement patterns from CCTV footage collected using ceiling-mounted cameras. They proposed representing an activity as a bag of event n-grams, which can capture the global structure of the activity using its local event statistics. The detected patterns enable the detection of frequent events, such as a Fedex delivery, as well as anomalous events, such as a truck driving away with its back door open. Petrushin in Valery (2005) proposed an effective approach to detect frequent and rare events in a multicamera surveillance environment by using multilevel self-organizing map (SOM) clustering on the foreground pixel distribution in colour and spatial location. Another important element of their approach was the anchoring of visualisation and event browsing with the summary frame. Hanning and Kimber (2006) presented a multi-camera framework for detecting unusual events in surveillance videos. Their proposed framework uses two-stage training to generate a probabilistic model for the usual events, and extracts unusual events by thresholding the likelihood of a test event generated by the usual event model (Xie, Sundaram, & Campbell, 2008).

Fig. 3. MATN and multimedia input strings for modelling the key frames of traffic video shots (Chen et al., 2001).
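The likelihood-thresholding idea attributed above to Hanning and Kimber (2006) can be illustrated with a deliberately simple stand-in: a categorical model over event labels, learned from usual events with add-one smoothing. This is not their two-stage multi-camera model; the class name, smoothing scheme, threshold and event labels are all hypothetical.

```python
from collections import Counter
from typing import Iterable


class UsualEventModel:
    """Categorical model of 'usual' event labels with add-one (Laplace) smoothing."""

    def __init__(self, training_events: Iterable[str]):
        self.counts = Counter(training_events)
        self.total = sum(self.counts.values())
        # One extra slot reserves probability mass for labels never seen in training.
        self.vocab_size = len(self.counts) + 1

    def likelihood(self, event: str) -> float:
        """Smoothed probability of `event` under the usual-event model."""
        return (self.counts[event] + 1) / (self.total + self.vocab_size)

    def is_unusual(self, event: str, threshold: float) -> bool:
        """Flag an event whose likelihood under the usual-event model falls below threshold."""
        return self.likelihood(event) < threshold


model = UsualEventModel(["walk"] * 50 + ["stand"] * 45 + ["run"] * 5)
for e in ["walk", "run", "fall"]:
    print(e, round(model.likelihood(e), 4), model.is_unusual(e, threshold=0.02))
```

With a threshold of 0.02 only the never-seen "fall" is flagged, while the rare but seen "run" is not; raising the threshold to 0.1 would flag "run" as well. The threshold therefore controls the trade-off between missed and spurious unusual events.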

In Toshev et al. (2006), Alexander et al. proposed an Apriori-based model to discover frequent complex events from surveillance video. Simple events, such as a man standing or a car parked, were used as input to the model. Two main challenges were confronted in their model: how to determine the similarity between two events/classes of events, and how to handle uncertainty in video data. To calculate the similarity between two patterns they introduce the concept of a similarity measure. Here, $sim^{(m)}(p_1, p_2)$ denotes the similarity measure of order m between two m-patterns $p_1$ and $p_2$. To confront the problem of uncertainty in video data, they introduce a Weak-Apriori property, which decreases the support threshold for shorter patterns in order to prevent losing sub-patterns of frequent patterns. A manually created description of the data and results is presented in Fig. 4.

Each flow displays a sequence of primitive events of a vehicle or person in a zone, with the zone name and object type given. The occurrence refers to data descriptions. The discovered complex events are marked in bold with their rank to the right (Toshev et al., 2006).

Due to the importance of news broadcasts, there has been much research effort on news video analysis. Zhang, Tan, Smoliar, and Yihong (1995) base their work on the anchorperson position in order to split the news into independent subjects. This model fits a type of news where the anchorperson and camera position do not change much. Mohan (1996) proposes to segment TV news by synchronising images with the associated closed captions or teletext. Chen and Faudemay (1997) present multi-criteria video segmentation based on image and sound analysis. In Chen et al. (2003), an effective data mining framework for automatic extraction of goal events in soccer video was presented. The proposed framework fully exploits the rich semantic information contained in visual and audio features of soccer video data and incorporates the data mining process for effective detection of soccer goal events. Zhu et al. (2005) recognise the significance of semantic information for the video summarisation process. A framework was introduced to confront the challenges arising from undefined relationships among video contents. In their approach, different processing techniques were used to find visual and audio clues in sports videos (e.g., court field, camera motion activities, and applause), and then associations among those clues were explored. Discovery of associations was based on domain expert knowledge or on manual observation, such as that applause follows a match point.

Chen et al. (2007) confronted the problem of video event detection by proposing a hierarchical temporal association mining model. Their proposed model consists of three main components: feature extraction for video and audio streams, hierarchical temporal association mining, and sequential pattern mining. Firstly, the video is divided into shots, then visual and audio features are extracted from these shots. The extracted shot-level features are then provided as input to the extended association rule mining algorithm to discover the temporal patterns (important for characterising the events). In the last step, the already discovered patterns are used in the sequential pattern mining process to discover the events of interest. Another important contribution of their work was an adaptive mechanism to determine the thresholds of minimum support and confidence level during the discovery of association rules and sequential patterns.

Lawrence and Yi-Jen (2008) proposed a video content management system for sports-related videos. The system consists of three main modules: shot ontology definition, feature extraction and video indexing. To classify shots into the two main categories of long shot and short shot, they utilised the ratio of the skin-coloured and court-coloured portions of video shots. After extracting low-level features, they applied statistical analysis to motion, score board changes and shot type to segment the videos into events such as goal, foul, free throw, etc. They validated their system on different basketball-related videos.

In the Informedia project (Haupmann and Witbrock, 1998; Hauptmann and Smith, 1995), speech recognition and image analysis were used to extract content information and to build indices and a summary. Xingquan and Xindong (2003) identify that associations among video contents can provide the basis for video summarisation. Their approach consists of three steps: video preprocessing, association mining and summary creation. The main concept was that certain sequential patterns exist in structured video contents (such as sport broadcasts, movies and news videos). As shown in the dialog scene of Fig. 5, if we denote the actor by A, the actress by B and the shot that contains both of them by C, then all shots in the first row of Fig. 5 form the sequence ABACAB, and AB is a sequential pattern; therefore, by exploring this sequential pattern, effective summarisation can be done.
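Why AB stands out in the shot sequence ABACAB can be shown with a minimal illustration. It assumes that repetition is measured simply by counting pairs of consecutive shots; this is not Xingquan and Xindong's association-mining procedure, and the function name is hypothetical.

```python
from collections import Counter


def adjacent_pair_counts(shots: str) -> Counter:
    """Count each pair of consecutive shot labels, e.g. 'AB' for A followed directly by B."""
    return Counter(shots[i:i + 2] for i in range(len(shots) - 1))


counts = adjacent_pair_counts("ABACAB")
print(counts.most_common(1))  # [('AB', 2)]
```

Of the five consecutive pairs (AB, BA, AC, CA, AB), only AB recurs, which is what makes it a candidate pattern for summarising the dialog scene.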

    Fig. 4. Manually created description of thepplications 39 (2012) 45114531In Shirahama et al. (2007) a sequential pattern mining ap-proach was used for extracting semantic events in structured vid-eos (movie). Two types of temporal constraints (semantic eventboundaries and temporal localities) were introduced for effectiveltering of sequential patterns, which are unlikely to be semanticevents. The main contribution of their approach was to transformthe raw video data into multi-stream metadata, which not onlycharacterises the semantic events, but can also be used as inputto traditional sequential pattern mining algorithms designed fortransactional data. In their proposed multi-stream approach, dif-ferent aspects of multimedia contents were captured in order tobe used for sequential pattern mining process. The representationof these meta-string is given below (as described in Shirahamaet al. (2007)).

    CH, CS, CV: reect the background information of key frame inhue, saturation, and intensity axis.LN: represents the number of object visible in each key frame.LL: reects the feature set of shape of objects (other thanhumans) in key frame.LB: represents the dominant direction of straight lines in keyframe.SA: represents the size of the main character in the key frame.LA: reects if a weapon is present in the key frame of the shot.SL: stores information about the duration of the shot.MS: represents the movement of objects or background in ashot.MV: is the direction of moment (object or background).SM: stores the sound type information (such as speech, musicor no sound).AM: reects the loudest sound in a shot (such as gunshots,screams, etc.).

    data and results (Toshev et al., 2006).

  • models are utilised for video annotation. The approach presented

    Fig. 5. Transforming a video into a relational dataset (Xingquan and Xindong, 2003).

    F. Anwar et al. / Expert Systems with Applications 39 (2012) 45114531 4515in Teredesai et al. (2006) combined low level image features (col-our, orientation, intensity) with corresponding text annotationsto generate association rules across multiple tables using multi-relational association rule mining. The multi-relational algorithmis basically an extension of the FP-Tree algorithm to discoverassociation rules more effectively. Motivated by the presence of in-ter-concept association relationships and inter-shot temporaldependency Ken-Hao, Ming-Fang, Chi-Yao, Yung-Yu, and Ming-Syan (2008) presented a post-ltering framework for semanticconcept detection in videos. The proposed framework applies asso-ciation mining algorithms for the discovery of inter-concept asso-ciation rules from annotation. Furthermore, a temporal lter wasproposed to explore the inter-shot temporal dependency for im-proved detection accuracy (Ken-Hao et al., 2008).

    2.1. Related research contributions from the data mining domain

    Mining sequential patterns is one of the most extensively re-searched areas in the data mining research community. Agrawaland Srikant rst introduced the problem of discovering sequentialpatterns in 1995 (Agrawal & Srikant, 1995). The problem was tond all frequent sequential patterns from a given customer trans-action database. They proposed three different algorithms: Aprior-iAll, AprioriSome and AprioriDynamic. Since then, a lot of work andimprovements to the original algorithm have taken place (Heikki,Hannu, & Verkamo, 1997; Jiawei et al., 2000; Pei et al., 2001;Srikant and Agrawal, 1996; Zaki, 2001). Most of the previous workAn example of a multi-dimensional categorical stream S whereonly the occurrence of a 4-pattern p4 = (A2, nil), (C3, parallel), (C4,serial), (E1, serial) from the time point t = 1 to t = 4 satises bothSEB and T DT time constraints (see Fig. 6) (Shirahama et al., 2007).

    In Tseng et al. (2006) applied association rules and sequentialpattern mining approaches on both low-level feature and high-le-vel semantic rules for effective video annotation. Later on in Tseng,Ja-Hwung, Jhih-Hong, and Chih-Jen (2008) they extend their workby incorporating the speech features into the model. The modelconsists of three stages: preprocessing, training and prediction.During the training stage four kinds of models for video annota-tions were constructed namely: ModelCRM, ModelVasso, ModelVSeqand ModelSasso. In the prediction stage these already constructedFig. 6. An example of a multi-dimensional cat2.2. Evaluation and discussion

    Most of research efforts presented in Cucchiara et al. (2000),Dailey et al. (2000), Huang et al. (1994), Kamijo et al. (2000), Dieteret al. (1994), Shu-Ching et al. (2000, 2001a, 2001b), and Chen et al.(2001) focused on mining trafc related data and were mainly con-cerned with modelling and retrieving different known events, suchas accidents, U-Turns, line crossings, etc. In Jung-Hwan et al. (2003)and Oh and Bandi (2002) research efforts were concentrated onindexing unstructured video contents. However, only low levelinformation (motion intensity and location) was used for indexingand summarisation processes. The event mining approaches pre-sented in Turaga et al. (2007), Hamid et al. (2005, 2007), Valery(2005) and Hanning and Kimber (2006) used unsupervised eventdiscovery methods to discover usual/unusual events from singleand multicamera environments. Different computational modelsare used, such as clustering algorithms, HMM and coupled HMMmodels, dynamic graphical models. The event mining approachespresented in Turaga et al. (2007), Hamid et al. (2005, 2007), Valery(2005) and Hanning and Kimber (2006) mainly concentrated onpresented in the area of sequential patterns is based upon ndingthe frequent sequential patterns having Positive Behaviour, i.e.they predict what will be the next event/event-set in a sequence.For example, Customers who purchase a HD-TV are likely tobuy a SkyHD box within the next 30 days, (HD-TV? SkyHDBox, 30 days) is a sequential pattern with positive behaviour.However, an aspect of sequential patterns, which can be very help-ful in multimedia mining system has not been fully explored. Thisaspect is to nd the events, which have anomalous propertiesagainst the given sequential pattern, i.e. discovering the eventswhich are unlikely to follow a sequence of events. 
There have beensome research efforts to mine both positive and negative associa-tion rules together (Cornells et al., 2006; Maria-Luiza and Osmar,2004; Xindong, Chengqi, & Shichao, 2004). However they ignorethe temporal aspects of events (temporal order of events in a pat-tern). The work presented in Kazienko (2008), concentrated on dis-covering the negative conclusion for a given sequential pattern inpost mining environment. The concept is that if there is a fre-quently observed sequence fS then elements of set X usually donot occur after that sequence.egorical stream (Shirahama et al., 2007).

    … sequences of events which exist on the boundary of two data sequences. Furthermore, the algorithms presented in Kazienko (2008) and Anwar and Petrounias (2005) discover the items/itemsets having a negative conclusion for only one given sequential pattern at a time, whereas the work presented in this paper can …

    we introduce the dynamic sequential pattern search mechanism

    … discovering the usual/unusual events from raw multimedia surveillance footage, whereas the work presented in this paper focuses on mining already detected events to extract hidden information from them. Similar to our work, the approach presented in Toshev et al. (2006) falls into the category of a post-event detection model; however, it does not explain how temporal aspects can be modelled during the problem definition of the mining process. Furthermore, that approach discovers frequent complex events, whereas in surveillance videos the unusual events are more valuable.

    The mining efforts presented in Zhang et al. (1995), Mohan (1996), Chen and Faudemay (1997), Chen et al. (2003), Zhu et al. (2005), Chen et al. (2007), Lawrence and Yi-Jen (2008), Haupmann and Witbrock (1998), Hauptmann and Smith (1995) and Xingquan and Xindong (2003) are mainly based on already known relationships and are also tightly domain-specific; hence it is not feasible to use these frameworks for videos from other domains. The work presented in Shirahama et al. (2007), Tseng et al. (2006, 2008), Teredesai et al. (2006) and Ken-Hao et al. (2008) provides excellent frameworks for indexing and mining video information by utilising semantic information; however, these approaches are based purely on structured data. The transformation of raw multimedia data into metadata streams presented in Shirahama et al. (2007) is a solid concept, as this data can be used with any association rule or sequential pattern mining algorithm developed for a transaction database. However, the metadata presented in that approach is very much focused on structured data (more specifically, on movies). Moreover, the experiments show that the discovery of semantic events was mainly due to the structured nature of movie contents. Although the work presented in Ken-Hao et al. (2008) applied mining algorithms to processed data, it is mainly focused on video retrieval and discovering frequent events.

    Most of the work presented above is based upon finding frequent association patterns to provide a hierarchical structure of the events in structured videos for video retrieval and indexing. The research efforts dedicated to unstructured videos mainly deal with discovering already known abnormal events. In contrast, the work presented in this paper focuses on the problem of discovering unknown anomalous events associated with a frequent sequence of events (AEASP), that is, discovering events which are unlikely to follow a frequent sequence of events. Furthermore, we also propose an event mining framework which explores the relationship between entity feature-sets and associated text strings to generate appearance models of event entities automatically. The research approach suggested in our work differs from the Pre-event detection approach used in most previous research, as it is based upon a Post-event detection environment. In a Pre-event detection environment, the mining methods do not rely on previous knowledge about the data, whereas in a Post-event detection environment, mining algorithms utilise already detected event results and the perceptions of domain experts or already discovered patterns. An advantage of such an approach is that the proposed method can be integrated with any surveillance system or multimedia event mining method.

    The work presented in Anwar et al. (2010), Kazienko (2008) and Anwar and Petrounias (2005) is similar to our concept of discovering anomalous events against a given frequent sequence of events (presented in this paper); however, these approaches are specific to transactional databases, as compared to the multimedia data addressed in our work. Due to the nature of multimedia data, the AEASP discovery process is more intensive. Unlike in transaction data, where support is counted for each customer data sequence, with multimedia data all frequent sequences of events …

    … need to be searched, and then events with anomalous properties have to be discovered and ranked. Multimedia data also introduce the challenges of generating data sequences and handling frequent …

    … (DSPS_SM). DSPS_SM facilitates the discovery of all anomalous events against all the given sequential patterns in one database scan and optimises the AEASP discovery process considerably. In addition, as an extension of Anwar et al. (2010), our proposed mining framework also takes the temporal aspect of AEASP into consideration, that is, it discovers anomalous events which are true for a specific time interval only and might not be an AEASP over the whole time line, and vice versa.

    3. Problem definition framework

    One of the most important ingredients of any multimedia mining application is a flexible and comprehensive problem definition framework in which users can express the problem statement completely and easily. Due to the temporal nature of AEASP and sequential patterns, before discussing the problem definition framework in depth we introduce two basic temporal entities, the chronon and the interval. These temporal entities will be used for defining the temporal aspects of AEASP and sequential patterns.

    A chronon is an application-dependent, non-decomposable time interval of some fixed minimal duration, at which an event takes place. The granularity of a chronon varies with each application; for example, it can be a millisecond, a second or a minute. An interval, in contrast, is a non-empty set of contiguous chronons; the granularity of intervals in multimedia mining applications varies according to user preference. For example, the interval granularity can be a minute, an hour, a shift (set of hours) or a day (Chen, 1999).

    In a broader view, the problem of discovering all the AEASP for all of the given sequential patterns can be defined as:

    Table 1
    Search windows for each sequential pattern.

    SP   Search windows                                                 No. of SW
    1    {(1–3), (2–4), (3–5), (4–6), (5–7), (6–8), (7–9), (8–10)}     8
    2    {(1–8), (2–9), (3–10)}                                         3
    3    {(1–5), (2–6), (3–7), (4–8), (5–9), (6–10)}                    6

    … discover anomalous events against multiple sequential patterns simultaneously. Confronting multiple sequential patterns simultaneously increases the complexity of the AEASP discovery process considerably, because the algorithm has to find the existence of each given sequential pattern within its specific sequence duration (SD), the time limit during which the sequential pattern must exist. Therefore, the process of searching for each sequential pattern and then discovering the anomalous events can span multiple search spaces, with each given sequential pattern having its own search spaces; these search spaces are called search windows (SW). By SW we mean the limited search space in which the existence of each given sequential pattern has to be found. For example, if we have three given sequential patterns SP1, SP2 and SP3 with associated SDs of 3, 8 and 5 min, respectively, and the data-sequence length is 10 min, we have the multiple SWs for each given sequential pattern shown in Table 1.

    With reference to the above considerations, it is important to devise a mechanism which can minimise the expensive process of searching for the existence of multiple sequential patterns in each data sequence. To confront the above-mentioned complexities, …


    Definition 4.

    order to be considered as an anomalous event against the given

    $$A_{AEASP} = \{e_1, e_2, \ldots, e_n\}$$

    As we are dealing with two concepts, sequential pattern and

    AEASP, the suggested problem definition framework needs to be flexible enough to define both these concepts comprehensively and easily. Let us first discuss frequent sequences of events (sequential patterns) and see how we can define the problem of searching for all instances of given sequential patterns.

    A sequential pattern can be described as a set of events $SP = (e_1, e_2, e_3, \ldots, e_n)$ with temporal/sequential ordering satisfying the sequence duration (SD). SD is the time limit during which the sequential pattern must exist.

    $$SPSET = \{\langle SP_1, SD_1 \rangle, \langle SP_2, SD_2 \rangle, \ldots, \langle SP_n, SD_n \rangle\}$$

    Since the given sequential patterns can have different SDs, and their granularity can differ as well, the problem definition is further extended as:

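    $$SPSET = \{\langle SP_1, SD_1, GR_1 \rangle, \langle SP_2, SD_2, GR_2 \rangle, \ldots, \langle SP_n, SD_n, GR_n \rangle\}$$

    For example:

    $$\langle \text{Car in entrance area} \rightarrow \text{Car in parking road} \rightarrow \text{Car turn left},\ 2,\ \text{minutes} \rangle$$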

    For a better understanding of the problem definition of AEASP, we introduce four user-defined parameters, namely: Time Period (TP), Data Sequence Temporal Granularity (DSGR), Maximum Time Interval (MTITVL) and Maximum Tolerance (MTOL).

    3.1. Time period (TP)

    It is quite possible that the user is interested in discovering AEASP against a given sequential pattern which may only exist during a particular time period rather than over the complete time spectrum. The user-defined parameter TP reflects this concept: TP represents the time period during which AEASP need to be discovered. For example, we can say that an event E is an AEASP against a sequential pattern X during the time period from 1st Jan. 2009 to 30th March 2009. This information not only reduces the burden on the mining process by concentrating only on the user-defined segment of the event database, but TP can also be used to observe changes in existing patterns by comparing the mining results of different time periods.

    Definition 3.

    $$\langle TP, SPSET, CL_{AEASP} \rangle$$

    3.2. Data sequence temporal granularity (DSGR)

    Unlike a transactional database, where each data sequence is normally related to a specific customer or unique ID, a video events database has no specific segmentation of video events into data sequences. Detected video events are normally stored sequentially with temporal information, such as event E at 7:00 pm on 10th July, 2009. Hence, a logical segmentation of detected events is to divide them by some user-defined time granularity, which we call the Data Sequence Temporal Granularity (DSGR). One important property of the user-defined DSGR is that it should be higher (coarser) than the granularity of the given sequential patterns. For …

    We are given a time-stamped database of events D over a time domain T, a set of sequential patterns SPSET and a list of candidate AEASP ($CL_{AEASP}$). The goal is to discover all AEASP with reference to each given sequential pattern. $CL_{AEASP}$ can be all the events in the database or a user-defined list of events.

    Definition 2.

    $$\langle SPSET, CL_{AEASP} \rangle, \qquad SPSET = \{SP_1, SP_2, SP_3, \ldots, SP_n\}$$

    … sequential pattern (that is, the average support of a specific event over all data sequences is less than or equal to MTOL).

    3.5. Final problem definition

    We have a database of multimedia events D spanning a time domain T, where each record is a tuple ⟨EventID, Event start time, …

    … sequential pattern.

    Definition 6.

    $$\langle TP, SPSET, CL_{AEASP}, DSGR, MTOL \rangle$$

    $$SPSET = \{\langle SP_1, \langle SD, GR \rangle, \langle MTITVL, GR \rangle, MTOL \rangle, \ldots, \langle SP_n, \langle SD, GR \rangle, \langle MTITVL, GR \rangle, MTOL \rangle\}$$

    Since the video event database will be segmented according to DSGR, MTOL needs to express the percentage of an event's presence after the given sequential pattern during the user-defined MTITVL. Hence, if the MTOL parameter is set to 10%, then for an event to be considered as AEASP, its support for a given sequential pattern must not exceed 10%. By support we mean the presence of the event after the given SP within the user-defined MTITVL. For example, if a given SP is found 10 times in a data sequence and event E is found within the user-defined MTITVL after 3 of those instances, its support is 30%; it therefore cannot be considered as AEASP for this data sequence, since it exceeds the user-defined MTOL of 10%. However, it is possible that an event with more support than MTOL is still discovered as an overall anomalous event against that specific …

    $$\langle TP, SPSET, CL_{AEASP}, DSGR \rangle$$
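    The MTOL test can be illustrated with a short sketch. It is not the authors' implementation: it assumes each SP instance is summarised by its end time (SPET), that an occurrence of the candidate event counts for an instance when it falls in (SPET, SPET + MTITVL], and that the overall decision compares the average support over all data sequences with MTOL. The function names are hypothetical.

```python
from typing import Sequence


def support_after_pattern(sp_end_times: Sequence[float],
                          event_times: Sequence[float],
                          mtitvl: float) -> float:
    """Share of SP instances that the candidate event follows within MTITVL.

    An occurrence at time t counts for an instance ending at SPET when
    SPET < t <= SPET + MTITVL (assumed reading of the window).
    """
    if not sp_end_times:
        return 0.0
    hits = sum(
        1 for spet in sp_end_times
        if any(spet < t <= spet + mtitvl for t in event_times)
    )
    return hits / len(sp_end_times)


def is_aeasp(supports_per_ds: Sequence[float], mtol: float) -> bool:
    """Overall test: average support over all data sequences <= MTOL."""
    if not supports_per_ds:
        return False
    return sum(supports_per_ds) / len(supports_per_ds) <= mtol


# Worked example from the text: 10 SP instances, event E follows 3 of them.
sp_ends = [10 * i for i in range(10)]
support = support_after_pattern(sp_ends, [1, 11, 21], mtitvl=2)
print(support)                          # 0.3, which exceeds MTOL = 10%
print(is_aeasp([support], mtol=0.10))   # False for this data sequence
```

    Averaging per-sequence supports is what lets an event exceed MTOL in one data sequence and still qualify overall, e.g. supports of 0.3, 0.0, 0.0 and 0.0 average to 0.075.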

    3.3. Maximum time interval (MTITVL)

    The maximum time interval (MTITVL) is a time interval after the sequential pattern during which we have to find the presence of AEASP. This parameter is imperative, since we are not interested in an event which is unlikely to follow the given sequential pattern only after a relatively long period of time. For example, the information that the event "Vehicle on the parking road" does not normally follow the sequential pattern of events (a vehicle on the road → a vehicle on the parking road → a vehicle in the parking place) is only interesting within 0–2 minutes after the given sequential pattern. As the nature of each sequential pattern can be different, MTITVL needs to be defined for each given sequential pattern.

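    Definition 5.

    $$\langle TP, SPSET, CL_{AEASP}, DSGR \rangle$$

    $$SPSET = \{\langle SP_1, \langle SD, GR \rangle, \langle MTITVL, GR \rangle \rangle, \ldots, \langle SP_n, \langle SD, GR \rangle, \langle MTITVL, GR \rangle \rangle\}$$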

    3.4. Maximum tolerance (MTOL)

    Anomalous events against a given sequential pattern are events that have no or very limited existence between the end time of the sequential pattern and the end of the user-defined MTITVL. The user-defined parameter MTOL is introduced here to determine the maximum presence allowed for a detected event in …

    … example, if the granularity of the given sequential pattern is minutes, then DSGR cannot be set to seconds or minutes; it has to be hours or another coarser granularity.

    … Event end time⟩; a known set of sequential patterns $SPSET = \{SP_1, SP_2, SP_3, \ldots, SP_n\}$, where $SP = (e_1, e_2, e_3, \ldots, e_i, \ldots, e_n)$ is a set of events in sequential/temporal order ($e_1 < e_2 < \ldots < e_j < \ldots < e_n$) along with a sequence duration (SD); a list of candidate events for AEASP ($CL_{AEASP}$); and the user-defined parameters Time Period (TP), Data Sequence Temporal Granularity (DSGR), Maximum Tolerance (MTOL) and Maximum Time Interval (MTITVL). The problem to investigate here is to find all the events which are unlikely to follow each given sequential pattern while satisfying the user-defined parameters MTOL and MTITVL.

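    Definition 7.

    $$\langle TP, SPSET, CL_{AEASP}, DSGR, MTOL \rangle$$

    $$SPSET = \{\langle SP_1, \langle SD, GR \rangle, \langle MTITVL, GR \rangle \rangle, \ldots, \langle SP_n, \langle SD, GR \rangle, \langle MTITVL, GR \rangle \rangle\}$$

    $$\langle TP, SP_1, \langle SD, GR \rangle, \langle MTITVL, GR \rangle, \ldots$$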


    along with a pointer to A and the user-defined parameters of MTITVL

    F. Anwar et al. / Expert Systems with Applications 39 (2012) 4511–4531

    … the support of all $CL_{AEASP}$ during the user-defined time limit (MTITVL).

    4.1. Data pre-processing

    The data pre-processing phase consists of two steps, data filtering and data segmentation, that is, dividing the event database into data sequences. Firstly, we filter the data according to the user-defined parameter Time Period (TP). For example, if TP is defined as 1st January, 2009 to 30th March, 2009, then only the events detected during this period are retained for further processing. Secondly, we divide the events database according to the user-defined Data Sequence Temporal Granularity (DSGR). For example, if the …

    … However, during the rest of the day it is not an anomalous event …

    [Fig. 7 flowchart, AEASP discovery process: Data preprocessing → Dynamic sequential patterns search mechanism (DSPS_SM) → Support discovery process for all $CL_{AEASP}$ → Discovered AEASP for each given sequential pattern]

    … the given sequential patterns in each data sequence and then find the support of all $CL_{AEASP}$ during the user-defined time limit (MTITVL). Any sequential pattern can occur multiple times in any data sequence; therefore, we may have to find all the instances of …

    and MTOL to the next phase, in which it attempts to discover …

    $$\ldots, SP_n, \langle SD, GR \rangle, \langle MTITVL, GR \rangle, MTITVL, AEASP, DSGR, MTOL \rangle$$

    The problem definition illustrates that every discovered AEASP

    must have the following properties:

    AEASP start time should always be greater than the given sequential pattern end time (SPET), where SPET is the time of the last event in the given sequential pattern:

    $$AEASP_{ST} > SPET$$

    AEASP end time should always be less than or equal to the end of the user-defined MTITVL, which starts at SPET:

    $$AEASP_{ET} \le SPET + MTITVL$$
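    The two timing properties can be checked per candidate event as follows. This is a minimal sketch, assuming events carry start and end timestamps (as in the ⟨EventID, Event start time, Event end time⟩ records) and that MTITVL is measured from SPET; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def within_aeasp_window(event_start: datetime, event_end: datetime,
                        spet: datetime, mtitvl: timedelta) -> bool:
    """Check both AEASP timing properties for one candidate event.

    Assumes the allowed window is (SPET, SPET + MTITVL].
    """
    starts_after_pattern = event_start > spet         # AEASP_ST > SPET
    ends_inside_interval = event_end <= spet + mtitvl  # AEASP_ET <= SPET + MTITVL
    return starts_after_pattern and ends_inside_interval


spet = datetime(2009, 7, 10, 19, 0)
print(within_aeasp_window(spet + timedelta(seconds=30),
                          spet + timedelta(minutes=1),
                          spet, timedelta(minutes=2)))  # True
```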

    4. AEASP discovery process

    The discovery process of AEASP consists of three main phases: Data Pre-Processing, Searching the instances of multiple sequential patterns, and Discovery of the presence of $CL_{AEASP}$ (Fig. 7). The last two phases work alternately. The algorithm finds an instance of a given sequential pattern in the data sequence. Then the algorithm passes the end time of the found sequential pattern instance

    Fig. 7. AEASP discovery process.

    … against SPn; moreover, dividing the data into data sequences also reduces the amount of memory and processing power required by the proposed algorithm.

    A sequential pattern is a set of events with temporal and sequential ordering. Therefore, if we divide the data into different time intervals according to the user-defined temporal granularity (DSGR), we may find a sequential pattern instance that spans two data sequences. To capture such instances, we generate a flying data sequence by taking a portion of the current and previous data sequences (Fig. 9). The length, start position and end position of the flying data sequence can be calculated using the following equations:

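    $$\mathrm{FDS\_SP}(\mathrm{PSD\_P}) = \mathrm{PSD\_ET} - \mathrm{SD} + 1$$

    $$\mathrm{FDS\_EP}(\mathrm{CSD\_P}) = \mathrm{CSD\_ST} + \mathrm{SD} - 1 + \mathrm{MTITVL} - \mathrm{CSD\_ET}$$

    $$\mathrm{FDS\_L} = \mathrm{PSD\_ET} - \mathrm{FDS\_SP} + \mathrm{FDS\_EP}$$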

    PDS = Previous data sequence
    CDS = Current data sequence
    FDS_SP(PSD_P) = Flying data sequence start position in the PDS
    FDS_EP(CSD_P) = Flying data sequence end position in the CDS
    FDS_L = Flying data sequence length
    PSD_ET = Previous data sequence end time
    CSD_ST = Current data sequence start time
    CSD_ET = Current data sequence end time

    For example, if PSD_ET and CSD_ST are both 60, the sequence duration of the given sequential pattern is 10 and MTITVL is 5, then:

    $$\mathrm{FDS\_SP} = 60 - 10 + 1 = 51, \qquad \mathrm{FDS\_EP} = 60 + 10 - 1 + 5 - 60 = 14, \qquad \mathrm{FDS\_L} = 60 - 51 + 14 = 23$$
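    The flying data sequence bounds can be computed with a small helper. Note that the printed formula for FDS_EP ends in a CSD_ET term, while the worked example subtracts 60, which equals CSD_ST there. This sketch assumes the subtracted term is the start offset of the current data sequence, so that FDS_EP is a position relative to the CDS; that reading is our assumption, chosen because it reproduces the worked example (51, 14, 23). The names are hypothetical.

```python
from typing import NamedTuple


class FlyingSequence(NamedTuple):
    fds_sp: int  # start position, absolute, inside the previous data sequence
    fds_ep: int  # end position, relative to the start of the current data sequence
    fds_l: int   # length as defined by FDS_L = PSD_ET - FDS_SP + FDS_EP


def flying_data_sequence(psd_et: int, csd_st: int, sd: int,
                         mtitvl: int) -> FlyingSequence:
    """Bounds of the flying data sequence around a data sequence boundary."""
    # Reach back SD - 1 units before PSD_ET, enough to hold an SP instance
    # that starts in the previous data sequence and crosses the boundary.
    fds_sp = psd_et - sd + 1
    # Reach forward SD - 1 units plus MTITVL, then express the end relative
    # to the current data sequence (assumed: the subtracted offset is CSD_ST).
    fds_ep = (csd_st + sd - 1 + mtitvl) - csd_st
    fds_l = psd_et - fds_sp + fds_ep
    return FlyingSequence(fds_sp, fds_ep, fds_l)


print(flying_data_sequence(psd_et=60, csd_st=60, sd=10, mtitvl=5))
# FlyingSequence(fds_sp=51, fds_ep=14, fds_l=23)
```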

    4.2. Efficient dynamic sequential patterns search mechanism (DSPS_SM)

    Any sequential pattern can exist multiple times in each data sequence (DS). Therefore, the algorithm will likely have to find multiple instances of an SP in each DS. As the algorithm tries to discover anomalous events against multiple

    [Fig. 8 residue: "Database of events" → "Data sequence according to user-defined granularity", on time axes]

    … granularity is defined as hours, then we divide the filtered events database into different data sequences (DS), each of which contains all the events detected during that specific hour (Fig. 8).
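    The two pre-processing steps (TP filtering, then DSGR segmentation) can be sketched as below for DSGR = hours. The simplified `(EventID, start time)` record and the function name are assumptions for illustration; the paper does not prescribe a storage format.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Event = Tuple[str, datetime]  # simplified record: (EventID, event start time)


def segment_by_hour(events: List[Event], tp_start: datetime,
                    tp_end: datetime) -> Dict[datetime, List[Event]]:
    """Keep events inside TP, then form one data sequence per hour."""
    sequences: Dict[datetime, List[Event]] = defaultdict(list)
    for event_id, ts in sorted(events, key=lambda e: e[1]):
        if not (tp_start <= ts <= tp_end):
            continue  # outside the user-defined time period
        # Truncate the timestamp to the hour: this is the DS key.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        sequences[bucket].append((event_id, ts))
    return dict(sequences)


events = [("E", datetime(2009, 7, 10, 19, 0)),
          ("F", datetime(2009, 7, 10, 19, 45)),
          ("G", datetime(2009, 7, 10, 20, 5)),
          ("H", datetime(2008, 12, 31, 23, 0))]
data_sequences = segment_by_hour(events, datetime(2009, 1, 1),
                                 datetime(2009, 12, 31, 23, 59))
print(sorted(data_sequences))  # two hourly data sequences; H is filtered out
```

    Other granularities would only change the truncation step (e.g. truncating to the day instead of the hour).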

    This segmentation of event data into data sequences is important, as it enables us to discover AEASP which are true for specific time intervals only. For example, it is possible that event X is an anomalous event for SPn between 10 am and 11 am every day.

    Fig. 8. Data transformation.

  • Current Event (CEvent): the sequential pattern event whose support DSPS_SM is currently searching for.

  • Left Hand Side Events (ELHS): all the events of the given sequential pattern which are on the left side of CEvent.

  • Right Hand Side Events (ERHS): all the events of the given …


    [Fig. 9 residue: time line showing the current search window (CSW) and the flying data sequence (suppose SD is 10 min)]

    … sequential patterns, the complexity of searching all the instances of each sequential pattern in one database scan increases. This complexity arises because the algorithm has to find the instances of each given sequential pattern within its specific sequence duration. Therefore, the process of searching for the instances of each sequential pattern can span multiple search spaces, with each given sequential pattern having its own search spaces. We call these search spaces search windows (SWs). By SW we mean the limited search space in which we have to find the instance of each given sequential pattern. For example, if we have three given sequential patterns SP1, SP2 and SP3 with associated SDs of 3, 8 and 5 min, respectively, and the data-sequence length is 10 min, we have the multiple SWs for each given sequential pattern shown in Table 2.

    The length of any SW is equal to the given SD, and the number of SWs for a specific sequential pattern in a data sequence can be calculated from the following equation:

    $$\text{No. of SWs} = DSLEN - SDLEN + 1$$

    DSLEN = Data sequence length
    SDLEN = Sequence duration (SD) length
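    The window arithmetic can be checked by enumerating the windows directly. This minimal sketch assumes 1-based, inclusive positions and windows that slide by one granularity unit, which matches the ranges listed in Tables 1 and 2; the function name is hypothetical.

```python
from typing import List, Tuple


def search_windows(ds_len: int, sd_len: int) -> List[Tuple[int, int]]:
    """All search windows (start, end) of one SP, 1-based and inclusive.

    Each window is SD_LEN units long and slides by one granularity unit,
    so there are DS_LEN - SD_LEN + 1 windows (none if SD exceeds DS).
    """
    return [(start, start + sd_len - 1)
            for start in range(1, ds_len - sd_len + 2)]


# Table 1 / Table 2: a 10-min data sequence and SDs of 3, 8 and 5 min.
for sd in (3, 8, 5):
    print(sd, len(search_windows(10, sd)))   # 8, 3 and 6 windows
print(len(search_windows(60, 28)))           # 33
```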

    For example, if the SD for a given sequential pattern is 28 min and the length of the data sequence is 1 h (60 min), then the total number of SWs for this specific SP is 33 in each data sequence:

    $$DSLEN - SDLEN + 1 = 60 - 28 + 1 = 33$$

    In light of the above-mentioned challenges, it is important to formulate a mechanism which can minimise the expensive process of searching for instances of multiple sequential patterns in each data sequence. To confront these complexities, we introduce a dynamic sequential patterns search mechanism (DSPS_SM). DSPS_SM facilitates the discovery of all anomalous events against all the given sequential patterns in a single database scan and optimises the AEASP discovery process considerably. To elaborate the concept of DSPS_SM, we first need to discuss how we perceive the status of sequential pattern events during DSPS_SM and what we mean by a valid discovered event.


    Fig. 9. Flying data sequence.

    4.3. Sequential pattern events status during DSPS_SM

    In DSPS_SM, the events of a given sequential pattern are divided into three categories:

    Table 2
    Search windows for each sequential pattern.

    SP   Search windows                                                 No. of SW
    1    {(1–3), (2–4), (3–5), (4–6), (5–7), (6–8), (7–9), (8–10)}     8
    2    {(1–8), (2–9), (3–10)}                                         3
    3    {(1–5), (2–6), (3–7), (4–8), (5–9), (6–10)}                    6

    CSW(ET) = End time of current search window
    ELHS_LDT = Last discovered left hand side event time

    4.4. Principles of dynamic sequential patterns search mechanism

    The main idea of DSPS_SM is to exploit the sequence duration property of the given sequential patterns during the search process of all the given sequential patterns in a

    … sequential pattern which are on the right side of CEvent are known as ERHS.

    If the CEvent is the first event of a given sequential pattern then there will be no ELHS, and if the CEvent is the last event of a given sequential pattern then there will be no ERHS (Fig. 10).

    Valid discovered event

    For an event to be considered as a valid discovered event, it has to satisfy the following conditions:

  • The event time should be within the boundaries of the current search window (CSW):

    $$CSW_{ST} \le ET \le CSW_{ET}$$

  • The event time should be greater than the last discovered left hand side event (ELHS) time:

    $$ET > ELHS\_LDT$$

    where ET is the current event time and CSW(ST) is the start time of the current search window.


    Fig. 10. Sequential pattern events status in DSPS_SM.
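    The two conditions for a valid discovered event translate into a one-line predicate. In this sketch (hypothetical names), positions stand in for times, as in the worked example of Section 5, and `None` marks the case where CEvent is the first event of the SP and so has no ELHS.

```python
from typing import Optional


def is_valid_discovered_event(et: int, csw_st: int, csw_et: int,
                              elhs_ldt: Optional[int]) -> bool:
    """Check both conditions for a valid discovered event.

    elhs_ldt is None when CEvent is the first event of the SP (no ELHS).
    """
    inside_csw = csw_st <= et <= csw_et              # condition 1
    after_elhs = elhs_ldt is None or et > elhs_ldt   # condition 2
    return inside_csw and after_elhs


# First window of the worked example (positions 0-7): SP1's M is found at 03,
# so its second event A at 01 is rejected by the second condition.
print(is_valid_discovered_event(3, 0, 7, None))  # True
print(is_valid_discovered_event(1, 0, 7, 3))     # False
```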

    … single database scan. Moreover, DSPS_SM also utilises the temporal/sequential nature of sequential patterns to scan only the minimum required data set from the event database during the search process of the given SPs. In the following subsections we discuss both these concepts in more detail.

    4.5. SD properties for different sequential patterns

    Due to the sequence duration (SD) property of sequential patterns, the SP search process needs to be divided into different search windows (SWs). Each sequential pattern can have a different SD, which means a different size and number of SWs for each sequential pattern. This increases the complexity of discovering the support of all sequential patterns in one database scan. However, the following properties of SDs enable us to utilise the scanning result of each SW for multiple sequential patterns, which reduces the

    (ERHS) have no significance in the current search window (CSW). This is because, for a valid support of a given SP, every event has to be discovered within the CSW. Hence, the search for ERHS in the CSW is terminated and DSPS_SM moves to the next search window (NSW) (Fig. 12).

    Suppose we have a sequential pattern SP = (A → C → M → Y → C → B) and the support of events A and C is discovered in the CSW. If the support of event M is not discovered, then there is no need to search for the remaining events (Y, C, B) in the CSW, and DSPS_SM moves to the NSW (see Fig. 12).
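    Property 1 amounts to an in-order search that stops at the first event with no valid occurrence. The sketch below is one possible reading, not the paper's code: `positions` is a vertical event-to-positions view (in the spirit of SPEInfo), and picking the earliest valid occurrence of each event is our assumption.

```python
from typing import Dict, List, Optional, Sequence


def search_in_window(pattern: Sequence[str], positions: Dict[str, List[int]],
                     csw_st: int, csw_et: int) -> List[int]:
    """Greedy left-to-right search of one SP inside the current window.

    Returns the positions discovered so far. Stops at the first event with
    no valid occurrence (property 1): its ERHS cannot complete an instance
    in this window, so they are never searched.
    """
    discovered: List[int] = []
    last: Optional[int] = None
    for event in pattern:
        candidates = [p for p in positions.get(event, [])
                      if csw_st <= p <= csw_et and (last is None or p > last)]
        if not candidates:
            break  # property 1: skip the remaining ERHS, move to the NSW
        last = min(candidates)  # earliest valid occurrence (assumed policy)
        discovered.append(last)
    return discovered


# SP = (A -> C -> M -> Y -> C -> B): A and C are found, M is not, so Y, C, B
# are never searched in this window.
print(search_in_window("ACMYCB", {"A": [1], "C": [2, 6], "Y": [4], "B": [5]},
                       0, 7))  # [1, 2]
```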

    4.8. SP temporal/sequential property 2

    If the first already discovered left hand side event (ELHS) from the previous search window (PSW) is valid in the CSW, then all previously discovered ERHS of the PSW are valid in the CSW as well.
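    Property 2 lets a partial match carry over from the PSW with a single check on its first position. In this sketch (hypothetical names), windows have equal length and slide forward, so a later-discovered position can only fall outside the CSW if the first one does. The paper does not spell out the case where the first ELHS is no longer valid; here we simply discard the partial match and search again from the first event, which is our assumption.

```python
from typing import Dict, List, Optional, Sequence


def resume_search(pattern: Sequence[str], discovered: List[int],
                  positions: Dict[str, List[int]],
                  csw_st: int, csw_et: int) -> List[int]:
    """Continue a partial SP match in the next (expanded) search window."""
    # Property 2: checking the first discovered position is enough.
    kept = list(discovered) if discovered and discovered[0] >= csw_st else []
    last: Optional[int] = kept[-1] if kept else None
    for event in pattern[len(kept):]:
        candidates = [p for p in positions.get(event, [])
                      if csw_st <= p <= csw_et and (last is None or p > last)]
        if not candidates:
            break  # property 1 still applies
        last = min(candidates)
        kept.append(last)
    return kept


# Section 5.1: SP1's M (03) carries over into the window 0-10, A is found at
# 09, and B (04, 07) fails the second validity condition, so the search stops.
print(resume_search(("M", "A", "B", "M", "C"), [3],
                    {"M": [3], "A": [1, 9], "B": [4, 7]}, 0, 10))  # [3, 9]
```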

    4.9. Data structure

    By utilising the above-mentioned properties of sequential patterns and the temporal/sequential nature of a given sequential pattern, the proposed algorithm can search for all sequential pattern instances with only one database scan. To accomplish this, the algorithm needs a data structure which can hold the status of all events during the SP support discovery process, because we want to reuse the already discovered events of different sequential patterns. DSPS_SM processes the data with the following data structure elements.


    … burden of database scanning considerably.

    SD Property 1: The first search window (FSW) of the given SP which has the largest SD overlaps the first SWs of all other given sequential patterns (Fig. 11(a)).

    SD Property 2: The sequential pattern which has the largest SD may overlap multiple SWs of other SPs with a shorter SD, as shown in Fig. 11(b) and (c).

    SD Property 3: All search windows (SWs) can expand by a single granularity interval. For example, if the sequence duration granularity is defined as minutes, then all SWs can expand by a single minute (Fig. 11(d)).

    4.6. Sequential pattern temporal/sequential properties

    During the sequential pattern search process, a significant number of different SWs overlap with each other (Fig. 11(d)). Therefore, the scanned result of the overlapping portion of an SW can be reused when searching for sequential pattern instances in the next SW. This property enables the algorithm to expand the search dynamically; that is, it only searches for the next event of an SP in the CSW if required. To reuse the search results of the previous SW, DSPS_SM exploits the following temporal/sequential properties of sequential patterns:

    4.7. SP temporal/sequential property 1

    If DSPS_SM does not find the support for the current event (CEvent) of a given sequential pattern, then all the right hand side events

    [Fig. 11 residue, panels (a)–(d) on a time axis: "Suppose we have three sequential patterns with the following SD: (SP1, 7), (SP2, 2), (SP3, 4)"; panel labels SP1(SW1), SP2(SW1), SP3(SW1), "Multiple SW for SP2", "Multiple SW for SP3" and "SP2 Search Windows"]

    Fig. 11. Multiple search windows with sequence duration properties.

    Fig. 12. First temporal/sequential property.

  • Sequential pattern information (SPInfo): This data structure holds information about all the sequential patterns whose support needs to be searched. SPInfo is a mainly static data structure: it is populated at the start of the DSPS_SM discovery process, and only the SPSP and SPEP fields are updated once an instance of a specific SP is found (Table 3).

  • Search window information (SWInfo): This data structure holds information about the SWs of each sequential pattern. SWInfo is updated as new SWs are needed during the search process of all given sequential patterns (Table 4). By FEP and LEP we mean the first and last detected event positions in a specific search window.

  • Sequential pattern events information (SPEInfo): SPEInfo is a vertical transformation of the original database for the unique events of all the given sequential patterns. SPEInfo expands dynamically as the process shifts to the next search window (Table 5).

  • Current search event (CSE): CSE holds the list of events, for all the given SPs, which need to be discovered next. CSE informs DSPS_SM which events need to be searched for next in the CSW (Table 6).

  • SP discovered events information (SPDE): A separate SPDE is created for each given sequential pattern. SPDE holds the positions of all the events which have been discovered so far for that specific SP. Once all the events of a sequential pattern have been found, SPDE is reset to its initial state with no discovered events (Table 7).
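    The five data structure elements could be modelled as follows. This is a hypothetical sketch: the field names mirror Tables 3–7, but the concrete types (dataclasses, dicts and lists keyed by SPID) are our choices, not the paper's. `build_speinfo` shows the "vertical transformation": instead of a time-ordered list of events, each unique SP event maps to the positions where it occurs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class SPInfo:
    """Table 3: mostly static; only SPSP/SPEP change when an instance is found."""
    spid: int
    sp: Tuple[str, ...]         # SP event list
    sd: int                     # sequence duration
    spsp: Optional[int] = None  # SP start position (unset until found)
    spep: Optional[int] = None  # SP end position (unset until found)


@dataclass
class SWInfo:
    """Table 4: one row per search window of a given SP."""
    sw_no: int  # search window number for a specific SP
    spid: int   # SPID from SPInfo
    fep: int    # first event position
    lep: int    # last event position


def build_speinfo(sequence: Sequence[Tuple[int, str]],
                  sp_events: Sequence[str], upto: int) -> Dict[str, List[int]]:
    """Table 5: vertical view, unique SP event -> positions up to `upto`."""
    wanted = set(sp_events)
    info: Dict[str, List[int]] = defaultdict(list)
    for pos, event in sequence:
        if pos <= upto and event in wanted:
            info[event].append(pos)
    return dict(info)


def initial_cse(spinfo: Sequence[SPInfo]) -> Dict[int, str]:
    """Table 6: at the start, the event to search next is each SP's first event."""
    return {s.spid: s.sp[0] for s in spinfo}


def empty_spde(spinfo: Sequence[SPInfo]) -> Dict[int, List[int]]:
    """Table 7: one SPDE per SP, initially with no discovered positions."""
    return {s.spid: [] for s in spinfo}


# The three patterns of the worked example (Table 8) and their first SWs.
patterns = [SPInfo(1, ("M", "A", "B", "M", "C"), 8),
            SPInfo(2, ("B", "Q", "M"), 3),
            SPInfo(3, ("P", "Q", "A", "B"), 4)]
windows = [SWInfo(1, s.spid, 0, 7) for s in patterns]  # cf. Table 10
print(initial_cse(patterns))  # {1: 'M', 2: 'B', 3: 'P'}, as in Table 9
```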

    5. DSPS_SM algorithm validation

    For a better understanding of the DSPS_SM concept and algorithmvalidation we run through the algorithm with the followingexample:

    Suppose we have a sequence database D over a time domain T and three given sequential patterns $SPSET = \{SP_1, SP_2, SP_3\}$ with respective SDs (8, 3, 4). The problem to investigate is to find all

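    $$SP_1 = \{M \rightarrow A \rightarrow B \rightarrow M \rightarrow C,\ 8\}, \quad SP_2 = \{B \rightarrow Q \rightarrow M,\ 3\}, \quad SP_3 = \{P \rightarrow Q \rightarrow A \rightarrow B,\ 4\}$$

    In the first step, DSPS_SM populates SPInfo and CSE as per the information given in the above problem definition. Since we are at the start of the discovery process, CSE is populated with the first event of each SP (Tables 8 and 9).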

    Next, the algorithm populates SWInfo in accordance with the user-defined parameter SD for each sequential pattern. For simplicity, we assign ordered numeric values to events according to date/time and call this the position of the event during the example. It is important to note that the number of events within an SW can vary; however, all the SWs of a specific SP have the same length. Since SP2 has the shortest first search window (SW1), with the last event position (LEP) being 07 (see Fig. 13), the LEP for the first SWs of all the other SPs is also set to 07 in SWInfo (Table 10). As the SWs expand during the search process, the information about the first search window keeps updating until it reaches the sequence duration (SD) length of that specific SP. At that point, SWInfo is populated with new search window information for that specific SP.

    Table 7
    SPDE for each given SP.

    (a) SP1 (b) SPi (c) SPn

    E1 Ei En E1 Ei En E1 Ei En

    Pos Pos Pos Pos Pos Pos Pos Pos Pos


    … instances of multiple given sequential patterns in a data sequence:

    Table 3
    SPInfo.

    Field   Description
    SPID    Unique ID for each SP
    SP      SP event list
    SD      Sequence duration
    SPSP    SP start position
    SPEP    SP end position

    Table 4
    SWInfo.

    Field   Description
    SW#     Search window number for a specific SP
    SPID    SPID from the SPInfo data structure
    FEP     First event position
    LEP     Last event position

    Table 5
    SPEInfo.

    Event1      Event2      Eventi      Eventn
    Position1   Position1   Position1   Position1
    Positioni   Positioni   Positioni   Positioni
    Positionn   Positionn   Positionn   Positionn

    Table 6
    CSE.

    Field   Description
    SPID    SPID from the SPInfo data structure
    Event   Event to be searched

    DSPS_SM then populates SPEInfo by accessing the event positions of all the unique events in the given sequential patterns during the CSW. In

    Table 8
    SPInfo_1.

    SPID   SP                    SD   SPSP   SPEP
    1      M → A → B → M → C     8    –      –
    2      B → Q → M             3    –      –
    3      P → Q → A → B         4    –      –

    Table 9
    CSE_1.

    SPID   Event
    1      M
    2      B
    3      P

    [Fig. 13 residue: search windows SW1 and SW2 of SP1, SP2 and SP3 on a time axis, titled "SDs for different SP"]

    Fig. 13. SD of different given SP.

    the first SW, SPEInfo is populated up to event position 07. For example, event M appears at 03 and event B at 04 and 07 (Table 11, first row).

    Next, DSPS_SM populates the SPDE of the three given sequential patterns with valid events, taking into account the conditions for a valid discovered event (as discussed in the Valid discovered event section). DSPS_SM discovers the first event M of SP1 at position 03, which fulfils the valid discovered event conditions. Although event A is discovered (position 01) in the CSW, it does not satisfy the second condition of a valid discovered event (the event time should be greater than the last discovered ELHS time). Since the second event of SP1 is not discovered, DSPS_SM stops searching for any other events due to the first temporal/sequential property of SP (see the SP temporal/sequential property section), and SPDE for SP1 is populated as follows (Table 12):

The first and second events B and Q of SP2 are discovered at positions 04 and 05. Since both detected events fulfil the required conditions of valid discovered events, the algorithm updates the SPDE for SP2. Again, although event M (position 03) is discovered within CSW, it does not satisfy the second condition of a valid discovered event. Hence, SPDE for SP2 is populated as shown in Table 13.

The first event of SP3 is not found in CSW; therefore, SPDE for SP3 remains at its initial state (Table 14).

Table 10. SWInfo_1.
SW# | SPID | FEP | LEP
1 | 1 | 0 | 7
1 | 2 | 0 | 7
1 | 3 | 0 | 7

Table 11. SPEInfo_1.
M A B C Q P
3 1 4 2 52 7 3
10 9 8 8 10 99
13 11 12 14 111213

Table 12. SPDE_SP1_1.
Table 13. SPDE_SP2_1.

Table 14. SPDE_SP3_1.
SP3
P | Q | A | B
? | ? | ? | ?

5.1. First search window expansion

Before moving to the next search window, the CSE is updated with the next event of each SP to be searched (Table 15). Next, DSPS_SM populates SWInfo and SPEInfo using only the expanded portion of the new SW (Fig. 14), which contains the following six events (Table 16 and Table 11, second row):

A(8) P(9) A(9) M(10) B(10) Q(10)

Table 15. CSE_2.
SPID | Event
1 | A
2 | M
3 | P

The already discovered first event of SP1 is at position 03, which is within the CSW boundary for SP1 (positions 0–10); therefore, it is a valid discovered event. As per SP temporal/sequential property 2, we do not have to check the validity of the discovered ERHS, because they are valid automatically. The search for all the remaining events is now carried out only in the expanded portion of SPEInfo. Event A is discovered at position 09, which fulfils the valid discovered event conditions. Although event B is discovered in CSW, it does not satisfy the second condition of a valid discovered event. Hence, DSPS_SM terminates the search for any remaining events due to SP temporal/sequential property 1, and SPDE for SP1 is updated as shown in Table 17.

For SP2, the already discovered event B is at position 04 and the CSW boundary for SP2 is positions 03–10; therefore, it is a valid event. According to the second temporal/sequential property of SP, if the first discovered event of an SP is valid, we do not have to check the validity of the discovered ERHS events, because they are valid automatically. Now, support for the undiscovered third event of SP2, M, must be searched in the expanded portion of SPEInfo. DSPS_SM finds support for M at position 10 (Table 11, second row). Since it satisfies all the conditions of a valid discovered event and it is the last event of SP2, an instance of SP2 has been found successfully (Table 18). Next, DSPS_SM passes the end position of the discovered SP2 support, along with other user-defined parameters, to the Support discovery process for all CLAEASP, and DSPS_SM restarts the search process for the next instance of SP2.

DSPS_SM discovers events P and Q of SP3 at positions 09 and 10, which fulfil the valid discovered event conditions. Although event A is discovered in CSW, it does not satisfy the second condition of a valid discovered event; hence, SPDE for SP3 is populated as shown in Table 19.
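The window-expansion logic of Section 5.1 can be sketched as below. This is a simplified Python illustration that uses the same assumptions as the earlier sketches: positions stand in for times, and the earliest valid occurrence is taken. `resume_search` and its parameters are hypothetical names. If the first discovered event still lies inside the new CSW, property 2 lets the earlier discoveries stand and only the expanded portion is scanned. Otherwise, the search restarts over the whole new window. Scanning only the new portion is safe because the previous scan stopped exactly when no valid occurrence of the next event existed in the old window. The first usage call follows the SP2 walk-through (B at 04, Q at 05, M found at 10).

```python
from typing import Dict, List, Sequence, Tuple


def _extend(pattern: Sequence[str], speinfo: Dict[str, List[int]],
            lo: int, hi: int, found: Sequence[int]) -> List[int]:
    """Greedy in-order search for the remaining SP events within positions [lo, hi]."""
    result = list(found)
    while len(result) < len(pattern):
        last = result[-1] if result else lo - 1
        valid = [p for p in speinfo.get(pattern[len(result)], [])
                 if lo <= p <= hi and p > last]
        if not valid:
            break  # property 1: stop at the first undiscovered event
        result.append(min(valid))
    return result


def resume_search(pattern: Sequence[str], speinfo: Dict[str, List[int]],
                  new_window: Tuple[int, int], old_end: int,
                  discovered: Sequence[int]) -> List[int]:
    lo, hi = new_window
    if discovered and discovered[0] >= lo:
        # Property 2: earlier discoveries stay valid; scan only the expanded part.
        return _extend(pattern, speinfo, old_end + 1, hi, discovered)
    # The first discovered event fell out of the new CSW: restart over all of it.
    return _extend(pattern, speinfo, lo, hi, [])


sp2_positions = {"B": [4, 7], "Q": [5], "M": [3, 10]}
print(resume_search("BQM", sp2_positions, (3, 10), 7, [4, 5]))  # [4, 5, 10]
print(resume_search("BQM", {"B": [2, 6], "Q": [7], "M": [9]}, (3, 10), 5, [2]))  # [6, 7, 9]
```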

    5.2. Second search window expansion

After updating the CSE, DSPS_SM populates SWInfo and SPEInfo using only the expanded portion of the new SW (Table 20, Table 21 and Table 11, third row), which contains the following seven events:

A(11) Q(11) Q(12) B(12) M(13) Q(13) C(14)

The already discovered first event of SP1 is at location 03, which is within the CSW boundary for SP1 (0–14); therefore, it is a valid discovered event. Moreover, we do not need to check the validity of the discovered ERHS, because they are valid automatically (SP temporal/sequential property 2). The search for all the remaining events is now carried out only in the expanded portion of SPEInfo. Events B, M and C are discovered at positions 12, 13 and 14, and all of them fulfil the valid discovered event conditions. Since all the events of SP1 are discovered, as shown in Table 22, DSPS_SM passes the end position of the discovered SP1 instance along with other user-defined parameters to the Support discovery process for all CLAEASP, and DSPS_SM restarts the search process for the next instance of SP1.

The events B, Q and M of SP2 are discovered at positions 07, 11 and 13. Since all three detected events fulfil the required conditions, the second instance of SP2 is discovered in the third SW of SP2 (Table 23). DSPS_SM passes the end position of the discovered SP2 instance along with other user-defined parameters to the Support discovery process for all CLAEASP, and DSPS_SM restarts the search process for the next instance of SP2.

DSPS_SM discovers events A and B of SP3 at positions 11 and 12, which fulfil the valid discovered event conditions. Since event B is the last event of SP3, the first instance of SP3 is discovered in the second SW of SP3 (Table 24). Next, DSPS_SM passes this information to the Support discovery process for all CLAEASP and restarts the search process for the next instance of SP3.

At this stage of DSPS_SM, the values of SPEP can be seen in Table 25.

Fig. 14. Expanded portions of search windows. [Figure: SDs for different SPs along a time axis (ticks at 0, 3, 5, 7, 10 and 14), with the expanded portion of the new SW (SW2) marked.]

Table 16. SWInfo_2.
SW# | SPID | FEP | LEP
1 | 1 | 0 | 10
2 | 2 | 3 | 10
1 | 3 | 0 | 10

Table 17. SPDE_SP1_2.
Table 18. SPDE_SP2_2.
Table 19. SPDE_SP3_2.

Table 20. SWInfo_3.
SPID | Event
1 | B
2 | B
3 | A

Table 21. SWInfo_3.
SW# | SPID | FEP | LEP
1 | 1 | 0 | 14
3 | 2 | 7 | 14
2 | 3 | 3 | 14

Table 22. SPDE_SP1_3.
Table 23. SPDE_SP2_3.
Table 24. SPDE_SP3_3.

Table 25. Updated SPInfo_1.
SPID | SP | SD | SPSP | SPEP
1 | M → A → B → M → C | 8 | 3 | 14
2 | B → Q → M | 3 | 4, 7 | 10, 13
3 | P → Q → A → B | 4 | 9 | 12

5.3. Support discovery process for all CLAEASP (AEASP_DS)

Once DSPS_SM finds the instance of a specific given SP, it passes the end time/position of the discovered SP support to AEASP_DS, along with the user-defined parameters MTOL and MTITVL. The main purpose of AEASP_DS is to count the support of all the candidate events given in CLAEASP during a limited time span (based on the user-defined parameter MTITVL). We call this time span the interested time (ITIME), which can be defined as follows:

Interested time (ITIME) is the time interval between SPET (the end time of a searched sequential pattern instance) and MTITVL_ET (the end time of the user-defined parameter MTITVL).

For each sequential pattern instance found, there is a specific ITIME. ITIME is always equal to or greater than the end time of the sequential pattern instance found in the data sequence, and less than or equal to MTITVL_ET, as shown in Fig. 15. Since each given sequential pattern has its own MTITVL and their SPETs might differ, AEASP_DS may have to deal with different ITIMEs simultaneously.

Just like the search windows of different sequential patterns, the ITIMEs of different SPs can also overlap each other. Due to this property of ITIME, we can reuse the support discovery of CLAEASP between the ITIMEs of different SPs. In Fig. 16, the ITIME of SP2 overlaps the ITIMEs of SP1, SP3 and SP4. Therefore, event support counted in the ITIME of SP2 can also be used for SP1, SP3 and SP4. Furthermore, the ITIMEs of the same SP can also overlap each other; therefore, the support discovery of CLAEASP between ITIMEs of the same SP can be reused as well (see Fig. 17). For example, if the first ITIME is from 10 to 16 and the second ITIME is from 12 to 17, the support of CLAEASP between 12 and 16 can be reused.

Based on the AEASP discovered in all data sequences, the proposed algorithm then calculates the average support (as a percentage) of CLAEASP events against each sequential pattern and outputs the discovered anomalous events against each sequential pattern, that is, the events with average support less than the user-defined parameter MTOL. The algorithm also outputs AEASPs which are true for a specific time interval only and might not be AEASP over the whole time spectrum, and vice versa. For example, event X may be an anomalous event for SPn between 10 am and 11 am every day, while during the rest of the day it is not an anomalous event against SPn. To discover the periodicity of AEASP at the granularity level at which data sequences were divided, the proposed algorithm calculates the average support of CLAEASP events against each sequential pattern during each set of respective intervals, for example, the 1st hour of all 30 days, the 2nd hour of all 30 days, etc.

Fig. 15. Different ITIME for each sequential pattern. [Figure: ITIMEs of SP1, SP2 and SP3, and the support counted in each, along a time axis.]

Fig. 16. Overlapping ITIMEs for different SPs. [Figure: ITIMEs of SP1, SP2, SP3 and SP4 along a time axis; the ITIME of SP2 overlaps the other three.]

Fig. 17. Overlapping ITIMEs for the same SP. [Figure: 1st and 2nd ITIME for the 1st and 2nd existence of an SP along a time axis, with the common ITIME of both existences marked.]

6. Algorithm

The complete algorithm is given in Fig. 18.

Inputs
  DEVENTS   // already detected events dataset
  DSGR      // data sequence granularity
  TP        // time period
  SPSET     // set of known sequential patterns
  SD        // sequence duration of each SP
  MTITVL    // maximum time interval for each SP
  MTOL      // maximum tolerance
  CLAEASP   // candidate list for AEASP
Output
  Anomalous events against each given sequential pattern (AEASP)
Data structures
  SPInfo, SWInfo, SPEInfo, CSE, SPDE(s)
  AEASPInfo // stores information about discovered AEASP in each DS
Data filtering
  FDATA = dataFilter(DEVENTS, TP, DSGR);
Loop // loop till the last data sequence
  if (FDATA is empty) // all data sequences have been processed
    break;
  end;
  populateSWInfo(SWInfo, FDATA(n)); // populate search windows
  Loop // loop for all the given SP(s)
    // fetch values according to the current search window (CSW)
    [SPInfo, SPEInfo, CSE] = fetchValues(SWInfo, FDATA(n));
    Loop // check if the already discovered event is valid in CSW
      result = validEvent(SPDE(n));
      if (result == Yes)
        nextEvent2Search(SPInfo, CSE); // next event of SP to search
      else
        firstEvent(SPInfo, CSE); // populate CSE with the 1st event of SP
      end;
    endLoop;
    result = searchValidEvent(SPInfo, CSE);
    if (result == Yes)
      update(CSE, SPDE(n));
      if (SPDE(n) == CSE) // all the events of SP found
        update(SPInfo);
        initialise(CSE, SPDE(n));
        // discover AEASP for each SP
        AEASP_Fun(DEVENTS, AEASPInfo, SPInfo, MTITVL);
        break; // move to next SP
      end;
      break; // move to next SP
    end;
  endLoop; // move to next data sequence
  // conditional process if periodic AEASP is not required to be discovered
  if (CLAEASP(n) > MTOL)
    // remove the events having support > MTOL
    CLAEASP = removeInvalidEvents(AEASPInfo);
  end;
endLoop; // all data sequences have been processed
discoverAEASP(AEASPInfo);        // discover overall AEASP against each SP
discoverAEASP(AEASPInfo, TITVL); // discover AEASP for a specific time interval

Fig. 18. Algorithm for the discovery of AEASP for all the given SPs.

6.1. AEASP algorithm evaluation

The proposed algorithm to discover AEASP (see Fig. 18) for all the given sequential patterns was implemented in the Matlab environment, and experiments were conducted to evaluate the algorithm's accuracy and performance. The main inputs to the algorithm are already detected events and known frequent sequences of events (sequential patterns). These sequential patterns can be the perception of a domain expert, or they can be already discovered patterns obtained by using any of the sequential pattern discovery methods presented in the literature, or a combination of both.

Table 26. List of source sequential patterns.
SP # | Sequential pattern
1 | B → M → A → G
2 | F → B → M → A → G
3 | A → X → S → X
4 | B → H → Q → Y → Z → P → U
5 | P → Q → Y → Z → P → U
6 | P → Q → Y → Z → P
7 | Q → N → P → U
8 | E → G → M → Q
9 | P → U → Z → P
10 | P → V → P

Table 27. Evaluation of algorithm results on the first data sequence.
Columns: (1) SP; (2) # of SP instances detected by the algorithm; (3) # of SP instances checked manually; (4) # of SP instances not detected by the algorithm; (5) AEASP discovered by the algorithm and manually confirmed; (6) CLAEASP events whose existence > MTOL and manually confirmed.
(1) | (2) | (3) | (4) | (5) | (6)
1 | 09 | 09 | 0 | 03 | 05
2 | 05 | 05 | 0 | 02 | 06
3 | 07 | 07 | 0 | 06 | 02
4 | 02 | 02 | 0 | 06 | 02
5 | 02 | 02 | 0 | 05 | 03
6 | 03 | 03 | 0 | 05 | 03
7 | 07 | 07 | 0 | 04 | 04
8 | 05 | 05 | 0 | 04 | 04
9 | 02 | 02 | 0 | 07 | 01
10 | 03 | 03 | 0 | 03 | 05

Table 28. Evaluation of algorithm results on data sequences 2–5 (columns as in Table 27; flying data sequence results in parentheses).
(1) | (2) | (3) | (4) | (5) | (6)
Data sequence 02
1 | 05(0) | 05(0) | 0 | 05 | 03
2 | 01(0) | 01(0) | 0 | 04 | 04
3 | 01(0) | 01(0) | 0 | 07 | 01
4 | 02(0) | 02(0) | 0 | 08 | 0
5 | 01(0) | 01(0) | 0 | 08 | 0
6 | 01(1) | 01(1) | 0 | 07 | 01
7 | 06(0) | 06(0) | 0 | 07 | 01
8 | 05(0) | 05(0) | 0 | 02 | 06
9 | 05(0) | 05(0) | 0 | 07 | 01
10 | 07(0) | 07(0) | 0 | 03 | 05
Data sequence 03
1 | 03(0) | 03(0) | 0 | 04 | 04
2 | 01(0) | 01(0) | 0 | 07 | 01
3 | 03(0) | 03(0) | 0 | 07 | 01
4 | 01(0) | 01(0) | 0 | 07 | 01
5 | 03(0) | 03(0) | 0 | 05 | 03
6 | 04(0) | 04(0) | 0 | 01 | 07
7 | 04(1) | 04(1) | 0 | 07 | 01
8 | 02(0) | 02(0) | 0 | 06 | 02
9 | 06(0) | 06(0) | 0 | 04 | 04
10 | 03(0) | 03(0) | 0 | 04 | 04
Data sequence 04
1 | 03(0) | 03(0) | 0 | 04 | 04
2 | 01(0) | 01(0) | 0 | 05 | 03
3 | 03(0) | 03(0) | 0 | 07 | 01
4 | 00(0) | 00(0) | 0 | 0 | 0
5 | 00(0) | 00(0) | 0 | 0 | 0
6 | 01(0) | 01(0) | 0 | 07 | 01
7 | 02(0) | 02(0) | 0 | 08 | 0
8 | 01(0) | 01(0) | 0 | 05 | 03
9 | 04(0) | 04(0) | 0 | 05 | 03
10 | 05(0) | 05(0) | 0 | 04 | 04
Data sequence 05
1 | 03(0) | 03(0) | 0 | 03 | 05
2 | 01(0) | 01(0) | 0 | 06 | 02
3 | 04(0) | 04(0) | 0 | 06 | 02
4 | 02(0) | 02(0) | 0 | 05 | 03
5 | 03(0) | 03(0) | 0 | 05 | 03
6 | 03(0) | 03(0) | 0 | 04 | 04
7 | 04(0) | 04(0) | 0 | 05 | 03
8 | 05(0) | 05(0) | 0 | 05 | 03
9 | 06(0) | 06(0) | 0 | 06 | 03
10 | 01(0) | 01(0) | 0 | 06 | 02

Table 29. AEASP for each sequential pattern.

Table 30. Overall AEASP discovered from 40 data sequences.

6.2. Experimental data

We used synthetic data to evaluate the accuracy and efficiency of the algorithm. The choice of synthetic data is motivated by the fact that we can control the nature of the input data, which means we can ensure that the input data actually contains the sequential patterns and AEASPs that need to be discovered by the algorithm. Moreover, since the proposed algorithm mainly focuses on the discovery of anomalous events against sequential patterns and assumes already detected events and known sequential patterns as its input, the use of synthetic data instead of real-world data does not affect the evaluation of the algorithm's accuracy or efficiency.

6.3. Experiments and results

We first generated the synthetic data to replicate the already detected events along with date/time information, and designed 10 different sequential patterns, along with their parameters, to be used in different experiments. Next, we manually altered the data sequences to inject instances of the different, already designed sequential patterns.

In our first experiment, the inputs to the algorithm are a synthetically generated events dataset, 10 designed sequential patterns (Table 26), the user-defined parameter CLAEASP, which is assigned 8 different events (Q, L, M, N, P, X, Z, A), and a user-defined maximum tolerance (MTOL) of 6%. The synthetic data was then divided into five different data sequences, each containing 300 random events along with the event time in seconds and intervals (minutes). We then applied the proposed algorithm to the first data sequence and conducted the following investigation on the results:

• Manually check that instances of different sequential patterns detected by the algorithm actually exist in the data sequence.
• Inspect the data sequence to see if there are any instances of sequential patterns that are not detected by the algorithm.
• Manually check that the discovered anomalous events against a specific sequential pattern truly exist in the data sequence.

Table 27 presents the results of the algorithm and the subsequent investigations. The first column of Table 27 contains the sequential pattern IDs. The second column gives the number of sequential pattern instances detected by the algorithm. The third column gives the number of sequential pattern instances successfully checked manually. The fourth column gives the number of sequential pattern instances that were not found by the algorithm. If the sum of columns 3 and 4 is equal to the value of column 2, the algorithm has detected all the instances of the given sequential patterns with complete accuracy. Column five gives the number of anomalous events discovered by the algorithm and subsequently manually confirmed, whereas the last column of Table 27 gives the number of events (from CLAEASP) having greater presence than the user-defined parameter MTOL and subsequently manually confirmed. The experiment results show that the proposed algorithm successfully detected all the instances of different sequential patterns and discovered all the true AEASP from the CLAEASP.

We repeated the above process on four more data sequences. Table 28 presents the results for each data sequence. In addition to the information given in Table 27, Table 28 also contains the results of flying data sequences (shown in parentheses in columns 2 and 3). The use of a flying data sequence enables the algorithm to detect sequential pattern instances that lie on the boundary of two data sequences. The flying data sequence is generated by taking a portion of the current and previous data sequences (for details see the Data pre-processing section).
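The Data pre-processing section that defines the flying data sequence exactly is not part of this excerpt. The sketch below therefore assumes a symmetric portion of `half_width` seconds on each side of the boundary between the previous and current data sequences; for example, `half_width` might be set to the longest SD. `flying_sequence` and `half_width` are hypothetical names. A full implementation would also need to avoid double-counting instances that lie entirely inside one of the two sequences, and that step is not shown here.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (event time in seconds, event label)


def flying_sequence(previous: Sequence[Event], current: Sequence[Event],
                    boundary: float, half_width: float) -> List[Event]:
    """Events within `half_width` seconds on either side of the boundary time."""
    tail = [e for e in previous if boundary - half_width <= e[0] < boundary]
    head = [e for e in current if boundary <= e[0] < boundary + half_width]
    return tail + head


prev_ds = [(1.0, "A"), (5.0, "B"), (9.0, "C")]
curr_ds = [(10.0, "D"), (12.0, "E"), (20.0, "F")]
print(flying_sequence(prev_ds, curr_ds, boundary=10.0, half_width=3.0))
# [(9.0, 'C'), (10.0, 'D'), (12.0, 'E')]
```

An instance of a pattern such as C → D → E that straddles the boundary is split between the two data sequences and cannot be found in either one alone, but it appears whole in the flying sequence.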

Based on the results of all five data sequences, the proposed algorithm then calculates the presence of all CLAEASP events against each sequential pattern and outputs the discovered anomalous events against each sequential pattern. Table 29 gives the average support (in percentage) of CLAEASP events against each sequential pattern. For example, event Q has 8.70% average support against SP1, 22.22% against SP2 and 0.00% against SP3, etc. In Table 29, shaded cells indicate that the event is an anomalous event against the specific sequential pattern, as its support against that sequential pattern is less than or equal to the user-defined parameter MTOL (6% in this experiment). For example, SP1 has no anomalous events and SP2 has one anomalous event (P), etc.

Table 31. AEASP discovered with hourly periodicity.

Fig. 19. Number of AEASP against each increasing MTITVL. [Figure: number of anomalous events against each sequential pattern (SP1–SP10) versus MTITVL values 1–10.]

Fig. 20. Linear growth in execution time against number of input events. [Figure: execution time in seconds versus number of events (0–45,000); trend line y = 0.0028x + 0.1111, correlation coefficient 0.999876.]

Fig. 21. Execution time with different value for SD parameter. [Figure: execution time in seconds versus sequence duration value (0–60); trend line y = 0.014x + 33.803, correlation coefficient 0.394816.]

In our second experiment, we applied the proposed algorithm to discover not only the overall anomalous events against each SP, but also AEASP which are true for a specific time interval only and might not be AEASP over the whole time interval, and vice versa. We then analyse the AEASP discovered overall and at the data sequence interval level. The inputs to the algorithm are a synthetically generated events dataset, 5 designed sequential patterns, the user-defined CLAEASP, which is assigned eight different events (Q, L, M, N, P, X, Z, A), and a user-defined maximum tolerance (MTOL) of 5%. The synthetic data was then divided into 40 different data sequences representing 8 hours for each day (5 days), where each data sequence contains 300 random events.

Table 30 gives the average support (in percentage) of CLAEASP events against each sequential pattern, whereas Table 31 gives the average support of CLAEASP events against each sequential pattern calculated at the data sequence interval level (hourly basis). We can see from Table 30 that only one overall AEASP is discovered, Q, which is against SP5 (the shaded cell). On the other hand, Table 31 shows that the algorithm has discovered a number of AEASP which are true for specific data sequence intervals only but were not discovered over the whole time spectrum (Table 30). For example, in the first hour, event Z is discovered as an anomalous event against SP3, and events (Q, L, M, N, X, Z) are discovered as AEASP against SP4. However, none of them were discovered as AEASP over the whole time spectrum. Moreover, although event Q is an anomalous event against SP5 over the whole time spectrum, it is not an anomalous event against SP5 during the 4th and 6th hours.
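The difference between the overall AEASP of Table 30 and the interval-level AEASP of Table 31 can be illustrated with the small sketch below. This excerpt does not spell out how the average support is computed. The sketch therefore assumes that the support of an event against an SP is the percentage of that SP's instances whose ITIME contains the event. It flags pairs at or below MTOL, as in the Table 29 discussion. All function names and the input records are hypothetical. In the usage example, event Q is not anomalous overall but is anomalous in interval 2.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# (interval label, SP id, candidate events observed in that SP instance's ITIME)
Instance = Tuple[Hashable, str, Set[str]]


def support_percent(instances: Sequence[Instance],
                    candidates: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Percentage of each SP's instances whose ITIME contains each candidate event."""
    total: Dict[str, int] = defaultdict(int)
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    for _, sp, seen in instances:
        total[sp] += 1
        for event in candidates:
            if event in seen:
                hits[(sp, event)] += 1
    return {(sp, e): 100.0 * hits[(sp, e)] / n
            for sp, n in total.items() for e in candidates}


def anomalous(support: Dict[Tuple[str, str], float],
              mtol: float) -> List[Tuple[str, str]]:
    """(SP, event) pairs whose support is at or below the MTOL threshold."""
    return sorted(pair for pair, value in support.items() if value <= mtol)


def anomalous_by_interval(instances: Sequence[Instance], candidates: Sequence[str],
                          mtol: float) -> Dict[Hashable, List[Tuple[str, str]]]:
    """The same test applied separately to the instances of each interval (e.g. hour)."""
    groups: Dict[Hashable, List[Instance]] = defaultdict(list)
    for inst in instances:
        groups[inst[0]].append(inst)
    return {label: anomalous(support_percent(group, candidates), mtol)
            for label, group in groups.items()}


instances = [(1, "SP1", {"Q"}), (1, "SP1", {"Q", "Z"}), (2, "SP1", {"Z"}), (2, "SP1", {"Z"})]
overall = support_percent(instances, ["Q", "Z"])
print(overall)                                                 # {('SP1', 'Q'): 50.0, ('SP1', 'Z'): 75.0}
print(anomalous(overall, mtol=5))                              # []
print(anomalous_by_interval(instances, ["Q", "Z"], mtol=5))   # {1: [], 2: [('SP1', 'Q')]}
```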

The results of the different experiments show that the number of AEASPs discovered depends strongly, and inversely, on MTITVL: if we increase the value of MTITVL, the algorithm discovers fewer AEASP (see Fig. 19). This is expected, because increasing MTITVL extends the time interval during which an event can occur; hence, there is less chance of that specific event being discovered as an anomalous event against a specific sequential pattern.
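Why a larger MTITVL yields fewer AEASP can be seen from the ITIME definition in Section 5.3. The sketch below assumes that ITIME covers the positions after SPET up to SPET + MTITVL, clipped to the sequence, which is a simplification of the time-based definition. It counts candidate events in that span. Widening the span can only add occurrences, so support cannot decrease and fewer events stay at or below MTOL. The prefix-sum arrays are a swapped-in technique for reusing counts across overlapping ITIMEs; the paper states that such counts are reused but does not describe this particular mechanism. `build_prefix`, `itime_counts` and the sequence are hypothetical.

```python
from itertools import accumulate
from typing import Dict, List, Sequence


def build_prefix(sequence: Sequence[str],
                 candidates: Sequence[str]) -> Dict[str, List[int]]:
    """prefix[e][i] = number of occurrences of candidate e in sequence[:i]."""
    return {e: [0] + list(accumulate(1 if x == e else 0 for x in sequence))
            for e in candidates}


def itime_counts(prefix: Dict[str, List[int]], length: int,
                 spet: int, mtitvl: int) -> Dict[str, int]:
    """Count candidates at positions spet+1 .. spet+mtitvl, clipped to the sequence."""
    lo, hi = spet + 1, min(spet + mtitvl, length - 1)
    if hi < lo:  # empty ITIME
        return {e: 0 for e in prefix}
    # Each query is two lookups, so overlapping ITIMEs share the same prefix arrays.
    return {e: p[hi + 1] - p[lo] for e, p in prefix.items()}


seq = list("MABQZQAZ")  # hypothetical data sequence; an SP instance ends at position 2
prefix = build_prefix(seq, ["Q", "Z"])
for mtitvl in (1, 3, 5):
    print(mtitvl, itime_counts(prefix, len(seq), spet=2, mtitvl=mtitvl))
# 1 {'Q': 1, 'Z': 0}
# 3 {'Q': 2, 'Z': 1}
# 5 {'Q': 2, 'Z': 2}
```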

We used Big O notation to characterise the performance of the proposed algorithm. To evaluate its performance, we increased the number of input events (by 4000 in each step), ran the algorithm 5 times in each step (each time with a different, randomly regenerated data set), and then calculated the average execution time for each step. The results, presented in Fig. 20, indicate that the algorithm is O(N): its execution time grows linearly, in direct proportion to the number of input events. We then evaluated the performance of the algorithm by increasing the sequence duration (SD) and MTITVL parameters in each step and measuring the execution time.
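The trend lines and correlation coefficients annotated in Figs. 20–22 look like ordinary least-squares fits of the kind a spreadsheet trend line produces. The sketch below shows how such a fit and its Pearson correlation coefficient can be computed from (input size or parameter value, average execution time) pairs. It is an illustration that uses hypothetical data, not the authors' evaluation script, and `fit_trend` is a hypothetical name. When execution times are nearly constant, as in the O(1) cases, r is undefined or unstable. The fitted slope is therefore the more informative quantity for those cases.

```python
from math import nan, sqrt
from typing import Sequence, Tuple


def fit_trend(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares slope and intercept plus the Pearson correlation coefficient r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired measurements")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy) if syy > 0 else nan  # r is undefined for constant timings
    return slope, intercept, r


print(fit_trend([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0, 1.0)
```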

Fig. 22. Execution time with different value for MTITVL parameter. [Figure: execution time in seconds versus MTITVL values (0–60); trend line y = 0.0039x + 11.708, correlation coefficient 0.62432.]

As the results in Fig. 21 and Fig. 22 show, the execution time of the proposed algorithm is essentially constant (O(1)) with respect to the SD and MTITVL parameters. The fitted trend lines are nearly flat, with slopes of 0.014 and 0.0039 s per unit respectively, so the algorithm runs in approximately the same time regardless of the values of SD and MTITVL.

All the above experiments were conducted on a 2 GHz Pentium machine with 1 GB RAM.

    7. Conclusions and future work

Events occurring in observed scenes are one of the most important semantic entities that can be extracted from videos (Anwar & Naftel, 2008). Most past work is based on finding frequent event patterns or on discovering already known abnormal events. In contrast, in this paper we presented a framework to discover unknown anomalous events associated with a frequent sequence of events (AEASP), that is, to discover events which are unlikely to follow a frequent sequence of events. This information can be very useful for discovering unknown abnormal events and can provide early actionable intelligence to redeploy resources to specific areas of view (such as PTZ cameras or the attention of a CCTV user). Discovery of anomalous events against a sequential pattern can also provide business intelligence for store management in the retail sector. A comprehensive and flexible problem definition framework is presented. This is followed by the formulation of an efficient event mining framework to discover all the anomalous events against all the given sequential patterns (AEASP) in one database scan. The proposed event mining framework also takes the temporal aspect of AEASP into consideration, that is, it discovers anomalous events which are true for a specific time interval only and might not be AEASP over the whole time spectrum, and vice versa. To address the computationally and memory-expensive process of searching for all the instances of multiple sequential patterns in each data sequence, a dynamic sequential pattern search mechanism (DSPS_SM) was also proposed. We then conducted different experiments to evaluate the proposed algorithm's accuracy and performance. The experiment results show that the proposed algorithm detected all the instances of different sequential patterns and discovered all the true AEASP successfully. We used Big O notation to characterise the performance of the proposed algorithm. We increased the workload of the algorithm by increasing the number of input events in each step. The results show that the proposed algorithm is O(N): its execution time grows linearly, in direct proportion to the number of input events. We then increased the sequence duration (SD) and maximum interval (MTITVL) parameters. The results of these experiments indicate that the proposed algorithm is O(1) with respect to the SD and MTITVL parameters: it executes in approximately the same time regardless of the values of SD and MTITVL.

The following are some directions that could be interesting to explore as extensions of the research work presented in this paper.

• Although the proposed AEASP discovery algorithm takes temporal aspects into consideration and can discover AEASP with periodicity, it only discovers the periodicity of AEASP at the granularity level at which the data sequences were divided. The proposed AEASP discovery framework can be extended by providing a flexible mechanism in which the user can define the required periodicity, irrespective of the temporal granularity at which data sequences are divided. For example, if the detected events dataset is segmented on an hourly basis (data sequences), the user might be interested in different periodicity levels such as daily, shift-wise, weekly or monthly.
• In this paper we applied data mining techniques to already detected events to extract the hidden information from them. We mainly concentrated on the event level, as events are collection

Huang, T., Koller, D., Malik, J., Ogasawara, G., Rao, B., Russell, S., et al. (1994). Automatic symbolic traffic scene analysis using belief networks. In Proceedings

Jiawei, H., Jian, P., Behzad, M.-A., Qiming, C., Umeshwar, D., & Mei-Chun, H. (2000). FreeSpan: Frequent pattern-projected sequential pattern mining. In Proceedings

discovery and data mining (PAKDD'03), 2003 (pp. 222–233).

SWInfo  search window information data structure
SPEInfo  sequential pattern events information data structure
CSE  current search event data structure
SPDE  sequential pattern discovered events information data structure
LEP  last event position
AEASP_DS  support discovery process for all CLAEASP
NSW  next search window
PSW  previous search window
SPInfo  sequential pattern information data structure
SPSP  sequential pattern start position
SPEP  sequential pattern end position
CSW  current search window
FSW  first search window
CSW(ET)  end time of current search window
CSW(ST)  start time of current search window
ELHS_LDT  last discovered left hand side event time
CETIME  current event time
ERHS  right hand side events
ELHS  left hand side events
CEvent  current event
SDLEN  sequence duration (SD) length
DSLEN  data sequence length
DSPS_SM  dynamic sequential patterns search mechanism
CDS_ST  current data sequence start time
CDS_ET  current data sequence end time
FDS_L  flying data sequence length
PDS_ET  previous data sequence end time
FDS_EP(CDS_P)  flying data sequence end position from CDS