
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 40, NO. 5, SEPTEMBER 2010 493

Survey on Contemporary Remote Surveillance Systems for Public Safety

Tomi D. Räty

Abstract—Surveillance systems provide the capability of collecting authentic and purposeful information and forming appropriate decisions to enhance safety. This paper concisely reviews the historical development and current state of the three generations of contemporary surveillance systems. Recently, in addition to the employment of an incessantly enlarging variety of sensors, the inclination has been to utilize more intelligence and situation-awareness capabilities to assist human surveillance personnel. The most recent generation is decomposed into multisensor environments, video and audio surveillance, wireless sensor networks, distributed intelligence and awareness, architecture and middleware, and the utilization of mobile robots. The prominent difficulties of contemporary surveillance systems are highlighted. These challenging dilemmas comprise the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, the location difficulties of surveillance personnel, and scalability difficulties. The paper concludes with a concise summary and the future of surveillance systems for public safety.

Index Terms—Distributed systems, human safety, surveillance, survey.

I. INTRODUCTION

SURVEILLANCE systems enable the remote surveillance of widespread society for public safety and proprietary integrity. This paper reviews the background and the three generations of surveillance systems. The emphasis of this paper is on the third-generation surveillance system (3GSS) and its current, significant difficulties. The 3GSSs use multiple sensors. Domain-specific issues are omitted from this paper, despite being inherent to their own domains; the focus is on generic surveillance, which is applicable to public safety.

Surveillance systems are typically categorized into three distinct generations, of which the 3GSS is the current generation. The essential dilemmas of the 3GSS relate to the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. These aspects occurred repeatedly in the literature review. In public safety, a real-time distributed architecture is required to transmit sensor data immediately for deduction. Awareness and intelligence are applied to automate that deduction. Video surveillance is used extensively in public safety. The usage of wireless networks in public safety is growing, and it is accompanied by energy-efficiency concerns. Surveillance personnel often patrol surveyed areas, and their precise location must be known to exploit their benefit to the fullest. As surveyed areas become constantly larger and more complex, scalability is a crucial issue in the surveillance of public safety.

Manuscript received August 4, 2009; revised November 16, 2009 and January 28, 2010; accepted January 28, 2010. Date of publication March 1, 2010; date of current version August 18, 2010. This paper was recommended by Associate Editor L. Zhang.

The author is with the VTT Technical Research Centre of Finland, Oulu 90571, Finland (e-mail: tomi.raty@vtt.fi).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCC.2010.2042446

Public safety and homeland security are substantial concerns for governments worldwide, which must protect their people and the critical infrastructures that uphold them. Information technology plays a significant role in such initiatives: it can assist in reducing risk and enabling effective responses to disasters of natural or human origin [1].

There is an increasing demand for security in society, which results in a growing need for surveillance activities in many environments. Recent events, including terrorist attacks, have intensified this demand and influenced governments to make personal and asset security priorities in their policies. Valera and Velastin [2] state that the demand for remote surveillance relative to safety and security has received significant attention, especially in public places, in the remote surveillance of human activities, in forensic applications, and in military applications. The public can be perceived either as individuals or as a crowd. Valera and Velastin [2] indicate that a future challenge is to develop a wide-area distributed multisensor surveillance system with robust, real-time computer algorithms that are executable with minimal manual reconfiguration across different applications.

There is growing interest in surveillance applications because sensors and processors are available at reasonable cost, and because of an emerging public need for improved safety and security in urban environments and the significant utilization of resources in public infrastructure. This, with the growing maturity of algorithms and techniques, enables the application of the technology in miscellaneous sectors, such as security, transportation, and the automotive industry. The problem of remote surveillance of unattended environments has received particular attention in the past few years [3].

Intelligent remote monitoring systems allow users to survey sites from significant distances. This is especially useful when numerous sites require security surveillance simultaneously. These systems support rapid and efficient corrective actions, which are executed immediately once a suspicious activity is detected. An alert system can warn security personnel of impending difficulties, and numerous sites can be monitored simultaneously. This considerably reduces the load on the security personnel [4].

1094-6977/$26.00 © 2010 IEEE

A fundamental goal of surveillance systems is to acquire good coverage of the observed region with as few cameras as possible, keeping the costs of camera installation and maintenance, transmission channels, and scene-calibration complexity reasonable [5].

In this paper, we first present the background and progression of surveillance systems. This is followed by careful descriptions of the three generations of surveillance systems. Then we present the difficulties of contemporary surveillance systems, which comprise the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. The paper concludes with a future prospect and a brief summary.

II. HISTORICAL SURVEILLANCE AND SURVEILLANCE SYSTEMS

The stone-age warrior used his eyes and ears from atop a mantle to survey his battle area and to distinguish targets against which he could utilize his primitive weapons. Despite advancements in weaponry to catapults, swords, and shields, the eyes and ears of warriors remained the instruments of surveillance. The observation balloon and the telegraph significantly improved the range of visibility and of information transmission, respectively, but it was in the twentieth century that improvements beyond the eyes and ears transformed surveillance into a "modern" concept [6].

Military operations have introduced the importance of the combat surveillance problem. Locating target coordinates and shifting one's own troops accordingly requires dynamic actions accompanied by decisions. Rapid, complete, and precise information is needed to address this [7]. Such information includes the detection and approximate location of personnel, concentrations of troops, and the monitoring and storage of position data over time and across movements [8]. Surveillance information must be delivered to the correct commander when he requires it, and the information must be presented in a meaningful form to address the problem of information processing [7]. The data-collection problem is addressed by the entities that perform the surveillance, e.g., intelligence sources and human surveillance, and transmit it to the command [7].

The fundamental intention of a surveillance system is to acquire information about an aspect of the real world. Military surveillance systems enhance the sensory capabilities of a military commander. Surveillance systems have evolved from simple visual and verbal systems, but the purpose remains the same: even the most primitive surveillance systems gathered information concerning reality and communicated it to the appropriate users [9].

Generic surveillance is composed of three essential parts: data acquisition, information analysis, and on-field operation. Any surveillance system requires means to monitor the environment and collect data in the form of, e.g., video, still images, or audio. Such data are processed and analyzed by a human, a computer, or a combination of both at a command center. An administrator can decide to perform an on-field operation to return the environment to a situation considered normal. On-field control operations are issued by on-field agents, who require effective communication channels to uphold a close interaction with the command center [10].

A surveillance system can be defined as a technological tool that assists humans by offering extended perception and reasoning capability about situations of interest that occur in the monitored environments. Human perception and reasoning are restricted by the capability of the human senses and mind to simultaneously collect, process, and store only a limited amount of data [3].

To address this amount of information, aspects such as scalability and usability become very significant, including how information is delivered to the right people at the right time. To meet this growing demand, research and development has been executed in commercial and academic environments to discover improvements or new solutions in signal processing, communications, system engineering, and computer vision [2].

III. PROGRESSION OF SURVEILLANCE SYSTEMS

Over the past two decades, surveillance systems have been an area of considerable research. Recently, much research has concentrated on video-based surveillance systems, particularly for public safety and transportation systems [11].

Data are collected by distributed sources and then typically transmitted to a remote control center. The automatic capability to learn and adjust to altering scene conditions, and the learning of statistical models of normal event patterns, are growing issues in surveillance systems. A learning system offers a mechanism to flag potentially anomalous events by discovering the normal patterns of activity and flagging the least probable ones. Two substantial restrictions that affect the deployment of these systems in the real world are real-time performance and low cost. Multisensor systems can benefit from processing either the same type or different types of information collected by sensors, e.g., video cameras and microphones, of the same monitored area. Appropriate processing techniques and new sensors offering real-time information associated with different scene characteristics can assist both in enlarging the monitored environments and in enhancing alarm-detection performance in regions monitored by multiple sensors [3].
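The flag-the-least-probable idea described above can be sketched with a toy frequency model; the event labels, history, and threshold below are illustrative assumptions, not values from the survey:

```python
from collections import Counter

def learn_normal_model(observed_events):
    """Estimate a simple frequency model of event patterns from history."""
    counts = Counter(observed_events)
    total = len(observed_events)
    return {event: n / total for event, n in counts.items()}

def flag_anomalies(model, new_events, threshold=0.05):
    """Flag events whose estimated probability under the learned model
    falls below the threshold (unseen events have probability 0)."""
    return [e for e in new_events if model.get(e, 0.0) < threshold]

# Hypothetical event history: mostly normal activity, one rare event.
history = ["walk"] * 90 + ["loiter"] * 9 + ["climb_fence"]
model = learn_normal_model(history)
alerts = flag_anomalies(model, ["walk", "climb_fence", "run"])
# "climb_fence" (rare) and "run" (never seen) are flagged; "walk" is not.
```

Real systems replace the event counts with statistical models over continuous features, but the decision rule, flagging whatever the normal-activity model assigns low probability, is the same.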

Security surveillance systems are becoming crucial in situations in which personal safety could be compromised by criminal activity. Video cameras are constantly being installed for security reasons in prisons, banks, automatic teller machines, petrol stations, and elevators, which are the most susceptible to criminal activities. Usually, the video camera is connected to a recorder or to a display screen from which security personnel constantly monitor suspicious activities. As security personnel typically monitor multiple locations simultaneously, this manual task is labor intensive and inefficient, and significant stress may be placed on the security personnel involved [4].

RATY: SURVEY ON CONTEMPORARY REMOTE SURVEILLANCE SYSTEMS FOR PUBLIC SAFETY 495

Another technological breakthrough substantial to the development of surveillance systems is the capability to remotely transmit and reproduce images and video information, e.g., TV broadcasting and the subsequent use of video-signal transmission and display in closed-circuit TV (CCTV) systems. CCTVs that provide data at acceptable quality date back to the 1960s. The availability of CCTVs can be considered the starting point that made online surveillance feasible, and 1960 can be considered the beginning date of the first-generation surveillance systems [3].

Surveillance systems have developed over three generations [11]. The first generation of surveillance systems (1GSSs) used analogue equipment throughout the complete system [11]. Analogue closed-circuit television (CCTV) cameras captured the observed scene and transmitted the video signals over analogue communication lines to the central back-end systems, which presented and archived the video data [11]. The main challenge in the 1GSS is that it uses analogue techniques for image distribution and storage [2].

The second generation of surveillance systems (2GSSs) uses digital back-end components [11], which enable real-time automated analysis of the incoming video data [11]. Automated event detection and alarms substantially increase the amount of simultaneously monitored data and the quality of the surveillance system [11]. The difficulty in the 2GSS is that it does not support the robust detection and tracking algorithms needed for behavioral analysis [2].

The 3GSSs have finalized the digital transformation. In these systems, the video signal is converted into the digital domain at the cameras, which transmit the video data through a computer network, for instance, a local area network. The back-end and transmission systems of a third-generation surveillance system have also improved in functionality [11].

There are immediate needs for automated surveillance systems in commercial, military, and law-enforcement applications. Mounting video cameras is inexpensive, but finding available human resources to survey the output is expensive. Despite the usage of surveillance cameras in banks, stores, and parking lots, video data are currently used only retrospectively as a forensic tool, thus losing their primary benefit as an active real-time medium. What is required is continuous 24-h monitoring of surveillance video to alert security officers of a burglary in progress, or of a suspicious individual lingering in a parking lot, while there is still time to prevent the criminal offence [12].

IV. FIRST-GENERATION SURVEILLANCE SYSTEMS

First-generation video surveillance systems (1960–1980) considerably extend human perception capabilities in a spatial sense. The 1GSSs are based on analogue signal and image transmission and processing. In these systems, analogue video data come from a collection of cameras, which view remote scenes and present information to the human operators. The main disadvantage of these systems is the reasonably small attention span of operators, which may result in a significant miss rate for the events of interest. From a communication perspective, these systems suffered from the main difficulties of analogue video communication, e.g., high-bandwidth requirements and poor allocation flexibility [3].

The 1GSS utilizes analogue CCTV systems. The advantage is that they provide good performance in some situations and the technology is mature. However, the utilization of analogue techniques for image distribution and storage is inefficient. Current 1GSS research examines the usage of digital information against analogue, digital video recording, and CCTV video compression [2].

Computer vision is a significant artificial intelligence (AI) research area. From the 1970s to the 1990s, computer vision proved its practical value in a vast range of application domains, including medical diagnostics, automatic target recognition, and remote sensing [13].

V. SECOND-GENERATION SURVEILLANCE SYSTEMS

In this technological evolution, the 2GSSs (1980–2000) correspond to the maturity phase of the analogue 1GSS. The 2GSSs benefited from early progress in digital video communications, e.g., digital compression, robust transmission, bandwidth reduction, and processing methods that assist the human operator by prescreening important visual events [3].

In the 2GSS, automated visual surveillance is achieved through the combination of computer-vision technology and CCTV systems. The benefit of the second generation is that the surveillance efficiency of CCTV is enhanced. The difficulties lie in the robust detection and tracking algorithms needed for behavioral analysis. Current 2GSS research addresses real-time robust computer-vision algorithms, automatic learning of scene variability and patterns of behavior, and bridging the gap between the statistical analyses of a scene and natural-language interpretations [2].

The 2GSS research addressed multiple areas, with improved results in real-time analysis and separation of 2-D image sequences, identification and tracking of multiple objects in complex scenes, human behavior comprehension, and multisensor data fusion. The 2GSS also improved intelligent man–machine interfaces, performance evaluation of video-processing algorithms, wireless and wired broadband access networks, signal processing for video compression, and multimedia transmission for video-based surveillance systems [3].

The majority of research effort during the 2GSS period went into the development of automated real-time event-detection techniques for video surveillance. The availability of automated methods would significantly ease the monitoring of large sites with multiple cameras, as automated event detection enables prefiltering and the presentation of the main events [3].

VI. THIRD-GENERATION SURVEILLANCE SYSTEMS

The 3GSSs handle a large number of cameras, a geographical spread of resources, and many monitoring points. From an image-processing view, they are based on the distribution of processing capacities over the network and the use of embedded signal-processing devices to achieve the benefits of scalability and potential robustness offered by distributed systems [14].

Fig. 1. Illustration of a typical processing flow in video surveillance systems [2].

In the 3GSS, the technology revolves around wide-area surveillance systems. This yields the advantages of collecting more accurate information by combining different types of sensors and of distributing the information. The difficulties are in the efficient integration and communication of information, the establishment of design methodologies, and moving and multisensor platforms. Current 3GSS research concentrates on distributed and centralized intelligence, data fusion, probabilistic reasoning frameworks, and multicamera surveillance techniques [2].

The fundamental goals expected of a third-generation vision surveillance application, based on end-user requirements, are to offer good scene comprehension, surveillance information in real time in a multisensor environment, and the use of low-cost standard components. Fig. 1 presents a typical processing flow of video surveillance systems. It is composed of object detection, object recognition, tracking, behavior and activity analysis, and a database [2].

Once an object is detected, the object-recognition task uses model-based techniques in recognition and tracking. This is followed by the behavior and activity analysis of the tracked objects. The database addresses storage and retrieval [2].
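The Fig. 1 flow can be roughly sketched as stage functions applied in sequence, with the result stored at the end; the stage implementations below are placeholders standing in for real computer-vision components, and all names and values are illustrative:

```python
def run_pipeline(frame, stages, database):
    """Pass a frame through the surveillance stages in order and
    store the final result, mirroring the detection -> recognition ->
    tracking -> behavior-analysis -> database flow."""
    result = frame
    for stage in stages:
        result = stage(result)
    database.append(result)
    return result

# Placeholder stages (real systems use detectors, classifiers, trackers).
detect = lambda f: {"frame": f, "objects": [{"bbox": (10, 20, 50, 80)}]}
recognize = lambda r: {**r, "objects": [dict(o, label="person") for o in r["objects"]]}
track = lambda r: {**r, "objects": [dict(o, track_id=0) for o in r["objects"]]}
analyze = lambda r: {**r, "behavior": "walking"}

db = []
out = run_pipeline("frame-001", [detect, recognize, track, analyze], db)
```

The point of the sketch is the ordering constraint: each stage consumes the previous stage's output, so recognition cannot run before detection, and behavior analysis only sees already-tracked objects.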

Research on distributed real-time video-processing techniques in intelligent, open, and dedicated networks is anticipated to offer interesting results. This is largely due to the availability of enhanced computational power at reasonable expense, advanced video-processing and comprehension methods, and multisensor data fusion [3].

The main objective of the fully digital 3GSSs is to enable efficient data communication, management, and extraction of events in real-time video from a large collection of sensors. To achieve this goal, improvements in automatic recognition functionalities and digital multiuser communications are required. The technologies that satisfy the requirements of the recognition algorithms concern computational speed, memory utilization, remote data access, and multiuser communications between distributed processors. The availability of this technology significantly eases 3GSS development and deployment [3].

The main application areas of the 3GSSs are in the region of public monitoring. This is driven by the rapid growth of metropolitan localities and by the increasing need to offer enhanced safety and security to the general public. Other factors that drive the deployment of these systems include efficient resource management and rapid emergency assistance [3].

The essential limitation in the efficiency of CCTV surveillance systems is the cost of offering adequate human monitoring coverage for what is a considerably boring task. Additionally, CCTV is generally used as a reactive tool: if a problem happens and is not noticed, it will proceed without any response [15].

Fig. 2. Example of combining the data of multiple sensors in different events: (a) walking, (b) running, (c) talking, (d) knocking on a door, and (e) shouting [17].

The notable aspects of the 3GSSs are decomposed into the topics of the following subsections: multisensor environments, video surveillance, audio surveillance, wireless sensor networks, distributed intelligence and awareness, architecture and middleware, and the utilization of mobile robots.

A. Multiple Sensor-Enabled Environments

Spatially distributed multisensor environments offer interesting possibilities and challenges to surveillance. Recently, there have been studies on data-fusion techniques to enable the sharing of information that results from different types of sensors [2]. The communication aspects within separate parts of the system play a crucial role, with particular challenges due either to bandwidth constraints or to the asymmetric characteristics of communication [2]. Rasheed et al. [16] exploit data fusion over multiple modalities, including radar information, automatic identification systems (AIS), and global positioning system (GPS) receivers.

Fig. 2 illustrates two discrete sensors, a video sensor and an audio sensor, whose data can be fused to enhance information. The sequences comprise walking, running, talking, knocking, and shouting events. The recorded audio is segmented into audio frames of 50 ms, and each sequence was recorded over a time of 8 s [17].
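The 50 ms framing of an 8 s recording can be sketched as fixed-length segmentation. The 16 kHz sample rate below is an assumption for illustration; the survey does not specify one:

```python
def segment_audio(samples, sample_rate, frame_ms=50):
    """Split an audio signal into fixed-length, non-overlapping frames
    of frame_ms milliseconds, as done for the 50 ms frames in Fig. 2."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

sample_rate = 16000                    # assumed rate, not from the survey
recording = [0.0] * (sample_rate * 8)  # an 8 s sequence of silence
frames = segment_audio(recording, sample_rate)
# 8 s / 50 ms = 160 frames of 800 samples each at 16 kHz
```

Each frame would then be classified (e.g., as talking, knocking, or shouting) and fused with the video-derived events covering the same time span.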


Fig. 3. Generic view of networked cameras [18].

Junejo et al. [18] state that a single camera is not sufficient for monitoring a large area. To address this problem, a network of cameras is established. Junejo et al. utilize an automatically configurable network of nonoverlapping cameras to attain sufficient monitoring capabilities over large areas of interest. Fig. 3 illustrates the principles of a network of cameras. In this example, each camera is mounted on a moving platform while detecting and tracking objects [18].

B. Video Surveillance

Video surveillance has become an omnipresent aspect of the modern urban landscape, situated in a vast variety of environments, including shopping malls, railway stations, hospitals, government buildings, and commercial premises. In some cases, surveillance acts as a deterrent, discouraging unacceptable behavior that can no longer be performed anonymously, recording and logging events for evidential reasons, or offering remote observation of sensitive locations where access control is crucial [19].

Intelligent visual surveillance systems address the real-time monitoring of persistent and transient objects within a specific environment. The primary goals of these systems are to offer an automatic interpretation of scenes and to understand and predict the actions and interactions of the observed objects, based on the information collected by sensors. The basic stages of processing in an intelligent visual surveillance system are moving-object definition, recognition, tracking, behavioral analysis, and retrieval. These stages draw on the topics of machine vision, pattern analysis, artificial intelligence, and data management [2].

As an active research topic in computer vision, visual surveillance in dynamic scenes attempts to detect, recognize, and track certain objects from image sequences. In addition, it is important to comprehend and depict object behaviors. The aim is to develop intelligent visual surveillance to replace traditional passive video surveillance, which is proving inefficient as the number of cameras exceeds the capability of human operators to survey them. In short, the goal of visual surveillance is not only to place cameras in the place of human eyes, but also to achieve the exhaustive surveillance task as automatically as possible [20].

Fig. 4. Illustration of tracking with an occluding object passing across the view of the camera [22].

Intelligent cameras execute a statically defined collection of low-level image-processing operations on the captured frames to enhance video compression and intelligent host efficiency. Changing or reconfiguring the video processing and analysis during the operation of a surveillance system is difficult [11].

The difficulty of tracking an individual maneuvering in a cluttered environment is a well-studied area. Usually, the objective is to predict the state of an object based on a set of noisy and unclear measurements. There is a vast range of applications in which the target-tracking problem is presented, including vehicle collision warning and avoidance, mobile robotics, speaker localization, people and animal tracking, and tracking a military target [21].

Fig. 4 illustrates a tracking sequence with an individual passing through the view of a camera, causing an occlusion. The output of the background subtraction method for each frame is a binary image composed of foreground regions. When an occlusion occurs, multiple objects may merge into the same area. This requires an object model that can address split-and-merge cases. Each foreground pixel is assigned the label of the object for which the product of color and spatial probability is highest [22].
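The pixel-labeling rule just described can be sketched in a few lines. This is a hypothetical illustration rather than the implementation of [22]: `color_prob` and `spatial_prob` are assumed to be per-object probability maps, and each pixel takes the label of the object maximizing their product.

```python
import numpy as np

def label_occluded_pixels(color_prob, spatial_prob):
    """Assign each foreground pixel the label of the object whose
    product of color and spatial probability is highest.

    color_prob, spatial_prob: arrays of shape (n_objects, H, W).
    Returns an (H, W) array of object labels (0..n_objects-1).
    """
    joint = color_prob * spatial_prob   # elementwise product per object
    return np.argmax(joint, axis=0)     # best-scoring object per pixel
```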

However watchful the operators are, manual monitoring suffers from information overload, which results in periods of operator inattention due to weariness, distractions, and interruptions. In practice, it is unavoidable that a significant number of the video channels is not usually monitored, and potentially important events are overlooked. Additionally, weariness grows significantly as the number of cameras in the system increases. The automation of all or part of this process would obviously offer dramatic benefits, ranging from a capability to alert an operator of a potential event of interest, to a completely automatic detection and analysis system. However, the dependability of automated detection systems is an essential issue, because frequent false alarms introduce skepticism in the operators, who quickly learn to disregard the system [19].

It is desirable that visual surveillance systems can understand the activity of the scene they are detecting and tracking. Ideally, this would be done in a manner consistent with that of a human observer. The task of automating the interpretation of the video data is a detailed one and can depend on a vast range of factors, including location, context, time, and date. This information indicates where objects are and what they may be doing as they are observed, and attempts to characterize usual behavior [19].

Research interests have shifted from ordinary static image-based analysis to video-based dynamic monitoring and analysis. Researchers have advanced in addressing illumination, color, background, and perspective static aspects. They have advanced in tracking and analyzing shapes related to moving human bodies and moving cameras. They have improved activity analysis and control of multicamera systems. The research of Trivedi et al. [13] addresses a distributed collection of cameras, which provide wide-area monitoring and scene analysis on several levels of abstraction. Installing multiple sensors introduces new design aspects and challenges. Handoff schemes are needed to pass tracked objects between sensors and clusters, methods are required to specify the best view given the scene's context, and sensor-fusion algorithms capitalize on a given sensor's strengths [13].

Modern visual surveillance systems deploy multicamera clusters operating in real time with embedded adaptive algorithms. These advanced systems need to be operational constantly, and to robustly and reliably detect events of interest in difficult weather conditions. This includes adjusting to natural and artificial changes in illumination, and withstanding hardware and software system failures [23].

Generally, the initial step for automatic video surveillance is adaptive background subtraction to extract foreground regions from the incoming frames. Object tracking is then executed on the foreground regions. In this case, tracking isolated objects is relatively easy. When multiple tracked objects are placed into groups with miscellaneous complexities of occlusion, tracking each individual object through crowds becomes a challenging task. First, when objects merge into a group, the visual characteristics of each object become unclear and obscure. The objects distant from the camera can be partially or completely occluded by the surrounding objects. Second, the poses and scales of the target objects may severely change when they are in crowds. Third, the motion speed and the direction of the target objects may essentially change during occlusion [24].

Basically, moving objects are detected through background subtraction, which comprises a model of the background and the detection of moving objects as those that differ from this model. In comparison to other approaches, such as optical flow, this approach is computationally affordable for real-time applications. The main dilemma is its sensitivity to dynamic scene challenges and the subsequent need for background model adaptation through background maintenance. This type of problem is known to be essential and demanding [25].

Fig. 5 illustrates a collection of images from a parking lot and the background subtraction output of these images. Object detection is achieved by constructing a representation of the scene, called a background model, and then locating the differences from the model against each incoming frame. The upper image sequence illustrates the complete scene and the lower image sequence represents the resulting background subtraction output [22].

Fig. 5. Illustration of images and the output of background subtraction [22].
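The background-subtraction-with-maintenance loop described above can be sketched as a running-average model. This is a generic illustration under assumed parameter names (`alpha`, `threshold`), not the specific method of [22] or [25]:

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Background maintenance: slowly adapt the background model
    toward the current frame to absorb gradual scene changes."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels that differ from the background model by more
    than `threshold`, yielding the binary foreground image."""
    return np.abs(frame.astype(float) - background) > threshold
```

In a real system the mask would feed the tracking stage, and `update_background` would run once per incoming frame.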

Fig. 6. Example of tracklet tracking [26].

Li et al. state that the aim of multitarget tracking is to infer the target trajectories from image observations in a video. This poses a significant challenge in crowded environments, where there are frequent occlusions and multiple targets have a similar appearance and intersecting trajectories. Data association-based tracking (DAT) links short track fragments, i.e., tracklets, or detection responses into trajectories based on similarity in position, size, and appearance. This enables multitarget tracking from a single camera by progressively associating detection responses into longer tracklets to resolve target trajectories. Fig. 6 presents an image of tracklet tracking [26].
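A minimal sketch of association by similarity might look as follows. The similarity function, its weights, and the greedy matching are illustrative assumptions, far simpler than the DAT method of [26]:

```python
import math

def similarity(a, b, pos_w=1.0, size_w=0.5):
    """Higher when two track fragments are close in position and size.
    Each fragment is a dict with 'pos' (x, y) and 'size' (w, h)."""
    pos_d = math.dist(a['pos'], b['pos'])
    size_d = abs(a['size'][0] - b['size'][0]) + abs(a['size'][1] - b['size'][1])
    return -(pos_w * pos_d + size_w * size_d)   # negative cost as similarity

def associate(tracklets, detections, min_sim=-50.0):
    """Greedily link each tracklet to its most similar unused detection,
    skipping pairs below a minimum similarity."""
    links, used = {}, set()
    for ti, t in enumerate(tracklets):
        best, best_sim = None, min_sim
        for di, d in enumerate(detections):
            if di in used:
                continue
            s = similarity(t, d)
            if s > best_sim:
                best, best_sim = di, s
        if best is not None:
            links[ti] = best
            used.add(best)
    return links
```

Real DAT systems additionally use appearance models and solve the assignment globally rather than greedily.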

Human motion tracking based on input from red–green–blue (RGB) cameras can produce results in indoor scenes with consistent illumination and a steady background [27]. Outdoor scenes with significant background clutter resulting from illumination changes are a challenge for conventional charge-coupled device (CCD) cameras [27]. There have been contributions on pedestrian localization and tracking in visible and infrared videos [28]. Fig. 7 presents a thermal image and a color image of the same scene [28].

Fig. 7. (Left) Thermal image and (right) color image of a scene [28].

A significant problem encountered in numerous surveillance systems is the change in ambient light, particularly in an outdoor environment, where the lighting conditions vary. This renders conventional digital color image analysis very difficult. Thermography, or thermal visualization, is a type of infrared visualization. Thermal cameras have been utilized for imaging objects in the dark. These cameras use infrared (IR) sensors that capture the IR radiation of different objects in the environment and form IR images [29].

C. Audio Surveillance

The creativeness of the research of Istrate et al. [30] is to use sound as an informative source simultaneously with other sensors. Istrate et al. [30] suggest extracting and classifying everyday sounds, such as a door banging, glass shattering, and objects falling, with the intention of identifying serious accidents, for instance, a fall or somebody fainting. The approach of Istrate et al. [30] comprises the replacement of the video camera with a multichannel sound acquisition system, which analyzes the sound range of the location in real time and identifies situations of emergency. A previously detected sound event is transmitted to the alarm monitor only if it is considered a possible alarm. To reduce the computation time required for a multichannel real-time system, the sound extraction process has been split into detection and classification. Sound event detection is a complicated task, because the audio signals occur in a noisy environment [30].

Accurate and robust localization and tracking of acoustic sources is of interest to a variety of applications in surveillance, multimedia, and hearing enhancement. The miniaturization of microphone arrays combined with acoustic processing further enhances the advantages of these systems, but poses challenges to achieving precise localization performance due to the decreasing aperture. For surveillance, acoustic emissions from ground vehicles offer an easily detected signature, which can be used for unobtrusive and passive tracking. This results in higher localization performance in distributed sensing environments. It avoids the requirement for excessive data transfer and fine-grain time synchronization among nodes, with low communication bandwidth and low complexity. Additional improvement can also be achieved through the fusion of other data modalities, such as video. Traditionally, large sensor arrays are used for source localization to guarantee adequate spatial diversity over sensors to resolve time delays between source observations. The precision of delay-based bearing estimation degrades with decreasing dimensions (aperture) of the sensor array [31].

Fig. 8. Example of a microphone array for measuring the bearing angle [32].

Sound localization using compact sensor nodes deployed in networks has applications in surveillance, security, and law enforcement. Numerous groups have reported noncoherent and coherent methods for sound localization, detection, classification, and tracking in sensor networks. Coherent methods are based on the arrival time differences of the acoustic signal at the sensors. In standard systems, microphones are separated to maximize precision. The need for synchronization requires frequent communication, which is expensive in terms of power consumption. The nodes must achieve synchronization to produce a valid estimate [32].

Fig. 8 presents an example of sound localization. An array of microphones (M1, M2, M3, and M4), pairwise separated by a distance (d), is considered. The angle of the source of sound is presented against the coordinate axis. The bearing of the microphone pair M1 and M3 is given as the beta angle. The bearing of the microphone pair M2 and M4 is presented as the alpha angle [32].
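Under a far-field assumption, the bearing of a source relative to a microphone pair's axis follows from the inter-microphone time delay as theta = arccos(c * dt / d), where c is the speed of sound and d the microphone separation. A hedged sketch of this standard relation (not the specific algorithm of [32]):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C

def bearing_angle(delay_s, mic_distance_m):
    """Far-field bearing (degrees) of a source relative to the axis of
    a microphone pair, from the inter-microphone time delay."""
    ratio = SPEED_OF_SOUND * delay_s / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))   # clamp numerical noise
    return math.degrees(math.acos(ratio))
```

A zero delay means the source is broadside (90 degrees); a delay of d/c means it lies on the pair's axis (0 degrees).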

Considering the nature of the events that are desirable to detect, the content of the information created is more than just visual. Many events that are significant from a monitoring point of view are accompanied by audio information, which would be useful to examine. The significance of these events is provided by their semantic information and their temporal context. A monitoring system that must distinguish between a door opening and glass breaking should be expected to identify one and not the other at a given time and location. By expanding the range of information available to the system, the precision of its operation can be improved. The purpose of an audio sensor network would be to assist the end user in searching through data and returning the points of interest. This would be done not by adding an overwhelming amount of additional data, but by drawing attention to the data already collected but difficult to locate [33].

The sound analysis system has been separated into three modules, as illustrated in Fig. 9. The first module is applied to every channel to detect sound events and to extract them from the signal flow. The source of speech or sound can be localized by comparing the predicted SNR for every channel. The fusion module chooses the best channel if multiple events are detected simultaneously. The third module receives the sound event extracted by the previous module, and it predicts the most probable sound class [30].

Fig. 9. Analysis of sound [30].

D. Wireless Sensor Networks

Wireless devices, such as wireless-enabled laptops and palm pilots, have become an integral part of daily life. A wireless network can be considered a sensor network, where the network nodes function as sensors. They sense changes in the environment caused by the movement of objects or humans. A possible additional functionality could be the indoor surveillance of corporate buildings and private houses [34].

Wireless sensor networks represent a new type of ad hoc network, which integrates sensing, processing, and wireless communication in a distributed system [35]. Sensor networks are a growing technology that promises a novel ability to monitor and equip the physical world [36]. In a sensing-covered network, each point in a geographic area of interest needs to be within the sensing range of at least one sensor [35]. Sensor networks comprise a significant number of inexpensive wireless devices (nodes) that are densely distributed over the region of interest [36]. They are usually battery powered with restricted computation and communication abilities [36]. Every node is equipped with different sensing modalities, such as acoustic, infrared, and seismic [36].

Wireless sensor networks have the potential to improve the ability to develop user-centric applications to monitor and prevent harmful events. The availability of inexpensive low-power sensors, radios, and embedded processors enables the deployment of distributed sensor networks to offer information to users in distinct environments and to provide them control over undesirable situations. Networked sensors can collaborate to process and make deductions from the collected data and provide the user with access to continuous or selective observations of the environment. In most situations, these devices must be small in size, require low power, and be lightweight and unobtrusive [37].

In addition to enabling new applications, wireless sensor networks offer an alternative to several existing technologies. Wiring costs restrict complicated environment controls and the reconfigurability of these systems. In many cases, the savings in wiring costs alone justify the use of wireless sensor nodes [38].

A basic issue that arises naturally in sensor networks is coverage. Due to the significant variety of sensors and their applications, sensor coverage is subject to a vast sphere of interpretations. Generally, coverage can be considered a measure of the quality of service of a sensor network. Coverage formulations can attempt to locate the weak points in a sensor field and suggest future deployment or reconfiguration schemes to enhance the total quality of service [38].
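One simple coverage measure, in the spirit of the sensing-covered networks described above, is the fraction of sampled field points that lie within the sensing range of at least one node. The grid sampling and disk sensing model are illustrative assumptions, not the formulations of [38]:

```python
import math

def coverage_fraction(sensors, sensing_range, width, height, step=1):
    """Fraction of grid points in a width x height field lying within
    `sensing_range` of at least one sensor (disk sensing model).

    sensors: list of (x, y) positions.
    """
    points = [(x, y) for x in range(0, width + 1, step)
                     for y in range(0, height + 1, step)]
    covered = sum(
        1 for p in points
        if any(math.dist(p, s) <= sensing_range for s in sensors)
    )
    return covered / len(points)
```

Points left uncovered by this measure are exactly the weak spots a redeployment scheme would target.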

In previous years, wireless networks, such as IEEE 802.11a, b, and g wireless local-area networks (WLANs), have become plentiful, and their popularity is only increasing. In the near future, wireless networks will become omnipresent, and they will supply high-speed communication capabilities almost anywhere. An immediate question is whether it is possible to utilize the wireless network infrastructure to implement functionalities other than communication. WLANs have been used for positioning mobile terminals and tracking their movements. If the communication infrastructure could be utilized for security purposes, the deployment of additional infrastructure could be avoided or reduced, resulting in a considerably more cost-effective solution [34].

Fig. 10 illustrates the basic functionality of a store-and-forward wireless sensor network (WSN) in which video information is obtained with cameras and transmitted onward. The WSN is composed of shared-medium cameras, store-and-forward cameras, distributed servers, routing nodes, wireless cameras and base stations, and a control room. The cameras distribute their information through the nodes and the distributed server to the control room [39].

E. Distributed Intelligence and Awareness

The 3GSSs use distributed intelligence functionality. An important design issue is to determine the granularity at which the tasks can be distributed, based on available computational resources, network bandwidth, and task requirements. The distribution of intelligence can be achieved by the dynamic partitioning of all the logical processing tasks, including event recognition and communications. The dynamic task allocation dilemma is studied through the usage of a computational complexity model for representation and communication tasks [3].

A surveillance task can be separated into four phases: 1) event detection; 2) event representation; 3) event recognition; and 4) event query. The detection phase addresses multisource spatiotemporal data fusion for efficient and reliable extraction of motion trajectories from videos. The representation phase revises raw trajectory data to construct hierarchical, invariant, and adequate representations of the motion events. The recognition phase handles event recognition and classification. The query component indexes and retrieves videos that match some query criteria [40].

Fig. 10. Wireless sensor network accompanied with distributed location servers [39].

The key to security is situation awareness. Awareness requires information that spans multiple scales of time and space. A security analyst must keep track of "who are the people and vehicles in a space" (identity tracking), "where are the people in a space" (location tracking), and "what are the people/vehicles/objects in a space doing" (activity tracking). The analyst must use historical context to interpret this data. Smart video surveillance systems are capable of enhancing situational awareness over multiple scales of time and space. Currently, the component technologies are evolving in isolation. For instance, face recognition technology handles the identity-tracking challenge while restricting the subject to be in front of the camera, and intelligent video surveillance technologies offer activity detection capabilities on video streams while disregarding the identity-tracking challenge. To offer comprehensive, nonintrusive situation awareness, it is crucial to address the challenge of multiscale, spatiotemporal tracking [41].

Bandini and Sartori [42] present a monitoring and control system (MCS). An MCS attempts to support humans in decision making regarding problems that can occur in critical domains. It can be characterized by its functionalities: 1) to gather data on the monitored situation; 2) to evaluate whether the data concern an anomalous situation; and 3) in case of anomalous situations, to perform the proper actions, e.g., to remedy the problems [42].

An action is typically the creation of an alarm to notify humans about the problem. MCSs should be intelligent. For this reason, MCSs have traditionally been developed by using artificial intelligence (AI) technologies, such as neural networks, data mining, and knowledge-based systems [42].

Fig. 11. Generic architecture of a monitoring/control pervasive system [42].

Fig. 11 presents an illustration of an MCS, which is structured into three logical levels: 1) observation; 2) interpretation; and 3) actuation. In observation, the state of a monitored field is periodically captured by a specified monitoring agency (MA), which is usually a set of sensors. In interpretation, the values detected by the sensors are evaluated by a specified interpretation agency (IA). In actuation, specified actions are executed by a specific actuation agency (AA), depending on the interpretation results [42].

F. Architecture and Middleware

The field of automated video surveillance is quite novel, and the majority of contemporary approaches are engineered in an ad hoc manner. Recently, researchers have begun to consider architectures for video surveillance. Middleware that provides general support to video surveillance architectures is the logical next step. It should be noted that while video surveillance networks are a class of sensor networks, the engineering challenges are quite different. A large quantity of data flows through a surveillance network. In particular, the requirement for extreme economy in the use of power and network bandwidth, which is a dominating factor in most sensor networks, is absent from most surveillance networks [43].

Fig. 12 illustrates a simple architecture for information fusion. The nodes scan the environment periodically and transmit a signal. The received signal is first processed by a preprocessor to extract significant characteristics from the environment. The preprocessors are responsible for quantifying how much the environment differs from its steady state. The information fusion function then deduces whether an intruder is present or not [34].
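The two-stage scheme can be sketched as follows. The absolute-deviation preprocessor, the summation rule, and the threshold are illustrative assumptions, not the fusion function of [34]:

```python
def deviation_from_steady_state(reading, baseline):
    """Preprocessor: quantify how far a node's reading is from its
    steady-state baseline."""
    return abs(reading - baseline)

def intruder_present(readings, baselines, threshold):
    """Fusion: declare an intruder when the summed deviation over
    all nodes exceeds a threshold."""
    total = sum(deviation_from_steady_state(r, b)
                for r, b in zip(readings, baselines))
    return total > threshold
```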

Due to the availability of more advanced and powerful communications, sensors, and processing units, the architectural choices in the 3GSSs can potentially become extremely variable and flexibly customized to acquire a desired performance level. The system architecture represents a key factor. For instance, different levels of distributed intelligence can result in preattentive detection methods either closer to the sensors or deployed at different levels in a computational processing hierarchy. Another source of variability results from the usage of heterogeneous networks, either wireless or wired, and transmission modalities, both in terms of source and channel coding and in terms of multiuser access techniques. Temporal and spatial coding scalability can be extremely productive for reducing the quantity of information to be transmitted by every camera, depending on the intelligence level of the camera itself. Multiple access techniques are a fundamental tool to allow a significant number of sensors to share a communication channel in the most efficient and robust way [3].

Fig. 12. Simple example of a basic architecture [34].

Surveillance network management techniques are required in the 3GSSs to coordinate distributed intelligence modules to acquire optimal performance and to adjust the system behavior according to the variety of conditions occurring either in a scene or in the parameters of a system. All of these tools are crucial for designing efficient systems. Finally, a further evolution is the integration of surveillance networks based on different types of sensor information, such as audio or visual, but oriented toward completely different functionalities, e.g., face detection, and different types of sensors, e.g., standard cameras [3].

G. Utilization of Mobile Robots

Seals defines a robot to be an automatic machine with a certain degree of autonomy, designed for active interaction with its environment. It integrates different systems for the perception of the environment, decision making, and the formation and execution of plans. In addition to these characteristics, a mobile robot must produce a traversable path and then follow this path [44].

The extremely hostile conditions imposed by combat, space, and deep-ocean environments created the need for practical autonomous vehicles for military applications, space, and ocean exploration. Several efforts have formed the foundation for autonomous vehicle development, such as Shaky, Jason, and the Stanford Cart. These first-generation autonomous vehicles were used to explore fundamental issues in vision, planning, and robot control [45].

These systems were strictly hampered by primitive sensing and computing hardware. Efforts in the 1980s created the second generation of autonomous vehicle testbeds. This era includes the development of the autonomous land vehicle (ALV) and the United States Marine Corps (USMC) ground surveillance robot (GSR). The GSR was an autonomous vehicle that transited from one known geographic location to another across completely unknown terrain [45].

In detail, the GSR was an experimental M114 personnel carrier that had been modified for computer control. It had sensors and computer control for vision, navigation, and proximity aspects. The vision subsystem was mounted on a transport platform. The proximity sensor subsystem used acoustic ranging sensors to provide short-range obstacle position and target tracking information. The proximity sensor subsystem fused the information from the sensors into consistent target and obstacle position and velocity vectors. In target tracking, vision estimates of target bearing could be fused with proximity estimates to enhance the knowledge of target angular position and motion for accurate vehicle response [46].

SURBOT was another notable mobile surveillance robot, developed in 1985 by Remote Technology Corporation (REMOTEC) to execute visual, sound, and radiation surveillance within rooms specified as radiologically hazardous at nuclear power plants. The results verified that SURBOT could be used for remote surveillance in 54 separate controlled radiation rooms at the plant [47].

Currently, the development of a completely automated surveillance system based on mobile multifunctional robots is an active research area. Mobility and multifunctionality are generically adopted to reduce the number of sensors required to cover a given region. Mobile robots can be organized in teams, which results in intelligent distributed surveillance over considerable areas. Several worldwide projects attempt to develop completely or semiautonomous mobile security systems. A few security robot guards are commercially available, e.g., CyberGuard, RoboGuard, and Security Patrolbot [48].

Recent progress in automation technologies, combined with research in machine vision and robot control, should in the near future allow industrial robots to adapt to unexpected variations in their environments. Such autonomous systems depend on real-time sensor feedback to reliably and precisely detect, recognize, and continuously track objects within the robot's workspace, especially for applications such as on-the-fly object interception [49].

Traditionally, the number of different sensors mounted on the robot and the number of tasks related to navigation, exploration, monitoring, and detection operations make the design of the overall control system challenging. In recent years, there has been research on issues such as autonomous navigation in indoor and outdoor environments and on rough outdoor terrain, visual recognition, sensor fusion and modulation, and sensor scheduling. An essential part of the research has concentrated on behavior-based approaches, in which complexity is reduced with computationally simple algorithms that process sensor information in real time with high-level inference strategies [48].

The inclusion of distributed artificial intelligence has introduced the development of new technologies in detection (sensors and captors), robotics (actuators), and data communication. These technologies enable surveillance systems to detect a wider frequency range, to cover a wider sensor area, and to decide the character of a particular situation [50].

Researchers in robotics have debated the surveillance issue. Robots and installed cameras can identify obstacles or humans in the environment. The systems guide robots around these obstacles. These systems typically extract purposeful information from massive visual data, which requires substantial computation or manpower [51].

Fig. 13. Platform model of iBot [52].

A security guard system that uses autonomous mobile guard robots can be used in buildings. The guard can be a wheel-type autonomous robot that moves on a planned path. The robot is always on alert for anything unusual, from moving objects to leaking water. The robot is equipped with cameras. While the robot is patrolling, it transmits images back to the monitoring station. After the robot finishes patrolling, it can automatically return to and dock in a battery recharging station. These security robot systems can improve the security of homes and offices [52].

A basic need in security is the ability to automatically verify an intruder, to alert remote guards, and to allow them to monitor the intruder when one enters a secure or prohibited area. To assure both mobility and automaticity, the camera is embedded in a teleoperated robot. Mobility and teleoperation, in which security guards can remotely instruct a mobile robot to track and identify a potential intruder, are more attractive than conventional immovable security systems [52].

An example of mobile robots is "iBotGuard," which was developed by Liu et al. [52]. It is an Internet-based intelligent robot security system that can detect intruders utilizing invariant face recognition [52].

Fig. 13 illustrates the iBot platform model. This platform enables users to remotely control a robot in response to live video images captured by the camera on the robot. The iBot Server connects the robot and camera over a wireless channel, excluding problems associated with cables [52].

The iBot Server includes two components: 1) a streaming media encoder (SME) and 2) a controller server (CS). The iBot client includes another two components: 1) a streaming media player (SMP) and 2) a controller client (CC). The SME captures and encodes the real-time video from the camera on the robot under the instruction of the CS. The encoded streams are delivered by the streaming media server (SMS) to the SMP. The SMP receives, decodes, and displays the media data. The CC communicates with the SMP, and the CC interacts with the CS to perform the intelligent control algorithms. The CS eventually deploys its robot movement commands and camera pan-tilt-zoom commands [52].

Fig. 14. Detected target tracked and geo-registered on the map [53].

Liu et al. present an unmanned water vehicle (UWV), which performs automatic maritime visual surveillance. The UWV mobile platform is equipped with a GPS device and a high-resolution omnicamera. Omnicameras provide a 360° view capability. Targets are detected with a saliency-based model and adaptively tracked through selective features. Each target is geo-registered to a longitude and latitude coordinate. The target geo-location and appearance information is then transmitted to the fusion sensor, where the target location and image are displayed on a map, as in Fig. 14 [53].

VII. DISCUSSION ON CURRENT DILEMMAS IN THE 3GSS

According to Pavlidis et al. [54], the contemporary security infrastructure can be summarized as follows: 1) security systems act locally and do not cooperate in an efficient manner; 2) extremely high-value assets are insufficiently protected by obsolete technology systems; and 3) there is a dependence on intensive human concentration to detect and assess threats. Considering the practical realities, Pavlidis et al. [54] recommend cooperating closely with both the business unit that would productize the surveillance prototype and the potential customers [54].

Security-related technology is a growing industry. Governments and corporations worldwide are spending billions of dollars on the research, development, and deployment of intelligent video surveillance systems, data mining software, biometric systems, and Internet geolocation technology. The technologies target terrorists and violators of export restrictions. Surveillance technologies are typically shrouded in secrecy, because of the fear that exposing them will make them less effective, but the growing utilization of these technologies has provoked public interest and resistance to security-related technologies [55].

The following sections present the most notable aspects discovered in the literature review. They comprise the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, the location difficulties of surveillance personnel, and scalability difficulties.

A. Real-Time Distributed Architecture

It is fundamental to establish a framework or methodology for designing distributed wide-area surveillance systems. This ranges from the generation of requirements to the creation of design paradigms by defining functional and intercommunication models. The future realization of a wide-area distributed intelligent surveillance system should draw on a collection of distinct disciplines; computer vision, telecommunications, and system engineering are clearly needed [2].

A distributed multiagent approach may provide numerous benefits. First, intelligent cooperation between agents may enable the use of less expensive sensors and, therefore, a large number of sensors may be deployed over a larger area. Second, robustness is enhanced, because even if some agents fail, others remain to perform the mission. Third, performance is more flexible, because tasks are distributed between groups of agents at miscellaneous locations. For instance, the likelihood of correctly classifying an object or target increases if multiple sensors are concentrated on it from different locations [2].

A video surveillance network is a complicated distributed application and requires sophisticated support from middleware. The role of middleware is primarily to support communication between modules. The nonfunctional requirements for video surveillance networks are best defined in architectural terms and contain scalability (middleware must offer tools suitable for the scalable re-implementation of these algorithms), availability (the middleware needs to support sufficient fault tolerance to uphold acceptable levels of availability), evolvability (the capacity of the surveillance network to adjust to changes, including changes to the hardware and modifications to the software), integration (middleware is the intermediary for this type of communication), security (middleware needs to offer security facilities to address such attacks), and manageability (the network middleware must support the on-demand requirement for manageability) [43].

These systems provide concrete and profitable assistance to forensic investigations, although their potential capabilities are reduced in practice by the limitations of storage capacity, frame skipping, and data compression. Currently, real-time reactivity is insufficient, because human operators cannot handle enormous amounts of surveillance streams [57].

1) Architectural Dilemmas in Video Surveillance: While existing research has addressed multiple issues in the analysis of surveillance video, there has been little work in the area of more efficient information acquisition based on real-time automatic video analysis, such as the automatic acquisition of high-resolution face images. There is a challenge in transmitting information across different scales, and the interpretation of the information becomes essential. Multiscale techniques present a completely novel area of research, including camera control, processing video from moving cameras, resource allocation, and task-based camera management, in addition to challenges in performance modeling and evaluation [41].

The fundamental techniques for interpreting video and extracting information from it have received a substantial amount of attention. The successive set of challenges addresses how to use these techniques to construct large-scale deployable systems. Several challenges of deployment include the cost minimization of wiring, low-power hardware for battery-operated camera installations, automatic calibration of cameras, automatic fault detection, and the development of system management tools [41].

Improving the smart cameras with additional sensors could transform them into a high-performance multisensor system. By combining visual, acoustic, tactile, or location-based information, the smart cameras become more sensitive and can transmit results that are more precise. This makes the results more widely applicable [11].

The usual scenario in an industrial research and development unit developing vision systems is that a customer presents a system specification and its requirements. The engineer then interprets these requirements into a system design and validates that the system design fulfils the user-specified requirements. The accuracy requirements are typically defined in terms of detection and false alarm rates for objects. The computational requirement is specified commonly by the system response time to the presence of an object, e.g., real-time or delayed. The intention of the vision systems engineer is then to exploit these constraints and design a system that is operational in the sense that it satisfies customer requirements regarding speed, accuracy, and expenses [58].

The essential dilemma is that there is no known systematic way for vision systems engineers to conduct this translation of the system requirements to a detailed design. It is still an art to engineer systems that satisfy application-specific requirements. There are two basic steps in the design process, which are 1) the choice of the system architecture and the modules to achieve the task, and 2) the statistical analysis and validation of the system to check if it fulfils user requirements. In real life, the system design and analysis phases usually follow each other in a cycle until the engineer creates a design and a suitable analysis that satisfies the user specifications [58].

Automation of the design process is a research area with multiple open issues, even though there have been some studies in the context of image analysis, e.g., automatic programming. The systems analysis (performance characterization) phase in the context of video processing systems has been an active area of research in recent years. Performance evaluation of image and video analysis components or systems is an active research topic in the vision community [58].

2) Real-Time Data Constraints: Society requires the results of research activities to address new solutions in video surveillance and sensor networks. Security and safety call for new generations of multimedia surveillance systems, in which computers will act not only as supporting platforms but as the essential core of the real-time data comprehension process; such systems are becoming a reality [57].

Most of the new research activities in surveillance are exploring larger dimensions, such as distributed video surveillance systems, heterogeneous video surveillance, audio surveillance, and biometric systems. In vast distributed environments, the exploitation of networks of small cooperative sensors should considerably improve the surveillance capability of high-level sensors, such as cameras [57].

As system size and diversity grow and consequently the complexity increases, the probability of inconsistency, unreliability, and nonresponsiveness grows. The design and implementation of distributed real-time systems present essential challenges to ensure that these complicated systems function as required. To comprehend or implement any complex system, it is necessary to decompose it into component parts and functions. Distributed systems can be considered in terms of independent concurrent activities that need to exchange data without weakening the overall predictability and performance of the system [59].

There are four crucial objectives that design methods for real-time systems should achieve: 1) to be able to structure the system in concurrent tasks, 2) to be capable of developing reusable software by information hiding, 3) to be able to determine the behavioral characteristics of the system, and 4) to be able to analyze the performance of the design by distinguishing its performance and the fulfillment of requirements [59].

The main motivation of the paradigm shift from a central to a distributed control surveillance system is an improvement of the functionality, availability, and autonomy of the surveillance system. These surveillance systems can respond autonomously to changes in the environment of the system and to detected events in the monitored scenes. A static surveillance system configuration is not desirable. The system architecture must support reconfiguration, migration, quality of service, and power adaptation in analysis tasks [11].

Recently, there has been rapid development in advanced surveillance systems to solve a collection of difficulties that vary from people recognition to behavior analysis with the intention to enhance security. These challenges have been approached from different perspectives and were followed by a vast selection of system architectures. As cheaper and faster computing hardware accompanied with efficient and versatile sensors reached the consumer, there was a rapid development of multicamera systems. In spite of their large area coverage, they introduce new dilemmas that must be addressed in the architectural definition [60].

B. Difficulties in Video Surveillance

In realistic surveillance scenarios, it is impossible for a single sensor to view all the areas simultaneously, or to visually track a moving object for a long period. Objects become occluded by buildings and trees, and the sensors themselves have confined fields of view. A promising solution to this difficulty is to use a network of video sensors to cooperatively monitor all the objects within an extended region and seamlessly track individual objects that cannot be viewed continuously by an individual sensor alone. Some of the technical challenges within this method are to 1) actively control sensors to cooperatively track multiple moving objects, 2) fuse information from multiple sensors into scene-level object representations, 3) survey the scene for events and activities that should “trigger” further processing or operator involvement, and 4) offer human users a high-level interface for dynamic scene visualization and system tasking [12].

Intelligent visual surveillance is a vital application area for computer vision. In situations in which networks of hundreds of cameras are used to cover a wide area, the obvious restriction is the ability of the user to manage vast amounts of information. For this reason, automated tools that can generalize activities or track objects are crucial to the operator. The ability to track objects across (spatially separated) camera scenes is the key to the user requirements. Extensive geometric knowledge of the site and camera positions is normally needed. This type of explicit mapping to camera placement is impossible for large installations, because it requires that the operator knows to which camera to switch when an object vanishes [61].

While detecting and tracking objects are crucial capabilities for smart surveillance, from the perspective of a human intelligence analyst, the most critical challenge in video-based surveillance is interpreting the automatic analysis of data into the detection of events of interest and the identification of trends. Contemporary systems have just begun to examine automatic event detection. The key points are video-based detection and tracking, video-based person identification, large-scale surveillance systems, and automatic system calibration [41].

Object tracking is a vital task for many applications in the area of computer vision and particularly in those associated with video surveillance. Recently, the research community has concentrated its interests on developing smart applications to enhance event detection capabilities in video surveillance systems. Advanced visual-based surveillance systems need to process videos resulting from multiple cameras to detect the presence of mobile objects in the monitored scene. Every detected object is tracked and its trajectory is analyzed to deduce its movement in the scene. Finally, at the highest levels of the system, detected objects are recognized and their behavior is analyzed to verify whether the state is normal or potentially dangerous [62].

Motion detection, tracking, behavior comprehension, and personal identification at a distance can be realized by single camera-based visual surveillance systems. Multiple camera-based visual surveillance systems can be helpful, because the surveillance region is enlarged and multiple view information can overcome occlusion. Tracking with a single camera easily creates ambiguity resulting from occlusion or depth (see Fig. 15). This ambiguity may be resolved by another view. Visual surveillance using multiple cameras introduces dilemmas, such as camera installation, camera calibration, object matching, automated camera switching, and data fusion [20].

The recognition of human activities in restricted settings, such as airports, parking lots, and banks, is of significant interest in security and automated surveillance systems. Albanese et al. [63] state that science is still far from achieving a systematic solution to this difficulty. The analysis of activities executed by humans in restricted settings is of great importance in applications, such as automated security and surveillance systems. There has been substantial interest in this area, where the challenge is to automatically recognize the activities occurring in the field of a camera and detect abnormalities [63].


Fig. 15. Example of occlusion [15].

Visual surveillance is a very active research area in computer vision because of the rapidly increasing number of surveillance cameras, which results in a strong demand for automatic processing methods of their output. The scientific challenge is to devise and implement automatic systems that can detect and track moving objects, and interpret their activities and behaviors. This need is a worldwide phenomenon, expressed by private companies as well as governmental and public institutions, with the aim of enhancing public safety. Visual surveillance is a key technology in public safety, e.g., in transport networks, town centers, schools, and hospitals. The main tasks in visual surveillance systems contain motion detection, object classification, tracking, activity understanding, and semantic classification [25].

Luan et al. proclaim that tracking in low frame rate (LFR) video is a practical requirement for numerous real-time applications, including visual surveillance. For tracking systems, an LFR condition is equivalent to abrupt motion, which is typically encountered but difficult to address. Specifically, these difficulties include poor motion continuity, fast appearance variation of the target, and increased background clutter. The majority of existing approaches cannot be readily applied to LFR tracking problems because of their vulnerability to the motion and appearance discontinuity inflicted by LFR data [64].

1) Occlusions: Outdoor and indoor surveillance have some distinct requirements. Indoor surveillance can be considered less complicated than outdoor surveillance. The operating conditions are stable in indoor environments. The cameras are typically fixed and not subject to vibration, weather conditions do not affect the scene, and the moving targets are generally limited to people. Regardless of these simplified conditions, in-house scenes are characterized by other eccentricities, which enlarge the dilemmas of surveillance systems [65].

Occlusions and operation in difficult weather conditions are fundamental challenges. In a multiple-target-tracking system, the key points of the local tracker are typically the detection subsystem and the measurements-to-tracks association subsystem. The design of the association system is dependent on the quality of the detection subsystem [66].

The difficulty of tracking multiple objects among complicated crowds in busy areas is far from being completely solved. The majority of existing algorithms are designed under one or multiple presumptions on occlusions, e.g., the number of objects, partial occlusion, short-term occlusion, constant motion, and constant illumination. Some methods use a human model for reasoning about the occlusion between standing humans. Exploiting a human appearance model can achieve better results in tracking multiple standing and walking people in a large crowd, but it may also result in difficulties in addressing occlusions involving objects, such as bags, luggage, children, sitting people, and vehicles. The change and interchange of labels of tracked objects after occlusion are the most conventional and significant errors of these methods [24].

Tracking multiple people in cluttered and crowded scenes is a demanding task primarily because of the occlusion between people. If a person is visually isolated, it is easier to perform the tasks of detection and tracking. The increase of the density of objects in the scene increases interobject occlusions. A foreground blob may not belong to a single individual; it may belong to several individuals in the scene. A person may even be completely occluded by other people, making it impossible to detect and track multiple individuals with a single camera. Multiple views of the same scene attempt to acquire information that might be omitted in a particular view [67].

The usage of multiple cameras in visual surveillance has grown significantly, because it is very useful to address many difficulties, such as occlusion. Visual surveillance that uses multiple cameras has numerous problems though. These include camera installation, calibration of multiple cameras, correspondence between multiple cameras, automated camera switching, and data fusion [68].

2) Feature Extraction and Classification: The recognition of moving targets in a video stream still remains a difficulty. Moving target recognition entails two main steps, which are 1) feature extraction and 2) classification. The feature extraction process derives a collection of features from the video stream. Numerous machine-learning classification techniques have been studied for surveillance tasks [69].
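The two-step structure above can be illustrated with a minimal sketch. Only the feature-extraction/classification split comes from the text; the particular features (bounding-box area and aspect ratio) and the nearest-centroid classifier are illustrative assumptions.

```python
# Sketch of moving-target recognition: 1) feature extraction,
# 2) classification with a toy nearest-centroid classifier.

import math

def extract_features(bbox):
    """Derive a small feature vector from a detected target's bounding box."""
    w, h = bbox
    area = w * h
    aspect = w / h
    return (area, aspect)

class NearestCentroidClassifier:
    def __init__(self):
        self.centroids = {}
    def fit(self, samples):
        # samples: {label: [feature vectors]}
        for label, vecs in samples.items():
            n = len(vecs)
            self.centroids[label] = tuple(sum(v[i] for v in vecs) / n
                                          for i in range(len(vecs[0])))
    def predict(self, vec):
        # Assign the label of the closest class centroid.
        return min(self.centroids,
                   key=lambda lbl: math.dist(vec, self.centroids[lbl]))

# Train on hand-labeled bounding boxes (people are tall, vehicles wide).
clf = NearestCentroidClassifier()
clf.fit({
    "person":  [extract_features(b) for b in [(40, 120), (35, 110)]],
    "vehicle": [extract_features(b) for b in [(200, 90), (220, 100)]],
})
label = clf.predict(extract_features((38, 115)))   # a person-shaped box
```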

The most typical approach to detect moving objects is background subtraction, in which each frame of a video sequence is compared against a background model. One dilemma in background subtraction is caused by the detection of false objects when an object that belongs to the background, e.g., after remaining stationary for a period of time, moves away. This creates what are called “ghosts.” It is vital to address this problem, because ghost objects will unfavorably affect many tasks, such as object classification, tracking, and event analysis, e.g., abandoned item detection [70].
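A minimal background-subtraction sketch makes the ghost problem concrete. The thresholds, the running-average background update, and the ghost heuristic (foreground pixels that show no frame-to-frame change) are simplifying assumptions, not the method of the cited work.

```python
# Toy background subtraction on grayscale "frames" (flat lists of ints).

THRESH = 30        # foreground/background difference threshold (assumed)
ALPHA = 0.05       # background learning rate (assumed)

def subtract(frame, background, prev_frame):
    foreground, ghosts = [], []
    for i, (p, b) in enumerate(zip(frame, background)):
        if abs(p - b) > THRESH:
            foreground.append(i)
            # A foreground pixel that is not actually changing between
            # frames is a candidate "ghost" left by a departed object.
            if prev_frame is not None and abs(p - prev_frame[i]) <= 2:
                ghosts.append(i)
    return foreground, ghosts

def update_background(background, frame):
    # Running average lets stationary objects melt into the background.
    return [(1 - ALPHA) * b + ALPHA * p for b, p in zip(background, frame)]

background = [10, 10, 10, 10]
prev = [10, 80, 10, 10]          # object already sitting at pixel 1
frame = [10, 80, 90, 10]         # pixel 1 static (ghost-like), pixel 2 moving

fg, ghosts = subtract(frame, background, prev)
bg = update_background(background, frame)
```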

Fig. 16 presents visual results from a basic motion tracker and a ghost detection algorithm. Boxes with dark borders indicate the valid moving tracks created by the tracker. Boxes with dashed dark borders denote the valid but static tracks. Boxes with white borders represent invalid tracks, also known as ghost tracks. The patches presented in the boxes present the foreground pixels, which are detected as moving pixels [70].

Fig. 16. Example basic tracking and ghosts [70].

3) Automatic Video Analysis: The strategy proposed by Wang et al. [71] to support rapid decision making is to reduce the amount of information required to be processed by human operators. For this reason, researchers have been studying automatic video content analysis technologies to extract information from videos. Even though substantial progress has been made, the high computational cost of these techniques limits their usage in real-time situations in the near future. Even though these techniques can essentially reduce the amount of video information, which must be analyzed by human operators, the human operators must resolve the ambiguities in the videos, synthesize a vast range of context information within the videos, and make final decisions. Therefore, it is important to design interactive visualizations that can support real-time information synthesis and decision making for video surveillance tasks [71].

Additionally, the number of cameras and the area under surveillance are restricted by the personnel available. To reduce the restrictions of traditional surveillance methods, there is ongoing effort in the computer vision and artificial intelligence community to develop automated systems for the real-time monitoring of people, vehicles, and other objects. These systems can create a depiction of the events occurring within their vicinity and raise alarms if they detect a suspicious person or unusual activity [22].

Camera systems for surveillance are in extensive use and produce considerable amounts of video data, which are stored for future or immediate utilization. In this context, efficient indexing and retrieval from surveillance video databases are crucial. Surveillance videos are rich in motion information, which is the most important cue to identify the dynamic content of videos. Extraction, storage, and analysis of motion information in videos, and content-based surveillance video retrieval, are of importance [72].

C. Awareness and Intelligence

The ultimate goal of surveillance systems is to automatically evaluate the ongoing activities of the monitored environment by flagging and presenting the suspicious events in real time to the operator to prevent dangerous situations. Data fusion techniques can be used to enhance the estimation performance and system robustness by exploiting the redundancy offered by multiple sensors observing the same scene. With recent advancements in camera and processing technology, data fusion is being considered for video-based systems. Intelligent sensors, which are equipped with microprocessors to execute distributed data processing and computation, are available and can decrease the computational burden of a central processing node [73].

The reliability of sensors is never explicitly considered. The difficulty in choosing the most relevant sensor or collection of sensors to execute a particular task often arises. The task could be target tracking, audio recording of a suspicious event, or triggering an alarm. It would be desirable to have a system that could automatically select the correct camera or collection of cameras. If data from multiple sensors are available and data fusion can be performed, results could be considerably affected in the case of a malfunctioning sensor. A means to evaluate the performance of the sensors and to weight their contribution in the fusion process is required [73].

In the contemporary generation of surveillance systems, in which multiple asynchronous and miscellaneous sensors are used, the adaption of the information acquired from them to derive the events from the environment is an important and challenging research problem. Information adaption refers to the process of combining the sensor and nonsensor information using the context and past experience. The issue of information adaption is vital, because when information is acquired from multiple sources, adapted information offers more precise inferences of the environment than individual sources [74].

1) Context Awareness: To improve software autonomy, applications depend on context information to dynamically adapt their behavior to match the environment and user requirements. Context-aware applications require middleware for the transparent distribution of components. Context-aware applications are needed to support personalization and adaptation based on context awareness. The user must understand how the applications function, such as what context information and logic are utilized in particular automated actions. Context-aware applications must ensure that actions committed on behalf of users are both accountable and intelligible. The system cannot simply be trusted to act on behalf of users [75].

To address these dilemmas, autonomous context-aware systems need to provide mechanisms to strike a suitable balance between user control and software autonomy. This includes providing mechanisms to make users aware of application adaptations by indicating aspects of the application state, such as the context information and adaptation logic used in decision-making processes. The challenge is not only to identify what application state information should be presented, but in what manner, e.g., with what level of explanation. In traditional applications, the tradeoff between user control and software autonomy has been fixed during the design phase. In contrast, context-aware applications may need to adjust the balance of software autonomy and user control at run-time by changing the level of feedback to users and the content of user input. The support for adaption includes the management of rules and user preferences that are used to determine how the context-aware system will respond to the available context information [75].
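The run-time balance between autonomy and user control described above can be sketched as a small rule engine. The rule format, the context keys, and the confidence-based "autonomy threshold" are all illustrative assumptions, not the mechanism of the cited work.

```python
# Sketch: adaptations with high confidence run autonomously; the rest
# are deferred to the user together with the context that triggered
# them (supporting accountability and intelligibility).

RULES = [
    # (condition on context, action, confidence of the adaptation)
    (lambda ctx: ctx["location"] == "restricted", "raise_alarm", 0.9),
    (lambda ctx: ctx["hour"] >= 22,               "dim_lights",  0.6),
]

def decide(context, autonomy_threshold=0.7):
    """Split matching rules into autonomous actions and user prompts."""
    auto, ask_user = [], []
    for cond, action, conf in RULES:
        if cond(context):
            target = auto if conf >= autonomy_threshold else ask_user
            # Keep a copy of the context so the user can see why the
            # adaptation was proposed.
            target.append((action, conf, dict(context)))
    return auto, ask_user

auto, ask = decide({"location": "restricted", "hour": 23})
```

Raising `autonomy_threshold` shifts the balance toward user control at run-time, which mirrors the adjustable tradeoff discussed above.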

The design of aware systems, i.e., systems that have capabilities of automatic adaption to changes, learning from experience, and active interaction with external entities, is an active topic of research involving several disciplines ranging from computer vision to artificial intelligence. To reach this goal, approaches based on the imitation of human brain skills are typical and, in the past, they have offered successful applications. In Fig. 17, Dore et al. present a possible model that contains sensing information from the external world, analyzing and representing the information, conducting decisions, and issuing actions and communications to the external world [76].

Fig. 17. Example of a cognitive cycle [76].

2) Data Fusion: Blasch and Plano [77] state that “data fusion” is a term used to refer to the bottom-level, data-driven fusion. “Information fusion” refers to the processing of already-fused data, such as from primary sensors or sources, into meaningful and preferably relevant information to another part of the system, human or not [77].

A multimedia system incorporates relevant media streams to accomplish a detection task. As the different streams have different confidence levels in achieving distinct tasks, it is vital for the system to wisely identify the most appropriate streams for a specific analysis task for it to reach higher confidence. The confidence information of media streams is usually used in their incorporation by assigning weights to them accordingly. The confidence in a stream is normally determined by how it has assisted in performing the detection task previously. Arguably, if the system acquires precise results based on a particular stream, a higher confidence level is assigned to it in the adaption process [16], [17].
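The adaptive weighting scheme above can be sketched as follows: each stream's weight rises or falls according to whether it agreed with the fused decision. The initial weights, the learning rate, and the multiplicative update rule are assumptions; only the idea of confidence-weighted incorporation comes from the text.

```python
# Sketch of confidence-weighted fusion across media streams.

def fuse(scores, weights):
    """scores: per-stream probability that the event occurred."""
    total = sum(weights.values())
    return sum(scores[s] * w for s, w in weights.items()) / total

def update_weights(scores, weights, event_occurred, lr=0.1):
    # Streams that agreed with the outcome gain weight; others lose it.
    for s in weights:
        correct = (scores[s] >= 0.5) == event_occurred
        weights[s] *= (1 + lr) if correct else (1 - lr)
    return weights

weights = {"video": 1.0, "audio": 1.0, "rfid": 1.0}
scores = {"video": 0.9, "audio": 0.4, "rfid": 0.8}

decision = fuse(scores, weights) >= 0.5          # fused detection
weights = update_weights(scores, weights, decision)
```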

Data fusion from multiple cameras involving the same objects is a main challenge in multicamera surveillance systems and influences the optimal data combination of different sources. It is required to estimate the reliability of the available sensors and processes to combine complementary information in regions where there are multiple views to solve dilemmas of specific sensors, such as occlusions, overlaps, and shadows. Some traditional benefits, in addition to extended spatial coverage, are the enhancements in accuracy with the combination of covariance reduction, improved robustness by the identification of malfunctioning sensors, and enhanced continuity with complementary detections [78].
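The covariance-reduction benefit mentioned above can be made concrete with textbook inverse-variance weighting: fusing two independent estimates yields a combined estimate whose variance is smaller than either input. This is a standard illustration, not a method drawn from the cited systems; the camera readings are hypothetical.

```python
# Inverse-variance fusion of two independent scalar estimates.

def fuse_estimates(x1, var1, x2, var2):
    w1, w2 = 1.0 / var1, 1.0 / var2
    x = (w1 * x1 + w2 * x2) / (w1 + w2)     # weighted mean
    var = 1.0 / (w1 + w2)                   # fused variance < min(var1, var2)
    return x, var

# Two cameras localize the same target along a corridor (meters);
# camera 2 is the more reliable one (smaller variance).
x, var = fuse_estimates(10.0, 4.0, 12.0, 1.0)
```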

Typically, surveillance systems are composed of numerous sensors to acquire data from each target in the environment. These systems encounter two types of dilemmas, which are 1) the fusion of data, which addresses the combination of data from distinct sources in an optimal manner, and 2) the management of multiple sensors, which addresses the optimization of the global management of the system through the application of individual operations in every sensor [14].

In Castanedo et al.'s [14] surveillance systems, autonomous agents can cooperate with other agents for two different objectives, which are 1) to acquire enhanced performance or precision for a specific surveillance task, in which the complementary information can be incorporated and then combined through data fusion techniques, and 2) to use the capabilities of other agents to expand system coverage and execute tasks that they are not able to achieve individually [14].

Information adaption is a challenging task because of 1) the diversity and asynchrony of sensors, 2) the disagreement or agreement of media streams, and 3) the confidence regarding the media streams. There is an issue of how to fuse individual information to establish comprehensive information. These are items of importance and essential challenges [74].

D. Wireless Networks and Their Applicability

The WSN has multiple applications in environment monitoring. Advances in microsensor and communication technologies have made it possible to manufacture cost-effective and small WSNs. Several interesting WSN applications have been specified, such as the active badge system, which locates individuals within a building. Radio frequency identification (RFID) technology is utilized in inventory management and monitoring, e.g., rail car tracking. The confidence in object location may be improved with an RFID stream in comparison to an audio stream [16]. Berkeley Smart Dust can be used to periodically receive readings from sensors. The Massachusetts Institute of Technology (MIT) Cricket uses the time difference of arrival (TDoA) model to distinguish the position and orientation of a device [79].
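TDoA ranging of the Cricket style can be sketched in a few lines: an RF pulse (effectively instantaneous) and a slower ultrasound pulse are emitted together, and the gap between their arrival times gives the distance. The speed-of-sound constant and the timing values are illustrative.

```python
# Distance from the RF/ultrasound arrival-time difference.

SPEED_OF_SOUND = 343.0     # m/s at room temperature (approximate)

def tdoa_distance(t_rf_arrival, t_ultrasound_arrival):
    """Distance implied by the gap between RF and ultrasound arrivals;
    the RF propagation delay is treated as negligible."""
    dt = t_ultrasound_arrival - t_rf_arrival
    return SPEED_OF_SOUND * dt

# Ultrasound arrives 10 ms after the RF pulse -> roughly 3.4 m away.
d = tdoa_distance(0.000, 0.010)
```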

The combination of these technologies could provide many new applications. The sensor networks can detect and indicate environment-related information and events. Through messaging systems, these events can be transmitted to the outside world for immediate processing. These events may trigger human or application programs to respond with actions, which may be further conveyed back into the sensor networks [79].

When networks are adopted as the communication medium for the real-time transmission of video signals in a security-sensitive operation, many technological issues need to be resolved. A great amount of data flow can cause network congestion. The system must provide real-time transmission of video signals even though there might be only a small amount of bandwidth available. Robust and efficient error control mechanisms and video compression techniques need to be used to prevent the difficulties related to limited bandwidth [4].


Recently, there has been an emphasis on the development of wide-area distributed wireless sensor networks with self-organization capabilities to tolerate sensor failures, changing environmental conditions, and distinct environmental sensing applications. Particularly, mobile sensor networks (MSNs) require support from self-configuration mechanisms to guarantee adaptability, scalability, and optimal performance. The best network configuration is typically time varying and context dependent. Mobile sensors can physically change the network topology, responding to events of the environment or to changes in the mission [80].

E. Energy Efficiency of Remote Sensors

With the emergence of high-resolution image sensors, video transmission requires high-bandwidth communication networks. It is predicted that future intelligent video surveillance will require more computing power and higher communication bandwidth than currently available, owing to higher resolution images, higher frame rates, and increasing numbers of cameras in video surveillance networks. Novel solutions are needed to handle the demanding restrictions of video surveillance systems, both in terms of communication bandwidth and computing power [81].

Intruder detection and data collection are examples of applications envisioned for battery-powered sensor networks. In many of these applications, the detection of a certain triggering event is the initial step executed prior to any other processing. If trigger events occur seldom, sensor nodes will spend a large majority of their lifetime in the detection loop. The efficient use of system resources in detection then plays a key role in the longevity of the sensor nodes. The energy consumption in the system includes transmission energy, whereas the energy required by processing has not been considered directly in the detection problem [82].
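A back-of-the-envelope estimate makes the point concrete: when trigger events are rare, the detection loop dominates the daily energy budget and therefore the node lifetime. All figures in the sketch below are hypothetical, chosen only to illustrate the duty-cycle arithmetic.

```python
# Hypothetical duty-cycle sketch: node lifetime is battery capacity
# divided by the average daily energy spent in the detection loop plus
# the (rare) event-processing bursts. All numbers are illustrative.

def lifetime_days(battery_j: float, p_detect_w: float,
                  p_event_w: float, events_per_day: float,
                  event_duration_s: float) -> float:
    """Estimate node lifetime from the detection/event duty cycle."""
    seconds_per_day = 86_400.0
    event_s = events_per_day * event_duration_s
    detect_s = seconds_per_day - event_s
    joules_per_day = p_detect_w * detect_s + p_event_w * event_s
    return battery_j / joules_per_day

# 10 kJ battery, 1 mW detection loop, 50 mW processing burst, and two
# 30 s events per day: the detection loop accounts for most of the
# daily energy budget, so optimizing it dominates node longevity.
print(round(lifetime_days(10_000, 0.001, 0.050, 2, 30), 1))
```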

It is crucial to note that technology scaling will gradually decrease the processing costs while the transmission cost remains constant. With the usage of compression techniques, one can reduce the number of transmitted bits; the transmission cost is decreased at the price of additional computation. This communication-computation tradeoff is the fundamental idea behind low-energy sensor networks. This is in sharp contrast to classical distributed systems, in which the goal is usually to maximize the speed of execution. The most appropriate metric in wireless networks is power. Experimental measurements indicate that the communication cost in wireless ad hoc networks can be two orders of magnitude higher than the computation cost in terms of consumed power [38].
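The tradeoff can be shown numerically: compression pays off whenever the radio energy saved exceeds the CPU energy spent compressing. The per-bit costs below are assumptions chosen to reflect the roughly two-orders-of-magnitude gap cited above, not measured values.

```python
# Hypothetical sketch of the communication-computation tradeoff. The
# per-bit energy figures are assumptions, with the radio ~100x more
# expensive than computation per bit, as the cited measurements suggest.
E_TX_PER_BIT = 1e-6   # J/bit to transmit (assumed)
E_CPU_PER_BIT = 1e-8  # J/bit of compression work (assumed)

def total_energy(bits: int, compression_ratio: float) -> float:
    """Energy to compress (if ratio > 1) and then transmit the payload."""
    if compression_ratio < 1:
        raise ValueError("compression ratio must be >= 1")
    compute = 0.0 if compression_ratio == 1 else E_CPU_PER_BIT * bits
    return compute + E_TX_PER_BIT * bits / compression_ratio

raw = total_energy(1_000_000, 1)     # transmit everything uncompressed
packed = total_energy(1_000_000, 4)  # 4:1 compression before transmission
assert packed < raw  # trading computation for transmission saves energy
```

Under these assumptions, 4:1 compression cuts the total cost to roughly a quarter of raw transmission, which is exactly the low-energy design principle the paragraph describes.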

Integrated video systems (IVSs) are based on the recent development of smart cameras. In addition to high demands in computing performance, power awareness is of major importance in IVS. Power savings may be achieved by graceful degradation of quality of service (QoS). There has been research done on the tradeoff of image quality and power consumption. The work mainly concentrates on sophisticated image compression techniques [83].

A sensor surveillance system comprises a set of wireless sensor nodes and a set of targets to be monitored. The wireless sensor nodes collaborate with each other to survey the targets and transmit the sensed data to a base station. The wireless sensor nodes are powered by batteries and have demanding power requirements. The lifetime is the duration until no target can be surveyed by any wireless sensor node or data cannot be forwarded to be processed because of a lack of energy in the sensor nodes [84].
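The lifetime definition above can be sketched directly: the network is alive while every target is still covered by at least one sensor with remaining energy. The model below is a deliberately simplified illustration (constant watch power, no sleep scheduling or forwarding costs); all values are made up.

```python
# Minimal sketch of the lifetime definition: the network dies when the
# first target loses its last energized watcher. Scheduling and routing
# energy are ignored for clarity; the figures are hypothetical.

def network_lifetime(batteries_j, watchers, p_watch_w):
    """batteries_j: sensor id -> joules; watchers: target -> sensor ids."""
    seconds = {s: e / p_watch_w for s, e in batteries_j.items()}
    # A target stays covered as long as its longest-lived watcher survives.
    per_target = [max(seconds[s] for s in sensors)
                  for sensors in watchers.values()]
    # Lifetime ends when the first target loses all of its watchers.
    return min(per_target)

batteries = {"s1": 3600.0, "s2": 7200.0, "s3": 1800.0}
coverage = {"t1": ["s1", "s2"], "t2": ["s3"]}
print(network_lifetime(batteries, coverage, p_watch_w=1.0))  # 1800.0
```

The redundantly covered target t1 outlives the singly covered t2, which is why coverage redundancy is a standard lever for extending lifetime.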

A client-side computing device has a crucial influence on the total performance of a surveillance system. The utilization of a cellular phone as a client of a surveillance system is notable because of its portability and omnipresent computing. The integration of video information and sensor networks established the fundamental infrastructure for new generations of multimedia surveillance systems. In this infrastructure, different media streams, such as audio, video, and sensor signals, would provide an automatic analysis of the controlled environment and a real-time interpretation of the scene [85].

F. Dilemmas in Scalability

A scalable system should be able to integrate the sensor data with contextual information and domain knowledge provided by both the humans and the physical environment to maintain a coherent picture of the world over time. The performance of the majority of the systems is far from what is required for real-world applications [86].

A large-scale distributed video surveillance system usually comprises many video sources distributed over a vast area, transmitting live video streams to a central location for monitoring and processing. Contemporary advances in video sensors and the increasing availability of networked digital video cameras have allowed the deployment of large-scale surveillance systems over existing IP-network infrastructure. Implementing an intelligent, scalable, and distributed video surveillance system remains a research problem. Researchers have paid little attention to the scalability of video surveillance systems. They typically utilize a centralized architecture and assume the availability of all the required system resources, such as computational power and network bandwidth [87].

Fig. 18 presents an example of sensor coverage in a large complex [15]. Each sensor and its coverage is drawn and labeled, e.g., B1, C1, C2, and C3 [15].

The integration of heterogeneous digital networks in the samesurveillance architecture needs a video encoding and distribu-tion technology capable of adapting to the currently availablebandwidth, which may change in time for the same communi-cation channel, and to be robust against transmission errors. Thepresence of clients with different processing power and displaycapabilities accessing video information requires a multiscalerepresentation of the signal. The restrictions of surveillanceapplications regarding delay, security, complexity, and visualquality introduce strict demands to the technology of the videocodec. In a large surveillance system, the digital network thatenables remote monitoring, storage, control and analysis is notwithin a single local area network (LAN). It typically represents

510 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 40, NO. 5, SEPTEMBER 2010

Fig. 18. Schematic representation of sensor coverage in a large area [15].

a collection of interconnected LANs, wired or wireless, withdifferent bandwidths and QoS. Different types of clients con-nect to these networks and access one or multiple video sources,decode them at the temporal and spatial resolution they require,and provide different functions [88].

QoS is a fundamental concern in distributed IVS. In video-based surveillance, normal QoS parameters include frame rate, transfer delay, image resolution, and video-compression rate. The surveillance tasks might also provide multiple QoS levels. In addition, the offered QoS levels can change over time due to user instructions or modifications in the monitored environment. Novel IVS systems need to contain dedicated QoS management mechanisms [11].
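The QoS parameters listed above can be captured in a small data structure with several switchable service levels, which is the kind of mechanism a dedicated QoS manager would operate on. The level names, values, and bandwidth thresholds below are illustrative assumptions, not taken from [11].

```python
# Hypothetical sketch of multi-level QoS for video surveillance. The
# parameter values and the bandwidth thresholds are made-up examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class VideoQoS:
    frame_rate_fps: int
    max_transfer_delay_ms: int
    resolution: tuple        # (width, height)
    compression_ratio: float  # e.g., 30:1 -> 30.0

QOS_LEVELS = {
    "forensic": VideoQoS(25, 200, (1920, 1080), 10.0),
    "monitoring": VideoQoS(12, 500, (1280, 720), 30.0),
    "degraded": VideoQoS(5, 2000, (640, 480), 60.0),
}

def select_level(available_kbps: float) -> str:
    """Pick the richest level a rough bandwidth estimate can sustain."""
    if available_kbps > 4000:
        return "forensic"
    if available_kbps > 1000:
        return "monitoring"
    return "degraded"

assert select_level(5000) == "forensic"
assert select_level(500) == "degraded"
```

Re-evaluating `select_level` as conditions change is one simple way the offered QoS can vary over time, as the paragraph describes.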

1) Scalability in Testing: Testing of individual modules is called unit testing. Integration testing comprises rerunning the unit test cases after the system has been completely integrated. For feature testing, which is also called system testing, testers develop test cases based on the requirements of the system and choose adequate test cases according to every expected result. Load testing comprises four subphases: 1) stability testing, 2) stress testing, 3) reliability testing, and 4) performance testing. Stability testing comprises the installation of the software in a field-like environment and the verification of its ability to appropriately address data continuously. Stress testing comprises the verification of the ability of the software to address heavy loads for short periods without crashing. Reliability testing comprises the verification that the software can fulfill reliability requirements. Performance testing comprises the verification that the software can achieve performance requirements [89].

A substantial pitfall in incorporating intelligent functions into real-world systems is the lack of robustness, the inability to test and validate these systems under a variety of use cases, and the lack of quantification of the performance of the system. Additionally, the system should gracefully degrade in performance as the complexity of data grows. This is a very open research issue that is vital for the deployment of these systems [3].

G. Location Difficulties

Location techniques have numerous possible applications in wireless communication, surveillance, military equipment, tracking, and safety applications. Sagiraju et al. [56] concentrate on positioning in cellular wireless networks; the results can be applied to other systems. In the GPS, code-modulated signals are transmitted by numerous satellites, which orbit the earth, and are received by GPS receivers to determine the current position. To calculate a position, the receiver must first acquire the satellite signals. Traditionally, GPS receivers have been designed with specific acquisition and tracking modes. After the signal has been acquired, the receiver switches to the tracking mode. If it loses the lock, then the acquisition needs to be repeated [56].

The GPS system comprises at least 24 satellites in orbit around the world, with at least four satellites viewable from any point on Earth at a given time. Despite GPS being a sophisticated solution to the location discovery process, it has multiple network dilemmas. First, GPS is expensive both in terms of hardware and power requirements. Second, GPS requires line-of-sight between the receiver and the satellites. It does not function well when obstructions, such as buildings, block the direct “view” of the satellites. Locations can instead be calculated by trilateration. For a trilateration to be successful, a node needs to have at least three neighbors that already are aware of their positions [38].
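Trilateration from three position-aware neighbors reduces to a small linear system: subtracting the range-circle equations pairwise cancels the quadratic terms. The sketch below shows the standard 2-D case; the anchor coordinates and ranges are made up for illustration.

```python
# Standard 2-D trilateration: with three anchors at known positions and
# a range to each, subtracting circle 1 from circles 2 and 3 yields two
# linear equations in the unknown (x, y), solved here by Cramer's rule.
def trilaterate(anchors, ranges):
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        raise ValueError("anchors are collinear; position is ambiguous")
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Node at (3, 4) with anchors at three corners of the area (made-up data):
print(trilaterate([(0, 0), (10, 0), (0, 10)],
                  [5.0, 65 ** 0.5, 45 ** 0.5]))  # approximately (3.0, 4.0)
```

In practice, noisy range estimates make more than three neighbors and a least-squares fit preferable, but the linearization step is the same.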

Security personnel review their wireless video systems for critical incident information. Complementary information in the form of maps and live video streaming can assist them in locating the problematic zone and acting quickly and with knowledge of the situation. The need for providing detailed real-time information to the surveillance agents has been identified and is being addressed by the research community [10].

The analysis and fusion of different sensor information requires mapping observations to a common coordinate system to achieve situational awareness and scene comprehension. The availability of mapping capabilities enables critical operational tasks, such as the fusion of multiple target measurements across the network, deduction of the relative size and speed of the target, and the assignment of tasks to pan, tilt, zoom (PTZ) and mobile sensors. This presents the need for automated and efficient geo-registration mechanisms for all sensors. For instance, target observations from multiple sensors may be mapped to a geodetic coordinate system and then displayed on a map-based interface. Fig. 19 illustrates an example of geo-registration in a visual sensor network [90].
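The core of geo-registration for a fixed camera is a calibrated map from pixel coordinates to a common ground frame. The sketch below uses the simplest instructive case, an affine ground-plane map under the assumption that targets move on a flat plane; a full solution would use a projective homography, and all calibration numbers here are made up.

```python
# Hypothetical geo-registration sketch: an affine map, calibrated from a
# few known landmarks, converts pixel coordinates to (east, north) metres
# so observations from different sensors can be fused on one map. A real
# system would fit a projective homography; values below are illustrative.

# Calibrated parameters: pixel -> metres relative to a reference point.
A = [[0.05, 0.00],   # metres per pixel along each image axis
     [0.00, -0.05]]  # image y grows downward, map north grows upward
T = [120.0, 80.0]    # offset of the image origin in the map frame

def pixel_to_map(u: float, v: float):
    """Map an image detection (u, v) into the shared ground frame."""
    east = A[0][0] * u + A[0][1] * v + T[0]
    north = A[1][0] * u + A[1][1] * v + T[1]
    return east, north

print(pixel_to_map(200, 100))
```

Once every sensor's detections live in the same frame, cross-sensor fusion and PTZ tasking reduce to geometry in that frame, which is the operational payoff the paragraph describes.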

H. Challenges in Privacy

Surveillance of events poses ethical problems. For instance, events involving humans and the right to monitor can conflict with the individual privacy rights of the monitored people. These privacy challenges depend heavily on the shared acceptance of the surveillance task as a necessity by the public with respect to a given application [3].

Fig. 19. Fields of view of four cameras at a port [90].

The suitability of homeland security for this role is plagued by questions ranging from dependability to the risks that technologies, e.g., surveillance, profiling, and data aggregation, pose to privacy and civil liberties [1].

In many applications, surveillance data needs to be trans-mitted across open networks with multiuser access characteris-tics. Information protection on these networks is a crucial issuefor upholding privacy in the surveillance service. Paternity ofsurveillance data can be extremely essential for efficient usein law enforcement. Legal requirements necessitate the devel-opment of watermarking and data-hiding techniques for securesensor identity assessment [3].

Despite the relevance of contemporary surveillance systems and their role in supporting human control, there is a widespread controversy about their utilization, connected with risks of privacy violations [57].

Advancements in sensor, communications, and storage capacities ease the large-scale collection of multimedia material. The value of this recorded data is only unlocked by technologies that can efficiently exploit the knowledge it contains. Regardless of the concerns over privacy issues, such capabilities are becoming more common in different environments, for example, in public transportation premises, cities, public buildings, and commercial establishments [91].

CCTV surveillance systems used in the field, with their centralized processing and recording architecture together with a simple multimonitor visualization of the crude video streams, have several disadvantages and restrictions. The most relevant dilemma is the complete lack of privacy. An automated and privacy-respecting surveillance system is a desirable goal. The latest video analysis systems emerging currently are based on centralized approaches that impose strict limitations on expandability and privacy [92].

To realize the fusion for integrated situation awareness, Trivedi et al. [13] developed the networked sensor tapestry (NeST) framework for multilevel semantic integration. NeST ensures the tracked person's privacy by using a set of programmable plug-in privacy filters operating on incoming sensor data. The filters either inhibit access to the data or remove any personally identifiable information. Trivedi et al. [13] use privacy filters with a privacy grammar that can connect multiple low-level data filters and aspects to create data-dependent privacy definitions.
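The plug-in filter idea can be sketched as a chain of functions where each filter either blocks a record outright or strips identifiable fields before data reaches higher layers. The filter names and record fields below are illustrative inventions, not taken from the NeST system [13].

```python
# Hypothetical sketch of NeST-style plug-in privacy filters: each filter
# either inhibits access (returns None) or strips personally identifiable
# fields. Filter names and record fields are made up for illustration.

def drop_face_crops(record):
    record = dict(record)
    record.pop("face_crop", None)  # remove identifiable imagery
    return record

def block_private_zone(record):
    # Inhibit access entirely for observations inside a private area.
    return None if record.get("zone") == "restroom" else record

def apply_filters(record, filters):
    for f in filters:
        record = f(record)
        if record is None:         # a filter inhibited access
            return None
    return record

chain = [block_private_zone, drop_face_crops]
assert apply_filters({"zone": "restroom", "face_crop": b".."}, chain) is None
assert "face_crop" not in apply_filters({"zone": "lobby",
                                         "face_crop": b".."}, chain)
```

Chaining filters in this way is one simple reading of connecting "multiple low-level data filters" into a data-dependent privacy definition.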

VIII. GROWING TECHNOLOGIES AND TRENDS

There are novel technologies and trends that have begun or are beginning to establish themselves. Kankanhalli and Rui [93] have indicated numerous such trends. Prati et al. [94] introduced a multisensor surveillance system containing video cameras and passive infrared (PIR) sensors. Calderara et al. [95] state that visual sensors will continue to be the dominant sensors but will be complemented with other appropriate sensors.

Atrey et al. [96] claim that contemporary systems are constructed for specific physical environments with specific sensor types and sensor deployments. While this is efficient, it lacks the portability required for widespread deployment. The system architecture should be capable of using the available sensors and resources to address the needs of the environment.

With the increasing variety and decreasing expense of miscellaneous types of sensors, there will be an increase in the usage of radically differentiated media, such as infrared, motion sensor information, text in diverse formats, optical sensor data, biological and satellite telemetric data, and location data obtained by GPS devices. Other developments are mobile sensors, such as moving cameras on vehicles used in public buses. Humans are also mobile sensors, recording information in different media types such as blogs. It would be beneficial to enhance the environment with suitable sensors to reduce the sensor and semantic omissions [93].

Accompanied with the increased popularity of portable security applications, it is more important that the surveillance system has low power consumption, simple functionality, and compact size. This includes the integration of the miscellaneous functional blocks and a motion detection sensor (MDS) into a single chip [97].

The process of extracting and tracking human figures in image sequences is vital for video surveillance and video-indexing applications. A useful and popular approach is based on silhouette analysis with spatiotemporal representation, in which the goal is to achieve an invariant representation of the detected object. Symmetries of the silhouette can be used as a gait parameter for the identification of a person [98].

Biometrics has been vastly applied to secure surveillance, access control, and personal identification with high security. With the rise of pervasive and personal computation, cell phones and PDAs will become a major communication and computation platform for individuals and suitable organizations. Even though biometrics has been an appropriate method for attaching a physical identity to a digital correspondence, a flexible biometrics system that can accommodate real-world applications in a secure manner is still a substantial challenge [99].

The next-generation video surveillance system will be a networked, intelligent, multicamera cooperative system with integrated situation awareness of complicated and dynamic scenes. It will be applicable to urban centers or indoor complexes. The essence of such a system is the increasingly intelligent and robust video analysis that is capable of reviewing the videos from low-level image appearance and feature extraction to middle-level object or event detection, and finally to high-level reasoning and scene comprehension. Significant steps have been made in examining these issues by research laboratories in the last decade. Currently, the focus is on the application of these integrated systems and the supply of automated solutions to realistic surveillance dilemmas [100].

There has been a dramatic progression in sensing for security applications and in the analysis and processing of sensor data. O'Sullivan and Pless [101] concentrate on two broad applications of sensors for security, which are 1) anomaly detection, and 2) object or pattern recognition [101].

In anomaly detection, the difficulty is to detect activity, behavior, objects, or substances that are atypical. Typical is defined with respect to historical data and is extremely scenario dependent. Algorithms for anomaly detection must adjust to the scenario and be robust to a vast range of possible assumptions. As a result, there is typically no model for an anomaly, and the models for the location and time are derived from observations. Scenarios that need anomaly detection include perimeter, border, or gateway surveillance [101].

In object or pattern recognition, there is typically a model or prior information of the object or pattern, and the intention is to categorize the pattern. The level of categorization, the required system robustness, and the required system efficiency define and restrict the possible models and processing. The usage of biometrics for the recognition of people is a prime example of an application that is evolving rapidly [101].

Gupta et al. [102] propose a leader–follower system, which receives multimodal sensor information from a wide array of sensors, including radars and cameras. In such a system, a fixed wide field-of-view (FOV) sensor conducts the duties of the leader. The leader directs follower PTZ cameras to zoom in on targets of interest. One of the typical difficulties in a leader–follower system is that the follower camera can only follow the target as long as it remains in the FOV of the leader. Additionally, inaccuracies in the leader–follower calibration may result in imprecise zooming operations [102].

In general, there is plenty of prototypical research that has transformed into practical solutions. Environments with multiple sensors include solutions in which electronic locks and user identification have been incorporated into doors, both of which can be perceived as individual sensors. The electronic lock indicates its own status, and the user identification device denotes the access rights of the user. This also forms a simple realization of distributed intelligence and awareness, in which each sensor acts independently but a higher level of deduction can be performed based on the individual information of each sensor. Video surveillance has been employed in solutions such as the detection of the direction of movement. Airports have utilized this technology to automatically raise alarms in situations in which a person goes through a passage in the wrong direction. Audio surveillance technology has been adopted in video camera solutions, which direct the cameras to the location of alarming sounds. Within various police forces, mobile robots have been used to remotely survey a potentially hazardous environment and transmit video feed to the user. Wireless sensor networks can be used to indicate the locations of nomadic guards to the control room within an indoor perimeter. All of these solutions have their own appropriate middleware and architecture, which serves their unique properties and purposes.

There are several major companies that deliver surveillance systems. GE Security offers integrated security management, intrusion and property protection, and video surveillance [103]. ObjectVideo provides intelligent video software for security, public safety, and other applications [104]. IOImage provides video surveillance, real-time detection, and alert and tracking services [105]. RemoteReality offers video surveillance services, including the detection and tracking of objects, in both visible and infrared thermal spectra [106]. Point Grey Research offers digital camera technology for machine vision and computer vision applications [107].

IX. CONCLUSION

This paper presented the contemporary state of modern surveillance systems for public safety, with a special emphasis on the 3GSSs and especially the difficulties of present surveillance systems. The paper briefly reviewed the background and progression of surveillance systems, including a short review of the first and second generations of surveillance systems. The third generation of surveillance systems addresses topics such as multisensor environments, video surveillance, audio surveillance, wireless sensor networks, distributed intelligence and awareness, and architecture and middleware. According to modern science, the current difficulties of surveillance systems for public safety reside in the fields of the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. A portion of the difficulties are the same as declared for the 3GSSs, but with detailed descriptions of the characteristics of the dilemmas, such as the architectural, visual, and awareness aspects. Other difficulties are completely novel or substantially highlighted, such as surveillance personnel location, application of wireless networks, energy efficiency, and scalability.

Novel sensors and new requirements will accompany surveillance systems. This places demanding challenges on architecture and its real-time functionality. There are existing fundamental concepts, such as video and audio surveillance, but there is a lack of their intelligent usage and especially of their seamless interoperability through a united real-time architecture. Contemporary surveillance systems still reside in a state in which individual concepts may achieve functionality in specific cases, but their comprehensive on-site interoperability is yet to be reached. Substantial evidence of a distributed multisensor intelligent surveillance system does not exist. As the size of surveyed complexes and buildings grows, the deployment of wireless sensors and their energy consumption become more notable. Wireless sensors are easy to deploy, and their low-energy consumption is constantly improving. Scalability issues are fundamentally related to the magnitude of areas under surveillance. Areas that require surveillance are growing, and the complexity of surveillance systems is also expanding. Both pose great challenges to the scalability aspect. Different sensors provide different information, and their exploitation in intelligent tasks remains a challenge. Sensor data should be decomposed into fundamental blocks, and the intelligent components should have the responsibility of composing the deductions from them. An attempt should be made to construct a multisensor distributed intelligent surveillance system that functions at a relatively high level, capturing alerting situations with a very low false alarm rate. The surveillance personnel are one of the strongest aspects of a surveillance system and should be retained in the system. Despite advancements in intelligence and awareness, the human being will always be a forerunner in adaptability and deduction.

The endless demand for and abundance of surveillance systems for public safety involve multiple issues that still require resolution. Extensive intelligence and automation, accompanied with energy efficiency and scalability in large areas, are required to be adopted by suppliers to establish surveillance systems for civic and communal public safety.

REFERENCES

[1] M. Reiter and P. Rohatgi, “Homeland security guest editor's introduction,” IEEE Internet Comput., vol. 8, no. 6, pp. 16–17, Nov./Dec. 2004, doi: 10.1109/MIC.2004.62.

[2] M. Valera and S. A. Velastin, “Intelligent distributed surveillance systems: A review,” IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 192–204, Apr. 2005, doi: 10.1049/ip-vis:20041147.

[3] C. S. Regazzoni, V. Ramesh, and G. L. Foresti, “Scanning the issue/technology special issue on video communications, processing, and understanding for third generation surveillance systems,” Proc. IEEE, vol. 89, no. 10, pp. 1355–1367, Oct. 2001, doi: 10.1109/5.959335.

[4] A. C. M. Fong and S. C. Hui, “Web-based intelligent surveillance system for detection of criminal activities,” Comput. Control Eng. J., vol. 12, no. 6, pp. 263–270, Dec. 2001.

[5] K. Muller, A. Smolic, M. Drose, P. Voigt, and T. Wiegand, “3-D construction of a dynamic environment with a fully calibrated background for traffic scenes,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 4, pp. 538–549, Apr. 2005, doi: 10.1109/TCSVT.2005.844452.

[6] W. M. Thames, “From eye to electron—Management problems of the combat surveillance research and development field,” IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 548–551, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008288.

[7] H. A. Nye, “The problem of combat surveillance,” IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 551–555, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008289.

[8] A. S. White, “Application of signal corps radar to combat surveillance,” IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 561–565, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008291.

[9] C. E. Wolfe, “Information system displays for aerospace surveillance applications,” IEEE Trans. Aerosp., vol. AS-2, no. 2, pp. 204–210, Apr. 1964, doi: 10.1109/TA.1964.4319590.

[10] R. Ott, M. Gutierrez, D. Thalmann, and F. Vexo, “Advanced virtual reality technologies for surveillance and security applications,” in Proc. ACM SIGGRAPH Int. Conf. Virtual Real. Continuum Its Appl. (VCRIA), Jun. 2006, pp. 163–170.

[11] M. Bramberger, A. Doblander, A. Maier, B. Rinner, and H. Schwabach, “Distributed embedded smart cameras for surveillance applications,” Computer, vol. 39, no. 2, pp. 68–75, Feb. 2006, doi: 10.1109/MC.2006.55.

[12] R. T. Collins, A. J. Lipton, H. Fujiyoshi, and T. Kanade, “Algorithms for cooperative multisensor surveillance,” Proc. IEEE, vol. 89, no. 10, pp. 1456–1477, Oct. 2001, doi: 10.1109/5.959341.

[13] M. M. Trivedi, T. L. Gandhi, and K. S. Huang, “Homeland security distributed interactive video arrays for event capture and enhanced situational awareness,” IEEE Intell. Syst., vol. 20, no. 5, pp. 58–66, Sep./Oct. 2005, doi: 10.1109/MIS.2005.86.

[14] F. Castanedo, M. A. Patricio, J. Garcia, and J. M. Molina, “Extending surveillance systems capabilities using BDI cooperative sensor agents,” in Proc. 4th Int. Workshop Video Surveill. Sens. Netw. (VSSN), Oct. 2006, pp. 131–138.

[15] S. A. Velastin, B. A. Boghossian, B. P. I. Lo, J. Sun, and M. A. Vicencio-Silva, “PRISMATICA: Toward ambient intelligence in public transport environments,” IEEE Trans. Syst., Man, Cybern. A, Syst. Hum., vol. 35, no. 1, pp. 164–182, Jan. 2005, doi: 10.1109/TSMCA.2004.838461.

[16] Z. Rasheed, X. Cao, K. Shafique, H. Liu, L. Yu, M. Lee, K. Ramnath, T. Choe, O. Javed, and N. Haering, “Automated visual analysis in large scale sensor networks,” in Proc. 2nd ACM/IEEE Int. Conf. Distrib. Smart Cameras (ICDSC), Sep. 2008, pp. 1–10, doi: 10.1109/ICDSC.2008.4635678.

[17] P. K. Atrey and A. El Saddik, “Confidence evolution in multimedia systems,” IEEE Trans. Multimedia, vol. 10, no. 7, pp. 1288–1298, Nov. 2008, doi: 10.1109/TMM.2008.2004907.

[18] I. N. Junejo, X. Cao, and H. Foroosh, “Autoconfiguration of a dynamic nonoverlapping camera network,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 4, pp. 803–816, Aug. 2007, doi: 10.1109/TSMCB.2007.895366.

[19] D. Makris and T. Ellis, “Learning semantic sense models from observing activity in visual surveillance,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 3, pp. 397–408, Jun. 2005.

[20] W. Hu, T. Tan, L. Wang, and S. Maybank, “A survey on visual surveillance of object motion and behaviors,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 3, pp. 334–352, Aug. 2004, doi: 10.1109/TSMCC.2004.829274.

[21] C. Kreucher, K. Kastella, and A. O. Hero III, “Multitarget tracking using the joint multitarget probability density,” IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 4, pp. 1396–1414, Oct. 2005, doi: 10.1109/TAES.2005.1561892.

[22] M. Shah, O. Javed, and K. Shafique, “Automated visual surveillance in realistic scenarios,” IEEE Multimedia, vol. 14, no. 1, pp. 30–39, Jan.–Mar. 2007, doi: 10.1109/MMUL.2007.3.

[23] G. L. Foresti, C. Micheloni, L. Snidaro, P. Remagnino, and T. Ellis, “Active video-based surveillance system,” IEEE Signal Process. Mag., vol. 22, no. 2, pp. 25–37, Mar. 2005, doi: 10.1109/MSP.2005.1406473.

[24] L. Li, W. Huang, I. Y.-H. Gu, R. Luo, and Q. Tian, “An efficient sequential approach to tracking multiple objects through crowds for real-time intelligent CCTV systems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 5, pp. 1254–1269, Oct. 2008, doi: 10.1109/TSMCB.2008.927265.

[25] L. Maddalena and A. Petrosino, “A self-organizing approach to background subtraction for visual surveillance applications,” IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008, doi: 10.1109/TIP.2008.924285.

[26] Y. Li, C. Huang, and R. Nevatia, “Learning to associate: Hybrid boosted multi-target tracker for crowded scene,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 2953–2960, doi: 10.1109/CVPRW.2009.5206735.

[27] A. Leykin, Y. Ran, and R. Hammoud, “Thermal-visible video fusion for moving target tracking and pedestrian classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007, pp. 1–8, doi: 10.1109/CVPR.2007.383444.

[28] A. Leykin and R. Hammoud, “Robust multi-pedestrian tracking in thermal-visible surveillance videos,” in Proc. Conf. Comput. Vis. Pattern Recognit. Workshop (CVPRW), Jun. 2006, pp. 136–143, doi: 10.1109/CVPRW.2006.175.

[29] W. K. Wong, P. N. Tan, C. K. Loo, and W. S. Lim, “An effective surveillance system using thermal camera,” in Proc. Int. Conf. Signal Acquis. Process. (ICSAP), Apr. 2009, pp. 13–17, doi: 10.1109/ICSAP.2009.12.

[30] D. Istrate, E. Castelli, M. Vacher, L. Besacier, and J. F. Serignat, “Information extraction from sound for medical telemonitoring,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 2, pp. 264–274, Apr. 2006, doi: 10.1109/TITB.2005.859889.

[31] M. Stanacevic and G. Cauwenberghs, “Micropower gradient flow acoustic localizer,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 10, pp. 2148–2157, Oct. 2005, doi: 10.1109/TCSI.2005.853356.

[32] P. Julian, A. G. Andreou, L. Riddle, S. Shamma, D. H. Goldberg, and G. Cauwenberghs, “A comparative study of sound localization algorithms for energy aware sensor network nodes,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 4, pp. 640–648, Apr. 2004, doi: 10.1109/TCSI.2004.826205.

[33] A. F. Smeaton and M. McHugh, “Towards event detection in an audio-based sensor network,” in Proc. 3rd Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 87–94.

[34] J. Chen, Z. Safar, and J. A. Sorensen, “Multimodal wireless networks:Communication and surveillance on the same infrastructure,” IEEE

514 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 40, NO. 5, SEPTEMBER 2010

Trans. Inf. Forensics Secur., vol. 2, no. 3, pp. 468–484, Sep. 2007, doi:10.1109/TIFS.2007.904944.

[35] G. Xing, C. Lu, R. Pless, and Q. Huang, “Impact of sensing coverage ongreedy geographic routing algorithms,” IEEE Trans. Parallel Distrib.Syst., vol. 17, no. 4, pp. 348–360, Apr. 2006, doi: 10.1109/TPDS.2006.48.

[36] R. R. Brooks, P. Ramanathan, and A. M. Sayeed, “Distributed targetclassification and tracking in sensor networks,” Proc. IEEE, vol. 91,no. 8, pp. 1163–1171, Aug. 2003, doi: 10.1109/JPROC.2003.814923.

[37] A. M. Tabar, A. Keshavarz, and H. Aghajan, “Smart home care networkusing sensor fusion and distributed vision-based reasoning,” in Proc. 4thInt. Workshop Video Surveill. Sens. Netw. (VSSN), Oct. 2006, pp. 145–154.

[38] S. Megerian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava,“Worst and best-case coverage in sensor networks,” IEEE Trans.Mobile Comput., vol. 4, no. 1, pp. 84–92, Jan./Feb. 2005, doi:10.1109/TMC.2005.15(410)4.

[39] V. Chandramohan and K. Christensen, “A first look at wired sen-sor networks for video surveillance systems,” in Proc. 27th Annu.IEEE Conf. Local Comput. Netw. (LCN), Nov. 2002, pp. 728–729.

[40] Z. Dimitrijevic, G. Wu, and E. Y. Chang, “SFINX: A multi-sensorfusion and mining system,” in Proc. 2003 Joint Conf. Fourth Int.Conf. Inf., Commun. Signal Process., Dec., vol. 2, pp. 1128–1132, doi:10.1109/ICICS.2003.1292636.

[41] A. Hampapur, L. Brown, J. Connell, A. Ekin, N. Haas, M. Lu, H. Merkl,S. Pankanti, A. Senior, C.-F. Shu, and Y. L. Tian, “Smart video surveil-lance: Exploring the concept of multiscale spatiotemporal tracking,”IEEE Signal Process. Mag., vol. 22, no. 2, pp. 38–51, Mar. 2005, doi:10.1109/MSP.20005.1406476.

[42] S. Bandini and F. Sartori, “Improving the effectiveness of monitoring andcontrol systems exploiting knowledge-based approaches,” Pers. Ubiqui-tous Comput., vol. 9, no. 5, pp. 301–311, Sep. 2005, doi: 10.1007/s00779-004-0334-3.

[43] H. Detmold, A. Dick, K. Falkner, D. S. Munro, A. Van Den Hengel,and P. Morrison, “Middleware for video surveillance networks,” in Proc.1st Int. Workshop Middleware Sens. Netw. (MidSens), Nov.–Dec. 2006,pp. 31–36.

[44] R. Seals, “Mobile robotics,” Electron. Power, vol. 30, no. 7, pp. 543–546,Jul. 1984, doi: 10.1049/ep.1984.0286.

[45] S. Harmon, “The ground surveillance robot (GSR): An autonomous vehi-cle designed to transit unknown terrain,” IEEE J. Robot. Autom., vol. RA-3, no. 3, pp. 266–279, Jun. 1987, doi: 10.1109/JRA.1987.1087091.

[46] S. Harmon, G. Bianchini, and B. Pinz, “Sensor data fusion through adistributed blackboard,” in Proc. IEEE Int. Conf. Robot. Autom., Apr.1986, pp. 1449–1454.

[47] J. White, H. Harvey, and K. Farnstrom, “Testing of mobile surveillancerobot at a nuclear power plant,” in Proc. IEEE Int. Conf. Robot. Autom.,Mar. 1987, pp. 714–719.

[48] D. Di Paola, D. Naso, A. Milella, G. Cicirelli, and A. Distante, “Multi-sensor surveillance of indoor environments by an autonomous mobilerobot,” in Proc. 15th Int. Conf. Mechatronics Mach. Vis. Pract. (M2VIP),Dec. 2008, pp. 23–28, doi: 10.1109/MMVIP.2008.474501.

[49] A. Bakhtari, M. D. Naish, M. Eskandari, E. A. Cloft, and B. Ben-habib, “Active-vision-based multisensor surveillance—An implemen-tation,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 5,pp. 668–680, Sep. 2006, doi: 10.1109/TSMCC.2005.855525.

[50] J. J. Valencia-Jimenez and A. Fernandez-Caballero, “Holonic multi-agent systems to integrate multi-sensor platforms in complex surveil-lance,” in Proc. IEEE Int. Conf. Video Signal Based Surveill. (AVSS),Nov. 2006, p. 49, doi: 10.1109/AVS.2006.58.

[51] Y.-C. Tseng, Y.-C. Wang, K.-Y. Cheng, and Y.-Y. Hsieh, “iMouse: Anintegrated mobile surveillance and wireless sensor system,” Computer,vol. 40, no. 6, pp. 60–66, Jun. 2007, doi: 10.1109/MC.2007.211.

[52] J. N. K. Liu, M. Wang, and B. Feng, “iBotGuard: An internet-basedintelligent robot security system using invariant face recognition againstintruder,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 1,pp. 97–105, Feb. 2005, doi:10.1109/TSMCC.2004.840051.

[53] H. Liu, O. Javed, G. Taylor, X. Cao, and N. Haering, “Omni-directionalsurveillance for unmanned water vehicles,” presented at the 8th Int.Workshop Vis. Surveill., Marseilles, France, Oct. 2008.

[54] I. Pavlidis, V. Morellas, P. Tsiamyrtzis, and S. Harp, “Urban surveillancesystems: From the laboratory to the commercial world,” Proc. IEEE,vol. 89, no. 10, pp. 1478–1497, Oct. 2001, doi: 10.1109/5.959342.

[55] J. Krikke, “Intelligent surveillance empowers security analysts,” IEEEIntell. Syst., vol. 21, no. 3, pp. 102–104, May/Jun. 2006.

[56] P. K. Sagiraju, S. Agaian, and D. Akopian, “Reduced complexity ac-quisition of GPS signals for software embedded applications,” IEEProc.-Radar Sonar Navig., vol. 153, no. 1, pp. 69–78, Feb. 2006, doi:10.1049/ip-rsn:20050091.

[57] R. Cucchiara, “Multimedia surveillance systems,” in Proc. 3rd Int. Work-shop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 3–10.

[58] M. Greiffenhagen, D. Comaniciu, H. Niemann, and V. Ramesh, “Design,analysis, and engineering of video monitoring systems: An approach anda case study,” Proc. IEEE, vol. 89, no. 10, pp. 1498–1517, Oct. 2001,doi: 10.1109/5.959343.

[59] M. Valera and S. A. Velastin, “Real-time architecture for a large dis-tributed surveillance system,” in Proc. IEE Intell. Surveill. Syst., London,U.K., Feb. 2004, pp. 41–45.

[60] C. Micheloni, L. Snidaro, L. Visentini, and G. L. Foresti, “Sensorbandwidth assignment through video annotation,” in Proc. IEEE Int.Conf. Video Signal Based Surveill. (AVSS), Nov. 2006, pp. 48–48, doi:10.1109/AVSS.2006.102.

[61] R. Bowden and P. KaewTraKulPong, “Towards automated wide areavisual surveillance: Tracking objects between spatially-separated, uncal-ibrated views,” IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2,pp. 213–223, Apr. 2005, doi: 10.1049/ip-vis: 20041233.

[62] C. Micheloni, G. L. Foresti, and L. Snidaro, “A network of co-operativecameras for visual surveillance,” IEE Proc.-Vis. Image Signal Process.,vol. 152, no. 2, pp. 205–212, Apr. 2005, doi: 10.1049/ip-vis: 20041256.

[63] M. Albanese, R. Chellappa, V. Moscato, A. Picariello, V. S. Sub-rahmanian, P. Turaga, and O. Udrea, “A constrained probabilisticpetri net framework for human activity detection in video,” IEEETrans. Multimedia, vol. 10, no. 8, pp. 1429–1443, Dec. 2009, doi:10.1109/TMM.2008.2010417.

[64] L. Yuan, A. Haizhou, T. Tamashita, L. Shihong, and M. Kaware,“Tracking in low frame rate video: A cascade particle filter with dis-criminative observers of different life spans,” IEEE Trans. PatternAnal. Mach. Intell., vol. 30, no. 10, pp. 1728–1740, Oct. 2008, doi:10.1109/TPAMI.2008.73.

[65] R. Cucchiara, C. Grana, A. Prati, and R. Vezzani, “Computer visionsystem for in-house video surveillance,” IEE Proc.-Vis. Image SignalProcess., vol. 152, no. 2, pp. 242–249, Apr. 2005, doi: 10.1049/ip-vis:20041215.

[66] J. A. Besada, J. Garcia, J. Portillo, J. M. Molina, A. Varona, and G. Gonza-lex, “Airport surface surveillance based on video images,” IEEE Trans.Aerosp. Electron. Syst., vol. 41, no. 3, pp. 1075–1082, Jul. 2005, doi:10.1109/TAES.2005.1541452.

[67] S. M. Khan and M. Shah, “Tracking multiple occluding people by local-izing on multiple scene planes,” IEEE Trans. Pattern Anal. Mach. Intell.,vol. 31, no. 3, pp. 505–519, Mar. 2009, doi: 10.1109/TPAMI.2008.102.

[68] W. Hu, M. Hu, X. Zhou, T. Tan, J. Lou, and S. Maybank, “Principal axis-based correspondence between multiple cameras for people tracking,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 663–671, Apr.2006, doi: 10.1109/TPAMI.2006.80.

[69] D.-Y. Chen, K. Cannons, H.-R. Tyan, S.-W. Shih, and H.-Y. M. Liao,“Spatiotemporal motion analysis for the detection and classification ofmoving targets,” IEEE Trans. Multimedia, vol. 10, no. 8, pp. 1578–1591,Dec. 2008, doi:10.1109/TMM.2008.2007289.

[70] F. Yin, D. Makris, and S. A. Velastin, “Time efficient ghost removal formotion detection in visual surveillance systems,” Electron. Lett., vol. 44,no. 23, pp. 1351–1353, Nov. 2008, doi: 10.1049/el:20082118.

[71] Y. Wang, D. Bowman, D. Krum, E. Coelho, T. Smith-Jackson, D. Bailey,S. Peck, S. Anand, T. Kennedy, and Y. Abdrazakov, “Effects on videoplacement and spatial context presentation on path reconstruction taskswith contextualized videos,” IEEE Trans. Vis. Comput. Graph., vol. 14,no. 6, pp. 1755–1762, Nov./Dec. 2008, doi:10.1109/TVCG.2008.126.

[72] W. Hu, D. Xie, Z. Fu, W. Zeng, and S. Maybank, “Semantic-basedsurveillance video retrieval,” IEEE Trans. Image Process., vol. 16, no. 4,pp. 1168–1181, Apr. 2007, doi:10.1109/TIP.2006.891352.

[73] L. Snidaro, R. Niu, G. L. Foresti, and P. K. Varshney, “Quality-basedfusions of multiple video sensors for video surveillance,” IEEE Trans.Syst., Man, Cybern. – Part B: Cybern., vol. 37, no. 4, pp. 1044–1051,Aug. 2007, doi: 10.1109/TSMCB.2007.895331.

[74] P. K. Atrey, M. S. Kankanhalli, and R. Jain, “Timeline-based informa-tion assimilation in multimedia surveillance and monitoring systems,” inProc. 3rd Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005,pp. 103–112.

[75] B. Hardian, “Middleware support for transparency and user control incontext-aware systems,” presented at the 3rd Int. Middleware DoctoralSymp. (MDS), Melbourne, Australia, Nov.–Dec. 2006.

RATY: SURVEY ON CONTEMPORARY REMOTE SURVEILLANCE SYSTEMS FOR PUBLIC SAFETY 515

[76] A. Dore, M. Pinasco, and C. S. Regazzoni, “A bio-inspired learningapproach for the classification of risk zones in a smart space,” in Proc.IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8, doi:10.1109/CVPR.2007.383440.

[77] E. Blasch and S. Plano, “Proactive decision fusion for site security,”in Proc. 8th Int. Conf. Inf. Fusion, Jul. 2005, pp. 1584–1591, doi:10.1109/ICIF.2005.1592044.

[78] F. Castanedo, M. A. Patricio, J. Garcia, and J. M. Molina, “Robust datafusion in a visual sensor multi-agent architecture,” in Proc. 10th Int.Conf. Inf. Fusion, Jul. 2007, pp. 1–7, doi: 10.1109/ICIF.2007.4408121.

[79] Y.-C. Tseng, T.-Y. Lin, Y.-K. Liu, and B.-R. Lin, “Event-drivenmessaging services over integrated cellular and wireless sensor net-works: Prototyping experiences of a visitor system,” IEEE J. Sel.Areas Commun., vol. 23, no. 6, pp. 1133–1145, Jun. 2005, doi:10.1109/JSAC.2005.845623.

[80] J.-S. Lee, “A petri net design of command filters for semiautonomousmobile sensor networks,” IEEE Trans. Ind. Electron., vol. 55, no. 4,pp. 1835–1841, Apr. 2008, doi: 10.1109/TIE.2007.911926.

[81] E. Norouznezhad, A. Bigdeli, A. Postula, and B. C. Lovell, “A high res-olution smart camera with GigE vision extension for surveillance appli-cations,” in Proc. Second ACM/IEEE Int. Conf. Distrib. Smart Cameras,Sep. 2008, pp. 1–8, doi: 10.1109/ICDSC.2008.4635711.

[82] S. Appadwedula, V. V. Veeravalli, and D. L. Jones, “Energy-efficientdetection in sensor networks,” IEEE J. Sel. Areas Commun., vol. 23,no. 4, pp. 693–702, Apr. 2005, doi: 10.1109/JSAC.2005.843536.

[83] A. Maier, B. Rinner, W. Schriebl, and H. Schwabach, “Online multi-criterion optimization for dynamic power-aware camera configura-tion in distributed embedded surveillance clusters,” in Proc. 20th Int.Conf. Adv. Inf. Netw. Appl. (AINA 2006), Apr., pp. 307–312, doi:10.1109/AINA.2006.250.

[84] H. Liu, X. Jia, P.-J. Wan, C.-W. Yi, S.-K. Makki, and N. Pissnou,“Maximizing lifetime of sensor surveillance systems,” IEEE/ACMTrans. Netw., vol. 15, no. 2, pp. 334–345, Apr. 2007, doi:10.1109/TNET.2007.892883.

[85] Y. Imai, Y. Hori, and S. Masuda, “Development and a brief evaluationof a web-based surveillance system for cellular phones and other mo-bile computing clients,” in Proc. Conf. Hum. Syst. Interact., May 2008,pp. 526–531, doi: 10.1109/HSI.2008.4581494.

[86] V. A. Petrushin, O. Shakil, D. Roqueiro, G. Wei, and A. V. Gershman,“Multiple-sensor indoor surveillance system,” in Proc. 3rd Can. Conf.Comput. Robot Vis., Jun. 2006, p. 40, doi:10.1109/CRV.2006.50.

[87] P. Korshunov and W. T. Ooi, “Critical video quality for distributed au-tomated video surveillance,” in Proc. 13th Annu. ACM Int. Conf. Multi-media, Nov. 2005, pp. 151–160.

[88] A. May, J. Teh, P. Hobson, F. Ziliani, and J. Reichel, “Scalable videorequirements for surveillance systems,” IEE Intell. Surveill. Syst., pp. 17–20, Feb. 2004.

[89] A. Avritzer, J. P. Ros, and E. Weyuker, “Reliability testing of rule-based systems,” IEEE Softw., vol. 13, no. 5, pp. 76–82, Sep. 1996, doi:10.1109/52.536461.

[90] K. Shafique, F. Guo, G. Aggarwal, Z. Rasheed, X. Cao, and N. Haering,“Automatic geo-registration and inter-sensor calibration in large sen-sor networks,” in Smart Cameras. New York: Springer-Verlag, 2009,pp. 245–257.

[91] C. Caricotte, X. Desurmont, B. Ravera, F. Bremond, J. Orwell, S. A. Ve-lastin, J. M. Obodez, B. Corbucci, J. Palo, and J. Cernocky, “Towardgeneric intelligent knowledge extractions from video and audio: TheEU-funded CARETAKER project,” in Proc. Inst. Eng. Technol. Conf.Crime Secur., Jun. 2006, pp. 470–475.

[92] S. Fleck and W. Strasser, “Smart camera based monitoring system andits application to assisted living,” Proc. IEEE, vol. 96, no. 10, pp. 1698–1714, Oct. 2008, doi:10.1109/JPROC.2008.928765.

[93] M. S. Kankanhalli and Y. Rui, “Application potential of multimediainformation retrieval,” Proc. IEEE, vol. 96, no. 4, pp. 712–720, Apr.2008, doi: 10.1109/JPROC.2008.916383.

[94] A. Prati, R. Vezzani, L. Benini, E. Farella, and P. Zappi, “An integratedmulti-modal sensor network for video surveillance,” in Proc. ACM Int.Workshop Video Surveill. Sens. Netw., Nov. 2005, pp. 95–102.

[95] S. Calderara, R. Cucchiara, and A. Prati, “Multimedia surveillance:Content-based retrieval with multicamera people tracking,” in Proc. ACMInt. Workshop Video Surveill. Sens. Netw., Oct. 2006, pp. 95–100.

[96] P. K. Atrey, M. S. Kankanhalli, and R. Jain, “Information assimila-tion framework for event detection in multimedia surveillance systems,”ACM Multimedia Syst. J., vol. 12, no. 3, pp. 239–253, Dec. 2006.

[97] J. Kim, J. Park, K. Lee, K.-H. Baek, and S. Kim, “A portable surveil-lance camera architecture using one-bit motion detection,” IEEE Trans.Consum. Electron., vol. 53, no. 4, pp. 1254–1259, Nov. 2007, doi:10.1109/TCE.2007.4429209.

[98] L. Havasi, Z. Szlavik, and T. Sziranyi, “Detection of gait charac-teristics for scene registration in video surveillance system,” IEEETrans. Image Process., vol. 16, no. 2, pp. 503–510, Feb. 2007, doi:10.1109/TIP.2006.88839.

[99] Y. Huang, X. Ao, Y. Li, and C. Wang, “Multiple biometrics system basedon DavinCi platform,” in Proc. Int. Symp. Inf. Sci. Eng. (ISISE), Dec.2008, pp. 88–92, doi: 10.1109/ISISE.2008.163.

[100] L.-Q. Xu, “Issues in video analytics and surveillance systems: Re-search/prototyping vs. applications/user requirements,” in Proc. IEEEConf. Adv. Video Signal Based Surveill. (AVSS), Sep. 2007, pp. 10–14,doi: 10.1109/AVSS.2007.4425278.

[101] J. A. O. O’Sullivan and R. Pless, “Advances in security technologies:Imaging, anomaly detection, and target and biometric recognition,” inProc. IEEE/MTT-S Int. Microw. Symp., Jun. 2007, pp. 761–764, doi:10.1109/MWSYM.2007.380051.

[102] H. Gupta, X. Cao, and N. Haering, “Map-based active leader-followersurveillance system,” presented at the Workshop Multi-Camera Multi-Modal Sens. Fusion Algorithms Appl. (M2SFA2), Marseille, France,Oct. 2008.

[103] GE Security website. (2009). [Online]. Available: http://www.gesecurity.com/portal/site/GESecurity

[104] ObjectVideo website. (2009). [Online]. Available: http://www.objectvideo.com/company/

[105] IOImage website. (2009). [Online]. Available: http://www.ioimage.com/[106] RemoteReality website. (2009). [Online]. Available: http://www.

remotereality.com/[107] PointGrey webiste. (2009). [Online]. Available: http://www.ptgrey.com/

Tomi D. Raty received the Ph.D. degree in information processing science from the University of Oulu, Oulu, Finland, in 2008.

He is currently a Senior Research Scientist and a Team Leader of the Software Platforms Team at VTT Technical Research Centre of Finland, Oulu. His research interests include surveillance systems, model-based testing, network monitoring, software platforms, and middleware. He is the author or coauthor of more than 20 papers published in various conferences and journals.

Dr. Raty has served as a Reviewer for the IEEE TRANSACTIONS ON MOBILE COMPUTING and for several conferences.