
Project 35

Visual Surveillance of Urban Scenes

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Principal Investigators

• David Clausi, Waterloo

• Geoffrey Edwards, Laval

• James Elder, York (Project Leader)

• Frank Ferrie, McGill (Deputy Leader)

• James Little, UBC

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Partners

• Honeywell (Jeremy Wilson)

• CAE (Ronald Kruk)

• Aimetis (Mike Janzen)


PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Participants

Postdoctoral Fellows

Francisco J. Estrada (York)

Bruce Yang (Waterloo)

Students

Eyhab Al-Masri (Waterloo)

Kurtis McBride (Waterloo)

Natalie Nabbout (Waterloo)

Isabelle Begin (McGill)

Albert Law (McGill)

Prasun Lala (McGill)

John Harrison (McGill)

Antoine Noel de Tilly (Laval)

Samir Fertas (Laval)

Michael Yurick (UBC)

Wei-Lwun Lu (UBC)

Patrick Denis (York)

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Goals

• Visual surveillance of urban scenes can potentially be used to enhance human safety and security, to detect emergency events, and to respond appropriately to these events.

• Our project investigates the development of intelligent systems for detecting, identifying, tracking and modeling dynamic events in an urban scene, as well as automatic methods for inferring the three-dimensional static or slowly-changing context in which these events take place.

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Results

• Here we demonstrate new results in the automatic estimation of 3D context and automatic tracking of human traffic from urban surveillance video.

• The CAE S-Mission real-time distributed computing environment is used as a substrate to integrate these intelligent algorithms into a comprehensive urban awareness network.

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

CAE STRIVE Architecture

[Architecture diagram: STRIVE-SFX federates communicate over an HLA EMS-FOM at the facility level. Components include GPS and camera (CAM) feeds, TerraVizUI clients, the EMS-PAServer and EMS-EnvServer, the STRIVE-TFX terrain server, the Actenum scheduler system server (drawing on GIS data, historical calls, historical traffic data, post lists and constraints), the McGill video-camera traffic analyser server, dispatchers using the Actenum protocol, AppSpy bridges feeding legacy systems over legacy protocols, and a SAR & LOG store of logs and historic data.]

Note: To provide a failure-safe architecture, all database disks need to be duplicated and provide dual access (or a RAID system could be used). The four servers have to be duplicated as backup servers and share the dual-access databases with the main system. The backup servers monitor the status of the main systems; when a failure of the main system is detected, they reinitialize their internal states from the last SAR & LOG of the main system and resume operations.

CAE Professional Services / CAE Inc / McGill University. Actenum Proprietary, CAE Inc 2007.

3D Urban Awareness from Single-View Surveillance Video

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

3D Urban Awareness

• 3D scene context (e.g., ground plane information) is crucial for the accurate identification and tracking of human and vehicular traffic in urban scenes.

• 3D scene context is also important for human interpretation of urban surveillance data

• Limited static 3D scene context can be estimated manually, but this is time-consuming, and cannot be adapted to slowly-changing scenes.

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES


Ultimate Goal

• Our ultimate goal is to automate this process!

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Immediate Goal

• Automatic estimation of the three vanishing points corresponding to the “Manhattan directions”.

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Manhattan Frame Geometry

• An edge is aligned to a vanishing point if its interpretation plane normal is orthogonal to the vanishing point vector on the Gaussian sphere (i.e., their dot product is zero)

[Figure: the optical centre, image plane, oriented edges, an interpretation plane and its normal, and a vanishing point vector on the Gaussian sphere.]
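As a rough illustration of this geometric test, the sketch below (Python, assuming a pinhole camera with known focal length and principal point; the function names are ours) back-projects an edge's endpoints onto the Gaussian sphere, forms the interpretation plane normal, and checks near-orthogonality against a candidate vanishing point direction.

```python
import numpy as np

def interpretation_plane_normal(p1, p2, focal, pp):
    """Normal of the plane through the optical centre and an image edge.

    p1, p2: edge endpoints in pixel coordinates; focal: focal length in pixels;
    pp: principal point (cx, cy). The endpoints are back-projected to unit rays
    on the Gaussian sphere, and the plane normal is their cross product.
    """
    def ray(p):
        v = np.array([p[0] - pp[0], p[1] - pp[1], focal], dtype=float)
        return v / np.linalg.norm(v)
    n = np.cross(ray(p1), ray(p2))
    return n / np.linalg.norm(n)

def is_aligned(edge_normal, vp_direction, tol_deg=2.0):
    """An edge is aligned with a vanishing point when its interpretation plane
    normal is (near-)orthogonal to the vanishing point direction, i.e. their
    dot product is close to zero."""
    c = abs(np.dot(edge_normal, vp_direction / np.linalg.norm(vp_direction)))
    return np.degrees(np.arcsin(min(c, 1.0))) < tol_deg
```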

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Mixture Model

• Each edge Eij in the image is generated by one of four possible kinds of scene structure:

– m1-3: a line in one of the three Manhattan directions

– m4: non-Manhattan structure

• The observable properties of each edge Eij are:

– position

– angle

• The likelihoods of these observations are co-determined by:

– The causal process (m1-4)

– The rotation Ψ of the Manhattan frame relative to the camera

[Figure: graphical model in which the Manhattan frame rotation Ψ and the per-edge causes mi generate the observed edges E11, E12, E21, E22 in the image.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Mixture Model

• Our goal is to estimate the Manhattan frame Ψ from the observable data Eij.


PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

E-M Algorithm

• E Step

– Given an estimate of the Manhattan coordinate frame, calculate the mixture probabilities over the four components m1–m4 for each edge

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

E-M Algorithm

• M Step

– Given estimates of the mixture probabilities for each edge, update our estimate of the Manhattan coordinate frame
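A compact sketch of this E-M loop is given below, assuming each edge is represented by its interpretation-plane normal. The Gaussian-plus-uniform error model, the fixed mixing weights, and the Nelder-Mead M-step are illustrative simplifications, not the project's actual likelihoods or optimizer.

```python
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.optimize import minimize

# Minimal E-M sketch. `normals` holds interpretation-plane normals (N x 3).
SIGMA, PRIOR_OUTLIER = np.radians(2.0), 0.2

def component_likelihoods(normals, rotvec):
    vps = Rotation.from_rotvec(rotvec).as_matrix()     # columns = Manhattan directions
    err = np.abs(normals @ vps)                        # |sin| of alignment error, N x 3
    lik = np.exp(-0.5 * (err / SIGMA) ** 2) / (SIGMA * np.sqrt(2 * np.pi))
    outlier = np.full((normals.shape[0], 1), 1.0 / np.pi)   # uniform non-Manhattan model (m4)
    return np.hstack([(1 - PRIOR_OUTLIER) / 3 * lik, PRIOR_OUTLIER * outlier])

def em_manhattan(normals, rotvec0, iters=20):
    rotvec = np.asarray(rotvec0, float)
    for _ in range(iters):
        lik = component_likelihoods(normals, rotvec)
        resp = lik / lik.sum(axis=1, keepdims=True)    # E step: mixture probabilities per edge
        # M step: re-estimate the frame by minimizing responsibility-weighted negative log-likelihood
        def neg_q(rv):
            return -np.sum(resp * np.log(component_likelihoods(normals, rv) + 1e-12))
        rotvec = minimize(neg_q, rotvec, method="Nelder-Mead").x
    return rotvec, resp
```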

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Results

[Bar chart: absolute angular deviation for each Manhattan direction (X, Y, Z); mean error over the entire test image database.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Results

• Convergence of the E-M algorithm for example image

[Plot: absolute angular deviation of vanishing points X, Y, and Z versus E-M iteration for an example test image.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Results

• Example: lines through top 10 edges in each Manhattan direction

Tracking Human Activity

Single-Camera Tracking

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Tracking Using Only Colour / Grey Scale

• Tracking using only grey scale or colour features can lead to errors

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Tracking Using Dynamic Information

• Incorporating dynamic information enables successful tracking

Tracking over Multi-Camera Network

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Goal

• Integrate tracking of human activity from multiple cameras into a world-centred activity map

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Input left and right sequences

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Independent tracking

• Each person tracked independently in each camera using Boosted Particle Filters.

– Background subtraction identifies possible detections of people which are then tracked with a particle filter using brightness histograms as the observation model.

• Tracks are projected via a homography to the street map and then Kalman filtered independently based on the error model (a sketch of this step follows below).
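The sketch below illustrates the projection-and-smoothing step with OpenCV: image-plane track points are mapped to the street map through a homography H, and each projected track is smoothed by a constant-velocity Kalman filter. H and the noise covariances are placeholders, not the calibrated values used in the project.

```python
import numpy as np
import cv2

def project_to_map(points_px, H):
    """Map image-plane points (pixels) onto the street map via homography H."""
    pts = np.asarray(points_px, np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

def smooth_track(map_points, dt=1.0):
    """Smooth one projected track with a constant-velocity Kalman filter."""
    kf = cv2.KalmanFilter(4, 2)                        # state: x, y, vx, vy; measurement: x, y
    kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                    [0, 1, 0, dt],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
    kf.processNoiseCov = 1e-2 * np.eye(4, dtype=np.float32)      # illustrative noise levels
    kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)
    kf.statePost = np.array([[map_points[0][0]], [map_points[0][1]], [0], [0]], np.float32)
    smoothed = []
    for x, y in map_points:
        kf.predict()
        est = kf.correct(np.array([[x], [y]], np.float32))
        smoothed.append(est[:2].ravel())
    return np.array(smoothed)
```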

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Independent tracks

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Integration

• Tracks are averaged to approximate joint estimation of composite errors

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Merged trajectories

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Future Work

• Integrated multi-camera background subtraction

• Integrated particle filter in world coordinates using joint observation model over all sensors in network.

Tracking in Dynamic Background Settings

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Foreground Extraction and Tracking in Dynamic Background Settings

• Extracting objects from dynamic backgrounds is challenging

• Numerous applications:

– Human Surveillance

– Customer Counting

– Human Safety

– Event Detection

• In this example, the problem is to extract people from surveillance video as they enter a store through a dynamic sliding door

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Methodology Overview

• Video sequences are pre-processed and corner feature points are extracted

• Corners are tracked to obtain trajectories of the moving background

• Background trajectories are learned and a classifier is formed

• Trajectories of all moving objects in the test image sequences are classified, based on the learned model, into either background or foreground trajectories

• Foreground trajectories are kept in the image sequence, and the objects corresponding to those trajectories are tagged as foreground (a classification sketch follows below)
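As a hedged illustration of the classification step (the slides do not specify the descriptor or learning method), the sketch below summarizes each corner trajectory with a simple motion descriptor and labels a test trajectory as foreground when it lies far from every learned background descriptor.

```python
import numpy as np

def trajectory_descriptor(traj):
    """Fixed-length motion descriptor for a corner trajectory (T x 2 positions)."""
    traj = np.asarray(traj, float)
    steps = np.diff(traj, axis=0)                      # per-frame displacements
    return np.hstack([steps.mean(axis=0), steps.std(axis=0),
                      np.linalg.norm(steps, axis=1).mean()])

def classify(traj, background_descs, threshold=2.0):
    """Label a trajectory foreground if far from all learned background descriptors."""
    d = np.linalg.norm(background_descs - trajectory_descriptor(traj), axis=1)
    return "foreground" if d.min() > threshold else "background"

# background_descs = np.vstack([trajectory_descriptor(t) for t in training_background_trajs])
```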

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Demo 1: Successful Tracking and Classification

• This demo illustrates a case of successful tracking and classification of an entering person.

• The person is classified as foreground based on the extracted trajectories.

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Demo 2: Failed Tracking but Successful Classification

• Demo 2 shows a case in which the tracker loses track of the person after a few frames.

• However, the classification is still correct since only a small number of frames are required to identify the trajectory.

Recognizing Actions using the Boosted Particle Filter

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Motivation

[Figure: example input frames (682 and 814) and the corresponding output.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

System Diagram

[Diagram: each new frame is passed to the BPF tracker; the tracking results give Output 1, the locations/sizes of the players; extracted image patches update the SPPCA template updater, which predicts new templates for the tracker; the action recognizer produces Output 2, the action labels of the players.]
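Read as code, the per-frame flow in the diagram might look like the sketch below. The object names (tracker, updater, recognizer) and their methods are hypothetical stand-ins for the project's components, not a published API.

```python
def process_frame(frame, tracker, updater, recognizer):
    """One pass of the (hypothetical) tracking/recognition pipeline for a new frame."""
    templates = updater.predict_templates()            # predicted appearance templates
    tracks = tracker.track(frame, templates)           # Output 1: locations/sizes of the players
    patches = [frame[t.box] for t in tracks]           # extracted image patches (t.box assumed to be a slice)
    updater.update(patches)                            # refine the SPPCA template model
    actions = recognizer.recognize(patches)            # Output 2: action labels of the players
    return tracks, actions
```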

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

HSV Color Histogram

• The HSV color histogram is composed of:

– 2D histogram of Hue and Saturation

– 1D histogram of Value

[Figure: the 2D hue-saturation histogram combined with the 1D value histogram.]
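A minimal version of this descriptor, using OpenCV's calcHist and illustrative bin counts, could look like:

```python
import cv2
import numpy as np

def hsv_histogram(patch_bgr, hs_bins=(10, 10), v_bins=10):
    """2D hue-saturation histogram concatenated with a 1D value histogram."""
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    hs = cv2.calcHist([hsv], [0, 1], None, list(hs_bins), [0, 180, 0, 256])
    v = cv2.calcHist([hsv], [2], None, [v_bins], [0, 256])
    hist = np.hstack([hs.ravel(), v.ravel()])
    return hist / (hist.sum() + 1e-9)                  # normalize so histograms are comparable
```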

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

The HOG descriptor

[Figure: image gradients pooled into SIFT-style descriptor grids to form the HOG descriptor.]
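For reference, a standard HOG computation via scikit-image is sketched below; the orientation and cell/block parameters are illustrative defaults, since the slides do not give the project's exact layout.

```python
from skimage.feature import hog
from skimage.color import rgb2gray

def hog_descriptor(image_rgb):
    """Histogram-of-oriented-gradients descriptor for an RGB image patch."""
    gray = rgb2gray(image_rgb)
    return hog(gray, orientations=8, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
```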

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES


Template Updating: Motivation

• Tracking: search for the location in the image whose image patch is similar to a reference image patch – the template.

• Template Updating: Templates should be updated because the players change their pose.

[Figure: template matching between frames 677 and 687.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Template Updating: Operations

• Offline

– Learning: learn the template model from training data

• Online

– Prediction: predict the new template used in the next frame

– Updating: update the template model using the current observation

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

SPPCA Template Updater

[Diagram: the tracker processes each new frame and produces tracking results; the extracted image patches are used to update the SPPCA template updater, which predicts new templates for the tracker.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Graphical Model of SPPCA

[Graphical model: a discrete switch variable selects an eigenspace; a continuous coordinate on that eigenspace generates the continuous observation.]
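The prediction step implied by this model can be sketched as below: the discrete switch selects one learned eigenspace (mean plus basis), and the continuous coordinate is the projection of the current patch onto that space. Choosing the switch by reconstruction error and omitting the temporal dynamics are simplifications of the full SPPCA model.

```python
import numpy as np

def predict_template(patch, eigenspaces):
    """eigenspaces: list of (mean, basis) pairs, with basis of shape (D, k)."""
    x = patch.ravel().astype(float)
    best = None
    for mean, basis in eigenspaces:                    # evaluate each candidate eigenspace
        coord = basis.T @ (x - mean)                   # continuous coordinate on the eigenspace
        recon = mean + basis @ coord                   # reconstruction in image space
        err = np.linalg.norm(x - recon)
        if best is None or err < best[0]:              # discrete switch: pick the best-fitting space
            best = (err, recon)
    return best[1].reshape(patch.shape)                # predicted template for the next frame
```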

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Action Recognizer

• Input: a sequence of image patches

• Output: action labels

[Diagram: the action recognizer maps the sequence of image patches to one of the action labels: skating down, skating left, skating right, or skating up.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Action Recognizer

• Summary:

– Features: the HOG descriptor

– Classifier: the SMLR classifier

– Weights: learned by MAP estimation with a sparsity-promoting Laplacian prior

– Basis functions: motion similarity between the testing and training data

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Action Recognizer: Framework

[Framework diagram: HOG descriptors are computed for the testing and training data; the frame-to-frame similarity between them is computed and convolved with a weighting matrix to obtain the motion similarity, which is passed to the SMLR classifier to produce action labels.]
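The similarity computation in this framework might be sketched as below; the temporal weighting matrix is an illustrative identity band, and the SMLR classifier itself is not implemented here.

```python
import numpy as np
from scipy.signal import convolve2d

def motion_similarity(test_hogs, train_hogs, window=5):
    """Frame-to-frame similarity convolved with a temporal weighting matrix."""
    test = np.asarray(test_hogs)                       # T_test x D HOG descriptors
    train = np.asarray(train_hogs)                     # T_train x D HOG descriptors
    frame_sim = test @ train.T                         # frame-to-frame similarity matrix
    weight = np.eye(window)                            # diagonal band rewarding consistent motion
    return convolve2d(frame_sim, weight, mode="same")  # motion similarity (basis functions)

# The resulting motion similarities would be fed to the SMLR classifier,
# whose weights are learned by MAP estimation with a Laplacian prior.
```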

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Tracking & Action Recognition

[Figure: tracking and action recognition results for frames 97, 116, 682, 710, 773, and 814.]

PROJECT 35: VISUAL SURVEILLANCE OF URBAN SCENES

Vehicle Tracking