CROWD MODELING USING SOCIAL NETWORKS1

Rima Chaker Imran N Junejo Zaher Al Aghbari

University of Sharjah, U.A.E. 27272

ABSTRACT

In this work, we propose an unsupervised approach for detecting anomalies in a crowd scene using a social network model. Using a window-based approach, scene objects are first detected and tracked, and a spatio-temporal partitioning is constructed to produce a set of spatio-temporal cuboids that capture spatial and temporal features. A hierarchical social network is built to model the crowd behavior: the bottom level models local behavior and the top level models global behavior. We perform anomaly detection and demonstrate the effectiveness of the proposed approach on benchmark crowd analysis video sequences. Our results show that the method outperforms most, if not all, of the state-of-the-art methods compared.

Index Terms: Crowd Modeling, Social Network Model

1. INTRODUCTION

With the advent of technology and the increasing need for round-the-clock security, video surveillance has received increasing attention from the scientific community in recent years. One popular topic in intelligent video surveillance is crowd analysis, i.e. how to automatically detect anomalies in public places or during public events. Early detection, or prediction, of abnormal behaviors occurring in surveillance scenes is of utmost significance: by alerting human operators, potentially dangerous consequences can be reduced or prevented. However, the analysis of crowded scenes is a very challenging task, because even the analysis of actions performed by individuals is still not a fully solved problem. Hence, crowd behavior analysis has become an active research area in critical domains such as riot detection and crowd monitoring and surveillance.

Technically speaking, crowd behavior analysis comprises three main steps: (i) pre-processing (i.e. feature detection and tracking), (ii) motion information extraction and (iii) abnormal behavior modeling. Some researchers have attempted to track the objects throughout the scene. [1] use object tracking to detect unusual events in image sequences. Their method tracks each individual object in the scene and constructs models to determine abnormality in the objects' behaviors. Similarly, [2] track subjects in high-density crowded scenes captured from a distance. They learn the direction of motion as prior information based on a force model called floor fields. However, floor fields are chaotic in crowded scenes, as tracking each individual in pedestrian environments results in highly inconsistent trajectories, which makes discriminating between usual and unusual events extremely difficult. Finally, this approach is considered appropriate only for scenes containing a few objects, as it is difficult to reliably segment and track each individual in a crowded environment.

1 THIS WORK IS SUPPORTED BY UNIVERSITY OF SHARJAH, U.A.E. (PROJECT #120227).

Tracking in a crowded scene is a daunting task, hence researchers resort to optical flow or tracklets. [3] explored the socio-psychological concept of social force in combination with optical flow to compute interaction forces, which are later combined with Latent Dirichlet Allocation to model normal behaviors and detect abnormal ones. This method is extended in [4], which uses Particle Swarm Optimization in addition to the social force model to optimize the computed interaction force and thus detect global abnormal activities. [5] draw inspiration from Coherent Structures in fluid dynamics to segment dominant crowd flows and detect flow instability. Perhaps a more intuitive approach is to find interest points and track them over time: [6] analyzed motion patterns by clustering the tracklets extracted in crowded scenes. Despite the many different representations of video events, many existing works ignore the importance of “contextual” anomalies in crowd analysis. A contextual anomaly arises when an individual exhibits behavior that appears similar to that of others but is anomalous in a specific context (e.g. its neighborhood). [7] focus on detecting contextual anomalies in the context of neighborhood motion, based on statistical analysis of detected blobs. [8] detect subtle context-dependent behavioral anomalies using contextual information. The strength of their contextual features is demonstrated through social and scene contexts: social context prevents self-justifying groups and propagates anomalies through the social network, while scene context improves the detection of subtly abnormal behaviors.

Besides motion information, other works include important object features such as appearance or size. [9] handled this restriction by applying a Mixture of Dynamic Textures (MDT) to jointly model the appearance and dynamics of crowded scenes. Their approach is more reliable in anomaly localization since it investigates both temporal and spatial abnormalities. In [10], a sparse reconstruction cost is proposed to detect the presence of anomalies in crowded scenes. They use local spatio-temporal patches to construct the normal dictionary against which the abnormality of a test sample is measured.

In this work, the proposed method is based on the social network model. In this regard, [11] studied crowd social grouping to solve the multi-person data-association-based tracking (DAT) problem. Similarly, [12] study social behavior among animals rather than humans, for ethical and practical reasons. They proposed an automatic segmentation and classification of spontaneous social behavior in continuous video of interacting pairs of mice. Since social networks can represent the social relationships among people in crowds, one can leverage the underlying social network for many applications, such as identifying social communities or detecting and locating abnormal events. [13] made use of the social network in the context of prison surveillance. They address the importance of monitoring and recognizing a social network in order to discover its communities and leadership structures. Specifically, a graph-cut solution is applied to resolve the unknown correspondence between faces and tracks. Afterwards, a modularity-cut algorithm is employed to discover social groups and estimate the group leaders. However, they addressed the problem in closed-world surveillance environments, i.e. prisons. In addition, their method requires the identification of individuals with respect to a pre-defined watch-list of faces.

2. SCENE MODELING WITH SOCIAL NETWORKS

This section describes the proposed method, which consists of the following steps: (i) extract tracklets of the human subjects, (ii) partition the current time window $\Omega_i$ into spatio-temporal cuboids $\varsigma_j$, (iii) construct local and global social networks, and (iv) perform anomaly detection. The KLT tracker [14] is used to obtain the tracklets of the objects in the scene.
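Step (ii) can be illustrated with a minimal Python sketch. The paper does not specify how a tracklet is assigned to a cuboid, so the rule used here (the cell containing the tracklet's first point) is an assumption, as are the function name `assign_to_cuboids`, the `(t, x, y)` point layout, and the toy coordinates.

```python
from collections import defaultdict


def assign_to_cuboids(tracklets, frame_w, frame_h, n_rows, n_cols, window_len):
    """Group tracklet indices by (window index, cuboid row, cuboid column).

    Each tracklet is a list of (t, x, y) points. ASSUMPTION of this sketch:
    a tracklet belongs to the cell that contains its first point.
    """
    cells = defaultdict(list)
    for idx, points in enumerate(tracklets):
        t, x, y = points[0]
        window = t // window_len                      # index of the time window
        row = min(int(y * n_rows / frame_h), n_rows - 1)
        col = min(int(x * n_cols / frame_w), n_cols - 1)
        cells[(window, row, col)].append(idx)
    return dict(cells)


# Hypothetical tracklets in a 238 x 158 frame, 8 x 8 cuboids, 50-frame windows.
tracklets = [
    [(0, 10.0, 10.0), (1, 12.0, 10.0)],
    [(3, 15.0, 12.0), (4, 17.0, 12.0)],
    [(60, 200.0, 150.0), (61, 198.0, 150.0)],
]
cells = assign_to_cuboids(tracklets, frame_w=238, frame_h=158,
                          n_rows=8, n_cols=8, window_len=50)
print(cells)
```

The first two tracklets fall into the same cuboid of the first window, while the third falls into a different cuboid of the second window.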

In order to group tracklets that exhibit similar behavior, we focus on features that account for (i) the direction and magnitude of the motion, and (ii) the distance between the moving objects. Thus, we use the following measures:

Cosine Similarity: Cosine similarity measures the cosine of the angle between two tracklets, rescaled so that the value is in the range [0, 1]. Let $\phi_{\tau_u}$, $\phi_{\tau_v}$ denote the dominant directions of tracklets $\tau_u$ and $\tau_v$, respectively. The cosine similarity is defined as [15]:

$$ s_{\phi}^{\tau_u,\tau_v} = \left( \frac{\phi_{\tau_u} \cdot \phi_{\tau_v}}{\|\phi_{\tau_u}\| \, \|\phi_{\tau_v}\|} + 1 \right) \times \frac{1}{2} $$

Magnitude Similarity: Let $\rho_{\tau_u}$, $\rho_{\tau_v}$ denote the magnitudes of tracklets $\tau_u$ and $\tau_v$, respectively. The magnitude similarity is defined as [15]:

$$ s_{\rho}^{\tau_u,\tau_v} = 1 - \frac{|\rho_{\tau_u} - \rho_{\tau_v}|}{\max(\rho_{\tau_u}, \rho_{\tau_v})} $$

Combining the above two similarities produces:

$$ s_{\phi\rho}^{\tau_u,\tau_v} = \alpha \, s_{\phi}^{\tau_u,\tau_v} + (1 - \alpha) \, s_{\rho}^{\tau_u,\tau_v} \qquad (1) $$

where $0 \le \alpha \le 1$ is the learning parameter.

Velocity Similarity: The similarity in motion characteristics is denoted by $s_{v}^{\tau_u,\tau_v}$ and is computed using the Dynamic Time Warping (DTW) algorithm, where the distance measure used is

$$ q_v(i,j) = \frac{e^{-d_{vx}/\sigma_{vx}} + e^{-d_{vy}/\sigma_{vy}}}{2}, $$

where $d_{vx}$ and $d_{vy}$ represent the velocity distance between the two tracklets along the x and y axes, respectively.

Definition 1 (Social Similarity Measure): The social similarity measure between $\tau_u$ and $\tau_v$ is defined as

$$ s_{\phi\rho v}^{\tau_u,\tau_v} = \beta \, s_{\phi\rho}^{\tau_u,\tau_v} + (1 - \beta) \, s_{v}^{\tau_u,\tau_v} \qquad (2) $$

where $0 \le \beta \le 1$ is the learning parameter.
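The measures above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation. The cosine, magnitude, and combined similarities follow eqs. (1) and (2) directly. The text does not say how the DTW result is turned into $s_v$, so this sketch assumes a local cost of $1 - q_v(i,j)$, averages it over the optimal warping path, and subtracts the average from 1. The absolute difference used for $d_{vx}$ and $d_{vy}$, the default scale values `sigma_x` and `sigma_y`, and all function names are likewise assumptions.

```python
import math


def cosine_similarity(phi_u, phi_v):
    """Cosine of the angle between two 2-D directions, rescaled to [0, 1]."""
    dot = phi_u[0] * phi_v[0] + phi_u[1] * phi_v[1]
    norm = math.hypot(*phi_u) * math.hypot(*phi_v)  # assumes non-zero directions
    return (dot / norm + 1.0) / 2.0


def magnitude_similarity(rho_u, rho_v):
    """1 minus the relative magnitude difference (assumes max(rho) > 0)."""
    return 1.0 - abs(rho_u - rho_v) / max(rho_u, rho_v)


def local_q(vel_a, vel_b, sigma_x=1.0, sigma_y=1.0):
    """q_v for one pair of velocity samples; d_vx, d_vy taken as absolute differences."""
    d_vx = abs(vel_a[0] - vel_b[0])
    d_vy = abs(vel_a[1] - vel_b[1])
    return (math.exp(-d_vx / sigma_x) + math.exp(-d_vy / sigma_y)) / 2.0


def velocity_similarity(vels_u, vels_v, sigma_x=1.0, sigma_y=1.0):
    """DTW over two non-empty velocity sequences.

    ASSUMPTION: local cost is 1 - q_v, and the similarity is 1 minus the
    mean cost along the optimal warping path.
    """
    n, m = len(vels_u), len(vels_v)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]   # accumulated cost
    steps = [[0] * (m + 1) for _ in range(n + 1)]    # path length so far
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - local_q(vels_u[i - 1], vels_v[j - 1], sigma_x, sigma_y)
            best = min((cost[i - 1][j], steps[i - 1][j]),
                       (cost[i][j - 1], steps[i][j - 1]),
                       (cost[i - 1][j - 1], steps[i - 1][j - 1]))
            cost[i][j] = best[0] + local
            steps[i][j] = best[1] + 1
    return 1.0 - cost[n][m] / steps[n][m]


def social_similarity(phi_u, phi_v, rho_u, rho_v, vels_u, vels_v,
                      alpha=0.8, beta=0.4):
    """Eq. (2): blend of the direction/magnitude term (eq. 1) and the velocity term."""
    s_phi_rho = (alpha * cosine_similarity(phi_u, phi_v)
                 + (1.0 - alpha) * magnitude_similarity(rho_u, rho_v))
    s_v = velocity_similarity(vels_u, vels_v)
    return beta * s_phi_rho + (1.0 - beta) * s_v


# Two hypothetical tracklets: same direction and velocity profile, different magnitude.
s = social_similarity((1.0, 0.0), (1.0, 0.0), 2.0, 4.0,
                      [(1.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)])
print(round(s, 4))
```

In this toy case only the magnitude term is below 1, so the result stays close to 1. The defaults `alpha=0.8` and `beta=0.4` are the values reported in the experiments.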

2.1. Building Social Networks

A social network is a social structure represented as a graph in which nodes represent objects (here, tracklets of pedestrians) and edges represent social interactions between them. The social interaction weights are based on our social similarity measure, eq. (2). Once the tracklets are obtained from the scene, we apply the spatio-temporal partitioning algorithm, which produces a dense grid of cuboids $\varsigma_1, \varsigma_2, \ldots, \varsigma_{N_r \times N_c}$ (2 × 2, 4 × 4 or 8 × 8 depending on the scene characteristics) as well as the time windows $\Omega_1, \Omega_2, \ldots, \Omega_{N_\Omega}$ (cf. Fig. 1). Within each cuboid in each time window, we do the following:

(1) Closeness centrality: On a graph, a geodesic between two nodes is a path connecting the nodes with the smallest number of edges. The classical definition of closeness centrality is the inverse of the average distance to all other nodes [16]. Since similarly behaving tracklets need to be spatially close to each other, in addition to the social similarity measure of eq. (2), we use the closeness centrality among connected tracklet nodes for pruning only: if similar tracklet nodes are spatially distant (threshold $t_d$), their connecting edge is deleted. This is followed by applying the connected component algorithm to the whole network. The idea is to identify the different dynamics of the scene, represented by the clusters in the network. We call each of the clusters obtained at this stage a local social network (LSN).

Fig. 1: System overview. The window is partitioned into cuboids; local social network components are found within cuboids and merged into global social network components within the window.

Local Social Network: A centroid, denoted by $Cent_{SN_i}$, is used to represent each LSN. It consists of the mean vector of its tracklets' features, which are spatial position <x, y>, direction $\phi$, magnitude $\rho$, and velocity $v$. These LSNs capture the local dynamics of each $\varsigma_i$.

Global Social Network (GSN): Once we obtain $LSN_i$ from the previous step, clustering is employed to obtain a coarser view of the scene. The similarity between cluster centroids is measured by eq. (2). If the similarity of two social networks' centroids is sufficiently high, i.e. greater than a user-defined threshold $t_f$, the networks are merged into one bigger network; otherwise they are reported as disjoint networks. For this, we perform Hierarchical Agglomerative Clustering [17]. Fig. 1 illustrates the framework of our hierarchical clustering algorithm. The merging is thus based on the social similarity measure between the centroids of every two LSN components, i.e. $Cent_{SN_i}$ and $Cent_{SN_j}$ of $LSN_i$ and $LSN_j$, respectively. If the similarity between $LSN_i$ and $LSN_j$ is above $t_f$, they merge to form $GSN_{ij} = LSN_i \cup LSN_j$, and a new centroid is computed for the merged network.
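The construction of local social networks inside one cuboid can be sketched as below. Several details are assumptions made only for illustration: an edge is created when the pairwise social similarity reaches a hypothetical threshold `t_sim` (the text does not state how edges are created), the spatial pruning with $t_d$ is approximated by plain Euclidean distance rather than a closeness-centrality computation, and the similarity matrix is toy data.

```python
import math


def local_social_networks(positions, similarity, t_sim, t_d):
    """Return the connected components of the pruned similarity graph.

    positions  : list of (x, y) tracklet locations
    similarity : symmetric matrix of pairwise social similarities
    t_sim      : ASSUMED similarity threshold for creating an edge
    t_d        : spatial distance above which an edge is pruned
    """
    n = len(positions)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            close = math.dist(positions[i], positions[j]) <= t_d
            if similarity[i][j] >= t_sim and close:
                adjacency[i].add(j)
                adjacency[j].add(i)

    # Depth-first search to collect connected components.
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def centroid(features, members):
    """Mean feature vector of the tracklets in one component."""
    dims = len(features[members[0]])
    return [sum(features[m][k] for m in members) / len(members) for k in range(dims)]


# Toy cuboid: two spatially separated groups plus one dissimilar tracklet.
positions = [(0, 0), (1, 0), (2, 0), (50, 50), (51, 50), (1, 1)]
n = len(positions)
similarity = [[0.9] * n for _ in range(n)]
for k in range(n):
    similarity[5][k] = similarity[k][5] = 0.1
components = local_social_networks(positions, similarity, t_sim=0.7, t_d=5.0)
print(components)
print(centroid(positions, components[0]))
```

Here only the position is averaged in `centroid`; in the method described above the feature vector would also carry direction, magnitude, and velocity.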

This bottom-up approach merges similar LSNs from different cuboids to discover the global social networks $GSN_j$ within time window $\Omega_i$, thus capturing the dynamics of the whole window (cf. Fig. 2).
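A minimal sketch of this bottom-up merging is given below, assuming a simple greedy agglomerative loop: the most similar pair of centroids is merged while its similarity exceeds $t_f$, and the merged centroid is the size-weighted mean of the two. The paper cites Hierarchical Agglomerative Clustering [17] without giving these details, so the loop, the function names, and the toy direction-only `toy_similarity` (standing in for eq. (2)) are all assumptions.

```python
import math


def toy_similarity(c_a, c_b):
    """Stand-in for eq. (2): cosine similarity of 2-D centroids rescaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(c_a, c_b))
    norm = math.hypot(*c_a) * math.hypot(*c_b)
    return (dot / norm + 1.0) / 2.0


def merge_into_global(lsns, centroid_similarity, t_f):
    """Greedy agglomerative merging of (members, centroid) pairs.

    The most similar pair is merged repeatedly while its centroid
    similarity is greater than t_f.
    """
    clusters = [(list(members), list(cent)) for members, cent in lsns]
    while len(clusters) > 1:
        best, pair = t_f, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = centroid_similarity(clusters[i][1], clusters[j][1])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:          # no pair is similar enough: stop merging
            break
        i, j = pair
        (m_i, c_i), (m_j, c_j) = clusters[i], clusters[j]
        total = len(m_i) + len(m_j)
        # Size-weighted mean equals the mean over all member tracklets.
        merged = [(a * len(m_i) + b * len(m_j)) / total for a, b in zip(c_i, c_j)]
        del clusters[j]           # j > i, so index i stays valid
        clusters[i] = (m_i + m_j, merged)
    return clusters


# Three hypothetical LSNs: two moving right, one moving left.
lsns = [([0, 1, 2], [1.0, 0.0]), ([3, 4], [0.9, 0.1]), ([5], [-1.0, 0.0])]
gsns = merge_into_global(lsns, toy_similarity, t_f=0.9)
print([sorted(members) for members, _ in gsns])
```

The two right-moving networks merge into one global network, and the left-moving one remains a separate, smaller network.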

2.2. Anomaly Detection

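Our social similarity measure, together with the size of each social network, is the basis for identifying abnormality. For each cuboid $\varsigma_j$, to identify which local social network component(s) $LSN_i^{\varsigma_j}$ are anomalous, we use:

$$ LA^{\varsigma_j} = \frac{LSN_h^{\varsigma_j}}{\operatorname{maxsize}(LSN_i^{\varsigma_j})}, \quad 1 \le i \le n \text{ and } i \ne h $$

where the numerator stands for the size of the tested component $LSN_h^{\varsigma_j}$ and the denominator is the size of the largest of the other components in the cuboid.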

Thus, if the relative size of the tested local social network component is less than $t_h\%$ of the size of the densest local social network component in the neighborhood, it is classified as an anomalous local social network component.

Fig. 2: GSN per window obtained by grouping $LSN_i$. (Panels: local social network per cuboid; global social network per window.)

Global anomalies are detected in a similar way, using:

$$ GA^{\Omega_i} = \frac{GSN_h^{\Omega_i}}{\operatorname{maxsize}(GSN_j^{\Omega_i})}, \quad 1 \le j \le n \text{ and } j \ne h $$

If the relative size of the tested $GSN_h^{\Omega_i}$ is less than $t_h\%$ of the size of the densest GSN component in the neighborhood, $\operatorname{maxsize}(GSN_j^{\Omega_i})$, it is classified as anomalous. In summary, the method starts by determining the size of each social network component, i.e. the number of its member tracklets. It then determines the largest, or densest, social network in the neighborhood. Finally, the algorithm uses the threshold $t_h$ to label each non-densest social network component as normal or abnormal: a component whose size is less than $t_h\%$ of the size of the densest component in the neighborhood is classified as anomalous.
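The size-based rule summarized above is short enough to sketch directly. The function name `flag_anomalous` and the example sizes are hypothetical. The sketch compares every component against the overall largest one, which matches the rule for all non-densest components, and the densest component itself is never flagged.

```python
def flag_anomalous(component_sizes, t_h=0.5):
    """Return indices of components smaller than t_h times the densest one.

    component_sizes : number of member tracklets in each component
    t_h             : relative-size threshold as a fraction (0.5 means 50%)
    """
    if not component_sizes:
        return []
    largest = max(component_sizes)
    return [i for i, size in enumerate(component_sizes) if size < t_h * largest]


# Hypothetical component sizes within one cuboid (or one window).
sizes = [12, 3, 7, 6]
anomalous = flag_anomalous(sizes, t_h=0.5)
print(anomalous)
```

With a 50% threshold and a densest component of 12 tracklets, only the component with 3 tracklets falls strictly below the cut-off of 6. The same function applies unchanged to LSN sizes per cuboid and GSN sizes per window.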

3. EXPERIMENTS & RESULTS

We conducted an extensive set of experiments on the UCSD pedestrian dataset. The dataset contains two sequences: UCSD Ped1, containing groups of people walking towards and away from the camera with some amount of perspective distortion, and UCSD Ped2, containing groups of people walking parallel to the camera plane (cf. Fig. 3). Commonly occurring anomalies include small golf carts, skaters, bikes, and people in wheelchairs. Each scene was divided into clips of about 200 frames, with a resolution of 238 x 158 for UCSD Ped1 and 360 x 240 for UCSD Ped2. All experiments were run with a MATLAB implementation on a PC with an Intel(R) Core(TM) i5-2400 3.10 GHz CPU and 4 GB RAM. For both sequences, the search window size for tracking is set to 16 x 16 pixels in the Lucas-Kanade based method, with three pyramid levels, around 1000 foreground features and a minimum distance of 4 between corner features. For the construction of feature vectors, we partition each 50-frame time window into 8 x 8 spatio-temporal cuboids. The values for α and β are chosen experimentally to be 0.8 and 0.4, respectively, and $t_h\%$ = 50%.

Fig. 3: Examples of anomaly detections using (col 1) the MDT approach [9], (col 2) the SF-MPPCA approach [9], (col 3) our detection approach and (col 4) our tracking results. For MDT, the abnormal detection foreground mask is too large, so its results are not accurate. SF-MPPCA inaccurately detects the small car (row 1), completely misses the bike (row 2), completely misses the skater (row 3), and produces spurious abnormalities.

We use the frame-level criterion: a frame is considered anomalous, and counted as a positive, if it contains at least one abnormal pixel. For the LSN, we measure the frame accuracy. For the global evaluation, we use the Receiver Operating Characteristic (ROC) curve, which is based on the True Positive Rate (TPR) and False Positive Rate (FPR) [18]. The area under the curve (AUC) is computed from the ROC curve to compare our method to the other methods, as shown in Fig. 4 and Table 1. The Rate of Detection [9] is shown in Table 2.
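For readers unfamiliar with the evaluation protocol, the frame-level ROC curve and its AUC can be computed as in the sketch below, which follows the threshold-sweep procedure described in [18]. The per-frame anomaly scores and ground-truth labels are made-up values, and the function names are hypothetical. The sketch assumes that both anomalous and normal frames are present.

```python
def roc_points(scores, labels):
    """Sweep a threshold over descending scores; return (FPR, TPR) points.

    scores : per-frame anomaly score (higher means more anomalous)
    labels : 1 for an anomalous frame (positive), 0 for a normal frame
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    previous = None
    for i in order:
        # Emit a point only when the score changes, so ties share one threshold.
        if previous is not None and scores[i] != previous:
            points.append((fp / negatives, tp / positives))
        if labels[i]:
            tp += 1
        else:
            fp += 1
        previous = scores[i]
    points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for six frames, three of which are truly anomalous.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc_value = area_under_curve(roc_points(scores, labels))
print(round(auc_value, 4))
```

In this toy example eight of the nine positive/negative frame pairs are ranked correctly, which corresponds to an AUC of 8/9.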

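Our method outperforms MDT on both sequences and is competitive with SRC: it trails SRC slightly on UCSD Ped1 (85.1% vs. 86.0% AUC), exceeds it on UCSD Ped2 (87.0% vs. 86.1% AUC), and achieves the highest detection rate on the localization task (47.4%).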

Fig. 4: ROC curves: (top) frame-level for UCSD-Ped1; (bottom) frame-level for UCSD-Ped2.

              MDT [9]    SRC [10]    Our Method
UCSD-Ped1     81.8%      86.0%       85.1%
UCSD-Ped2     84.8%      86.1%       87.0%

Table 1: Quantitative frame-level comparison using the AUC for our method and the state-of-the-art methods.

              MDT [9]    SRC [10]    Our Method
localization  45%        46%         47.4%

Table 2: Quantitative comparison of the detection rate (RD) at equal error for the anomaly localization task on UCSD Ped1.

4. CONCLUSION

We present a general method that covers both local and global anomalous events without the need for background/foreground segmentation or individual tracking. We trained and tested on a standard dataset and compared our results with the state of the art. The main advantage of our method is that it learns the dominant behavior in an unsupervised manner while simultaneously detecting anomalous patterns. This characteristic makes it well suited to visual surveillance in unconstrained environments.

5. REFERENCES

[1] A. Basharat, A. Gritai, and M. Shah, “Learning object motion patterns for anomaly detection and improved object detection,” in Computer Vision and Pattern Recognition. IEEE, 2008, pp. 1–8.

[2] S. Ali and M. Shah, “Floor fields for tracking in high density crowd scenes,” in Computer Vision-ECCV. IEEE, 2008, pp. 1–14.

[3] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in Computer Vision and Pattern Recognition. IEEE, 2009, pp. 935–942.

[4] R. Raghavendra, A.D. Bue, and M. Cristani, “Optimizing interaction force for global anomaly detection in crowded scenes,” in Computer Vision Workshops (ICCV Workshops). IEEE, 2011, pp. 136–143.

[5] S. Ali and M. Shah, “A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis,” in Computer Vision and Pattern Recognition. IEEE, 2007, pp. 1–7.

[6] W. Chongjing, Z. Xu, Z. Yi, and L. Yuncai, “Analyzing motion patterns in crowded scenes via automatic tracklets clustering,” China Communications, vol. 10, no. 4, pp. 144–154, 2013.

[7] F. Jiang, Y. Wu, and A.K. Katsaggelos, “Detecting contextual anomalies of crowd motion in surveillance video,” in Conference on Image Processing (ICIP). IEEE, 2009, pp. 1117–1120.

[8] M.J.V. Leach, Ed.P. Sparks, and N.M. Robertson, “Contextual anomaly detection in crowded surveillance scenes,” Pattern Recognition Letters, vol. 44, pp. 71–79, 2013.

[9] V. Mahadevan, W. Li, and V. Bhalodia, “Anomaly detection in crowded scenes,” in Computer Vision and Pattern Recognition (CVPR). IEEE, 2010, pp. 1975–1981.

[10] Y. Cong, J. Yuan, and J. Liu, “Sparse reconstruction cost for abnormal event detection,” in Computer Vision and Pattern Recognition (CVPR). IEEE, 2011, pp. 3449–3456.

[11] Z. Qin and C.R. Shelton, “Improving multi-target tracking via social grouping,” in Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp. 1972–1978.

[12] X.P. Burgos-Artizzu, P. Dollár, D. Lin, D.J. Anderson, and P. Perona, “Social behavior recognition in continuous video,” in Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp. 1322–1329.

[13] T. Yu, S. Lim, K. Patwardhan, and N. Krahnstoever, “Monitoring, recognizing and discovering social networks,” in Computer Vision and Pattern Recognition (CVPR). IEEE, 2009, pp. 1462–1469.

[14] J.Y. Bouguet, “Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm,” Intel Corporation, vol. 1, pp. 1–9, 2001.

[15] M. Zeppelzauer, M. Zaharieva, D. Mitrovic, and C. Breiteneder, “A novel trajectory clustering approach for motion segmentation,” in Advances in Multimedia Modeling, 2010, pp. 433–443.

[16] L.C. Freeman, “Centrality in social networks: conceptual clarification,” Social Networks, vol. 1, pp. 215–239, 1979.

[17] G. Karypis, E. Han, and V. Kumar, “Chameleon: Hierarchical clustering using dynamic modeling,” Computer, vol. 8, pp. 68–78, 1999.

[18] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, pp. 861–874, 2006.
