Knowledge-based event recognition from salient regions of activity
Nicolas Moënne-Loccoz
Viper group, Computer Vision & Multimedia Laboratory, University of Geneva
January 23, 2004
M4 – Meeting – January 2004
NML - CVML - UniGe 2
Outline
• Context
• Salient Regions of Activity (SRA)
• Learning the semantics of SRA
• Visual Event Query language
• Conclusion
Context
• Retrieval of visual events based on user queries
– Abstract representation of the visual content
– Query language to express visual events
• Approach
– Region-based description of the content
– Classification of the regions
– Events queried as spatio-temporal constraints on the regions
Overview
[Figure: system overview. Blocks: Videos database, Region extraction, Salient regions of activity, Classification, Labelled regions, Domain knowledge, User queries]
Salient regions of activity
• Regions of the image space
– Moving in the scene
– Having a homogeneous colour distribution
Moving objects or meaningful parts of moving objects
• Extraction :
– From moving salient points
– By an adaptive mean-shift algorithm
Salient points extraction
• Scale invariant interest points (Mikolajczyk, Schmid 2001)
– Extracted in the linear scale-space
– Local maxima of the scale normalized Harris function (image space)
– Local maxima of the scale normalized Laplacian (scale space)
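$$h(v,s) = \det\big(H(v,s)\big) - \alpha\,\mathrm{Trace}^2\big(H(v,s)\big)$$
$$H(v,s) = s^2 \begin{pmatrix} L_x^2(v,s) & L_x L_y(v,s) \\ L_x L_y(v,s) & L_y^2(v,s) \end{pmatrix}$$
$$l(v,s) = s^2 \big( L_{xx}(v,s) + L_{yy}(v,s) \big)$$
$$L_{v_i}(v,s) = I(v) \ast G_{v_i}(s)$$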
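The two measures above can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not on the slide:

- The scale s is used directly as the Gaussian standard deviation.
- α = 0.05.
- The derivative products are averaged over an integration Gaussian of scale 1.4·s. Without such a window, the matrix built from a single pixel's gradient has rank one and a zero determinant.

The names `harris_laplacian`, `gaussian_kernels` and `filter2d` are hypothetical.

```python
import numpy as np

def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian G and its first and second derivatives."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    dg = -x / sigma**2 * g                      # dG/dx
    ddg = (x**2 / sigma**4 - 1 / sigma**2) * g  # d2G/dx2
    return g, dg, ddg

def filter2d(image, k_rows, k_cols):
    """Separable convolution: k_cols along x (each row), k_rows along y (each column)."""
    out = np.apply_along_axis(np.convolve, 1, image, k_cols, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, k_rows, mode="same")

def harris_laplacian(image, s, alpha=0.05, integration_factor=1.4):
    """Scale-normalised Harris measure h(v, s) and Laplacian l(v, s) at every pixel."""
    image = np.asarray(image, dtype=float)
    g, dg, ddg = gaussian_kernels(s)
    Lx = filter2d(image, g, dg)     # L_x = I * G_x
    Ly = filter2d(image, dg, g)     # L_y = I * G_y
    Lxx = filter2d(image, g, ddg)   # L_xx = I * G_xx
    Lyy = filter2d(image, ddg, g)   # L_yy = I * G_yy
    # Assumption: entries of H are averaged over an integration window
    gi, _, _ = gaussian_kernels(integration_factor * s)
    a = s**2 * filter2d(Lx * Lx, gi, gi)
    b = s**2 * filter2d(Lx * Ly, gi, gi)
    c = s**2 * filter2d(Ly * Ly, gi, gi)
    h = (a * c - b * b) - alpha * (a + c) ** 2  # det(H) - alpha * Trace(H)^2
    l = s**2 * (Lxx + Lyy)
    return h, l
```

Candidate points would then be kept where h is a spatial local maximum and l is a local maximum across scales.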
Salient points extraction
• Example :
[Figure: salient points extracted at several scales (axis: scale)]
Salient points trajectories
• Trajectories used to :
– Find salient points moving in the scene
– Track salient points over time
• Point matching using local grey-value invariants (Schmid)
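$$g(w) = \begin{pmatrix} L \\ L_i L_i \\ L_i L_{ij} L_j \\ L_{ii} \\ L_{ij} L_{ji} \\ \varepsilon_{ij}\big(L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l\big) \\ L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k \\ -\varepsilon_{ij} L_{jkl} L_i L_k L_l \\ L_{ijk} L_i L_j L_k \end{pmatrix}, \qquad i, j, k, l \in \{x, y\}$$
with summation over repeated indices, $\varepsilon_{xy} = -\varepsilon_{yx} = 1$ and $\varepsilon_{xx} = \varepsilon_{yy} = 0$.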
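The first five components (up to second order) reduce to dot products and traces of the gradient and Hessian. That makes their rotation invariance easy to check numerically.

This sketch assumes the derivatives at w are already computed, for example with Gaussian derivatives as on the previous slide. It omits the four third-order terms, and `invariants_order2` is a hypothetical name.

```python
import numpy as np

def invariants_order2(L, grad, hessian):
    """First five local-jet invariants of g(w): L, L_i L_i, L_i L_ij L_j, L_ii, L_ij L_ji.

    grad = (L_x, L_y); hessian = [[L_xx, L_xy], [L_xy, L_yy]].
    Sums over repeated indices become dot products and traces.
    """
    g = np.asarray(grad, dtype=float)
    H = np.asarray(hessian, dtype=float)
    return np.array([
        L,                # L
        g @ g,            # L_i L_i      (squared gradient magnitude)
        g @ H @ g,        # L_i L_ij L_j
        np.trace(H),      # L_ii         (Laplacian)
        np.trace(H @ H),  # L_ij L_ji
    ])
```

Rotating the image by a matrix R maps the gradient to R·grad and the Hessian to R·H·Rᵀ, and all five values stay unchanged.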
Salient points trajectories
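• Mahalanobis distance between points $w_i \in W_t$ and $w_j \in W_{t+1}$, with $\Lambda$ the covariance matrix of the invariants :
$$d(w_i, w_j) = \big(g(w_i) - g(w_j)\big)^T \Lambda^{-1} \big(g(w_i) - g(w_j)\big)$$
• The set of matching points minimizes $\sum d(w_i, w_j)$
– Greedy Winner-Takes-All algorithm
• Set of point trajectories : $T_{w_i} = \{w_i, w_j, \dots\}$, with $w_i \in W_t$, $w_j \in W_{t+1}$
• Moving salient points : points $w$ selected from their trajectories $T_w$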
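A minimal sketch of the greedy Winner-Takes-All matching follows. It assumes:

- The invariant vectors g(w) are already computed.
- Λ is estimated beforehand, for example as the covariance of the invariants over tracked points.

`greedy_match` and its optional `max_dist` gate are hypothetical. The greedy pass repeatedly accepts the closest remaining unmatched pair. This approximates the minimum of the summed distances but does not guarantee it.

```python
import numpy as np

def greedy_match(feats_t, feats_t1, cov, max_dist=np.inf):
    """Greedy Winner-Takes-All matching of salient points between frames t and t+1.

    feats_t, feats_t1: invariant vectors g(w), one row per point.
    cov: covariance matrix Lambda of the invariants.
    Returns {index in frame t: index in frame t+1}.
    """
    cov_inv = np.linalg.inv(cov)
    candidates = []
    for i, gi in enumerate(np.asarray(feats_t, dtype=float)):
        for j, gj in enumerate(np.asarray(feats_t1, dtype=float)):
            diff = gi - gj
            candidates.append((float(diff @ cov_inv @ diff), i, j))  # d(w_i, w_j)
    candidates.sort()  # closest pairs first
    matches, used_t1 = {}, set()
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # every remaining pair is even farther apart
        if i in matches or j in used_t1:
            continue  # each point takes part in at most one match
        matches[i] = j
        used_t1.add(j)
    return matches
```

Chaining the matches frame by frame yields the trajectories $T_w$.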
Salient regions estimation
• Estimate characteristic regions of the moving salient points
• Mean-Shift algorithm : estimate the position
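– Likelihood of pixels $P_r(v)$ (Gaussian RGB colour distribution)
– Ellipsoidal Epanechnikov kernel $K_r$, centred on the position $v_r$ of region $r$ (0 outside the ellipse)
$$v_r \leftarrow \frac{\sum_v v\,K_r(v)\,P_r(v)}{\sum_v K_r(v)\,P_r(v)}$$
$$K_r(v) = \frac{3}{4}\Big(1 - (v - v_r)^T \Sigma_r^{-1} (v - v_r)\Big)$$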
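One mean-shift update of v_r can be sketched as follows, under three assumptions:

- Pixels are given as an (N, 2) array of coordinates.
- Their likelihoods P_r(v) are precomputed.
- Σ_r stays fixed during the step.

The constant 3/4 cancels in the weighted mean. The function names are hypothetical.

```python
import numpy as np

def epanechnikov(pixels, center, shape):
    """K_r(v) = 3/4 (1 - m) with m = (v - v_r)^T Sigma_r^-1 (v - v_r); 0 where m >= 1."""
    d = pixels - center
    m = np.einsum("ni,ij,nj->n", d, np.linalg.inv(shape), d)
    return np.where(m < 1.0, 0.75 * (1.0 - m), 0.0)

def mean_shift_step(pixels, likelihood, center, shape):
    """Move v_r to the K_r * P_r weighted mean of the pixel coordinates."""
    w = epanechnikov(pixels, center, shape) * likelihood
    if w.sum() == 0.0:
        return np.asarray(center, dtype=float)  # no support under the kernel: stay put
    return (w[:, None] * pixels).sum(axis=0) / w.sum()
```

Iterating `mean_shift_step` until v_r stops moving gives the position estimate for a fixed kernel shape.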
Salient regions estimation
• Kernel adaptation step : estimate shape and size
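• Algorithm :
for each moving salient point $w \in W$ (position $v_w$, scale $s_w$)
    initialise $r$ with $v_r = v_w$, $\Sigma_r = \mathrm{diag}(3s_w, 3s_w)$
    repeat
        Mean-Shift : update $v_r$
        Kernel adaptation : $\Sigma_r = \mathrm{cov}\big(P_r(v)\big)$
    until convergence
    $W_r$ = moving salient points $w \in W$ covered by $r$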
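The adaptive loop can be sketched as follows. Two choices are assumptions that depart from a literal $\Sigma_r = \mathrm{cov}(P_r(v))$:

- The covariance is measured over an enlarged search ellipse (twice the current radius, parameter `search`) so that the region can grow.
- The new kernel matrix is set to 4·cov, because in 2D a uniformly filled ellipse with matrix Σ has covariance Σ/4.

`estimate_region` and its parameters are hypothetical.

```python
import numpy as np

def _mahalanobis_sq(pixels, center, shape):
    d = pixels - center
    return np.einsum("ni,ij,nj->n", d, np.linalg.inv(shape), d)

def estimate_region(pixels, likelihood, v_w, s_w, search=2.0, max_iter=50, tol=1e-3):
    """Adaptive mean shift started from a moving salient point (position v_w, scale s_w).

    pixels: (N, 2) coordinates; likelihood: (N,) values of P_r(v).
    Returns the region position v_r and ellipse matrix Sigma_r.
    """
    center = np.asarray(v_w, dtype=float)
    shape = np.diag([3.0 * s_w, 3.0 * s_w])  # initial kernel diag(3 s_w, 3 s_w)
    for _ in range(max_iter):
        # Mean-shift step: K_r * P_r weighted mean of pixel positions
        m = _mahalanobis_sq(pixels, center, shape)
        w = np.where(m < 1.0, 0.75 * (1.0 - m), 0.0) * likelihood
        if w.sum() == 0.0:
            break
        new_center = (w[:, None] * pixels).sum(axis=0) / w.sum()
        # Kernel adaptation: likelihood-weighted covariance inside the enlarged search ellipse
        m = _mahalanobis_sq(pixels, new_center, search ** 2 * shape)
        p = np.where(m < 1.0, likelihood, 0.0)
        if p.sum() == 0.0:
            break
        new_shape = 4.0 * np.cov(pixels.T, aweights=p, bias=True)
        converged = (np.linalg.norm(new_center - center) < tol
                     and np.allclose(new_shape, shape, atol=tol))
        center, shape = new_center, new_shape
        if converged:
            break
    return center, shape
```

Started inside a uniformly coloured disc, the estimate moves to the disc centre, and the ellipse grows to roughly the disc's radius.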
Salient regions representation
• Set of salient regions of activity, each region $r$ represented by :
– Position $v_r$
– Ellipsoid $\Sigma_r$
– Colour distribution $(\mu^{rgb}_r, \Sigma^{rgb}_r)$
– Set of salient points $W_r$
• Salient regions tracking
– Regions are matched by a majority vote of their salient points
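Region tracking by majority vote can be sketched as follows. It assumes that point matches between frames come from the salient point trajectories, and that each salient point belongs to at most one region per frame. `match_regions` is a hypothetical name.

```python
from collections import Counter

def match_regions(regions_prev, regions_curr, point_matches):
    """Match regions across frames by a majority vote of their salient points.

    regions_prev, regions_curr: {region id: set of salient point ids} (sets W_r).
    point_matches: {point id in previous frame: point id in current frame}.
    Returns {previous region id: current region id}.
    """
    owner = {p: r for r, pts in regions_curr.items() for p in pts}
    result = {}
    for r_prev, pts in regions_prev.items():
        # Each matched point votes for the current region that now contains it
        votes = Counter(owner[point_matches[p]] for p in pts
                        if p in point_matches and point_matches[p] in owner)
        if votes:
            result[r_prev] = votes.most_common(1)[0][0]
    return result
```

For example, a region whose points map to regions X, X and Y is matched to X.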
Salient regions of activity
Regions classification
• To obtain an abstract description :
– Map regions to a domain-specific basic vocabulary
Meetings : {Arm, Head, Body, Noise}
• SVM classifier :
– Set of 500 annotated salient regions of activity (~200 frames)
Regions classification
• Confusion matrix (rows: annotated class, columns: predicted class) :

         Arm     Head    Body    Noise
  Arm    1.000   0       0       0
  Head   0       0.909   0.091   0
  Body   0       0       1.000   0
  Noise  0       0.052   0       0.946

• Discussion :
– The Noise class is ill-defined
– The good results are explained by the limited number of classes
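The matrix can be summarised in one number, assuming rows are the annotated classes (each row sums to about 1). The diagonal then gives per-class recognition rates, and their unweighted mean is a derived summary for illustration, not a figure reported on the slide.

```python
labels = ["Arm", "Head", "Body", "Noise"]
confusion = [
    [1.000, 0.0,   0.0,   0.0],
    [0.0,   0.909, 0.091, 0.0],
    [0.0,   0.0,   1.000, 0.0],
    [0.0,   0.052, 0.0,   0.946],
]
# Diagonal entries: fraction of each annotated class that is recognised correctly
per_class = {label: confusion[i][i] for i, label in enumerate(labels)}
mean_rate = sum(per_class.values()) / len(labels)
# Row sums close to 1 support the rows-are-annotated-classes reading
row_sums = [sum(row) for row in confusion]
print(per_class, mean_rate, row_sums)
```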
Visual event language
• To express visual event queries
– Spatio-temporal constraints on labelled regions (LR)
• To integrate domain knowledge
– As a specification of the scene layout (L)
– As a set of basic events
• A formula of the language is a conjunction of :
– Temporal relations {after, just-after} between 2 LR
– Spatial relations {above, left} between 2 LR, and {in} between a LR and a L
– Identity relations {is} between 2 LR, and {is-a} between a LR and a label
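The relations can be read as Boolean predicates over labelled regions. This sketch is one possible encoding, not the language's actual implementation. The following are all assumptions:

- The `LR` record.
- Layout elements modelled as rectangles.
- Region centres used for above/left, and the same-frame requirement for spatial relations.
- A one-frame gap for just-after.

Argument order follows the slides: in(SEATS, LR), and just-after(LR1, LR2) read as "LR1 occurs just after LR2".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LR:
    """Hypothetical labelled region: tracked identity, frame index, class label,
    and centre (x, y) in image coordinates (y grows downwards)."""
    track: int
    frame: int
    label: str
    x: float
    y: float

def after(a: LR, b: LR) -> bool:
    return a.frame > b.frame

def just_after(a: LR, b: LR, gap: int = 1) -> bool:
    return 0 < a.frame - b.frame <= gap

def above(a: LR, b: LR) -> bool:
    return a.frame == b.frame and a.y < b.y

def left(a: LR, b: LR) -> bool:
    return a.frame == b.frame and a.x < b.x

def in_(layout, a: LR) -> bool:
    # Layout element modelled as a rectangle (x0, y0, x1, y1): an assumption
    x0, y0, x1, y1 = layout
    return x0 <= a.x <= x1 and y0 <= a.y <= y1

def is_(a: LR, b: LR) -> bool:
    return a.track == b.track  # same tracked region

def is_a(label: str, a: LR) -> bool:
    return a.label == label

def holds(*constraints: bool) -> bool:
    """A formula is a conjunction: it holds only if every constraint holds."""
    return all(constraints)
```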
Knowledge - Meetings
• Scene layout : L = {SEATS, DOOR, BOARD}
Knowledge - Meetings
• Basic events : {Meeting-participant, sitting, standing}
Meeting-participant : actor : LR
    constraints : is-a(head, LR).
Sitting : actor : LR
    constraints : Meeting-participant(LR),
                  in(SEATS, LR).
Standing : actor : LR
    constraints : Meeting-participant(LR),
                  ~in(SEATS, LR).
Events queries
• Examples of user queries :
Sitting-down : actors : LR1, LR2
    constraints : is(LR1, LR2),
                  sitting(LR1),
                  standing(LR2),
                  just-after(LR1, LR2).
Go-to-board : actors : LR1, LR2
    constraints : is(LR1, LR2),
                  standing(LR1),
                  ~in(BOARD, LR1),
                  standing(LR2),
                  in(BOARD, LR2),
                  just-after(LR2, LR1).
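Combining such predicates, the basic events and the two example queries can be evaluated by brute force over pairs of labelled regions. Everything concrete here is hypothetical:

- The `LR` record.
- The SEATS and BOARD rectangles.
- The one-frame gap for just-after.

The constraint lists mirror the definitions above one-to-one.

```python
from dataclasses import dataclass

# Hypothetical layout rectangles (x0, y0, x1, y1) in image coordinates
SEATS = (0, 100, 200, 200)
BOARD = (300, 0, 400, 150)

@dataclass(frozen=True)
class LR:
    track: int   # identity of the tracked region, used by is(., .)
    frame: int
    label: str
    x: float
    y: float

def inside(area, r):
    x0, y0, x1, y1 = area
    return x0 <= r.x <= x1 and y0 <= r.y <= y1

def just_after(a, b, gap=1):
    return 0 < a.frame - b.frame <= gap

# Basic events
def meeting_participant(r): return r.label == "head"                          # is-a(head, LR)
def sitting(r): return meeting_participant(r) and inside(SEATS, r)            # in(SEATS, LR)
def standing(r): return meeting_participant(r) and not inside(SEATS, r)       # ~in(SEATS, LR)

# User queries: all (LR1, LR2) pairs satisfying the conjunction
def sitting_down(regions):
    return [(r1, r2) for r1 in regions for r2 in regions
            if r1.track == r2.track and sitting(r1) and standing(r2)
            and just_after(r1, r2)]

def go_to_board(regions):
    return [(r1, r2) for r1 in regions for r2 in regions
            if r1.track == r2.track and standing(r1) and not inside(BOARD, r1)
            and standing(r2) and inside(BOARD, r2) and just_after(r2, r1)]
```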
Events queries - Results
• Results :
• Discussion :• Recall validate the retrieval capability • False alarms occur because of the hard decision
Precision Recall
Sit-down 0.43 1.00
Stand-up 0.50 1.00
Go-to-board 1.00 1.00
Enter 0.20 1.00
Leave 0.25 0.50
Conclusion
• Contributions
– Well-suited framework for constrained domains
– Generic representation of the visual content
– Paradigm to retrieve visual events from videos
• Limitations
– Cannot retrieve all visual events (e.g. emotions)
• Ongoing work
– Uncertainty handling and fuzziness
– Integration of other modalities (e.g. transcripts)