Spatio-temporal saliency model to predict eye movements in video free viewing
Gipsa-lab, Département Images et Signal, CNRS UMR 5216, Grenoble
S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, A. Guérin-Dugué
GDR-vision 12/06/2008
Plan
- Introduction
- Model
- Experiment and results
- Conclusion
Introduction
Salient regions attract attention, and therefore the eyes.
Saliency depends mainly on two factors:
- Bottom-up: task-independent, depending on intrinsic features of the stimuli
- Top-down: task-dependent, integrating high-level processes (cognitive state, ...)
Model
Spatio-temporal saliency model:
- Achromatic stimuli
- Simulates some parts of the human visual system: the retina and the primary visual cortex (V1)
- Two pathways: static and dynamic
[Figure: model overview, from the input frame to the static saliency map Ms and the dynamic saliency map Md]
Model _ retina model
The retina model has two outputs:
- Magnocellular-like: low spatial frequencies, band-pass filter; whitens the spectrum and provides global information
- Parvocellular-like: high spatial frequencies, high-pass filter; whitens the spectrum and enhances frame contrast
[Figure: « Parvocellular-like » and « Magnocellular-like » retina outputs, feeding the static (Ms) and dynamic (Md) pathways]
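As an illustration of this split (a simplified sketch, not the authors' exact retina filter), the two outputs can be approximated with Gaussian blurs: a high-pass residual for the parvocellular-like output and a difference of Gaussians for the magnocellular-like one. The `sigma` values below are hypothetical:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (numpy only)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    smooth = lambda a: np.convolve(np.pad(a, r, mode='edge'), k, mode='valid')
    return np.apply_along_axis(smooth, 0, np.apply_along_axis(smooth, 1, img))

def retina_outputs(frame, sigma_fine=1.0, sigma_coarse=4.0):
    """Rough parvo/magno-like split of a grayscale frame.

    parvo-like: high-pass residual (keeps high spatial frequencies,
                i.e. local contrast and fine detail)
    magno-like: band-pass, difference of Gaussians (keeps low spatial
                frequencies, i.e. global structure)
    """
    fine = gaussian_blur(frame, sigma_fine)
    coarse = gaussian_blur(frame, sigma_coarse)
    return frame - fine, fine - coarse

# toy frame: a bright square on a dark background
frame = np.zeros((64, 64))
frame[24:40, 24:40] = 1.0
parvo, magno = retina_outputs(frame)
```

On this toy frame the parvo-like map responds at the square's edges and is flat inside it, while the magno-like map keeps a coarse version of the whole square.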
Model _ cortical-like filters
Visual stimuli are processed in different frequency bands and orientations in V1:
- Static pathway: 6 orientations, 4 frequency bands
- Dynamic pathway: 6 orientations, 3 frequency bands (lower)
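A minimal sketch of such a bank, assuming simple cosine Gabor filters with hypothetical frequency and bandwidth values (the paper's exact cortical-like filters may differ):

```python
import numpy as np

def gabor_kernel(freq, theta, sigma, size=31):
    """Cosine Gabor: Gaussian envelope times a plane wave along `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)

# 6 orientations x 4 frequency bands, one octave apart (illustrative values)
orientations = [k * np.pi / 6 for k in range(6)]
frequencies = [0.25 / 2**b for b in range(4)]
bank = [gabor_kernel(f, th, sigma=1.0 / f)   # envelope scales with the period
        for f in frequencies for th in orientations]
```

Each kernel responds to contours at one orientation and one spatial scale; convolving a frame with all 24 gives the per-band response maps the pathway works on.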
Model _ static pathway
Static pathway:
- Interactions, which strengthen the contours:
  - short, between cells with overlapping receptive fields
  - long, between collinear cells
- Normalization
- Summation over all orientations and frequency bands, giving the static saliency map Ms
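The normalization and summation steps can be sketched as follows; the interaction step is omitted, and the per-map min-max rescaling shown here is an assumed stand-in for the paper's normalization scheme:

```python
import numpy as np

def static_saliency(responses):
    """Combine filter-response maps into one static map Ms:
    rectify each map, normalize it to [0, 1], then sum over all
    orientations and frequency bands."""
    ms = np.zeros_like(responses[0], dtype=float)
    for r in responses:
        r = np.abs(r)                       # rectification
        span = r.max() - r.min()
        if span > 0:
            r = (r - r.min()) / span        # simple per-map normalization
        ms += r                             # summation across the bank
    return ms

# 24 dummy response maps (6 orientations x 4 frequency bands)
maps = [np.random.default_rng(seed).random((8, 8)) for seed in range(24)]
ms = static_saliency(maps)
```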
Model _ dynamic pathway
Dynamic pathway:
- Two motion estimation steps: dominant motion compensation, then local motion estimation using the same bank of cortical filters as the static pathway
- Temporal filtering
- Dynamic saliency Md: modulus of the motion vector
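Assuming the motion vector field (vx, vy) has already been produced by the two estimation steps, the last two steps can be sketched with the modulus and a first-order recursive filter (an assumed, simplified form of the temporal filtering):

```python
import numpy as np

def dynamic_saliency(vx, vy, prev_md=None, alpha=0.5):
    """Dynamic map Md: modulus of the motion vector field (vx, vy),
    followed by simple recursive temporal smoothing across frames.

    `alpha` is a hypothetical smoothing weight, not a value from the paper.
    """
    md = np.hypot(vx, vy)                         # modulus of motion vector
    if prev_md is not None:
        md = alpha * md + (1 - alpha) * prev_md   # temporal filtering
    return md

# one moving pixel with velocity (3, 4) -> modulus 5
vx = np.zeros((4, 4)); vy = np.zeros((4, 4))
vx[1, 1] = 3.0; vy[1, 1] = 4.0
md = dynamic_saliency(vx, vy)
```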
Model _ fusion and example of saliency maps
The static and dynamic maps are combined by multiplicative fusion:

Mand(x, y) = Ms(x, y) × Md(x, y)

[Figure: frames of the original video with the corresponding saliency maps Ms, Md and Mand]
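The fusion is a pixel-wise product, so a location must be salient in both pathways to remain salient in the fused map:

```python
import numpy as np

def fuse(ms, md):
    """Multiplicative fusion: Mand(x, y) = Ms(x, y) * Md(x, y).
    A region salient in only one pathway is suppressed."""
    return ms * md

ms = np.array([[0.2, 0.9], [0.0, 1.0]])   # static map
md = np.array([[0.5, 0.8], [1.0, 0.0]])   # dynamic map
mand = fuse(ms, md)
```

Note how the two locations that are salient in only one of the maps (bottom row) vanish after fusion.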
Experiment and results
Purpose: compare the model's results with human eye positions.
- Free viewing; eye positions recorded with an EyeLink II eye tracker
- 15 subjects; 20 clips of 30 s, each composed of different snippets strung together [Carmi & Itti]
- Stimulus size: 720×576 pixels, 40°×30° field of view
[Figure: a clip made of snippet 1, snippet 2, ..., snippet k]
Criterion: Normalized Scanpath Saliency (NSS) [Peters & Itti]

[Carmi & Itti]: R. Carmi and L. Itti, « Visual causes versus correlates of attentional selection in dynamic scenes », Vision Research, vol. 46, 2006
Experiment _ global analysis
NSS for frame k:

NSS(k) = ( mean[ Mh(x, y, k) × Mm(x, y, k) ] − mean[ Mm(x, y, k) ] ) / std[ Mm(x, y, k) ]

where Mh(x, y, k) is the human eye position density map and Mm(x, y, k) is the model saliency map.

NSS of naive saliency maps (MsnH: entropy, MsnSD: standard deviation, Mdn: absolute difference):
Real eye movements: MsnH = 0.54, MsnSD = 0.44, Mdn = 0.54

NSS of model saliency maps (Ms: static, Md: dynamic, Mand: fusion):
Real eye movements: Ms = 0.68, Md = 0.87, Mand = 0.96

[Peters & Itti]: R. J. Peters and L. Itti, « Applying computational tools to predict gaze direction in interactive visual environments », ACM Trans. on Applied Perception, vol. 5, 2008
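An equivalent way to compute NSS from discrete fixation points (rather than the density map Mh) is to z-score the saliency map and average it at the fixated locations:

```python
import numpy as np

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: z-score the saliency map, then
    average it at the human fixation locations (row, col pairs).
    NSS > 0 means the model is more salient at fixations than on average."""
    z = (saliency - saliency.mean()) / saliency.std()
    return float(np.mean([z[r, c] for r, c in fixations]))

# toy example: both fixations fall on the map's brightest pixel
sal = np.array([[0.0, 0.0], [0.0, 1.0]])
score = nss(sal, [(1, 1), (1, 1)])
```

A chance-level model scores about 0; the 0.96 reported above for Mand means fixations land on clearly above-average saliency.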
Experiment _ temporal analysis
Average NSS on the kth frame of each snippet (snippet 1, snippet 2, ..., snippet N); frame rate = 25 fps.
[Figure: NSS as a function of frame]
[Figure: dispersion of eye positions as a function of frame]
10th-13th frame ≈ 400-520 ms
Conclusion
A new, biologically inspired model of spatio-temporal saliency:
- Retina filter with two outputs
- Interactions between cells
- Same bank of cortical-like filters for the static and dynamic pathways
The model reliably predicts the first fixations.

References:
- S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, A. Guérin-Dugué, « Spatio-temporal saliency model to predict eye movements in video free viewing », Proc. EUSIPCO 2008
- S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, A. Guérin-Dugué, « Modelling spatio-temporal saliency to predict gaze direction for short videos », submitted to the International Journal of Computer Vision

Thanks for your attention!