A Probabilistic Framework for Video Representation
Arnaldo Mayer,
Hayit Greenspan
Dept. of Biomedical Engineering
Faculty of Engineering
Tel-Aviv University, Israel
Jacob Goldberger,
CUTe Systems, Ltd.
Introduction
• In this work we describe a novel statistical video representation and modeling scheme.
• Video representation schemes are needed to segment a video stream into meaningful video-objects, which are useful for event detection, indexing and retrieval applications.
PACS: Picture Archiving & Communication Systems
[Diagram: PACS components: Storage, Query/Retrieve, Internet, Database Management, Visual Information, Tele-Medicine]
What are interesting events in medical data?
• Spatio-temporal segmentation of multiple sclerosis lesions in MRI
• Spatio-temporal tracking of tracer in digital angiography
• Analysis of a video as a single entity vs. analysis of a video as a sequence of frames
• Inherent Spatio-temporal tracking
• Gaussian Mixture Modeling in color & space-time domain
[Figure: video volume with axes x, y and t]
Learning a Probabilistic Model in Space-Time
Feature vectors [L, a, b, x, y, t] (6-dimensional space) → Expectation Maximization (EM) → Gaussian Mixture Model
[Figure: video volume and fitted mixture model, each with axes x, y and t]
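The construction of the 6-dimensional feature vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the video is already available as a NumPy array of L, a, b values with shape (frames, height, width, 3), and the function name `video_features` is hypothetical. Any scaling or normalization of the features is omitted because the slides do not specify it.

```python
import numpy as np

def video_features(lab_video):
    """Stack one [L, a, b, x, y, t] row per pixel of a Lab video.

    lab_video: array of shape (frames, height, width, 3), assumed to be
    already converted to the L*a*b* color space.
    """
    n_frames, height, width, _ = lab_video.shape
    # Coordinate grids in the same (t, y, x) order as the array axes.
    t, y, x = np.meshgrid(
        np.arange(n_frames), np.arange(height), np.arange(width), indexing="ij"
    )
    coords = np.stack([x, y, t], axis=-1).reshape(-1, 3)
    colors = lab_video.reshape(-1, 3)
    return np.hstack([colors, coords]).astype(float)
```

Each row of the result is one point in the color-space-time domain; the full array is the data set to which the mixture model is fitted.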
Video Representation via Gaussian Mixture Modeling
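• Each component of the GMM represents a cluster in the feature space (= blob) and a spatio-temporal region in the video
• PDF for the GMM:
$$f(x \mid \theta) = \sum_{j=1}^{k} \alpha_j \frac{1}{\sqrt{(2\pi)^d \, |\Sigma_j|}} \exp\left\{ -\frac{1}{2} (x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j) \right\}$$
with the parameter set
$$\theta = \{\alpha_j, \mu_j, \Sigma_j\}_{j=1}^{k}$$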
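The mixture density above can be evaluated numerically as in the following sketch. It assumes NumPy; the function names `gaussian_pdf` and `gmm_pdf` are illustrative and do not come from the slides.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density at each row of x (shape n x d)."""
    d = mu.shape[0]
    diff = x - mu
    # Solve cov * z = diff^T rather than forming the explicit inverse.
    z = np.linalg.solve(cov, diff.T).T
    mahalanobis_sq = np.sum(diff * z, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahalanobis_sq) / norm

def gmm_pdf(x, alphas, mus, covs):
    """Mixture density f(x | theta) = sum_j alpha_j N(x; mu_j, Sigma_j)."""
    return sum(a * gaussian_pdf(x, m, c) for a, m, c in zip(alphas, mus, covs))
```

For the video model, `x` would hold the 6-dimensional feature vectors [L, a, b, x, y, t], one per row.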
• Given a set of feature vectors and parameter values, the likelihood
$$f(x_1, \ldots, x_n \mid \theta) = \prod_{t=1}^{n} \sum_{j=1}^{k} \alpha_j f(x_t \mid \mu_j, \Sigma_j)$$
expresses how well the model fits the data.
• The EM algorithm: an iterative method to obtain the parameter values that maximize the likelihood:
$$\theta_{ML} = \arg\max_{\theta} f(x_1, \ldots, x_n \mid \theta)$$
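The EM Algorithm
Expectation step: estimate the Gaussian clusters to which the points in feature space belong
$$w_{tj} = \frac{\alpha_j f(x_t \mid \mu_j, \Sigma_j)}{\sum_{i=1}^{k} \alpha_i f(x_t \mid \mu_i, \Sigma_i)}$$
Maximization step: maximum likelihood parameter estimates using this data
$$\hat{\alpha}_j = \frac{1}{n} \sum_{t=1}^{n} w_{tj}$$
$$\hat{\mu}_j = \frac{\sum_{t=1}^{n} w_{tj} x_t}{\sum_{t=1}^{n} w_{tj}}$$
$$\hat{\Sigma}_j = \frac{\sum_{t=1}^{n} w_{tj} (x_t - \hat{\mu}_j)(x_t - \hat{\mu}_j)^T}{\sum_{t=1}^{n} w_{tj}}$$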
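One EM iteration for a Gaussian mixture can be sketched as below. This is an illustrative NumPy version of the standard updates, not the authors' code; the names `gaussian_pdf` and `em_step` are hypothetical, and no safeguards against singular covariances or numerical underflow are included.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density at each row of x (shape n x d)."""
    d = mu.shape[0]
    diff = x - mu
    z = np.linalg.solve(cov, diff.T).T
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.sum(diff * z, axis=1)) / norm

def em_step(x, alphas, mus, covs):
    """Run one expectation step and one maximization step."""
    n = x.shape[0]
    k = len(alphas)
    # E-step: responsibility w[t, j] of component j for point t.
    w = np.column_stack(
        [alphas[j] * gaussian_pdf(x, mus[j], covs[j]) for j in range(k)]
    )
    w /= w.sum(axis=1, keepdims=True)
    # M-step: weighted maximum likelihood estimates.
    nj = w.sum(axis=0)
    new_alphas = nj / n
    new_mus = (w.T @ x) / nj[:, None]
    new_covs = []
    for j in range(k):
        diff = x - new_mus[j]
        new_covs.append((w[:, j, None] * diff).T @ diff / nj[j])
    return new_alphas, new_mus, np.array(new_covs)
```

In practice `em_step` would be called repeatedly until the likelihood stops improving; the stopping rule is left to the caller here.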
Initialization & Model Selection
• Initialization of the EM algorithm via K-means:
– Unsupervised clustering method
– Non-parametric
• Model selection via MDL (Minimum Description Length):
– Choose k to maximize:
$$\log L(x \mid \theta) - \frac{l_k}{2} \log n$$
– l_k = number of free parameters for a model with k mixture components:
$$l_k = (k - 1) + kd + k \, \frac{d(d+1)}{2}$$
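The MDL criterion can be illustrated with a short sketch. It assumes the maximized log-likelihood of each candidate model has already been computed (for example, after running EM for each k); the function names and the numbers in the usage note are made up for illustration.

```python
import math

def free_parameters(k, d):
    """l_k: (k - 1) mixing weights, k*d means, k*d*(d+1)/2 covariance terms."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def mdl_score(log_likelihood, k, d, n):
    """Penalized log-likelihood: log L - (l_k / 2) * log n."""
    return log_likelihood - 0.5 * free_parameters(k, d) * math.log(n)

def select_k(log_likelihoods, d, n):
    """Pick the k with the highest score from a {k: log L} mapping."""
    return max(log_likelihoods, key=lambda k: mdl_score(log_likelihoods[k], k, d, n))
```

With d = 6, a single component has 27 free parameters. For hypothetical log-likelihoods `{1: -1000.0, 2: -900.0, 3: -899.0}` and n = 1000, `select_k` picks k = 2, because the small likelihood gain from a third component does not offset its penalty.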
Video Model Visualization
The GMM for a given video sequence can be visualized as a set of hyper-ellipsoids (2-sigma contours) within the 6-dimensional color-space-time domain.
[Figure: two panels: static space-time blob; dynamic space-time blob]
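The 2-sigma contour of a single Gaussian component can be derived from its covariance matrix, as in the following sketch. It assumes NumPy and uses the eigendecomposition of the covariance; the function name `two_sigma_ellipsoid` is hypothetical, and how the authors rendered the ellipsoids is not stated in the slides.

```python
import numpy as np

def two_sigma_ellipsoid(cov):
    """Semi-axis lengths and directions of the 2-sigma contour of a Gaussian.

    Returns (lengths, directions): lengths[i] is the semi-axis length along
    the unit vector directions[:, i]. Lengths are in ascending order.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    return 2.0 * np.sqrt(eigenvalues), eigenvectors
```

To draw a blob in (x, y, t), the same function could be applied to the 3 × 3 sub-matrix of the covariance that corresponds to those three features.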
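Detection & Recognition of Events in Video
[Figure: 6 × 6 covariance matrix C of a blob, with rows and columns ordered L, a, b, x, y, t; the entries Cxt, Cyt and Ctt are highlighted]
• Ctt: duration of the space-time blob
• Static/dynamic blobs: thresholds on Rxt (horizontal motion) and Ryt (vertical motion)
• Direction of motion: sign of Rxt, Ryt
• Correlation coefficient:
$$R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} \, C_{jj}}}, \qquad -1 \le R_{ij} \le 1$$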
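The blob descriptors above can be read off a covariance matrix as in this sketch. The covariance is given as a 6 × 6 nested list in the feature order [L, a, b, x, y, t]. The threshold value 0.5 and the function names are assumptions made for illustration; the slides do not state the thresholds used.

```python
import math

# Index of each feature in the ordering [L, a, b, x, y, t].
X, Y, T = 3, 4, 5

def correlation(C, i, j):
    """Correlation coefficient R_ij = C_ij / sqrt(C_ii * C_jj)."""
    return C[i][j] / math.sqrt(C[i][i] * C[j][j])

def describe_blob(C, threshold=0.5):
    """Classify a blob as static or moving along each image axis."""
    r_xt = correlation(C, X, T)
    r_yt = correlation(C, Y, T)

    def motion(r, axis):
        # Below the (assumed) threshold the blob counts as static on this axis.
        if abs(r) <= threshold:
            return "static"
        return ("+" if r > 0 else "-") + axis

    return {
        "time_variance": C[T][T],  # Ctt, related to the blob's duration
        "r_xt": r_xt,
        "r_yt": r_yt,
        "horizontal": motion(r_xt, "x"),
        "vertical": motion(r_yt, "y"),
    }
```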
Blob motion (pixels per frame) via linear regression models in space & time:
$$E(x \mid t = t_i) = E_x + \frac{C_{xt}}{C_{tt}} (t_i - E_t)$$
The horizontal velocity of blob motion in the image plane is extracted as the ratio of covariance parameters, Cxt / Ctt. A similar formalism allows for the modeling of any other motion in the image plane.
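The velocity extraction can be sketched as follows, with the covariance given as a 6 × 6 nested list and the mean as a 6-element list, both in the feature order [L, a, b, x, y, t]. The function names are hypothetical, and the vertical velocity Cyt / Ctt is written by analogy with the horizontal case.

```python
# Index of each feature in the ordering [L, a, b, x, y, t].
X, Y, T = 3, 4, 5

def blob_velocity(C):
    """Regression slopes of x and y on t, in pixels per frame."""
    return C[X][T] / C[T][T], C[Y][T] / C[T][T]

def expected_position(mean, C, t):
    """E(x | t) and E(y | t) under the linear regression model."""
    vx, vy = blob_velocity(C)
    dt = t - mean[T]
    return mean[X] + vx * dt, mean[Y] + vy * dt
```

For example, a blob with Cxt = 2 and Ctt = 4 moves at 0.5 pixels per frame along x.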
Probabilistic Image Segmentation
A direct correspondence can be made between the mixture representation and the image plane.
Each pixel of the original image is now affiliated with the most probable Gaussian cluster.
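Pixel labeling:
$$\mathrm{Label}(x) = \arg\max_{j} \; \alpha_j f(x \mid \mu_j, \Sigma_j)$$
Probability of pixel x to be labeled j:
$$p(\mathrm{Label}(x) = j) = \frac{\alpha_j f(x \mid \mu_j, \Sigma_j)}{f(x \mid \theta)}$$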
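The labeling rule can be sketched in NumPy as below. The function names are illustrative; `x` holds one feature vector per pixel, and the returned posteriors correspond to the labeling probabilities defined above.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density at each row of x (shape n x d)."""
    d = mu.shape[0]
    diff = x - mu
    z = np.linalg.solve(cov, diff.T).T
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.sum(diff * z, axis=1)) / norm

def label_pixels(x, alphas, mus, covs):
    """Return (labels, posteriors) for each row of x.

    labels[t] is the index of the most probable component for pixel t;
    posteriors[t, j] is alpha_j f(x_t | mu_j, Sigma_j) / f(x_t | theta).
    """
    weighted = np.column_stack(
        [a * gaussian_pdf(x, m, c) for a, m, c in zip(alphas, mus, covs)]
    )
    posteriors = weighted / weighted.sum(axis=1, keepdims=True)
    return posteriors.argmax(axis=1), posteriors
```

Reshaping `labels` back to (frames, height, width) would give a segmentation map sequence.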
Limitations of the Global Model
• How can we represent non-convex spatio-temporal regions?
• All the data must be available simultaneously:
– Inappropriate for live video
– Model fitting time increases directly with sequence length
Piecewise Gaussian Mixture Modeling
• Modeling the Video sequence as a succession of overlapping blocks of frames.
• Obtain a succession of GMMs instead of a single global model.
• Important issues: initialization; matching between adjacent segments for region tracking (“gluing”).
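The splitting of a sequence into overlapping blocks of frames can be sketched as follows. The block length and overlap are free parameters here; the slides do not give the values used, and the function name is hypothetical.

```python
def overlapping_blocks(n_frames, block_len, overlap):
    """Return (start, end) frame ranges, end exclusive, that share
    `overlap` frames with the next block."""
    step = block_len - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than block_len")
    if n_frames <= 0:
        return []
    blocks = []
    start = 0
    while True:
        end = min(start + block_len, n_frames)
        blocks.append((start, end))
        if end == n_frames:
            break
        start += step
    return blocks
```

A separate GMM would then be fitted to each block, and the frames shared by adjacent blocks are where blobs from the two models can be matched.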
Piecewise GMM: “Gluing” / Matching at Junctions
[Figure: example at junction frame J5: frame J5; frame J5 blobs via GMM5; frame J5 blobs via GMM6; blob matching]
[Figure: original sequence and segmentation map sequence; plots of horizontal velocity and vertical velocity (pix / frame) versus BOF # for the sweater and trousers blobs]
Methodology
[Diagram: processing stages along the time axis]
• GMM for luminance, K >= 4: 1) CSF, 2) White Matter, 3) Gray Matter, 4) Sclerotic Lesions
• Frame-by-frame segmentation; segmentation maps
• 3D (x, y, t) connected components
• Blobs in [L x y t] feature space
Conclusions
• The modeling and the segmentation are combined to enable the extraction of video-regions that represent coherent regions across the video sequence, otherwise termed video-objects or sub-objects.
• Extracting video regions provides a compact video content description that may be useful for later indexing and retrieval applications.
• Medical applications: lesion modeling & tracking
Acknowledgment
Part of the work was supported by the Israeli Ministry of Science, Grant number 05530462.