Wei-Ta Chu
2010/10/7
Video Syntax Analysis 2
Multimedia Content Analysis, CSIE, CCU
Scene Detection in Movies and TV Shows
Rasheed, et al. “Scene detection in Hollywood movies and TV shows” Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 343-348, 2003.
Introduction
Every year, around 4500 motion pictures are released around the world, spanning approximately 9000 hours of video.
Inexpensive and popular digital technology, such as video on demand, is available through cable and the Internet.
Accessing video content: (1) detect shots and select a set of key frames; (2) combine similar shots to form scenes or story units.
Problems in Previous Work
A false color match between shots of two different scenes may wrongly combine the scenes.
Action scenes may be broken into many scenes because they do not satisfy the color-matching criterion.
System Flowchart
BSC: backward shot coherence; PSB: potential scene boundaries
Shot Detection
Shots are detected by histogram intersection of 16-bin normalized HSV color histograms:
8 bins for hue, 4 bins each for saturation and value
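A minimal sketch of the histogram-intersection shot detector. It assumes the 16 bins are three concatenated per-channel histograms (8 hue + 4 saturation + 4 value), that each frame is a list of RGB pixels with components in [0, 1], and a hypothetical cut threshold of 0.7; the slides do not give the actual threshold.

```python
import colorsys

def hsv_histogram(pixels):
    """16-bin normalized HSV histogram: 8 hue bins, then 4 saturation and 4 value bins.
    pixels: non-empty list of (r, g, b) tuples with components in [0, 1]."""
    hist = [0.0] * 16
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        hist[min(int(h * 8), 7)] += 1        # hue bins 0..7
        hist[8 + min(int(s * 4), 3)] += 1    # saturation bins 8..11
        hist[12 + min(int(v * 4), 3)] += 1   # value bins 12..15
    total = 3 * len(pixels)                  # each pixel adds one count per channel
    return [x / total for x in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def detect_shot_boundaries(frames, threshold=0.7):
    """Indices i where a new shot starts (frame i compared with frame i - 1).
    threshold is a hypothetical value, not taken from the paper."""
    hists = [hsv_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if intersection(hists[i - 1], hists[i]) < threshold]
```

For two red frames followed by two blue frames, the only boundary reported is at index 2.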
Key Frame Selection
(1) Initially, the middle frame of the shot is selected and added to the initially empty key frame set Ki.
(2) Each frame within the shot is compared to every frame in Ki.
(3) If the frame differs from all previously chosen key frames by more than a fixed threshold, it is added to Ki.
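Steps (1) to (3) can be sketched as follows. The sketch assumes each frame is already summarized by a normalized color histogram, that "differs" means a histogram intersection below a hypothetical similarity threshold of 0.9, and that frames are scanned in temporal order.

```python
def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

def select_key_frames(frame_hists, threshold=0.9):
    """frame_hists: normalized histograms of one shot's frames, in temporal order.
    Returns the indices of the selected key frames (the set K_i)."""
    middle = len(frame_hists) // 2
    keys = [middle]                                  # (1) start from the middle frame
    for i, h in enumerate(frame_hists):              # (2) compare each frame with K_i
        if i == middle:
            continue
        # (3) keep the frame only if it differs from every key frame chosen so far
        if all(intersection(h, frame_hists[k]) < threshold for k in keys):
            keys.append(i)
    return keys
```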
Shot-based Features
Shot length
Shot motion content: estimate the parameters of a global affine motion model, then calculate the difference between the actual and the re-projected motion of blocks
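The exact shot motion content (SMC) formula is not reproduced in the text. A minimal sketch of the idea follows: it assumes block motion vectors are already available as (x, y, u, v) tuples, fits a six-parameter affine model u = a0 + a1·x + a2·y, v = b0 + b1·x + b2·y by least squares, and takes the mean magnitude of the residual as the motion content (the averaging choice is an assumption).

```python
def solve3(A, b):
    """Solve the 3x3 system A p = b by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x

def fit_affine(blocks):
    """Least-squares affine fit via the normal equations; blocks: (x, y, u, v) tuples."""
    A = [[0.0] * 3 for _ in range(3)]
    bu, bv = [0.0] * 3, [0.0] * 3
    for x, y, u, v in blocks:
        phi = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                A[i][j] += phi[i] * phi[j]
            bu[i] += phi[i] * u
            bv[i] += phi[i] * v
    return solve3(A, bu), solve3(A, bv)

def motion_residual(blocks):
    """Mean magnitude of (actual motion - motion re-projected by the global affine model)."""
    a, b = fit_affine(blocks)
    total = 0.0
    for x, y, u, v in blocks:
        du = u - (a[0] + a[1] * x + a[2] * y)
        dv = v - (b[0] + b[1] * x + b[2] * y)
        total += (du * du + dv * dv) ** 0.5
    return total / len(blocks)
```

A pure camera motion (zoom, pan) is absorbed by the affine model and yields a residual near zero; independently moving objects leave a residual, which is why this measures content motion rather than camera motion.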
Shot Motion Content
Scene Boundary Detection Algorithm
Pass 1: Detecting potential scene boundaries based on color properties
Pass 2: Removal of weak scene boundaries by analyzing the shot length and motion content
Potential Scene Boundaries
Backward shot coherence
Potential Scene Boundaries
Backward shot coherence (BSC): compute the shot coherence of shot i with each shot in a window of the N previous shots, and take the maximum shot coherence in that window.
The shots with local minimum BSCs are scene boundary candidates.
To filter out false alarms: if a pair of key frames from two adjacent potential scenes is similar, merge them into one scene.
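A minimal sketch of Pass 1. As an illustration, it assumes that the coherence of two shots is the best histogram intersection over their key-frame pairs, that each shot is given as a list of key-frame histograms, and that a local minimum is a value strictly below its predecessor and no greater than its successor. The false-alarm merging step is not shown.

```python
def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

def shot_similarity(keys_i, keys_j):
    # assumed: two shots are as similar as their best-matching key-frame pair
    return max(intersection(p, q) for p in keys_i for q in keys_j)

def backward_shot_coherence(shots, N=10):
    """shots: list of shots, each a list of key-frame histograms.
    BSC[i] = maximum coherence of shot i with any of the N preceding shots."""
    bsc = [0.0] * len(shots)
    for i in range(1, len(shots)):
        bsc[i] = max(shot_similarity(shots[i], shots[i - k])
                     for k in range(1, min(N, i) + 1))
    return bsc

def potential_boundaries(bsc):
    """Shots whose BSC is a local minimum are scene boundary candidates."""
    return [i for i in range(1, len(bsc) - 1)
            if bsc[i] < bsc[i - 1] and bsc[i] <= bsc[i + 1]]
```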
Potential Scene Boundaries
[Figure: BSC for 300 shots, with the first key frame of each shot]
Selection of Window Size
The computation of BSC is controlled by the selection of window size N.
N is a memory parameter which mimics a human’s ability to recall a shot seen in the past.
If N is too large, the window may span several scenes. If N is too small, the video may be over-segmented. N = 10 in this paper.
Scene Dynamics Analysis
Scenes with weak structure, e.g., action scenes whose shots rarely repeat, are often broken into several scenes.
Scene dynamics (SD)
Action scenes have larger SMC and smaller L (shot length). The PSB between two consecutive scenes is removed if the SD of both scenes exceeds a fixed threshold.
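A minimal sketch of Pass 2. The SD formula itself is not reproduced in the text; the sketch assumes SD is total motion content divided by total shot length, which grows with larger SMC and shorter shots as described above. The threshold is a hypothetical parameter.

```python
def scene_dynamics(shots):
    """shots: list of (smc, length) pairs for one scene.
    Assumed form: total motion content divided by total shot length."""
    total_smc = sum(smc for smc, _ in shots)
    total_len = sum(length for _, length in shots)
    return total_smc / total_len

def remove_weak_boundaries(scenes, threshold):
    """scenes: consecutive scenes separated by PSBs, each a list of (smc, length) shots.
    A PSB is dropped when the scenes on both sides exceed the SD threshold."""
    merged = [list(scenes[0])]
    for scene in scenes[1:]:
        if scene_dynamics(merged[-1]) > threshold and scene_dynamics(scene) > threshold:
            merged[-1].extend(scene)       # remove the PSB: fuse with the previous scene
        else:
            merged.append(list(scene))
    return merged
```

Here the test uses the already-merged previous scene, so a run of fragmented action scenes collapses into one; comparing only the original neighbors would be an equally plausible reading.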
Scene Dynamics Analysis
Scene Representation
A shot is a good representative when:
(1) the shot is shown several times (higher SC);
(2) the shot spans a longer period of time (larger shot length);
(3) the shot has minimal motion content (smaller SMC).
Shots with multiple faces are preferred.
Shot Goodness
A correlation matrix of dimension N × N is constructed, where element (i,j) is the coherence of shot i with shot j.
The three shots with the highest goodness W are selected as candidate shots, and face detection is performed on them.
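The formula for W is not reproduced in the text. One simple score consistent with the three criteria on the Scene Representation slide, used here purely as a hypothetical stand-in, is W = SC × L / SMC, with SC taken as the row sum of the coherence matrix excluding the diagonal. The sketch also simplifies each shot to a single histogram.

```python
def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

def candidate_shots(shot_hists, lengths, smc, top=3, eps=1e-6):
    """shot_hists: one normalized histogram per shot of the scene (a simplification).
    Returns indices of the `top` shots with the highest hypothetical goodness."""
    n = len(shot_hists)
    # N x N coherence matrix: element (i, j) is the coherence of shot i with shot j
    C = [[intersection(shot_hists[i], shot_hists[j]) for j in range(n)] for i in range(n)]
    sc = [sum(C[i][j] for j in range(n) if j != i) for i in range(n)]   # how often shot i recurs
    goodness = [sc[i] * lengths[i] / (smc[i] + eps) for i in range(n)]  # hypothetical W
    return sorted(range(n), key=lambda i: goodness[i], reverse=True)[:top]
```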
Detection of Faces
A method based on skin detection is adopted. The middle frame of each candidate shot is tested. Each isolated segment of skin is counted as a face, and the frame with the most votes is taken as the scene key frame.
In the case of a tie or no face, the key frame of the shot with the highest goodness value is selected.
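A minimal sketch of the voting rule. It assumes a binary skin mask (True = skin pixel) has already been computed for each candidate's middle frame by some skin detector, which is not shown, and that a tie is broken by goodness among the tied shots only.

```python
def count_skin_segments(mask):
    """Number of 4-connected groups of skin pixels; each group counts as one face vote."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                stack = [(y, x)]
                while stack:                      # flood-fill the whole segment
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

def choose_scene_key_frame(candidates):
    """candidates: list of (shot_id, goodness, skin_mask_of_middle_frame).
    Most face votes wins; a tie or no face at all falls back to goodness."""
    votes = [count_skin_segments(mask) for _, _, mask in candidates]
    best = max(votes)
    pool = [c for c, v in zip(candidates, votes) if v == best] if best > 0 else candidates
    return max(pool, key=lambda c: c[1])[0]
```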
Scene Key Frame
More Examples
Experimental Results
Five movies, one sitcom, and one TV show
False alarm (false positive)
Miss (false negative)
Experimental Results
Slight over-segmentation is preferable to under-segmentation.
While browsing a video, it is better to have two segments of one scene rather than one segment consisting of two scenes.
Experimental Results
References
Rasheed, et al. “Scene detection in Hollywood movies and TV shows” Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 343-348, 2003.
Yeung, et al. “Segmentation of video by clustering and graph analysis” Computer Vision and Image Understanding, vol. 71, no. 1, pp. 94-109, 1998.
Vendrig, et al. “Systematic evaluation of logical story unit segmentation” IEEE Transactions on Multimedia, vol. 4, no. 4, pp. 492-499, 2002.
Brief Introduction of Montage
Montage
Montage “refers to the editing of the film, the cutting and piecing together of exposed film in a manner that best conveys the intent of the work”
Methods of Montage
Metric: the editing follows a specific number of frames, cutting to the next shot no matter what is happening within the image. This montage is used to elicit the most basal and emotional of reactions in the audience.
Example
http://en.wikipedia.org/wiki/Soviet_montage_theory
Methods of Montage
Rhythmic: cutting based on time, along with a change in the speed of the metric cuts, to induce more complex meanings than is possible with metric montage.
Once sound was introduced, rhythmic montage also included aural elements (music, dialogue, sounds).
Example
Methods of Montage
Tonal: a tonal montage uses the emotional meaning of the shots, not just the temporal length of the cuts or their rhythmical characteristics, to elicit a reaction from the audience even more complex than that of the metric or rhythmic montage.
For example, a sleeping baby would emote calmness and relaxation.
Example: the clip following the death of the revolutionary sailor Vakulinchuk, a martyr for sailors and workers.
Methods of Montage
Overtonal/Associational: the overtonal montage is the accumulation of metric, rhythmic, and tonal montage, synthesizing their effect on the audience for an even more abstract and complicated effect.
Example: in this clip, the men are workers walking towards a confrontation at their factory; later in the movie, the protagonist uses ice as a means of escape.
Methods of Montage
Intellectual: uses shots which, combined, elicit an intellectual meaning.
Example: from Eisenstein's October and Strike. In Strike, a shot of striking workers being attacked is cut with a shot of a bull being slaughtered, creating a film metaphor suggesting that the workers are being treated like cattle. This meaning does not exist in the individual shots; it arises only when they are juxtaposed.
http://www.tcf.ua.edu/classes/Jbutler/T112/EditingIllustrations06.htm
Overview of CBIR
Y. Rui, T.S. Huang, and S.-F. Chang, “Image retrieval: current techniques, promising directions, and open issues” Journal of Visual Communication and Image Representation, vol. 10, pp. 39-62, 1999.
Image Retrieval
Image retrieval has been an active research area since the 1970s, driven by two research communities: database management and computer vision.
Text-based approaches: annotate images with text, then use text-based database management systems to perform image retrieval.
Need for Content-Based Image Retrieval
In the early 1990s, two major difficulties arose:
(1) the vast amount of labor required to annotate large-scale image collections;
(2) the rich content of images and the subjectivity of human perception.
Instead of being annotated with text-based keywords, images are indexed by their own visual content, such as color and texture.
An Image Retrieval System Architecture
[Figure: system architecture whose components draw on image processing and compression; computer vision and image understanding; information retrieval and database management systems; computational geometry, database management, and pattern recognition; database management; and user psychology and user interface]
Feature Extraction
General features: color, texture, shape, …
Domain-specific features: human faces, fingerprints
Color (1/2)
Color features are robust to background complication and independent of image size and orientation.
Color histogram: compared by histogram intersection (an L1 metric) or an L2-related metric
Cumulated color histogram
Euclidean distance is the L2 norm of the difference vector.
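For reference, the comparison measures named above can be written as follows for histograms stored as equal-length lists. For normalized histograms, 1 − intersection equals half the L1 distance (since min(a, b) = (a + b − |a − b|)/2), which is why histogram intersection is called an L1 metric.

```python
import math
from itertools import accumulate

def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def l2_distance(h1, h2):
    """Euclidean distance: the L2 norm of the difference vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def cumulated(h):
    """Cumulated histogram: bin k holds the mass of bins 0..k."""
    return list(accumulate(h))
```

A plain histogram treats a shift into the neighboring bin the same as a shift into a distant bin; after cumulation the neighboring shift costs less, e.g. for [1, 0, 0] vs [0, 1, 0] the L1 distance of the cumulated histograms is 1, while for [1, 0, 0] vs [0, 0, 1] it is 2.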
Color (2/2)
Color moments: most of the information is concentrated in the low-order moments: the first moment (mean), the second (variance), and the third (skewness).
Color set: a selection of colors from the quantized color space. Color set feature vectors are binary, so a binary search tree was constructed to allow fast search.
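A minimal sketch of the color moment feature: the first three moments per channel give a 9-dimensional vector. It follows a common convention (an assumption here, not stated on the slide) of reporting the square root of the variance and the real cube root of the third central moment, so all three values share the units of the pixel values.

```python
import math

def color_moments(channel):
    """channel: list of pixel values of one color channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((p - mean) ** 2 for p in channel) / n
    third = sum((p - mean) ** 3 for p in channel) / n
    skew = math.copysign(abs(third) ** (1 / 3), third)   # real cube root keeps the sign
    return mean, math.sqrt(var), skew

def image_moments(pixels):
    """pixels: list of 3-channel tuples (e.g., HSV). Returns a 9-D feature vector."""
    feature = []
    for c in range(3):
        feature.extend(color_moments([p[c] for p in pixels]))
    return feature
```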
Texture (1/2)
Texture: visual patterns with properties of homogeneity that do not result from the presence of only a single color or intensity.
Texture contains important information about the structural arrangement of surfaces and their relationship to the surrounding environment.
Rushing, et al., “Using association rules as texture” IEEE Trans. on PAMI, vol. 23, no. 8, pp. 845-858, 2001.
Texture (2/2)
Co-occurrence matrix of texture features: explores the gray-level spatial dependence of texture, based on the orientation and distance between image pixels
Tamura texture features
Use of wavelet transform in texture representation
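A minimal sketch of the gray-level co-occurrence matrix for one displacement (dx, dy), which encodes the orientation and distance between pixel pairs, plus two common statistics derived from it. The choice of displacement and statistics is illustrative.

```python
def glcm(image, dx, dy, levels):
    """Normalized co-occurrence matrix: P[i][j] = fraction of pixel pairs
    (p, p + (dx, dy)) with gray levels i and j. image: 2-D list of ints in [0, levels)."""
    P = [[0.0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    count = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                P[image[y][x]][image[ny][nx]] += 1
                count += 1
    return [[v / count for v in row] for row in P] if count else P

def contrast(P):
    return sum(P[i][j] * (i - j) ** 2 for i in range(len(P)) for j in range(len(P)))

def energy(P):
    return sum(v * v for row in P for v in row)
```

Vertical stripes have high contrast for a horizontal displacement and zero contrast for a vertical one, which is how the matrix captures orientation.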
Shape
Boundary-based features: use only the outer boundary of the shape. Fourier descriptor: use the Fourier-transformed boundary as the shape feature.
Region-based features: use the entire shape region. Moment invariants: use region-based moments which are invariant to transformations.
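A minimal sketch of a Fourier descriptor, assuming the boundary is an ordered list of (x, y) points and using one common normalization: drop the DC term (translation), keep magnitudes (rotation and starting point), and divide by |F[1]| (scale).

```python
import cmath

def fourier_descriptor(boundary, k=8):
    """boundary: ordered (x, y) points around a closed contour.
    Returns up to k normalized magnitudes |F[u]| / |F[1]| for u = 1..k."""
    z = [complex(x, y) for x, y in boundary]          # treat each point as x + iy
    n = len(z)
    F = [sum(z[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n))
         for u in range(n)]                             # plain O(n^2) DFT
    # F[0] carries only the centroid (translation) and is skipped; magnitudes drop
    # rotation and starting point; dividing by |F[1]| removes scale.
    return [abs(F[u]) / abs(F[1]) for u in range(1, min(k + 1, n))]
```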
Color Layout
Global color features tend to give too many false positives when the image collection is large.
Using both color features and spatial relations: divide the whole image into blocks and extract color features from each block.
Kasutani, et al., “The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segmentation retrieval” Proc. of ICIP, pp. 674-677, 2001.
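A minimal sketch of the block-based idea only: average color per block, compared block by block. The descriptor in the cited paper is more elaborate and is not reproduced here; the grid size and distance are illustrative choices.

```python
def block_colors(image, grid=4):
    """image: 2-D list of (r, g, b) tuples, at least grid x grid pixels.
    Returns the average color of each block, row by row."""
    h, w = len(image), len(image[0])
    feature = []
    for by in range(grid):
        for bx in range(grid):
            block = [image[y][x]
                     for y in range(by * h // grid, (by + 1) * h // grid)
                     for x in range(bx * w // grid, (bx + 1) * w // grid)]
            feature.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
    return feature

def layout_distance(f1, f2):
    """Sum of Euclidean distances between average colors of corresponding blocks."""
    return sum(sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5 for c1, c2 in zip(f1, f2))
```

An image that is red on top and blue below has the same global histogram as its upside-down version, yet a nonzero layout distance from it.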
Segmentation
Segmentation is very important to image retrieval: both the shape feature and the layout feature depend on good segmentation. It is still an unsolved problem.
Chien, et al., “Predictive watershed: a fast watershed algorithm for video segmentation” IEEE Trans. on CSVT, vol. 13, no. 5, pp. 453-461, 2003.
Summary
Many visual features have been explored. Which features and representations should be used is application-dependent.
MPEG-7 standard: Multimedia Content Description Interface
High Dimensional Indexing
Goal: make CBIR truly scalable to large image collections.
Two main challenges: high dimensionality and non-Euclidean similarity measures.
Approach: dimension reduction, then appropriate multidimensional indexing techniques.
Curse of Dimensionality
An example of classifying data in two dimensions
Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Curse of Dimensionality
If we divide a space into regular cells, then the number of such cells grows exponentially with the dimensionality of the space.
We need an exponentially large quantity of training data in order to ensure that the cells are not empty.
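A small experiment makes the argument concrete: with 4 cells per axis there are 4^D cells, so a fixed sample of 1000 uniformly scattered points fills every cell in 1-D but can occupy at most 1000 of the 65536 cells in 8-D. The point count and bins per axis are arbitrary illustrative choices.

```python
import random

def occupied_fraction(num_points, dims, bins_per_axis=4, seed=0):
    """Scatter points uniformly in the unit hypercube and return the fraction of the
    bins_per_axis ** dims regular cells that contain at least one point."""
    rng = random.Random(seed)
    occupied = {tuple(int(rng.random() * bins_per_axis) for _ in range(dims))
                for _ in range(num_points)}
    return len(occupied) / bins_per_axis ** dims

for d in (1, 2, 4, 8):
    print(d, 4 ** d, occupied_fraction(1000, d))
```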
Dimension Reduction
Although there is a curse of dimensionality, it does not prevent us from finding effective techniques:
(1) real data will often be confined to a region of the space having lower dimensionality;
(2) real data will typically exhibit some smoothness properties, so that small changes in the input variables will produce small changes in the target variables.
Techniques: Karhunen-Loeve transform (KLT), clustering
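A minimal sketch of the KLT (principal component projection) in pure Python: center the feature vectors, form the covariance matrix, find the top eigenvectors by power iteration with deflation, and project. A production system would use a linear-algebra library; the iteration count and seeded start vector are illustrative choices.

```python
import math
import random

def klt(vectors, k, iters=200, seed=0):
    """Project d-dimensional feature vectors onto their top-k principal axes."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    X = [[v[i] - mean[i] for i in range(d)] for v in vectors]                     # centred data
    C = [[sum(x[i] * x[j] for x in X) / n for j in range(d)] for i in range(d)]   # covariance
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        b = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in b))
        b = [t / norm for t in b]
        for _ in range(iters):                       # power iteration
            nb = [sum(C[i][j] * b[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in nb))
            if norm == 0.0:
                break                                # no variance left
            b = [t / norm for t in nb]
        lam = sum(b[i] * C[i][j] * b[j] for i in range(d) for j in range(d))
        axes.append(b)
        # deflation: remove the found component so the next pass finds the next axis
        C = [[C[i][j] - lam * b[i] * b[j] for j in range(d)] for i in range(d)]
    return [[sum(x[i] * a[i] for i in range(d)) for a in axes] for x in X]
```

Points lying on a line in 3-D are represented exactly by a single coordinate, which illustrates data "confined to a region of lower dimensionality".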
Multidimensional Indexing Techniques
Select appropriate multidimensional indexing algorithms to index the reduced but still high-dimensional feature vectors.
Techniques: k-d tree, R-tree, …
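A minimal k-d tree sketch for nearest-neighbor search over feature vectors: split on the median along axes cycled by depth, then search the near side first and visit the far side only if the splitting plane is closer than the best match so far.

```python
def build_kdtree(points, depth=0):
    """Node = (point, axis, left, right); left holds smaller coordinates on `axis`."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return (point, squared_distance) of the stored point nearest to query."""
    if node is None:
        return best
    point, axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, query))
    if best is None or d2 < best[1]:
        best = (point, d2)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # the far subtree can only help if the splitting plane is closer than the best match
    if diff * diff < best[1]:
        best = nearest(far, query, best)
    return best
```

In high dimensions the pruning test succeeds less often, which is one reason dimension reduction is applied first.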
Retrieval Systems
Most image retrieval systems support random browsing, search by example, search by sketch, search by text (including keyword or speech), and navigation with customized image categories.
QBIC, Query by Image Content
Supports queries based on example images, user-constructed sketches and drawings, and selected color and texture patterns
Features: color, texture, shape
Indexing: R*-tree
http://wwwqbic.almaden.ibm.com/
Virage
Supports visual queries based on color, composition, texture, and structure, as well as arbitrary combinations of these atomic queries.
http://www.virage.com/
PhotoBook
A set of interactive tools for browsing and searching images
Features: shape, texture, face
Includes the human in the image annotation and retrieval loop: relevance feedback
VisualSEEk and WebSEEk
Spatial relationship queries of image regions and visual feature extraction from the compressed domain
Features: color set, wavelet-transform-based texture
http://persia.ee.columbia.edu:8008/
MARS
Its research features are the integration of DBMS and IR, the integration of indexing and retrieval, and the integration of computer and human.
It investigates how to organize various visual features into a meaningful retrieval architecture which can dynamically adapt to different applications and different users.
Others
ALIPR (Automatic Linguistic Indexing of Pictures - Real Time)
RetrievalWare, Netra, ART MUSEUM, Blobworld, …
Wei-Ta Chu
2009/10/15
VisualSEEk
J.R. Smith and S.-F. Chang, “VisualSEEk: a fully automated content-based image query system” Proc. of ACM Multimedia, pp. 87-98, 1996.
Introduction
Enables querying by image regions and spatial layout.
Unconstrained images are decomposed into near-symbolic images, which lend themselves to efficient spatial queries.
Addresses spatial queries involving adjacency, overlap, and encapsulation of regions.
Introduction
An image similarity function is needed that contains both color feature and spatial components.
Intrinsic parameters: similarity between query and target colors and/or region sizes and (absolute) spatial locations
Derived parameters: the inferences that can be made from the intrinsic parameters, such as relative spatial locations and the overall assessment of image matches with multiple regions
Image Query Process
Characteristics of VisualSEEk
Automated extraction of localized regions and features
Querying by both feature and spatial information
Feature extraction from compressed data
Development of techniques for fast indexing and retrieval
Development of highly functional user tools
System Overview
Color Sets
$T_c$: color space transformation
$Q_c^M$: quantizer that partitions the color space into M subspaces
$B_c^M$: M-dimensional binary space such that each axis corresponds to one unique index value m
A color set is a binary vector in $B_c^M$ which corresponds to a selection of colors {m}
Example
$T_c$: RGB to HSV. $M = 8$ for $Q_c^M$: quantize the HSV color space to 2 hues, 2 saturations, and 2 values.
$B_c^M$ is an eight-dimensional binary space.
A color set c contains a selection from the eight colors. E.g., c = [10010100] corresponds to the selection of three colors, m = 0, m = 3, and m = 5, from the quantized HSV color space.
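The example can be sketched as follows. The index layout m = 4·hue_bin + 2·sat_bin + val_bin and the rule for picking a region's colors (keep every color covering at least a hypothetical 20% of the region) are assumptions; the slides fix only the 2 × 2 × 2 quantization.

```python
import colorsys

def quantize(r, g, b):
    """Q_c^M with M = 8 (2 hues x 2 saturations x 2 values); assumed index layout."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)      # T_c: RGB -> HSV, components in [0, 1]
    hb, sb, vb = min(int(h * 2), 1), min(int(s * 2), 1), min(int(v * 2), 1)
    return 4 * hb + 2 * sb + vb

def color_set(indices, M=8):
    """Binary vector in B_c^M marking the selected colors {m}."""
    c = [0] * M
    for m in indices:
        c[m] = 1
    return c

def color_set_of_region(pixels, min_fraction=0.2, M=8):
    """Hypothetical selection rule: keep every quantized color covering at least
    min_fraction of the region's pixels."""
    counts = [0] * M
    for r, g, b in pixels:
        counts[quantize(r, g, b)] += 1
    return color_set([m for m in range(M) if counts[m] >= min_fraction * len(pixels)], M)
```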
Color Sets
Color sets provide a compact alternative to color histograms for representing color information.
Their use stems from the conjecture that salient regions have no more than a few, equally prominent colors.
Color Sets
Color Set Back-Projection
Goal: to extract color regions
Back-projection process: (1) color set selection; (2) back-projection onto the image; (3) thresholding and labeling
Back-projection: given image I and color set c, let k be the index of the color at image point I(x,y); then generate image B by B(x,y) = c[k]
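A minimal sketch of back-projection followed by labeling. It assumes the image has already been quantized to color indices, and uses a hypothetical minimum region area as a stand-in for the thresholding step (the original system's filtering is not shown). Each labeled region reports the centroid and MBR used by the spatial queries discussed later.

```python
def back_project(index_image, c):
    """B(x, y) = c[k], where k is the quantized color index at I(x, y)."""
    return [[c[k] for k in row] for row in index_image]

def label_regions(B, min_area=1):
    """Label 4-connected groups of 1s in B and keep those with at least min_area pixels.
    Returns dicts with area, centroid (x, y) and MBR (xmin, ymin, xmax, ymax)."""
    h, w = len(B), len(B[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y0 in range(h):
        for x0 in range(w):
            if not B[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            stack, pixels = [(y0, x0)], []
            while stack:
                y, x = stack.pop()
                pixels.append((x, y))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and B[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(pixels) >= min_area:            # thresholding: drop tiny regions
                xs, ys = [p[0] for p in pixels], [p[1] for p in pixels]
                regions.append({"area": len(pixels),
                                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                                "mbr": (min(xs), min(ys), max(xs), max(ys))})
    return regions
```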
Color Set Back-Projection
Color Similarity
In HSV color space, the similarity between any two colors $m_i = (h_i, s_i, v_i)$ and $m_j = (h_j, s_j, v_j)$ is derived from their distance in the HSV color space.
Color histogram
Histogram distance: Minkowski metric. Because it compares only corresponding bins, a dark red image is equally dissimilar to a red image as to a blue image.
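The sketch contrasts the bin-by-bin Minkowski distance with a cross-color similarity. The HSV similarity follows the form usually cited from the VisualSEEk paper, $a_{i,j} = 1 - \frac{1}{\sqrt{5}}\left[(v_i - v_j)^2 + (s_i\cos h_i - s_j\cos h_j)^2 + (s_i\sin h_i - s_j\sin h_j)^2\right]^{1/2}$; treat it as an assumption, since the slide's own formula is not reproduced in the text. The three one-bin "images" are illustrative.

```python
import math

def minkowski(h1, h2, p=1):
    """Bin-by-bin Minkowski distance; it never compares different bins."""
    return sum(abs(a - b) ** p for a, b in zip(h1, h2)) ** (1 / p)

def hsv_similarity(ci, cj):
    """Assumed similarity of two HSV colors (h in radians, s and v in [0, 1]).
    Colors are placed in the HSV cone; 1/sqrt(5) scales the largest distance to 1."""
    (hi, si, vi), (hj, sj, vj) = ci, cj
    d = math.sqrt((vi - vj) ** 2
                  + (si * math.cos(hi) - sj * math.cos(hj)) ** 2
                  + (si * math.sin(hi) - sj * math.sin(hj)) ** 2)
    return 1 - d / math.sqrt(5)
```

With bins [red, dark red, blue], the dark red image is at L1 distance 2 from both other images, while the color similarity correctly rates dark red as much closer to red than to blue.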
Color Similarity
Histogram quadratic distance: it measures the weighted similarity between histograms. The quadratic distance between histograms $h_q$ and $h_t$ is
$d(h_q, h_t) = (h_q - h_t)^T A (h_q - h_t)$,
where $A = [a_{i,j}]$ and $a_{i,j}$ denotes the similarity between colors with indices i and j.
Since the histogram quadratic distance computes the cross similarity between colors, it is computationally expensive.
Hafner, et al., "Efficient Color Histogram Indexing for Quadratic Form Distance Functions," IEEE Trans. on PAMI, vol. 17, no. 7, pp. 729-736, Jul. 1995
Color Similarity
Histogram quadratic distance
Consider a pure red image $x = [1.0, 0.0, 0.0]^T$ and a pure orange image $y = [0.0, 1.0, 0.0]^T$.
The quadratic distance between x and y is 0.2. The Euclidean distance between x and y is √2.
Hafner, et al., "Efficient Color Histogram Indexing for Quadratic Form Distance Functions," IEEE Trans. on PAMI, vol. 17, no. 7, pp. 729-736, Jul. 1995
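The red/orange numbers can be checked directly. With unit self-similarity ($a_{i,i} = 1$) and $x - y = [1, -1, 0]^T$, the quadratic distance is $2 - 2a_{1,2}$, so the value 0.2 implies a red-orange similarity $a_{1,2} = 0.9$. The matrix below uses that derived value; the third bin (blue) and its zero similarities are assumptions.

```python
def quadratic_distance(hq, ht, A):
    """(hq - ht)^T A (hq - ht): cross-bin comparison weighted by color similarity."""
    d = [a - b for a, b in zip(hq, ht)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))

def euclidean_distance(hq, ht):
    return sum((a - b) ** 2 for a, b in zip(hq, ht)) ** 0.5

# Bins: red, orange, and a third color assumed here to be blue.
A = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
x = [1.0, 0.0, 0.0]   # pure red image
y = [0.0, 1.0, 0.0]   # pure orange image
print(quadratic_distance(x, y, A))   # about 0.2
print(euclidean_distance(x, y))      # about 1.414, i.e. sqrt(2)
```

Because the same quadratic form applies to any vectors, it also works unchanged for binary color sets.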
Color Similarity
Color sets give only a selection of colors. Color set distance:
The quadratic distance between two color sets $c_q$ and $c_t$ is
$d(c_q, c_t) = (c_q - c_t)^T A (c_q - c_t)$
Color Set Query Strategy
The color set query compares only the color content of regions or images.
Given query Q = {c_q}, the best match to Q is the target T_j = {c_j} with the smallest color set distance: $\hat{j} = \arg\min_j d(c_q, c_j)$.
Color region matching is accomplished by performing several range queries on the query color set's colors, taking the intersection of these lists, and minimizing the sum of attributes in the intersection list.
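A minimal sketch of this strategy under stated assumptions: a range query for each query color returns the regions containing some color whose similarity to it is at least a hypothetical threshold tau, the lists are intersected, and survivors are ranked by the color set quadratic distance (standing in for "the sum of attributes"). A real index would answer the range queries without scanning every target.

```python
def quadratic(cq, ct, A):
    d = [a - b for a, b in zip(cq, ct)]
    return sum(d[i] * A[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))

def color_set_query(cq, targets, A, tau=0.8):
    """cq: query color set; targets: dict region_id -> binary color set;
    A: color similarity matrix. Returns matching region ids, best first."""
    M = len(cq)
    lists = []
    for mq in (m for m in range(M) if cq[m]):
        # range query: regions holding some color within similarity tau of color mq
        lists.append({rid for rid, ct in targets.items()
                      if any(ct[m] and A[mq][m] >= tau for m in range(M))})
    candidates = set.intersection(*lists) if lists else set()
    return sorted(candidates, key=lambda rid: quadratic(cq, targets[rid], A))
```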
Single Region Query
Fixed query location: the spatial distance between regions is given by the Euclidean distance of their centroids.
Single Region Query
Bounded query location: the user specifies bounds within which a target region is assigned a spatial distance of zero. When a target region is outside the bounds, its spatial distance is calculated by Euclidean distance.
This is useful in many situations where users do not care about the exact position.
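Both location distances can be sketched as follows. For the bounded case, the sketch assumes that a region outside the bounds gets the Euclidean distance from its centroid to the nearest point of the bounding box; the text does not say which reference point is used.

```python
import math

def fixed_location_distance(q, t):
    """Euclidean distance between query centroid q and target centroid t."""
    return math.dist(q, t)

def bounded_location_distance(bounds, t):
    """bounds = (xmin, ymin, xmax, ymax). Zero inside the bounds; outside, assumed to be
    the Euclidean distance from centroid t to the nearest point of the bounds."""
    xmin, ymin, xmax, ymax = bounds
    dx = max(xmin - t[0], 0, t[0] - xmax)
    dy = max(ymin - t[1], 0, t[1] - ymax)
    return math.hypot(dx, dy)
```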
Centroid Location Spatial Access: Spatial Quad-Trees
The centroids of the image regions are indexed using a spatial quad-tree on their x and y values.
The quad-tree provides quick access to 2-D data points. A query for a region at location (xt, yt) is processed by first traversing the spatial quad-tree to the containing node, then exhaustively searching the block for the points that minimize the spatial distance to (xt, yt).
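A minimal point quad-tree sketch for centroid lookup. Leaves hold up to a hypothetical `capacity` of centroids and split into four quadrants when they overflow. The search visits the containing block first and searches each leaf exhaustively, then also checks neighboring blocks that could still hold a closer centroid, which makes it an exact nearest-neighbor search.

```python
import math

class QuadTree:
    """Point quad-tree over the square [x0, x0 + size] x [y0, y0 + size]."""

    def __init__(self, x0, y0, size, capacity=4):
        self.x0, self.y0, self.size, self.capacity = x0, y0, size, capacity
        self.points = []        # (x, y, region_id) while this node is a leaf
        self.children = None    # four quadrants once the leaf overflows

    def _child(self, x, y):
        half = self.size / 2
        return self.children[(1 if x >= self.x0 + half else 0) + (2 if y >= self.y0 + half else 0)]

    def insert(self, x, y, region_id):
        if self.children is not None:
            self._child(x, y).insert(x, y, region_id)
            return
        self.points.append((x, y, region_id))
        if len(self.points) > self.capacity and self.size > 1e-6:
            half = self.size / 2
            self.children = [QuadTree(self.x0 + dx * half, self.y0 + dy * half, half, self.capacity)
                             for dy in (0, 1) for dx in (0, 1)]
            old, self.points = self.points, []
            for p in old:
                self._child(p[0], p[1]).insert(*p)

    def _gap(self, x, y):
        """Distance from (x, y) to this node's square (0 if inside)."""
        dx = max(self.x0 - x, 0, x - (self.x0 + self.size))
        dy = max(self.y0 - y, 0, y - (self.y0 + self.size))
        return math.hypot(dx, dy)

    def nearest(self, x, y, best=(math.inf, None)):
        """Return (distance, region_id) of the centroid closest to (x, y)."""
        if self._gap(x, y) >= best[0]:
            return best                              # this block cannot hold a closer centroid
        if self.children is None:
            for px, py, rid in self.points:          # exhaustive search of the block
                d = math.hypot(px - x, py - y)
                if d < best[0]:
                    best = (d, rid)
            return best
        # the containing quadrant has gap 0, so it is searched first
        for child in sorted(self.children, key=lambda c: c._gap(x, y)):
            best = child.nearest(x, y, best)
        return best
```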
Rectangle Location Spatial Access: R-Trees
Region spatial locations are also indexed by their minimum bounding rectangles (MBRs).
The MBRs of the regions are indexed using an R-tree.
The R-tree provides a dynamic structure for indexing rectangles.
The R-tree, which consists of a hierarchy of overlapping spatial nodes, is designed to visit only a small number of nodes in a spatial search.
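A simplified sketch of the idea: a static, bulk-loaded R-tree that packs MBRs into nodes of a hypothetical `capacity` after sorting on the x-centre, level by level, and a window query that descends only into nodes whose MBR overlaps the query. The dynamic insertion and node-splitting of a full R-tree are not shown.

```python
def mbr_union(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def build_rtree(entries, capacity=4):
    """entries: non-empty list of (mbr, region_id). Nodes are (mbr, kind, content)."""
    level = [(mbr, "leaf", rid) for mbr, rid in entries]
    while len(level) > 1:
        level.sort(key=lambda e: (e[0][0] + e[0][2]) / 2)
        level = [(mbr_union([e[0] for e in level[i:i + capacity]]), "node", level[i:i + capacity])
                 for i in range(0, len(level), capacity)]
    return level[0]

def window_query(node, window, hits=None):
    """Collect ids of regions whose MBR overlaps the window, visiting only overlapping nodes."""
    hits = [] if hits is None else hits
    mbr, kind, content = node
    if overlaps(mbr, window):
        if kind == "leaf":
            hits.append(content)
        else:
            for child in content:
                window_query(child, window, hits)
    return hits
```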
Size
Area distance
Spatial extent: calculated based on the widths and heights of the minimum bounding rectangles
Single Region Query Strategy
The distances of color set, region location, area, and spatial extent are integrated by a weighted sum.
Single Region Query Strategy
Query: find the region that best matches $Q = \{c_q, (x_q, y_q), area_q, (w_q, h_q)\}$
First, the individual queries for color, location, size, and spatial extent are computed.
The intersection of the region match lists is then computed to obtain the set of common images.
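A minimal sketch of the single-region strategy. Assumptions, not taken from the slides: the area distance is the absolute area difference, the extent distance is the Euclidean distance between MBR (width, height) pairs, each attribute query keeps regions within a hypothetical per-attribute limit, and the color set distance function is supplied by the caller.

```python
import math

def region_distances(q, t, color_dist):
    """Per-attribute distances between query region q and target region t (dicts)."""
    return {
        "color": color_dist(q["cset"], t["cset"]),
        "location": math.dist(q["centroid"], t["centroid"]),
        "area": abs(q["area"] - t["area"]),             # assumed area distance
        "extent": math.dist(q["wh"], t["wh"]),          # assumed extent distance
    }

def single_region_query(q, targets, color_dist, weights, limits):
    """targets: region dicts that also carry an 'image' id. Each attribute query keeps the
    regions within its limit; the lists are intersected and ranked by the weighted sum."""
    dists = [region_distances(q, t, color_dist) for t in targets]
    lists = [{i for i, d in enumerate(dists) if d[a] <= limits[a]} for a in weights]
    common = set.intersection(*lists)
    ranked = sorted(common, key=lambda i: sum(w * dists[i][a] for a, w in weights.items()))
    return [targets[i]["image"] for i in ranked]
```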
Multiple Regions Query
(1) Intersect the results of single-region matches.
(2) Compute image match scores by adding the weighted scores of the best region matches.
(3) Check relative spatial locations.
Absolute Locations
Query: find the image that best matches $Q = \{Q_A, Q_B, Q_C\}$, where $Q_i = \{c_i^q, (x_i^q, y_i^q), area_i^q, (w_i^q, h_i^q)\}$
The query is processed by intersecting the query region lists to obtain the list of candidate images. The best match minimizes the weighted sum of the region distances between the query and the target image.
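A minimal sketch of the multi-region step, assuming each single-region query has already returned, per image, the distance of that image's best-matching region (lower is better), and using hypothetical per-region weights.

```python
def match_images(region_lists, weights):
    """region_lists[i]: dict image_id -> best-region distance for query region i.
    Keeps images that appear in every list and ranks them by the weighted sum."""
    common = set(region_lists[0])
    for lst in region_lists[1:]:
        common &= set(lst)                       # intersect the query region lists
    score = {img: sum(w * lst[img] for w, lst in zip(weights, region_lists)) for img in common}
    return sorted(common, key=lambda img: score[img])
```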
Region Relative Location
Convert relative locations into 2-D strings: (t0 t1 < t2 < t7 < t3 < t6 < t4 < t5), (t0 < t5 t7 < t6 < t2 < t3 t1 < t4)
Scale invariance and rotation invariance: (t0 < t7 t2 t1 < t6 t5 t3 < t4), (t5 < t6 t7 < t4 t0 t3 t2 < t1)
Adjacency, nearness, overlap, and surround can be detected by checking 2-D strings.
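A minimal sketch of 2-D strings built from region centroids, with a simplified validation: every pair of query symbols must keep the same <, =, > relation on both axes in the target. The symbol names are hypothetical, equal coordinates are compared exactly (a real system would use a tolerance), and the target is assumed to label its matched regions with the query's symbols.

```python
def axis_string(regions, axis):
    """regions: dict symbol -> (x, y) centroid. Symbols with equal coordinates are
    juxtaposed; increasing coordinates are separated by '<'."""
    groups = {}
    for sym, pos in regions.items():
        groups.setdefault(pos[axis], []).append(sym)
    return " < ".join(" ".join(sorted(groups[k])) for k in sorted(groups))

def two_d_string(regions):
    return axis_string(regions, 0), axis_string(regions, 1)

def _ranks(axis_str):
    return {sym: i for i, grp in enumerate(axis_str.split(" < ")) for sym in grp.split()}

def _sign(a, b):
    return (a > b) - (a < b)

def relations_preserved(query, target):
    """True if every pair of query symbols has the same relation in the target on both axes."""
    for q_str, t_str in zip(two_d_string(query), two_d_string(target)):
        qr, tr = _ranks(q_str), _ranks(t_str)
        for a in qr:
            for b in qr:
                if _sign(qr[a], qr[b]) != _sign(tr[a], tr[b]):
                    return False
    return True
```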
Relative Locations
For each candidate image, the 2-D string is generated from the identified regions and is compared to the 2-D string of the query image.
This final operation either validates the target image or rejects it.
Evaluation
Users sketch regions, position them on the query grid, and assign them properties of color, size, and absolute location.
The user may also assign boundaries for location and size.
Evaluation
The global color histogram query process gives users little control in specifying the query and more readily returns images that are not desired.
Evaluation
Synthetic Evaluation Data
Evaluation
Q1: the region indexing and distance computation strategy in this paper
Q2: the same query strategy on a region database that was generated automatically from the target images using color set back-projection
Q3: based on color histograms
Evaluation of Color Sets
Retrieval effectiveness degrades only slightly when using color sets.
This indicates that the perceptually significant color information is retained in the color sets.
Examples of VisualSEEk Queries