Lecture 12: Video Representation, Summarisation, and Query
Dr Jing Chen, NICTA & CSE UNSW
CS9519 Multimedia Systems, S2 2006
COMP9519 Multimedia Systems – Lecture 12 – Slide 2 – J Chen
Last week …
Structure of video: frame, shot, scene, story
Why video structure analysis?
Transition effects between shots
Shot segmentation
Scene segmentation
Last Week -- A diagram of video structure
[Figure: video data → shots (shot #1 … shot #21) → scenes/stories (scene #1 … scene #8), with a row of keyframes representing them]
* H.B.Kang, Video Abstraction Techniques For A Digital Library
Last Week -- Structure of Video Sequence
[Figure: hierarchy story → scenes (scene1, scene2) → shots (shot1 … shot5) → frames 1, 2, …, N]
Last Week -- Why video structure analysis?
Some applications: indexing and browsing, non-linear editing, event detection
Typical video retrieval system diagram:
[Figure: video data → segmentation → key-frame computation → feature extraction (color, motion, shape, …) → video query, retrieval and production; video browsing]
* Yan Liu & Fei Li
Last Week -- Types of Shot Transition
Cut, fade, dissolve, wipe
Last Week -- Video shot segmentation methods
Spatial domain approaches
Pixel domain -- frame differencing
Motion-compensated frame differencing
Histograms (global, joint and local)
Model driven
Compressed domain approaches
DCT coefficients
DC terms
MB types of B frames
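A minimal sketch of the global-histogram variant combined with a fixed threshold. It assumes each frame has already been reduced to a gray-level histogram (a list of bin counts); the function names and the threshold value 0.5 are illustrative assumptions, not values from the lecture.

```python
from typing import List, Sequence

def histogram_difference(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Normalised L1 (bin-to-bin) difference between two histograms, in [0, 1]."""
    total = sum(h1) + sum(h2)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total

def detect_cuts(histograms: List[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return every index i where a cut is declared between frames i-1 and i.

    The decision is local (only two neighbouring frames are compared) and the
    fixed threshold is a hypothetical value that would need tuning per video."""
    cuts = []
    for i in range(1, len(histograms)):
        if histogram_difference(histograms[i - 1], histograms[i]) > threshold:
            cuts.append(i)
    return cuts

# Four frames with 4-bin histograms; the content changes between frames 1 and 2.
frames = [[10, 0, 0, 0], [9, 1, 0, 0], [0, 0, 2, 8], [0, 0, 3, 7]]
print(detect_cuts(frames))  # [2]
```

Gradual transitions such as dissolves spread the change over many frames, so each consecutive difference stays small; a single-threshold detector like this one mainly catches cuts.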
Last Week -- Thresholding vs clustering based shot segmentation
Thresholding
Local decision (based on the information from very few frames)
Thresholds are typically highly sensitive to the type of input video
Clustering
Views shot segmentation as a k-class unsupervised clustering problem
Assigns frames to one of the k classes via k-means
Global decision
Not only eliminates the need for threshold setting but also allows multiple features to be used simultaneously to improve performance
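A sketch of the clustering alternative, assuming k = 2 classes (boundary vs. non-boundary) and a single 1-D feature per frame pair, such as the histogram difference between consecutive frames. The two-class setup and the function name are assumptions for illustration; with several features the same assignment and update steps apply using a vector distance.

```python
from typing import List, Sequence

def two_means(values: Sequence[float], iterations: int = 20) -> List[int]:
    """Cluster 1-D values into 2 classes with k-means; returns one label per value.
    Label 1 is the cluster with the larger centroid (candidate shot boundaries)."""
    lo, hi = min(values), max(values)  # initialise centroids at the extremes
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        # Update step: recompute centroids (keep the old one if a cluster empties).
        c0 = [v for v, lab in zip(values, labels) if lab == 0]
        c1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(c0) / len(c0) if c0 else lo
        new_hi = sum(c1) / len(c1) if c1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return labels

# Differences between consecutive frames; no threshold has to be chosen.
diffs = [0.1, 0.05, 0.9, 0.08, 0.12, 0.85, 0.07]
print(two_means(diffs))  # [0, 0, 1, 0, 0, 1, 0]
```

The decision for each frame pair depends on the centroids, which are computed from the whole sequence, which is what makes this a global rather than a local decision.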
Last Week -- Scene Segmentation
Reference frame (R-frame)
Distance between two shots
Clustering of shots -- visual dissimilarity
[Figure: R-frames for shot i (RAi, RBi, RCi, …) compared with R-frames for shot j (RAj, RBj, RCj, …)]
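The slide does not give the shot-distance formula, so the sketch below assumes one common choice: the distance between shots i and j is the smallest dissimilarity over all pairs of their R-frames. The L1 distance and the toy 2-bin "histograms" are also assumptions.

```python
from itertools import product
from typing import Callable, List, Sequence

Vector = Sequence[float]

def l1(a: Vector, b: Vector) -> float:
    """L1 distance between two R-frame feature vectors (e.g. colour histograms)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def shot_distance(rframes_i: List[Vector], rframes_j: List[Vector],
                  d: Callable[[Vector, Vector], float] = l1) -> float:
    """Assumed shot-to-shot distance: minimum dissimilarity over all
    (R-frame of shot i, R-frame of shot j) pairs."""
    return min(d(a, b) for a, b in product(rframes_i, rframes_j))

shot_i = [[10, 0], [0, 10]]   # RAi, RBi
shot_j = [[9, 1], [50, 50]]   # RAj, RBj
print(shot_distance(shot_i, shot_j))  # 2
```

Taking the minimum treats two shots as similar if any pair of R-frames matches, which suits a camera returning to the same view; the maximum or the average would be stricter alternatives. Shots whose distance falls below a threshold can then be clustered into the same scene.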
Last Week – A diagram of scene segmentation with scene transition graph
Outline
We have covered
Feature extraction
Image/video retrieval system
Video structure analysis
Q: How do we visualise the results?
How to browse large video files?
How to present the retrieval results?
Requirements
Under the constraint of the available screen space
Efficient and user-friendly
Browse a large number of videos (1000s)
Outline
Storyboards
Hierarchical browser
Video directory browser
Video summary
Video skimming
Thumbnail images
Browsing many clips
Concept of video query
Video browser/retrieval interface
Traditional interfaces: storyboards
How to show the temporal variation in the video?
Child storyboard
Animated GIFs
Skims
Hierarchical browser
* H. Sundaram
Visual Browsing Example 1
Lateral browser surrounding temporal browser, courtesy of Imperial College London
Visual Browsing Example 2
Best “people” shots, Carnegie Mellon Informedia system
Storyboards
Say this is the result of a search for video data.
What are the problems here?
* H. Sundaram
Child storyboards
Clicking a thumbnail pops up a child storyboard.
The child storyboard shows the temporal behavior of the video.
15 min of video
Still no audio!
Clicking plays the shot
How to select key-frames? Clustering
* H. Sundaram
Key-frame Selection
Considerations
Flexibility (number and level)
Fidelity (content comprehension)
Approaches
Fixed number, fixed spacing
First/last frame, clean frame
Difference, motion
Clustering
Cluster all frames using the complete-link algorithm (FXPAL): the inter-cluster distance is the maximum of the pairwise distances between frames, which produces small, tightly bound clusters
* H. Sundaram
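A small sketch of complete-link clustering followed by key-frame selection. The L1 frame distance, the fixed number of clusters, and picking each cluster's medoid as its key-frame are assumptions made for illustration; the slide only states that FXPAL clusters frames with complete linkage. This naive version is roughly cubic in the number of frames, so in practice it would run on subsampled frames.

```python
from typing import List, Sequence

Vector = Sequence[float]

def l1(a: Vector, b: Vector) -> float:
    """L1 distance between two frame feature vectors (e.g. colour histograms)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def complete_link(frames: List[Vector], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering with complete linkage: the distance between two
    clusters is the MAXIMUM pairwise distance between their frames, and the
    closest pair of clusters is merged until n_clusters remain."""
    clusters = [[i] for i in range(len(frames))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        return max(l1(frames[a], frames[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        best = None  # (distance, x, y)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dist = linkage(clusters[x], clusters[y])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters

def key_frames(frames: List[Vector], n_clusters: int) -> List[int]:
    """One key-frame per cluster: the medoid, i.e. the member frame with the
    smallest total distance to the other members (a hypothetical choice)."""
    result = []
    for cluster in complete_link(frames, n_clusters):
        medoid = min(cluster, key=lambda i: sum(l1(frames[i], frames[j]) for j in cluster))
        result.append(medoid)
    return sorted(result)  # temporal order for the storyboard

frames = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
print(key_frames(frames, 2))  # [0, 3]
```

Because a cluster's spread is bounded by its largest internal distance, complete linkage avoids the long chained clusters that single linkage can produce.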
More Advanced Browsing and Summarization Schemes
Hierarchical browser
Video directory browser
Video summary
Video skimming
Thumbnail images
Browsing many clips
Key-frame Based Hierarchical Video Browser
* H.J. Zhang et al., Video Parsing, Retrieval and Browsing: An Integrated and Content-Based Solution, ACM MM 1995
Visual Summaries for Video News Cluster at the Scene Level
* ClassView paper
Data Structure and Browser Layout for Key-Frame Based Hierarchical Browser
Scene Transition Graph
Web-based Video Directory Browser (FX-PAL)
Key Frames Attached to Time Scale
The positions of the key-frames are marked by blue triangles along a mouse-sensitive time scale adjacent to the key-frame.
As the mouse moves over the time scale, the key-frame for the corresponding time is shown and the triangle for that key-frame turns red.
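The slide does not say which key-frame "corresponds" to a mouse position. The sketch below assumes it is the most recent key-frame at or before that time and finds it with a binary search over the sorted marker times; `keyframe_at` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List

def keyframe_at(keyframe_times: List[float], t: float) -> int:
    """Index of the key-frame to show when the mouse is at time t.

    Assumes keyframe_times is sorted and that each key-frame stands for the
    interval from its own time up to the next key-frame's time."""
    i = bisect_right(keyframe_times, t) - 1
    return max(i, 0)  # before the first marker, fall back to the first key-frame

times = [0.0, 12.5, 30.0, 47.2]  # seconds at which the blue triangles sit
print(keyframe_at(times, 20.0))  # 1 -> show key-frame 1 and turn triangle 1 red
```

The lookup is O(log k) for k key-frames, cheap enough to run on every mouse-move event.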
Mapping Confidence Scores to Gray Levels
Metadata: annotation of audio/video
Translate metadata values into a “confidence score”
Present the confidence score as levels of gray
High-confidence areas are marked in black
Areas of lower confidence fade progressively to white
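A sketch of the gray-level mapping, assuming confidence scores are already normalised to [0, 1] and that the fade from black to white is linear over 8-bit gray levels. How metadata values become confidence scores is not specified on the slide and is not modelled here.

```python
def confidence_to_gray(score: float) -> int:
    """Map a confidence score in [0, 1] to an 8-bit gray level:
    1.0 -> 0 (black, high confidence), 0.0 -> 255 (white, low confidence)."""
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    return round(255 * (1.0 - score))

# One gray level per time slice of the timeline.
print([confidence_to_gray(s) for s in (1.0, 0.5, 0.0)])  # [0, 128, 255]
```

A non-linear curve (for example, emphasising only scores near 1) would make high-confidence regions stand out more; linear is simply the easiest to read.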
Confidence Score Display
Features can be selected from a pull-down menu
Metadata media player
Demo of video directory browser
http://www.fxpal.com/?p=mbase
Summary of video
Present key-frames (images) in a compact, visually pleasing display
Given: 2D space constraints, a key-frame set, importance measures
What is the best display layout? Comic book concept
Issues:
Time order vs. layout order
Preserve high-level structures
Importance measures
Video Manga (FXPAL)
Discard Key-frames (in Video Manga)
Key-frame extraction: the starting point, mentioned earlier
Too many key-frames!
Calculate an importance score for each segment based on its rarity and duration.
Longer shots are preferred because they are likely to be important in the video.
Repeated shots receive lower scores: they do not add much to the summary even if they are long.
Segments with an importance score higher than a threshold are selected to generate a pictorial summary.
For each segment chosen, the frame nearest the center of the segment is extracted as a representative key-frame.
Frames are sized according to the importance measure of their originating segments: higher-importance segments are represented with larger key-frames.
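A sketch of the selection step. The importance measure I = L · log(1/W) is an assumption (our reading of the Video Manga formulation, not stated on the slide): L is the segment length and W is the fraction of the whole video covered by the segment's cluster, so long segments score high and segments from often-repeated clusters score low. The 1 to 3 sizing rule is hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Segment:
    start: int    # first frame index
    end: int      # last frame index (inclusive)
    cluster: int  # id of the visual cluster the segment belongs to

    @property
    def length(self) -> int:
        return self.end - self.start + 1

def importance_scores(segments: List[Segment]) -> List[float]:
    """Assumed measure I = L * log(1 / W): duration times rarity of the cluster."""
    total = sum(s.length for s in segments)
    cluster_time: Dict[int, int] = defaultdict(int)
    for s in segments:
        cluster_time[s.cluster] += s.length
    return [s.length * math.log(total / cluster_time[s.cluster]) for s in segments]

def manga_summary(segments: List[Segment], threshold: float) -> List[Tuple[int, int]]:
    """Keep segments scoring above `threshold`; return (centre frame, size) for
    each, where size 1-3 (hypothetical) grows with importance."""
    scores = importance_scores(segments)
    top = max(scores, default=0.0)
    summary = []
    for seg, score in zip(segments, scores):
        if score > threshold and top > 0:
            centre = (seg.start + seg.end) // 2  # frame nearest the segment centre
            size = 1 + round(2 * score / top)    # 1 (small) .. 3 (large)
            summary.append((centre, size))
    return summary

segments = [
    Segment(0, 99, cluster=0),     # long, but its cluster repeats later
    Segment(100, 109, cluster=1),  # rare but short
    Segment(110, 209, cluster=0),  # repeat of cluster 0
    Segment(210, 289, cluster=2),  # rare and long
]
print(manga_summary(segments, threshold=50))  # [(249, 3)]
```

The two cluster-0 segments are the longest, yet both score below the rarer 80-frame segment, which mirrors the "repeated shots receive lower scores" rule.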
Key-frame Packing
Arrange the key-frames in a logical order
Fit the available space efficiently
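A simple greedy packer that illustrates the two goals; it is not the Video Manga packing algorithm. It assumes square key-frames whose side is a size in grid units (for example, the 1 to 3 sizes from importance) and a display a fixed number of cells wide. Frames are placed in temporal order, each at the first free position in row-major order.

```python
from typing import List, Set, Tuple

def pack(sizes: List[int], width: int) -> List[Tuple[int, int]]:
    """Place square key-frames (side = size in grid cells) on a grid `width`
    cells wide, in temporal order, each at the first free position in
    row-major order. Returns the (row, col) of each frame's top-left cell."""
    occupied: Set[Tuple[int, int]] = set()
    positions = []
    for s in sizes:
        if s > width:
            raise ValueError("key-frame wider than the display")
        row = 0
        while True:
            placed = False
            for col in range(width - s + 1):
                cells = {(row + r, col + c) for r in range(s) for c in range(s)}
                if not cells & occupied:  # all cells free: place the frame here
                    occupied |= cells
                    positions.append((row, col))
                    placed = True
                    break
            if placed:
                break
            row += 1  # nothing fits in this row, try the next one
    return positions

print(pack([2, 1, 1, 1, 1, 2], width=4))
# [(0, 0), (0, 2), (0, 3), (1, 2), (1, 3), (2, 0)]
```

Filling holes in earlier rows saves space but can put a later frame before an earlier one in reading order, which is the time order vs. layout order tension listed under the Video Manga issues.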
Video Manga (FX-PAL)
Web-based Interactive Video Summary
Browse the video based on either key-frames or the timeline
Pop up captions as the mouse moves over an image
Space saving
Playing video
Clicking on a key-frame starts video playback from the beginning of that segment
Demo of Video Manga
http://www.fxpal.com/?p=mbase
Video skims
A temporal, multimedia abstraction that incorporates both video and audio information from a longer source.
Goal: to communicate the essential content of a video in an order of magnitude less time.
Generalized Video Skim Creation Process (CMU-Informedia)
Audio/video alignment in skims
Default skim (DEF): drops video at regular intervals
Image-centric skim (IMG): emphasizes visual content by decomposing the source into component shots, detecting “important” objects such as faces and text, and identifying structural motion within a shot
Audio-centric skim (AUD): derived solely from audio information. Automatic speech recognition and alignment techniques register the audio track to the video’s text transcript.
“Integrated best” skim (BOTH): merges the image-centric and audio-centric approaches while maintaining moderate audio/video synchrony. Top-rated audio regions are selected as in the AUD skim; the audio is then augmented with imagery selected using the IMG heuristics from a temporal window extending five seconds before and after the audio region.
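A sketch of two of the mechanical parts above. `default_skim` keeps a fixed chunk out of every period, so the skim lasts about 1/ratio of the source; the 2-second chunk and the ratio parameter are hypothetical, with ratio = 10 matching the "order of magnitude less time" goal. `image_search_window` computes the five-second window used by the BOTH skim. Both function names are illustrative.

```python
from typing import List, Tuple

def default_skim(duration: float, ratio: float, chunk: float = 2.0) -> List[Tuple[float, float]]:
    """DEF-style skim: keep a `chunk`-second piece at the start of every
    period of chunk * ratio seconds and drop the rest. Audio and video are
    cut at the same times, so they stay in sync."""
    if ratio < 1 or chunk <= 0:
        raise ValueError("ratio must be >= 1 and chunk must be positive")
    period = chunk * ratio
    pieces = []
    t = 0.0
    while t < duration:
        pieces.append((t, min(t + chunk, duration)))
        t += period
    return pieces

def image_search_window(audio_start: float, audio_end: float, duration: float,
                        margin: float = 5.0) -> Tuple[float, float]:
    """BOTH skim: the window from which imagery is chosen for an audio region,
    extended by `margin` seconds on each side and clipped to the video."""
    return max(0.0, audio_start - margin), min(duration, audio_end + margin)

print(default_skim(60.0, ratio=10))          # [(0.0, 2.0), (20.0, 22.0), (40.0, 42.0)]
print(image_search_window(3.0, 10.0, 60.0))  # (0.0, 15.0)
```

Because DEF cuts ignore content, words and sentences are routinely chopped mid-way, which is what the AUD and BOTH skims try to avoid.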
“Skim Video”: Extracting Significant Content
[Figure: a skim video of 78 frames extracted from an original video of 1100 frames]
The Informedia skims
Temporal abstraction: motivates viewers
Time compression: preserves essential data
Segments with matching words are combined
Each segment is extended, based on the “goodness scores” of its end points, until the time budget is reached
Issues:
Choppy presentation
Temporal syntax (e.g., dialog)
Early cut-off of sentences, scenes, audio
* H. Sundaram
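A sketch of the combine-then-extend step under stated assumptions: segments are [start, end) intervals in whole seconds around matching words, `goodness` is a hypothetical function standing in for Informedia's scores (for instance high at pauses), and `lookahead` limits how far an end point may move in one step. None of these names or parameters come from the Informedia system.

```python
from typing import Callable, List, Tuple

def merge(segments: List[Tuple[int, int]]) -> List[List[int]]:
    """Combine overlapping [start, end) segments around matching words."""
    merged: List[List[int]] = []
    for s, e in sorted(segments):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged

def extend_segments(segments: List[Tuple[int, int]], goodness: Callable[[int], float],
                    budget: int, video_end: int, lookahead: int = 5) -> List[Tuple[int, int]]:
    """Greedily move segment end points forward (never into the next segment)
    to the candidate end point with the best goodness score, as long as the
    total skim length stays within `budget` time units."""
    segs = merge(segments)
    total = sum(e - s for s, e in segs)
    while True:
        best = None  # (score, segment index, new end)
        for i, (_, e) in enumerate(segs):
            limit = segs[i + 1][0] if i + 1 < len(segs) else video_end
            for new_end in range(e + 1, min(e + lookahead, limit) + 1):
                if total + (new_end - e) > budget:
                    break  # this and any later end point would exceed the budget
                score = goodness(new_end)
                if best is None or score > best[0]:
                    best = (score, i, new_end)
        if best is None:
            break  # budget reached or no room left to extend
        _, i, new_end = best
        total += new_end - segs[i][1]
        segs[i][1] = new_end
    return [(s, e) for s, e in segs]

# Hypothetical goodness: 1.0 at pauses (t = 5 and t = 14), 0.0 elsewhere.
pauses = {5, 14}
print(extend_segments([(0, 2), (1, 3), (10, 12)],
                      lambda t: 1.0 if t in pauses else 0.0,
                      budget=9, video_end=20))  # [(0, 5), (10, 14)]
```

When the budget runs out before a good end point is within reach, a segment stops at a poor point, which is one way the "early cut-off" issue arises.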
Thumbnail images
Empirical Study Into Thumbnail Images
Text-based Result List
“Naïve” Thumbnail List (Uses First Shot Image)
Query-based Thumbnail Result List
Query-based Thumbnail Selection Process
1. Decompose the video segment into shots.
2. Compute a representative frame for each shot.
3. Locate query scoring words (shown by arrows).
4. Use the frame from the highest-scoring shot.
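A sketch of steps 3 and 4, assuming steps 1 and 2 have already produced shots with a representative frame and the transcript words aligned to each shot. Scoring a shot by the number of query-word occurrences is a simplifying assumption; Informedia's own query scoring is not described on the slide.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Shot:
    start: float               # seconds
    end: float
    representative_frame: int  # frame index chosen in step 2
    words: List[str]           # transcript words aligned to this shot

def query_thumbnail(shots: List[Shot], query: Sequence[str]) -> int:
    """Score each shot by how many query words occur in its transcript and
    return the representative frame of the best shot. max() keeps the earliest
    shot on ties, so with no matches this falls back to the first shot
    (the 'naive' thumbnail)."""
    terms = {q.lower() for q in query}

    def score(shot: Shot) -> int:
        return sum(1 for w in shot.words if w.lower() in terms)

    return max(shots, key=score).representative_frame

shots = [
    Shot(0, 10, 5, ["the", "news"]),
    Shot(10, 20, 250, ["refugee", "asylum", "rights"]),
    Shot(20, 30, 600, ["refugee"]),
]
print(query_thumbnail(shots, ["Refugee", "asylum"]))  # 250
```

The same video segment therefore gets a different thumbnail for different queries, which is the point of the query-based treatment.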
Thumbnail Study Results
[Figure: four bar charts comparing the Text (text-based list), First (first-shot thumbnail) and Query (query-based thumbnail) treatments on time (secs.), score (max = 400), titles browsed, and subjective satisfaction rated 1 (terrible) to 9 (wonderful)]
© Copyright 2003 Michael G. Christel, Carnegie Mellon
Empirical Study Summary*
Significant performance improvements for the query-based thumbnail treatment over the other two treatments
Subjective satisfaction significantly greater for the query-based thumbnail treatment
Subjects could not identify differences between the thumbnail treatments, but their performance definitely showed differences!
* Christel, M., Winkler, D., and Taylor, C.R. Improving Access to a Digital Video Library. In Human-Computer Interaction: INTERACT97, Chapman & Hall, London, 1997, 524-531
Thumbnail View with Query Relevance Bar
Close-up of Thumbnail with Relevance Bar
Relevance score in [0, 100]; this document has a score of 30
Color-coded scoring words: “Asylum” contributes some, “rights” a bit, “refugee” contributes 50%
Query-based thumbnail
Shortcut to storyboard
* Christel
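A sketch of how the bar's filled height could be split into color-coded pieces per scoring word. The raw per-word weights in the example are hypothetical, chosen so that “refugee” gets 50% as on the slide; the linear mapping from score to bar height is also an assumption.

```python
from typing import Dict

def relevance_bar(score: float, contributions: Dict[str, float],
                  bar_height: int = 100) -> Dict[str, int]:
    """Split the filled part of a relevance bar among the scoring words.
    `score` is in [0, 100]; `contributions` are non-negative raw weights per
    query word, normalised here to fractions of the filled height."""
    filled = bar_height * min(max(score, 0), 100) / 100
    total = sum(contributions.values())
    if total == 0:
        return {w: 0 for w in contributions}
    # Rounding each piece separately may leave the pieces a pixel off the total.
    return {w: round(filled * c / total) for w, c in contributions.items()}

print(relevance_bar(30, {"refugee": 5, "asylum": 3, "rights": 2}))
# {'refugee': 15, 'asylum': 9, 'rights': 6}
```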
Match bars
Using Match Info to Reduce Storyboard Size
Browsing many files (FXPAL)
Video editing user interface (FXPAL)
The top display lets users select clips from the raw video.
The bottom display lets users organize the clips along the timeline and change the lengths of the clips.
Flipping through images in a pile
Cluster all clips by the similarity of their color histograms
Place similar clips into the same pile
Each clip is represented by one key-frame in a pile
The clips are stacked in temporal order.
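A sketch of forming piles. The slide does not name the clustering method, so this swaps in a simple greedy single-pass scheme: each clip, taken in temporal order, joins the first pile whose first clip has a similar normalised color histogram (histogram intersection at or above a hypothetical threshold of 0.7), otherwise it starts a new pile.

```python
from typing import List, Sequence

def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection of two normalised histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def make_piles(clip_histograms: List[Sequence[float]], threshold: float = 0.7) -> List[List[int]]:
    """Greedy single-pass clustering of clips by color histogram similarity.
    Clips are visited in temporal order, so each pile stays temporally ordered."""
    piles: List[List[int]] = []
    for i, h in enumerate(clip_histograms):
        for pile in piles:
            if intersection(clip_histograms[pile[0]], h) >= threshold:
                pile.append(i)
                break
        else:  # no similar pile found
            piles.append([i])
    return piles

hists = [[0.8, 0.2, 0.0], [0.1, 0.1, 0.8], [0.7, 0.3, 0.0],
         [0.0, 0.2, 0.8], [0.75, 0.25, 0.0]]
print(make_piles(hists))  # [[0, 2, 4], [1, 3]]
```

Comparing against each pile's first clip keeps the sketch short; comparing against a running mean histogram would make pile membership less dependent on which clip came first.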
Expanding a pile of video clips
To see the additional images in a pile, the user can expand the pile by clicking on it. The current display is faded out and the images of the pile are shown in an area in the middle of the faded-out display. The timeline displays the coverage of the expanded view in light gray and the coverage of the pile in a darker color as before.
Outline
Storyboards
Hierarchical browser
Video directory browser
Video summary
Video skimming
Thumbnail images
Browsing many clips
Concept of video query
Feature-based Similarity Search
Video Query
Query types
Point query
Specifies a query point Q in the data space and retrieves all point objects in the database with identical coordinates: $\{P \in DB \mid P = Q\}$
Range query
Given a query point Q, a distance r, and a distance function M, retrieve all points P from the database that have a distance smaller than or equal to r from Q according to M: $\{P \in DB \mid M(P, Q) \le r\}$
Nearest neighbor query
Given a query point Q, retrieve the nearest neighbor point P from the database, i.e., find the object $P \in DB$ such that $\forall P' \in DB: M(P, Q) \le M(P', Q)$
K-nearest neighbor query
Given a query point Q, return the k nearest neighbor points, i.e., a set $K \subseteq DB$ with $|K| = k$ such that $\forall P \in K, \forall P' \in DB \setminus K: M(P, Q) \le M(P', Q)$
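The four query types can be sketched directly from their definitions with a sequential scan over an in-memory list of points. The Euclidean default for M and the function names are assumptions; any distance function with the same signature can be passed in.

```python
import heapq
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]

def euclidean(p: Point, q: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def point_query(db: List[Point], q: Point) -> List[Point]:
    """{P in DB | P = Q}: all points with coordinates identical to Q."""
    return [p for p in db if tuple(p) == tuple(q)]

def range_query(db: List[Point], q: Point, r: float, m: Distance = euclidean) -> List[Point]:
    """{P in DB | M(P, Q) <= r}."""
    return [p for p in db if m(p, q) <= r]

def knn_query(db: List[Point], q: Point, k: int, m: Distance = euclidean) -> List[Point]:
    """The k points closest to Q under M; k = 1 is the nearest-neighbour query."""
    return heapq.nsmallest(k, db, key=lambda p: m(p, q))

db = [(0, 0), (1, 1), (3, 4), (1, 1)]
print(point_query(db, (1, 1)))       # [(1, 1), (1, 1)]
print(range_query(db, (0, 0), 1.5))  # [(0, 0), (1, 1), (1, 1)]
print(knn_query(db, (3, 3), 1))      # [(3, 4)]
```

Every function touches all n points and all d coordinates of each, which is the O(dn) per-query cost of a search without an index.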
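Distance functions (for d-dimensional points P and Q)
Euclidean (L2): $M(P, Q) = \sqrt{\sum_{i=1}^{d} (Q_i - P_i)^2}$
Manhattan (L1): $M(P, Q) = \sum_{i=1}^{d} |Q_i - P_i|$
Maximum (L∞): $M(P, Q) = \max_{i} |Q_i - P_i|$
Weighted Euclidean: $M(P, Q) = \sqrt{\sum_{i=1}^{d} w_i (Q_i - P_i)^2}$
Weighted maximum: $M(P, Q) = \max_{i} w_i |Q_i - P_i|$
Ellipsoid: $M(P, Q) = \sqrt{(Q - P)\, W\, (Q - P)^T}$, where W is a positive definite similarity matrix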
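The distance functions written out in plain Python, one function per formula, as a sketch for small feature vectors. The weights w and the matrix W are caller-supplied; nothing here checks that W is positive definite.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]

def manhattan(p: Vec, q: Vec) -> float:  # L1
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p: Vec, q: Vec) -> float:  # L2
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def maximum(p: Vec, q: Vec) -> float:  # L-infinity
    return max(abs(a - b) for a, b in zip(p, q))

def weighted_euclidean(p: Vec, q: Vec, w: Vec) -> float:
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(p, q, w)))

def weighted_maximum(p: Vec, q: Vec, w: Vec) -> float:
    return max(wi * abs(a - b) for a, b, wi in zip(p, q, w))

def ellipsoid(p: Vec, q: Vec, W: List[List[float]]) -> float:
    """sqrt((Q - P) W (Q - P)^T) for a positive definite matrix W."""
    diff = [b - a for a, b in zip(p, q)]
    n = len(diff)
    quad = sum(diff[i] * W[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

p, q = (0, 0), (3, 4)
print(manhattan(p, q), euclidean(p, q), maximum(p, q))  # 7 5.0 4
```

With a diagonal W, the ellipsoid distance reduces to the weighted Euclidean distance; off-diagonal entries let correlated dimensions (such as neighbouring histogram bins) influence each other.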
Query without index
Sequential scan: sequentially scan through all records in the database
Size of database: storage cost is O(dn), where d is the dimensionality of a record and n is the size of the DB, assuming floating-point data
The time to process a query is O(dn)
Infeasible for a large database with millions of records!
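A worked example of the O(dn) cost with hypothetical numbers: n = 10^6 records, each a d = 256-dimensional feature vector (for instance a 256-bin color histogram) stored as 4-byte floats.

```latex
\begin{aligned}
\text{storage} &= d \cdot n \cdot 4\,\text{B} = 256 \times 10^{6} \times 4\,\text{B} = 1.024 \times 10^{9}\,\text{B} \approx 1\,\text{GB} \\
\text{coordinates compared per query} &= d \cdot n = 256 \times 10^{6} = 2.56 \times 10^{8}
\end{aligned}
```

Every query has to read and compare all of these values, regardless of how few records are actually close to Q.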
Q: a better solution to search? Index
Conclusion
Storyboards
Hierarchical browser
Video directory browser
Video summary
Video skimming
Thumbnail images
Browsing many clips
Concept of video query
Some references
Chapter 4 of the book Multimedia Information Retrieval and Management
Boreczky, J., Girgensohn, A., Golovchinsky, G., and Uchihashi, S. An Interactive Comic Book Presentation for Exploring Video. In CHI 2000 Conference Proceedings, ACM Press, April 2000, 185-192
Christel, M., Smith, M., Taylor, C.R., and Winkler, D. Evolving Video Skims into Useful Multimedia Abstractions. In Proc. ACM CHI ’98 (Los Angeles, CA, April 1998), ACM Press, 171-178
Christel, M., Winkler, D., and Taylor, C.R. Improving Access to a Digital Video Library. In Human-Computer Interaction: INTERACT97, Chapman & Hall, London, 1997, 524-531