Ardhendu Behera, University of Fribourg

Passive Capture and Structuring of Lectures
Sugata Mukhopadhyay, Brian Smith
Department of Computer Science, Cornell University
Introduction
• Multimedia presentations
  – Manual, labor-intensive to author
• Experience-on-Demand (EOD) of CMU
  – Captures and abstracts personal experiences (audio / video)
  – Synchronizes audio, video, and position data
Introduction (Contd.)
• Classroom 2000 (C2K, Georgia Tech)
  – Authors multimedia documents from live events
  – Data from whiteboards, cameras, etc. are combined into multimedia documents of classroom activities
• Similarity (EOD & C2K)
  – Automatic capture
  – Authoring of multimedia documents
Introduction (Contd.)
• Dissimilarity:
  – C2K: invasive capture (recording must be started explicitly), structured environment (purpose-built)
  – EOD: passive capture, unstructured environment

             Unstructured    Structured
  Invasive                   C2K
  Passive    EOD             Lecture Browser
Motivation
• Structured multimedia documents from seminars, talks, or classes
• The speaker can walk in, press a button, and give a presentation using blackboards, whiteboards, 35mm slides, overheads, or computer projection
• One hour later, the structured presentation is on the web
Overview
• Cameras (output encoded in MPEG format)
  – Overview camera: captures the entire lecture
  – Tracking camera (hardware-built tracker): follows the speaker, capturing head & shoulders
• The speaker uploads the slides to a server
Overview
• Video region
  – RealVideo
• Index
  – Title & duration of the current slide
  – Synchronized with the video
  – Prev / Next buttons skip between slides
• Timeline
  – Boxes represent the duration of each slide

[Screenshot: browser interface with timeline, slides, video, and index regions]
Problems Handled
• Synchronization
  – Transitive (fixes the position of event A on a timeline)
    • If A is synchronized with B, then B can be added to the same timeline
    • With synchronization errors E1 = (A,B) and E2 = (B,C), error(A,C) = E1 + E2
  – Collected data
    • Timed (T-data, e.g. video)
    • Untimed (U-data, e.g. electronic slides)
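The transitive composition above can be sketched in a few lines. This is an illustrative example, not the paper's code: offsets along a synchronization chain add, and so do their worst-case error bounds.

```python
# Sketch: composing pairwise synchronization offsets transitively.
# Aligning A with B and B with C yields an A-to-C offset whose
# worst-case error is the sum of the pairwise errors.

def compose_sync(offset_ab, err_ab, offset_bc, err_bc):
    """Return the transitive offset and error bound from A to C."""
    offset_ac = offset_ab + offset_bc   # offsets add along the chain
    err_ac = err_ab + err_bc            # worst-case errors accumulate
    return offset_ac, err_ac

# A leads B by 1.20 s (+/- 0.03 s); B leads C by -0.40 s (+/- 0.02 s)
offset, err = compose_sync(1.20, 0.03, -0.40, 0.02)
```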
Problems Handled (Contd.)
• Synchronization
  – Timed-timed synchronization (TTS)
    • Two video streams
  – Timed-untimed synchronization (TUS)
    • Slides with video
  – Untimed-untimed synchronization (UUS)
    • Slide titles: parsing the HTML produced by PowerPoint
• Automatic editing
  – Rule-based structuring of the synchronized data
Timed-Timed Synchronization
• Temporal link between streams captured from independent cameras

[Figure: streams V1(t) and V2(t) with offsets Δ1 and Δ2 to a common synchronization point]

• V1(t + Δ) corresponds to V2(t ± ε), within a tolerance ε
• To solve this, consider one or more synchronization points
• Δ = Δ1 - Δ2; maximum uncertainty ε = ε1 + ε2
Timed-Timed Synchronization (Contd.)
• Artificial creation of a synchronization point: a tone of 1 second duration
• A sound card generates the tone, recorded onto one audio channel of each camera's MPEG stream
• The speaker's wireless mic receiver supplies the speaker audio on the other channel
• Later, the position of the tone is detected in each stream

[Figure: the sound card feeds the sync tone to one channel (left/right) of each camera machine's MPEG audio; a wireless mic receiver feeds the speaker audio to the other channel]
Timed-Timed Synchronization (Contd.)
• Detection of the synchronization tone
  – Brute-force approach: fully decode the MPEG audio
  – Proposed method
    • Scale factors indicate the overall volume of audio packets
    • Sum the scale factors for a volume estimate
    • Detect where the sum exceeds a threshold
    • Assuming MPEG-2: worst-case error 26 ms (22.5 × 1152 microseconds), max error 52 ms
    • Video at 30 FPS requires ε < 1/30 s
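The proposed method can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the per-frame scale-factor sums have already been extracted from the MPEG audio packets, and simply locates the first frame whose loudness estimate crosses a threshold.

```python
# Illustrative sketch: locate a loud sync tone from per-frame volume
# estimates. Each MPEG audio frame covers 1152 samples; summing its
# scale factors approximates loudness without a full decode. The
# scale-factor sums here are assumed to be precomputed.

FRAME_SAMPLES = 1152

def find_tone(scale_sums, threshold, sample_rate):
    """Return the start time (s) of the first frame whose scale-factor
    sum exceeds the threshold, or None if no tone is found."""
    for i, s in enumerate(scale_sums):
        if s > threshold:
            return i * FRAME_SAMPLES / sample_rate
    return None

# Quiet frames followed by the 1 s tone (large scale-factor sums).
sums = [12, 15, 11, 240, 250, 245, 14]
t = find_tone(sums, threshold=100, sample_rate=44100)
```

The resolution of this estimate is one audio frame (1152 samples), which matches the 26 ms worst-case error quoted above.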
Timed-Timed Synchronization (Contd.)
• Tighter bound (at the 22.5 kHz sample level)
  – Error ≤ 1/22.5 kHz ≈ 44 µs, well below the 26 ms packet-level bound
  – For video at 15 FPS, the max tolerable error is 66 ms
  – Using this in the MPEG system, a tone in 70 seconds of audio can be located in under 2 seconds
Timed-Untimed Synchronization
• Synchronization of the slides with one of the video streams
• Given slides S = {S1, S2, ..., Sn}, find f(t): [0, d] → S, where d is the duration of the video V(t)
• Use a tolerance of 0.5 s for the synchronization
Timed-Untimed Synchronization (Contd.)
• Segmentation of slide changes from the video V(t)
  – Find T = {t0, t1, ..., tk-1} such that the slide image does not change within (ti, ti+1)
• Color histograms fail
  – Slides share the same background
  – Low resolution
• Feature-based algorithm
  – Clip frames, low-pass filter, adaptively threshold
  – Let B1 and B2 be two consecutive processed frames; declare a change when Dist(B1, B2) = d1/b1 + d2/b2 exceeds a threshold T (di: unmatched black pixels of Bi; bi: black pixels of Bi)
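A minimal sketch of this distance on tiny binary frames, under the reconstruction above (the paper's exact formula may differ). Frames are lists of rows with 1 = black pixel; "unmatched" here simply means black in one frame but not the other.

```python
# Hedged sketch of the feature-based frame distance: d_i counts black
# pixels of one frame missing from the other, b_i counts black pixels.
# Dist(B1, B2) = d1/b1 + d2/b2, per the reconstructed slide formula.

def frame_distance(b1, b2):
    blacks1 = {(r, c) for r, row in enumerate(b1)
               for c, v in enumerate(row) if v}
    blacks2 = {(r, c) for r, row in enumerate(b2)
               for c, v in enumerate(row) if v}
    d1 = len(blacks1 - blacks2)   # black in B1, missing in B2
    d2 = len(blacks2 - blacks1)   # black in B2, missing in B1
    return d1 / len(blacks1) + d2 / len(blacks2)

same = [[1, 1], [0, 1]]
moved = [[1, 0], [1, 1]]
# identical frames -> distance 0; differing frames -> positive distance
```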
Timed-Untimed Synchronization (Contd.)
– Assumption: slides contain a dark foreground on a light background
– Applied to I-frames of the MPEG video at 0.5 s intervals
• Matching
  – Candidate changes are matched against the original slides for confirmation
  – Similarity > 95%: match declared & search terminated
  – Similarity > 90%: the highest-similarity slide is returned
  – Otherwise the frame is too noisy to match
Timed-Untimed Synchronization (Contd.)
• Unwrapping
  – The video frames contain a foreshortened version of the slides
  – Map quadrilateral F → rectangle (same size as the original slide)
  – Camera & projector are fixed, so the corner points of F stay the same
  – A perspective transform maps F onto the rectangle
  – Bilinear interpolation fills the rectangle
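The interpolation step can be sketched compactly. This is an assumption-laden illustration: the perspective transform H is assumed to be computed elsewhere from the four fixed corners of F (e.g. with OpenCV's getPerspectiveTransform in a real system); only the bilinear sampling of a back-projected point is shown here.

```python
# Minimal sketch of the sampling used in unwrapping: each output pixel
# is mapped back into the source frame and read off with bilinear
# interpolation between its four surrounding pixels.

def bilinear(img, x, y):
    """Sample image img (list of rows) at fractional coords (x, y)."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

img = [[0, 10],
       [20, 30]]
# Sampling midway between all four pixels averages them.
```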
Timed-Untimed Synchronization (Contd.)

[Figure-only slide]
Timed-Untimed Synchronization (Contd.)
• Similarity
  – Based on the Hausdorff distance
  – Dilate (radius 3) the black pixels of the original binary slide image
  – Setting every pixel within the dilation radius of any black pixel to black gives the dilated image G, used to count overlap
  – b: number of black pixels in the extracted image F
  – b': number of black pixels of F that fall on black pixels of G
  – Forward match ratio = b' / b
  – The reverse match ratio is calculated similarly by dilating F and keeping G undilated
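The forward match ratio above can be sketched on small pixel sets. One assumption to flag: the dilation here is a square (Chebyshev) neighborhood for simplicity, whereas the paper's dilation may use a disc.

```python
# Sketch of the dilation-based forward match ratio: the original
# slide's black pixels are dilated by a radius, and the ratio is the
# fraction of the extracted image's black pixels landing inside the
# dilated original.

def dilate(blacks, radius, h, w):
    out = set()
    for (r, c) in blacks:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                if 0 <= r + dr < h and 0 <= c + dc < w:
                    out.add((r + dr, c + dc))
    return out

def forward_match_ratio(extracted, original, radius, h, w):
    g = dilate(original, radius, h, w)   # dilated original image G
    b = len(extracted)                   # black pixels in F
    b_hit = len(extracted & g)           # black pixels of F inside G
    return b_hit / b

F = {(2, 2), (2, 3), (7, 7)}             # extracted slide pixels
G0 = {(2, 2), (2, 4)}                    # original slide pixels
```

The reverse ratio would swap the roles: dilate F and count how much of G0 it covers.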
Timed-Untimed Synchronization (Contd.)
• Evaluation
  – 106 slides, 143 transitions
  – Accuracy 97.2%
  – Needs tuning for dark backgrounds with a light foreground
Automatic Editing
• Combines the captured videos into a single stream
• Constraints
  – Footage from the overview camera must be shown 3 s before and 5 s after a slide change
  – 3 s < any shot < 25 s
• A heuristic algorithm produces an Edit Decision List (EDL)
  – Each shot is taken from one video source
  – Consecutive shots come from different video sources
  – A shot records its start time, duration, and video source
  – Concatenating the footage of the shots yields the final edited video
Automatic Editing (Contd.)

[Figure-only slide]
Automatic Editing (Contd.)
• Shots from the overview camera shorter than 3 s and separated from the tracking camera are merged
• Shots from the tracking camera longer than 25 s are broken into 5 s shots
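The two post-processing rules can be sketched on a list of (source, duration) shots. This is a hedged approximation, not the paper's algorithm: the shot representation, the merge target, and the tie-breaking are all assumptions made for illustration.

```python
# Sketch of EDL post-processing: rule 1 merges a too-short overview
# shot into its predecessor; rule 2 cuts an over-long tracking shot
# into 5 s pieces. Total duration is preserved.

MIN_SHOT, MAX_SHOT, PIECE = 3.0, 25.0, 5.0

def postprocess(shots):
    out = []
    for src, dur in shots:
        if src == "overview" and dur < MIN_SHOT and out:
            prev_src, prev_dur = out[-1]
            out[-1] = (prev_src, prev_dur + dur)   # merge into previous shot
        elif src == "tracking" and dur > MAX_SHOT:
            while dur > PIECE:                     # split long tracking shot
                out.append(("tracking", PIECE))
                dur -= PIECE
            if dur > 0:
                out.append(("tracking", dur))
        else:
            out.append((src, dur))
    return out

edl = [("tracking", 10.0), ("overview", 2.0), ("tracking", 27.0)]
result = postprocess(edl)
```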
Conclusion
• An automatic synchronization and editing system
• Classification of the different kinds of synchronization
• Slide change detection for dark foreground on light background (the textual part)
• Slide identification confirms slide change detection
• Rotation and translation can affect the matching
Future Work
• Motion vector analysis and scene cut detection (to trigger a switch to the overview camera)
• Automatic enhancement of poor lighting
• Use of the speaker's orientation and position for editing
• Shots from more cameras
• Use of blackboards, whiteboards, and transparencies
Looking at Projected Documents:
Event Detection & Document Identification
Introduction
• Documents play a major role in presentations, meetings, lectures, etc.
• Captured as a video stream or as images
1. Temporal segmentation of meetings based on projected document events:
   – Inter-document (slide change, etc.)
   – Intra-document (animation, scrolling, etc.)
   – Extra-document (pointing sticks, laser beams, etc.)
2. Identification of the extracted low-resolution document images
• Goal: annotation & retrieval using the visible documents
Motivation
• Detection & identification from low-resolution devices
• Current focus on projected documents
• Extendable to documents on a table
• Captured as a video stream (web-cam)
Slide Change Detection
• Slides in a slideshow share the same layout, background, pattern, etc.
• The web-cam is auto-focusing (nearly 400 ms until a stable image)
• Lighting conditions vary
• Presentation slides are captured as a video stream
Slide Change Detection (Cont’d)

[Figure: different slides with similar text layout; fading during the auto-focusing period]
Slide Change Detection (Cont’d)
• Existing methods for scene cut detection
  – Histogram (color and gray)
  – Cornell method (Hausdorff distance)
• Histogram methods fail due to: a) low resolution, b) low contrast, c) auto-focusing, d) fading
• Cornell: uses identification to validate the changes
• Fribourg method: slide stability
  – Assumption: a slide is visible for at least 2 seconds; shorter appearances are slide skipping
Proposed Slide Change Detection

[Figure: frames x0, x1, ..., xN-1 sampled over 2 s stability windows, followed by a 0.5 s confirmation step]

• Check for stability: Dist{S(t), S(t+2)} < T1, for t = 0, 0.5, 1, 1.5, ..., D
• Confirmation: with X = {x0, x1, ..., xN-1}, the slide is confirmed stable when Var(X) / E(X) < T2
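The confirmation statistic above can be sketched directly. This follows the reconstruction of the garbled slide formula, so the exact statistic and thresholds are assumptions; the idea is that frame-difference samples over the window should be both small and steady.

```python
# Sketch of the stability confirmation: the variance-to-mean ratio of
# frame-difference samples across the window must fall below a
# threshold for the slide to be confirmed stable.

def is_stable(diffs, threshold):
    """diffs: frame-difference values sampled across the window."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    if mean == 0:                      # identical frames: trivially stable
        return True
    return var / mean < threshold

steady = [4.0, 4.1, 3.9, 4.0]          # slide is being shown
changing = [1.0, 40.0, 2.0, 35.0]      # slide transition in progress
```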
Ground-Truth Preparation
• Based on SMIL
• 300 slideshows collected from the web
• Automatic generation of the SMIL file, with a random duration for each slide
• Contains slide id, start time, stop time, and type (skip or normal)
Evaluation
1. Ground truth: SMIL → XML
2. Slideshow video → slide change detection → XML
3. Evaluation: compare 1 & 2
4. Metrics used: recall (R), precision (P), F-measure (F)

An example of a ground-truth SMIL file:

<slide id="1" imagefile="Slide1.JPG" st="0000000" et="9.641000" type="normal" />
<slide id="2" imagefile="Slide2.JPG" st="9.641000" et="12.787199" type="normal" />
<slide id="3" imagefile="Slide15.JPG" st="12.787199" et="13.775500" type="skip" />
<slide id="4" imagefile="Slide11.JPG" st="13.775500" et="14.341699" type="skip" />
<slide id="5" imagefile="Slide25.JPG" st="14.341699" et="15.885400" type="skip" />
<slide id="6" imagefile="Slide20.JPG" st="15.885400" et="16.476199" type="skip" />
<slide id="7" imagefile="Slide9.JPG" st="16.476199" et="18.094100" type="skip" />
<slide id="8" imagefile="Slide3.JPG" st="18.094100" et="23.160102" type="normal" />
<slide id="9" imagefile="Slide4.JPG" st="23.160102" et="26.523102" type="normal" />
...
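Step 3 and the metrics of step 4 can be sketched as follows. The matching policy (greedy, each ground-truth change matched at most once, with a time tolerance) is an assumption for illustration, not necessarily the evaluation protocol used here.

```python
# Sketch: compare detected slide-change times against ground truth
# within a tolerance, then compute recall, precision, and F-measure.

def evaluate(detected, truth, tol):
    matched, used = 0, set()
    for d in detected:
        for i, t in enumerate(truth):
            if i not in used and abs(d - t) <= tol:
                used.add(i)                  # each truth matched once
                matched += 1
                break
    recall = matched / len(truth)
    precision = matched / len(detected)
    f = 2 * precision * recall / (precision + recall) if matched else 0.0
    return recall, precision, f

truth = [9.64, 12.79, 18.09]
detected = [9.70, 18.00, 25.00]              # one spurious detection
r, p, f = evaluate(detected, truth, tol=0.2)
```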
Results

[Figure panels (slide change detection performance):
 – 1-frame tolerance: Fribourg (R: 0.84, P: 0.82, F: 0.83), Cornell (R: 0.40, P: 0.21, F: 0.23)
 – 1-frame tolerance: Color Hist (R: 0.07, P: 0.04, F: 0.05), Gray Hist (R: 0.18, P: 0.12, F: 0.13)
 – 1-frame tolerance: R: 0.80, P: 0.83, F: 0.81
 – 4-frame tolerance: R: 0.92, P: 0.96, F: 0.93]
Results (Cont’d)
• 4-frame tolerance: Fribourg (R: 0.93, P: 0.91, F: 0.92), Cornell (R: 0.80, P: 0.51, F: 0.54), Color Hist (R: 0.13, P: 0.09, F: 0.10), Gray Hist (R: 0.27, P: 0.17, F: 0.19)
Low-resolution Document Identification
• Difficulties in identification
  – Hard to use existing document analysis systems (DAS) at 50-100 dpi
  – OCR performance is very poor
  – Hard to extract the complete layout (physical, logical)
  – Rotation, translation, and resolution affect global image matching
  – Captured images vary: lighting, flash, distance, auto-focusing, motion blur, occlusion, etc.
Proposed Document Identification
• Based on a visual signature
  – Shallow layout with zone labeling
  – Hierarchically structured using the features' priority
• Identification: matching of signatures
• Matching: simple heuristics following the hierarchy of the signature
Visual Signature Extraction
• Normalize to a common resolution; apply RLSA
• Zone labeling (text, image, solid bars, etc.)
• Block separation via projection profiles
• Text blocks (one line per block)
• Bullet and vertical text line extraction
Visual Signature Extraction
• Feature vector for images, bars (horizontal and vertical), and bullets: (Ymin, Xmin, Hmax, Wmax, P)
• Feature vector for each text line and each bar with text (horizontal and vertical): (Ymin, Xmin, Hmax, Wmax, Nword, Ri(Yi, Xi), P)

[Figure: bounding boxes of the various features]
Structuring the Visual Signature
– The hierarchy depends on the extraction process & real-world slideshows
– Narrows the search path during matching

<VisualSign>
  <BoundingBox NoOfBb="10">
    <Text NoOfLine="7">
      <HasHorizontalText NoOfSentence="7">
        <S y="53" x="123" width="436" height="25" NoOfWords="4" PixelRatio="0.40" />
        ...
      </HasHorizontalText>
      <HasVerticalText NoOfSentence="0" />
    </Text>
    <HasImage NoOfImage="3">
      <Image y="1" x="16" width="57" height="533" PixelRatio="0.88" />
      ...
    </HasImage>
    <HasBullet NoOfBullets="2">
      <Bullet y="122" x="141" width="12" height="12" PixelRatio="1.0" />
      ...
    </HasBullet>
    <Line NoOfLine="0">
      <HasHLine NoOfLine="0" />
      <HasVLine NoOfLine="0" />
    </Line>
    <BarWithText NoOfBar="0">
      <HBarWithText NoOfBar="0" />
      <VBarWithText NoOfBar="0" />
    </BarWithText>
  </BoundingBox>
</VisualSign>
Structured Signature-based Matching
• Search technique:
  – Takes advantage of the hierarchical structure of the visual signature
  – Higher-level features are compared first; lower-level features are matched only afterwards

[Figure: tree representation of the features f1-f8 in the visual signature — the bounding box (Bbox) at the root, branching into Text (H-Text, V-Text), Image, Bullets, Line (H-Line, V-Line), and Text Bar (HBarText, VBarText)]
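The coarse-to-fine matching idea can be sketched on simplified signatures. The dict representation and the particular features compared are assumptions for illustration; the real signature is the hierarchical XML structure shown earlier.

```python
# Sketch of hierarchical signature matching: cheap top-level counts
# are compared first, and the per-feature comparison runs only when
# they agree, which narrows the search path.

def match(query, candidate):
    # Level 1: top-level counts must agree before any detailed work.
    for key in ("n_text_lines", "n_images", "n_bullets"):
        if query[key] != candidate[key]:
            return False
    # Level 2: lower-level features (here, word counts per text line).
    return query["words_per_line"] == candidate["words_per_line"]

def identify(query, repository):
    """Return ids of repository slides whose signature matches."""
    return [sid for sid, sig in repository.items() if match(query, sig)]

repo = {
    "slide1": {"n_text_lines": 7, "n_images": 3, "n_bullets": 2,
               "words_per_line": [4, 6, 5, 3, 7, 2, 4]},
    "slide2": {"n_text_lines": 5, "n_images": 0, "n_bullets": 1,
               "words_per_line": [3, 3, 2, 6, 1]},
}
q = dict(repo["slide1"])
```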
Results
• Evaluation based on recall and precision
• ~200 slide images (web-cam) queried against a repository of 300 slides: R: 0.94, P: 0.80, F: 0.86

[Figure: matching performance]
Conclusion
• Proposed slide change detection
  – Automatic evaluation
  – Performance: best compared to the state of the art
  – Lower time and computational complexity
  – Overcomes the auto-focusing and fading behavior of the web-cam
  – Accuracy improved over Cornell at low tolerance
  – Could be used for meeting indexing: high precision
Conclusion
• Proposed slide identification:
  – Based on a visual signature
  – No classifier needed
  – Fast: only signature matching (no global image matching)
  – No OCR required
  – Could help real-time applications (translation, mobile OCR, etc.)
  – Applicable to digital cameras and mobile phones
• Finally: documents as a means of indexing & retrieval
Future Work
• Detection and identification of pointed-at and partially occluded documents
• Identification with complex background structure
• Evaluation with digital cameras and mobile phones
• Adding background pattern and color information to the visual signature
• Identification of documents on a table
• Evaluation of animations
Possible Projects
• Deformation correction (Perspective, Projective, etc.)
• Automatic detection of projected documents in the captured video
• Detection of occluded objects
• Background pattern recognition
Thank You!