AnnoTone: Record-time Audio Watermarking
for Context-aware Video Editing
RYOHEI SUZUKI
DAISUKE SAKAMOTO
TAKEO IGARASHI
THE UNIVERSITY OF TOKYO
CHI 2015 @ Seoul
Session: What do I hear? Communicating with Sound
Video Editing is Still Difficult
Why?
1. The cost of learning video authoring tools is high
2. Context-aware editing requires much labor
for careful review and trial and error
• Adding visual effects
• Clipping scenes
• Adding captions and overlays
• Using additional information (e.g., GPS)
Our Objective
Annotating videos with contextual information
during recording to facilitate video editing
1. Automate & speed up video editing
2. Enhance expression using additional data
In this talk, we propose
1. A video-annotation technique requiring
no special equipment
2. A video-editing workflow that exploits
contextual information for efficient editing.
Core Ideas
• Encoding contextual information as
inaudible sound signals
• Embedding encoded annotations directly
into the audio track of video during recording
• Extracting the embedded information
on demand during editing
1. Hardware Setup
• Attach a smartphone to the video camera
• Launch the annotation-embedding application
[Figure: attaching the smartphone; launching the application]
2. Video Recording
• Gathering annotations from user input or sensors
• Converting them into inaudible audio signals
[Figure labels: User Input, Sensors, Scene, Annotation Signals]
Workflow Overview
• Extract embedded annotation from audio track
• Remove annotation signals after editing
Editing Pipeline
Generally, video editing involves
a pipeline of sequential processes.
[Figure: pipeline stages: Adding Captions & Effects, Color Correction, Clipping, …]
Editing Pipeline
An annotated audio track can pass through
the existing pipeline like an ordinary one.
[Figure: pipeline stages: Adding Captions & Effects, Color Correction, Clipping, …]
Annotation Extraction
Annotation data is extracted on demand
using our Watermark Extraction API.
[Figure: pipeline stages (Adding Captions & Effects, Color Correction, Clipping, …) with a Watermark Extractor producing Annotation Data]
Annotation Removal
After the process, annotation signals
can be removed by applying an audio filter.
[Figure: pipeline stages (Adding Captions & Effects, Audio Mastering, Clipping, …) with an Audio Filter]
Record-time Editing
Recording: success/failure information
Editing: automatic extraction of successful parts
[Figure: recording timeline with segments marked Success ("Good!"), Failure ("Bad!"), Success ("Good!"); the successful segments are automatically extracted and combined]
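The extraction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each annotation is a hypothetical `(time, label)` pair that marks the end of a take, and that a take begins where the previous mark was placed.

```python
def successful_segments(marks, start_time=0.0):
    """Return (start, end) time ranges of takes marked "success".

    marks: list of (time_seconds, label) sorted by time, where label is
    "success" or "failure" (hypothetical annotation format).
    Back-to-back successful takes are merged into one segment.
    """
    segments = []
    previous = start_time
    for time, label in marks:
        if label == "success":
            if segments and segments[-1][1] == previous:
                # Extend the previous segment instead of starting a new one.
                segments[-1] = (segments[-1][0], time)
            else:
                segments.append((previous, time))
        previous = time
    return segments


marks = [(10.0, "success"), (18.0, "failure"), (30.0, "success"), (42.0, "success")]
segments = successful_segments(marks)
```

The resulting ranges could then be handed to any editor or script that cuts and concatenates clips by time.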
Video-editing with GPS
Recording: GPS positions
Editing: location-aware editing
[Figure: clipping a movie by sketching on a map; automatic map overlay]
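One way to picture location-aware clipping: treat the sketched region as a polygon and keep the time ranges whose GPS samples fall inside it. The sketch below is an assumption about how this could work, not the presented system; the `(time, lon, lat)` track format and both function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clips_inside(track, polygon):
    """Return (start, end) times of consecutive samples inside the polygon.

    track: list of (time, lon, lat) tuples sorted by time.
    """
    clips = []
    start = last = None
    for t, lon, lat in track:
        if point_in_polygon(lon, lat, polygon):
            if start is None:
                start = t
            last = t
        elif start is not None:
            clips.append((start, last))
            start = None
    if start is not None:
        clips.append((start, last))
    return clips


region = [(139.0, 35.0), (140.0, 35.0), (140.0, 36.0), (139.0, 36.0)]
track = [(0, 138.5, 35.5), (1, 139.2, 35.5), (2, 139.8, 35.6),
         (3, 140.5, 35.5), (4, 139.5, 35.2)]
clips = clips_inside(track, region)
```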
Automatic Overlaying
Recording: chess notation of a game
Editing: automatic overlaying of board graphics
[Figure: notation UI; synthesized video]
Integrating with AfterEffects
AnnoTone plugin provides annotation data for AE
which can be used for generating effects
Exploiting annotations with existing practice
Controlling AE animation
with sensor data
Integrating with AfterEffects
1. Analyzing footage to extract annotations
2. Generating a text layer containing JSON-formatted
annotation data for each timeframe
3. Associating video effects/parameters with
annotations using the expressions mechanism
[Figure: Footage; JSON text layer; Effect control (JavaScript)]
[{x: 138.0019,y: 38.13840},{x: 139.0133,y: 38.43405}]…
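Step 2 can be illustrated with a small sketch that produces one JSON string per video frame from timestamped annotation samples. The slides do not specify the plugin's actual data layout, so the `time`/`x`/`y` keys and both function names here are assumptions for illustration only.

```python
import bisect
import json


def annotation_at(samples, t):
    """Latest sample at or before time t (samples sorted by "time")."""
    times = [s["time"] for s in samples]
    i = bisect.bisect_right(times, t) - 1
    return samples[max(i, 0)]


def json_per_frame(samples, fps, num_frames):
    """One JSON string per frame, as might be placed in a text layer."""
    frames = []
    for f in range(num_frames):
        s = annotation_at(samples, f / fps)
        frames.append(json.dumps({"x": s["x"], "y": s["y"]}))
    return frames


samples = [
    {"time": 0.0, "x": 138.0019, "y": 38.1384},
    {"time": 1.0, "x": 139.0133, "y": 38.43405},
]
frames = json_per_frame(samples, fps=2, num_frames=3)
```

An expression in the editing tool could then parse the text for the current frame and drive an effect parameter from it.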
Human Hearing Characteristics
Humans cannot perceive very high-frequency sounds.
Sakamoto, Masayuki, et al. "Average thresholds in the 8 to 20 kHz range as a function of age."
Scandinavian Audiology 27.3 (1998): 189-192.
Data-hiding as High-frequency
Audio Signals
We can hide information in the audio track
as high-frequency signals (audio watermarks).
[Figure: frequency axis (Hz) marked at 20, 18k, 20k, and 22k, comparing the range recordable by a microphone, the range audible to humans, and the high-frequency range]
Data-hiding as High-frequency
Audio Signals
[Figure: spectrogram of an audio track, with the hidden information in the high-frequency region (almost inaudible)]
Benefit of Audio Watermarking
• Compatible with almost all video cameras
• Consistent synchronization between
annotations and video sequence
• Removable by applying a low-pass filter
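The removal step can be sketched with a plain windowed-sinc FIR low-pass filter. This is an illustration under assumptions, not the filter used by the authors: the 44.1 kHz sample rate, the 17 kHz cutoff, and the 101-tap Hamming design are all hypothetical choices.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed recording rate


def lowpass_taps(cutoff_hz, num_taps=101):
    """Windowed-sinc FIR low-pass coefficients (Hamming window)."""
    fc = cutoff_hz / SAMPLE_RATE  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at 0 Hz


def fir_filter(samples, taps):
    """Direct-form convolution; output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * samples[i - j]
        out.append(acc)
    return out


def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def tone(freq_hz, n=2000):
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


taps = lowpass_taps(17000)
speech_band = fir_filter(tone(1000), taps)      # stands in for ordinary audio
watermark_band = fir_filter(tone(19000), taps)  # stands in for a watermark tone
```

Any genuine audio content above the cutoff is removed along with the watermark, which is the quality trade-off listed under Limitations.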
Watermarking Protocol
• Dual-Tone Multi-Frequency (DTMF)
– Representing 4 bits of information by a combination of
two single tones from 7 frequencies
• Packet representation
– Variable-length payload
– 400 bps gross data rate
[Figure: spectrogram of a watermark packet]
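The two-tone symbol idea can be sketched as below. The slides give only "two tones from 7 frequencies" and "400 bps", so everything else is an assumption made for illustration: the seven carrier frequencies (18.0 to 19.8 kHz in 300 Hz steps), the mapping of nibbles to the first 16 of the 21 possible tone pairs, the 10 ms symbol length, and Goertzel-based detection. Packet framing is not modeled.

```python
import math
from itertools import combinations

SAMPLE_RATE = 44100    # Hz; assumed recording rate
SYMBOL_SAMPLES = 441   # 10 ms per 4-bit symbol at 44.1 kHz -> 400 bps gross
# Seven hypothetical carrier frequencies in the near-inaudible band.
FREQS = [18000 + 300 * i for i in range(7)]
# Choosing 2 of 7 tones gives 21 pairs; the first 16 encode one nibble each.
PAIRS = list(combinations(range(7), 2))[:16]


def synthesize(nibble):
    """Samples for one symbol: the sum of two sine tones."""
    a, b = PAIRS[nibble]
    return [
        0.5 * math.sin(2 * math.pi * FREQS[a] * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * FREQS[b] * t / SAMPLE_RATE)
        for t in range(SYMBOL_SAMPLES)
    ]


def goertzel_power(samples, freq):
    """Power of a single frequency component (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect(samples):
    """Pick the two strongest carriers and map them back to a nibble."""
    powers = [goertzel_power(samples, f) for f in FREQS]
    strongest = sorted(range(7), key=lambda i: powers[i])[-2:]
    return PAIRS.index(tuple(sorted(strongest)))


def encode_bytes(data):
    """Split each byte into high and low nibbles and concatenate symbols."""
    samples = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            samples.extend(synthesize(nibble))
    return samples


def decode_bytes(samples):
    """Inverse of encode_bytes for a clean, symbol-aligned signal."""
    nibbles = [
        detect(samples[i:i + SYMBOL_SAMPLES])
        for i in range(0, len(samples), SYMBOL_SAMPLES)
    ]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


signal = encode_bytes(b"GPS")
recovered = decode_bytes(signal)
```

A real decoder would also need symbol synchronization and tolerance to background sound, which this clean-signal round trip does not address.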
ContextCam [Patel & Abowd, 2004]
Incompatible with existing video cameras.
Using a special camera to record the context of home videos
Storing annotations in frames by image watermarking
Cryptone (Ultra Sound Control) [Hirabayashi & Shimizu, 2012]
AnnoTone uses a similar audio data-hiding method
to support video editing.
Interaction between loudspeaker and smartphones
using high-frequency tones to convey information
Data-rate vs. Reliability
[Figure: correct detection rate (%) versus gross bitrate (667, 571, 500, 444, 400, 364 bps); series: silent, public, rock, electronic]
~100% correct detection rate was achieved
with a 400 bps annotation data rate.
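The bitrates on the chart's axis are consistent with 4-bit symbols whose duration is varied in 1 ms steps. The symbol durations of 6 to 11 ms are an inference from the numbers, not something stated on the slides; the short check below reproduces the axis values under that assumption.

```python
BITS_PER_SYMBOL = 4  # from the protocol slide: one two-tone symbol carries 4 bits


def gross_bitrate(symbol_ms):
    """Gross bit rate in bps for a given symbol duration in milliseconds."""
    return BITS_PER_SYMBOL * 1000 / symbol_ms


# Assumed symbol durations of 6..11 ms (hypothetical, inferred from the axis).
rates = [round(gross_bitrate(ms)) for ms in range(6, 12)]
```

Under this reading, the 400 bps operating point corresponds to 10 ms per symbol.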
Travel Distance
The watermark signal can travel up to 20 cm through air from a smartphone speaker.
[Figure: correct detection rate (%) versus distance between speaker and microphone (0 to 30 cm); series: silent, public, rock, electronic]
Durability against Conversion
Watermarks are preserved after conversion into
Ogg Vorbis, AC-3 and AAC with a sufficient bitrate.
[Figure: correct detection rate (%) versus bit rate (128, 192, 256, 320 kbps); series: MP3, Ogg Vorbis, AC-3, AAC]
Transparency for Human Ear
Measured how noticeable the watermarks are to humans
• Participants clicked a button when they noticed noise (6 participants)
[Figure: noticed watermark rate (%) for silent, public, rock, and electronic conditions; series: Before Erasure, After Erasure]
Limitations
• One-off development of
annotation-embedding applications
• Audio quality loss in watermark removal
• Limited data-rate of annotation
Embedding from Public Speaker
• Synchronization & integration of a large number
of videos to create multi-view videos, etc.
• Entertainment use at amusement parks, etc.
“Sleeping Beauty Castle at Disneyland” by Lyght
Licensed under CC BY-SA 3.0
“Picture of Stadium” by Jazza5
Licensed under CC BY-SA 3.0
We proposed
a video annotation technique using audio watermarking,
and a video-editing workflow exploiting annotations.
Benefit
AnnoTone can facilitate and enhance the non-professional
video editing process without special equipment.
Compared with
Smartphone Recording
Some smartphone camera apps can record
annotations in a metadata format (e.g., Adobe XMP)
– Of course, using such apps is a sensible choice when
recording with a smartphone
What is AnnoTone's advantage?
• Dedicated video cameras are still superior to
smartphone cameras
– In resolution, definition, lens quality, etc.
• No need to deal with external metadata
– Because annotations are directly embedded as sound