AnnoTone (CHI 2015)

AnnoTone: Record-time Audio Watermarking

for Context-aware Video Editing

RYOHEI SUZUKI

DAISUKE SAKAMOTO

TAKEO IGARASHI

THE UNIVERSITY OF TOKYO

CHI 2015 @ Seoul

Session: What do I hear? Communicating with Sound

1

Video recording and sharing have become

casual hobbies for everyone.

2

Camera Computer

Software Broadcasting

3

Video Editing is Still Difficult

4

Why?

1. Cost of learning video authoring tools is high

2. Context-aware editing requires much labor

for careful review and trial-and-error

• Adding visual effects

• Clipping scenes

• Adding captions and overlays

• Using additional information (e.g., GPS)

Our Objective

Annotating videos with contextual information

during recording to facilitate video editing

5

1. Automate & speed up video editing

2. Enhance expressions using additional data

In this talk, we propose

1. A video-annotation technique requiring

no special equipment

2. A video-editing workflow that exploits

contextual information for efficient editing.

6

Core Ideas

• Encoding contextual information as

inaudible sound signals

• Embedding encoded annotations directly

into the audio track of video during recording

• Extracting the embedded information

on demand during the editing process

7

Annotation Embedding

with Smartphone

8

1. Hardware Setup

• Attach smartphone to video camera

• Launch annotation-embedding application

Attaching Launching application

2. Video Recording

• Gathering annotations from user input or sensors

• Converting them into inaudible audio signals

User Input Sensors

Scene

Annotation Signals

10

Editing Workflow with

Embedded Annotations

11

Workflow Overview

12

• Extract embedded annotation from audio track

• Remove annotation signals after editing

Editing Pipeline

Generally, video editing involves

a sequence of pipelined processes.

Adding Captions & Effects

Color Correction

Clipping… …

13

Editing Pipeline

An annotated audio track can pass through

the existing pipeline like an ordinary one.


Color Correction

Clipping… …

14

Annotation Extraction


Color Correction

Clipping… …

15

Annotation data is extracted on demand

using our Watermark Extraction API

Watermark Extractor

Annotation Data
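
The slides do not show the Watermark Extraction API itself. As a rough illustration of on-demand extraction, the sketch below detects which two tones are present in one short audio frame using the Goertzel algorithm. The candidate frequencies, the 10 ms frame length, and the function names are assumptions made for this example, not details from the talk.

```python
import math

SAMPLE_RATE = 44100
# Hypothetical candidate tones; the talk does not list the actual frequencies.
FREQS_HZ = [18000 + 300 * i for i in range(7)]

def goertzel_power(samples, freq_hz):
    """Power of one frequency component, computed with the Goertzel algorithm."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    coeff = 2 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_tone_pair(samples):
    """Return the two strongest candidate frequencies, in ascending order."""
    ranked = sorted(FREQS_HZ, key=lambda f: goertzel_power(samples, f), reverse=True)
    return tuple(sorted(ranked[:2]))

# Usage: a 10 ms test frame (441 samples) containing two of the candidate tones.
frame = [0.25 * (math.sin(2 * math.pi * 18300 * t / SAMPLE_RATE)
                 + math.sin(2 * math.pi * 19500 * t / SAMPLE_RATE))
         for t in range(441)]
detected = detect_tone_pair(frame)
```

A real extractor would also need packet framing and error checking, which this sketch leaves out.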

Annotation Removal


Audio Mastering

Clipping… …

16

After the process, annotation signals

can be removed by applying an audio filter.

Audio Filter
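
The removal step can be illustrated with a plain low-pass filter. This is a minimal sketch, assuming a 16 kHz cutoff, a 101-tap Hamming-windowed sinc design, and a 19 kHz test tone standing in for a watermark; none of these values come from the talk, and the authors' actual filter may differ.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Design a Hamming-windowed sinc low-pass FIR filter with unity gain at DC."""
    fc = cutoff_hz / sample_rate  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def apply_fir(samples, taps):
    """Convolve the signal with the taps; the output has the input's length."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if i - k >= 0:
                acc += tap * samples[i - k]
        out.append(acc)
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Usage: a 1 kHz tone (ordinary audio) and a 19 kHz tone (stand-in watermark).
SAMPLE_RATE = 44100
taps = lowpass_fir(cutoff_hz=16000, sample_rate=SAMPLE_RATE)
audible = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(2000)]
mark = [math.sin(2 * math.pi * 19000 * t / SAMPLE_RATE) for t in range(2000)]
kept = rms(apply_fir(audible, taps)[200:])    # skip the filter's start-up transient
removed = rms(apply_fir(mark, taps)[200:])
```

The filter also removes any genuine audio content above the cutoff, which corresponds to the quality loss listed under Limitations.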

Applications

17

Record-time Editing

Recording: success/failure information

Editing: Automatic extraction of successful parts

Recording

Success Failure Success

Good! Bad! Good!

Success Success

Automatic extraction & combining

（time）
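
The extraction and combining step can be sketched as a small interval computation. The example assumes each success/failure mark closes the take that began at the previous mark; that convention, the label strings, and the function name are illustrative assumptions rather than details from the talk.

```python
def successful_segments(marks, start=0.0):
    """Return (begin, end) time ranges for takes marked as successful.

    `marks` is a time-ordered list of (time_sec, label) pairs; each mark is
    assumed to close the take that began at the previous mark.
    """
    segments = []
    begin = start
    for time_sec, label in marks:
        if label == "success":
            if segments and segments[-1][1] == begin:
                # Merge back-to-back successful takes into one clip.
                segments[-1] = (segments[-1][0], time_sec)
            else:
                segments.append((begin, time_sec))
        begin = time_sec
    return segments

marks = [(12.0, "success"), (20.5, "failure"), (31.0, "success"), (40.0, "success")]
clips = successful_segments(marks)
# clips == [(0.0, 12.0), (20.5, 40.0)]
```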

18

Video-editing with GPS

19

Recording: GPS positions

Editing: location-aware editing

Clipping a movie by sketching on a map

Automatic map overlay
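
Location-aware clipping can be illustrated as follows. This sketch uses a rectangular latitude/longitude region as a simple stand-in for a region sketched on a map; the data layout and function name are assumptions made for this example.

```python
def clips_inside_region(track, region):
    """Return (begin, end) time ranges during which the camera was inside `region`.

    `track` is a time-ordered list of (time_sec, lat, lon) samples and
    `region` is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = region
    clips = []
    begin = last = None
    for time_sec, lat, lon in track:
        inside = min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        if inside:
            if begin is None:
                begin = time_sec
            last = time_sec
        elif begin is not None:
            # The camera just left the region: close the current clip.
            clips.append((begin, last))
            begin = None
    if begin is not None:
        clips.append((begin, last))
    return clips

track = [(0, 35.700, 139.700), (1, 35.710, 139.760), (2, 35.712, 139.762),
         (3, 35.800, 139.900), (4, 35.713, 139.761)]
region = (35.705, 139.750, 35.720, 139.770)
selected = clips_inside_region(track, region)
# selected == [(1, 2), (4, 4)]
```

A freehand sketch would need a point-in-polygon test in place of the rectangle check.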

Automatic Overlaying

20

Recording: chess notation of a game

Editing: automatic overlaying of board graphics

Notation UI Synthesized video

Integrating with After Effects

The AnnoTone plugin provides annotation data to AE,

which can be used for generating effects

Exploiting annotations with existing practice

Controlling AE animation

with sensor data

Integrating with After Effects

1. Analyzing footage to extract annotations

2. Generating a text layer containing JSON-formatted

annotation data at each timeframe

3. Associating video effects/parameters with

annotations using the expressions mechanism

Footage

Effect control (JavaScript)

JSON text layer

[{x: 138.0019,y: 38.13840},{x: 139.0133,y: 38.43405}]…
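
Step 2 of the list above can be sketched as a per-frame serialization of timestamped annotations. The plugin's real data layout is not shown in the slides, so the (time, data) pairs, the "latest annotation per frame" rule, and the function name are all assumptions for illustration.

```python
import json

def json_for_frames(annotations, fps, num_frames):
    """Return one JSON string per video frame holding the latest annotation.

    `annotations` is a time-ordered list of (time_sec, data) pairs.
    Frames that precede the first annotation get the JSON value null.
    """
    texts = []
    idx = -1
    for frame in range(num_frames):
        t = frame / fps
        # Advance to the most recent annotation at or before this frame.
        while idx + 1 < len(annotations) and annotations[idx + 1][0] <= t:
            idx += 1
        current = annotations[idx][1] if idx >= 0 else None
        texts.append(json.dumps(current))
    return texts

annotations = [(0.0, {"x": 138.0019, "y": 38.1384}),
               (0.5, {"x": 139.0133, "y": 38.43405})]
layer_texts = json_for_frames(annotations, fps=4, num_frames=4)
```

An effect expression could then parse the text for the current frame and map a field such as `x` to an effect parameter.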

Annotation by

Audio Watermarking

23

Human Hearing Characteristics

Humans cannot perceive high-frequency sounds.

Sakamoto, Masayuki, et al. "Average thresholds in the 8 to 20 kHz range as a function of age."

Scandinavian Audiology 27.3 (1998): 189-192.

Data-hiding as High-frequency

Audio Signals

25

[Figure: frequency axis (Hz) with ticks at 20, 18k, 20k, and 22k, marking the audible range, the recordable range, and the high-frequency range]

We can hide information in the audio track

as high-frequency signals (audio watermarks).

Microphone Human

Spectrogram of audio track

High-frequency region

(almost inaudible)

26

Data-hiding as High-frequency

Audio Signals

Hidden

information

Benefit of Audio Watermarking

27

• Compatible with almost all video cameras

• Consistent synchronization between

annotations and video sequence

• Removable by applying a low-pass filter

Watermarking Protocol

28

• Dual-Tone Multi-Frequency (DTMF)

– Represents 4 bits of information as a combination of

two single tones chosen from 7 frequencies

• Packet representation

– Variable-length payload

– 400 bps gross data rate

Spectrogram of a watermark packet
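
The DTMF-style symbol coding can be sketched as below. Choosing 2 of 7 frequencies gives 21 distinct pairs, of which 16 suffice for a 4-bit value. The specific frequencies, the pair ordering, and the 10 ms symbol length (4 bits per 10 ms gives 400 bps) are assumptions made for this example; the talk does not state them.

```python
import math
from itertools import combinations

# Hypothetical tone set: 7 frequencies between 18 and 20 kHz (not from the talk).
FREQS_HZ = [18000 + 300 * i for i in range(7)]
# C(7, 2) = 21 distinct pairs; the first 16 are enough for one 4-bit value.
PAIRS = list(combinations(range(7), 2))[:16]

SAMPLE_RATE = 44100
SYMBOL_SEC = 0.010  # assumed symbol length: 4 bits / 10 ms = 400 bps gross

def nibble_to_freqs(nibble):
    """Return the two tone frequencies (Hz) that represent a 4-bit value."""
    if not 0 <= nibble < 16:
        raise ValueError("nibble must be in 0..15")
    i, j = PAIRS[nibble]
    return FREQS_HZ[i], FREQS_HZ[j]

def synth_symbol(nibble, amplitude=0.25):
    """Synthesize one symbol as float samples; peak magnitude is at most 2 * amplitude."""
    f1, f2 = nibble_to_freqs(nibble)
    n = round(SAMPLE_RATE * SYMBOL_SEC)
    return [amplitude * (math.sin(2 * math.pi * f1 * t / SAMPLE_RATE)
                         + math.sin(2 * math.pi * f2 * t / SAMPLE_RATE))
            for t in range(n)]

def encode_bytes(data):
    """Split each byte into high and low nibbles and concatenate the symbols."""
    samples = []
    for byte in data:
        samples += synth_symbol(byte >> 4)
        samples += synth_symbol(byte & 0x0F)
    return samples

signal = encode_bytes(b"\xA5")  # two symbols, 441 samples each
```

Packet framing with a variable-length payload is not covered here, since the slides do not describe the packet layout.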

Related Work

29

ContextCam [Patel & Abowd, 2004]

Incompatible with existing video cameras.

Using a special camera to record contexts of home videos

Storing annotations in frames by image watermarking

30

Cryptone (Ultra Sound Control) [Hirabayashi & Shimizu, 2012]

AnnoTone uses a similar audio data-hiding method

for video editing support.

0100111010

Interaction between loudspeaker and smartphones

using high-frequency tones to convey information

31

Performance Evaluation

33

Data-rate vs. Reliability

[Figure: correct detection rate (%) versus gross bitrate (667, 571, 500, 444, 400, 364 bps); legend: silent, public, rock, electronic]

~100% correct detection rate was achieved

with 400 bps annotation data rate.
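
The tested gross bitrates (667, 571, 500, 444, 400, and 364 bps) are numerically consistent with 4 bits per symbol at symbol durations of 6 to 11 ms. That reading is an inference from the numbers, not something the slides state. The arithmetic can be checked as follows.

```python
BITS_PER_SYMBOL = 4  # one DTMF-style symbol carries 4 bits

def gross_bitrate_bps(symbol_ms):
    """Gross bitrate for a given symbol duration in milliseconds."""
    return BITS_PER_SYMBOL * 1000 / symbol_ms

# Assumed symbol durations of 6..11 ms reproduce the chart's axis values.
rates = [round(gross_bitrate_bps(ms)) for ms in range(6, 12)]
# rates == [667, 571, 500, 444, 400, 364]
```

Under this reading, the 400 bps operating point corresponds to 10 ms per symbol.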

Travel Distance

Watermark signals can travel up to 20 cm through air from a smartphone speaker.

[Figure: correct detection rate (%) versus distance between speaker and microphone (0 to 30 cm); legend: silent, public, rock, electronic]

Durability against Conversion

36

Watermarks are preserved after conversion into

Ogg Vorbis, AC-3 and AAC at a sufficient bitrate.

[Figure: correct detection rate (%) versus bit rate (128, 192, 256, 320 kbps); legend: MP3, Ogg Vorbis, AC-3, AAC]

Transparency for Human Ear

37

Measured how noticeable the watermarks are to human listeners

• Participants clicked a button when they noticed noise (6 participants)

[Figure: noticed watermark rate (%) for the silent, public, rock, and electronic conditions; legend: Before Erasure, After Erasure]

Limitations

38

• One-off development of

annotation-embedding applications

• Audio quality loss in watermark removal

• Limited data-rate of annotation

Future Work

39

Embedding from Public Speaker

40

• Synchronization & integration of a large number

of videos to create multi-view videos, etc.

• Entertainment use at amusement parks, etc.

“Sleeping Beauty Castle at Disneyland” by Lyght

Licensed under CC BY-SA 3.0

“Picture of Stadium” by Jazza5

Licensed under CC BY-SA 3.0

Conclusion

41

We proposed

42

a video annotation technique using audio watermarking,

and a video-editing workflow exploiting annotations.

Benefit

AnnoTone can facilitate and enhance the non-professional

video editing process without special equipment.

43

Compared with

Smartphone Recording

Some smartphone camera apps can record

annotations in a metadata format (e.g., Adobe XMP)

– Of course, using such apps is a sensible choice

when recording with a smartphone

What is AnnoTone’s advantage?

• Dedicated video cameras are still superior to

smartphone cameras

– In resolution, definition, lens quality, etc.

• No need to deal with external metadata

– Because annotations are embedded directly as sound
