User Benefits of Non-Linear Time Compression

Liwei He and Anoop Gupta

Microsoft Research

Introduction

Time compression: key to browsing audio-video (AV) content

We focus on informational content

Audio time compression algorithms

Linear: speed up audio uniformly

Non-linear: exploit the fine-grained structure of human speech (e.g., pauses, phonemes)

How much more do users gain from more complex algorithms?

Methodology

Conduct user listening test

One Linear time compression (TC) algorithm

Two Non-linear TC algorithms

Simple: Pause removal followed by Linear TC

Sophisticated: Adaptive TC

Compare objective and subjective measurements

Time Compression Algorithms

Linear Time Compression

Classic algorithms

Overlap Add (OLA) and Synchronized OLA (SOLA)

We use SOLA
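A minimal pure-Python sketch of SOLA, assuming mono samples in a list and illustrative frame, hop, and search sizes. The function name `sola` and its parameters are hypothetical, not the study's implementation. Input frames are read `synth_hop * rate` samples apart but written `synth_hop` apart; each frame is slid over a small lag range to find where it best matches (by normalized cross-correlation) the output already written, then cross-faded in. Plain OLA is the same loop with the lag fixed at zero, which is why it produces pitch and phase artifacts that SOLA avoids.

```python
import math
from typing import List, Sequence


def sola(x: Sequence[float], rate: float, frame: int = 400,
         synth_hop: int = 200, search: int = 80) -> List[float]:
    """Time-compress x by `rate` (2.0 roughly halves the duration) using SOLA.

    frame, synth_hop and search are illustrative sample counts (25 ms,
    12.5 ms and 5 ms at 16 kHz), not parameters reported in the study.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # These bounds keep every overlap region non-empty and no longer than a frame.
    if not (2 * search <= synth_hop and synth_hop + 2 * search < frame):
        raise ValueError("need 2*search <= synth_hop < frame - 2*search")
    analysis_hop = max(1, round(synth_hop * rate))  # input hop = output hop * rate
    y = list(x[:frame])
    m = 1
    while m * analysis_hop + frame <= len(x):
        seg = x[m * analysis_hop: m * analysis_hop + frame]
        best_start, best_score = m * synth_hop, -math.inf
        # Try each lag around the nominal output position; keep the best match.
        for k in range(-search, search + 1):
            start = m * synth_hop + k
            overlap = len(y) - start
            a, b = y[start:], seg[:overlap]
            num = sum(p * q for p, q in zip(a, b))
            den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_start, best_score = start, score
        overlap = len(y) - best_start
        # Linear cross-fade from the existing output into the new frame.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            y[best_start + i] = (1 - w) * y[best_start + i] + w * seg[i]
        y.extend(seg[overlap:])  # append the non-overlapping tail of the frame
        m += 1
    return y
```

For example, `sola([math.sin(2 * math.pi * 200 * n / 16000) for n in range(8000)], 2.0)` returns slightly more than half as many samples as it was given. The correlation search is what makes SOLA cost more than OLA, but it is still far cheaper than the phoneme-level analysis that adaptive methods need.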

Non-Linear Time Compression

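Algorithm 1: Pause removal plus Linear TC (PR-Lin)

Detect pauses with energy and zero-crossing-rate analysis

Leave pauses of 150 ms or less untouched

Shorten pauses longer than 150 ms to 150 ms

Apply the SOLA algorithm to the result

Pause removal alone shortens speech by 10-25%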
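A sketch of the pause-removal stage, assuming 10 ms analysis frames and hypothetical energy and zero-crossing-rate thresholds (the slides do not give them). A frame counts as pause only when both measures are low, so low-energy fricatives such as /s/, which have a high zero-crossing rate, are not cut. Each run of pause frames keeps at most its first 150 ms; which part of a long pause the study actually kept is an assumption here. The hypothetical `remaining_rate` helper shows the PR-Lin arithmetic: SOLA only has to supply the speed-up that pause removal did not.

```python
from typing import List, Sequence


def classify_pauses(x: Sequence[float], sr: int, frame_ms: int = 10,
                    energy_thr: float = 1e-4, zcr_thr: float = 0.3) -> List[bool]:
    """Flag each full frame as pause (True) or speech (False).

    energy_thr and zcr_thr are hypothetical values for signals scaled to [-1, 1].
    """
    n = max(2, sr * frame_ms // 1000)  # samples per frame
    flags = []
    for start in range(0, len(x) - n + 1, n):
        f = x[start:start + n]
        energy = sum(s * s for s in f) / n
        crossings = sum(1 for a, b in zip(f, f[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (n - 1)
        # Low energy alone could be an unvoiced consonant with a high
        # zero-crossing rate, so both measures must be low.
        flags.append(energy < energy_thr and zcr < zcr_thr)
    return flags


def shorten_pauses(x: Sequence[float], sr: int, flags: Sequence[bool],
                   frame_ms: int = 10, max_pause_ms: int = 150) -> List[float]:
    """Keep pauses up to max_pause_ms as-is; cut longer ones to max_pause_ms."""
    n = max(2, sr * frame_ms // 1000)
    keep = max_pause_ms // frame_ms  # pause frames allowed per run
    out: List[float] = []
    run = 0  # length of the current run of pause frames
    for i, is_pause in enumerate(flags):
        run = run + 1 if is_pause else 0
        if run <= keep:  # speech frames have run == 0, so they are always kept
            out.extend(x[i * n:(i + 1) * n])
    out.extend(x[len(flags) * n:])  # trailing samples shorter than one frame
    return out


def remaining_rate(target_rate: float, original_len: float,
                   shortened_len: float) -> float:
    """Rate SOLA must apply after pause removal to reach target_rate overall."""
    return target_rate * shortened_len / original_len
```

For a 10 s clip that pause removal shortens by 20% to 8 s, a 2.5x overall target leaves `remaining_rate(2.5, 10.0, 8.0) == 2.0` for SOLA, so the speech itself is sped up less than under Linear TC alone.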

Non-Linear Time Compression (cont.)

Algorithm 2: Adaptive TC

Mimics how people speak when talking fast

Pauses and silences are compressed the most

Stressed vowels are compressed the least

Consonants are compressed more than vowels

Consonants are compressed based on neighboring vowels
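A sketch of how the rules above might be turned into per-segment rates, assuming the speech has already been segmented and labeled. The weights in `BASE_WEIGHT`, the neighbor-averaging rule for consonants, and the rate model `r_i = 1 + s * w_i` are all hypothetical; this is not the algorithm used in the study. A single global factor `s` is found by bisection so that the total output duration hits the requested overall rate.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical relative compressibility (larger = compressed more), ordered
# to follow the rules: pauses > consonants > vowels > stressed vowels.
BASE_WEIGHT: Dict[str, float] = {
    "pause": 3.0, "consonant": 1.5, "vowel": 1.0, "stressed_vowel": 0.6,
}
VOWELS = {"vowel", "stressed_vowel"}


def segment_weights(labels: Sequence[str]) -> List[float]:
    weights = []
    for i, label in enumerate(labels):
        w = BASE_WEIGHT[label]
        if label == "consonant":
            # Hypothetical rule: pull a consonant's weight halfway toward the
            # mean weight of its adjacent vowels, so consonants next to stressed
            # vowels are compressed less than those next to unstressed ones.
            nbrs = [BASE_WEIGHT[labels[j]] for j in (i - 1, i + 1)
                    if 0 <= j < len(labels) and labels[j] in VOWELS]
            if nbrs:
                w = (w + sum(nbrs) / len(nbrs)) / 2
        weights.append(w)
    return weights


def adaptive_rates(segments: Sequence[Tuple[str, float]], target_rate: float,
                   iters: int = 60) -> List[float]:
    """Per-segment rates r_i = 1 + s * w_i whose total output hits target_rate."""
    if target_rate <= 1.0:
        raise ValueError("target_rate must exceed 1.0")
    durs = [d for _, d in segments]
    w = segment_weights([label for label, _ in segments])
    goal = sum(durs) / target_rate

    def out_len(s: float) -> float:
        return sum(d / (1.0 + s * wi) for d, wi in zip(durs, w))

    lo, hi = 0.0, 1.0
    while out_len(hi) > goal:  # grow the bracket until it contains the answer
        hi *= 2.0
    for _ in range(iters):  # out_len decreases as s grows, so bisect on s
        mid = (lo + hi) / 2.0
        if out_len(mid) > goal:
            lo = mid
        else:
            hi = mid
    return [1.0 + hi * wi for wi in w]
```

Every rate stays at or above 1.0, so no segment is slowed down, and pauses absorb most of the compression, which matches the intent of the rules while leaving stressed vowels closest to their original length.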

System Implications

Computational complexity

Adaptive TC is 10x more computationally costly than Linear TC

Complexity in client-server implementation

Buffer management required for non-linear TC

Audio-video synchronization quality
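Buffering and A/V synchronization both reduce to one bookkeeping structure: a map between playback time and source time. The sketch below assumes the compressor reports each segment as a `(source duration in seconds, rate)` pair; the `TimeMap` class and that format are hypothetical. With Linear TC the map is a single straight line (source time = rate * playback time), while with non-linear TC it is piecewise, so the client must look it up to pick the matching video frame and to know how much source audio to buffer for the next playback window.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple


class TimeMap:
    """Playback-time to source-time map for piecewise-constant TC rates.

    segments: (source_duration_s, rate) pairs in order; a hypothetical format.
    """

    def __init__(self, segments: Sequence[Tuple[float, float]]):
        if not segments:
            raise ValueError("need at least one segment")
        self.rates: List[float] = [r for _, r in segments]
        # Start of each segment in source time and in (compressed) playback time.
        self.src_starts = [0.0] + list(accumulate(d for d, _ in segments))
        self.out_starts = [0.0] + list(accumulate(d / r for d, r in segments))

    def source_time(self, t: float) -> float:
        """Source position heard at playback time t (clamped to the clip)."""
        t = min(max(t, 0.0), self.out_starts[-1])
        i = bisect.bisect_right(self.out_starts, t) - 1
        i = min(i, len(self.rates) - 1)  # t at the very end belongs to the last segment
        return self.src_starts[i] + (t - self.out_starts[i]) * self.rates[i]

    def source_needed(self, t: float, window: float) -> float:
        """Seconds of source audio consumed while playing [t, t + window]."""
        return self.source_time(t + window) - self.source_time(t)
```

A player could then choose the video frame as `int(tm.source_time(t) * fps)` and size its buffer from `source_needed`, which varies from window to window under non-linear TC but is constant under Linear TC.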

User Study Method

User Study Goals

Highest intelligible speed

Comprehension

Subjective preference

Sustainable speed

Experiment Method

24 subjects

4 tasks for each subject

3 time compression algorithms

Linear TC using SOLA (Linear)

Pause removal plus Linear TC (PR-Lin)

Adaptive TC (Adapt)

Each test takes approximately 30 minutes

Highest Intelligible Speed Task

3 clips from technical talks

Find the highest speed at which most of the words are still understandable

Comprehension Task

3 clips at 1.5x and 3 clips at 2.5x

Clips from TOEFL listening test

Answer 4 multiple-choice questions

Subjective Preference Task

3 pairs of clips at 1.5x

3 pairs of clips at 2.5x

Each pair contains the same clip compressed with 2 of the 3 TC algorithms

Indicate preference on a 3-point scale (prefer former, prefer none, prefer latter)

Sustainable Speed Task

3 clips, each 8 minutes long

Clips from a CD audio book

Find the maximum comfortable speed

Write a 4-5 sentence summary at the end

User Study Results

Highest Intelligible Speed Task

PR-Lin is significantly better than Adapt (p<.01)

[Chart: highest intelligible speed for Linear, PR-Lin, and Adapt; y-axis: Compression Rate (0-3x)]

Comprehension Task

[Chart: comprehension score (%) for Linear, PR-Lin, and Adapt at 1.5x and 2.5x]

At 2.5x, Adapt scores higher than PR-Lin, but the difference is not significant at the .05 level (p=.083)

Preference Task at 1.5x

Slight preference for PR-Lin (p=.093)

Pair (1.5x)          Prefer Former   Prefer None   Prefer Latter
Linear vs. PR-Lin          6              5              13
PR-Lin vs. Adapt          13              5               6
Adapt vs. Linear           8              8               8

Preference Task at 2.5x

PR-Lin and Adapt are both significantly preferred over Linear

Pair (2.5x)          Prefer Former   Prefer None   Prefer Latter
Linear vs. PR-Lin          2              8              14
PR-Lin vs. Adapt           4              9              11
Adapt vs. Linear          21              3               0

Sustainable Speed Task

[Chart: sustainable speed for Linear, PR-Lin, and Adapt; y-axis: Compression Rate (0-2.5x)]

Conclusions

Previous Works

Mach1 (Covell et al., ICASSP '98)

Comprehension and preference tasks

Comparing Linear and Mach1 (Adapt) at 2.6-4.2x

Comprehension scores 17% higher with Mach1

95% of listeners preferred Mach1 to Linear

No data below 2.0x

Other works (Harrigan, Omoigui, Li, Foulke)

1.2-1.7x is the sustainable listening speed

Conclusions

The trade-off among TC algorithms is task-dependent

Listening: Linear TC is sufficient

Fast forwarding: Non-linear TC is more suitable

Adaptive TC is close to the way people talk fast

The limit lies in human listening and comprehension
