Embedded Audio Coder

Jin Li

Outline

  • Introduction
  • Embedded audio coder - Algorithm
      • MLT with window switching
      • Quantizer
      • Entropy coder
      • Bitstream assembly
      • Modular software design
  • Experimental results & demos
  • Conclusion

Introduction

Introduction – Audio Compression

[Diagram: Audio Waveform → Bitstream]

EAC vs Other Compression

  • Existing audio compression schemes: MP3, AAC, MPEG4 audio, WMA, Real Audio, …
  • Why research a new audio codec?

Media vs File Compression

  • File compression
      • Every bit is important; the data has to be compressed losslessly
  • Media compression
      • Exact bit value is not important; distortion is tolerable
      • Amount of media is huge; a high compression ratio is required
      • Media needs adaptation

Key Features of EAC

  • Not only good compression performance
  • But also flexible bitstream syntax
      • The compressed bitstream may be manipulated for
          • Different bitrate
          • Different number of audio channels
          • Different audio sampling rate
  • Versatile
      • Lossless
      • Low delay
      • Streaming/storage application

EAC Encoder

[Diagram: Encoder → Master Bitstream and Companion File]

Parser

  • Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast)
  • May be changed according to the required bitrate, number of audio channels, and audio sampling rate

[Diagram: Master Bitstream and Companion File → Parser → Application Bitstream]

EAC Decoder

[Diagram: Bitstream → Decoder (the box is labeled "Encoder" on the slide) → Speaker (Direct Sound) or wav file]

Embedded Audio Coder - Algorithm Description

Framework - Encoder

[Diagram: Audio is split into L+R (or mono) and L-R; each path passes through Transform → Entropy coder → Bitstream Assembly to produce the Bitstream]

Audio Transform

  • Input: audio sample
  • Output: transform coefficient
  • Goal: convert audio from the time domain to the frequency domain
      • Compact energy
      • Better match with psychoacoustic characteristics
      • Enable audio sampling rate change

Lossy vs Lossless Mode

[Diagram: Lossy mode: Audio → MLT(SW) → Quantization. Lossless mode: Audio → Reversible MLT(SW).]

Lossy (Float) Pass

MLT - Modulated Lapped Transforms

[Figure, two panels. "Spatial Response": amplitude from -1 to 1 over samples 0 to 1000. "Frequency Domain": Gain (dB) from -100 to 40 versus Frequency (pi) from 0 to 1.]

MLT with Window Switching

  • Features
      • Basic window size: 2048
      • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • Energy is bigger than a certain threshold
      • Energy within the 8 subframes (256 samples each) differs by more than Ta
      • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
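
The switching criterion above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the coder's actual decision logic: the slide does not say whether Ta and Tb are differences or ratios, so both are treated as energy ratios here, and the names `use_short_window`, `energy_min`, `ta` and `tb` are hypothetical.

```python
def use_short_window(frame, energy_min, ta, tb, n_sub=8):
    """Decide whether a frame should switch to the short window.

    Assumptions (not stated on the slide): `ta` and `tb` are energy
    ratios, and `energy_min` is the overall energy threshold.
    """
    sub_len = len(frame) // n_sub
    # Energy of each of the n_sub subframes.
    energies = [
        sum(x * x for x in frame[k * sub_len:(k + 1) * sub_len])
        for k in range(n_sub)
    ]
    eps = 1e-12  # avoids a zero product when a subframe is silent

    # Condition 1: the frame energy exceeds a threshold.
    if sum(energies) <= energy_min:
        return False
    # Condition 2: subframe energies differ by more than Ta.
    if max(energies) <= ta * (min(energies) + eps):
        return False
    # Condition 3: some subframe exceeds its successor by Tb.
    return any(
        energies[k] > tb * (energies[k + 1] + eps)
        for k in range(n_sub - 1)
    )


if __name__ == "__main__":
    transient = [1.0] * 256 + [0.01] * 1792
    steady = [0.5] * 2048
    print(use_short_window(transient, 1.0, 10.0, 4.0))
    print(use_short_window(steady, 1.0, 10.0, 4.0))
```

With these assumed thresholds, a frame whose first subframe is much louder than the rest is switched to the short window, while a stationary frame keeps the 2048-sample window.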

Band Separation

[Diagram labels: Audio (44.1 kHz sampling); MLT with window switching; Band separation]

Synthesis (Half Sampling)

[Diagram labels: Audio (22.05 kHz sampling); MLT with window switching; Band separation]

Synthesis (Quarter Sampling)

[Diagram labels: Audio (11.025 kHz sampling); MLT with window switching; Band separation]

Quantizer

  • Input: coefficient
  • Output: quantized coefficient
  • Goal: convert coefficient from float to integer
      • Reduce signal levels
      • Fast implementation of entropy coding

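Quantizer

  • Scalar quantizer with a deadzone

[Figure: quantizer characteristic with a deadzone around 0; each coefficient is mapped to a quantized magnitude and a sign. The formula on the slide is not recoverable from the extracted text.]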
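
Since the slide's formula did not survive extraction, the following is only a generic sketch of a deadzone scalar quantizer, not necessarily the exact rule EAC uses. The step size `delta` and the function names are hypothetical; truncating the magnitude toward zero is what creates the deadzone, and mid-bin reconstruction is one common decoder choice.

```python
import math


def quantize(coeff, delta):
    """Map a float coefficient to (quantized magnitude, sign).

    Truncation toward zero makes the zero bin (-delta, delta) twice
    as wide as the other bins: that wide bin is the deadzone.
    """
    magnitude = int(math.floor(abs(coeff) / delta))
    sign = -1 if coeff < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, delta):
    """Reconstruct at the middle of the bin (an assumed decoder choice)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * delta


if __name__ == "__main__":
    for c in (0.9, -0.9, -3.7, 12.2):
        m, s = quantize(c, 1.0)
        print(c, "->", (m, s), "->", dequantize(m, s, 1.0))
```

Separating magnitude and sign this way matches the slide's two outputs and is what lets the entropy coder work on magnitude bit planes.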

Lossless (Integer) Pass

Key to Achieve Lossless

  • Break the MLT into small steps
  • Make every step reversible
  • Definition of reversible transform
      • Integer input, integer output
      • The transform should have a determinant of 1 (do not expand data volume)

MLT Framework

[Diagram. Forward MLT blocks: Window, Pre-Rotate, Complex FFT, Post Rotation (the last three are grouped as DCT IV, all four as Lapped Transform). Inverse MLT blocks: Pre-Rotate^-1, Complex FFT^-1, Post Rotation^-1, Inv Window.]

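Window Operation

[Diagram: the sample pair x(n), x(-n-1) passes through a Complex Rotate block. The rotation angle, a function of n and N, is not recoverable from the extracted text.]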

Pre-Rotation

[Diagram: inputs xw(0) to xw(7) pass through four Complex Rotate blocks with angles -π/32, -5π/32, -9π/32 and -13π/32, producing xp(0) to xp(7).]

FFT (4 Point Complex)

[Diagram: xp(0) to xp(7) are paired into the complex values xc(0) to xc(3), which pass through a 4-point complex FFT butterfly network (with sign changes and the twiddle factor e^{-jπ/2}) to give yc(0) to yc(3), unpacked as yp(0) to yp(7).]

Post-Rotation

[Diagram: yp(0) to yp(7) pass through four Conjugate Rotate blocks with angles -0, -π/8, -2π/8 and -3π/8, producing y(0) to y(7).]

Reversible MLT

  • Make the following operations reversible
      • Butterfly operation
      • Complex rotation
      • Conjugate rotation

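Reversible Unit Transform

[Equations not recoverable from the extracted text: the slide shows a 2×2 unit transform applied to the pair (a, b) and its decomposition into steps involving an intermediate value t and coefficients c.]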
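
One well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps, each with determinant 1, and round inside each step. The sketch below uses that standard factorization as an assumption; the slide's own equations are garbled, so this is not confirmed to be the exact decomposition used in EAC. The function names are hypothetical, and `theta` must not be a multiple of π.

```python
import math


def lift_rotate(a, b, theta):
    """Integer-to-integer rotation of (a, b) by theta via three lifting steps.

    Uses [[cos, -sin], [sin, cos]] =
        [[1, c0], [0, 1]] @ [[1, 0], [c1, 1]] @ [[1, c0], [0, 1]]
    with c0 = (cos(theta) - 1) / sin(theta) and c1 = sin(theta).
    """
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)   # step 1: shear, determinant 1
    b = b + round(c1 * a)   # step 2: shear, determinant 1
    a = a + round(c0 * b)   # step 3: shear, determinant 1
    return a, b


def lift_rotate_inverse(a, b, theta):
    """Undo lift_rotate exactly by reversing the steps with opposite signs."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b


if __name__ == "__main__":
    theta = -math.pi / 32
    rotated = lift_rotate(45, -74, theta)
    print(rotated, lift_rotate_inverse(*rotated, theta))
```

Each step adds a rounded function of one value to the other, so the inverse can recompute the same rounded term and subtract it. This is why integer input gives integer output with no loss.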

Entropy Coder

  • Input: quantized coefficients
  • Output: embedded coded bitstream with R-D performance curve
  • Goal
      • Compression
      • Embedded bitstream for future manipulation

Frame Grouping

[Diagram labels: Time slot; 1 2 3 4 5 6 7 8; Frame]

Entropy Coder

[Diagram: the entropy coder outputs a Bitstream and an R-D curve (axes D and R)]

Entropy Coder

  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

A block of coefficients

     45    0    0    0    0    0    0    0
    -74  -13    0    0    3    0    4    0
     21    0    4    0    0    3    5    0
     14    0   23   23    0    0    0    0
     -4    5    0    0    0    1   -1    0
    -18    0    0   19   -4   33    0   -1
      4    0   23    0    0    0    1    0
     -1    0    0    0    0    0    0    0

Bits of Coefficients

    Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
    w0              45     0  1  0  1  1  0  1    +
    w1             -74     1  0  0  1  0  1  0    -
    w2              21     0  0  1  0  1  0  1    +
    w3              14     0  0  0  1  1  1  0    +
    w4              -4     0  0  0  0  1  0  0    -
    w5             -18     0  0  1  0  0  1  0    -
    w6               4     0  0  0  0  1  0  0    +
    w7              -1     0  0  0  0  0  0  1    -

Conventional Coding

Coefficients are coded one at a time, all bits of one coefficient before the next (First: w0, Second: w1, Third: w2).

    Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign   Decoded value
    w0             0  1  0  1  1  0  1    +       46
    w1             1  0  0  1  0  1  0    -      -74
    w2             0  0  1  0  1  0  1    +       22
    w3 to w7       (not yet coded)                  0

Embedded Coding

Bits are coded bit plane by bit plane across all coefficients (First: b1, Second: b2, Third: b3).

    Pass     Plane   Coded symbols for w0 to w7
    First    b1      0   1-  0   0   0   0   0   0
    Second   b2      1+  0   0   0   0   0   0   0
    Third    b3      0   0   1+  0   0   1-  0   0

    Coefficient   Value   Range
    w0              40    32 to 47
    w1             -72    -79 to -64
    w2              24    16 to 31
    w3               0    -31 to 31
    w4               0    -31 to 31
    w5             -24    -31 to 31
    w6               0    -31 to 31
    w7               0    -31 to 31
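
The bit-plane ordering of embedded coding can be illustrated with the eight example coefficients w0 to w7. This is a minimal sketch with hypothetical function names; it covers only the ordering of bits and the decoder's partial reconstruction, not masking, context modeling or arithmetic coding. It reconstructs a significant coefficient at the middle of its remaining interval, which gives the slide's values 40, -72, 24 and -24 after three planes. For coefficients that are still insignificant it reports the tightest range implied by the planes decoded so far, which may be narrower than the range printed on the slide.

```python
def bit_planes(coeffs, n_planes):
    """Magnitude bits of every coefficient, most significant plane first."""
    return [
        [(abs(c) >> (n_planes - 1 - p)) & 1 for c in coeffs]
        for p in range(n_planes)
    ]


def decode_prefix(coeffs, n_planes, planes_received):
    """Value and range a decoder can infer from the first planes only.

    Returns one (value, (low, high)) pair per coefficient.
    """
    remaining = n_planes - planes_received
    result = []
    for c in coeffs:
        prefix = abs(c) >> remaining          # magnitude bits known so far
        low = prefix << remaining
        high = low + (1 << remaining) - 1
        if prefix == 0:
            # Still insignificant: sign not yet sent, value assumed 0.
            result.append((0, (-high, high)))
        else:
            mid = low + (1 << remaining) // 2  # middle of the interval
            if c < 0:
                result.append((-mid, (-high, -low)))
            else:
                result.append((mid, (low, high)))
    return result


if __name__ == "__main__":
    w = [45, -74, 21, 14, -4, -18, 4, -1]
    for plane in bit_planes(w, 7)[:3]:
        print(plane)
    for name, item in zip(["w%d" % i for i in range(8)], decode_prefix(w, 7, 3)):
        print(name, item)
```

Truncating the stream after any plane still gives a usable approximation of every coefficient, which is the property the parser later relies on.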

Audio Masking

[Figure: Signal, Noise Level, Masking Threshold and Maximum Mask plotted against Frequency, with a Critical Band and a Neighboring Band marked, and the Signal-to-mask ratio and Noise-to-mask ratio indicated.]

Psychoacoustic Masking

  • Traditional approach (explicit masking, all existing approaches)
      • Calculate the mask
      • Transmit the mask
      • Modify transform coefficients (or coding approach) according to the masking
      • Encode the transform coefficients
  • Note: the mask modifies the coding content

Implicit Psychoacoustic Masking

  • Key: the mask modifies the coding order; the content is the same
  • Implicit masking
      • Calculate the static masking (Fletcher-Munson threshold)
      • Encode the MSB of the transform coefficients
      • Calculate the mask based on the MSB of the coefficients
      • Modify coding order
      • Encode the next most important part of the coefficients
      • Repeat the process

Embedded Coding with Implicit Psychoacoustic Masking

After the first pass (symbols shown: 0 1- 0 0 0 0 0 0):

    Coefficient   Value   Range
    w0               0    -63 to 63
    w1             -96    -127 to -64
    w2               0    -63 to 63
    w3               0    -63 to 63
    w4               0    -63 to 63
    w5               0    -63 to 63
    w6               0    -127 to 127
    w7               0    -127 to 127

[Figure legend: Coefficient, Significant, Insignificant, Mask]

Embedded Coding with Implicit Psychoacoustic Masking

After the second pass (symbols shown: First 0 1- 0 0 0 0 0 0; Second 1+ 0 0 0 0 0 0 0):

    Coefficient   b1 b2   Sign   Value   Range
    w0             0  1    +       48    32 to 63
    w1             1  0    -      -96    -127 to -64
    w2             0  0             0    -31 to 31
    w3             0  0             0    -31 to 31
    w4             0  0             0    -31 to 31
    w5             0  0             0    -31 to 31
    w6             0  0             0    -63 to 63
    w7             0  0             0    -63 to 63

[Figure legend: Coefficient, Significant, Insignificant]

Context Modeling

  • Context
      • Zero coding
          • Significance statuses of neighbor coefficients
      • Refinement
          • Whether it is the 1st refinement pass
          • Significance statuses of neighbor coefficients
      • Sign
          • Neighbor signs

After Implicit Psychoacoustic Masking & Context Modeling

     45    0    0    0    0    0    0    0
    -74  -13    0    0    3    0    4    0
     21    0    4    0    0    3    5    0
     14    0   23   23    0    0    0    0
     -4    5    0    0    0    1   -1    0
    -18    0    0   19   -4   33    0   -1
      4    0   23    0    0    0    1    0
     -1    0    0    0    0    0    0    0

    Bit (to be encoded):            0 1 1 0 0 0 0 0 0 1  0 0 0 0 0 0 0 0 0 ……
    Ctx (automatically generated):  0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

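Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided in turn with probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2 as the symbols S0=0, S1=1, S2=0 are coded, leaving a final interval A. Coding result: 0100.]

The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 satisfies (B, C) ⊂ A.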
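
The interval narrowing in the illustration can be reproduced with exact fractions. This is an idealized sketch, not the QM coder, which is an adaptive, multiplication-free approximation. The probabilities below are hypothetical, chosen so the sketch arrives at the same codeword, 0100, as the slide, and the convention that symbol 0 takes the lower part of the interval is an assumption.

```python
import math
from fractions import Fraction


def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; return the final [low, high).

    p_zero[i] is the probability that symbol i is 0. Symbol 0 takes
    the lower part of the current interval (an assumed convention).
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width = width * p0
        else:
            low = low + width * p0
            width = width * (1 - p0)
    return low, low + width


def shortest_codeword(low, high):
    """Shortest bit string whose every continuation stays inside [low, high)."""
    n = 1
    while True:
        k = math.ceil(low * 2 ** n)           # first n-bit grid point >= low
        if Fraction(k + 1, 2 ** n) <= high:   # whole n-bit cell fits inside
            return format(k, "0{}b".format(n))
        n += 1


if __name__ == "__main__":
    probs = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)]  # hypothetical
    low, high = final_interval([0, 1, 0], probs)
    print(low, high, shortest_codeword(low, high))
```

The codeword's cell plays the role of the interval (B, C) on the slide: every continuation of the emitted bits still points inside the final interval A, so the decoder can recover the symbols.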

Entropy Coder (Summary)

[Diagram: the entropy coder outputs a Bitstream and an R-D curve (axes D and R)]

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of energy in a critical band
      • Lookup table: convert energy to mask
  • R-D curve calculation
      • Lookup table: calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder
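
For readers unfamiliar with Rice codes, the sketch below shows the plain, non-adaptive Rice code for a single non-negative integer: a unary quotient followed by a k-bit remainder. It illustrates only the basic code word. The run-length and adaptation logic of the coder named on the slide is not described in the source and is not modeled here; the function names and the parameter `k` are hypothetical.

```python
def rice_encode(n, k):
    """Rice code of a non-negative integer n with parameter k, as a bit string."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k                       # quotient, sent in unary
    r = n & ((1 << k) - 1)           # remainder, sent in k plain bits
    remainder_bits = format(r, "0{}b".format(k)) if k > 0 else ""
    return "1" * q + "0" + remainder_bits


def rice_decode(bits, k):
    """Inverse of rice_encode for a single code word."""
    q = bits.index("0")              # count the leading ones
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r


if __name__ == "__main__":
    for n in (0, 3, 9):
        code = rice_encode(n, 2)
        print(n, code, rice_decode(code, 2))
```

Such codes need no probability tables or multiplications, which is presumably why they appear in a list of speed-ups.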

Bitstream Assembly

  • Input: bitstream, R-D curve
  • Output: assembled bitstream, companion file

[Diagram: Bitstream assembling block]

EAC Bitstream Syntax

  • Timeslot header
      • Whether a certain channel exists (1 bit)
      • Length of the channel bitstream (1-2 bytes)

[Diagram: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

Companion File

[Diagram: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 vs R1, D2 vs R2, D3 vs R3, D4 vs R4), shown twice, with the rates r1, r2, r3 and r4 marked.]
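
The slide gives only the picture, so the following is a sketch of one common way to split a byte budget across several embedded bitstreams using their R-D curves: repeatedly extend the bitstream whose next segment removes the most distortion per byte. It is an assumption, not the documented EAC assembler. The function name `allocate`, the curve format (lists of `(rate, distortion)` points with strictly increasing rate, starting at rate 0) and the sample numbers are all hypothetical.

```python
def allocate(curves, budget):
    """Greedy steepest-slope allocation of `budget` bytes.

    curves: one list per bitstream of (rate_bytes, distortion) points,
            rate strictly increasing, distortion decreasing (assumed convex).
    Returns the number of bytes taken from each bitstream.
    """
    idx = [0] * len(curves)   # current truncation point on each curve
    spent = 0
    while True:
        best, best_slope = None, 0.0
        for c, pts in enumerate(curves):
            i = idx[c]
            if i + 1 >= len(pts):
                continue      # this bitstream is already fully included
            d_rate = pts[i + 1][0] - pts[i][0]
            d_dist = pts[i][1] - pts[i + 1][1]
            slope = d_dist / d_rate
            if spent + d_rate <= budget and slope > best_slope:
                best, best_slope = c, slope
        if best is None:
            break             # nothing else fits or helps
        i = idx[best]
        spent += curves[best][i + 1][0] - curves[best][i][0]
        idx[best] += 1
    return [curves[c][idx[c]][0] for c in range(len(curves))]


if __name__ == "__main__":
    curves = [
        [(0, 100), (10, 40), (20, 20)],   # hypothetical R-D points
        [(0, 50), (10, 30), (20, 25)],
    ]
    print(allocate(curves, 30))
```

For convex curves this ends with all bitstreams truncated at roughly the same R-D slope, which is the usual optimality condition for this kind of allocation.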

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: Buffer-Occupancy Curve, Buffer Occupancy (Bytes) versus Time (timeslots), bounded above and below by Illegal Regions.]

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$

Where
  • $B_i$: allocated bytes for timeslot i
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
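
As a small worked example of the allocation formula, the sketch below turns a planned buffer-occupancy curve into per-timeslot byte allocations. The function name and the sample numbers are hypothetical, and the rate is assumed to be expressed in bytes per second so that the units agree.

```python
def allocated_bytes(buf, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1 .. len(buf) - 1.

    buf:        buffer occupancy levels in bytes, buf[0] being the initial level
    rate_trans: coding (network) rate, assumed here to be in bytes per second
    time:       duration of one timeslot in seconds
    """
    return [
        buf[i - 1] - buf[i] + rate_trans * time
        for i in range(1, len(buf))
    ]


if __name__ == "__main__":
    # Hypothetical plan: the buffer drains by 200 bytes, then refills by 100.
    print(allocated_bytes([1000, 800, 900], rate_trans=2000, time=0.5))
```

Letting the buffer level fall during a timeslot grants that timeslot more than the bytes delivered by the channel; letting it rise grants fewer.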

Optimization

  • Given
      • Initial buffer occupancy level
      • Final buffer occupancy level (or intermediate level with a sliding window)
      • Buffer occupancy constraint
  • Search for the allocated number of bytes for the current timeslot

Search (R-D slope)

[Figure: Buffer Occupancy (Bytes) versus Time (timeslots), bounded by Illegal Regions, with annotations "Underflow (too many bytes)", "Overflow (too few bytes)" and "Waste bytes".]

Multiple Timeslots – Constant Bitrate

[Figure: Buffer Occupancy (Bytes) versus Time (timeslots), bounded by Illegal Regions.]

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: Buffer-Occupancy Curve, Buffer Occupancy (Bytes) versus Time (timeslots), bounded by Illegal Regions.]

Multiple Timeslots – Internet Streaming (Normal)

[Figure: Buffer Occupancy (Bytes) versus Time (timeslots), bounded by Illegal Regions.]

Modular Software Design

[Diagram: Audio is split into L+R (or mono) and L-R; each path passes through MLT(SW) → Quantizer → Entropy coder → Bitstream Assembly to produce the Bitstream]

Modular Software Design

  • Highly modularized pipeline design
      • Quantizer and entropy coder can be used for image/video compression as well
      • Probe and data input can be inserted into any part of the program
  • Data flow driven (with necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

Experimental Results

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

    Codec         8 kbps   16 kbps   32 kbps   48 kbps
    MP4 TwinVQ     7.48     7.00      5.71      4.48
    WMA            8.47     5.56      3.25      0.40
    EAC            6.69     5.68      2.80     -2.2

EAC – Lossless

Results based on the average of 16 MPEG4 test clips.

    Codec            Compression Ratio
    EAC              2.72
    Monkey's Audio   2.72
    WinZip           1.32

EAC (Versatile)

  • Versatile
      • Real-time 2-way communication (low delay mode)
      • Storage device (Pocket PC, Xbox)
      • Internet streaming

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames, ignoring
      • Encoding/decoding delay
      • Network transmission time (over a modem line, delay = 3 frames)

EAC (Low Delay Mode)

[Diagram: timeline of frames i-1, i, i+1. "Start Encoding Frame i": Encoder (MLT, Quantizer, Entropy) → Bitstream → network → "Start Decoding Frame i" (Entropy, Quantizer) → "Playable here".]

EAC – Flexible Bitstream Syntax

  • Flexible bitstream syntax
      • Parser may reassemble the bitstream at 1000x real time
      • Change
          • Bit rate
          • Number of audio channels
          • Audio sampling rate

EAC – Software

  • Software
      • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
      • Decoder: 20x realtime
      • Parser: 1000x realtime

EAC - Encoder

[Diagram: Audio → Encoder → Stereo 128 kbps bitstream and Companion file]

EAC - Parser

[Diagram: Stereo 128 kbps bitstream and Companion file → Parser (Server) → one of: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]

EAC - Decoder

[Diagram: Decoder inputs: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]

Comparison

[Audio demo: Original, MP4 TwinVQ, WMA, EAC, MP3]

Conclusions

  • An embedded audio coder is developed
      • Highly efficient
      • Versatile
          • Low delay, constant bitrate, streaming
      • Flexible bitstream
          • Parsing for bitrate, number of audio channels, audio sampling rate
      • Good prototype available: realtime execution, small memory footprint

Page 2: Embedded Audio Coder

275

Outline

IntroductionEmbedded audio coder - Algorithm MLT with window switching Quantizer Entropy coder Bitstream assembly Modular software design

Experimental results amp demosConclusion

375

Introduction

475

Introduction ndash Audio Compression

Audio Waveform

Bitstream

575

EAC vs Other Compression

Existing audio compression schemes MP3 AAC MPEG4 audio WMA Real Audio hellip

Why research for a new audio codec

675

Media vs File Compression

File compression Every bit is important has to be compressed

losslessly

Media compression Exact bitvalue is not important distortion is

tolerable Amount of media is huge high compression ratio is

required Media needs adaptation

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

975

Parser

Except header application bitstream is a subset of the master bitstream (parsing is fast)

May be changed according to the required bitrate of audio channels and audio sampling rate

Parser

Master Bitstream

Companion File

Application Bitstream

1075

EAC Decoder

Encoder

Bitstream

Speaker (Direct Sound)

wav file

1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 3: Embedded Audio Coder

375

Introduction

475

Introduction ndash Audio Compression

Audio Waveform

Bitstream

575

EAC vs Other Compression

Existing audio compression schemes MP3 AAC MPEG4 audio WMA Real Audio hellip

Why research for a new audio codec

675

Media vs File Compression

File compression Every bit is important has to be compressed

losslessly

Media compression Exact bitvalue is not important distortion is

tolerable Amount of media is huge high compression ratio is

required Media needs adaptation

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

9/75

Parser

  • Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast)
  • The application bitstream may be changed according to the required bitrate, number of audio channels, and audio sampling rate

[Figure: the Parser takes the Master Bitstream and the Companion File and produces the Application Bitstream]

10/75

EAC Decoder

[Figure: the Bitstream is decoded (the box is labeled "Encoder" on the slide) and sent to the Speaker (Direct Sound) or to a wav file]

11/75

Embedded Audio Coder - Algorithm Description

12/75

Framework - Encoder

[Figure: the Audio is split into L+R (or mono) and L-R channels; each channel passes through Transform → Entropy coder → Bitstream Assembly to form the Bitstream]

13/75

Audio Transform

Input: audio samples

Output: transform coefficients

Goal: convert audio from the time domain to the frequency domain
  • Compact energy
  • Better match with psychoacoustic characteristics
  • Enable audio sampling rate change

14/75

Lossy vs Lossless Mode

  • Lossy mode: Audio → MLT(SW) → Quantization
  • Lossless mode: Audio → Reversible MLT(SW)

15/75

Lossy (Float) Pass

16/75

MLT - Modulated Lapped Transforms

[Figure, two panels. "Spatial Response": amplitude from -1 to 1 over samples 0 to 1000. "Frequency Domain": gain (dB) from -100 to 40 over frequency (in units of pi) from 0 to 1.]
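The slide shows the MLT only through its plots. As a rough illustration, the following sketch implements the textbook MLT definition (sine window, hop of M samples, blocks of 2M samples) directly in O(M^2) form and reconstructs the signal by overlap-add. The function names and the tiny block size are assumptions made for the example; the deck does not show its own implementation, which per the later slides uses an FFT-based structure.

```python
import math

def mlt_basis(M):
    """Windowed cosine basis: rows k = 0..M-1, columns n = 0..2M-1."""
    scale = math.sqrt(2.0 / M)
    return [[scale
             * math.sin((n + 0.5) * math.pi / (2 * M))            # sine window
             * math.cos((n + (M + 1) / 2) * (k + 0.5) * math.pi / M)
             for n in range(2 * M)]
            for k in range(M)]

def mlt_forward(x, M):
    """One list of M coefficients per hop of M samples (blocks overlap by half)."""
    basis = mlt_basis(M)
    frames = []
    for start in range(0, len(x) - 2 * M + 1, M):
        block = x[start:start + 2 * M]
        frames.append([sum(p * s for p, s in zip(row, block)) for row in basis])
    return frames

def mlt_inverse(frames, M):
    """Overlap-add synthesis; samples covered by two blocks are reconstructed."""
    basis = mlt_basis(M)
    y = [0.0] * (M * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        for n in range(2 * M):
            y[f * M + n] += sum(basis[k][n] * coeffs[k] for k in range(M))
    return y

# Usage: every sample except the first and last M is recovered.
M = 8
x = [math.sin(0.3 * n) + 0.1 * n for n in range(64)]
y = mlt_inverse(mlt_forward(x, M), M)
max_err = max(abs(y[n] - x[n]) for n in range(M, len(x) - M))
```

The first and last M samples are covered by only one block, so they are not reconstructed without padding.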

17/75

MLT with Window Switching

Features
  • Basic window size: 2048
  • Short window size: 256

Switching criterion: a frame (2048 samples) is switched to the short window if and only if
  • The energy is bigger than a certain threshold
  • The energy within the 8 subframes (256 samples each) differs by more than Ta
  • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
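The switching criterion can be sketched as a small decision function. This is only one reading of the slide: "differs by more than Ta" is taken here as the difference between the largest and smallest subframe energy, the third condition is taken as "some adjacent pair of subframes drops by more than Tb", and the threshold values in the usage lines are made up. The names `use_short_window`, `energy_floor`, `ta` and `tb` are hypothetical.

```python
def use_short_window(frame, energy_floor, ta, tb, sub=256):
    """Decide whether a 2048-sample frame should use the short window.

    Assumed reading of the three conditions on the slide:
      1. total frame energy exceeds energy_floor,
      2. subframe energies spread by more than ta (max minus min),
      3. some subframe exceeds the subframe after it by more than tb.
    """
    energies = [sum(s * s for s in frame[i:i + sub])
                for i in range(0, len(frame), sub)]
    if sum(energies) <= energy_floor:
        return False
    if max(energies) - min(energies) <= ta:
        return False
    return any(energies[i] - energies[i + 1] > tb
               for i in range(len(energies) - 1))

# Usage with made-up thresholds.
steady = [1.0] * 2048                        # same energy in every subframe
transient = [10.0] * 256 + [0.1] * 1792      # loud first subframe, then quiet
steady_short = use_short_window(steady, 1.0, 100.0, 100.0)
transient_short = use_short_window(transient, 1.0, 100.0, 100.0)
```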

18/75

Band Separation

[Figure labels: Audio (44.1 kHz sampling), MLT with window switching, Band separation, 0]

19/75

Synthesis (Half Sampling)

[Figure labels: Audio (22.05 kHz sampling), MLT with window switching, Band separation, 0]

20/75

Synthesis (Quarter Sampling)

[Figure labels: Audio (11.025 kHz sampling), MLT with window switching, Band separation, 0]

21/75

Quantizer

Input: coefficient

Output: quantized coefficient

Goal: convert the coefficient from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation and figure not recoverable from the extraction. The slide maps each coefficient to a quantized magnitude and a sign, with a deadzone around 0.]
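Since the slide's own formula did not survive extraction, the following is a generic deadzone scalar quantizer, not necessarily the exact rule used in EAC. The step size, the reconstruction offset `delta`, and the function names are assumptions for illustration.

```python
def quantize(x, step):
    """Split a float coefficient into (magnitude, sign).

    Truncating |x| / step toward zero maps every |x| < step to 0,
    which gives a deadzone twice as wide as the other bins.
    """
    magnitude = int(abs(x) / step)
    sign = -1 if x < 0 else 1
    return magnitude, sign

def dequantize(magnitude, sign, step, delta=0.5):
    """Reconstruct at offset delta inside the bin (delta is an assumed parameter)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + delta) * step

# Usage: small values fall into the deadzone.
q_small = quantize(0.9, 1.0)
q_neg = quantize(-2.7, 1.0)
x_hat = dequantize(*q_neg, 1.0)
```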

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

  • Break the MLT into small steps
  • Make every step reversible

Definition of a reversible transform
  • Integer input, integer output
  • The transform should have a determinant of 1 (so that it does not expand the data volume)

25/75

MLT Framework

[Figure: block diagram of the lapped transform.
  • Forward MLT blocks: Window, Pre-Rotate, Complex FFT, Post Rotation (the last three are grouped as DCT IV)
  • Inverse MLT blocks: Pre-Rotate⁻¹, Complex FFT⁻¹, Post Rotation⁻¹, Inv Window]

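26/75

Window Operation

[Figure: each sample pair x(n), x(-n-1) goes through a complex rotation. The rotation angle on the slide depends on n and N; its exact expression is not recoverable from the extraction.]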

27/75

Pre-Rotation

[Figure: the windowed inputs xw(0) to xw(7) pass through four complex rotations to give xp(0) to xp(7):
  • Complex Rotate –π/32
  • Complex Rotate –5π/32
  • Complex Rotate –9π/32
  • Complex Rotate –13π/32]

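28/75

FFT (4 Point Complex)

[Figure: flow graph of a 4-point complex FFT. The inputs xp(0) to xp(7) are treated as four complex values xc(0) to xc(3); butterfly stages with sign inversions and a twiddle factor e^{-jπ/2} produce yc(0) to yc(3), which are unpacked into the outputs yp(0) to yp(7).]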

29/75

Post-Rotation

[Figure: the FFT outputs yp(0) to yp(7) pass through four conjugate rotations to give y(0) to y(7):
  • Conjugate Rotate –0
  • Conjugate Rotate –π/8
  • Conjugate Rotate –2π/8
  • Conjugate Rotate –3π/8]

30/75

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equations not recoverable from the extraction. The surviving symbols are the input pair (a, b), a temporary value t, and coefficients c.]
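The slide's equations are lost, so the sketch below uses a widely known construction instead: a plane rotation written as three lifting (shear) steps, each with determinant 1, with rounding applied inside each step so that integers map to integers and the steps can be undone exactly. Whether EAC uses exactly this factorization is an assumption; `lift_rotate` and `lift_unrotate` are hypothetical names.

```python
import math

def lift_rotate(a, b, theta):
    """Integer-to-integer approximation of rotating (a, b) by theta.

    Uses R(theta) = S(c0) * L(c1) * S(c0) with c0 = (cos(theta) - 1) / sin(theta)
    and c1 = sin(theta); theta must not be a multiple of pi.
    """
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)
    b = b + round(c1 * a)
    a = a + round(c0 * b)
    return a, b

def lift_unrotate(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b

# Usage: the round trip is lossless for every integer pair tried.
theta = 5 * math.pi / 32
round_trip_ok = all(
    lift_unrotate(*lift_rotate(a, b, theta), theta) == (a, b)
    for a in range(-20, 21) for b in range(-20, 21)
)
```

Each step changes only one of the two values by a rounded function of the other, which is why it can be subtracted back exactly regardless of rounding error.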

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Figure: frames 1 to 8 grouped into a time slot]

34/75

Entropy Coder

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R)]

35/75

Entropy Coder

  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

36/75

A block of coefficients

 45    0    0    0    0    0    0    0
-74  -13    0    0    3    0    4    0
 21    0    4    0    0    3    5    0
 14    0   23   23    0    0    0    0
 -4    5    0    0    0    1   -1    0
-18    0    0   19   -4   33    0   -1
  4    0   23    0    0    0    1    0
 -1    0    0    0    0    0    0    0

37/75

Bits of Coefficients

Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0             45    0  1  0  1  1  0  1    +
w1            -74    1  0  0  1  0  1  0    -
w2             21    0  0  1  0  1  0  1    +
w3             14    0  0  0  1  1  1  0    +
w4             -4    0  0  0  0  1  0  0    -
w5            -18    0  0  1  0  0  1  0    -
w6              4    0  0  0  0  1  0  0    +
w7             -1    0  0  0  0  0  0  1    -
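The sign-magnitude bit table can be reproduced with a few lines of code. This is a minimal sketch; the helper name `to_bits` and the 7-bit width (taken from the b1 to b7 columns) are the only assumptions.

```python
def to_bits(value, nbits=7):
    """Return ([b1, ..., b_nbits], sign) with b1 the most significant magnitude bit."""
    magnitude = abs(value)
    bits = [(magnitude >> (nbits - 1 - i)) & 1 for i in range(nbits)]
    return bits, '+' if value >= 0 else '-'

# Usage: the eight coefficients w0..w7 from the slide.
coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
table = [to_bits(c) for c in coeffs]
```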

38/75

Conventional Coding

Coefficients are coded one at a time, each with all of its bits b1 to b7 and its sign: first w0, second w1, third w2.

After the first three coefficients, the slide shows the decoded values:

Coefficient  b1 b2 b3 b4 b5 b6 b7  Sign  Decoded value
w0            0  1  0  1  1  0  1    +    46
w1            1  0  0  1  0  1  0    -   -74
w2            0  0  1  0  1  0  1    +    22
w3 to w7     (not yet coded)               0

39/75

Embedded Coding

Bits are coded one bit-plane at a time across all coefficients: first b1 of w0 to w7, second b2, third b3. A sign follows the bit that first makes a coefficient significant.

  • First (b1): 0 1- 0 0 0 0 0 0
  • Second (b2): 1+ 0 0 0 0 0 0 0
  • Third (b3): 0 0 1+ 0 0 1- 0 0

Decoded values and ranges as shown on the slide:

Coefficient  Value  Range
w0             40    32 to 47
w1            -72   -79 to -64
w2             24    16 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5            -24   -31 to 31
w6              0   -31 to 31
w7              0   -31 to 31
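The bit-plane ordering can be sketched as follows: walk the bit-planes from the most significant one down, emit one bit per coefficient, and emit the sign right after the bit that first makes a coefficient significant. This is an illustrative sketch of the ordering only (no masking, context modeling or arithmetic coding); `embedded_symbols` is a hypothetical name.

```python
def embedded_symbols(coeffs, nbits, planes):
    """Return one string of symbols per bit-plane, most significant plane first."""
    significant = [False] * len(coeffs)
    passes = []
    for p in range(planes):
        shift = nbits - 1 - p
        row = []
        for i, c in enumerate(coeffs):
            bit = (abs(c) >> shift) & 1
            row.append(str(bit))
            if bit and not significant[i]:
                # First 1 bit of this coefficient: send its sign now.
                significant[i] = True
                row.append('+' if c > 0 else '-')
        passes.append(''.join(row))
    return passes

# Usage: the first three bit-planes of w0..w7 from the slide.
coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
first, second, third = embedded_symbols(coeffs, nbits=7, planes=3)
```

Truncating the output after any pass still leaves a decodable, coarser approximation of every coefficient, which is what makes the bitstream embedded.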

40/75

Audio Masking

[Figure: masking threshold versus frequency around a signal, showing the critical band and the neighboring band, the noise level, the maximum mask, the signal-to-mask ratio and the noise-to-mask ratio]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking; all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

After the first pass (legend on the slide: significant coefficient, insignificant coefficient, mask):

Coefficient  Value  Range
w0              0   -63 to 63
w1            -96   -127 to -64
w2              0   -63 to 63
w3              0   -63 to 63
w4              0   -63 to 63
w5              0   -63 to 63
w6              0   -127 to 127
w7              0   -127 to 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

After the second pass (legend on the slide: significant coefficient, insignificant coefficient):

Coefficient  Value  Range
w0             48    32 to 63
w1            -96   -127 to -64
w2              0   -31 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5              0   -31 to 31
w6              0   -63 to 63
w7              0   -63 to 63

45/75

Context Modeling

Context
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

[Figure: the same block of coefficients as under "A block of coefficients", turned into a sequence of bits with their contexts]

Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

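47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is split by the symbol probabilities (P0, 1-P0), then (P1, 1-P1), then (P2, 1-P2) as the symbols S0=0, S1=1, S2=0 are coded, leaving a final interval A]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B=0100 0000000 to C=0100 1111111 lies within A, that is, (B, C) ⊂ A.)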
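The interval-narrowing idea in the figure can be sketched with exact fractions. This is a plain arithmetic-coding illustration, not the QM coder (which uses approximations and adaptive probability estimation); the probabilities in the usage lines are invented, so the resulting code word differs from the slide's 0100.

```python
from fractions import Fraction

def narrow(symbols, p_zero):
    """Shrink [0, 1) once per binary symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0                 # keep the lower part
        else:
            low += width * p0           # keep the upper part
            width *= 1 - p0
    return low, low + width

def shortest_code(low, high):
    """Shortest bit string b such that every continuation of b stays in [low, high)."""
    n = 0
    while True:
        n += 1
        step = Fraction(1, 2 ** n)
        k = -((-low) // step)           # ceil(low / step)
        if k * step + step <= high:
            return format(int(k), f'0{n}b')

# Usage: S0=0, S1=1, S2=0 with assumed probabilities of 1/2 each.
half = Fraction(1, 2)
interval = narrow([0, 1, 0], [half, half, half])
code = shortest_code(*interval)
```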

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R)]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update the context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation
  • Lookup table: calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder

50/75

Bitstream Assembly

Input
  • Bitstream
  • R-D curve

Output
  • Assembled bitstream
  • Companion file

[Figure: bitstream assembling]

51/75

EAC Bitstream Syntax

Layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

52/75

Companion File

Layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 to D4 versus R4), shown twice; in the second set the rates r1, r2, r3 and r4 are marked on the curves]
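One common way to choose rates such as r1 to r4 under a shared byte budget is to keep extending whichever bitstream currently offers the steepest R-D slope. The sketch below assumes convex curves given as lists of (rate, distortion) points; it illustrates that general technique and is not taken from the EAC implementation. `allocate` and the sample curves are hypothetical.

```python
import heapq

def allocate(curves, budget):
    """Greedy equal-slope allocation.

    curves: one list per bitstream of (rate_bytes, distortion) points,
            with rate increasing, distortion decreasing, and convex shape.
    Returns the chosen rate for each bitstream.
    """
    index = [0] * len(curves)
    spent = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next_segment(ch):
        i = index[ch]
        if i + 1 < len(curves[ch]):
            (r0, d0), (r1, d1) = curves[ch][i], curves[ch][i + 1]
            slope = (d0 - d1) / (r1 - r0)      # distortion drop per byte
            heapq.heappush(heap, (-slope, ch))

    for ch in range(len(curves)):
        push_next_segment(ch)

    while heap:
        _, ch = heapq.heappop(heap)
        r0 = curves[ch][index[ch]][0]
        r1 = curves[ch][index[ch] + 1][0]
        if spent + (r1 - r0) > budget:
            break                               # steepest remaining segment no longer fits
        spent += r1 - r0
        index[ch] += 1
        push_next_segment(ch)

    return [curves[ch][index[ch]][0] for ch in range(len(curves))]

# Usage with two made-up curves.
curve_a = [(0, 100.0), (10, 40.0), (20, 20.0)]
curve_b = [(0, 50.0), (10, 20.0), (20, 15.0)]
rates_20 = allocate([curve_a, curve_b], budget=20)
rates_30 = allocate([curve_a, curve_b], budget=30)
```

Stopping at the first segment that does not fit keeps all chosen points at roughly the same slope, at the cost of possibly leaving a few bytes unused.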

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, plotted as buffer occupancy (bytes) versus time (timeslots), running between two illegal regions]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
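As a worked instance of the allocation formula, the sketch below turns a planned buffer-occupancy trajectory into per-timeslot byte allocations. The function names and the numbers are made up for the example.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * time

def allocations(buffer_levels, rate_trans, time):
    """Allocation for each timeslot i >= 1 given buffer levels Buf_0, Buf_1, ..."""
    return [allocated_bytes(buffer_levels[i - 1], buffer_levels[i], rate_trans, time)
            for i in range(1, len(buffer_levels))]

# Usage: 2000 bytes/s channel, 0.5 s timeslots, made-up buffer levels.
plan = allocations([4000, 3000, 3000, 3500], rate_trans=2000, time=0.5)
```

Draining the buffer lets a timeslot spend more than the channel delivers in that interval, while refilling it leaves fewer bytes for the timeslot.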

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or an intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots) between two illegal regions, with candidate paths marked "Underflow (too many bytes)", "Overflow (too few bytes)" and "Waste bytes"]

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, plotted as buffer occupancy (bytes) versus time (timeslots), with two illegal regions]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions]

61/75

Modular Software Design

[Figure: the Audio is split into L+R (or mono) and L-R channels; each channel passes through MLT(SW) → Quantizer → Entropy coder → Bitstream Assembly to form the Bitstream]

62/75

Modular Software Design

Highly modularized pipeline design
  • The quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec        8kbps  16kbps  32kbps  48kbps
EAC          6.69   5.68    2.80    -22 [decimal point lost in extraction]
WMA          8.47   5.56    3.25    0.40
MP4 TwinVQ   7.48   7.00    5.71    4.48

65/75

EAC – Lossless

Results based on the average of 16 MPEG-4 test clips

Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (over a modem line, delay = 3 frames)

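68/75

EAC (Low Delay Mode)

[Figure: timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT → Quantizer → Entropy → Bitstream), the bitstream crosses the network, and the decoder starts decoding frame i (Entropy → Quantizer) up to the point marked "Playable here".]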

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax
  • The parser may reassemble the bitstream at 1000x real time
  • Change the bitrate, the number of audio channels, or the audio sampling rate

70/75

EAC – Software

Software
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

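71/75

EAC - Encoder

[Figure: Audio → Encoder → stereo 128kbps bitstream and companion file]

72/75

EAC - Parser

[Figure: on the server, the Parser takes the stereo 128kbps bitstream and the companion file and produces four application bitstreams:
  • Stereo 16kbps
  • Mono 8kbps
  • Stereo 16kbps, slow start
  • Mono 8kbps, 11kHz sampling]

73/75

EAC - Decoder

[Figure: the Decoder plays each of the four application bitstreams:
  • Stereo 16kbps
  • Mono 8kbps
  • Stereo 16kbps, slow start
  • Mono 8kbps, 11kHz sampling]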

74/75

Comparison

[Audio demo clips: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint


Page 5: Embedded Audio Coder

575

EAC vs Other Compression

Existing audio compression schemes MP3 AAC MPEG4 audio WMA Real Audio hellip

Why research for a new audio codec

675

Media vs File Compression

File compression Every bit is important has to be compressed

losslessly

Media compression Exact bitvalue is not important distortion is

tolerable Amount of media is huge high compression ratio is

required Media needs adaptation

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

975

Parser

Except header application bitstream is a subset of the master bitstream (parsing is fast)

May be changed according to the required bitrate of audio channels and audio sampling rate

Parser

Master Bitstream

Companion File

Application Bitstream

1075

EAC Decoder

Encoder

Bitstream

Speaker (Direct Sound)

wav file

1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

24/75

Key to Achieve Lossless
  • Break the MLT into small steps
  • Make every step reversible

Definition of a reversible transform
  • Integer input, integer output
  • The transform should have a determinant of 1 (does not expand data volume)

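25/75

MLT Framework

[Diagram: the Forward MLT (lapped transform) is a Window stage followed by a DCT-IV, and the DCT-IV is computed as Pre-Rotate -> Complex FFT -> Post-Rotation. The Inverse MLT is built from the inverse blocks: Pre-Rotate^-1, Complex FFT^-1, Post-Rotation^-1 and Inv Window.]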

26/75

Window Operation

[Diagram: the window is applied as a complex rotation of each sample pair x(n), x(-n-1); the rotation angle is a function of n and N.]

27/75

Pre-Rotation

[Diagram: the windowed samples xw(0)..xw(7) are taken in pairs and each pair goes through one complex rotation, giving xp(0)..xp(7).]
  • Complex Rotate -π/32
  • Complex Rotate -5π/32
  • Complex Rotate -9π/32
  • Complex Rotate -13π/32

28/75

FFT (4 Point Complex)

[Diagram: the eight values xp(0)..xp(7) are treated as four complex values xc(0)..xc(3) and passed through 4-point FFT butterflies (add/subtract, with one twiddle factor e^{-jπ/2}) to give yc(0)..yc(3), that is yp(0)..yp(7).]

29/75

Post-Rotation

[Diagram: the FFT outputs yp(0)..yp(7) are taken in pairs and each pair goes through one conjugate rotation, giving y(0)..y(7).]
  • Conjugate Rotate -0
  • Conjugate Rotate -π/8
  • Conjugate Rotate -2π/8
  • Conjugate Rotate -3π/8

30/75

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

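31/75

Reversible Unit Transform

[Equation: a 2x2 unit transform that maps (a, b) to (a', b') is factored into a sequence of simple steps through an intermediate value t, each step using one multiplier coefficient c.]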
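Since the slide's equations are not recoverable, here is a sketch of one standard way to make a rotation reversible on integers: factor it into three lifting (shear) steps and round inside each step. This is a well-known technique and is shown as an assumption about the general idea, not as the exact factorization on the slide; `lift_rotate` and `lift_rotate_inv` are hypothetical names, and `theta` must not be a multiple of π.

```python
import math

def lift_rotate(a, b, theta):
    """Integer-to-integer approximation of a rotation by theta via three lifting steps."""
    c = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a + round(c * b)   # step 1: shear, b unchanged
    b = b + round(s * a)   # step 2: shear, a unchanged
    a = a + round(c * b)   # step 3: shear, b unchanged
    return a, b

def lift_rotate_inv(a, b, theta):
    """Exact inverse: undo the three steps in reverse order with the same rounding."""
    c = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a - round(c * b)
    b = b - round(s * a)
    a = a - round(c * b)
    return a, b
```

Each step changes only one of the two values by a rounded function of the other, so it can always be undone exactly, and each step has determinant 1, in line with the definition of a reversible transform given earlier in the deck.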

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Figure: frames 1 to 8 along the time axis, grouped into time slots.]

34/75

Entropy Coder

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

35/75

Entropy Coder
  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

36/75

A block of coefficients

 45    0    0    0    0    0    0    0
-74  -13    0    0    3    0    4    0
 21    0    4    0    0    3    5    0
 14    0   23   23    0    0    0    0
 -4    5    0    0    0    1   -1    0
-18    0    0   19   -4   33    0   -1
  4    0   23    0    0    0    1    0
 -1    0    0    0    0    0    0    0

37/75

Bits of Coefficients

      Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
w0     45            0  1  0  1  1  0  1    +
w1    -74            1  0  0  1  0  1  0    -
w2     21            0  0  1  0  1  0  1    +
w3     14            0  0  0  1  1  1  0    +
w4     -4            0  0  0  0  1  0  0    -
w5    -18            0  0  1  0  0  1  0    -
w6      4            0  0  0  0  1  0  0    +
w7     -1            0  0  0  0  0  0  1    -
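The sign/bitplane table on this slide can be reproduced with a few lines of code. The sketch below assumes the 7-bit sign-magnitude layout shown on the slide (b1 is the most significant bit); the function name `to_bitplanes` is made up for the example.

```python
COEFFS = [45, -74, 21, 14, -4, -18, 4, -1]   # w0..w7 from the slide
NBITS = 7                                    # bitplanes b1 (MSB) .. b7 (LSB)

def to_bitplanes(value, nbits=NBITS):
    """Return (bits, sign): bits[0] is b1 (MSB), bits[-1] is b7 (LSB)."""
    mag = abs(value)
    bits = [(mag >> (nbits - 1 - k)) & 1 for k in range(nbits)]
    sign = '-' if value < 0 else '+'
    return bits, sign

def format_table(coeffs):
    """One text row per coefficient: name, value, bits b1..b7, sign."""
    rows = []
    for i, c in enumerate(coeffs):
        bits, sign = to_bitplanes(c)
        rows.append(f"w{i} {c:4d}  {' '.join(map(str, bits))}  {sign}")
    return rows
```

Reading the table row by row gives conventional coding; reading it column by column, b1 first, gives the order used by embedded coding.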

38/75

Conventional Coding

[Figure: the bit table (Sign, b1..b7 for w0..w7) is coded row by row, one whole coefficient at a time: first w0, second w1, third w2.]

Decoded values after the third coefficient: 46, -74, 22, 0, 0, 0, 0, 0

39/75

Embedded Coding

[Figure: the bit table (Sign, b1..b7 for w0..w7) is coded column by column: first pass b1, second pass b2, third pass b3.]

Pass     w0   w1   w2   w3   w4   w5   w6   w7
First    0    1-   0    0    0    0    0    0
Second   1+   0    0    0    0    0    0    0
Third    0    0    1+   0    0    1-   0    0

Decoded values and ranges as shown on the slide:

      Value   Range
w0     40      32 ..  47
w1    -72     -79 .. -64
w2     24      16 ..  31
w3      0     -31 ..  31
w4      0     -31 ..  31
w5    -24     -31 ..  31
w6      0     -31 ..  31
w7      0     -31 ..  31
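To make the decoded values on this slide concrete, the sketch below reconstructs a coefficient from only its top bitplanes. It assumes the usual embedded-decoding convention that a significant coefficient is placed in the middle of its remaining uncertainty interval and an insignificant one is decoded as zero; the name `decode_truncated` is hypothetical.

```python
COEFFS = [45, -74, 21, 14, -4, -18, 4, -1]   # w0..w7 from the slides

def decode_truncated(value, planes_kept, nbits=7):
    """Reconstruct a coefficient from its top `planes_kept` bitplanes."""
    mag = abs(value)
    shift = nbits - planes_kept
    known = (mag >> shift) << shift      # magnitude bits received so far
    if known == 0:
        return 0                         # still insignificant: decode as zero
    half = (1 << shift) >> 1             # half of the remaining uncertainty
    recon = known + half
    return -recon if value < 0 else recon
```

With three bitplanes kept, `[decode_truncated(c, 3) for c in COEFFS]` gives 40, -72, 24, 0, 0, -24, 0, 0, and with all seven bitplanes the original values are recovered exactly.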

40/75

Audio Masking

[Figure: audio masking versus frequency, with labels: critical band, neighboring band, signal, noise level, masking threshold, maximum mask, signal-to-mask ratio, noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

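43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table (Sign, b1..b7 for w0..w7) after the first coding pass, with the mask overlaid. Legend: coefficient, significant, insignificant, mask.]

Decoded values and ranges after the first pass:

      Value   Range
w0      0      -63 ..  63
w1    -96     -127 .. -64
w2      0      -63 ..  63
w3      0      -63 ..  63
w4      0      -63 ..  63
w5      0      -63 ..  63
w6      0     -127 .. 127
w7      0     -127 .. 127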

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table (Sign, b1..b7 for w0..w7) after the first and second coding passes. Legend: coefficient, significant, insignificant.]

Decoded values and ranges after the second pass:

      Value   Range
w0     48       32 ..  63
w1    -96     -127 .. -64
w2      0      -31 ..  31
w3      0      -31 ..  31
w4      0      -31 ..  31
w5      0      -31 ..  31
w6      0      -63 ..  63
w7      0      -63 ..  63

45/75

Context Modeling

Context
  • Zero coding
      • Significant statuses of neighbor coefficients
  • Refinement
      • Whether it is the 1st refinement pass
      • Significant statuses of neighbor coefficients
  • Sign
      • Neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0    0    0    0    0
-74  -13    0    0    3    0    4    0
 21    0    4    0    0    3    5    0
 14    0   23   23    0    0    0    0
 -4    5    0    0    0    1   -1    0
-18    0    0   19   -4   33    0   -1
  4    0   23    0    0    0    1    0
 -1    0    0    0    0    0    0    0

Bit (to be encoded):            0 1 1 0 0 0 0 0 0 1 0  0 0 0 0 0 0 0 0 ...
Ctx (automatically generated):  0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

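47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is split into parts P0 and 1-P0 for symbol S0 = 0; the chosen part is split into P1 and 1-P1 for S1 = 1, and again into P2 and 1-P2 for S2 = 0. The final interval is labelled A.]

Coding result: 0100

(The shortest binary bitstream that ensures the interval from B = 0100 0000000... to C = 0100 1111111... satisfies (B, C) ⊂ A)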
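The interval picture on this slide can be sketched with exact fractions. This is a plain, idealised arithmetic coder for illustration only: it is not the QM coder named in the title (which uses fixed-precision approximations), the function names are made up, and the probabilities in the usage note are hypothetical, so the resulting code word is not expected to match the slide's 0100.

```python
import math
from fractions import Fraction

def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width *= p                  # keep the lower part of the interval
        else:
            low += width * p            # skip the part reserved for symbol 0
            width *= (1 - p)
    return low, low + width             # interval A = [low, high)

def shortest_code(low, high):
    """Shortest bit string whose interval [B, C) lies inside [low, high)."""
    n = 1
    while True:
        scale = 1 << n
        k = math.ceil(low * scale)      # smallest n-bit prefix with B >= low
        if Fraction(k + 1, scale) <= high:
            return format(k, f"0{n}b")
        n += 1
```

For the symbols 0, 1, 0 with assumed probabilities of a zero of 3/4, 1/2 and 2/3, the final interval A is [3/8, 5/8) and the shortest code is `011`: every continuation of `011` lies in [3/8, 1/2), which is inside A.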

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation
  • Lookup table: calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder

50/75

Bitstream Assembly

Input
  • Bitstream
  • R-D curve

Output
  • Assembled bitstream
  • Companion file

[Diagram: bitstream assembling]

51/75

EAC Bitstream Syntax

[Layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body) | ...]

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

52/75

Companion File

[Layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | ...]

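53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), shown before and after assembling, with the selected truncation rates r1, r2, r3 and r4 marked.]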
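The slide shows only the picture, so the following is a sketch of a standard way to pick truncation rates r1..r4 from several R-D curves: spend the byte budget on whichever curve currently offers the steepest distortion drop per byte. It assumes each curve is convex and sampled at a few (rate, distortion) points; the function name `allocate` and the sample data in the usage note are invented for the example and are not taken from the deck.

```python
import heapq

def allocate(curves, budget):
    """curves[k] = [(rate, distortion), ...] with increasing rate, convex.
    Returns the chosen truncation rate r_k for each curve."""
    idx = [0] * len(curves)                  # current truncation point per curve
    spent = sum(c[0][0] for c in curves)     # bytes used by the starting points
    heap = []

    def push(k):
        i = idx[k]
        if i + 1 < len(curves[k]):
            (r0, d0), (r1, d1) = curves[k][i], curves[k][i + 1]
            slope = (d0 - d1) / (r1 - r0)    # distortion drop per byte
            heapq.heappush(heap, (-slope, k))

    for k in range(len(curves)):
        push(k)
    while heap:
        _, k = heapq.heappop(heap)           # steepest remaining segment
        r0 = curves[k][idx[k]][0]
        r1 = curves[k][idx[k] + 1][0]
        if spent + (r1 - r0) > budget:
            break                            # the next-steepest segment no longer fits
        spent += r1 - r0
        idx[k] += 1
        push(k)
    return [curves[k][idx[k]][0] for k in range(len(curves))]
```

For two made-up curves `[(0, 100), (10, 40), (20, 20), (30, 15)]` and `[(0, 50), (10, 30), (20, 25)]` and a budget of 30 bytes, the call returns `[20, 10]`: all segments with slope 6 or 2 are taken and the slope-0.5 segments are left out.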

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots); two illegal regions are marked.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \times \mathrm{Time}$

where
  • $B_i$: allocated bytes for timeslot i
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
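As a worked example of the allocation formula, the sketch below turns a sequence of buffer occupancy levels into per-timeslot byte allocations. It assumes the rate is expressed in bytes per second so that all terms are in bytes; the function names and the numbers in the usage note are illustrative only.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time (rate assumed in bytes/second)."""
    return buf_prev - buf_cur + rate_trans * time

def allocations(buffer_levels, rate_trans, time):
    """Bytes allocated to each timeslot for a given buffer-occupancy trajectory."""
    return [allocated_bytes(prev, cur, rate_trans, time)
            for prev, cur in zip(buffer_levels, buffer_levels[1:])]
```

With buffer levels of 4000, 3000 and 3500 bytes, a rate of 2000 bytes per second and 0.5 s timeslots, the two timeslots get 2000 and 500 bytes: letting the buffer drain frees extra bytes for a timeslot, and refilling it takes bytes away.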

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot

57/75

Search (R-D slope)

[Figure: search on the buffer-occupancy plot. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots); two illegal regions are marked. Annotations: underflow (too many bytes), overflow (too few bytes), waste bytes.]

58/75

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions marked.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots); two illegal regions are marked.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions marked.]

61/75

Modular Software Design

[Diagram: the audio is split into an L+R (or mono) path and an L-R path; each path runs MLT(SW) -> Quantizer -> Entropy coder -> Bitstream Assembly, producing the bitstream.]

62/75

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         48 kbps   32 kbps   16 kbps   8 kbps
MP4 TwinVQ     4.48      5.71      7.00      7.48
WMA            0.40      3.25      5.56      8.47
EAC           -2.2       2.80      5.68      6.69

65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips

Codec             Compression Ratio
EAC               2.72
Monkey's Audio    2.72
WinZip            1.32

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)
  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames
      • (Ignoring encoding/decoding delay)
      • Network transmission time (if modem line, delay = 3 frames)

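68/75

EAC (Low Delay Mode)

[Diagram: timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT -> Quantizer -> Entropy), the bitstream crosses the network, and the decoder starts decoding frame i (Entropy -> Quantizer); the point where the frame becomes playable is marked.]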

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change
      • Bit rate
      • # of audio channels
      • Audio sampling rate

70/75

EAC - Software

Software
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

71/75

EAC - Encoder

[Diagram: Audio -> Encoder -> Stereo 128 kbps bitstream and companion file]

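72/75

EAC - Parser

[Diagram: on the server, the Parser reads the Stereo 128 kbps bitstream and the companion file and produces the following bitstreams.]
  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling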

73/75

EAC - Decoder

[Diagram: the Decoder takes each of the following bitstreams.]
  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling

74/75

Comparison

[Demo clips: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile
      • Low delay, constant bitrate, streaming
  • Flexible bitstream
      • Parsing for bitrate, # of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

Page 7: Embedded Audio Coder

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

975

Parser

Except header application bitstream is a subset of the master bitstream (parsing is fast)

May be changed according to the required bitrate of audio channels and audio sampling rate

Parser

Master Bitstream

Companion File

Application Bitstream

1075

EAC Decoder

Encoder

Bitstream

Speaker (Direct Sound)

wav file

1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

39/75

Embedded Coding

[Figure: the bit table of w0 to w7, coded one bit-plane at a time: b1 first, b2 second, b3 third]

Coded symbols:
  • First (b1): 01 -000000
  • Second (b2): 1 +0000000
  • Third (b3): 001 +001 -00

Coefficient   Value   Range
w0              40    32 to 47
w1             -72    -79 to -64
w2              24    16 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5             -24    -31 to 31
w6               0    -31 to 31
w7               0    -31 to 31
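
The bit-plane ordering can be sketched in a few lines. This is an illustration only, assuming 7 magnitude bits, a sign symbol sent right after a coefficient's first 1 bit, and midpoint reconstruction at the decoder; the function names are made up for the example.

```python
def code_bitplanes(coeffs, nbits=7, planes=3):
    """Emit one symbol string per bit-plane, most significant plane first."""
    significant = [False] * len(coeffs)
    streams = []
    for p in range(planes):
        shift = nbits - 1 - p
        out = []
        for i, w in enumerate(coeffs):
            bit = (abs(w) >> shift) & 1
            out.append(str(bit))
            if bit and not significant[i]:
                # First 1 bit of this coefficient: send its sign as well.
                significant[i] = True
                out.append('-' if w < 0 else '+')
        streams.append(''.join(out))
    return streams

def reconstruct(coeffs, nbits=7, planes=3):
    """Values a decoder would hold after `planes` bit-planes (midpoint of the known interval)."""
    shift = nbits - planes
    values = []
    for w in coeffs:
        prefix = abs(w) >> shift            # bits received so far
        if prefix == 0:
            values.append(0)                # still insignificant
        else:
            magnitude = (prefix << shift) + (1 << shift) // 2
            values.append(-magnitude if w < 0 else magnitude)
    return values

if __name__ == "__main__":
    w = [45, -74, 21, 14, -4, -18, 4, -1]
    print(code_bitplanes(w))
    print(reconstruct(w))
```

For the eight example coefficients this yields the same three symbol strings and the same decoded values as listed on the slide. Truncating the stream after any plane still leaves a usable approximation of every coefficient.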

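40/75

Audio Masking

[Figure: level versus frequency across a critical band and its neighboring band, showing the signal, the masking threshold, the maximum mask and the noise level, with the signal-to-mask ratio and the noise-to-mask ratio marked]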

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

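43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: the bit table of w0 to w7 after the first pass, with a legend marking coefficients as significant or insignificant and showing the mask]

Symbol strings shown: 01 -000000 and 001 -000000

Coefficient   Value   Range
w0               0    -63 to 63
w1             -96    -127 to -64
w2               0    -63 to 63
w3               0    -63 to 63
w4               0    -63 to 63
w5               0    -63 to 63
w6               0    -127 to 127
w7               0    -127 to 127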

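44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: the bit table of w0 to w7 after the second pass, with a legend marking coefficients as significant or insignificant]

Coded symbols:
  • First: 01 -000000
  • Second: 1 +0000000

Coefficient   b1 b2   Sign   Value   Range
w0             0  1    +       48    32 to 63
w1             1  0    -      -96    -127 to -64
w2             0  0             0    -31 to 31
w3             0  0             0    -31 to 31
w4             0  0             0    -31 to 31
w5             0  0             0    -31 to 31
w6             0  0             0    -63 to 63
w7             0  0             0    -63 to 63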
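
One way to picture the reordering: coefficients under a higher mask are postponed by whole bit-planes, so after a given pass they are known less precisely, yet every bit is still sent eventually. The sketch below assumes such per-coefficient delays as a hypothetical input (here w6 and w7 are delayed by one plane); it does not compute a psychoacoustic mask, and the function name is illustrative.

```python
def state_after_pass(coeffs, delays, passes, nbits=7):
    """Return (value, low, high) for each coefficient after `passes` coding passes.

    Coefficient i has had max(0, passes - delays[i]) of its bit-planes coded.
    """
    result = []
    for w, delay in zip(coeffs, delays):
        planes = max(0, min(nbits, passes - delay))
        shift = nbits - planes
        prefix = abs(w) >> shift            # magnitude bits known so far
        span = (1 << shift) - 1             # uncertainty left in the uncoded bits
        if prefix == 0:
            result.append((0, -span, span))
        else:
            low = prefix << shift
            high = low + span
            mid = low + (1 << shift) // 2
            result.append((-mid, -high, -low) if w < 0 else (mid, low, high))
    return result

if __name__ == "__main__":
    w = [45, -74, 21, 14, -4, -18, 4, -1]
    delays = [0, 0, 0, 0, 0, 0, 1, 1]       # assumed: w6 and w7 sit under a higher mask
    for n in (1, 2):
        print(n, state_after_pass(w, delays, n))
```

The delayed coefficients keep a wider uncertainty interval after each pass, which is the sense in which the mask changes the coding order rather than the coded content.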

45/75

Context Modeling

Context:
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0

Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

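47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided in turn by the probability pairs (P0, 1-P0), (P1, 1-P1) and (P2, 1-P2) as the symbols S0=0, S1=1, S2=0 are coded, leaving a final interval A]

Coding result: 0100

(The shortest binary bitstream is the one ensuring that the interval from B=0100 0000000 to C=0100 1111111 lies within A)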
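
The interval-narrowing idea can be tried with exact fractions. This is a simplified sketch rather than the QM coder, which uses fixed-point approximations: it assumes symbol 0 takes the lower part of the current interval, and the probabilities in the usage example are made up.

```python
from fractions import Fraction
from math import ceil

def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is the probability that symbol i is 0."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width *= p                 # keep the lower part
        else:
            low += width * p           # keep the upper part
            width *= 1 - p
    return low, width

def shortest_codeword(low, width):
    """Shortest bit string whose every continuation stays inside [low, low + width)."""
    n = 1
    while True:
        k = ceil(low * 2**n)           # smallest n-bit prefix not below `low`
        if Fraction(k + 1, 2**n) <= low + width:
            return format(k, f'0{n}b')
        n += 1

if __name__ == "__main__":
    half = Fraction(1, 2)
    low, width = final_interval([0, 1, 0], [half, half, half])
    print(low, width, shortest_codeword(low, width))
```

With skewed probabilities the final interval stays wide, so fewer bits are needed than symbols coded; this is where the compression comes from.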

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R)]

49/75

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of energy in a critical band
      • Lookup table: convert energy to mask
  • R-D curve calculation
      • Lookup table: calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder
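
The slide does not say how the run-length Rice coder is wired into EAC. As background only, a generic Golomb-Rice code for a non-negative integer such as a run length can be sketched as follows; the function names and the bit-string representation are illustrative.

```python
def rice_encode(n, k):
    """Golomb-Rice code of a non-negative integer n with parameter k >= 0.

    The quotient n >> k is sent in unary (ones closed by a zero),
    followed by the k low-order bits of n.
    """
    quotient = n >> k
    remainder = n & ((1 << k) - 1)
    tail = format(remainder, f'0{k}b') if k else ''
    return '1' * quotient + '0' + tail

def rice_decode(bits, k):
    """Decode a single codeword produced by rice_encode."""
    quotient = bits.index('0')
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k else 0
    return (quotient << k) | remainder

if __name__ == "__main__":
    code = rice_encode(9, 2)
    print(code, rice_decode(code, 2))
```

No probability tables or interval arithmetic are involved, which is why a code of this kind is cheap compared with an adaptive arithmetic coder.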

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, global header, then a sequence of timeslots, each consisting of a head and a body]

52/75

Companion File

[Figure: companion file layout: global header, then a sequence of timeslots, each consisting of a head and an R-D curve]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 to D4 versus R4), shown before and after assembling, with the selected rates r1, r2, r3 and r4 marked on the curves]
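
A common way to pick the rates r1 to r4 is to spend bytes wherever the R-D curve is currently steepest, which ends at roughly equal slopes on all curves. The sketch below assumes each curve is given as convex truncation points of (bytes, distortion); the function name and the data layout are illustrative, not the EAC companion-file format.

```python
import heapq

def allocate(curves, budget):
    """Greedy equal-slope allocation of a byte budget.

    curves[c] lists (rate_bytes, distortion) points with increasing rate,
    decreasing distortion and convex shape. Returns the chosen rate per curve.
    """
    chosen = [0] * len(curves)              # index of the selected point on each curve
    heap = []

    def push_next_segment(c):
        i = chosen[c]
        if i + 1 < len(curves[c]):
            (r0, d0), (r1, d1) = curves[c][i], curves[c][i + 1]
            slope = (d0 - d1) / (r1 - r0)   # distortion removed per extra byte
            heapq.heappush(heap, (-slope, c))

    for c in range(len(curves)):
        push_next_segment(c)
    spent = sum(curve[0][0] for curve in curves)
    while heap:
        _, c = heapq.heappop(heap)
        extra = curves[c][chosen[c] + 1][0] - curves[c][chosen[c]][0]
        if spent + extra > budget:
            break                           # the steepest remaining segment no longer fits
        spent += extra
        chosen[c] += 1
        push_next_segment(c)
    return [curves[c][chosen[c]][0] for c in range(len(curves))]

if __name__ == "__main__":
    curves = [[(0, 100), (10, 40), (20, 20)],
              [(0, 50), (10, 30), (20, 25)]]
    print(allocate(curves, 30))
```

Stopping at the first segment that does not fit keeps the sketch short; a fuller version could keep looking for smaller segments that still fit in the remaining budget.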

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between an upper and a lower illegal region]

55/75

Allocated Bytes per Timeslot

Allocated bytes for a certain timeslot:

$B_i = Buf_{i-1} - Buf_i + Rate_{trans} \times Time$

Where:
  • $B_i$: allocated bytes for timeslot i
  • $Buf_i$: buffer occupancy level at timeslot i
  • $Rate_{trans}$: coding (network) rate per second
  • $Time$: time duration of the timeslot
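
As a quick numeric check of the formula, the sketch below turns a buffer-occupancy curve into per-timeslot byte allocations. It assumes the rate is expressed in bytes per second and the occupancy in bytes; the function name and the numbers are illustrative.

```python
def allocated_bytes(buf, rate_trans, time):
    """Apply B_i = Buf_{i-1} - Buf_i + Rate_trans * Time along a buffer-occupancy curve.

    buf[0] is the initial occupancy and buf[i] the occupancy after timeslot i.
    Returns [B_1, ..., B_n].
    """
    return [buf[i - 1] - buf[i] + rate_trans * time for i in range(1, len(buf))]

if __name__ == "__main__":
    # 2000 bytes/s arriving over 0.5 s timeslots adds 1000 bytes per timeslot.
    print(allocated_bytes([1000, 1200, 900], rate_trans=2000, time=0.5))
```

A rising occupancy means the timeslot was given fewer bytes than arrived, and a falling occupancy means it was given more.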

56/75

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots) between two illegal regions. Allocating too many bytes to a timeslot leads to underflow, allocating too few leads to overflow, and waste bytes are marked]
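
The underflow and overflow limits translate into a legal range for the bytes given to the current timeslot. A minimal sketch, assuming the occupancy evolves as Buf_i = Buf_{i-1} + Rate_trans × Time - B_i and must stay between two bounds; the parameter names are illustrative.

```python
def legal_byte_range(buf_prev, rate_trans, time, buf_min, buf_max):
    """Range of B_i that keeps the buffer occupancy within [buf_min, buf_max]."""
    arrived = rate_trans * time
    fewest = max(0, buf_prev + arrived - buf_max)   # fewer bytes would overflow the buffer
    most = buf_prev + arrived - buf_min             # more bytes would underflow it
    return fewest, most

if __name__ == "__main__":
    print(legal_byte_range(buf_prev=1000, rate_trans=2000, time=0.5,
                           buf_min=0, buf_max=1500))
```

A search over the R-D slope then only has to consider allocations inside this range.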

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by an upper and a lower illegal region]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between an upper and a lower illegal region]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by an upper and a lower illegal region]

61/75

Modular Software Design

[Figure: the audio is split into L+R (or mono) and L-R; each path runs through MLT(SW), quantizer and entropy coder, followed by bitstream assembly, which outputs the bitstream]

62/75

Modular Software Design

  • Highly modularized pipeline design
      • Quantizer and entropy coder can be used for image/video compression as well
      • Probe and data input can be inserted into any part of the program
  • Data flow driven (with necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC            6.69     5.68      2.80      -22
WMA            8.47     5.56      3.25      0.40
MP4 TwinVQ     7.48     7.00      5.71      4.48

65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips

Codec            Compression ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72

66/75

EAC (Versatile)

Versatile:
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames
  • Ignoring encoding/decoding delay
  • Plus network transmission time (over a modem line, delay = 3 frames)

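68/75

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i and i+1. Encoding of frame i starts (MLT, quantizer, entropy coder), the bitstream crosses the network, decoding of frame i starts (entropy and quantizer stages), and the point where the frame becomes playable is marked]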

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax:
  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, number of audio channels, audio sampling rate

70/75

EAC – Software

Software:
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

71/75

EAC - Encoder

[Figure: the encoder turns the audio into a stereo 128 kbps bitstream plus a companion file]

72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling]

73/75

EAC - Decoder

[Figure: the decoder plays the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling]

74/75

Comparison

[Demo: audio clips for comparison: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: real-time execution, small memory footprint


6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
10/75

EAC Decoder

[Figure: block diagram with the labels Encoder, Bitstream, Speaker (Direct Sound), and wav file]

11/75

Embedded Audio Coder - Algorithm Description

12/75

Framework - Encoder

[Figure: encoder block diagram. The audio is split into L+R (or mono) and L-R; each path is drawn as Transform, Entropy coder, Bitstream Assembly, and the output is the bitstream.]

13/75

Audio Transform

Input: audio samples

Output: transform coefficients

Goal: convert the audio from the time domain to the frequency domain
  • Compact the energy
  • Better match psychoacoustic characteristics
  • Enable audio sampling rate change

14/75

Lossy vs Lossless Mode

[Figure: two signal paths. Lossy mode: Audio, MLT(SW), Quantization. Lossless mode: Audio, Reversible MLT(SW).]

15/75

Lossy (Float) Pass

16/75

MLT - Modulated Lapped Transforms

[Figure: two panels. "Spatial Response": amplitude between -1 and 1 over samples 0 to 1000. "Frequency Domain": gain (dB), from -100 to 40, against frequency (in units of pi) from 0 to 1.]

17/75

MLT with Window Switching

Features
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • Its energy is bigger than a certain threshold
      • The energy within its 8 subframes (256 samples each) differs by more than Ta
      • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
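The switching criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how "differs by more than Ta" or "greater by Tb" are measured, so the sketch treats both as energy ratios and requires all three conditions together. The function name and the parameters `energy_min`, `ta`, and `tb` are hypothetical.

```python
def use_short_window(frame, energy_min, ta, tb, sub_len=256):
    """Decide whether a 2048-sample frame switches to the short window.

    Assumptions (not from the slides): Ta and Tb are energy ratios,
    and all three conditions must hold at once.
    """
    subs = [frame[i:i + sub_len] for i in range(0, len(frame), sub_len)]
    energies = [sum(x * x for x in s) for s in subs]

    # Condition 1: the frame energy exceeds a threshold.
    if sum(energies) <= energy_min:
        return False

    floor = 1e-12  # guard against division by zero for silent subframes

    # Condition 2: the subframe energies differ by more than Ta.
    if max(energies) / max(min(energies), floor) <= ta:
        return False

    # Condition 3: some neighboring pair where the former subframe
    # has more energy than the latter by a factor of Tb.
    return any(prev / max(nxt, floor) > tb
               for prev, nxt in zip(energies, energies[1:]))


steady = [1.0] * 2048
burst = [10.0] * 256 + [0.1] * 1792
print(use_short_window(steady, energy_min=1.0, ta=4.0, tb=4.0))
print(use_short_window(burst, energy_min=1.0, ta=4.0, tb=4.0))
```

A stationary frame keeps the long 2048-sample window, while a frame whose energy changes sharply between neighboring subframes is switched to the short window.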

18/75

Band Separation

[Figure: Audio (44.1 kHz sampling), MLT with window switching, band separation of the coefficients]

19/75

Synthesis (Half Sampling)

[Figure: Audio (22.05 kHz sampling), MLT with window switching, band separation]

20/75

Synthesis (Quarter Sampling)

[Figure: Audio (11.025 kHz sampling), MLT with window switching, band separation]

21/75

Quantizer

Input: coefficient

Output: quantized coefficient

Goal: convert the coefficient from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation and figure: each coefficient is mapped to a quantized magnitude and a sign, with a deadzone around 0]
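As an illustration of a scalar quantizer with a deadzone, the sketch below uses the common textbook form (magnitude = floor(|s| / step), sign kept separately). The exact formula on the slide is not recoverable, so the function names, the `step` parameter, and the midpoint reconstruction rule are assumptions.

```python
import math


def quantize(s, step):
    """Deadzone scalar quantizer: returns (magnitude, sign).

    Every |s| < step maps to magnitude 0, so the zero bin is twice
    as wide as the other bins (the deadzone).
    """
    magnitude = int(math.floor(abs(s) / step))
    sign = -1 if s < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the midpoint of the bin (an assumed convention)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step


print(quantize(0.9, 1.0))
print(quantize(-3.7, 1.0))
print(dequantize(3, -1, 1.0))
```

Keeping the magnitude and the sign as separate outputs matches the later slides, where the entropy coder handles magnitude bits and sign bits separately.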

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of a reversible transform
  • Integer input, integer output
  • The transform should have a determinant of 1 (so that it does not expand the data volume)

25/75

MLT Framework

[Figure: forward MLT chain with the blocks Window, Pre-Rotation, Complex FFT, Post-Rotation (labels: DCT IV, Lapped Transform); inverse MLT chain with the blocks Pre-Rotation^-1, Complex FFT^-1, Post-Rotation^-1, Inv Window]

26/75

Window Operation

[Figure: the window operation drawn as a complex rotation of the sample pair x(n), x(-n-1), with a rotation angle that depends on n and N]

27/75

Pre-Rotation

[Figure: four complex rotations, by -π/32, -5π/32, -9π/32, and -13π/32, connect xw(0)..xw(7) and xp(0)..xp(7)]

28/75

FFT (4 Point Complex)

[Figure: butterfly diagram of a 4-point complex FFT. The values xp(0)..xp(7) form the complex inputs xc(0)..xc(3); butterflies with sign changes and a twiddle factor e^(-jπ/2) produce yc(0)..yc(3), which are unpacked to eight real outputs.]

29/75

Post-Rotation

[Figure: four conjugate rotations, by -0, -π/8, -2π/8, and -3π/8, connect yp(0)..yp(7) and y(0)..y(7)]

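30/75

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equation: a 2x2 unit transform of the pair (a, b) is carried out as a sequence of steps through an intermediate value t, with multiplier constants c]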
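One standard way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round after each one. The sketch below shows that construction; it is not necessarily the exact factorization drawn on the slide, and the function names are hypothetical. It assumes sin(theta) is not zero.

```python
import math


def _lift_constants(theta):
    # Rotation = shear(c0) * shear(c1) * shear(c0); requires sin(theta) != 0.
    s = math.sin(theta)
    return (math.cos(theta) - 1.0) / s, s


def rotate_forward(a, b, theta):
    """Integer-to-integer approximation of a rotation by theta."""
    c0, c1 = _lift_constants(theta)
    a += round(c0 * b)   # step 1
    b += round(c1 * a)   # step 2
    a += round(c0 * b)   # step 3
    return a, b


def rotate_inverse(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0, c1 = _lift_constants(theta)
    a -= round(c0 * b)
    b -= round(c1 * a)
    a -= round(c0 * b)
    return a, b


x = rotate_forward(45, -74, 0.3)
print(x, rotate_inverse(*x, 0.3))
```

Each step changes one value by a rounded function of the other, so the decoder can recompute the same rounded term and subtract it. The transform is therefore lossless even though the rotation itself is only approximated.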

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Figure: frames numbered 1 to 8 along the time axis, grouped into time slots]

34/75

Entropy Coder

[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D against rate R)]

35/75

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

 45   0   0   0    -74 -13   0   0
 21   0   4   0     14   0  23  23
  0   0   0   0      3   0   4   0
  0   3   5   0      0   0   0   0
  0   1  -1   0     -4  33   0  -1
  0   0   1   0      0   0   0   0
 -4   5   0   0    -18   0   0  19
  4   0  23   0     -1   0   0   0

Next viewgraph

37/75

Bits of Coefficients

      Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
w0         45        0  1  0  1  1  0  1    +
w1        -74        1  0  0  1  0  1  0    -
w2         21        0  0  1  0  1  0  1    +
w3         14        0  0  0  1  1  1  0    +
w4         -4        0  0  0  0  1  0  0    -
w5        -18        0  0  1  0  0  1  0    -
w6          4        0  0  0  0  1  0  0    +
w7         -1        0  0  0  0  0  0  1    -
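The bit table for w0..w7 can be reproduced with a short sketch. It splits each coefficient into a sign and seven magnitude bits (b1 is the most significant) and then decodes from only the first few bit planes. The midpoint reconstruction rule and the function names are assumptions chosen for illustration.

```python
def bit_planes(coeffs, n_bits=7):
    """Split integers into (magnitude bits MSB first, sign)."""
    rows = []
    for c in coeffs:
        mag = abs(c)
        bits = [(mag >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
        rows.append((bits, '-' if c < 0 else '+'))
    return rows


def decode_truncated(rows, planes_kept, n_bits=7):
    """Decode using only the first planes_kept bit planes.

    A coefficient that is already significant is placed at the
    midpoint of its remaining uncertainty interval (assumed rule).
    """
    out = []
    for bits, sign in rows:
        mag = 0
        for k in range(planes_kept):
            mag |= bits[k] << (n_bits - 1 - k)
        if mag:
            mag += (1 << (n_bits - planes_kept)) >> 1
        out.append(-mag if sign == '-' else mag)
    return out


w = [45, -74, 21, 14, -4, -18, 4, -1]
rows = bit_planes(w)
print(decode_truncated(rows, 3))  # [40, -72, 24, 0, 0, -24, 0, 0]
print(decode_truncated(rows, 7))  # [45, -74, 21, 14, -4, -18, 4, -1]
```

With three bit planes the decoded values are 40, -72, 24, 0, 0, -24, 0, 0, which agree with the values listed on the Embedded Coding slide. With all seven planes the coefficients are recovered exactly.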

38/75

Conventional Coding

[Figure: the same bit table, coded coefficient by coefficient: first w0, second w1, third w2, each with all of its bits b1..b7 and its sign. With only the first three coefficients coded, the decoded values shown for w0..w7 are 46, -74, 22, 0, 0, 0, 0, 0.]

39/75

Embedded Coding

[Figure: the same bit table, coded bit plane by bit plane: first b1, second b2, third b3]

Coded symbols
  • First plane: 0 1- 0 0 0 0 0 0
  • Second plane: 1+ 0 0 0 0 0 0 0
  • Third plane: 0 0 1+ 0 0 1- 0 0

Decoded range and value

      Range        Value
w0    32..47         40
w1    -79..-64      -72
w2    16..31         24
w3    -31..31         0
w4    -31..31         0
w5    -31..31       -24
w6    -31..31         0
w7    -31..31         0

40/75

Audio Masking

[Figure: masking around a signal, drawn against frequency. Labels: Critical Band, Neighboring Band, Signal, Masking Threshold, Maximum Mask, Noise Level, Signal-to-mask ratio, Noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table after the first coding pass, with the mask and a legend marking each coefficient as significant or insignificant. Coded symbol strings shown: "01 -000000" and "001 -000000".]

Decoded range and value

      Range          Value
w0    -63..63          0
w1    -127..-64      -96
w2    -63..63          0
w3    -63..63          0
w4    -63..63          0
w5    -63..63          0
w6    -127..127        0
w7    -127..127        0

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table after the first and second coding passes, with a legend marking each coefficient as significant or insignificant. Coded symbol strings shown: "01 -000000" and "1 +0000000".]

Decoded range and value

      Range          Value
w0    32..63          48
w1    -127..-64      -96
w2    -31..31          0
w3    -31..31          0
w4    -31..31          0
w5    -31..31          0
w6    -63..63          0
w7    -63..63          0

45/75

Context Modeling

Context
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45   0   0   0    -74 -13   0   0
 21   0   4   0     14   0  23  23
  0   0   0   0      3   0   4   0
  0   3   5   0      0   0   0   0
  0   1  -1   0     -4  33   0  -1
  0   0   1   0      0   0   0   0
 -4   5   0   0    -18   0   0  19
  4   0  23   0     -1   0   0   0

Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0  0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

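47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided once per coded symbol, with probabilities P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.]

Coding result: 0100

(The shortest binary bitstream is chosen such that the interval from B = 0100 0000000... to C = 0100 1111111... satisfies (B, C) ⊂ A.)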
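The interval picture can be turned into a small exact-arithmetic sketch. It narrows the interval once per binary symbol and then searches for the shortest bit string whose dyadic interval fits inside the final interval A. This shows the idea only; it is not the QM coder, which uses finite-precision approximations. The convention that symbol 0 takes the lower part of the interval and the probabilities in the example are assumptions.

```python
from fractions import Fraction


def encode_interval(symbols, p_zero):
    """Return the final interval [low, high) for a binary sequence.

    p_zero[i] is the probability that symbol i equals 0; symbol 0 is
    assumed to take the lower part of the current interval.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that [0.b000..., 0.b111...] lies in [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = -(-low // step)              # ceil(low / step)
        if (k + 1) * step <= high:
            return format(int(k), "b").zfill(n)
        n += 1


half = Fraction(1, 2)
low, high = encode_interval([0, 1, 0], [half, half, half])
print(low, high, shortest_code(low, high))  # 1/4 3/8 010
```

With all probabilities equal to 1/2, the sequence 0, 1, 0 ends in the interval [1/4, 3/8) and the shortest code is 010. Skewed probabilities widen the interval of likely sequences and shorten their codes, which is where the compression comes from.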

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D against rate R)]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update the context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation
  • Lookup table: calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

[Figure: bitstream assembling]

51/75

EAC Bitstream Syntax

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, Global Header, then a sequence of timeslots, each with a Head and a Body]

52/75

Companion File

[Figure: companion file layout: Global Header, then a sequence of timeslots, each with a Head and an R-D curve]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 against R1 through D4 against R4), shown twice; in the second copy the rates r1, r2, r3, r4 are marked on the curves]
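A common way to choose the rates r1..r4 for a byte budget is to spend bytes on whichever curve currently offers the steepest distortion reduction per byte, which ends at roughly equal R-D slopes on all curves. The greedy sketch below illustrates that idea. The slides do not spell out the procedure, so treat this as an assumption; the function name and data layout are hypothetical. Each curve must be convex with strictly increasing rates.

```python
import heapq


def allocate(curves, budget):
    """Pick one truncation point per R-D curve within a byte budget.

    curves[k] is a list of (rate, distortion) points with increasing
    rate, decreasing distortion, and decreasing slope (convex).
    Returns (chosen point index per curve, total bytes used).
    """
    idx = [0] * len(curves)
    used = sum(c[0][0] for c in curves)
    heap = []

    def push(k):
        i = idx[k]
        if i + 1 < len(curves[k]):
            (r0, d0), (r1, d1) = curves[k][i], curves[k][i + 1]
            slope = (d0 - d1) / (r1 - r0)      # distortion drop per byte
            heapq.heappush(heap, (-slope, k))  # steepest slope first

    for k in range(len(curves)):
        push(k)

    while heap:
        _, k = heapq.heappop(heap)
        cost = curves[k][idx[k] + 1][0] - curves[k][idx[k]][0]
        if used + cost > budget:
            break
        used += cost
        idx[k] += 1
        push(k)
    return idx, used


curves = [[(0, 100), (10, 40), (20, 20)],
          [(0, 50), (10, 30), (20, 25)]]
print(allocate(curves, 30))  # ([2, 1], 30)
```

In the example, the first curve gets 20 bytes and the second 10, because the first curve's segments remove more distortion per byte.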

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer occupancy (bytes) against time (timeslots). The buffer-occupancy curve runs between two illegal regions.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
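As a worked example of the allocation formula, the sketch below turns a planned buffer-occupancy trajectory into per-timeslot byte allocations. The numbers are made up for illustration, the function name is hypothetical, and the rate is taken in bytes per second.

```python
def allocated_bytes(buf, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1..len(buf)-1.

    buf[i] is the buffer occupancy (bytes) at timeslot i, rate_trans is
    the transmission rate (bytes per second, assumed unit), and time is
    the duration of one timeslot in seconds.
    """
    return [buf[i - 1] - buf[i] + rate_trans * time
            for i in range(1, len(buf))]


# Occupancy drops by 1000 bytes, then rises by 500 bytes, while the
# channel delivers 2000 * 0.5 = 1000 bytes per timeslot.
print(allocated_bytes([4000, 3000, 3500], rate_trans=2000, time=0.5))
# [2000.0, 500.0]
```

A timeslot may spend more than the channel delivers during its duration only by draining the buffer, and it refills the buffer when it spends less.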

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) against time (timeslots), between two illegal regions. Labels: Underflow (too many bytes), Overflow (too few bytes), Waste bytes.]

58/75

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) against time (timeslots), between two illegal regions]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer occupancy (bytes) against time (timeslots); the buffer-occupancy curve runs between two illegal regions]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) against time (timeslots), between two illegal regions]

61/75

Modular Software Design

[Figure: encoder pipeline. The audio is split into L+R (or mono) and L-R; each path is drawn as MLT(SW), Quantizer, Entropy coder, Bitstream Assembly, and the output is the bitstream.]

62/75

Modular Software Design

Highly modularized pipeline design
  • The quantizer and entropy coder can be used for image/video compression as well
  • Probes and data inputs can be inserted into any part of the program

Data flow driven (with the necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec          8kbps   16kbps   32kbps   48kbps
EAC            669     568      280      -22
WMA            847     556      325      040
MP4 TwinVQ     748     700      571      448

65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips

Codec             Compression Ratio
WinZip            132
Monkey's Audio    272
EAC               272

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed-length timeslot bitstream

Delay = 2 frames, ignoring
  • Encoding/decoding delay
  • Network transmission time (over a modem line, delay = 3 frames)

68/75

EAC (Low Delay Mode)

[Figure: timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, Quantizer, Entropy) and sends the bitstream over the network; the decoder starts decoding frame i (Entropy, Quantizer), and the point where the frame becomes playable is marked.]

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • The parser may reassemble the bitstream at 1000x real time
  • Change
      • Bit rate
      • Number of audio channels
      • Audio sampling rate

70/75

EAC - Software

Software
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

71/75

EAC - Encoder

[Figure: the audio enters the Encoder, which outputs a Stereo 128 kbps bitstream and a Companion file]

72/75

EAC - Parser

[Figure: on the server, the Parser takes the Stereo 128 kbps bitstream and the Companion file and produces four streams: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]

73/75

EAC - Decoder

[Figure: the Decoder takes the parsed streams: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]

74/75

Comparison

[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bit rate, number of audio channels, audio sampling rate

Good prototype available: real-time execution, small memory footprint

Page 11: Embedded Audio Coder

1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 12: Embedded Audio Coder

12/75

Frame Work - Encoder

[Diagram: the audio is split into L+R (or mono) and L-R components; each component passes through Transform, Entropy coder, and Bitstream Assembly blocks to produce the bitstream.]

13/75

Audio Transform

Input: audio samples

Output: transform coefficients

Goal: convert audio from the time (sample) domain to the frequency domain
  • Compact the energy
  • Better match psychoacoustic characteristics
  • Enable audio sampling rate change

14/75

Lossy vs Lossless Mode

Lossy mode: Audio -> MLT (SW) -> Quantization

Lossless mode: Audio -> Reversible MLT (SW)

15/75

Lossy (Float) Pass

16/75

MLT - Modulated Lapped Transforms

[Figure, two panels. "Spatial Response": amplitude (-1 to 1) versus sample index (0 to 1000). "Frequency Domain": gain in dB (-100 to 40) versus frequency in units of pi (0 to 1).]

17/75

MLT with Window Switching

Features
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • its energy is bigger than a certain threshold;
      • the energy within its 8 subframes (256 samples each) differs by more than Ta;
      • there are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb.
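
The switching criterion above can be sketched as a small decision function. This is only an illustration under stated assumptions: the slide does not say how "differs by more than Ta" or "greater by Tb" are measured, so energy ratios are assumed here, and the names `use_short_window`, `energy_floor`, `ta`, and `tb` are hypothetical.

```python
FRAME = 2048      # basic window size, from the slide
SUBFRAME = 256    # short window size, from the slide


def use_short_window(frame, energy_floor, ta, tb):
    """Return True if the frame should switch to the short window.

    Assumptions (not stated on the slide): Ta and Tb are energy ratios,
    and energy_floor is the "certain threshold" on the frame energy.
    """
    if len(frame) != FRAME:
        raise ValueError("expected a frame of 2048 samples")

    # Energy of each of the 8 subframes.
    energies = [
        sum(x * x for x in frame[i:i + SUBFRAME])
        for i in range(0, FRAME, SUBFRAME)
    ]

    # Condition 1: the frame is not (nearly) silent.
    if sum(energies) <= energy_floor:
        return False

    # Condition 2: subframe energies differ by more than Ta (as a ratio).
    eps = 1e-12  # avoids division by zero for silent subframes
    if (max(energies) + eps) / (min(energies) + eps) <= ta:
        return False

    # Condition 3: some subframe exceeds its successor by more than Tb.
    return any(
        (former + eps) / (latter + eps) > tb
        for former, latter in zip(energies, energies[1:])
    )
```

A stationary frame fails the second condition and keeps the long window, while a frame whose energy changes sharply between neighboring subframes is switched.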

18/75

Band Separation

[Diagram with blocks: Audio (44.1 kHz sampling), MLT with window switching, Band separation.]

19/75

Synthesis (Half Sampling)

[Diagram with blocks: Audio (22.05 kHz sampling), MLT with window switching, Band separation.]

20/75

Synthesis (Quarter Sampling)

[Diagram with blocks: Audio (11.025 kHz sampling), MLT with window switching, Band separation.]
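
One way to read the three diagrams is that the coefficients of a frame are partitioned by frequency, so that a half-rate or quarter-rate decoder only needs the low-frequency part. The sketch below assumes exactly that partition (lowest quarter, next quarter, upper half); the slides show only block diagrams, so the split points and the names `separate_bands` and `coefficients_for` are hypothetical.

```python
def separate_bands(coeffs):
    """Split one frame of coefficients (ordered low to high frequency).

    Assumed partition: the lowest quarter serves quarter-rate synthesis,
    the next quarter completes half-rate synthesis, and the upper half
    completes full-rate synthesis.
    """
    n = len(coeffs)
    if n % 4:
        raise ValueError("frame length must be a multiple of 4")
    return {
        "quarter": coeffs[: n // 4],
        "half": coeffs[n // 4: n // 2],
        "full": coeffs[n // 2:],
    }


def coefficients_for(bands, rate):
    """Collect the coefficients needed to synthesize at the given rate."""
    order = ["quarter", "half", "full"]
    needed = order[: order.index(rate) + 1]
    out = []
    for name in needed:
        out.extend(bands[name])
    return out
```

With this layout, dropping the sampling rate amounts to discarding the bands above the target rate, with no re-encoding.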

21/75

Quantizer

Input: coefficients

Output: quantized coefficients

Goal: convert coefficients from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation garbled in extraction: it maps each coefficient s[n] to a quantized magnitude m[n] and a sign, with a deadzone around 0.]
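
Since the slide's own formula is not recoverable, the following is a sketch of the common textbook deadzone quantizer, not necessarily the exact rule used in EAC. The step size `step`, the midpoint reconstruction, and the function names are assumptions.

```python
import math


def deadzone_quantize(s, step):
    """Quantize a float coefficient to (magnitude, sign).

    Textbook form: magnitude = floor(|s| / step), so every value with
    |s| < step falls into the deadzone and quantizes to zero.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    magnitude = int(math.floor(abs(s) / step))
    if magnitude == 0:
        return 0, 0          # no sign is needed inside the deadzone
    return magnitude, (-1 if s < 0 else 1)


def dequantize(magnitude, sign, step):
    """Reconstruct at the midpoint of the quantization interval."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step
```

The deadzone is twice as wide as the other intervals, which sends small coefficients to zero and keeps the magnitude and sign separate for the bitplane coder that follows.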

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

  • Break the MLT into small steps
  • Make every step reversible

Definition of a reversible transform
  • Integer input, integer output
  • The transform should have a determinant of 1 (do not expand the data volume)

25/75

MLT Framework

[Diagram: the forward MLT is drawn as a Window stage followed by a DCT-IV, with the DCT-IV built from Pre-Rotate, Complex FFT, and Post-Rotation blocks; together they form the lapped transform. The inverse MLT is drawn with the corresponding inverse blocks: Pre-Rotate^-1, Complex FFT^-1, Post-Rotation^-1, and Inverse Window.]

26/75

Window Operation

[Diagram: the window operation pairs the samples x(n) and x(-n-1) and applies a complex rotation to each pair. The rotation angle, a function of n and N, is garbled in the extracted text.]

27/75

Pre-Rotation

[Diagram: the windowed samples xw(0) to xw(7) pass through four complex rotations with angles -π/32, -5π/32, -9π/32, and -13π/32, giving xp(0) to xp(7).]

28/75

FFT (4 Point Complex)

[Diagram: a 4-point complex FFT. The real values xp(0) to xp(7) are grouped into the complex inputs xc(0) to xc(3), pass through butterfly stages (with a twiddle factor e^{-jπ/2}), and yield the complex outputs yc(0) to yc(3), which are unpacked into yp(0) to yp(7).]

29/75

Post-Rotation

[Diagram: yp(0) to yp(7) pass through four conjugate rotations with angles -0, -π/8, -2π/8, and -3π/8, giving the outputs y(0) to y(7).]

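30/75

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equations garbled in extraction: the slide appears to write a 2x2 unit transform of a pair (a, b), such as the normalized butterfly with entries ±1/√2, as a sequence of steps that use an intermediate value t and constant multipliers c.]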
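
A minimal sketch of one well-known way to make a rotation reversible on integers: factor it into three lifting (shear) steps and round each step. This is offered as an illustration of the idea of a determinant-1, integer-to-integer transform; the slide's own factorization is not recoverable from the text, and the names `forward_rotate` and `inverse_rotate` are hypothetical.

```python
import math


def _lifting_constants(theta):
    """Constants of the three-shear factorization of a rotation.

    rotation(theta) = shear(c0) * lower_shear(c1) * shear(c0)
    with c0 = (cos(theta) - 1) / sin(theta) and c1 = sin(theta).
    """
    s = math.sin(theta)
    if abs(s) < 1e-12:
        raise ValueError("theta must not be a multiple of pi")
    return (math.cos(theta) - 1.0) / s, s


def forward_rotate(a, b, theta):
    """Integer approximation of rotating (a, b) by theta."""
    c0, c1 = _lifting_constants(theta)
    a = a + round(c0 * b)   # each step changes one value using the other
    b = b + round(c1 * a)
    a = a + round(c0 * b)
    return a, b


def inverse_rotate(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0, c1 = _lifting_constants(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b
```

Each step adds a rounded function of the untouched value, so subtracting the same rounded quantity restores the input exactly, regardless of the rounding error in the rotation itself.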

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Diagram labeled "Time slot" (numbered 1 to 8) and "Frame", showing how frames are grouped into time slots.]

34/75

Entropy Coder

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

35/75

Entropy Coder

  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

36/75

A block of coefficients

 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0

37/75

Bits of Coefficients

Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0              45     0  1  0  1  1  0  1    +
w1             -74     1  0  0  1  0  1  0    -
w2              21     0  0  1  0  1  0  1    +
w3              14     0  0  0  1  1  1  0    +
w4              -4     0  0  0  0  1  0  0    -
w5             -18     0  0  1  0  0  1  0    -
w6               4     0  0  0  0  1  0  0    +
w7              -1     0  0  0  0  0  0  1    -
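
The sign-magnitude bit table above can be reproduced with a few lines of code. This is a small sketch; the function name `bitplanes` and the tuple layout are illustrative choices, with b1 taken as the most significant bit as on the slide.

```python
def bitplanes(coeffs, nbits):
    """Return, per coefficient, its magnitude bits (b1 = MSB) and sign."""
    rows = []
    for w in coeffs:
        magnitude = abs(w)
        if magnitude >= 1 << nbits:
            raise ValueError("coefficient does not fit in nbits")
        # Bit k (0-based) is the coefficient's entry in bitplane b(k+1).
        bits = [(magnitude >> (nbits - 1 - k)) & 1 for k in range(nbits)]
        rows.append((bits, "-" if w < 0 else "+"))
    return rows


for name, (bits, sign) in zip(
    ["w0", "w1", "w2", "w3", "w4", "w5", "w6", "w7"],
    bitplanes([45, -74, 21, 14, -4, -18, 4, -1], 7),
):
    print(name, " ".join(str(b) for b in bits), sign)
```

Reading the table column by column gives the bitplanes b1 to b7 that the embedded coder sends in order of decreasing significance.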

38/75

Conventional Coding

[Figure: conventional coding sends the coefficients one after another (first w0, second w1, third w2, ...), each with all of its bits b1 to b7 and its sign. With only the first three coefficients received, the slide shows the decoded values below.]

Coefficient   Decoded value
w0             46
w1            -74
w2             22
w3              0
w4              0
w5              0
w6              0
w7              0

39/75

Embedded Coding

[Figure: embedded coding sends the coefficients bitplane by bitplane, most significant bitplane first.]

  • First pass (b1): 0, 1 (sign -), 0, 0, 0, 0, 0, 0
  • Second pass (b2): 1 (sign +), 0, 0, 0, 0, 0, 0, 0
  • Third pass (b3): 0, 0, 1 (sign +), 0, 0, 1 (sign -), 0, 0

Decoder state shown on the slide:

Coefficient   Value   Range
w0              40    32 to 47
w1             -72    -79 to -64
w2              24    16 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5             -24    -31 to 31
w6               0    -31 to 31
w7               0    -31 to 31
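
The decoder side of embedded coding can be sketched as follows: after receiving only the top bitplanes, each coefficient is known to lie in an interval, and the decoder reconstructs it at the midpoint. The midpoint rule and the name `decode_after` are assumptions; the sketch reports the ranges implied by the completed passes only.

```python
def decode_after(coeffs, nbits, planes):
    """Return (value, low, high) per coefficient after `planes` bitplanes.

    The true coefficients are used only to simulate which bits the
    decoder has received so far.
    """
    shift = nbits - planes          # number of bitplanes still missing
    out = []
    for w in coeffs:
        prefix = abs(w) >> shift    # magnitude bits received so far
        if prefix == 0:
            # Still insignificant: sign unknown, magnitude below 2**shift.
            bound = (1 << shift) - 1
            out.append((0, -bound, bound))
            continue
        low = prefix << shift
        high = low + (1 << shift) - 1
        mid = low + ((1 << shift) >> 1)
        if w < 0:
            out.append((-mid, -high, -low))
        else:
            out.append((mid, low, high))
    return out


for state in decode_after([45, -74, 21, 14, -4, -18, 4, -1], 7, 3):
    print(state)
```

For the example block and three bitplanes this gives 40, -72, and 24 for w0, w1, and w2, matching the values on the slide, and every additional bitplane halves the uncertainty interval.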

40/75

Audio Masking

[Figure: audio masking along the frequency axis, with the critical band and a neighboring band marked. Labels: signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process
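
The loop above can be illustrated with a toy scheduler. It is a sketch under strong simplifications, not the EAC algorithm: coefficients are grouped into hypothetical bands, the mask of a band is derived only from that band's already-coded bits (a real mask also spreads across neighboring bands), and the priority rule and the `spread` factor are invented for illustration. The point it demonstrates is that the coding order depends only on information the decoder already has, so no mask needs to be transmitted.

```python
def _known_magnitude(magnitude, nbits, planes_sent):
    """Magnitude as the decoder knows it (uncoded low bits read as zero)."""
    shift = nbits - planes_sent
    return (magnitude >> shift) << shift


def coding_order(bands, static_mask, nbits, spread=0.1):
    """Return the (band, bitplane) pairs in the order they are coded.

    bands:       list of lists of coefficient magnitudes
    static_mask: per-band static threshold (stand-in for the
                 Fletcher-Munson threshold)
    """
    sent = [0] * len(bands)          # bitplanes already coded per band
    order = []
    while any(p < nbits for p in sent):
        best, best_priority = None, -1.0
        for b, magnitudes in enumerate(bands):
            if sent[b] == nbits:
                continue
            # The mask uses only bits that were already coded.
            energy = sum(
                _known_magnitude(m, nbits, sent[b]) ** 2 for m in magnitudes
            )
            mask = max(static_mask[b], spread * energy ** 0.5)
            step = 1 << (nbits - 1 - sent[b])   # weight of the next bitplane
            priority = step / mask              # error relative to the mask
            if priority > best_priority:
                best, best_priority = b, priority
        order.append((best, sent[best]))
        sent[best] += 1
    return order
```

Because the mask never looks at bits that have not been coded yet, a decoder running the same function on its partially decoded coefficients arrives at the same order.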

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: decoder state after the first pass, with each coefficient marked significant or insignificant and the mask indicated.]

Coefficient   Value   Range
w0               0    -63 to 63
w1             -96    -127 to -64
w2               0    -63 to 63
w3               0    -63 to 63
w4               0    -63 to 63
w5               0    -63 to 63
w6               0    -127 to 127
w7               0    -127 to 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: decoder state after the first and second passes, with each coefficient marked significant or insignificant.]

Coefficient   Value   Range
w0              48    32 to 63
w1             -96    -127 to -64
w2               0    -31 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5               0    -31 to 31
w6               0    -63 to 63
w7               0    -63 to 63

45/75

Context Modeling

Context
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

[Slide labels: "To be encoded" (bits) and "Automatically generated" (contexts).]

47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided successively according to the probabilities (P0, 1-P0), (P1, 1-P1), (P2, 1-P2) of the symbols S0=0, S1=1, S2=0, giving a final interval A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000... to C = 0100 1111111... lies inside A.)
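
The interval picture above can be reproduced with exact fractions. This is a sketch of idealized binary arithmetic coding, not of the QM coder, which uses approximations to avoid multiplications. The convention that symbol 0 takes the lower part of the interval, and the names `narrow` and `shortest_code`, are assumptions.

```python
from fractions import Fraction
from math import ceil


def narrow(symbols, p_zero):
    """Shrink [0, 1) symbol by symbol; returns the final interval A.

    p_zero[i] is the probability that symbol i is 0; symbol 0 is
    assumed to take the lower part of the current interval.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        p0 = Fraction(p0)
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that [0.b000..., 0.b111...] fits in A."""
    k = 0
    while True:
        k += 1
        scale = 2 ** k
        v = ceil(low * scale)              # smallest k-bit value >= low
        if Fraction(v + 1, scale) <= high:
            return format(v, "0{}b".format(k))


half = Fraction(1, 2)
low, high = narrow([0, 1, 0], [half, half, half])
print(shortest_code(low, high))
```

With equiprobable symbols the code is as long as the input; when the coded symbols are likely ones, the final interval is wider and fewer bits are needed.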

48/75

Entropy Coder (Summary)

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update the context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation
  • Lookup table: calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Diagram, bitstream layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

52/75

Companion File

[Diagram, companion file layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves, (D1, R1) to (D4, R4), one per embedded bitstream in the timeslot, with the selected rates r1, r2, r3, r4 marked.]
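
A common way to choose the rates r1 to r4 under a byte budget is to keep extending whichever bitstream currently offers the largest distortion reduction per byte, which for convex R-D curves equalizes the R-D slopes. The sketch below assumes that approach and a hypothetical curve format (lists of `(rate_bytes, distortion)` points); it is not taken from the slides.

```python
def allocate(curves, budget):
    """Greedy R-D allocation for one timeslot.

    curves: one list per bitstream of (rate_bytes, distortion) points,
            starting at rate 0, with increasing rate and decreasing
            distortion (assumed convex).
    Returns the index of the chosen truncation point on each curve.
    """
    chosen = [0] * len(curves)
    used = 0
    while True:
        best, best_slope = None, 0.0
        for i, curve in enumerate(curves):
            j = chosen[i]
            if j + 1 >= len(curve):
                continue                      # bitstream fully included
            extra = curve[j + 1][0] - curve[j][0]
            gain = curve[j][1] - curve[j + 1][1]
            if used + extra > budget:
                continue                      # next segment does not fit
            slope = gain / extra              # distortion drop per byte
            if slope > best_slope:
                best, best_slope = i, slope
        if best is None:
            return chosen
        used += curves[best][chosen[best] + 1][0] - curves[best][chosen[best]][0]
        chosen[best] += 1
```

Because each bitstream is embedded, cutting it at the chosen point yields a valid shorter bitstream.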

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded above and below by illegal regions.]

55/75

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
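
As a small worked example of the allocation formula, the sketch below turns a sequence of buffer occupancy levels into per-timeslot byte allocations. The function name and the choice of bytes per second as the unit of the rate are assumptions.

```python
def allocated_bytes(buf_levels, rate_bytes_per_s, slot_seconds):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1, 2, ...

    buf_levels[0] is the initial buffer occupancy (Buf_0).
    """
    per_slot = rate_bytes_per_s * slot_seconds   # bytes delivered per slot
    return [
        buf_levels[i - 1] - buf_levels[i] + per_slot
        for i in range(1, len(buf_levels))
    ]


# Buffer drops from 1000 to 800 bytes, then rises to 900 bytes,
# at 2000 bytes per second with 0.5 s timeslots.
print(allocated_bytes([1000, 800, 900], 2000, 0.5))
```

A timeslot in which the buffer level falls may spend more than the channel delivers during that slot, and one in which the level rises must spend less; a flat curve gives the constant-bitrate case.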

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the number of bytes to allocate to the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]

58/75

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by illegal regions.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded by illegal regions.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by illegal regions.]

61/75

Modular Software Design

[Diagram: the audio is split into L+R (or mono) and L-R components; each component passes through MLT (SW), Quantizer, Entropy coder, and Bitstream Assembly blocks to produce the bitstream.]

62/75

Modular Software Design

Highly modularized pipeline design
  • The quantizer and entropy coder can be used for image/video compression as well
  • Probes and data inputs can be inserted into any part of the program

Data flow driven (with the necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better. (Values as extracted; the decimal points are missing from the source text.)

Codec        8 kbps   16 kbps   32 kbps   48 kbps
MP4 TwinVQ   748      700       571       448
WMA          847      556       325       040
EAC          669      568       280       -22

65/75

EAC - Lossless

Results based on the average of 16 MPEG-4 test clips. (Values as extracted; the decimal points are missing from the source text.)

Codec            Compression ratio
EAC              272
Monkey's Audio   272
WinZip           132

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (if modem line, delay = 3 frames)

68/75

EAC (Low Delay Mode)

[Diagram: low delay timeline over the frames i-1, i, i+1. The encoder starts encoding frame i (MLT, Quantizer, Entropy stages) and emits the bitstream; the bitstream crosses the network; the decoder starts decoding frame i (Entropy, Quantizer stages); the point where the frame becomes playable is marked.]

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream (1000x real time)
  • Change
      • bit rate
      • # of audio channels
      • audio sampling rate
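
The reason the parser can be so fast is that it only selects and truncates parts of the master bitstream; nothing is re-encoded. The sketch below illustrates that idea on a hypothetical in-memory layout (a list of timeslots, each mapping a channel name to its embedded bytes). The layout, the even split of the byte budget between channels, and the name `parse` are assumptions; the real parser works on the EAC bitstream syntax and uses the R-D curves in the companion file.

```python
def parse(master, bytes_per_timeslot, keep_channels=("L+R", "L-R")):
    """Derive an application bitstream from the master bitstream.

    Each channel bitstream is embedded, so any prefix of it is decodable.
    Dropping "L-R" turns stereo into mono; a smaller byte budget lowers
    the bit rate.
    """
    application = []
    for timeslot in master:
        kept = {ch: timeslot[ch] for ch in keep_channels if ch in timeslot}
        share = bytes_per_timeslot // max(len(kept), 1)
        application.append({ch: data[:share] for ch, data in kept.items()})
    return application


master = [{"L+R": b"abcdefgh", "L-R": b"12345678"}] * 2
print(parse(master, 4, keep_channels=("L+R",)))   # mono, 4 bytes per slot
print(parse(master, 6))                           # stereo, 6 bytes per slot
```

Every byte of the result is copied from the master bitstream, which is what makes the application bitstream a subset of it.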

70/75

EAC - Software

Software
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

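71/75

EAC - Encoder

[Diagram: Audio -> Encoder -> bitstream (stereo, 128 kbps) and companion file.]

72/75

EAC - Parser

[Diagram: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces the application bitstreams listed below.]

  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

73/75

EAC - Decoder

[Diagram: the decoder plays each of the application bitstreams listed below.]

  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

74/75

Comparison

[Audio demos: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder has been developed
  • Highly efficient
  • Versatile
      • Low delay, constant bitrate, streaming
  • Flexible bitstream
      • Parsing for bit rate, # of audio channels, audio sampling rate
  • Good prototype available: real-time execution, small memory footprint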

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 13: Embedded Audio Coder

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 14: Embedded Audio Coder

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

 45   0   0   0     -74  -13    0    0
 21   0   4   0      14    0   23   23
  0   0   0   0       3    0    4    0
  0   3   5   0       0    0    0    0
  0   1  -1   0      -4   33    0   -1
  0   0   1   0       0    0    0    0
 -4   5   0   0     -18    0    0   19
  4   0  23   0      -1    0    0    0

Next View graph

37/75

Bits of Coefficients

      coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
w0        45         0  1  0  1  1  0  1     +
w1       -74         1  0  0  1  0  1  0     -
w2        21         0  0  1  0  1  0  1     +
w3        14         0  0  0  1  1  1  0     +
w4        -4         0  0  0  0  1  0  0     -
w5       -18         0  0  1  0  0  1  0     -
w6         4         0  0  0  0  1  0  0     +
w7        -1         0  0  0  0  0  0  1     -

38/75

Conventional Coding

[Diagram: the bit table of the Bits of Coefficients slide, with the complete rows of w0, w1 and w2 (bits b1-b7 and sign) highlighted and labelled First, Second and Third.]

Decoded values:

w0    46
w1   -74
w2    22
w3     0
w4     0
w5     0
w6     0
w7     0

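39/75

Embedded Coding

[Diagram: the bit table of the Bits of Coefficients slide, coded one bit plane at a time; the b1, b2 and b3 columns are labelled First, Second and Third.]

Symbols coded in each pass (a sign follows the first 1 bit of a coefficient):

         w0   w1   w2   w3   w4   w5   w6   w7
First     0   1-    0    0    0    0    0    0
Second   1+    0    0    0    0    0    0    0
Third     0    0   1+    0    0   1-    0    0

State shown after the three passes:

      Value   Range
w0     40      32 to 47
w1    -72     -79 to -64
w2     24      16 to 31
w3      0     -31 to 31
w4      0     -31 to 31
w5    -24     -31 to 31
w6      0     -31 to 31
w7      0     -31 to 31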
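
The bit-plane idea can be sketched in a few lines. The following is a minimal illustration, assuming sign-magnitude coefficients with 7 magnitude bits as in the table; the function names `bitplane` and `decode_prefix` are hypothetical, and midpoint reconstruction is an assumption that happens to agree with the Value/Range entries shown for w0, w1 and w2.

```python
def bitplane(coeffs: list[int], nbits: int, plane: int) -> list[int]:
    """Return bit `plane` (1 = most significant) of each coefficient magnitude."""
    shift = nbits - plane
    return [(abs(c) >> shift) & 1 for c in coeffs]


def decode_prefix(coeffs: list[int], nbits: int, planes: int) -> list[tuple[int, int, int]]:
    """Simulate a decoder that has received only the top `planes` bit planes.

    Returns (value, low, high) per coefficient: the reconstructed value and
    the range the true coefficient is known to lie in.
    """
    shift = nbits - planes
    span = (1 << shift) - 1          # uncertainty left by the unseen low bits
    result = []
    for c in coeffs:
        known = (abs(c) >> shift) << shift   # magnitude bits received so far
        if known == 0:
            # Still insignificant: no sign has been sent yet.
            result.append((0, -span, span))
            continue
        mid = known + (1 << shift) // 2      # midpoint of the remaining range
        if c < 0:
            result.append((-mid, -(known + span), -known))
        else:
            result.append((mid, known, known + span))
    return result


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
first_three = [bitplane(coeffs, 7, p) for p in (1, 2, 3)]
state = decode_prefix(coeffs, 7, 3)
```

Truncating the stream after any pass still yields a usable approximation of every coefficient, which is the property that makes the bitstream embedded.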

40/75

Audio Masking

[Diagram: signal, masking threshold and noise level plotted against frequency for a critical band and its neighboring band, annotated with the maximum mask, the signal-to-mask ratio and the noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, used by all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Diagram: the bit table after the First pass, with a legend marking each coefficient as significant or insignificant and showing the mask.]

State shown after the first pass:

      Value   Range
w0      0      -63 to 63
w1    -96     -127 to -64
w2      0      -63 to 63
w3      0      -63 to 63
w4      0      -63 to 63
w5      0      -63 to 63
w6      0     -127 to 127
w7      0     -127 to 127

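44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Diagram: the bit table after the First and Second passes, with a legend marking each coefficient as significant or insignificant.]

First two bit columns of the table:

      b1 b2   Sign
w0     0  1     +
w1     1  0     -
w2     0  0
w3     0  0
w4     0  0
w5     0  0
w6     0  0
w7     0  0

State shown after the second pass:

      Value   Range
w0     48       32 to 63
w1    -96     -127 to -64
w2      0      -31 to 31
w3      0      -31 to 31
w4      0      -31 to 31
w5      0      -31 to 31
w6      0      -63 to 63
w7      0      -63 to 63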

45/75

Context Modeling

Context:
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45   0   0   0     -74  -13    0    0
 21   0   4   0      14    0   23   23
  0   0   0   0       3    0    4    0
  0   3   5   0       0    0    0    0
  0   1  -1   0      -4   33    0   -1
  0   0   1   0       0    0    0    0
 -4   5   0   0     -18    0    0   19
  4   0  23   0      -1    0    0    0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

Automatically generated

To be encoded

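47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Diagram: the unit interval from 0 to 1 is subdivided in turn by the probabilities P0 / 1−P0, P1 / 1−P1 and P2 / 1−P2 as the symbols S0=0, S1=1, S2=0 are coded, leaving the final interval A.]

Coding result: 0100

(The shortest binary bitstream that ensures the interval from B = 0100 0000000… to C = 0100 1111111… satisfies (B, C) ⊂ A.)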
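
The interval-narrowing idea can be illustrated with exact fractions. This is a minimal sketch of idealized binary arithmetic coding, not of the QM coder itself (which uses fixed-precision approximations); the function name `shortest_code`, the convention that symbol 0 takes the lower part of the interval, and the probabilities in the example are all assumptions.

```python
from fractions import Fraction
import math


def shortest_code(symbols: list[int], p_zero: list[Fraction]) -> str:
    """Narrow [0, 1) once per binary symbol, then return the shortest bit
    string b such that every number starting with 0.b lies in the final
    interval. Assumes every probability is strictly between 0 and 1."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0                 # keep the lower part
        else:
            low += width * p0           # skip the part reserved for symbol 0
            width *= 1 - p0
    high = low + width
    k = 0
    while True:
        scale = 1 << k
        m = math.ceil(low * scale)      # smallest k-bit prefix at or above low
        if Fraction(m + 1, scale) <= high:
            return format(m, "b").zfill(k) if k else ""
        k += 1


half = Fraction(1, 2)
equal_code = shortest_code([0, 1, 0], [half, half, half])
skewed_code = shortest_code([0, 0, 0], [Fraction(3, 4)] * 3)
```

With equal probabilities the code is as long as the input; with a skewed model, three likely symbols fit in two bits, which is where the compression comes from.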

48/75

Entropy Coder (Summary)

[Diagram: the entropy coder produces a bitstream together with an R-D curve (distortion D against rate R).]

49/75

Speed Up Issues

Context modeling:
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking:
  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation:
  • Lookup table calculation of distortion

Context entropy coder:
  • QM coder
  • Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Diagram: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

52/75

Companion File

[Diagram: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Diagram: four R-D curves (D1 against R1 through D4 against R4), shown twice, with the rates r1, r2, r3 and r4 marked.]
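
One common way to pick the rates r1…r4 from several R-D curves is to spend bytes where they reduce distortion fastest, i.e. in order of R-D slope. The sketch below illustrates that idea under stated assumptions: each curve is convex, given as (rate, distortion) points with increasing rate starting at rate 0, and the function name `assemble` is hypothetical. It is not claimed to be the exact procedure used in EAC.

```python
def assemble(curves: list[list[tuple[int, float]]], budget: int) -> list[int]:
    """Choose one truncation rate per curve so the total stays within budget.

    Segments of all curves are taken in order of decreasing slope
    (distortion removed per byte) until the next segment no longer fits.
    """
    segments = []
    for idx, curve in enumerate(curves):
        for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
            slope = (d0 - d1) / (r1 - r0)
            segments.append((slope, idx, r1 - r0))
    # Stable sort: within a convex curve, segments stay in rate order.
    segments.sort(key=lambda seg: -seg[0])

    rates = [0] * len(curves)
    remaining = budget
    for _slope, idx, cost in segments:
        if cost > remaining:
            break                      # stop at the first segment that does not fit
        rates[idx] += cost
        remaining -= cost
    return rates


curves = [
    [(0, 100.0), (10, 40.0), (20, 30.0)],
    [(0, 50.0), (10, 30.0), (20, 25.0)],
]
full = assemble(curves, 30)
tight = assemble(curves, 25)
```

Because each channel's bitstream is embedded, cutting it at the chosen rate needs no re-encoding.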

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Plot: buffer occupancy (bytes) against time (timeslots), showing the Buffer-Occupancy Curve and two Illegal Regions.]

55/75

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

Where:
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
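
The allocation rule is simple enough to state as code. This is a small sketch, assuming the rate is expressed in bytes per second and the duration in seconds; the function names are hypothetical.

```python
def allocated_bytes(buf_prev: float, buf_cur: float, rate_trans: float, duration: float) -> float:
    """Bytes for timeslot i: drop in buffer level plus bytes delivered during the slot."""
    return buf_prev - buf_cur + rate_trans * duration


def allocations(levels: list[float], rate_trans: float, duration: float) -> list[float]:
    """Apply the rule along a buffer-occupancy trajectory levels[0], levels[1], ..."""
    return [
        allocated_bytes(prev, cur, rate_trans, duration)
        for prev, cur in zip(levels, levels[1:])
    ]


per_slot = allocations([4000, 3000, 3500], rate_trans=2000, duration=0.5)
```

A falling buffer level gives the timeslot more bytes than the channel delivers during the slot; a rising level gives it fewer.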

56/75

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Plot: buffer occupancy (bytes) against time (timeslots) with two Illegal Regions, annotated Underflow (too many bytes), Overflow (too few bytes) and Waste bytes.]

58/75

Multiple Timeslots - Constant Bitrate

[Plot: buffer occupancy (bytes) against time (timeslots) with two Illegal Regions.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Plot: buffer occupancy (bytes) against time (timeslots), showing the Buffer-Occupancy Curve and two Illegal Regions.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Plot: buffer occupancy (bytes) against time (timeslots) with two Illegal Regions.]

61/75

Modular Software Design

[Diagram: Audio is split into an L+R (or mono) channel and an L−R channel; each passes through MLT(SW) → Quantizer → Entropy coder → Bitstream Assembly to produce the Bitstream.]
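
For the lossless pass, the sum/difference channel split also has to be invertible on integers. The sketch below shows one widely used integer-reversible variant (mid = floor((L+R)/2), side = L−R). Whether EAC uses exactly this form is not stated on the slide, so treat it as an assumption; the function names are hypothetical.

```python
def ms_forward(left: int, right: int) -> tuple[int, int]:
    """Map (L, R) to (mid, side) with mid = floor((L + R) / 2), side = L - R."""
    side = left - right
    mid = right + (side >> 1)      # >> is a floor shift, also for negative values
    return mid, side


def ms_inverse(mid: int, side: int) -> tuple[int, int]:
    """Exactly recover (L, R) from (mid, side)."""
    right = mid - (side >> 1)
    left = right + side
    return left, right
```

The low bit dropped from the mid channel is recoverable from the parity of the side channel, which is why no information is lost.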

62/75

Modular Software Design

Highly modularized pipeline design:
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>):
  • No long delay
  • No need for large memory

Memory and computation efficient:
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec          8kbps    16kbps   32kbps   48kbps
MP4 TwinVQ     7.48     7.00     5.71     4.48
WMA            8.47     5.56     3.25     0.40
EAC            6.69     5.68     2.80     -22 [sic]

65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips

Codec             Compression Ratio
EAC               2.72
Monkey's Audio    2.72
WinZip            1.32

66/75

EAC (Versatile)

Versatile:
  • Real time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay), plus network transmission time (on a modem line, delay = 3 frames)

68/75

EAC (Low Delay Mode)

[Diagram: timeline over frames i−1, i and i+1, marking Start Encoding Frame i (Encoder: MLT → Quantizer → Entropy → Bitstream), transfer over the network, Start Decoding Frame i (Entropy → Quantizer) and Playable here.]

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax:
  • Parser may reassemble the bitstream at 1000x real time
  • Change the bit rate, the number of audio channels, or the audio sampling rate

70/75

EAC - Software

Software:
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

71/75

EAC - Encoder

[Diagram: Audio → Encoder → Stereo 128 kbps bitstream and Companion file.]

72/75

EAC - Parser

[Diagram, labelled Server: the Parser takes the Stereo 128 kbps bitstream and the Companion file and produces four bitstreams: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling.]

73/75

EAC - Decoder

[Diagram: the Decoder takes the parsed bitstreams: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling.]

74/75

Comparison

Original MP4 TwinVQ WMA EAC

MP3

75/75

Conclusions

An embedded audio coder is developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction - Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 15: Embedded Audio Coder

15/75

Lossy (Float) Pass

16/75

MLT - Modulated Lapped Transforms

[Figure, two panels. Spatial Response: amplitude from −1 to 1 over sample positions 0 to 1000. Frequency Domain: gain (dB) from −100 to 40 against frequency (in units of π) from 0 to 1.]

17/75

MLT with Window Switching

Features:
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • Energy is bigger than a certain threshold
      • Energy within the 8 subframes (256 samples each) differs by more than Ta
      • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
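
The switching criterion can be sketched as a three-part test. The slide does not say how "differs by more than Ta" or "greater by Tb" are measured, so this sketch assumes both are energy ratios and that one qualifying pair of neighboring subframes is enough; the function name `use_short_window` and the parameter `energy_floor` are hypothetical.

```python
def use_short_window(frame: list[float], energy_floor: float, ta: float, tb: float,
                     sub_len: int = 256) -> bool:
    """Decide whether a 2048-sample frame should use the short (256) window.

    Assumptions: Ta and Tb are treated as energy ratios, and a single pair
    of neighboring subframes satisfying the Tb test is sufficient.
    """
    subframes = [frame[i:i + sub_len] for i in range(0, len(frame), sub_len)]
    energies = [sum(x * x for x in sub) for sub in subframes]

    # 1. The frame must carry enough energy to matter.
    if sum(energies) <= energy_floor:
        return False
    # 2. Subframe energies must differ by more than Ta.
    if max(energies) <= ta * min(energies):
        return False
    # 3. Some subframe must exceed its successor by more than Tb.
    return any(a > tb * b for a, b in zip(energies, energies[1:]))


transient = [1.0] * 256 + [0.01] * (2048 - 256)
steady = [0.5] * 2048
silent = [0.0] * 2048
```

With `energy_floor=1.0`, `ta=10.0` and `tb=10.0`, only the `transient` frame is switched to the short window.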

18/75

Band Separation

[Diagram with blocks: Audio (44.1 kHz sampling), MLT with window switching, Band separation.]

19/75

Synthesis (Half Sampling)

[Diagram with blocks: Audio (22.05 kHz sampling), MLT with window switching, Band separation.]

20/75

Synthesis (Quarter Sampling)

[Diagram with blocks: Audio (11.025 kHz sampling), MLT with window switching, Band separation.]

21/75

Quantizer

Input: coefficient

Output: quantized coefficient

Goal:
  • Convert coefficient from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation: maps each coefficient to a quantized magnitude and a sign; the formula did not survive extraction.]
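
A deadzone scalar quantizer is commonly written as a floor of the magnitude divided by the step size, with the sign kept separately. The sketch below follows that common form as an assumption, since the slide's own formula is not recoverable; the names `quantize`, `dequantize` and `step`, the sign convention (0 for non-negative, 1 for negative) and the midpoint reconstruction are all illustrative choices.

```python
import math


def quantize(coeff: float, step: float) -> tuple[int, int]:
    """Return (magnitude, sign); sign is 0 for non-negative, 1 for negative.

    Every coefficient with |coeff| < step maps to magnitude 0, so the zero
    bin (the deadzone) is twice as wide as the other bins.
    """
    magnitude = int(math.floor(abs(coeff) / step))
    sign = 1 if coeff < 0 else 0
    return magnitude, sign


def dequantize(magnitude: int, sign: int, step: float) -> float:
    """Reconstruct at the midpoint of the bin; the deadzone reconstructs to 0."""
    if magnitude == 0:
        return 0.0
    value = (magnitude + 0.5) * step
    return -value if sign else value
```

Keeping magnitude and sign separate matches the sign-magnitude bit table used by the entropy coder slides.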

Page 16: Embedded Audio Coder

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), with two illegal regions]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions]

61/75

Modular Software Design

[Diagram: the audio is split into L+R (or mono) and L-R; each channel passes through MLT (SW), Quantizer, Entropy coder and Bitstream Assembly to form the bitstream]

62/75

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data-flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC            669      568       280       -22
WMA            847      556       325       040
MP4 TwinVQ     748      700       571       448

65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips.

Codec            Compression ratio
EAC                    272
Monkey's Audio         272
WinZip                 132

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (on a modem line, delay = 3 frames)

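68/75

EAC (Low Delay Mode)

[Diagram: low-delay timeline over frames i-1, i, i+1. Encoder: start encoding frame i (MLT, Quantizer, Entropy), producing the bitstream, which crosses the network. Decoder: start decoding frame i (Entropy, Quantizer). The point where the frame can be played is marked "Playable here".]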

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change
      • bit rate
      • number of audio channels
      • audio sampling rate

70/75

EAC – Software

Software
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

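71/75

EAC - Encoder

[Diagram: Audio enters the Encoder, which outputs a stereo 128 kbps bitstream and a companion file]

72/75

EAC - Parser

[Diagram: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling]

73/75

EAC - Decoder

[Diagram: the Decoder takes the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling]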

74/75

Comparison

[Audio demo clips: Original, MP4 TwinVQ, WMA, EAC, MP3]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile
      • Low delay, constant bitrate, streaming
  • Flexible bitstream
      • Parsing for bitrate, number of audio channels, audio sampling rate

Good prototype available
  • Real-time execution, small memory footprint

17/75

MLT with Window Switching

Features
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • Energy is bigger than a certain threshold
      • Energy within the 8 subframes (256 samples each) differs by more than Ta
      • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
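
The switching criterion can be sketched in code. This is a hedged illustration, not the coder's implementation: the function names are hypothetical, energy is taken as the sum of squared samples, "differs by more than Ta" is read as the difference between the largest and smallest subframe energy, and "greater by Tb" is read as a plain difference between adjacent subframes. The slide does not state which of these interpretations (or which units) the coder uses.

```python
def subframe_energies(frame, subframe_len=256):
    """Sum of squared samples for each consecutive subframe."""
    return [
        sum(x * x for x in frame[i:i + subframe_len])
        for i in range(0, len(frame), subframe_len)
    ]


def use_short_window(frame, energy_min, ta, tb, subframe_len=256):
    """Return True if the frame should switch to the short window."""
    energies = subframe_energies(frame, subframe_len)
    # Condition 1: the frame is not (nearly) silent.
    if sum(energies) <= energy_min:
        return False
    # Condition 2: subframe energies differ by more than Ta.
    if max(energies) - min(energies) <= ta:
        return False
    # Condition 3: some subframe exceeds its successor by more than Tb.
    return any(a - b > tb for a, b in zip(energies, energies[1:]))


steady = [1.0] * 2048                     # constant energy everywhere
burst = [1.0] * 256 + [0.0] * (2048 - 256)  # energy only in subframe 0
print(use_short_window(steady, 10.0, 100.0, 100.0))
print(use_short_window(burst, 10.0, 100.0, 100.0))
```

The steady frame fails the second condition because all eight subframes carry the same energy, while the burst frame satisfies all three.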

18/75

Band Separation

[Diagram: Audio (44.1 kHz sampling); MLT with window switching; band separation; 0]

19/75

Synthesis (Half Sampling)

[Diagram: Audio (22.05 kHz sampling); MLT with window switching; band separation; 0]

20/75

Synthesis (Quarter Sampling)

[Diagram: Audio (11.025 kHz sampling); MLT with window switching; band separation; 0]

21/75

Quantizer

Input: coefficient

Output: quantized coefficient

Goal
  • Convert coefficient from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation: deadzone scalar quantizer mapping the coefficient m[n], with step size s, to a quantized magnitude and a sign]
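
A scalar quantizer with a deadzone is commonly written as a floor of the magnitude divided by the step size, with the sign kept separately. The sketch below shows that common form only; the exact formula on the slide is not recoverable, so the function names, the sign convention (1 for negative), and the midpoint reconstruction are assumptions.

```python
import math


def deadzone_quantize(x, step):
    """Return (quantized magnitude, sign bit) for coefficient x.

    Every |x| < step maps to magnitude 0, so the zero bin is twice as
    wide as the other bins (the deadzone).
    """
    magnitude = int(math.floor(abs(x) / step))
    sign = 1 if x < 0 else 0  # assumed convention: 1 means negative
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the middle of the bin (0 stays exactly 0)."""
    if magnitude == 0:
        return 0.0
    value = (magnitude + 0.5) * step
    return -value if sign else value


for x in (0.9, -0.9, 2.5, -7.3):
    q = deadzone_quantize(x, 1.0)
    print(x, q, dequantize(q[0], q[1], 1.0))
```

Splitting the result into a magnitude and a sign matches the sign-magnitude bit layout used by the embedded entropy coder later in the deck.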

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

  • Break the MLT into small steps
  • Make every step reversible

Definition of reversible transform
  • Integer input, integer output
  • The transform should have a determinant of 1 (do not expand data volume)

25/75

MLT Framework

[Diagram: Forward MLT (lapped transform): Window, followed by a DCT-IV made of Pre-rotation, Complex FFT and Post-rotation. Inverse MLT: Pre-rotation⁻¹, Complex FFT⁻¹, Post-rotation⁻¹ and Inverse window.]

26/75

Window Operation

[Diagram: the window operation is a complex rotation applied to the sample pair x(n), x(-n-1); the rotation angle depends on n and N]

27/75

Pre-Rotation

[Diagram: inputs xw(0) to xw(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32 and -13π/32, producing xp(0) to xp(7)]

28/75

FFT (4 Point Complex)

[Diagram: xp(0) to xp(7) form the complex inputs xc(0) to xc(3); a butterfly network with a twiddle factor e^{-jπ/2} produces the complex outputs yc(0) to yc(3)]

29/75

Post-Rotation

[Diagram: inputs yp(0) to yp(7) pass through four conjugate rotations, by -0, -π/8, -2π/8 and -3π/8, producing y(0) to y(7)]

30/75

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equations: a 2×2 unit transform on the pair (a, b) is carried out in successive steps with coefficients c and an intermediate value t]
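
A widely used way to make a rotation reversible on integers is to factor it into three lifting steps and round after each step. The sketch below shows that general technique; it is not necessarily the factorization drawn on the slide, and the function names and the choice of Python's `round` as the rounding rule are assumptions.

```python
import math


def lift_rotate(a, b, theta):
    """Integer approximation of a rotation of (a, b) by theta.

    Uses [[cos, -sin], [sin, cos]] = U(c0) * L(c1) * U(c0) with
    c0 = (cos(theta) - 1) / sin(theta) and c1 = sin(theta).
    theta must not be a multiple of pi.
    """
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)
    b = b + round(c1 * a)
    a = a + round(c0 * b)
    return a, b


def lift_rotate_inverse(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b


theta = 5 * math.pi / 32
rotated = lift_rotate(45, -74, theta)
print(rotated, lift_rotate_inverse(*rotated, theta))
```

Each step changes only one of the two values by a rounded function of the other, so subtracting the same rounded quantity recovers the input exactly; each step also has determinant 1, which matches the definition of a reversible transform given earlier.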

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

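33/75

Frame Grouping

[Diagram: frames numbered 1 to 8 grouped into time slots]

34/75

Entropy Coder

[Diagram: the entropy coder produces a bitstream and an R-D curve (distortion D versus rate R)]

35/75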

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

   45    0    0    0  -74  -13    0    0
   21    0    4    0   14    0   23   23
    0    0    0    0    3    0    4    0
    0    3    5    0    0    0    0    0
    0    1   -1    0   -4   33    0   -1
    0    0    1    0    0    0    0    0
   -4    5    0    0  -18    0    0   19
    4    0   23    0   -1    0    0    0

37/75

Bits of Coefficients

      Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
w0         45        0  1  0  1  1  0  1     +
w1        -74        1  0  0  1  0  1  0     -
w2         21        0  0  1  0  1  0  1     +
w3         14        0  0  0  1  1  1  0     +
w4         -4        0  0  0  0  1  0  0     -
w5        -18        0  0  1  0  0  1  0     -
w6          4        0  0  0  0  1  0  0     +
w7         -1        0  0  0  0  0  0  1     -
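
The bit table is a sign-magnitude decomposition with b1 as the most significant of seven bits. A minimal sketch that reproduces the rows (the function name is hypothetical):

```python
def sign_magnitude_bits(value, num_bits=7):
    """Return ([b1, ..., b7], sign) with b1 the most significant bit."""
    magnitude = abs(value)
    if magnitude >= 1 << num_bits:
        raise ValueError("magnitude does not fit in num_bits")
    bits = [(magnitude >> (num_bits - 1 - k)) & 1 for k in range(num_bits)]
    sign = "-" if value < 0 else "+"
    return bits, sign


for name, w in zip(("w0", "w1", "w2", "w3", "w4", "w5", "w6", "w7"),
                   (45, -74, 21, 14, -4, -18, 4, -1)):
    bits, sign = sign_magnitude_bits(w)
    print(name, " ".join(str(b) for b in bits), sign)
```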

38/75

Conventional Coding

[Figure: in conventional coding the coefficients are sent one after another (first w0, second w1, third w2), each with all bits b1 to b7 and its sign. With the first three coefficients sent, the values shown for w0 to w7 are 46, -74, 22, 0, 0, 0, 0, 0.]

39/75

Embedded Coding

The coefficients are coded bitplane by bitplane; a sign is sent right after the first nonzero bit of a coefficient.

  First (b1):    0  1-  0  0  0  0  0  0
  Second (b2):   1+ 0   0  0  0  0  0  0
  Third (b3):    0  0   1+ 0  0  1- 0  0

Decoder state shown on the slide:

       Value   Range
w0       40    32 to 47
w1      -72    -79 to -64
w2       24    16 to 31
w3        0    -31 to 31
w4        0    -31 to 31
w5      -24    -31 to 31
w6        0    -31 to 31
w7        0    -31 to 31
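
The value and range a decoder can infer after a given number of bitplanes follow directly from the known leading bits. The sketch below is an illustration with a hypothetical function name; it reconstructs significant coefficients at the middle of their interval, which reproduces 40, -72 and 24 for w0, w1 and w2. For coefficients that are still insignificant it reports the tightest bound implied by the known zero bits, which can be narrower than the range printed on the slide.

```python
def decode_after_planes(value, planes, num_bits=7):
    """Return (reconstruction, low, high) after `planes` bitplanes.

    value: the true coefficient (used only to read its leading bits).
    planes: number of most significant bitplanes already received.
    """
    shift = num_bits - planes
    span = 1 << shift              # size of the remaining uncertainty
    prefix = abs(value) >> shift   # bits known so far
    if prefix == 0:
        # Still insignificant: sign not yet sent, magnitude < span.
        return 0, -(span - 1), span - 1
    low = prefix << shift
    high = low + span - 1
    mid = low + span // 2
    if value < 0:
        return -mid, -high, -low
    return mid, low, high


for w in (45, -74, 21, 14):
    print(w, decode_after_planes(w, planes=3))
```

Truncating the embedded bitstream after any bitplane therefore still yields a usable, coarser reconstruction, which is what lets the parser cut the bitstream to a lower bitrate.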

40/75

Audio Masking

[Figure: audio masking along the frequency axis. Labels: critical band, neighboring band, signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

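43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit array (b1 to b7, sign) of w0 to w7 with the mask overlaid and the first pass marked; bit strings "01 -000000" and "001 -000000"; legend: coefficient significant / insignificant]

       Value   Range
w0        0    -63 to 63
w1      -96    -127 to -64
w2        0    -63 to 63
w3        0    -63 to 63
w4        0    -63 to 63
w5        0    -63 to 63
w6        0    -127 to 127
w7        0    -127 to 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit array (b1 to b7, sign) of w0 to w7 with the first and second passes marked; bit strings "01 -000000" and "1 +0000000"; legend: coefficient significant / insignificant]

       Bits sent   Value   Range
w0      0 1 +        48    32 to 63
w1      1 0 -       -96    -127 to -64
w2      0 0           0    -31 to 31
w3      0 0           0    -31 to 31
w4      0 0           0    -31 to 31
w5      0 0           0    -31 to 31
w6      0 0           0    -63 to 63
w7      0 0           0    -63 to 63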

45/75

Context Modeling

Context
  • Zero coding
      • Significance status of neighbor coefficients
  • Refinement
      • Whether it is the 1st refinement pass
      • Significance status of neighbor coefficients
  • Sign
      • Neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

   45    0    0    0  -74  -13    0    0
   21    0    4    0   14    0   23   23
    0    0    0    0    3    0    4    0
    0    3    5    0    0    0    0    0
    0    1   -1    0   -4   33    0   -1
    0    0    1    0    0    0    0    0
   -4    5    0    0  -18    0    0   19
    4    0   23    0   -1    0    0    0

Bit: 0 1 1 0 0 0 0 0 0 1  0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

Labels: automatically generated; to be encoded

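47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided step by step according to the symbol probabilities (P0 and 1-P0, P1 and 1-P1, P2 and 1-P2) for the symbols S0=0, S1=1, S2=0, giving the final interval A.]

Coding result: 0100 (the shortest binary bitstream ensuring that the interval from B = 0100 0000000 to C = 0100 1111111 satisfies (B, C) ⊂ A)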
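
The interval subdivision and the "shortest bitstream inside the interval" rule can be sketched with exact fractions. This is an idealized arithmetic coder for illustration only, not the QM coder (which uses fixed-precision approximations). The function names and the probabilities in the example are assumptions, so the resulting codeword is not expected to equal the slide's 0100.

```python
import math
from fractions import Fraction


def narrow(symbols, p_zero):
    """Shrink [0, 1) once per binary symbol.

    p_zero[k] is the probability that symbol k is 0; the lower part of
    the current interval is assigned to 0 and the upper part to 1.
    Returns the final interval A as (low, high).
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width = width * p0
        else:
            low = low + width * p0
            width = width * (1 - p0)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b whose extensions b000... to b111... stay in [low, high)."""
    n = 0
    while True:
        n += 1
        scale = 2 ** n
        k = math.ceil(low * scale)  # smallest n-bit value not below low
        if Fraction(k + 1, scale) <= high:
            return format(k, "0{}b".format(n))


half = Fraction(1, 2)
interval = narrow([0, 1, 0], [half, half, half])
print(interval, shortest_code(*interval))
```

With all three probabilities set to 1/2 the final interval is [1/4, 3/8) and the shortest codeword is 010; more skewed probabilities give wider intervals for likely symbols and therefore shorter codewords.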

48/75

Entropy Coder (Summary)

[Diagram: the entropy coder produces a bitstream and an R-D curve (distortion D versus rate R)]

Page 19: Embedded Audio Coder

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

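[Equations: a 2x2 unit transform applied to the pair (a, b), including the normalized butterfly with entries ±1 and a factor of 1/√2, is written as a sequence of steps through an intermediate value t with coefficients c0, c1 and c2. The exact expressions did not survive extraction.]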
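One well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round after each step. The sketch below shows that standard construction; it is an assumption that the slide's unit transform follows the same pattern, and the coefficient names `c0`, `c1` and the rounding rule are chosen here for illustration.

```python
import math


def _round(v: float) -> int:
    """Deterministic rounding to the nearest integer."""
    return math.floor(v + 0.5)


def _coeffs(theta: float) -> tuple[float, float]:
    # Rotation by theta = shear(c0) * shear(c1) * shear(c0), sin(theta) != 0.
    return (math.cos(theta) - 1.0) / math.sin(theta), math.sin(theta)


def rotate_forward(x: int, y: int, theta: float) -> tuple[int, int]:
    """Integer approximation of (x cos - y sin, x sin + y cos)."""
    c0, c1 = _coeffs(theta)
    x += _round(c0 * y)
    y += _round(c1 * x)
    x += _round(c0 * y)
    return x, y


def rotate_inverse(x: int, y: int, theta: float) -> tuple[int, int]:
    """Undo the three lifting steps in reverse order; exact on integers."""
    c0, c1 = _coeffs(theta)
    x -= _round(c0 * y)
    y -= _round(c1 * x)
    x -= _round(c0 * y)
    return x, y
```

Each step has determinant 1 and changes only one of the two values by a rounded function of the other, so it can be undone exactly. This matches the definition of a reversible transform given earlier: integer input, integer output, determinant of 1.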

3275

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal:
  • Compression
  • Embedded bitstream for future manipulation

3375

Frame Grouping

[Diagram: frames numbered 1 to 8 are grouped into time slots.]

3475

Entropy Coder

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0

Next View graph

3775

Bits of Coefficients

Coefficient  Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0             45     0  1  0  1  1  0  1    +
w1            -74     1  0  0  1  0  1  0    -
w2             21     0  0  1  0  1  0  1    +
w3             14     0  0  0  1  1  1  0    +
w4             -4     0  0  0  0  1  0  0    -
w5            -18     0  0  1  0  0  1  0    -
w6              4     0  0  0  0  1  0  0    +
w7             -1     0  0  0  0  0  0  1    -
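The bit table for w0 to w7 can be reproduced with a few lines of code. The sketch below splits each coefficient into a sign and seven magnitude bits, most significant bit (b1) first; the function name `bit_planes` and the fixed width of seven bits are assumptions for illustration.

```python
def bit_planes(w: int, n_bits: int = 7) -> tuple[list[int], str]:
    """Return the magnitude bits b1..b7 (MSB first) and the sign of w."""
    magnitude = abs(w)
    bits = [(magnitude >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    sign = "-" if w < 0 else "+"
    return bits, sign


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]  # w0..w7 from the slide
for i, w in enumerate(coefficients):
    bits, sign = bit_planes(w)
    print(f"w{i} {w:4d}  ", " ".join(str(b) for b in bits), sign)
```

Reading the table row by row gives conventional coding; reading it column by column (all b1 bits, then all b2 bits, and so on) gives the embedded coding order discussed on the following slides.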

3875

Conventional Coding

[Figure: the bit table of w0 to w7 (columns b1 to b7 and sign), with the rows of w0, w1 and w2 marked First, Second and Third. Coding proceeds coefficient by coefficient, so only these three rows have been transmitted.]

Coefficient  Decoded value
w0             46
w1            -74
w2             22
w3              0
w4              0
w5              0
w6              0
w7              0

3975

Embedded Coding

[Figure: the bit table of w0 to w7 (columns b1 to b7 and sign), with the columns b1, b2 and b3 marked First, Second and Third. Coding proceeds bit plane by bit plane.]

First pass (b1), w0 to w7:   0  1-  0  0  0  0  0  0
Second pass (b2), w0 to w7:  1+  0  0  0  0  0  0  0
Third pass (b3), w0 to w7:   0  0  1+  0  0  1-  0  0

Coefficient  Value   Range
w0             40    32 to 47
w1            -72    -79 to -64
w2             24    16 to 31
w3              0    -31 to 31
w4              0    -31 to 31
w5            -24    -31 to 31
w6              0    -31 to 31
w7              0    -31 to 31
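How a decoder turns a truncated embedded bitstream into a value and a range can be sketched as follows. The function `decode_partial` and its midpoint reconstruction rule are assumptions for illustration. For the significant coefficients w0, w1 and w2 the sketch reproduces the values and ranges on the slide; for coefficients that are still insignificant it computes a narrower interval than the one printed on the slide, so it should be read as an illustration of the principle only.

```python
def decode_partial(bits, sign, n_bits=7):
    """Decode a coefficient from its leading bit planes.

    bits: the bits received so far, MSB first (for example b1, b2, b3).
    sign: "+" or "-" (only meaningful once a 1 bit has been received).
    Returns (value, low, high): the reconstruction and the interval that
    the true coefficient is known to lie in.
    """
    remaining = n_bits - len(bits)
    prefix = 0
    for b in bits:
        prefix = (prefix << 1) | b
    low = prefix << remaining
    high = low + (1 << remaining) - 1
    if prefix == 0:
        # Still insignificant: the sign is unknown, reconstruct as zero.
        return 0, -high, high
    value = low + (1 << remaining) // 2  # middle of the interval
    if sign == "-":
        return -value, -high, -low
    return value, low, high


print(decode_partial([0, 1, 0], "+"))  # w0 = 45 after three bit planes
print(decode_partial([1, 0, 0], "-"))  # w1 = -74 after three bit planes
```

Each additional bit plane halves the width of the interval, so the bitstream can be cut at any point and still decode to a coarser version of every coefficient.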

4075

Audio Masking

[Figure: audio masking. Horizontal axis: frequency, with the critical band and the neighboring band marked. Labeled quantities: signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

4175

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

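[Figure: bit table of w0 to w7 (columns b1 to b7 and sign) after the first pass. The legend distinguishes significant coefficients, insignificant coefficients and the mask.]

Coefficient  Value   Range
w0              0    -63 to 63
w1            -96    -127 to -64
w2              0    -63 to 63
w3              0    -63 to 63
w4              0    -63 to 63
w5              0    -63 to 63
w6              0    -127 to 127
w7              0    -127 to 127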

4475

Embedded Coding with Implicit Psychoacoustic Masking

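[Figure: bit table of w0 to w7 (columns b1 to b7 and sign) after the first and second passes. Bits shown so far: w0 = 0 1 (+), w1 = 1 0 (-), w2 to w7 = 0 0. The legend distinguishes significant and insignificant coefficients.]

Coefficient  Value   Range
w0             48    32 to 63
w1            -96    -127 to -64
w2              0    -31 to 31
w3              0    -31 to 31
w4              0    -31 to 31
w5              0    -31 to 31
w6              0    -63 to 63
w7              0    -63 to 63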

4575

Context Modeling

Context:
  • Zero coding: significance statuses of the neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of the neighbor coefficients
  • Sign: neighbor signs

4675

After Implicit Psychoacoustic Masking & Context Modeling

 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

Legend: automatically generated; to be encoded

4775

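Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Diagram: the interval from 0 to 1 is split according to the probabilities P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, as the symbols S0=0, S1=1, S2=0 are coded. A is the resulting interval.]

Coding result: 0100

(The shortest binary bitstream is chosen such that the interval from B = 0100 0000000... to C = 0100 1111111... lies inside A, i.e. (B, C) ⊂ A.)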
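The interval-narrowing idea behind arithmetic coding can be sketched with exact fractions. This is a textbook illustration only: the QM coder named on the slide is a fast, multiplication-free approximation and works differently in detail. The function names and the example probabilities (P0 = P1 = P2 = 1/2, where each value is the probability of symbol 0) are assumptions, so the output is not meant to reproduce the slide's result 0100.

```python
from fractions import Fraction
import math


def encode(symbols, p0s):
    """Narrow [low, low + width) per symbol, then emit the shortest bitstring
    whose dyadic interval lies entirely inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p0s):
        if s == 0:
            width *= p0              # keep the lower part
        else:
            low += width * p0        # keep the upper part
            width *= 1 - p0
    k = 1
    while True:
        b = math.ceil(low * 2**k)    # first multiple of 2^-k not below low
        if Fraction(b + 1, 2**k) <= low + width:
            return format(b, f"0{k}b")
        k += 1


def decode(bits, p0s):
    """Recover the symbols by locating the coded value in each split."""
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    symbols = []
    for p0 in p0s:
        split = low + width * p0
        if value < split:
            symbols.append(0)
            width *= p0
        else:
            symbols.append(1)
            low = split
            width *= 1 - p0
    return symbols


half = Fraction(1, 2)
code = encode([0, 1, 0], [half, half, half])
```

With skewed probabilities the final interval is wider for likely sequences, so fewer bits are needed to point inside it. That is where the compression comes from, and the context modeling on the previous slides supplies those probabilities.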

4875

Entropy Coder (Summary)

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

4975

Speed Up Issues

Context modeling:
  • Use stored context
  • Update the context when a coefficient becomes significant

Implicit masking:
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation:
  • Lookup table calculation of distortion

Context entropy coder:
  • QM coder
  • Run-length Rice coder
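For readers unfamiliar with the Rice code mentioned as an entropy coder option, here is a minimal sketch of the basic (non-adaptive) Rice code for one non-negative integer, such as a run length. How EAC combines run lengths with Rice coding, and how it adapts the parameter `k`, is not stated on the slide; the bit convention used here (quotient in unary as ones terminated by a zero, then `k` remainder bits) is one common choice and an assumption.

```python
def rice_encode(n: int, k: int) -> str:
    """Rice code of n >= 0 with parameter k: unary quotient, k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder_bits = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder_bits


def rice_decode(bits: str, k: int) -> int:
    """Inverse of rice_encode for a string holding exactly one codeword."""
    q = bits.index("0")                       # length of the unary part
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r


print(rice_encode(9, 2))  # quotient 2, remainder 1
```

The code needs only shifts and masks, which fits the speed-up theme of this slide.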

5075

Bitstream Assembly

Input:
  • Bitstream
  • R-D curve

Output:
  • Assembled bitstream
  • Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Diagram: bitstream layout. EAC marker, global header, then a sequence of timeslots, each consisting of a head and a body.]

5275

Companion File

[Diagram: companion file layout. Global header, then a sequence of timeslots, each consisting of a head and an R-D curve.]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), shown twice, with the rates r1, r2, r3 and r4 marked.]
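A common way to do rate-distortion optimized assembly within one timeslot is to spend the byte budget on the bitstream segments with the steepest R-D slope first. The sketch below shows that greedy rule; it is an assumption that EAC's assembler works this way in detail. The function name `allocate`, the representation of each R-D curve as a list of `(rate_bytes, distortion)` truncation points, and the example numbers are all hypothetical, and the curves are assumed convex (slopes decreasing along each curve).

```python
import heapq


def allocate(curves, budget):
    """Pick a truncation rate r_i on each R-D curve under a total byte budget.

    curves[i] is a list of (rate, distortion) points with increasing rate and
    decreasing distortion. Returns the chosen rate for each curve.
    """
    idx = [0] * len(curves)                 # current truncation point per curve
    used = sum(c[0][0] for c in curves)
    heap = []

    def push_next_segment(i):
        j = idx[i]
        if j + 1 < len(curves[i]):
            (r0, d0), (r1, d1) = curves[i][j], curves[i][j + 1]
            slope = (d0 - d1) / (r1 - r0)   # distortion removed per byte
            heapq.heappush(heap, (-slope, i))

    for i in range(len(curves)):
        push_next_segment(i)

    while heap:
        _, i = heapq.heappop(heap)          # steepest remaining segment
        r0, r1 = curves[i][idx[i]][0], curves[i][idx[i] + 1][0]
        if used + (r1 - r0) > budget:
            continue                        # does not fit; curve i stops here
        used += r1 - r0
        idx[i] += 1
        push_next_segment(i)

    return [curves[i][idx[i]][0] for i in range(len(curves))]


curve_a = [(0, 100), (10, 40), (20, 20)]    # hypothetical channel A
curve_b = [(0, 50), (10, 25), (20, 20)]     # hypothetical channel B
rates = allocate([curve_a, curve_b], budget=20)
```

Because every coded stream is embedded, each chosen rate simply says where to cut that stream, and the R-D curves stored in the companion file are what a parser would need to redo the allocation later without decoding the audio.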

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots). Two illegal regions bound the curve.]

5575

Allocated Bytes Per Timeslots

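Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

Where:
  • $B_i$: allocated bytes for timeslot i
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot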
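As a worked instance of the allocation formula, the sketch below evaluates it for one timeslot. The function name and the example numbers (a 2000 byte/s channel, i.e. 16 kbps, and a 0.5 s timeslot) are hypothetical and chosen only to make the arithmetic easy to follow; the rate is taken in bytes per second so that the result is in bytes.

```python
def allocated_bytes(buf_prev: float, buf_cur: float,
                    rate_trans: float, time: float) -> float:
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time (all sizes in bytes)."""
    return buf_prev - buf_cur + rate_trans * time


# The buffer drains from 4000 to 3000 bytes while 2000 * 0.5 = 1000 bytes
# arrive over the network, so the timeslot may use 1000 + 1000 = 2000 bytes.
b_i = allocated_bytes(buf_prev=4000, buf_cur=3000, rate_trans=2000, time=0.5)
```

If the buffer level is held constant, the allocation equals the bytes delivered during the timeslot, which is the constant-bitrate case.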

5675

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Entering one region is an underflow (too many bytes allocated); entering the other is an overflow (too few bytes allocated, wasted bytes).]
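The two illegal regions translate into a lower and an upper bound on the bytes that may be allocated to the current timeslot. The sketch below derives those bounds from the buffer relation Buf_i = Buf_{i-1} + Rate_trans * Time - B_i; the function name, the fixed limits `buf_min` and `buf_max`, and the example numbers are assumptions for illustration (on the slides the illegal regions may vary over time).

```python
def legal_allocation(buf_prev: float, rate_trans: float, time: float,
                     buf_min: float, buf_max: float) -> tuple[float, float]:
    """Return (fewest, most) bytes that keep the buffer inside its limits.

    Allocating more than `most` drains the buffer below buf_min (underflow);
    allocating fewer than `fewest` lets it rise above buf_max (overflow).
    """
    received = rate_trans * time
    fewest = max(0.0, buf_prev + received - buf_max)
    most = buf_prev + received - buf_min
    return fewest, most


# Hypothetical numbers: 4000 bytes buffered, 1000 bytes arriving this timeslot.
bounds = legal_allocation(buf_prev=4000, rate_trans=2000, time=0.5,
                          buf_min=0, buf_max=4500)
```

A search over the R-D slope then only has to consider allocations inside this interval.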

5875

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

5975

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

6075

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

6175

Modular Software Design

[Block diagram: the audio is split into an L+R (or mono) channel and an L-R channel. Each channel passes through MLT(SW), Quantizer, Entropy coder and Bitstream Assembly to produce the bitstream.]

6275

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

6375

Experimental Results

6475

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec        8kbps   16kbps   32kbps   48kbps
EAC           669     568      280      -22
WMA           847     556      325      040
MP4 TwinVQ    748     700      571      448

6575

EAC - Lossless

Results based on the average of 16 MPEG-4 test clips

Codec            Compression ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

6675

EAC (Versatile)

Versatile:
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames
  • (Ignoring encoding/decoding delay)
  • Network transmission time (if modem line, delay = 3 frames)

6875

EAC (Low Delay Mode)

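[Timing diagram: frames i-1, i, i+1. The encoder starts encoding frame i (MLT, Quantizer, Entropy), the bitstream crosses the network, the decoder starts decoding frame i (Entropy, Quantizer), and the point where frame i becomes playable is marked.]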

6975

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • The parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, number of audio channels, audio sampling rate

7075

EAC - Software

Software:
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

7175

EAC - Encoder

[Diagram: Audio enters the Encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

7275

EAC - Parser

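[Diagram: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces any of the following bitstreams:]
  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling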

7375

EAC - Decoder

[Diagram: the Decoder plays any of the parsed bitstreams:]
  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling

7475

Comparison

Original, MP4 TwinVQ, WMA, EAC, MP3

7575

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: real-time execution, small memory footprint

Page 21: Embedded Audio Coder

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed-length timeslot bitstream

Delay = 2 frames, ignoring:
  • Encoding/decoding delay
  • Network transmission time (over a modem line, delay = 3 frames)

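68/75

EAC (Low Delay Mode)

[Diagram: timeline of frames i-1, i, i+1. The encoder starts encoding frame i (MLT, Quantizer, Entropy) and sends the bitstream over the network; the decoder starts decoding frame i (Entropy, Quantizer). The point at which the frame can be played is marked "Playable here".]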

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, number of audio channels, audio sampling rate

70/75

EAC – Software

Software
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

71/75

EAC - Encoder

[Diagram: Audio, Encoder, Stereo 128kbps bitstream, Companion file.]

72/75

EAC - Parser

[Diagram: on the server, the Parser takes the Stereo 128kbps bitstream and the companion file. Outputs shown: Stereo 16kbps; Mono 8kbps; Stereo 16kbps, slow start; Mono 8kbps, 11kHz sampling.]

73/75

EAC - Decoder

[Diagram: the Decoder with the parsed bitstreams: Stereo 16kbps; Mono 8kbps; Stereo 16kbps, slow start; Mono 8kbps, 11kHz sampling.]

74/75

Comparison

[Audio demo: Original, MP4 TwinVQ, WMA, EAC, MP3.]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: real-time execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction – Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding – Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots – Constant Bitrate
  • Multiple Timeslots – Internet Streaming (Slow Start)
  • Multiple Timeslots – Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC – Highly Efficient (NMR)
  • EAC – Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC – Flexible Bitstream Syntax
  • EAC – Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 22: Embedded Audio Coder

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation garbled in extraction: each coefficient s[m][n] is mapped to a quantized magnitude and a sign.]
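A deadzone scalar quantizer can be sketched as follows. The slide's exact formula is not legible here, so the step size `delta`, the floor-based rule, the sign convention (0 for non-negative, 1 for negative) and the midpoint reconstruction are all assumptions that follow the common textbook form rather than the EAC definition.

```python
import math


def deadzone_quantize(s, delta):
    """Map a coefficient to (quantized magnitude, sign bit).

    Every |s| < delta falls into bin 0, so the zero bin is twice as
    wide as the other bins: that wide bin is the deadzone.
    """
    magnitude = int(math.floor(abs(s) / delta))
    sign = 0 if s >= 0 else 1  # assumed convention
    return magnitude, sign


def dequantize(magnitude, sign, delta):
    """Reconstruct at the middle of the bin (assumed reconstruction rule)."""
    if magnitude == 0:
        return 0.0
    value = (magnitude + 0.5) * delta
    return -value if sign else value
```

With `delta = 1.0`, the input 3.7 quantizes to magnitude 3 and is reconstructed as 3.5, while -0.9 falls into the deadzone and is reconstructed as 0.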

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of a reversible transform:
  • Integer input, integer output
  • The transform should have a determinant of 1 (do not expand data volume)

25/75

MLT Framework

[Diagram: Forward MLT: Window (lapped transform) followed by DCT-IV, where the DCT-IV consists of Pre-rotation, Complex FFT and Post-rotation. Inverse MLT: the inverse blocks Pre-rotation^-1, Complex FFT^-1, Post-rotation^-1 and Inverse window.]

26/75

Window Operation

[Diagram: each sample pair x(n), x(-n-1) passes through a complex rotation. The angle formula is garbled in extraction; its legible terms are (2n+1), N and 4.]

27/75

Pre-Rotation

[Diagram: the windowed samples xw(0)..xw(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32 and -13π/32, giving xp(0)..xp(7).]

28/75

FFT (4 Point Complex)

[Diagram: the eight values xp(0)..xp(7) form four complex inputs xc(0)..xc(3). A 4-point complex FFT built from butterflies (sign changes and one twiddle factor e^{-jπ/2}) gives yc(0)..yc(3), which are unpacked into yp(0)..yp(7).]

29/75

Post-Rotation

[Diagram: the FFT outputs yp(0)..yp(7) pass through four conjugate rotations, by -0, -π/8, -2π/8 and -3π/8, giving y(0)..y(7).]
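The pre-rotation, complex FFT and post-rotation chain can be checked numerically. The sketch below uses one standard factorization of an unnormalized N-point DCT-IV into an N/2-point complex transform; for N = 8 its rotation angles are π/32, 5π/32, 9π/32, 13π/32 before the FFT and 0, π/8, 2π/8, 3π/8 after it, the same values as on the slides. The sample pairing, the sign conventions and all function names are assumptions, and a plain DFT stands in for the FFT to keep the example short.

```python
import cmath
import math


def dct4_direct(x):
    """Reference DCT-IV: X[k] = sum_n x[n] cos(pi (2n+1)(2k+1) / (4N))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * (2 * k + 1) / (4 * N))
                for n in range(N)) for k in range(N)]


def dft(z):
    """Plain O(M^2) DFT, standing in for the 4-point complex FFT."""
    M = len(z)
    return [sum(z[n] * cmath.exp(-2j * math.pi * n * k / M) for n in range(M))
            for k in range(M)]


def dct4_via_rotations(x):
    """DCT-IV as pre-rotation, N/2-point complex DFT, post-rotation (N even)."""
    N = len(x)
    M = N // 2
    # Pre-rotation: pair x[2n] with x[N-1-2n], rotate by -pi(4n+1)/(4N).
    pre = [complex(x[2 * n], x[N - 1 - 2 * n])
           * cmath.exp(-1j * math.pi * (4 * n + 1) / (4 * N)) for n in range(M)]
    spectrum = dft(pre)
    # Post-rotation: rotate output k by -pi k / N.
    post = [spectrum[k] * cmath.exp(-1j * math.pi * k / N) for k in range(M)]
    out = [0.0] * N
    for k in range(M):
        out[2 * k] = post[k].real           # even-indexed outputs
        out[N - 1 - 2 * k] = -post[k].imag  # odd-indexed outputs
    return out
```

Comparing `dct4_via_rotations` with `dct4_direct` on any even-length input confirms that the two agree up to floating-point error.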

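30/75

Reversible MLT

Make the following operations reversible:
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equations garbled in extraction: a 2x2 transform acting on the pair (a, b), followed by three update equations in a, b, an intermediate value t and multipliers c.]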
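One well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below illustrates that general technique; it is not confirmed to be the exact set of equations on the slide, and the function names and the rounding rule are assumptions. Each step changes one value by a rounded function of the other, so it can be undone exactly, and the underlying matrix has determinant 1.

```python
import math


def _round(x):
    """Deterministic rounding to the nearest integer (ties go up)."""
    return int(math.floor(x + 0.5))


def lifting_rotate(a, b, theta):
    """Integer-to-integer approximation of a rotation by theta.

    Uses the factorization R(theta) = U(p) L(s) U(p) with
    p = (cos(theta) - 1) / sin(theta) and s = sin(theta); sin(theta) != 0.
    """
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a + _round(p * b)
    b = b + _round(s * a)
    a = a + _round(p * b)
    return a, b


def lifting_unrotate(a, b, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a - _round(p * b)
    b = b - _round(s * a)
    a = a - _round(p * b)
    return a, b
```

Applying `lifting_unrotate` to the output of `lifting_rotate` returns the original integer pair for any angle with a nonzero sine, which is the property a lossless pass needs from every step.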

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal:
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Figure: grid labeled "Time slot" and "Frame", numbered 1 to 8.]

34/75

Entropy Coder

[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]

35/75

Entropy Coder

  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

36/75

A block of coefficients

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

37/75

Bits of Coefficients

Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0              45     0  1  0  1  1  0  1    +
w1             -74     1  0  0  1  0  1  0    -
w2              21     0  0  1  0  1  0  1    +
w3              14     0  0  0  1  1  1  0    +
w4              -4     0  0  0  0  1  0  0    -
w5             -18     0  0  1  0  0  1  0    -
w6               4     0  0  0  0  1  0  0    +
w7              -1     0  0  0  0  0  0  1    -
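The bit table above is a sign-magnitude decomposition with b1 as the most significant bit. A minimal sketch that reproduces one row per coefficient; the function name and the 7-bit default are choices made for this example:

```python
def bit_planes(value, num_bits=7):
    """Return ([b1, ..., b_num_bits], sign) with b1 the most significant bit."""
    magnitude = abs(value)
    bits = [(magnitude >> (num_bits - 1 - i)) & 1 for i in range(num_bits)]
    sign = '+' if value >= 0 else '-'
    return bits, sign


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]
table = [bit_planes(w) for w in coefficients]
```

Reading `table` column by column (all b1 bits, then all b2 bits, and so on) gives the bit-plane order used by embedded coding, while reading it row by row gives the conventional order.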

38/75

Conventional Coding

[Figure: the bit table of w0..w7 coded one coefficient at a time: first w0, second w1, third w2. After these three coefficients the decoded values shown are 46, -74, 22, 0, 0, 0, 0, 0.]

39/75

Embedded Coding

[Figure: the same bit table coded one bit plane at a time, with the sign sent when a coefficient first becomes nonzero.
  First pass (b1):  0, 1 -, 0, 0, 0, 0, 0, 0
  Second pass (b2): 1 +, 0, 0, 0, 0, 0, 0, 0
  Third pass (b3):  0, 0, 1 +, 0, 0, 1 -, 0, 0
Decoded value and range shown per coefficient: w0 40 (32 to 47); w1 -72 (-79 to -64); w2 24 (16 to 31); w5 -24 (range printed as -31 to 31); w3, w4, w6 and w7 0 (-31 to 31).]
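The value and range columns follow from how many bit planes the decoder has received. A minimal sketch, assuming 7-bit magnitudes and reconstruction at the middle of the remaining interval; the function name is hypothetical:

```python
def partial_decode(value, planes_known, num_bits=7):
    """Return (estimate, low, high) after the top `planes_known` bit planes.

    `value` is the true coefficient; only its top bits and its sign
    (once it is nonzero) are assumed to have reached the decoder.
    """
    shift = num_bits - planes_known
    top = abs(value) >> shift  # magnitude bits received so far
    if top == 0:
        # Still insignificant: only a bound on the magnitude is known.
        bound = (1 << shift) - 1
        return 0, -bound, bound
    low = top << shift
    high = low + (1 << shift) - 1
    mid = low + (1 << shift) // 2
    if value < 0:
        return -mid, -high, -low
    return mid, low, high
```

After three bit planes this gives 40 in the range 32 to 47 for w0 = 45 and -72 in the range -79 to -64 for w1 = -74, matching the figures for those coefficients. Every extra bit plane halves the interval, which is why an embedded bitstream can be truncated at any point and still decode to a coarser version of the same data.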

40/75

Audio Masking

[Figure: level versus frequency, showing a critical band and a neighboring band, the signal, the noise level, the masking threshold and the maximum mask, with the signal-to-mask ratio and the noise-to-mask ratio marked.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table for w0..w7 (columns b1..b7 and sign) after the first pass, with the mask drawn over it and coefficients marked significant or insignificant. First-pass bits: 0, 1 -, 0, 0, 0, 0, 0, 0. Decoded value and range: w0 0 (-63 to 63); w1 -96 (-127 to -64); w2 to w5 0 (-63 to 63); w6 and w7 0 (-127 to 127).]

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: the same table after the second pass. First-pass bits: 0, 1 -, 0, 0, 0, 0, 0, 0. Second-pass bits: 1 +, 0, 0, 0, 0, 0, 0, 0. Decoded value and range: w0 48 (32 to 63); w1 -96 (-127 to -64); w2 to w5 0 (-31 to 31); w6 and w7 0 (-63 to 63).]

45/75

Context Modeling

Context:
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit (to be encoded):            0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated):  0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided for each symbol in turn, S0 = 0, S1 = 1, S2 = 0, using the probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2, which leaves a final interval A. Coding result: 0100, the shortest binary bitstream such that the interval from B = 0100 0000000... to C = 0100 1111111... satisfies (B, C) ⊂ A.]
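The interval narrowing and the "shortest bitstream" rule can be sketched with exact fractions. This illustrates the principle only: it is not the QM coder, which uses fixed-precision approximations, and the probabilities in the example are assumed values, since the slide does not state P0, P1 and P2. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def narrow(symbols, p_zero):
    """Return the final interval [low, high) after coding binary symbols.

    p_zero[i] is the probability that symbol i is 0; a 0 keeps the lower
    part of the current interval and a 1 keeps the upper part.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        p = Fraction(p)
        if s == 0:
            width *= p
        else:
            low += width * p
            width *= 1 - p
    return low, low + width


def shortest_code(low, high):
    """Shortest non-empty bit string b such that every number whose binary
    expansion starts with b lies inside [low, high)."""
    k = 0
    while True:
        k += 1
        step = Fraction(1, 2 ** k)
        n = ceil(low / step)  # first k-bit grid point at or above low
        if (n + 1) * step <= high:
            return format(n, "0{}b".format(k))


low, high = narrow([0, 1, 0], [Fraction(1, 2)] * 3)
code = shortest_code(low, high)
```

With three equiprobable symbols the final interval has width 1/8 and the code needs three bits; skewed probabilities widen the interval for likely symbols and shorten the code accordingly.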

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation
  • Lookup table: calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Diagram: EAC marker | Global header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

52/75

Companion File

[Diagram: Global header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

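53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four rate-distortion curves (D1 versus R1 to D4 versus R4), shown twice; in the second set the operating rates r1, r2, r3 and r4 are marked on the curves.]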
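Choosing the rates r1..r4 under a shared byte budget can be sketched as a greedy walk along the R-D curves, always extending the curve with the steepest remaining slope. This assumes each curve is convex and given as (rate, distortion) points with increasing rate and decreasing distortion; the function name, the data layout and the sample curves are hypothetical, and the actual EAC search may differ.

```python
def allocate(curves, budget):
    """Pick one rate per curve so the total stays within `budget`.

    curves: list of [(rate, distortion), ...], one list per bitstream.
    Repeatedly takes the affordable step with the largest distortion
    drop per byte until no further step fits.
    """
    idx = [0] * len(curves)
    spent = sum(c[0][0] for c in curves)
    while True:
        best, best_slope = None, 0.0
        for i, c in enumerate(curves):
            j = idx[i]
            if j + 1 < len(c):
                d_rate = c[j + 1][0] - c[j][0]
                d_dist = c[j][1] - c[j + 1][1]
                if spent + d_rate <= budget and d_dist / d_rate > best_slope:
                    best, best_slope = i, d_dist / d_rate
        if best is None:
            break
        spent += curves[best][idx[best] + 1][0] - curves[best][idx[best]][0]
        idx[best] += 1
    return [c[j][0] for c, j in zip(curves, idx)]


curves = [
    [(0, 100), (10, 40), (20, 20)],  # steep curve: bytes here help a lot
    [(0, 50), (10, 30), (20, 25)],   # flatter curve
]
rates = allocate(curves, budget=30)
```

For convex curves this ends with all bitstreams cut at roughly the same R-D slope, which is the usual condition for minimizing total distortion at a given total rate.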

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

B_i = Buf_{i-1} - Buf_i + Rate_trans x Time

Where:

B_i: allocated bytes for timeslot i

Buf_i: buffer occupancy level at timeslot i

Rate_trans: coding (network) rate per second

Time: duration of the timeslot
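The allocation formula can be applied to a whole buffer-occupancy curve at once. A minimal sketch, assuming Rate_trans is expressed in bytes per second and the buffer levels in bytes; the function name and the sample numbers are illustrative:

```python
def allocated_bytes(buf_levels, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + rate_trans * time for i = 1..len-1.

    buf_levels[0] is the initial occupancy; buf_levels[i] is the
    occupancy at the end of timeslot i.
    """
    refill = rate_trans * time
    return [buf_levels[i - 1] - buf_levels[i] + refill
            for i in range(1, len(buf_levels))]


# Buffer drops by 1000 bytes in slot 1, then rises by 500 bytes in slot 2.
slots = allocated_bytes([4000, 3000, 3500], rate_trans=2000, time=0.5)
```

A timeslot in which the buffer level falls spends more than the channel delivers during that slot, and one in which it rises spends less.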

Page 24: Embedded Audio Coder

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

45/75

Context Modeling

Context
  Zero coding: significance status of the neighbor coefficients
  Refinement: whether it is the 1st refinement pass; significance status of the neighbor coefficients
  Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ......
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ......

Automatically generated

To be encoded

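47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided step by step with the probabilities (P0, 1-P0), (P1, 1-P1), (P2, 1-P2) as the symbols S0=0, S1=1, S2=0 are coded, leaving a final interval A.]

Coding result: 0100

(The shortest binary bitstream that ensures that the interval from B = 0100 0000000 to C = 0100 1111111, (B, C), lies within A.)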
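
The interval narrowing and the choice of the shortest codeword can be sketched with exact fractions. This is an idealized arithmetic coder, not the QM coder named in the title (which works with approximate integer arithmetic). The probabilities, the convention that symbol 0 takes the lower part of the interval, and the function names are all assumptions made for illustration, so the resulting codeword need not equal the 0100 shown on the slide.

```python
from fractions import Fraction


def narrow(symbols, p_zero):
    """Narrow [0, 1) symbol by symbol. p_zero[i] is the assumed probability
    that symbol i is 0; a 0 keeps the lower part, a 1 the upper part."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_codeword(low, high):
    """Shortest bit string b such that every continuation of b, i.e. the
    dyadic interval [0.b000..., 0.b111...], lies inside [low, high)."""
    n = 0
    while True:
        n += 1
        step = Fraction(1, 2 ** n)
        k = -(-low // step)              # smallest k with k * step >= low
        if (k + 1) * step <= high:
            return format(int(k), "0{}b".format(n))


low, high = narrow([0, 1, 0], [Fraction(1, 2)] * 3)
print(low, high, shortest_codeword(low, high))
```

With equal probabilities the three symbols cost exactly three bits; skewing the probabilities toward the symbols that actually occur widens the final interval and shortens the codeword.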

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs the bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling
  Use stored context
  Update the context when a coefficient becomes significant

Implicit masking
  Fast calculation of the energy in a critical band
  Lookup table to convert energy to mask

R-D curve calculation
  Lookup table for the calculation of distortion

Context entropy coder
  QM coder
  Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)

Layout: EAC marker | Global header | Timeslot (head, body) | Timeslot (head, body) | Timeslot (head, body)

52/75

Companion File

Layout: Global header | Timeslot (head, R-D curve) | Timeslot (head, R-D curve) | Timeslot (head, R-D curve)

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four rate-distortion curves with axes (D1, R1) to (D4, R4), shown twice, with the rates r1, r2, r3, r4 marked.]

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

55/75

Allocated Bytes per Timeslot

Allocated bytes for a certain timeslot:

  B_i = Buf_{i-1} - Buf_i + Rate_trans x Time

where
  B_i: allocated bytes for timeslot i
  Buf_i: buffer occupancy level at timeslot i
  Rate_trans: coding (network) rate per second
  Time: time duration of the timeslot
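
As a quick worked example of the allocation formula, the sketch below turns a planned sequence of buffer occupancy levels into per-timeslot byte budgets. The function names, the byte and second units, and the sample numbers are assumptions for illustration only.

```python
def allocated_bytes(buf_prev, buf_curr, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_curr + rate_trans * time


def allocations(buffer_levels, rate_trans, time):
    """Byte budget of every timeslot for a planned buffer-occupancy curve.
    buffer_levels[0] is the initial level; one budget per later level."""
    return [
        allocated_bytes(buffer_levels[i - 1], buffer_levels[i], rate_trans, time)
        for i in range(1, len(buffer_levels))
    ]


# Hypothetical plan: 2000 bytes/s channel, 0.5 s timeslots.
levels = [4000, 3000, 3500]
print(allocations(levels, rate_trans=2000, time=0.5))
```

Because the buffer terms telescope, the budgets always sum to the initial level minus the final level plus the bytes transmitted over the whole span, so fixing the end points of the buffer-occupancy curve fixes the total and only the split between timeslots is left to optimize.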

56/75

Optimization

Given
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]

58/75

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

61/75

Modular Software Design

[Diagram: the audio is carried as two channels, L+R (or mono) and L-R; each channel passes through MLT (SW), quantizer, entropy coder and bitstream assembly to form the bitstream.]

62/75

Modular Software Design

Highly modularized pipeline design
  The quantizer and entropy coder can be used for image/video compression as well
  Probes and data inputs can be inserted into any part of the program

Data-flow driven (with the necessary memory regulator <buffer>)
  No long delay
  No need for large memory

Memory and computation efficient
  Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.
[Decimal points were lost in extraction; the digits are shown as extracted.]

Codec        48kbps  32kbps  16kbps  8kbps
MP4 TwinVQ   448     571     700     748
WMA          040     325     556     847
EAC          -22     280     568     669

65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips.
[Decimal points were lost in extraction; the digits are shown as extracted.]

Codec            Compression Ratio
EAC              272
Monkey's Audio   272
WinZip           132

66/75

EAC (Versatile)

Versatile
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed-length timeslot bitstream

Delay = 2 frames
  (ignoring encoding/decoding delay)
  Network transmission time (over a modem line, delay = 3 frames)

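68/75

EAC (Low Delay Mode)

[Timing diagram over frames i-1, i, i+1. Encoder: start encoding frame i (MLT, quantizer, entropy) to produce the bitstream. The bitstream crosses the network. Decoder: start decoding frame i (entropy, quantizer); the point where the frame is playable is marked.]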

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  The parser may reassemble the bitstream at 1000x real time
  Change
    bit rate
    number of audio channels
    audio sampling rate

70/75

EAC - Software

Software
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time

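71/75

EAC - Encoder

[Diagram: audio -> encoder -> stereo 128 kbps bitstream and companion file.]

72/75

EAC - Parser

[Diagram: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces four application bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]

73/75

EAC - Decoder

[Diagram: the decoder plays the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]

74/75

Comparison

[Demo clips: Original, MP3, MP4 TwinVQ, WMA, EAC.]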

75/75

Conclusions

An embedded audio coder has been developed
  Highly efficient
  Versatile
    Low delay, constant bitrate, streaming
  Flexible bitstream
    Parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction - Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

25/75

MLT Framework

Forward MLT (lapped transform): window, then DCT IV, computed as pre-rotation, complex FFT and post-rotation.

Inverse MLT: the inverses of the same blocks (pre-rotation^-1, complex FFT^-1, post-rotation^-1) and the inverse window.

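26/75

Window Operation

[Diagram: each input pair x(n), x(-n-1) goes through a complex rotation whose angle depends on n and N; the angle expression did not survive extraction.]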

27/75

Pre-Rotation

Inputs xw(0)..xw(7), outputs xp(0)..xp(7); the eight values are handled as four complex numbers:
  Complex rotate by -π/32
  Complex rotate by -5π/32
  Complex rotate by -9π/32
  Complex rotate by -13π/32

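28/75

FFT (4 Point Complex)

[Diagram: butterfly network of a 4-point complex FFT. The inputs xp(0)..xp(7) are taken as the complex values xc(0)..xc(3); the outputs yc(0)..yc(3) are written back as yp(0)..yp(7). One branch carries the twiddle factor e^(-jπ/2); the subtracting branches are marked with "-".]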

29/75

Post-Rotation

Inputs yp(0)..yp(7), outputs y(0)..y(7); the eight values are handled as four complex numbers:
  Conjugate rotate by -0
  Conjugate rotate by -π/8
  Conjugate rotate by -2π/8
  Conjugate rotate by -3π/8
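
The angles listed for the pre-rotation (odd multiples of π/32 in steps of 4) and the post-rotation (multiples of π/8) agree with a standard way of computing an N-point DCT-IV through an N/2-point complex transform, for N = 8. The sketch below is a floating-point illustration of that standard algorithm under stated assumptions: the pairing of inputs into complex numbers, the output ordering and the function names are not taken from the slides, and a plain DFT stands in for the FFT to keep the code short.

```python
import cmath
import math


def dct4_direct(x):
    """Reference DCT-IV: X[k] = sum_m x[m] cos(pi/N (m + 1/2)(k + 1/2))."""
    n = len(x)
    return [
        sum(x[m] * math.cos(math.pi / n * (m + 0.5) * (k + 0.5)) for m in range(n))
        for k in range(n)
    ]


def dct4_via_complex_transform(x):
    """DCT-IV as pre-rotation, N/2-point complex DFT, post-rotation."""
    n = len(x)                      # n must be even
    half = n // 2
    # Assumed pairing: even samples from the front, odd samples from the back.
    z = [complex(x[2 * i], x[n - 1 - 2 * i]) for i in range(half)]
    # Pre-rotation by -(4i + 1) * pi / (4N): pi/32, 5pi/32, ... for N = 8.
    z = [z[i] * cmath.exp(-1j * math.pi * (4 * i + 1) / (4 * n)) for i in range(half)]
    # N/2-point complex DFT (an FFT would be used in practice).
    c = [
        sum(z[i] * cmath.exp(-2j * math.pi * i * k / half) for i in range(half))
        for k in range(half)
    ]
    # Post-rotation by -k * pi / N: 0, pi/8, 2pi/8, 3pi/8 for N = 8.
    y = [c[k] * cmath.exp(-1j * math.pi * k / n) for k in range(half)]
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = y[k].real            # even outputs from the real parts
        out[n - 1 - 2 * k] = -y[k].imag   # odd outputs from the conjugate
    return out


x = [1.0, -2.0, 3.5, 0.25, -1.5, 4.0, 0.0, 2.0]
print(max(abs(a - b) for a, b in zip(dct4_direct(x), dct4_via_complex_transform(x))))
```

Taking the real part and the negated imaginary part in the last step amounts to conjugating the rotated value, which may be what the "conjugate rotate" label refers to.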

30/75

Reversible MLT

Make the following operations reversible:
  Butterfly operation
  Complex rotation
  Conjugate rotation

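31/75

Reversible Unit Transform

[Equations lost in extraction: a 2x2 unit transform of the pair (a, b), written as a sequence of steps through an intermediate value t with the coefficients c0, c1, c2.]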
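
One common way to make a rotation exactly reversible on integers is to factor it into three lifting steps and round inside each step. Whether this is the factorization shown on the slide cannot be confirmed from the extracted text, so the sketch below should be read as an illustration of the general technique; the function names and the coefficient choice c0 = (cos θ - 1) / sin θ, c1 = sin θ are assumptions.

```python
import math


def rotate_reversible(a, b, theta):
    """Integer approximation of (a cos t - b sin t, a sin t + b cos t)
    built from three lifting steps; requires sin(theta) != 0."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)   # step 1 changes only a
    b = b + round(c1 * a)   # step 2 changes only b
    a = a + round(c0 * b)   # step 3 changes only a
    return a, b


def rotate_reversible_inverse(a, b, theta):
    """Undo rotate_reversible exactly by reversing the steps."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b


theta = -5 * math.pi / 32
p, q = rotate_reversible(45, -74, theta)
print((p, q), rotate_reversible_inverse(p, q, theta))
```

Each step adds a rounded function of the untouched variable, so subtracting the same rounded quantity restores the input bit for bit, while the result stays close to the floating-point rotation.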

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  Compression
  Embedded bitstream for future manipulation

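33/75

Frame Grouping

[Figure: frames 1 to 8 along the time axis, grouped into time slots.]

34/75

Entropy Coder

[Figure: the entropy coder outputs the bitstream together with its R-D curve (distortion D versus rate R).]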

35/75

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

37/75

Bits of Coefficients

      Coefficient  b1 b2 b3 b4 b5 b6 b7  Sign
w0        45        0  1  0  1  1  0  1   +
w1       -74        1  0  0  1  0  1  0   -
w2        21        0  0  1  0  1  0  1   +
w3        14        0  0  0  1  1  1  0   +
w4        -4        0  0  0  0  1  0  0   -
w5       -18        0  0  1  0  0  1  0   -
w6         4        0  0  0  0  1  0  0   +
w7        -1        0  0  0  0  0  0  1   -

38/75

Conventional Coding

Coding order: coefficient by coefficient (first w0, second w1, third w2).

      b1 b2 b3 b4 b5 b6 b7  Sign   Decoded
w0     0  1  0  1  1  0  1   +       46
w1     1  0  0  1  0  1  0   -      -74
w2     0  0  1  0  1  0  1   +       22
w3     0  0  0  1  1  1  0   +        0
w4     0  0  0  0  1  0  0   -        0
w5     0  0  1  0  0  1  0   -        0
w6     0  0  0  0  1  0  0   +        0
w7     0  0  0  0  0  0  1   -        0

Page 27: Embedded Audio Coder

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

  45    0    0    0  -74  -13    0    0
  21    0    4    0   14    0   23   23
   0    0    0    0    3    0    4    0
   0    3    5    0    0    0    0    0
   0    1   -1    0   -4   33    0   -1
   0    0    1    0    0    0    0    0
  -4    5    0    0  -18    0    0   19
   4    0   23    0   -1    0    0    0

Next: view graph

37/75

Bits of Coefficients

Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0              45     0  1  0  1  1  0  1    +
w1             -74     1  0  0  1  0  1  0    -
w2              21     0  0  1  0  1  0  1    +
w3              14     0  0  0  1  1  1  0    +
w4              -4     0  0  0  0  1  0  0    -
w5             -18     0  0  1  0  0  1  0    -
w6               4     0  0  0  0  1  0  0    +
w7              -1     0  0  0  0  0  0  1    -
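The sign-magnitude bit table can be reproduced with a few lines of Python. This is a minimal sketch; the function name `bit_planes` and the fixed 7-bit magnitude width are assumptions chosen to match the example coefficients.

```python
def bit_planes(value, n_bits=7):
    """Split an integer into its magnitude bits (b1 = most significant) and its sign."""
    sign = "-" if value < 0 else "+"
    magnitude = abs(value)
    bits = [(magnitude >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    return bits, sign

coefficients = [45, -74, 21, 14, -4, -18, 4, -1]
for index, w in enumerate(coefficients):
    bits, sign = bit_planes(w)
    print(f"w{index} {w:4d}  {' '.join(map(str, bits))}  {sign}")
```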

38/75

Conventional Coding

[Figure: the bit table of w0 to w7 (sign, b1 to b7), with the rows of w0, w1 and w2 marked First, Second and Third: the coefficients are coded one after another.]

Coefficient   Decoded value
w0              46
w1             -74
w2              22
w3               0
w4               0
w5               0
w6               0
w7               0

39/75

Embedded Coding

[Figure: the bit table of w0 to w7 (sign, b1 to b7), with the bit planes marked First, Second and Third: the planes are coded in order, most significant first.]

Pass     Plane   w0   w1   w2   w3   w4   w5   w6   w7
First    b1      0    1-   0    0    0    0    0    0
Second   b2      1+   0    0    0    0    0    0    0
Third    b3      0    0    1+   0    0    1-   0    0

Coefficient   Value   Range
w0              40     32 to 47
w1             -72    -79 to -64
w2              24     16 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5             -24    -31 to 31
w6               0    -31 to 31
w7               0    -31 to 31
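To see how a decoder might reconstruct values from a truncated embedded bitstream, the sketch below keeps only the first few bit planes of each coefficient and places the reconstruction at the midpoint of the remaining uncertainty interval. The midpoint rule and the function name `decode_truncated` are assumptions for illustration; with three planes, the result agrees with the decoded values listed on the slide.

```python
def decode_truncated(coefficients, planes_received, n_bits=7):
    """Reconstruct coefficients from only their first `planes_received` bit planes."""
    shift = n_bits - planes_received          # number of bit planes still unknown
    decoded = []
    for w in coefficients:
        prefix = abs(w) >> shift              # magnitude bits the decoder has seen
        if prefix == 0:
            decoded.append(0)                 # still insignificant: sign unknown
        else:
            # Known bits, plus half of the unknown interval as the estimate.
            magnitude = (prefix << shift) + (1 << shift) // 2
            decoded.append(-magnitude if w < 0 else magnitude)
    return decoded

print(decode_truncated([45, -74, 21, 14, -4, -18, 4, -1], 3))
```

Receiving more planes narrows every interval, which is what lets an embedded bitstream be cut at any point and still decode.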

40/75

Audio Masking

[Figure: level versus frequency, showing a critical band and a neighboring band, the signal, the masking threshold, the maximum mask and the noise level, with the signal-to-mask ratio and the noise-to-mask ratio marked.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: Mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: Mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: the bit table of w0 to w7 (sign, b1 to b7) after the First pass, with each coefficient marked significant or insignificant and the mask shown alongside.]

Coefficient   Value   Range
w0               0    -63 to 63
w1             -96    -127 to -64
w2               0    -63 to 63
w3               0    -63 to 63
w4               0    -63 to 63
w5               0    -63 to 63
w6               0    -127 to 127
w7               0    -127 to 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: the bit table of w0 to w7 (sign, b1 to b7) after the First and Second passes, with each coefficient marked significant or insignificant.]

Coefficient   Value   Range
w0              48     32 to 63
w1             -96    -127 to -64
w2               0    -31 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5               0    -31 to 31
w6               0    -63 to 63
w7               0    -63 to 63
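The idea that the mask changes only the coding order, not the coded content, can be sketched with a toy schedule in which masked coefficients lag behind by a number of bit planes. The fixed per-coefficient `plane_delay` is a simplifying assumption for this example: in implicit masking the mask is recomputed from the already-coded bits, so that encoder and decoder derive the same order without transmitting it. The function name `coding_order` is hypothetical.

```python
def coding_order(plane_delay, n_passes, n_bits=7):
    """List, per pass, which (coefficient index, bit plane) pairs are coded.

    plane_delay[i] is how many planes coefficient i is held back by its mask.
    """
    schedule = []
    for coding_pass in range(1, n_passes + 1):
        this_pass = []
        for index, delay in enumerate(plane_delay):
            plane = coding_pass - delay       # plane reached by this coefficient
            if 1 <= plane <= n_bits:
                this_pass.append((index, plane))
        schedule.append(this_pass)
    return schedule

# w6 and w7 held back by one plane, as their wider ranges in the tables suggest.
for number, entries in enumerate(coding_order([0, 0, 0, 0, 0, 0, 1, 1], 2), start=1):
    print(f"pass {number}: {entries}")
```

In this toy schedule the first pass codes plane b1 of w0 to w5 only, and the second pass codes b2 of w0 to w5 together with b1 of w6 and w7.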

45/75

Context Modeling

Context:
  • Zero coding: significance status of the neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance status of the neighbor coefficients
  • Sign: neighbor signs
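A minimal sketch of context modeling for zero coding follows: the context is derived from the significance of neighboring coefficients, and each context keeps its own adaptive probability estimate. The two-neighbor context, its numbering, and the count-based estimator are assumptions for illustration; the slides name the QM coder, which uses its own state-machine probability estimation, and the actual EAC contexts are not specified here.

```python
def zero_coding_context(significant, i):
    """Context = number of already-significant immediate neighbors (0, 1 or 2)."""
    left = i > 0 and significant[i - 1]
    right = i + 1 < len(significant) and significant[i + 1]
    return int(left) + int(right)

class AdaptiveBitModel:
    """Per-context estimate of P(bit = 1) from running counts."""

    def __init__(self, n_contexts):
        # Start each context with one pseudo-count per symbol.
        self.counts = [[1, 1] for _ in range(n_contexts)]

    def p_one(self, context):
        zeros, ones = self.counts[context]
        return ones / (zeros + ones)

    def update(self, context, bit):
        self.counts[context][bit] += 1
```

Since the context depends only on previously coded information, the decoder can rebuild it and track the same probabilities.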

46/75

After Implicit Psychoacoustic Masking & Context Modeling

  45    0    0    0  -74  -13    0    0
  21    0    4    0   14    0   23   23
   0    0    0    0    3    0    4    0
   0    3    5    0    0    0    0    0
   0    1   -1    0   -4   33    0   -1
   0    0    1    0    0    0    0    0
  -4    5    0    0  -18    0    0   19
   4    0   23    0   -1    0    0    0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

[Figure labels: "Automatically generated", "To be encoded"]

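47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided step by step, with probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2, for the symbols S0=0, S1=1, S2=0; the final interval is A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 satisfies (B, C) ⊂ A.)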
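The interval-subdivision idea can be sketched with exact fractions. This is an idealized binary arithmetic coder for illustration only, not the QM coder named on the slide; the function names and the convention that symbol 0 takes the lower part of the interval are assumptions.

```python
from fractions import Fraction
from math import ceil

def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per symbol; returns (low, width) of the final interval A."""
    low, width = Fraction(0), Fraction(1)
    for symbol, p0 in zip(symbols, p_zero):
        if symbol == 0:
            width *= p0                 # keep the lower part
        else:
            low += width * p0           # keep the upper part
            width *= 1 - p0
    return low, width

def encode(symbols, p_zero):
    """Shortest bit string whose dyadic interval [B, C) lies entirely inside A."""
    low, width = final_interval(symbols, p_zero)
    n = 0
    while True:
        n += 1
        k = ceil(low * 2**n)            # smallest k with k / 2^n >= low
        if Fraction(k + 1, 2**n) <= low + width:
            return format(k, f"0{n}b")

def decode(code, p_zero):
    """Replay the subdivision, choosing at each step the part that contains the code value."""
    value = Fraction(int(code, 2), 2**len(code))
    low, width = Fraction(0), Fraction(1)
    symbols = []
    for p0 in p_zero:
        split = low + width * p0
        if value < split:
            symbols.append(0)
            width *= p0
        else:
            symbols.append(1)
            low = split
            width *= 1 - p0
    return symbols
```

The probabilities must be exact `Fraction` values strictly between 0 and 1, for example `[Fraction(3, 4)] * 3` for a three-symbol message.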

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling:
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking:
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation:
  • Lookup table: calculation of distortion

Context entropy coder:
  • QM coder
  • Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

[Figure: block labeled "Bitstream assembling" between the input and the output.]

51/75

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, global header, then a sequence of timeslots, each consisting of a head and a body.]

52/75

Companion File

[Figure: companion file layout: global header, then a sequence of timeslots, each consisting of a head and an R-D curve.]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves, D1 versus R1 through D4 versus R4, with the rates r1, r2, r3 and r4 marked.]
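One common way to pick truncation points on several R-D curves under a byte budget is to repeatedly take the step with the steepest distortion drop per byte. The sketch below shows that greedy rule, assuming each curve is a list of (rate in bytes, distortion) points with increasing rate and convex, decreasing distortion. It is an illustration of the general technique, not the exact EAC assembler; `allocate` is a hypothetical name.

```python
import heapq

def allocate(curves, budget):
    """Choose one truncation rate per curve so that the total stays within `budget`."""
    chosen = [0] * len(curves)                       # index of the current point per curve
    used = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next_step(i):
        j = chosen[i]
        if j + 1 < len(curves[i]):
            (r0, d0), (r1, d1) = curves[i][j], curves[i][j + 1]
            slope = (d0 - d1) / (r1 - r0)            # distortion saved per extra byte
            heapq.heappush(heap, (-slope, i))        # max-heap via negated slope

    for i in range(len(curves)):
        push_next_step(i)

    while heap:
        _, i = heapq.heappop(heap)
        j = chosen[i]
        extra = curves[i][j + 1][0] - curves[i][j][0]
        if used + extra > budget:
            continue                                 # step does not fit; freeze this curve
        used += extra
        chosen[i] = j + 1
        push_next_step(i)

    return [curves[i][chosen[i]][0] for i in range(len(curves))], used
```

For convex curves this ends with all curves truncated at roughly the same R-D slope, which is the usual optimality condition for this kind of allocation.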

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between two illegal regions.]

55/75

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
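The allocation formula translates directly into code. In this sketch the transmission rate is assumed to be in bytes per second and the timeslot duration in seconds, so that the result is in bytes; the function name `allocated_bytes` is hypothetical.

```python
def allocated_bytes(buffer_levels, rate_trans, slot_duration):
    """Bytes allocated to timeslots 1..n from buffer levels Buf_0..Buf_n.

    B_i = Buf_{i-1} - Buf_i + rate_trans * slot_duration
    """
    return [
        buffer_levels[i - 1] - buffer_levels[i] + rate_trans * slot_duration
        for i in range(1, len(buffer_levels))
    ]

# Buffer drains from 4000 to 3000 bytes, then refills to 3500 bytes.
print(allocated_bytes([4000, 3000, 3500], rate_trans=2000, slot_duration=1.0))
```

A timeslot that lets the buffer level drop can spend more bytes than the channel delivers in that slot, and one that refills the buffer must spend fewer.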

56/75

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions and the labels "Underflow (too many bytes)", "Overflow (too few bytes)" and "Waste bytes".]

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between two illegal regions.]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

61/75

Modular Software Design

[Figure: the audio is split into an L+R (or mono) channel and an L-R channel; each channel passes through MLT (SW), quantizer and entropy coder, followed by bitstream assembly, which outputs the bitstream.]

62/75

Modular Software Design

Highly modularized pipeline design:
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>):
  • No long delay
  • No need for large memory

Memory and computation efficient:
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC            6.69     5.68      2.80     -0.22
WMA            8.47     5.56      3.25      0.40
MP4 TwinVQ     7.48     7.00      5.71      4.48

65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips.

Codec            Compression ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72

66/75

EAC (Versatile)

Versatile:
  • Real time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (if modem line, delay = 3 frames)

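68/75

EAC (Low Delay Mode)

[Figure: timeline over frames i-1, i and i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder) and sends the bitstream over the network; the decoder starts decoding frame i (entropy decoder, quantizer) up to the point marked "Playable here".]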

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax:
  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, number of audio channels, audio sampling rate

70/75

EAC – Software

Software:
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

71/75

EAC - Encoder

[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces:
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling]

73/75

EAC - Decoder

[Figure: the decoder plays each of the parsed bitstreams:
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling]

74/75

Comparison

[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

Page 30: Embedded Audio Coder

30/75

Reversible MLT

Make the following operations reversible:

  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

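31/75

Reversible Unit Transform

[Equation: the unit transform maps the pair (a, b) to a new pair (a, b) through a 2×2 matrix, and is rewritten as three steps that use an intermediate value t and coefficients c0, c1, c2. The exact expressions are not recoverable from this transcript.]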
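One common way to make a rotation reversible on integers is to factor it into three lifting steps and round inside each step. The sketch below illustrates that general idea only; the function names, the choice of coefficients (c0, c1) and the rounding rule are assumptions, not the factorization shown on the slide.

```python
import math

def lifting_coeffs(theta):
    """Return (c0, c1) for a three-step lifting form of a rotation by theta.

    Assumed factorization: R(theta) = [1 c0; 0 1] [1 0; c1 1] [1 c0; 0 1]
    with c0 = (cos(theta) - 1) / sin(theta) and c1 = sin(theta).
    """
    s = math.sin(theta)
    if abs(s) < 1e-12:
        raise ValueError("theta must not be a multiple of pi")
    return (math.cos(theta) - 1.0) / s, s

def rotate_int(a, b, theta):
    """Integer-to-integer approximation of a rotation of (a, b)."""
    c0, c1 = lifting_coeffs(theta)
    a += round(c0 * b)   # lifting step 1
    b += round(c1 * a)   # lifting step 2
    a += round(c0 * b)   # lifting step 3
    return a, b

def unrotate_int(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0, c1 = lifting_coeffs(theta)
    a -= round(c0 * b)
    b -= round(c1 * a)
    a -= round(c0 * b)
    return a, b

if __name__ == "__main__":
    x = rotate_int(100, 0, math.pi / 4)
    print(x, unrotate_int(*x, math.pi / 4))
```

Each step changes only one of the two values by a rounded function of the other, so subtracting the same rounded quantity recovers the input exactly, whatever the rounding error of the forward transform.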

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal:

  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Figure: frames 1 to 8 grouped into time slots; labels: Frame, Time slot.]

34/75

Entropy Coder

[Figure: the entropy coder produces a bitstream together with its R-D curve (distortion D versus rate R).]

35/75

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

45 0 0 0    -74 -13 0 0
21 0 4 0    14 0 23 23
0 0 0 0     3 0 4 0
0 3 5 0     0 0 0 0
0 1 -1 0    -4 33 0 -1
0 0 1 0     0 0 0 0
-4 5 0 0    -18 0 0 19
4 0 23 0    -1 0 0 0

Next: view graph

37/75

Bits of Coefficients

Coefficient  Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0             45     0  1  0  1  1  0  1    +
w1            -74     1  0  0  1  0  1  0    -
w2             21     0  0  1  0  1  0  1    +
w3             14     0  0  0  1  1  1  0    +
w4             -4     0  0  0  0  1  0  0    -
w5            -18     0  0  1  0  0  1  0    -
w6              4     0  0  0  0  1  0  0    +
w7             -1     0  0  0  0  0  0  1    -
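The sign-and-magnitude bit table can be reproduced with a few lines of code. This is a minimal sketch assuming 7 magnitude bits with b1 as the most significant bit; the function name `to_bitplanes` is made up for the illustration.

```python
def to_bitplanes(value, nbits=7):
    """Split an integer into its sign and magnitude bits b1..b_nbits (MSB first)."""
    sign = "-" if value < 0 else "+"
    mag = abs(value)
    if mag >= 1 << nbits:
        raise ValueError("magnitude does not fit in nbits")
    bits = [(mag >> (nbits - 1 - k)) & 1 for k in range(nbits)]
    return bits, sign

if __name__ == "__main__":
    coeffs = [45, -74, 21, 14, -4, -18, 4, -1]   # w0..w7 from the slide
    for i, w in enumerate(coeffs):
        bits, sign = to_bitplanes(w)
        print(f"w{i} {w:4d}  {' '.join(map(str, bits))}  {sign}")
```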

38/75

Conventional Coding

Coefficients are coded one after another, with all bits of each coefficient sent together: first w0, second w1, third w2.

Coefficient  b1 b2 b3 b4 b5 b6 b7   Sign   Coded
w0            0  1  0  1  1  0  1    +     First
w1            1  0  0  1  0  1  0    -     Second
w2            0  0  1  0  1  0  1    +     Third
w3            0  0  0  1  1  1  0    +
w4            0  0  0  0  1  0  0    -
w5            0  0  1  0  0  1  0    -
w6            0  0  0  0  1  0  0    +
w7            0  0  0  0  0  0  1    -

Values shown on the slide for w0 to w7: 46, -74, 22, 0, 0, 0, 0, 0

39/75

Embedded Coding

Bits are coded plane by plane across all coefficients, most significant plane first; the sign is sent when a coefficient first becomes significant.

Pass          w0  w1  w2  w3  w4  w5  w6  w7
First (b1)    0   1-  0   0   0   0   0   0
Second (b2)   1+  0   0   0   0   0   0   0
Third (b3)    0   0   1+  0   0   1-  0   0

Decoded value and range shown on the slide:

Coefficient  Value  Range
w0             40   32 to 47
w1            -72   -79 to -64
w2             24   16 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5            -24   -31 to 31
w6              0   -31 to 31
w7              0   -31 to 31
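What a decoder knows about a coefficient after only some bit planes have arrived can be sketched as follows. This is an illustration under stated assumptions: 7 magnitude bits, a reconstruction at the middle of the remaining uncertainty interval, and a hypothetical function name `partial_decode`.

```python
def partial_decode(value, planes, nbits=7):
    """Return (estimate, low, high) known after `planes` bit planes of `value`."""
    mag = abs(value)
    shift = nbits - planes                 # number of bit planes still missing
    prefix = (mag >> shift) << shift       # magnitude bits received so far
    span = (1 << shift) - 1                # largest value the missing bits can add
    if prefix == 0:                        # still insignificant: sign not sent yet
        return 0, -span, span
    est = prefix + ((1 << shift) >> 1)     # midpoint-style reconstruction
    if value < 0:
        return -est, -(prefix + span), -prefix
    return est, prefix, prefix + span

if __name__ == "__main__":
    for i, w in enumerate([45, -74, 21, 14, -4, -18, 4, -1]):
        print(f"w{i}", partial_decode(w, planes=3))
```

With three planes this gives 40 in [32, 47] for w0, -72 in [-79, -64] for w1 and 24 in [16, 31] for w2, which agrees with the slide. For coefficients that are still insignificant the sketch reports ±15 after three planes; the ±31 listed on the slide is the bound after two planes.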

40/75

Audio Masking

[Figure: masking threshold versus frequency around a signal in a critical band and its neighboring band; labels: Signal, Noise Level, Masking Threshold, Maximum Mask, Signal-to-mask ratio, Noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):

  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content.

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same.

Implicit masking:

  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

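43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit-plane table (columns Sign, b1 to b7; rows w0 to w7) with the bits coded in the first pass highlighted; legend: Coefficient (Significant, Insignificant), Mask.]

Decoded value and range after the first pass:

Coefficient  Value  Range
w0              0   -63 to 63
w1            -96   -127 to -64
w2              0   -63 to 63
w3              0   -63 to 63
w4              0   -63 to 63
w5              0   -63 to 63
w6              0   -127 to 127
w7              0   -127 to 127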

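44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit-plane table (columns Sign, b1 to b7; rows w0 to w7) with the bits coded in the first and second passes highlighted; legend: Coefficient (Significant, Insignificant).]

Decoded value and range after the second pass:

Coefficient  Value  Range
w0             48   32 to 63
w1            -96   -127 to -64
w2              0   -31 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5              0   -31 to 31
w6              0   -63 to 63
w7              0   -63 to 63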

45/75

Context Modeling

Context:

  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

45 0 0 0    -74 -13 0 0
21 0 4 0    14 0 23 23
0 0 0 0     3 0 4 0
0 3 5 0     0 0 0 0
0 1 -1 0    -4 33 0 -1
0 0 1 0     0 0 0 0
-4 5 0 0    -18 0 0 19
4 0 23 0    -1 0 0 0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

Automatically generated

To be encoded

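47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided step by step according to the symbol probabilities (P0, 1-P0), (P1, 1-P1), (P2, 1-P2) for the symbols S0=0, S1=1, S2=0, leaving a final interval A.]

Coding result: 0100

(The shortest binary bitstream that ensures the interval from B=0100 0000000 to C=0100 1111111 satisfies (B, C) ⊂ A.)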
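The interval-narrowing idea can be tried out with exact fractions. The sketch below is a toy illustration, not the QM coder named in the title: the probability P(0) = 12/25 for every symbol is an assumption chosen only so that the example happens to end on the same codeword as the slide, and the function names are invented.

```python
from fractions import Fraction

def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[k] is the assumed P(symbol k == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width *= p                 # keep the lower part
        else:
            low += width * p           # skip the lower part
            width *= 1 - p
    return low, low + width            # interval A = [low, high)

def shortest_code(low, high):
    """Shortest bit string b such that [0.b000..., 0.b111...] lies inside [low, high)."""
    n = 0
    while True:
        n += 1
        step = Fraction(1, 2 ** n)
        k = -(-low // step)            # ceil(low / step)
        if (k + 1) * step <= high:
            return format(int(k), f"0{n}b")

if __name__ == "__main__":
    p = Fraction(12, 25)               # assumed probability, not from the slide
    low, high = final_interval([0, 1, 0], [p, p, p])
    print(low, high, shortest_code(low, high))
```

Any bits appended after the returned codeword keep the decoded number inside A, which is the property the slide states for B and C.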

48/75

Entropy Coder (Summary)

[Figure: the entropy coder produces a bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling:

  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking:

  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation:

  • Lookup table calculation of distortion

Context entropy coder:

  • QM coder
  • Run-length Rice coder
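As an illustration of the last item, the following sketch shows a generic run-length plus Rice code. It is not taken from the EAC software: the parameter k, the unary convention (ones terminated by a zero) and all function names are assumptions, and the slide does not say how this coder is combined with the QM coder.

```python
def rice_encode(n, k):
    """Rice code of a non-negative integer n: unary quotient, then k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    tail = format(r, f"0{k}b") if k else ""
    return "1" * q + "0" + tail

def rice_decode(bits, k):
    """Decode one Rice codeword from the front of `bits`; return (value, bits consumed)."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r, q + 1 + k

def zero_runs(bits):
    """Lengths of the zero runs that precede each 1, plus the trailing zero count."""
    runs, run = [], 0
    for b in bits:
        if b == 0:
            run += 1
        else:
            runs.append(run)
            run = 0
    return runs, run

if __name__ == "__main__":
    bits = [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
    runs, tail = zero_runs(bits)
    code = "".join(rice_encode(r, 1) for r in runs)
    print(runs, tail, code)
```

Long runs of zeros, which dominate the bit planes of insignificant coefficients, cost only a few bits each this way.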

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

[Figure: bitstream assembling block.]

51/75

EAC Bitstream Syntax

Timeslot header:

  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, Global header, then a sequence of timeslots, each with a Head and a Body.]

52/75

Companion File

[Figure: companion file layout: Global header, then a sequence of timeslots, each with a Head and an R-D curve.]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 to D4 versus R4), drawn twice, with the rate points r1, r2, r3, r4 marked.]
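The slide itself survives only as a figure, so the following is a generic sketch of slope-based rate allocation across several embedded bitstreams rather than the assembler used in EAC. It assumes each R-D curve is a convex list of (rate, distortion) points and greedily spends bytes where distortion falls fastest; `allocate` and its arguments are hypothetical names.

```python
import heapq

def allocate(curves, budget):
    """Pick a truncation point on each R-D curve under a total rate budget.

    curves[j] is a list of (rate, distortion) points with increasing rate and
    decreasing distortion, assumed convex. Returns (cut indices, rate used).
    """
    cut = [0] * len(curves)
    used = sum(c[0][0] for c in curves)
    heap = []

    def push(j):
        i = cut[j]
        if i + 1 < len(curves[j]):
            (r0, d0), (r1, d1) = curves[j][i], curves[j][i + 1]
            # max-heap on the R-D slope (distortion drop per unit of rate)
            heapq.heappush(heap, (-(d0 - d1) / (r1 - r0), j))

    for j in range(len(curves)):
        push(j)
    while heap:
        _, j = heapq.heappop(heap)
        extra = curves[j][cut[j] + 1][0] - curves[j][cut[j]][0]
        if used + extra > budget:
            break                      # stop at the first segment that does not fit
        used += extra
        cut[j] += 1
        push(j)
    return cut, used

if __name__ == "__main__":
    curves = [[(0, 100), (10, 40), (20, 20)],
              [(0, 50), (10, 35), (20, 30)]]
    print(allocate(curves, budget=30))
```

Because the segments are taken in order of decreasing slope, all curves end up truncated at roughly the same R-D slope, which is the usual optimality condition for this kind of problem.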

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), running between an upper and a lower illegal region.]

55/75

Allocated Bytes per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \times \mathrm{Time}$

Where:

  • $B_i$: allocated bytes for timeslot i
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
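A small worked example of the allocation formula, assuming the rate is given in bytes per second and that `buf[0]` holds the initial buffer level; the numbers and the function name are made up for the illustration.

```python
def allocated_bytes(buf, rate_trans, slot_time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for each timeslot i >= 1.

    buf[i] is the buffer occupancy (bytes) at timeslot i, buf[0] the initial level.
    """
    per_slot = rate_trans * slot_time          # bytes delivered during one timeslot
    return [buf[i - 1] - buf[i] + per_slot for i in range(1, len(buf))]

if __name__ == "__main__":
    buf = [4000, 3000, 3500, 4000]             # assumed occupancy trajectory
    print(allocated_bytes(buf, rate_trans=2000, slot_time=0.5))
```

A timeslot that lets the buffer level drop gets more bytes than the channel delivers in that slot, and one that raises the level gets fewer; when the buffer returns to its starting level the allocations sum to the bytes transmitted.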

56/75

Optimization

Given:

  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot.

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots) between two illegal regions; annotations: Underflow (too many bytes), Overflow (too few bytes), Waste bytes.]

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots) between two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), between two illegal regions.]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots) between two illegal regions.]

61/75

Modular Software Design

[Figure: from Audio to Bitstream through two parallel pipelines, one for L+R (or mono) and one for L-R; each pipeline is MLT (SW), Quantizer, Entropy coder, Bitstream Assembly.]
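The two pipelines operate on a sum channel and a difference channel. A minimal sketch of such a split and its exact inverse on integer samples follows; the unscaled form (no division by 2 in the forward direction) is an assumption, since the slide does not give the normalization.

```python
def to_sum_diff(left, right):
    """Form the L+R and L-R channels sample by sample."""
    s = [l + r for l, r in zip(left, right)]
    d = [l - r for l, r in zip(left, right)]
    return s, d

def from_sum_diff(s, d):
    """Recover L and R; s + d = 2L and s - d = 2R are always even, so // is exact."""
    left = [(a + b) // 2 for a, b in zip(s, d)]
    right = [(a - b) // 2 for a, b in zip(s, d)]
    return left, right

if __name__ == "__main__":
    L, R = [10, -3, 7, 0], [8, -5, 7, 1]
    s, d = to_sum_diff(L, R)
    print(s, d, from_sum_diff(s, d))
```

For a mono source the difference channel is identically zero, which matches the "(or mono)" label on the sum branch.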

62/75

Modular Software Design

Highly modularized pipeline design:

  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>):

  • No long delay
  • No need for large memory

Memory and computation efficient:

  • Working memory preallocated

63/75

Experimental Results

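64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec        8kbps  16kbps  32kbps  48kbps
MP4 TwinVQ   748    700     571     448
WMA          847    556     325     040
EAC          669    568     280     -22

[Digits as they appear in the transcript; the decimal points of the original table were not preserved.]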

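65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips.

Codec            Compression Ratio
EAC              272
Monkey's Audio   272
WinZip           132

[Digits as they appear in the transcript; the decimal points of the original table were not preserved.]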

66/75

EAC (Versatile)

Versatile:

  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed-length timeslot bitstream

Delay = 2 frames:

  • Ignoring encoding/decoding delay
  • Plus network transmission time (over a modem line, delay = 3 frames)

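68/75

EAC (Low Delay Mode)

[Figure: timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, Quantizer, Entropy), the bitstream crosses the network, the decoder starts decoding frame i (Entropy, Quantizer), and the point marked "Playable here" follows.]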

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax:

  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, number of audio channels, audio sampling rate

70/75

EAC – Software

Software:

  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

71/75

EAC - Encoder

[Figure: Audio goes into the Encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

72/75

EAC - Parser

[Figure: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and outputs:]

  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling

73/75

EAC - Decoder

[Figure: the Decoder takes any of the following bitstreams:]

  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling

74/75

Comparison

[Demo clips: Original, MP3, MP4 TwinVQ, WMA, EAC.]

75/75

Conclusions

An embedded audio coder is developed:

  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 33: Embedded Audio Coder

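33/75

Frame Grouping

[Figure: frames numbered 1 to 8 grouped into time slots. Labels: Frame, Time slot.]

34/75

Entropy Coder

[Figure: the entropy coder produces a bitstream and an R-D curve (distortion D versus rate R).]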

35/75

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

37/75

Bits of Coefficients

      Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
w0         45        0  1  0  1  1  0  1    +
w1        -74        1  0  0  1  0  1  0    -
w2         21        0  0  1  0  1  0  1    +
w3         14        0  0  0  1  1  1  0    +
w4         -4        0  0  0  0  1  0  0    -
w5        -18        0  0  1  0  0  1  0    -
w6          4        0  0  0  0  1  0  0    +
w7         -1        0  0  0  0  0  0  1    -
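The sign and bit rows on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming 7 magnitude bits with b1 as the most significant bit and a separate sign, as the slide shows; the function name `to_bits` is an illustrative choice, not part of EAC.

```python
def to_bits(coef, nbits=7):
    """Split a coefficient into magnitude bits (b1 first, MSB) and a sign."""
    mag = abs(coef)
    # Bit k (0-based) is taken from position nbits-1-k, so index 0 is b1.
    bits = [(mag >> (nbits - 1 - k)) & 1 for k in range(nbits)]
    sign = "-" if coef < 0 else "+"
    return bits, sign


# The eight coefficients w0..w7 listed on the slide.
coefs = [45, -74, 21, 14, -4, -18, 4, -1]
for i, c in enumerate(coefs):
    bits, sign = to_bits(c)
    print(f"w{i} {c:4d}  {' '.join(map(str, bits))}  {sign}")
```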

38/75

Conventional Coding

      b1 b2 b3 b4 b5 b6 b7   Sign   Coded    Decoded
w0     0  1  0  1  1  0  1    +     First       46
w1     1  0  0  1  0  1  0    -     Second     -74
w2     0  0  1  0  1  0  1    +     Third       22
w3     0  0  0  1  1  1  0    +                  0
w4     0  0  0  0  1  0  0    -                  0
w5     0  0  1  0  0  1  0    -                  0
w6     0  0  0  0  1  0  0    +                  0
w7     0  0  0  0  0  0  1    -                  0

39/75

Embedded Coding

Bit planes coded so far:
  • First (b1):  0  1-  0  0  0  0  0  0
  • Second (b2): 1+  0  0  0  0  0  0  0
  • Third (b3):  0  0  1+  0  0  1-  0  0

      b1 b2 b3 b4 b5 b6 b7   Sign   Value   Range
w0     0  1  0  1  1  0  1    +       40     32 to 47
w1     1  0  0  1  0  1  0    -      -72    -79 to -64
w2     0  0  1  0  1  0  1    +       24     16 to 31
w3     0  0  0  1  1  1  0    +        0    -31 to 31
w4     0  0  0  0  1  0  0    -        0    -31 to 31
w5     0  0  1  0  0  1  0    -      -24    -31 to 31
w6     0  0  0  0  1  0  0    +        0    -31 to 31
w7     0  0  0  0  0  0  1    -        0    -31 to 31
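What a decoder knows after a truncated embedded bitstream can be sketched as follows. This is an illustration only, assuming 7 magnitude bits, plain top-down bit-plane order, and reconstruction at the middle of the remaining interval; `decode_after_planes` is a hypothetical helper name. With three planes it gives the value and range entries shown on the slide for w0, w1 and w2.

```python
def decode_after_planes(coef, planes, nbits=7):
    """Return (value, (low, high)) known after the top `planes` bit planes."""
    shift = nbits - planes
    step = 1 << shift                      # size of the remaining uncertainty
    known = (abs(coef) >> shift) << shift  # magnitude bits received so far
    if known == 0:
        # Still insignificant: no sign sent yet, magnitude is below `step`.
        return 0, (-(step - 1), step - 1)
    low, high = known, known + step - 1
    mid = known + step // 2                # reconstruct at the interval centre
    if coef < 0:
        return -mid, (-high, -low)
    return mid, (low, high)


for name, c in [("w0", 45), ("w1", -74), ("w2", 21)]:
    print(name, decode_after_planes(c, planes=3))
```

Every additional plane halves the range, which is why an embedded bitstream can be cut at any point and still decode to a coarser version of the same coefficients.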

40/75

Audio Masking

[Figure: audio masking along the Frequency axis. Labels: Critical Band, Neighboring Band, Signal, Noise Level, Masking Threshold, Maximum Mask, Signal-to-mask ratio, Noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: Mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: Mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

State after the first pass. Sign and b1 column as shown: 0  1-  0  0  0  0  0  0

      Value   Range
w0       0    -63 to 63
w1     -96    -127 to -64
w2       0    -63 to 63
w3       0    -63 to 63
w4       0    -63 to 63
w5       0    -63 to 63
w6       0    -127 to 127
w7       0    -127 to 127

[Figure labels: Mask; coefficient status Significant / Insignificant.]

44/75

Embedded Coding with Implicit Psychoacoustic Masking

State after the second pass.
  • First (b1):  0  1-  0  0  0  0  0  0
  • Second (b2): 1+  0  0  0  0  0  0  0

      b1 b2   Sign   Value   Range
w0     0  1    +       48     32 to 63
w1     1  0    -      -96    -127 to -64
w2     0  0             0     -31 to 31
w3     0  0             0     -31 to 31
w4     0  0             0     -31 to 31
w5     0  0             0     -31 to 31
w6     0  0             0     -63 to 63
w7     0  0             0     -63 to 63

[Figure labels: coefficient status Significant / Insignificant.]

45/75

Context Modeling

Context
  • Zero coding
      • Significant statuses of neighbor coefficients
  • Refinement
      • Whether it is the 1st refinement pass
      • Significant statuses of neighbor coefficients
  • Sign
      • Neighbor signs
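As a rough illustration of a zero-coding context, the sketch below builds a context index from the significance status of neighboring coefficients. The choice of neighbors (left and right in frequency, plus the same position in the previous frame) and the bit layout of the index are assumptions made for this example; the slides do not specify which neighbors EAC uses or how many contexts it has.

```python
def zero_coding_context(significant, frame, k):
    """Hypothetical context index (0..7) for coefficient k of a frame.

    `significant[f][k]` is True once coefficient k of frame f has had a
    nonzero bit coded. Encoder and decoder can both evaluate this, because
    it depends only on already-coded information.
    """
    cur = significant[frame]
    left = cur[k - 1] if k > 0 else False
    right = cur[k + 1] if k + 1 < len(cur) else False
    prev = significant[frame - 1][k] if frame > 0 else False
    # Pack the three neighbor flags into one small integer.
    return int(left) | (int(right) << 1) | (int(prev) << 2)


status = [
    [False, True, False],   # frame 0: only the middle coefficient is significant
    [False, False, False],  # frame 1: nothing significant yet
]
print(zero_coding_context(status, frame=1, k=1))
```

Each context index would select its own adaptive probability estimate in the arithmetic coder, so bits with significant neighbors are coded with different statistics from bits in quiet regions.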

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit (to be encoded):            0  0  ... see sequence below
  Bit  0 1 1 0 0 0 0 0 0 1  0 0 0 0 0 0 0 0 0 ...
  Ctx  0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

Ctx: automatically generated. Bit: to be encoded.

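47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided once per symbol, with the splits labeled P0 and 1-P0, P1 and 1-P1, P2 and 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000... to C = 0100 1111111... lies within A.)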
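The interval picture can be made concrete with exact fractions. This is a sketch of idealized arithmetic coding, not of the QM coder (which uses adaptive probability estimates and approximate arithmetic). The probabilities below are hypothetical values picked for the example, and the convention that symbol 0 takes the lower part of the interval is also an assumption; the slide does not give either.

```python
from fractions import Fraction
from math import ceil


def encode_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width = width * p              # keep the lower part
        else:
            low = low + width * p          # skip the part reserved for 0
            width = width * (1 - p)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every 0.b... lies inside [low, high)."""
    n = 1
    while True:
        k = ceil(low * 2**n)               # smallest n-bit prefix at or above low
        if Fraction(k + 1, 2**n) <= high:  # whole dyadic interval fits inside A
            return format(k, f"0{n}b")
        n += 1


# Symbols S0=0, S1=1, S2=0 with illustrative probabilities of a zero.
low, high = encode_interval([0, 1, 0],
                            [Fraction(1, 2), Fraction(1, 2), Fraction(2, 5)])
print(low, high, shortest_code(low, high))
```

With these example probabilities the final interval A is [1/4, 7/20) and the shortest code whose interval fits inside it is 0100.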

48/75

Entropy Coder (Summary)

[Figure: the entropy coder produces a bitstream and an R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of energy in a critical band
      • Lookup table: convert energy to mask
  • R-D curve calculation
      • Lookup table calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder
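For readers unfamiliar with Rice codes, here is a minimal sketch of a plain Golomb-Rice code with a fixed parameter k. It shows the general technique only; how EAC combines it with run-length coding, and how it adapts k, is not stated on the slide, so none of that is modeled here.

```python
def rice_encode(n, k):
    """Rice code of a non-negative integer: unary quotient, k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder


def rice_decode(bits, k):
    """Inverse of rice_encode for a string holding exactly one codeword."""
    q = 0
    while bits[q] == "1":                  # count the unary part
        q += 1
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r


code = rice_encode(9, 2)
print(code, rice_decode(code, 2))
```

A Rice coder needs only shifts and masks, which is what makes it attractive as a faster alternative to a full arithmetic coder.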

50/75

Bitstream Assembly

Input
  • Bitstream
  • R-D curve

Output
  • Assembled bitstream
  • Companion file

[Figure: bitstream assembling block.]

51/75

EAC Bitstream Syntax

[Figure: bitstream layout: EAC marker, Global Header, then a sequence of Timeslots, each with a Head and a Body.]

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

52/75

Companion File

[Figure: companion file layout: Global Header, then a sequence of Timeslots, each with a Head and an R-D curve.]

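53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), shown twice; in the second set the rates r1, r2, r3, r4 are marked on the curves.]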
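One common way to pick the rates r1 to r4 from several R-D curves under a shared byte budget is a greedy search on the R-D slope. The sketch below shows that generic technique under stated assumptions: each curve is a list of (rate, distortion) points with strictly increasing rate and convex shape, and `allocate` is a hypothetical name. The slides do not spell out the exact procedure EAC uses.

```python
def allocate(curves, budget):
    """Choose one truncation point per R-D curve without exceeding `budget`.

    curves: list of [(rate_bytes, distortion), ...], rate strictly increasing.
    Returns the index of the chosen point on each curve.
    """
    chosen = [0] * len(curves)
    used = sum(c[0][0] for c in curves)
    while True:
        best, best_slope = None, 0.0
        for i, c in enumerate(curves):
            j = chosen[i]
            if j + 1 < len(c):
                d_rate = c[j + 1][0] - c[j][0]
                d_dist = c[j][1] - c[j + 1][1]
                # Take the step that removes the most distortion per byte.
                if used + d_rate <= budget and d_dist / d_rate > best_slope:
                    best, best_slope = i, d_dist / d_rate
        if best is None:
            return chosen
        j = chosen[best]
        used += curves[best][j + 1][0] - curves[best][j][0]
        chosen[best] = j + 1


curves = [
    [(0, 100), (10, 40), (20, 20)],
    [(0, 50), (10, 20), (20, 15)],
]
print(allocate(curves, budget=20))
```

At the end of the search all chosen points sit at roughly the same slope, which is the usual optimality condition for splitting a budget across independent R-D curves.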

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Axes: Buffer Occupancy (Bytes) versus Time (timeslots), with two regions marked Illegal Region.]

55/75

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
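A short worked example of the allocation formula, assuming the rate is expressed in bytes per second and the duration in seconds (the slide does not state units); the numbers are made up for illustration.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * time


# Buffer drops from 4000 to 3500 bytes while 2000 bytes/s arrive for 0.5 s.
print(allocated_bytes(buf_prev=4000, buf_cur=3500, rate_trans=2000, time=0.5))
```

Letting the buffer level fall during a timeslot gives that timeslot more bytes than the channel delivers in the same time; letting it rise gives it fewer.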

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: Buffer Occupancy (Bytes) versus Time (timeslots), with two regions marked Illegal Region. Labels: Underflow (too many bytes), Overflow (too few bytes), Waste bytes.]

58/75

Multiple Timeslots - Constant Bitrate

[Figure: Buffer Occupancy (Bytes) versus Time (timeslots), with two regions marked Illegal Region.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Axes: Buffer Occupancy (Bytes) versus Time (timeslots), with two regions marked Illegal Region.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: Buffer Occupancy (Bytes) versus Time (timeslots), with two regions marked Illegal Region.]

61/75

Modular Software Design

[Figure: Audio enters two parallel pipelines, one for L+R (or mono) and one for L-R. Each pipeline consists of MLT (SW), Quantizer, Entropy coder and Bitstream Assembly, and the output is the Bitstream.]
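The L+R and L-R channels in the figure can be illustrated with a simple sum/difference conversion on integer samples. This is a sketch under assumptions: the slide only names the two channels, so the absence of scaling and the exact rounding used in EAC are not known, and the function names are illustrative.

```python
def to_sum_diff(left, right):
    """Convert stereo samples to a sum (L+R) and a difference (L-R) channel."""
    s = [l + r for l, r in zip(left, right)]
    d = [l - r for l, r in zip(left, right)]
    return s, d


def from_sum_diff(s, d):
    """Invert to_sum_diff exactly; s+d and s-d are always even for integers."""
    left = [(a + b) // 2 for a, b in zip(s, d)]
    right = [(a - b) // 2 for a, b in zip(s, d)]
    return left, right


left, right = [3, -5, 7], [1, 2, -7]
s, d = to_sum_diff(left, right)
print(s, d, from_sum_diff(s, d))
```

When the two channels are similar, the difference channel is small and cheap to code, and a mono output only needs the sum channel.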

62/75

Modular Software Design

  • Highly modularized pipeline design
      • Quantizer, entropy coder can be used for image/video compression as well
      • Probe and data input can be inserted into any part of the program
  • Data flow driven (with necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec          8kbps   16kbps   32kbps   48kbps
MP4 TwinVQ      748     700      571      448
WMA             847     556      325      040
EAC             669     568      280      -22

65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips.

Codec             Compression Ratio
EAC                     272
Monkey's Audio          272
WinZip                  132

66/75

EAC (Versatile)

Versatile
  • Real time 2-way communication (Low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed length timeslot bitstream
  • Delay = 2 frames
      • (Ignore encoding/decoding delay)
      • Network transmission time (if modem line, delay = 3 frames)

68/75

EAC (Low Delay Mode)

[Figure: low delay timeline. Encoder side: frames i-1, i, i+1; encoding of frame i starts (MLT, Quantizer, Entropy) and the bitstream is sent over the network. Decoder side: decoding of frame i starts (Entropy, Quantizer); the point where the audio becomes playable is marked "Playable here".]

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream 1000x real time
  • Change
      • bit rate
      • number of audio channels
      • audio sampling rate

70/75

EAC - Software

Software
  • Encoder: 8x realtime (Stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

71/75

EAC - Encoder

[Figure: Audio goes into the Encoder, which outputs a Stereo 128 kbps bitstream and a Companion file.]

72/75

EAC - Parser

[Figure: on the Server, the Parser takes the Stereo 128 kbps bitstream and the Companion file and produces:]
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

73/75

EAC - Decoder

[Figure: the Decoder plays each of the parsed bitstreams:]
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

74/75

Comparison

[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile
      • Low delay, constant bitrate, streaming
  • Flexible bitstream
      • Parsing for bitrate, number of audio channels, audio sampling rate

Good prototype available
  • Realtime execution
  • Small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 34: Embedded Audio Coder

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 35: Embedded Audio Coder

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 36: Embedded Audio Coder

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

37/75

Bits of Coefficients

Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0              45     0  1  0  1  1  0  1    +
w1             -74     1  0  0  1  0  1  0    -
w2              21     0  0  1  0  1  0  1    +
w3              14     0  0  0  1  1  1  0    +
w4              -4     0  0  0  0  1  0  0    -
w5             -18     0  0  1  0  0  1  0    -
w6               4     0  0  0  0  1  0  0    +
w7              -1     0  0  0  0  0  0  1    -
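
The table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the sign-magnitude layout shown on the slide (b1 is the most significant of 7 magnitude bits); the function name `to_bit_planes` is hypothetical and not part of EAC.

```python
N_BITS = 7  # b1 (most significant) .. b7 (least significant), as on the slide


def to_bit_planes(coeff, n_bits=N_BITS):
    """Return ([b1..bn], sign) for an integer coefficient with |coeff| < 2**n_bits."""
    mag = abs(coeff)
    if mag >= 1 << n_bits:
        raise ValueError("coefficient does not fit in n_bits")
    bits = [(mag >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    sign = "+" if coeff >= 0 else "-"
    return bits, sign


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]  # w0..w7 from the slide
table = [to_bit_planes(c) for c in coeffs]

for index, (bits, sign) in enumerate(table):
    print("w%d" % index, " ".join(str(b) for b in bits), sign)
```

Each printed row matches one row of the table: magnitude bits first, sign last.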

38/75

Conventional Coding

Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign   Coding order   Decoded value
w0             0  1  0  1  1  0  1    +     First            46
w1             1  0  0  1  0  1  0    -     Second          -74
w2             0  0  1  0  1  0  1    +     Third            22
w3             0  0  0  1  1  1  0    +                       0
w4             0  0  0  0  1  0  0    -                       0
w5             0  0  1  0  0  1  0    -                       0
w6             0  0  0  0  1  0  0    +                       0
w7             0  0  0  0  0  0  1    -                       0

39/75

Embedded Coding

Bit planes in coding order (w0 to w7, sign sent with the first 1 bit):

First  (b1):   0   1-  0   0   0   0   0   0
Second (b2):   1+  0   0   0   0   0   0   0
Third  (b3):   0   0   1+  0   0   1-  0   0

Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign   Value   Range
w0             0  1  0  1  1  0  1    +       40    32 to 47
w1             1  0  0  1  0  1  0    -      -72    -79 to -64
w2             0  0  1  0  1  0  1    +       24    16 to 31
w3             0  0  0  1  1  1  0    +        0    -31 to 31
w4             0  0  0  0  1  0  0    -        0    -31 to 31
w5             0  0  1  0  0  1  0    -      -24    -31 to 31
w6             0  0  0  0  1  0  0    +        0    -31 to 31
w7             0  0  0  0  0  0  1    -        0    -31 to 31
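
The decoded values on this slide can be reproduced by keeping only the first few bit planes of each coefficient. The sketch below is an illustration under stated assumptions, not the EAC decoder: it assumes 7 magnitude bits, decodes a coefficient with no 1 bit received yet as zero, and places every other coefficient near the middle of its remaining interval. The name `embedded_decode` is hypothetical.

```python
def embedded_decode(coeffs, planes_kept, n_bits=7):
    """Reconstruct coefficients from their first `planes_kept` bit planes."""
    shift = n_bits - planes_kept           # number of bit planes not yet received
    out = []
    for c in coeffs:
        prefix = abs(c) >> shift           # the magnitude bits the decoder has seen
        if prefix == 0:
            out.append(0)                  # still insignificant: decode as zero
            continue
        half = (1 << shift) >> 1           # half the width of the uncertainty interval
        recon = (prefix << shift) + half
        out.append(recon if c >= 0 else -recon)
    return out


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]   # w0..w7 from the slide
after_three = embedded_decode(coeffs, 3)      # [40, -72, 24, 0, 0, -24, 0, 0]
after_all = embedded_decode(coeffs, 7)        # exact reconstruction
```

Truncating the stream after any bit plane still yields a usable approximation, which is the property that lets one embedded bitstream serve several bitrates.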

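40/75

Audio Masking

[Figure: masking along the frequency axis, showing a critical band and its neighboring band, the signal, the noise level, the masking threshold, and the maximum mask, with the signal-to-mask ratio and the noise-to-mask ratio marked.]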

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: Mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: Mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process
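
The point of the steps above is that the coding order depends only on what the decoder has already received, so no mask needs to be transmitted. The toy below illustrates that idea and nothing more: the mask rule (a static threshold plus one eighth of the larger neighbor's decoded magnitude), the weighting, and all names are assumptions made for this sketch, not EAC's actual masking model.

```python
N_BITS = 7  # bit planes b1..b7, as in the slides


def next_coefficient(known, done, static_mask):
    """Choose which coefficient gets its next bit, using decoder-visible state only."""
    best_weight, best_index = None, None
    for i, planes in enumerate(done):
        if planes == N_BITS:
            continue  # every bit of this coefficient is already coded
        neighbors = [known[j] for j in (i - 1, i + 1) if 0 <= j < len(known)]
        # Toy mask (assumption): static threshold plus spreading from decoded neighbors.
        mask = static_mask[i] + max(neighbors, default=0) // 8
        # Weight of the next uncoded bit, discounted by the mask.
        weight = (1 << (N_BITS - 1 - planes)) / (1 + mask)
        if best_weight is None or weight > best_weight:
            best_weight, best_index = weight, i
    return best_index


def encode(coeffs, static_mask):
    known, done, out = [0] * len(coeffs), [0] * len(coeffs), []
    while (i := next_coefficient(known, done, static_mask)) is not None:
        shift = N_BITS - 1 - done[i]
        bit = (abs(coeffs[i]) >> shift) & 1
        out.append(bit)
        if bit and known[i] == 0:
            out.append(1 if coeffs[i] < 0 else 0)  # sign follows the first 1 bit
        known[i] |= bit << shift
        done[i] += 1
    return out


def decode(bits, static_mask):
    n = len(static_mask)
    known, done, negative = [0] * n, [0] * n, [False] * n
    stream = iter(bits)
    while (i := next_coefficient(known, done, static_mask)) is not None:
        bit = next(stream, None)
        if bit is None:
            break  # truncated stream: keep what has been decoded so far
        shift = N_BITS - 1 - done[i]
        if bit and known[i] == 0:
            negative[i] = bool(next(stream, 0))
        known[i] |= bit << shift
        done[i] += 1
    return [-k if neg else k for k, neg in zip(known, negative)]


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
static_mask = [0, 0, 0, 0, 0, 0, 1, 1]   # hypothetical static thresholds
bits = encode(coeffs, static_mask)
```

Because `encode` and `decode` call the same `next_coefficient` on the same state, they visit the bits in the same order, and the decoder can stop at any point in the stream.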

43/75

Embedded Coding with Implicit Psychoacoustic Masking

Bits shown for the first pass:

0   1-  0   0   0   0   0   0
0   0   1-  0   0   0   0   0   0

Coefficient   Value   Range
w0               0    -63 to 63
w1             -96    -127 to -64
w2               0    -63 to 63
w3               0    -63 to 63
w4               0    -63 to 63
w5               0    -63 to 63
w6               0    -127 to 127
w7               0    -127 to 127

Legend: coefficient significant / insignificant; mask

44/75

Embedded Coding with Implicit Psychoacoustic Masking

Bit planes in coding order:

First:    0   1-  0   0   0   0   0   0
Second:   1+  0   0   0   0   0   0   0

Coefficient   b1 b2   Sign   Value   Range
w0             0  1    +       48    32 to 63
w1             1  0    -      -96    -127 to -64
w2             0  0             0    -31 to 31
w3             0  0             0    -31 to 31
w4             0  0             0    -31 to 31
w5             0  0             0    -31 to 31
w6             0  0             0    -63 to 63
w7             0  0             0    -63 to 63

Legend: coefficient significant / insignificant

45/75

Context Modeling

Context
  • Zero coding: significance statuses of neighboring coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighboring coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit   0  1  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  ……
Ctx   0  0  9  0  0  0  0  0  0  7 10  0  0  0  0  0  0  0  0  ……

Automatically generated

To be encoded

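47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided in turn by the probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2 as the symbols S0=0, S1=1, S2=0 are coded, leaving a final interval A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 lies within A.)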
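
The interval narrowing in this illustration can be sketched with exact fractions. The slide does not state the probabilities, so the values below (P0 = 1/2, P1 = 2/5, P2 = 1/2, each the probability of a 0 symbol) are assumptions chosen for this example; the function names are hypothetical. This is plain interval arithmetic, not the QM coder, which is an adaptive binary arithmetic coder with its own approximations.

```python
import math
from fractions import Fraction


def final_interval(symbols, p_zero):
    """Narrow [0, 1) symbol by symbol; p_zero[k] is the probability that symbol k is 0."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0                 # keep the lower part of the interval
        else:
            low += width * p0           # skip the part that belongs to symbol 0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string whose dyadic interval [B, C) lies inside [low, high)."""
    n = 1
    while True:
        k = math.ceil(low * 2 ** n)     # smallest n-bit prefix not below `low`
        if Fraction(k + 1, 2 ** n) <= high:
            return format(k, "0%db" % n)
        n += 1


symbols = [0, 1, 0]                                         # S0, S1, S2 from the slide
p_zero = [Fraction(1, 2), Fraction(2, 5), Fraction(1, 2)]   # assumed probabilities
low, high = final_interval(symbols, p_zero)                 # A = [1/5, 7/20)
code = shortest_code(low, high)                             # "0100"
```

With these assumed probabilities the final interval A is [0.2, 0.35), and 0100 is the shortest string for which every continuation, from 0100 000... to 0100 111..., stays inside A.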

48/75

Entropy Coder (Summary)

[Figure: the entropy coder produces the bitstream and its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation
  • Lookup table calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder
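
For readers unfamiliar with the Rice code named in the last item, here is a minimal sketch of the basic code for one non-negative integer: a unary quotient followed by a k-bit remainder. The slides do not describe how EAC combines it with run lengths or how k is chosen, so the functions below are a generic illustration with hypothetical names.

```python
def rice_encode(value, k):
    """Golomb-Rice code of a non-negative integer: unary quotient, then k-bit remainder."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient, remainder = value >> k, value & ((1 << k) - 1)
    tail = format(remainder, "0%db" % k) if k else ""
    return "1" * quotient + "0" + tail


def rice_decode(bits, k):
    """Decode one value from the front of `bits`; return (value, bits consumed)."""
    quotient = bits.index("0")                     # length of the unary run of 1s
    tail = bits[quotient + 1:quotient + 1 + k]
    remainder = int(tail, 2) if k else 0
    return (quotient << k) | remainder, quotient + 1 + k


word = rice_encode(9, 2)          # quotient 2, remainder 1 -> "11001"
value, used = rice_decode(word, 2)
```

Small values get short codewords, which suits run lengths that are mostly short.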

50/75

Bitstream Assembly

Bitstream assembling
  • Input: bitstream, R-D curve
  • Output: assembled bitstream, companion file

51/75

EAC Bitstream Syntax

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, global header, then a sequence of timeslots, each with a head and a body.]

52/75

Companion File

[Figure: companion file layout: global header, then a sequence of timeslots, each with a head and an R-D curve.]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (distortion D1 to D4 versus rate R1 to R4), with the points r1, r2, r3, r4 marked on them.]
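
One common way to pick the points r1 to r4 is to spend bytes on whichever curve currently has the steepest R-D slope. The sketch below shows that greedy rule under stated assumptions: each curve is a convex list of (rate, distortion) points, and the function name `allocate` and the sample numbers are hypothetical. The slides do not give EAC's exact search procedure.

```python
import heapq


def allocate(curves, budget):
    """Greedy R-D allocation over several embedded bitstreams.

    curves[i] is a list of (rate_bytes, distortion) points starting at rate 0,
    with increasing rate and decreasing, convex distortion.
    Returns the rate chosen for each stream and the total bytes spent.
    """
    cut = [0] * len(curves)
    heap = []

    def push_next_segment(i):
        j = cut[i]
        if j + 1 < len(curves[i]):
            (r0, d0), (r1, d1) = curves[i][j], curves[i][j + 1]
            slope = (d0 - d1) / (r1 - r0)      # distortion removed per byte
            heapq.heappush(heap, (-slope, i))  # max-heap via negated slope

    for i in range(len(curves)):
        push_next_segment(i)

    spent = 0
    while heap:
        _, i = heapq.heappop(heap)
        step = curves[i][cut[i] + 1][0] - curves[i][cut[i]][0]
        if spent + step > budget:
            break                              # steepest remaining segment no longer fits
        spent += step
        cut[i] += 1
        push_next_segment(i)
    return [curves[i][cut[i]][0] for i in range(len(curves))], spent


curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],   # slopes 6, 2, 0.5
    [(0, 80), (10, 50), (20, 35), (30, 30)],    # slopes 3, 1.5, 0.5
]
rates, spent = allocate(curves, 40)             # ([20, 20], 40)
```

At the stopping point all streams are cut at roughly the same slope, which is the usual optimality condition for a shared byte budget.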

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), bounded above and below by illegal regions.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$$

Where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
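
A small worked example of this relation, as a sketch: the rate, timeslot duration, and buffer levels below are made-up numbers, and the function names are hypothetical. The second function is the same equation solved for the new buffer level.

```python
def allocated_bytes(buf_prev, buf_now, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time (bytes, bytes per second, seconds)."""
    return buf_prev - buf_now + rate_trans * time


def buffer_after(buf_prev, allocated, rate_trans, time):
    """The same relation solved for Buf_i."""
    return buf_prev + rate_trans * time - allocated


rate = 2000   # bytes per second (16 kbps), assumed for the example
slot = 0.5    # seconds per timeslot, assumed for the example

# Letting the buffer grow from 3000 to 3400 bytes leaves 600 bytes for this timeslot.
b = allocated_bytes(3000, 3400, rate, slot)
```

Allocating more than 600 bytes to this timeslot would leave the buffer below 3400 bytes; allocating fewer would leave it above.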

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Marked cases: underflow (too many bytes), overflow (too few bytes), and waste bytes.]

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

61/75

Modular Software Design

[Figure: the audio input is split into L+R (or mono) and L-R. Each signal passes through its own MLT (SW), quantizer, and entropy coder, followed by bitstream assembly, which outputs the bitstream.]
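
The L+R and L-R split in this diagram is a sum/difference transform of the stereo pair. A minimal sketch on integer samples is shown below; the slides do not say how EAC scales or rounds the two signals, so the unscaled form and the function names are assumptions.

```python
def to_sum_diff(left, right):
    """Split a stereo pair into sum (L+R) and difference (L-R) signals."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Invert the split. total + diff = 2L and total - diff = 2R, so both halves are exact."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right


left = [100, -37, 0, 12]
right = [98, -40, 5, -12]
total, diff = to_sum_diff(left, right)
```

When the two channels are similar, the difference signal is small and cheap to code, and a mono output only needs the sum signal.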

62/75

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC            6.69     5.68      2.80      -.22
WMA            8.47     5.56      3.25      0.40
MP4 TwinVQ     7.48     7.00      5.71      4.48

65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips

Codec            Compression ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (if modem line, delay = 3 frames)

68/75

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder), the bitstream crosses the network, and the decoder starts decoding frame i (entropy decoder, quantizer); the point where the frame becomes playable is marked.]

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change bit rate, number of audio channels, audio sampling rate

70/75

EAC – Software

Software
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

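71/75

EAC - Encoder

[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

73/75

EAC - Decoder

[Figure: the decoder plays each parsed bitstream: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]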

74/75

Comparison

[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions
Page 39: Embedded Audio Coder

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

48/75

Entropy Coder (Summary)

[Figure: the entropy coder produces a bitstream and an R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

  • Context modeling: use stored contexts; update the context when a coefficient becomes significant
  • Implicit masking: fast calculation of the energy in a critical band; lookup table to convert energy to mask
  • R-D curve calculation: lookup table for the calculation of distortion
  • Context entropy coder: QM coder; run-length Rice coder
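For readers unfamiliar with Rice codes, the sketch below shows a plain Golomb-Rice code with a fixed parameter `k`. It is an assumption-laden illustration: the slide only names a "run-length Rice coder", so the adaptive, run-length variant used in EAC may differ, and the function names here are hypothetical.

```python
def rice_encode(n, k):
    """Encode a non-negative integer: quotient in unary, remainder in k bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, f"0{k}b") if k else ""
    return "1" * q + "0" + remainder


def rice_decode(bits, k):
    """Decode one value from the front of a bit string.

    Returns (value, number_of_bits_consumed)."""
    q = bits.index("0")                          # length of the unary prefix
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r, q + 1 + k
```

The appeal for a fast coder is that encoding and decoding need only shifts and masks, with no probability tables.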

50/75

Bitstream Assembly

  • Input: bitstream, R-D curve
  • Output: assembled bitstream, companion file

[Figure: bitstream assembling block.]

51/75

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

52/75

Companion File

[Figure: companion file layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (axes D1/R1 through D4/R4), shown in two panels, with the rate points r1, r2, r3, and r4 marked.]
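One common way to pick truncation points such as r1 to r4 is to spend the byte budget wherever the R-D slope is steepest. The sketch below is a generic greedy allocator under that assumption; it is not taken from the slides, and the curve format, the function name `allocate`, and the convexity requirement are all assumptions.

```python
def allocate(curves, budget):
    """Choose one truncation point per R-D curve within a total byte budget.

    curves: list of [(rate, distortion), ...], each with strictly increasing
    rate and decreasing distortion (assumed convex).
    Returns the chosen point index for each curve.
    """
    chosen = [0] * len(curves)
    spent = sum(curve[0][0] for curve in curves)
    while True:
        best, best_slope = None, 0.0
        for j, curve in enumerate(curves):
            i = chosen[j]
            if i + 1 < len(curve):
                extra_rate = curve[i + 1][0] - curve[i][0]
                gain = curve[i][1] - curve[i + 1][1]
                slope = gain / extra_rate        # distortion removed per byte
                if spent + extra_rate <= budget and slope > best_slope:
                    best, best_slope = j, slope
        if best is None:
            return chosen                         # nothing affordable is left
        i = chosen[best]
        spent += curves[best][i + 1][0] - curves[best][i][0]
        chosen[best] += 1
```

For convex curves this ends with all selected segments steeper than all rejected ones, which is the equal-slope condition that rate-distortion optimization aims for.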

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve; x-axis: time (timeslots); y-axis: buffer occupancy (bytes); two illegal regions bound the curve.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
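As a quick check of the allocation formula, here is a direct transcription in code. The function and parameter names are illustrative, and the rate is assumed to be given in bytes per second so that the result is in bytes.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, slot_time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * slot_time


def allocations(buffer_levels, rate_trans, slot_time):
    """Allocated bytes for each timeslot, given the buffer level before the
    first timeslot followed by the level after each timeslot."""
    return [allocated_bytes(prev, cur, rate_trans, slot_time)
            for prev, cur in zip(buffer_levels, buffer_levels[1:])]
```

For instance, with a rate of 2000 bytes per second and a 0.5 s timeslot, letting the buffer fall from 4000 to 3000 bytes allocates 4000 - 3000 + 1000 = 2000 bytes to that timeslot.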

56/75

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or an intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot.

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots) with two illegal regions. Annotations: underflow (too many bytes); overflow (too few bytes); wasted bytes.]
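The underflow and overflow labels can be made concrete by simulating the buffer for a candidate allocation. The update rule is the allocation formula solved for the new buffer level; treating 0 and a fixed `capacity` as the edges of the two illegal regions is an assumption, as are all names in this sketch.

```python
def simulate_buffer(initial, allocations, rate, slot_time, capacity):
    """Track the buffer level over time: Buf_i = Buf_{i-1} - B_i + rate * slot_time.

    Returns a list of (level, status), where status is "ok", "underflow"
    (too many bytes were allocated) or "overflow" (too few bytes were allocated).
    """
    level, trace = initial, []
    for allocated in allocations:
        level = level - allocated + rate * slot_time
        if level < 0:
            status = "underflow"
        elif level > capacity:
            status = "overflow"
        else:
            status = "ok"
        trace.append((level, status))
    return trace
```

A search over allocations, for example by R-D slope, would reject any candidate whose trace contains a status other than "ok".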

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

61/75

Modular Software Design

[Figure: the audio is split into an L+R (or mono) channel and an L-R channel; each channel passes through MLT (SW), quantizer, and entropy coder; bitstream assembly produces the bitstream.]
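The L+R and L-R channels in the diagram are a sum/difference transform of the stereo pair. A minimal sketch on integer samples follows; the slide does not state any scaling, so the unscaled form and the function names are assumptions.

```python
def to_sum_diff(left, right):
    """Convert stereo samples to sum (L+R) and difference (L-R) channels."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Invert the transform. (L+R) + (L-R) = 2L is always even, so the
    integer division is exact."""
    left = [(t + d) // 2 for t, d in zip(total, diff)]
    right = [(t - d) // 2 for t, d in zip(total, diff)]
    return left, right
```

When the two channels are similar, the difference channel is close to zero and costs few bits, which is the usual motivation for coding L+R and L-R instead of L and R.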

62/75

Modular Software Design

  • Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probes and data inputs can be inserted into any part of the program
  • Data-flow driven (with the necessary memory regulator <buffer>): no long delay; no need for large memory
  • Memory and computation efficient: working memory is preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results are based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         48 kbps   32 kbps   16 kbps   8 kbps
MP4 TwinVQ    4.48      5.71      7.00      7.48
WMA           0.40      3.25      5.56      8.47
EAC           -22       2.80      5.68      6.69

65/75

EAC – Lossless

Results are based on the average of 16 MPEG4 test clips.

Codec            Compression ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

66/75

EAC (Versatile)

Versatile:
  • Real-time two-way communication (low-delay mode)
  • Storage devices (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reduced frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (with a modem line, delay = 3 frames)

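68/75

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i, i+1. Encoder: start encoding frame i, then MLT, quantizer, entropy coder, bitstream. The bitstream crosses the network. Decoder: start decoding frame i, then entropy decoding and the quantizer stage; the point where the frame becomes playable is marked.]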

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax:
  • The parser may reassemble the bitstream at 1000x real time
  • Change the bit rate, the number of audio channels, or the audio sampling rate

70/75

EAC – Software

Software:
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

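71/75

EAC - Encoder

[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and outputs four streams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

73/75

EAC - Decoder

[Figure: the decoder plays each parsed stream: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]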

74/75

Comparison

Clips compared: Original, MP4 TwinVQ, WMA, EAC, MP3

75/75

Conclusions

An embedded audio coder has been developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bit rate, number of audio channels, and audio sampling rate
  • Good prototype available: real-time execution, small memory footprint

Page 43: Embedded Audio Coder

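43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit-plane table for coefficients w0 to w7 after the first coding pass. Columns: sign, bits b1 to b7, value, range. Legend: coefficient significant / insignificant; mask.]

Value and range entries, in listed order:
  • 0 (range -63 to 63)
  • -96 (range -127 to -64)
  • four entries of 0 (range -63 to 63)
  • two entries of 0 (range -127 to 127)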

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: the same bit-plane table for coefficients w0 to w7 after the first and second coding passes. Columns: sign, bits b1 to b7, value, range. Legend: coefficient significant / insignificant.]

Value and range entries, in listed order:
  • 48 (range 32 to 63)
  • -96 (range -127 to -64)
  • four entries of 0 (range -31 to 31)
  • two entries of 0 (range -63 to 63)
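The value and range entries on the two bit-plane slides above can be reproduced with a short sketch. It assumes 7 magnitude bits (b1 to b7, most significant first) and reconstruction at the low end of the known range plus half of the remaining step; the function name `refine` and its arguments are illustrative, not taken from the EAC software.

```python
def refine(sign, bits, total_bits=7):
    """Return (value, low, high) for a coefficient after some bit planes.

    sign: +1 or -1 (ignored while the coefficient is still insignificant).
    bits: magnitude bits decoded so far, most significant (b1) first.
    """
    remaining = total_bits - len(bits)
    prefix = int("".join(map(str, bits)) or "0", 2)
    if prefix == 0:
        # Still insignificant: only an upper bound on the magnitude is known.
        bound = (1 << remaining) - 1
        return 0, -bound, bound
    low_mag = prefix << remaining                 # smallest possible magnitude
    high_mag = low_mag + (1 << remaining) - 1     # largest possible magnitude
    mid = low_mag + ((1 << remaining) >> 1)       # reconstruction point
    if sign < 0:
        return -mid, -high_mag, -low_mag
    return mid, low_mag, high_mag


print(refine(-1, [1]))     # one plane decoded, bit 1, negative sign
print(refine(+1, [0, 1]))  # two planes decoded, positive sign
print(refine(+1, [0, 0]))  # two planes decoded, still insignificant
print(refine(+1, []))      # no plane decoded yet
```

Under these assumptions, a coefficient with fewer decoded planes has a wider range, which matches the two entries that stay one step behind the others (-127 to 127, then -63 to 63).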

45/75

Context Modeling

Context
  • Zero coding
      • Significance statuses of neighbor coefficients
  • Refinement
      • Whether it is the 1st refinement pass
      • Significance statuses of neighbor coefficients
  • Sign
      • Neighbor signs
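As a rough illustration of how such contexts might be formed, the sketch below derives one context index per coding mode from neighbor information. The one-dimensional neighborhood, the radius, and the numbering of the contexts are all assumptions made for this example; the slides do not give the actual EAC context tables.

```python
def neighbours(i, n, radius=1):
    """Indices within `radius` of position i, excluding i itself (hypothetical layout)."""
    return [j for j in range(i - radius, i + radius + 1) if j != i and 0 <= j < n]


def zero_coding_context(significant, i):
    """Context for an insignificant coefficient: number of significant neighbours."""
    return sum(1 for j in neighbours(i, len(significant)) if significant[j])


def refinement_context(significant, i, first_refinement):
    """Context for a refinement bit: first-pass flag combined with neighbour status."""
    has_sig = any(significant[j] for j in neighbours(i, len(significant)))
    return 2 * int(first_refinement) + int(has_sig)


def sign_context(signs, i):
    """Context for a sign bit: -1, 0 or +1 from the neighbour signs (0 = unknown)."""
    total = sum(signs[j] for j in neighbours(i, len(signs)))
    return (total > 0) - (total < 0)


sig = [True, False, True, False]
print(zero_coding_context(sig, 1))
print(refinement_context(sig, 0, True))
print(sign_context([1, 0, -1, -1], 2))
```

Each context index would select a separate probability model in the entropy coder, so bits with similar neighborhoods share statistics.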

46/75

After Implicit Psychoacoustic Masking & Context Modeling

[Figure: a block of coefficients and the bit and context sequences derived from it. Labels: "Automatically generated", "To be encoded".]

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

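47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided in turn by the probabilities P0 / 1-P0, P1 / 1-P1 and P2 / 1-P2 as the symbols S0=0, S1=1, S2=0 are coded, leaving a final interval A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000... to C = 0100 1111111... lies within A.)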
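The interval narrowing on this slide can be sketched with exact fractions. This is an idealized arithmetic coder, not the QM coder named in the slide title; the convention that symbol 0 takes the lower part of the interval and the probability values used in the example are assumptions.

```python
import math
from fractions import Fraction


def interval(symbols, p_zero):
    """Narrow [0, 1) for a sequence of binary symbols.

    p_zero[k] is the probability that symbol k is 0; symbol 0 is assumed
    to take the lower part of the current interval.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width *= p
        else:
            low += width * p
            width *= 1 - p
    return low, low + width


def shortest_codeword(low, high):
    """Shortest bit string b such that every continuation of b lies in [low, high)."""
    n = 1
    while True:
        k = math.ceil(low * 2**n)             # first multiple of 2^-n at or above low
        if Fraction(k + 1, 2**n) <= high:     # the whole dyadic cell fits inside
            return format(k, f"0{n}b")
        n += 1


half = Fraction(1, 2)
low, high = interval([0, 1, 0], [half, half, half])
print(low, high)                     # 1/4 3/8
print(shortest_codeword(low, high))  # 010
```

With all probabilities equal to 1/2, each symbol costs one bit. Skewed probabilities shrink the interval less for likely symbols, so the codeword becomes shorter than the symbol count.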

48/75

Entropy Coder (Summary)

[Figure: the entropy coder produces two outputs, the bitstream and the R-D curve (distortion D plotted against rate R).]

49/75

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of energy in a critical band
      • Lookup table to convert energy to mask
  • R-D curve calculation
      • Lookup table calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder

50/75

Bitstream Assembly

  • Input
      • Bitstream
      • R-D curve
  • Output
      • Assembled bitstream
      • Companion file

[Figure: bitstream assembling block.]

51/75

EAC Bitstream Syntax

  • Timeslot header
      • Whether a certain channel exists (1 bit)
      • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

52/75

Companion File

[Figure: companion file layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 vs R1, D2 vs R2, D3 vs R3, D4 vs R4), shown twice, with the selected rates r1, r2, r3, r4 marked.]
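One common way to pick the rates r1 to r4 from such R-D curves is to spend bytes where the distortion drops fastest, which ends with all curves cut at roughly the same slope. The sketch below is a greedy version of that idea under stated assumptions: each curve is given as discrete (bytes, distortion) points with convex, decreasing distortion, and the function name `assemble` is illustrative rather than part of EAC.

```python
import heapq


def assemble(curves, budget):
    """Choose a truncation point per curve within a total byte budget.

    curves: one list per channel of (bytes, distortion) points, with bytes
            increasing and distortion decreasing along a convex curve.
    Returns (chosen point index per curve, bytes used).
    """
    chosen = [0] * len(curves)
    used = sum(c[0][0] for c in curves)
    heap = []

    def push(ch):
        # Queue the next segment of curve `ch`, keyed by its R-D slope.
        k = chosen[ch]
        if k + 1 < len(curves[ch]):
            (r0, d0), (r1, d1) = curves[ch][k], curves[ch][k + 1]
            heapq.heappush(heap, (-(d0 - d1) / (r1 - r0), ch))

    for ch in range(len(curves)):
        push(ch)
    while heap:
        _, ch = heapq.heappop(heap)          # steepest remaining segment
        k = chosen[ch]
        extra = curves[ch][k + 1][0] - curves[ch][k][0]
        if used + extra > budget:
            break                            # stop at a common slope threshold
        used += extra
        chosen[ch] = k + 1
        push(ch)
    return chosen, used


curves = [
    [(0, 100), (10, 40), (20, 20)],
    [(0, 50), (10, 20), (20, 15)],
]
print(assemble(curves, 30))
```

In this example the first curve is followed to its last point and the second is cut after 10 bytes, because its next segment has the smallest slope and no longer fits the budget.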

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots). The curve runs between two illegal regions.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
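The allocation formula on this slide can be checked numerically with a small sketch. It assumes the rate is expressed in bytes per second and the time in seconds; the function names are illustrative.

```python
def allocated_bytes(buf_prev, buf_now, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_now + rate_trans * time


def buffer_trace(allocations, buf_start, rate_trans, time):
    """Invert the formula: follow the buffer level for a list of allocations."""
    levels, buf = [], buf_start
    for b in allocations:
        buf = buf + rate_trans * time - b   # bytes arrive, timeslot bytes leave
        levels.append(buf)
    return levels


# 2000 bytes per second (16 kbps) and half-second timeslots.
print(allocated_bytes(4000, 3500, 2000, 0.5))        # 1500.0
print(buffer_trace([1500, 500], 4000, 2000, 0.5))    # [3500.0, 4000.0]
```

Allocating more bytes than arrive in a timeslot lowers the buffer level, and allocating fewer raises it.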

56/75

Optimization

  • Given
      • Initial buffer occupancy level
      • Final buffer occupancy level (or intermediate level with a sliding window)
      • Buffer occupancy constraint
  • Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) against time (timeslots), bounded by two illegal regions. Annotations: underflow (too many bytes), overflow (too few bytes), waste bytes.]
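The underflow and overflow cases follow from the relation Buf_i = Buf_{i-1} + Rate_trans * Time - B_i used in the allocation formula. A minimal sketch of the legal allocation range for one timeslot is given below; treating the two illegal regions as fixed lower and upper buffer bounds (`buf_min`, `buf_max`) is an assumption, and the names are illustrative.

```python
def legal_allocation(buf_prev, rate_trans, time, buf_min, buf_max):
    """Range of B_i that keeps the next buffer level within [buf_min, buf_max]."""
    arrived = rate_trans * time
    fewest = max(0, buf_prev + arrived - buf_max)  # fewer bytes would overflow
    most = buf_prev + arrived - buf_min            # more bytes would underflow
    return fewest, most


def clamp_allocation(wanted, buf_prev, rate_trans, time, buf_min, buf_max):
    """Pull a desired allocation back into the legal range."""
    fewest, most = legal_allocation(buf_prev, rate_trans, time, buf_min, buf_max)
    return min(max(wanted, fewest), most)


print(legal_allocation(3500, 2000, 0.5, 0, 4000))        # (500.0, 4500.0)
print(clamp_allocation(6000, 3500, 2000, 0.5, 0, 4000))  # 4500.0
```

A search over the R-D slope would then only accept slopes whose resulting byte count falls inside this range for every timeslot.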

58/75

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) against time (timeslots), bounded by two illegal regions.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Buffer occupancy (bytes) against time (timeslots), bounded by two illegal regions.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) against time (timeslots), bounded by two illegal regions.]

61/75

Modular Software Design

[Figure: the audio input is split into an L+R (or mono) channel and an L-R channel. Each channel passes through MLT (SW), quantizer and entropy coder; bitstream assembly then produces the output bitstream.]
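The L+R and L-R channels in the diagram can be illustrated with a short sum/difference sketch on integer samples. The slides do not state any scaling, so the unscaled form below and the function names are assumptions.

```python
def to_sum_diff(left, right):
    """Form the L+R and L-R channels from integer stereo samples."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Recover L and R exactly: (L+R) + (L-R) = 2L and (L+R) - (L-R) = 2R."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right


left, right = [3, -5, 7], [1, 2, -8]
total, diff = to_sum_diff(left, right)
print(total, diff)
print(from_sum_diff(total, diff) == (left, right))
```

For a mono signal only the first channel is present, and for highly correlated stereo the difference channel carries little energy.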

62/75

Modular Software Design

  • Highly modularized pipeline design
      • Quantizer and entropy coder can be used for image/video compression as well
      • Probe and data input can be inserted into any part of the program
  • Data-flow driven (with necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC           669      568       280       -22
WMA           847      556       325       040
MP4 TwinVQ    748      700       571       448

65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips.

Codec             Compression ratio
WinZip            132
Monkey's Audio    272
EAC               272

66/75

EAC (Versatile)

  • Versatile
      • Real-time 2-way communication (low delay mode)
      • Storage device (Pocket PC, Xbox)
      • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (if modem line, delay = 3 frames)

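68/75

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i, i+1. Encoder side: start encoding frame i, then MLT, quantizer, entropy coder, bitstream, network. Decoder side: start decoding frame i, then entropy decoding and quantizer; the point marked "Playable here" follows.]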

69/75

EAC - Flexible Bitstream Syntax

  • Flexible bitstream syntax
      • Parser may reassemble the bitstream at 1000x real time
      • Change
          • bit rate
          • number of audio channels
          • audio sampling rate

70/75

EAC - Software

  • Software
      • Encoder: 8x real time (stereo, 44.1 kHz sampling)
      • Decoder: 20x real time
      • Parser: 1000x real time

71/75

EAC - Encoder

[Figure: Audio goes into the Encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

72/75

EAC - Parser

[Figure: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces:]
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

73/75

EAC - Decoder

[Figure: the Decoder plays each of the parsed bitstreams:]
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

74/75

Comparison

  • Original
  • MP3
  • MP4 TwinVQ
  • WMA
  • EAC

75/75

Conclusions

  • An embedded audio coder is developed
      • Highly efficient
      • Versatile: low delay, constant bitrate, streaming
      • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available
      • Real-time execution
      • Small memory footprint

Page 47: Embedded Audio Coder

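47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided step by step according to the symbol probabilities (P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2) as the symbols S0=0, S1=1, S2=0 are coded; the final interval is labeled A.]

Coding result: 0100 (the shortest binary bitstream ensuring that the interval from B = 0100 0000000 to C = 0100 1111111 lies within A, i.e. (B, C) ⊂ A)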
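The interval subdivision illustrated on this slide can be sketched in a few lines of Python. This is only a minimal illustration with exact fractions, not the QM coder itself, which approximates the arithmetic with table-driven probability states. The function names, the convention that symbol 0 takes the lower sub-interval, and the example probabilities are all assumptions made for the sketch.

```python
from fractions import Fraction


def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol.

    symbols: iterable of 0/1 values.
    p_zero:  probability of a 0 at each step (assumed known to the coder).
    Assumption: symbol 0 takes the lower part of the current interval.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width = width * p
        else:
            low = low + width * p
            width = width * (1 - p)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every continuation of b
    (b000... up to b111...) stays inside [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = -((-low) // step)          # smallest multiple of step >= low
        if (k + 1) * step <= high:     # the whole dyadic cell fits
            return format(k, "0{}b".format(n))
        n += 1


half = Fraction(1, 2)
low, high = final_interval([0, 1, 0], [half, half, half])
code = shortest_code(low, high)
```

With the three equiprobable symbols used here the final interval is [1/4, 3/8) and the code is `010`; skewed probabilities give longer or shorter codes for the same symbols.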

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs the bitstream together with its R-D curve (distortion D against rate R).]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation
  • Lookup-table calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder
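For readers unfamiliar with the Rice code named in the last item, here is a minimal sketch of a plain Golomb-Rice code for one non-negative integer. It is not the coder from the slides: the run-length mode and any parameter adaptation are not shown, and the function names and the fixed parameter `k` are assumptions made for illustration.

```python
def rice_encode(n, k):
    """Encode non-negative integer n with Rice parameter k.

    The quotient n >> k is sent in unary (that many 1s, then a 0),
    followed by the k low-order bits of n.
    """
    q = n >> k
    r = n & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k:
        bits += format(r, "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Inverse of rice_encode for a string holding exactly one codeword."""
    q = bits.index("0")                        # length of the unary prefix
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


codeword = rice_encode(9, 2)   # quotient 2, remainder 1
```

The appeal for a fast implementation is that both directions need only shifts, masks and a scan for the first zero bit.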

50/75

Bitstream Assembly

Input
  • Bitstream
  • R-D curve

Output
  • Assembled bitstream
  • Companion file

[Figure: bitstream assembling]

51/75

EAC Bitstream Syntax

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker | Global header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

52/75

Companion File

[Figure: companion file layout: Global header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (distortion D1 to D4 against rate R1 to R4), with the allocated rates r1, r2, r3, r4 marked.]
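One common way to pick the rates r1 to r4 from several R-D curves is to keep spending bytes wherever the distortion drops fastest per byte. The sketch below shows that greedy idea under stated assumptions: each curve is a hypothetical list of (rate in bytes, distortion) sample points, rates increase, distortion decreases, and the curves are convex. The slides do not give the actual procedure, so this is an illustration rather than the coder's method.

```python
import heapq


def allocate(curves, budget):
    """Greedy R-D allocation.

    curves: list of curves, each a list of (rate_bytes, distortion) points.
    budget: total bytes available for the timeslot.
    Returns (chosen point index per curve, bytes spent).
    """
    choice = [0] * len(curves)
    spent = sum(c[0][0] for c in curves)
    heap = []

    def push_next_step(i):
        k = choice[i]
        if k + 1 < len(curves[i]):
            (r0, d0), (r1, d1) = curves[i][k], curves[i][k + 1]
            slope = (d0 - d1) / (r1 - r0)      # distortion saved per byte
            heapq.heappush(heap, (-slope, i))  # max-heap via negation

    for i in range(len(curves)):
        push_next_step(i)

    while heap:
        _, i = heapq.heappop(heap)
        k = choice[i]
        extra = curves[i][k + 1][0] - curves[i][k][0]
        if spent + extra > budget:
            continue                           # step does not fit; drop this curve
        spent += extra
        choice[i] = k + 1
        push_next_step(i)
    return choice, spent


example_curves = [
    [(0, 100.0), (10, 40.0), (20, 30.0)],
    [(0, 50.0), (10, 30.0), (20, 25.0)],
]
picked, used = allocate(example_curves, budget=20)
```

In the example both curves advance one step each, because the first step of either curve saves more distortion per byte than the second step of the other.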

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots). The curve must stay between two illegal regions.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
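The allocation rule on this slide is simple arithmetic, and a small worked example may help. The function and argument names are hypothetical, the rate is assumed to be expressed in bytes per second, and the numbers are made up for illustration.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """Bytes available to timeslot i.

    buf_prev:   buffer occupancy after timeslot i-1 (bytes)
    buf_cur:    buffer occupancy after timeslot i (bytes)
    rate_trans: coding (network) rate, assumed here in bytes per second
    time:       duration of the timeslot in seconds
    """
    return buf_prev - buf_cur + rate_trans * time


# Occupancy falls from 3000 to 2500 bytes while 2000 bytes/s arrive for 0.5 s.
b_i = allocated_bytes(buf_prev=3000, buf_cur=2500, rate_trans=2000, time=0.5)
```

Here the timeslot may use 1500 bytes: the 1000 bytes delivered during the timeslot plus the 500 bytes by which the buffer level is allowed to drop.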

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

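57/75

Search (R-D slope)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots). The curve is bounded by two illegal regions, labeled "Underflow (too many bytes)" and "Overflow (too few bytes)"; bytes lost to overflow are marked as waste bytes.]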
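The buffer constraint behind the underflow and overflow labels can be sketched as a clamp on the bytes a timeslot would like to use. This covers only the constraint check, not the R-D slope search itself; the names, the byte-per-second rate unit and the sample numbers are assumptions for illustration.

```python
def clamp_allocation(desired, buf_prev, buf_max, rate_trans, time):
    """Limit a timeslot's byte allocation so the buffer stays legal.

    The buffer level after the timeslot is buf_prev + rate_trans*time - B.
    It must stay within [0, buf_max]:
      - B too large  -> level below 0        (underflow, too many bytes)
      - B too small  -> level above buf_max  (overflow, bytes are wasted)
    """
    fill = rate_trans * time
    lowest = max(0, buf_prev + fill - buf_max)   # fewer bytes would overflow
    highest = buf_prev + fill                    # more bytes would underflow
    return min(max(desired, lowest), highest)


too_many = clamp_allocation(3500, buf_prev=1000, buf_max=4000, rate_trans=2000, time=1)
too_few = clamp_allocation(500, buf_prev=3500, buf_max=4000, rate_trans=2000, time=1)
```

A slope search would call a check like this for each candidate allocation and keep only candidates that pass unchanged.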

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) against time (timeslots) for constant-bitrate operation, bounded by two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) against time (timeslots), for Internet streaming with slow start, bounded by two illegal regions.]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) against time (timeslots) for normal Internet streaming, bounded by two illegal regions.]

61/75

Modular Software Design

[Figure: the audio is split into an L+R (or mono) branch and an L-R branch. Each branch runs MLT(SW) -> Quantizer -> Entropy coder -> Bitstream Assembly, and the results form the bitstream.]
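The L+R and L-R branches correspond to a sum/difference split of the stereo pair. The slides do not state the exact scaling, so the sketch below uses a standard integer lifting form that is exactly reversible, which is what a lossless mode would need. The function names and the use of a halved sum are assumptions.

```python
def to_sum_diff(left, right):
    """Map integer (L, R) samples to a (mid, side) pair without losing bits."""
    side = left - right
    mid = right + (side >> 1)     # equals floor((left + right) / 2)
    return mid, side


def from_sum_diff(mid, side):
    """Exact inverse of to_sum_diff."""
    right = mid - (side >> 1)
    left = right + side
    return left, right


pair = to_sum_diff(5, -3)         # side = 8, mid = 1
restored = from_sum_diff(*pair)
```

Because every step adds or subtracts an integer that the decoder can recompute, the round trip is bit-exact even for odd sums and negative samples.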

62/75

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data-flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results are based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec         8kbps   16kbps   32kbps   48kbps
MP4 TwinVQ    7.48    7.00     5.71     4.48
WMA           8.47    5.56     3.25     0.40
EAC           6.69    5.68     2.80     -22

65/75

EAC – Lossless

Results are based on the average of 16 MPEG-4 test clips.

Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames
  • Ignoring encoding/decoding delay
  • Network transmission time (over a modem line, delay = 3 frames)
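To turn a delay counted in frames into milliseconds, multiply by the frame duration. The frame length is not stated on the slide, so the 512-sample frame below is purely a hypothetical value chosen for the arithmetic, as are the function and parameter names.

```python
def delay_ms(frames, samples_per_frame, sample_rate_hz):
    """Algorithmic delay in milliseconds for a delay of `frames` frames."""
    return 1000.0 * frames * samples_per_frame / sample_rate_hz


# Hypothetical 512-sample frames at 44.1 kHz.
two_frame_delay = delay_ms(2, 512, 44100)     # about 23.2 ms
three_frame_delay = delay_ms(3, 512, 44100)   # about 34.8 ms
```

Halving the frame size halves both figures, which is why the low delay mode starts by reducing the frame size.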

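68/75

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i, i+1. Encoder: "Start Encoding Frame i", then MLT -> Quantizer -> Entropy coder -> Bitstream. The bitstream crosses the network. Decoder: "Start Decoding Frame i", then Entropy decoder -> Quantizer. The point where the audio becomes playable is marked "Playable here".]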

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change bit rate, number of audio channels, audio sampling rate

70/75

EAC – Software

Software
  • Encoder: 8x realtime (stereo, 44.1kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

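71/75

EAC - Encoder

[Figure: Audio -> Encoder -> Stereo 128kbps bitstream and companion file]

72/75

EAC - Parser

[Figure: on the server, the parser takes the Stereo 128kbps bitstream and the companion file and produces:]
  • Stereo 16kbps
  • Mono 8kbps
  • Stereo 16kbps, slow start
  • Mono 8kbps, 11kHz sampling

73/75

EAC - Decoder

[Figure: the decoder plays each parsed bitstream:]
  • Stereo 16kbps
  • Mono 8kbps
  • Stereo 16kbps, slow start
  • Mono 8kbps, 11kHz sampling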

74/75

Comparison

[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate

Good prototype available
  • Realtime execution
  • Small memory footprint

Page 52: Embedded Audio Coder

Companion File

[Figure: companion file layout. A global header is followed by a sequence of timeslot entries, each holding a timeslot head and an R-D curve.]

Page 53: Embedded Audio Coder

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four rate-distortion curves with axes D1/R1, D2/R2, D3/R3, and D4/R4, shown twice; the labels r1, r2, r3, and r4 also appear.]
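
One common way to realize rate-distortion optimized assembly for a single timeslot is equal-slope truncation: each embedded bitstream is cut where its R-D slope falls to a shared threshold, and the threshold is tuned until the total fits the byte budget. The sketch below is an illustration under assumptions, not the coder's actual implementation. The function names, the representation of each R-D curve as a list of `(rate, distortion)` points with decreasing slope magnitude, and the bisection loop are all hypothetical.

```python
def truncation_rate(curve, slope):
    """Largest rate on `curve` whose incremental R-D slope
    (distortion drop per byte) is still at least `slope`.

    `curve` is a list of (rate, distortion) points with increasing rate,
    decreasing distortion, and decreasing slope magnitude (assumed convex).
    """
    rate = curve[0][0]
    for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
        if (d0 - d1) / (r1 - r0) < slope:
            break
        rate = r1
    return rate


def assemble_single_timeslot(curves, budget, iterations=50):
    """Pick one truncation rate per curve so the total stays within `budget`.

    Bisection on the shared slope threshold: a higher threshold keeps
    fewer bytes, a lower threshold keeps more.
    """
    lo = 0.0
    hi = 1.0 + max((c[0][1] - c[1][1]) / (c[1][0] - c[0][0]) for c in curves)
    best = [c[0][0] for c in curves]  # nothing beyond each curve's first point
    for _ in range(iterations):
        mid = (lo + hi) / 2
        rates = [truncation_rate(c, mid) for c in curves]
        if sum(rates) <= budget:
            best, hi = rates, mid  # fits: try a lower threshold (more bytes)
        else:
            lo = mid               # too many bytes: raise the threshold
    return best


# Two made-up R-D curves, for illustration only.
curve_a = [(0, 100), (10, 40), (20, 20), (30, 15)]
curve_b = [(0, 80), (10, 50), (20, 30), (30, 25)]
print(assemble_single_timeslot([curve_a, curve_b], budget=40))
```

Because every curve is cut at the same slope, no byte moved from one curve to another could lower the total distortion, which is the usual argument for this style of allocation.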

Page 54: Embedded Audio Coder

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes). Horizontal axis: time (timeslots). Two regions are marked illegal.]

Page 55: Embedded Audio Coder

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$$

where

  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
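
The allocation rule can be checked with a few lines of code. This is a minimal sketch: the function names, the argument names, and the choice of bytes per second as the rate unit are assumptions made for illustration, and the numbers are made up.

```python
def allocated_bytes(buf_prev, buf_curr, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time.

    buf_prev, buf_curr: buffer occupancy (bytes) at timeslots i-1 and i.
    rate_trans: coding (network) rate, assumed here to be in bytes per second.
    time: duration of the timeslot in seconds.
    """
    return buf_prev - buf_curr + rate_trans * time


def buffer_trajectory(buf_start, allocations, rate_trans, time):
    """Invert the rule: Buf_i = Buf_{i-1} + Rate_trans * Time - B_i."""
    levels = [buf_start]
    for b in allocations:
        levels.append(levels[-1] + rate_trans * time - b)
    return levels


# 2000 bytes/s over a 0.5 s timeslot delivers 1000 bytes per timeslot.
print(allocated_bytes(3000, 2500, 2000, 0.5))
print(buffer_trajectory(3000, [1500, 1000], 2000, 0.5))
```

Letting the buffer level drop by 500 bytes frees 500 extra bytes for the timeslot on top of the 1000 that arrive over the network during it.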

Page 56: Embedded Audio Coder

Optimization

Given:

  • Initial buffer occupancy level
  • Final buffer occupancy level (or an intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot.

Page 57: Embedded Audio Coder

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Annotations: underflow (too many bytes), overflow (too few bytes), waste bytes.]
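
A search over the R-D slope can be sketched as a bisection: one shared slope threshold fixes how many bytes each timeslot takes from its R-D curve, the buffer trajectory is simulated, and the threshold is raised whenever the buffer underflows. This is a hypothetical illustration, not the coder's documented procedure. The function names, the `(rate, distortion)` curve format, the constant number of bytes arriving per timeslot, and the mapping of underflow to a negative level and overflow to a level above capacity are all assumptions.

```python
def bytes_at_slope(curve, slope):
    """Bytes taken from one timeslot's convex R-D curve at a slope threshold."""
    rate = curve[0][0]
    for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
        if (d0 - d1) / (r1 - r0) < slope:
            break
        rate = r1
    return rate


def simulate(curves, slope, buf_start, bytes_per_slot):
    """Buffer occupancy after each timeslot for a given slope threshold."""
    levels = [buf_start]
    for curve in curves:
        levels.append(levels[-1] + bytes_per_slot - bytes_at_slope(curve, slope))
    return levels


def classify(levels, capacity):
    """Label a trajectory with the terms used on the slide."""
    if min(levels) < 0:
        return "underflow"  # too many bytes were allocated
    if max(levels) > capacity:
        return "overflow"   # too few bytes were allocated; the excess is wasted
    return "legal"


def search_slope(curves, buf_start, bytes_per_slot, iterations=40):
    """Smallest slope threshold (most bytes spent) that avoids underflow."""
    lo = 0.0
    hi = 1.0 + max((c[0][1] - c[1][1]) / (c[1][0] - c[0][0]) for c in curves)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if min(simulate(curves, mid, buf_start, bytes_per_slot)) < 0:
            lo = mid  # underflow: raise the threshold to spend fewer bytes
        else:
            hi = mid  # no underflow: try spending more bytes
    return hi


# One made-up R-D curve per timeslot, for illustration only.
slot_curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],
    [(0, 80), (10, 50), (20, 30), (30, 25)],
]
slope = search_slope(slot_curves, buf_start=10, bytes_per_slot=10)
levels = simulate(slot_curves, slope, buf_start=10, bytes_per_slot=10)
print(levels, classify(levels, capacity=20))
```

Raising the threshold reduces the bytes taken in every timeslot at once, so the underflow test is monotone in the slope and bisection applies. If the result is still classified as overflow, some bytes are wasted even at the best slope.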

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames
  (Ignoring encoding/decoding delay)
  Network transmission time (if modem line, delay = 3 frames)
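The frame counts above translate into milliseconds once a frame size is fixed. The slide gives no frame size, so the 441-sample frame below is purely a hypothetical value chosen for round numbers, and `delay_ms` is an illustrative helper.

```python
def delay_ms(frames, frame_samples, sample_rate_hz):
    """Delay in milliseconds for a given number of frames."""
    return 1000.0 * frames * frame_samples / sample_rate_hz


# Hypothetical 441-sample frames at 44.1 kHz, so one frame lasts 10 ms.
base_delay = delay_ms(2, 441, 44100)   # 2-frame delay
modem_delay = delay_ms(3, 441, 44100)  # 3-frame delay over a modem line
```

The delay scales linearly with the frame size, which is why the low delay mode reduces it.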

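68/75

EAC (Low Delay Mode)

[Figure: low delay timeline over frames i-1, i, i+1. Encoder: start encoding frame i, then MLT, quantizer, entropy coder, bitstream. The bitstream crosses the network. Decoder: start decoding frame i (entropy, quantizer); a marker shows where the frame becomes playable.]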

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change
    bit rate
    number of audio channels
    audio sampling rate
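Parsing an embedded bitstream amounts to keeping a subset of the master bitstream. The sketch below assumes a layout the slides do not specify: each timeslot holds one embedded byte string per channel, so truncating a string lowers the bit rate and dropping a channel reduces the channel count. The function name, the channel keys, and the byte budgets are all hypothetical.

```python
def parse_master(master, budget):
    """Keep a byte prefix per channel; channels absent from budget are dropped."""
    return [{ch: slot[ch][:n] for ch, n in budget.items()} for slot in master]


# Hypothetical master bitstream with two timeslots and two channels.
master = [
    {"sum": b"abcdefgh", "diff": b"ijklmnop"},
    {"sum": b"qrstuvwx", "diff": b"yz"},
]
stereo_low = parse_master(master, {"sum": 4, "diff": 2})  # lower bit rate
mono_low = parse_master(master, {"sum": 3})               # fewer channels
```

No decoding or re-encoding takes place in this sketch, which illustrates why parsing can run far faster than real time.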

70/75

EAC - Software

Software
  Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  Decoder: 20x realtime
  Parser: 1000x realtime

71/75

EAC - Encoder

[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and outputs four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]

73/75

EAC - Decoder

[Figure: decoder with four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]

74/75

Comparison

[Audio demo: Original, MP4 TwinVQ, WMA, EAC, MP3]

75/75

Conclusions

An embedded audio coder is developed
  Highly efficient
  Versatile
    Low delay, constant bitrate, streaming
  Flexible bitstream
    Parsing for bit rate, number of audio channels, audio sampling rate
  Good prototype available
    Realtime execution, small memory footprint

Page 64: Embedded Audio Coder

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

75/75

Conclusions

An embedded audio coder is developed:

  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, # of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint
