ITU-T G.722.1 Annex C: A New Low-Complexity 14 kHz Audio Coding Standard
Minjie Xie, Dave Lindbergh, and Peter Chu
ICASSP 2006


G.722.1C: First ITU-T Super-wideband Audio Coding Standard

- Audio bandwidth: 14 kHz
- Sample rate: 32 kHz
- Bit rate: 24, 32, and 48 kbit/s
- Algorithm: Transform coding (Siren14™)
- Frame size: 20 ms
- Algorithmic delay: 40 ms
- Complexity: <11 WMOPS (encoder + decoder)

- Very high audio quality
- Suitable for videoconferencing, teleconferencing, and Internet streaming
- Available on royalty-free licensing terms

Overview of Main G.722.1 Mode

- Wideband coding standard approved by ITU-T in 1998
- Provides 50-7000 Hz audio bandwidth at 24 and 32 kbit/s
- Based on transform coding, using a Modulated Lapped Transform (MLT)
- Operates on 20 ms frames, corresponding to 320 samples at a 16 kHz sampling rate
- A look-ahead of 20 ms due to the 50% overlap between frames
- Total algorithmic delay of 40 ms
- Very low computational complexity (about 5.3 WMOPS)
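The frame, look-ahead, and delay figures above follow from simple arithmetic. The Python sketch below derives them from the sampling rate and frame length, assuming an MLT window that spans two frames (which is what a 50% overlap implies); the function and field names are our own, not taken from the standard.

```python
def mlt_timing(sample_rate_hz: int, frame_ms: int) -> dict:
    """Frame size, window length, look-ahead and delay for a 50%-overlap MLT.

    Illustrative sketch; names are hypothetical, not from the G.722.1 text.
    """
    frame = sample_rate_hz * frame_ms // 1000  # new samples per frame (hop size)
    window = 2 * frame                         # 50% overlap: each window spans two frames
    lookahead_ms = frame_ms                    # second half of the window is future input
    delay_ms = frame_ms + lookahead_ms         # buffer one frame, then wait for the look-ahead
    return {"frame": frame, "window": window,
            "lookahead_ms": lookahead_ms, "delay_ms": delay_ms}


print(mlt_timing(16000, 20))
# {'frame': 320, 'window': 640, 'lookahead_ms': 20, 'delay_ms': 40}
```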


G.722.1C: Extension Mode of G.722.1

- Audio signal sampled at 32 kHz
- Doubles the audio bandwidth from 7 kHz to 14 kHz
- Same algorithmic steps as the main mode of G.722.1
- Same frame size as G.722.1: 20 ms
- Total algorithmic delay of 40 ms
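Because the frame length stays at 20 ms while the sampling rate doubles, each Annex C frame carries 640 new samples, and the bit budget per frame scales directly with the bit rate. A minimal Python sketch of that arithmetic follows (the helper names are hypothetical); the resulting 480, 640, and 960 bits match the per-frame bit counts given in the encoder block diagram.

```python
FRAME_MS = 20            # frame length shared by G.722.1 and Annex C
ANNEX_C_RATE_HZ = 32000  # Annex C sampling rate


def samples_per_frame(sample_rate_hz: int = ANNEX_C_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    """New input samples per frame."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(bit_rate_kbps: int, frame_ms: int = FRAME_MS) -> int:
    """Bits available per frame; kbit/s times ms gives bits directly."""
    return bit_rate_kbps * frame_ms


print(samples_per_frame())                        # 640
print([bits_per_frame(r) for r in (24, 32, 48)])  # [480, 640, 960]
```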


Block Diagram of the G.722.1C Encoder

[Figure: G.722.1C encoder. Audio input -> MLT -> region power quantization and coding (rms_index) -> categorization procedure, producing categorizations 0 to 15 -> MLT coefficient quantization and coding (SQVH), one branch per categorization, each reporting its bit count -> categorization selection for rate control, given the bit count per frame (480, 640, or 960) -> switch -> bit stream MUX, which combines the region power code bits, categorization control bits, and MLT code bits.]
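The right half of the encoder diagram codes the MLT coefficients once per candidate categorization (0 to 15), measures the bit count of each, and lets the rate-control block pick one; its index goes into the bit stream as the categorization control bits. The Python sketch below shows only that selection step. The ordering assumption (candidates sorted from most to fewest bits, so the first one that fits uses the budget best), the fallback rule, and the function names are ours, not taken from the standard.

```python
def control_bits(num_categorizations: int) -> int:
    """Bits needed to signal the chosen categorization index (16 -> 4)."""
    return (num_categorizations - 1).bit_length()


def select_categorization(bit_counts: list, available_bits: int) -> int:
    """Pick the first categorization whose MLT code bits fit the budget.

    bit_counts[i] is the number of MLT code bits categorization i needs.
    Assumption: the list runs from most to fewest bits, so the first fit
    leaves the fewest bits unused. If nothing fits, fall back to the
    cheapest candidate (a real coder would then have to truncate).
    """
    for i, bits in enumerate(bit_counts):
        if bits <= available_bits:
            return i
    return len(bit_counts) - 1


# Hypothetical numbers: a 480-bit frame (24 kbit/s) minus 70 bits of region
# power code and the control bits leaves 406 bits for the MLT code bits.
available = 480 - 70 - control_bits(16)
print(available, select_categorization([520, 470, 430, 400], available))  # 406 3
```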

Block Diagram of the G.722.1C Decoder

[Figure: G.722.1C decoder. Bit stream -> DEMUX, which separates the region power code bits, categorization control bits, and MLT code bits. Region power decoding and reconstruction yields rms_index0 to rms_index27; the categorization procedure, together with the bit count and the identification of the categorization selection, yields the per-region category assignments (category_assignment0, ...). For each region (Region 0 to Region 27): variable bit-length decoding of vector indices -> separation of vector indices into scalar indices -> reconstruction and denormalization of MLT coefficients. IMLT -> audio output.]
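On the decoder side, each region's variable-length codes are first decoded into vector indices, and each vector index is then split back into the scalar quantization indices of individual MLT coefficients. A minimal Python sketch of that split follows, assuming the encoder packed vd scalar indices (each 0 to kmax) as base-(kmax + 1) digits with the first coefficient most significant. kmax, vd, and the function names are our own labels, and sign bits, which a real bit stream would carry separately, are left out.

```python
def join_scalar_indices(scalars, kmax):
    """Encoder side (assumed packing): scalar indices as base-(kmax+1) digits."""
    index = 0
    for k in scalars:
        if not 0 <= k <= kmax:
            raise ValueError("scalar index out of range")
        index = index * (kmax + 1) + k
    return index


def split_vector_index(vector_index, kmax, vd):
    """Decoder side: recover vd scalar indices, first coefficient most significant."""
    scalars = [0] * vd
    for j in range(vd - 1, -1, -1):
        vector_index, scalars[j] = divmod(vector_index, kmax + 1)
    return scalars


idx = join_scalar_indices([1, 2, 0, 3], kmax=3)
print(idx, split_vector_index(idx, kmax=3, vd=4))  # 99 [1, 2, 0, 3]
```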

Encoder of G.722.1 Annex C

- Double the MLT transform length from 320 to 640 samples
- Double the number of frequency regions from 14 to 28
- Double the number of Huffman coding tables for encoding the quantized region power indices
- Double the threshold for adjusting the number of available bits from 320 to 640
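Doubling the regions from 14 to 28 is what extends the coded band from 7 kHz to 14 kHz, and each additional region adds a quantized power index to code, which the slide matches by doubling the Huffman tables. The Python sketch below assumes 20 MLT coefficients per region (this gives exactly 7 kHz and 14 kHz, since each coefficient spans 25 Hz in both modes), a logarithmic quantizer with one index step per factor of sqrt(2) in RMS (about 3 dB), and differential coding of every index after the first. These are illustrative assumptions, not the bit-exact rules of the standard, and the names are hypothetical.

```python
import math

REGION_SIZE = 20  # assumed: MLT coefficients per region


def coded_bandwidth_hz(num_regions, frame_samples, sample_rate_hz):
    """Upper edge of the coded band: regions * coefficients * Hz per coefficient."""
    hz_per_coef = (sample_rate_hz / 2) / frame_samples  # N coefficients span 0..fs/2
    return num_regions * REGION_SIZE * hz_per_coef


def region_power_indices(mlt_coefs, num_regions):
    """Quantize each region's RMS on a log scale: one index per sqrt(2) (~3 dB) step."""
    indices = []
    for r in range(num_regions):
        region = mlt_coefs[r * REGION_SIZE:(r + 1) * REGION_SIZE]
        rms = math.sqrt(sum(c * c for c in region) / REGION_SIZE)
        indices.append(round(2 * math.log2(max(rms, 1e-9))))  # clamp avoids log2(0)
    return indices


def differential(indices):
    """First index as-is, then region-to-region differences for entropy coding."""
    return indices[:1] + [b - a for a, b in zip(indices, indices[1:])]


print(coded_bandwidth_hz(14, 320, 16000), coded_bandwidth_hz(28, 640, 32000))  # 7000.0 14000.0
coefs = [1.0] * 20 + [4.0] * 20 + [2.0] * 20
print(region_power_indices(coefs, 3), differential(region_power_indices(coefs, 3)))
# [0, 4, 2] [0, 4, -2]
```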


Decoder of G.722.1 Annex C

- Double the number of frequency regions from 14 to 28
- Double the threshold for adjusting the number of available bits from 320 to 640
- Extend the centroid table for reconstruction of MLT coefficients
- Double the IMLT transform length from 320 to 640 samples
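The central change on both the encoder and decoder sides is the transform length: the MLT and IMLT go from 320 to 640 points. The Python sketch below uses the textbook direct-form MLT (Malvar's definition: a sine window over 2N samples, 50% overlap, orthonormal sqrt(2/N) scaling) rather than the fast, bit-exact G.722.1 algorithm, and checks that windowing, transforming, inverting, and overlap-adding reconstructs the input. N would be 320 in the main mode and 640 in Annex C; a tiny N keeps the O(N^2) loops fast here.

```python
import math


def sine_window(n):
    """Sine window of length 2N; w[i]**2 + w[i + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]


def _kernel(n, i, k):
    # MLT cosine kernel for time index i (0..2N-1) and frequency index k (0..N-1)
    return math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))


def mlt(block, n):
    """Direct-form MLT: 2N input samples -> N coefficients."""
    w, scale = sine_window(n), math.sqrt(2.0 / n)
    return [scale * sum(w[i] * block[i] * _kernel(n, i, k) for i in range(2 * n))
            for k in range(n)]


def imlt(coefs, n):
    """Inverse MLT: N coefficients -> 2N windowed samples, to be overlap-added."""
    w, scale = sine_window(n), math.sqrt(2.0 / n)
    return [scale * w[i] * sum(coefs[k] * _kernel(n, i, k) for k in range(n))
            for i in range(2 * n)]


def analysis_synthesis(x, n):
    """Frame x with hop N (50% overlap), apply MLT then IMLT, and overlap-add."""
    if len(x) % n:
        raise ValueError("len(x) must be a multiple of n")
    padded = [0.0] * n + list(x) + [0.0] * n  # one frame of zeros on each side
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        for i, v in enumerate(imlt(mlt(padded[start:start + 2 * n], n), n)):
            out[start + i] += v
    return out[n:n + len(x)]  # drop the padding, whose aliasing is not cancelled


x = [math.sin(0.3 * i) + 0.05 * i for i in range(32)]
y = analysis_synthesis(x, 8)
print(max(abs(a - b) for a, b in zip(x, y)))  # tiny: floating-point rounding only
```

The 50% overlap is what lets each block's time-domain aliasing cancel against its neighbours, and it is also why the codec needs one frame of look-ahead.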


Computational Complexity and Memory Requirements of G.722.1C

Computational complexity:

Bit rate (kbit/s)   Encoder (WMOPS)   Decoder (WMOPS)   Enc.+Dec. (WMOPS)
24                  4.5               5.3               9.7
32                  4.8               5.5               10.3
48                  5.1               5.9               10.9

Memory requirements:

RAM (kbytes)   18
ROM (kbytes)   30

Computational Complexity of G.722.1C versus the 3GPP Audio Codecs

Bit rate (kbit/s)   G.722.1C (WMOPS)   eAAC+ (WMOPS)   AMR-WB+ (WMOPS)
24                  9.7                40.8            80.1
32                  10.3               42.6            86.7
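Expressed as ratios, the table shows eAAC+ needing roughly four times and AMR-WB+ roughly eight times the operations of G.722.1C at the same bit rate. A small Python check over the table's figures (the dictionary layout and function name are ours):

```python
# Encoder + decoder WMOPS from the table: (G.722.1C, eAAC+, AMR-WB+)
WMOPS = {24: (9.7, 40.8, 80.1), 32: (10.3, 42.6, 86.7)}


def relative_complexity(rate_kbps: int) -> tuple:
    """How many times more WMOPS eAAC+ and AMR-WB+ need than G.722.1C."""
    g7221c, eaac_plus, amr_wb_plus = WMOPS[rate_kbps]
    return round(eaac_plus / g7221c, 1), round(amr_wb_plus / g7221c, 1)


for rate in sorted(WMOPS):
    print(rate, relative_complexity(rate))
# 24 (4.2, 8.3)
# 32 (4.1, 8.4)
```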

Algorithmic Delay of G.722.1C versus the 3GPP Audio Codecs

G.722.1C (ms)   eAAC+ (ms)   AMR-WB+ (ms)
40.0            129.9 [1]    113.8 [2]

Note 1: Without bit reservoir (see 3GPP TR 26.936 V6.1.0).
Note 2: ISF = 25.6 kHz (see 3GPP TR 26.936 V6.1.0).

ITU-T Subjective Characterization Tests

- Subjective tests performed by France Telecom according to a test plan designed by ITU-T SG12 SQEG
- Characterization test Phase 1: Speech
  - ACR for clean speech and DCR for noisy speech
- Characterization test Phase 2: Music and mixed content
  - MUSHRA method
- Reference codec: MPEG-4 AAC-LD (PCEnc/DecPro)
- Additional reference codecs: 3GPP eAAC+ and AMR-WB+
- Requirement: Not worse than the reference codec at a 99% confidence level

ITU-T Subjective Test Results (Phase 1)

[Figure: Clean Speech. MOS (0 to 5) at 24, 32, 48, and 64 kbps for G.722.1C, AAC-LD, AMR-WB+, and eAAC+.]

[Figure: Reverberant Speech with Office Noise. DMOS (0 to 5) at 24, 32, 48, and 64 kbps.]

[Figure: Reverberant Speech with Interfering Talkers. DMOS (0 to 5) at 24, 32, 48, and 64 kbps.]

[Figure: Reverberant Speech with Office Noise and Interfering Talkers. DMOS (0 to 5) at 24, 32, 48, and 64 kbps.]

[Figure (Phase 2): Music and Mixed Content (24 kbps). MUSHRA score (0 to 100).]

[Figure (Phase 2): two further music and mixed content panels (titles not recoverable). MUSHRA score (0 to 100); legend: G.722.1C, AAC-LD, AAC-LD at 64 kbps.]

Conclusion

- G.722.1C met all performance requirements
- Phase 1 (clean and noisy speech)
  - 24 kbit/s: Better than AAC-LD and Not Worse than eAAC+
  - 32 kbit/s: Better than AAC-LD, Not Worse than eAAC+, and Not Worse than AMR-WB+ in most of the tests
  - 48 kbit/s: Not Worse than AAC-LD at 48 and 64 kbit/s
- Phase 2 (music and mixed content)
  - 24 kbit/s: Better than AAC-LD
  - 32 kbit/s: Better than AAC-LD
  - 48 kbit/s: Better than AAC-LD at 48 and 64 kbit/s
- Executables, audio samples, and more information are available at: http://www.polycom.com/Siren14

Acknowledgment

The authors would like to acknowledge Claude Lamblin, ITU-T Q.10/SG16 Rapporteur, and Catherine Quinquis, ITU-T Q.7/SG12 Rapporteur, for their great work guiding this project to completion. In addition, the authors would like to thank the speech quality experts and staff who performed the subjective characterization tests at France Telecom.
