ITU-T G.722.1 ANNEX C: A NEW LOW-COMPLEXITY 14 KHZ AUDIO CODING STANDARD
Minjie Xie, Dave Lindbergh, and Peter Chu
ICASSP 2006
G.722.1C: First ITU-T Super-wideband Audio Coding Standard
Audio bandwidth: 14 kHz
Sample rate: 32 kHz
Bit rate: 24, 32, and 48 kbit/s
Algorithm: Transform coding (Siren14™)
Frame size: 20 ms
Algorithmic delay: 40 ms
Complexity: <11 WMOPS (encoder+decoder)
Very high audio quality
Suitable for video and teleconferencing and Internet streaming
Available on royalty-free licensing terms
Overview of Main G.722.1 Mode
Wideband coding standard approved by ITU-T in 1998
Provides 50-7000 Hz audio bandwidth at 24 and 32 kbit/s
Based on transform coding, using a Modulated Lapped Transform (MLT)
Operates on frames of 20 ms corresponding to 320 samples at a 16 kHz sampling rate
A look-ahead of 20 ms due to 50% overlap between frames
Total algorithmic delay of 40 ms
Very low computational complexity (about 5.3 WMOPS)
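
The 50% overlap and the one-frame look-ahead can be illustrated with a small lapped-transform sketch. This is a textbook sine-windowed MDCT in pure Python, not the bit-exact MLT of the Recommendation; the function names, the scaling, and the toy frame size are assumptions made for illustration only.

```python
import math

def sine_window(n):
    """Sine window of length 2n; satisfies w[i]^2 + w[i+n]^2 = 1."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]

def mlt(block):
    """Forward lapped transform: 2n windowed samples -> n coefficients."""
    n = len(block) // 2
    w = sine_window(n)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(
            w[i] * block[i]
            * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n)
        )
        for k in range(n)
    ]

def imlt(coeffs):
    """Inverse transform: n coefficients -> 2n windowed, time-aliased samples."""
    n = len(coeffs)
    w = sine_window(n)
    scale = math.sqrt(2.0 / n)
    return [
        scale * w[i] * sum(
            coeffs[k]
            * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)
        )
        for i in range(2 * n)
    ]

def analyze_synthesize(signal, n):
    """Transform frames of length 2n with hop n, then overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = imlt(mlt(signal[start:start + 2 * n]))
        for i, value in enumerate(frame):
            out[start + i] += value
    return out

n = 8  # toy size for the demo; the slides give 320 samples per frame at 16 kHz
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(6 * n)]
rebuilt = analyze_synthesize(signal, n)
# Only samples covered by two overlapping frames are reconstructed exactly.
max_err = max(abs(a - b) for a, b in zip(signal[n:-n], rebuilt[n:-n]))
```

Each output sample needs the inverse transform of two consecutive frames before it is complete, which is why the decoder output lags by one extra frame on top of the frame being buffered.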
G.722.1C: Extension Mode of G.722.1
Audio signal sampled at 32 kHz
Double the audio bandwidth from 7 kHz to 14 kHz
Same algorithmic steps as the main mode of G.722.1
Same frame size as G.722.1 (20 ms)
Total algorithmic delay of 40 ms
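
The frame, delay, and bit-budget figures follow from simple arithmetic on the sample rate, frame duration, and bit rate. A minimal sketch, with helper names that are hypothetical and chosen only for this example:

```python
FRAME_MS = 20      # frame size shared by G.722.1 and G.722.1C
LOOKAHEAD_MS = 20  # one frame of look-ahead from the 50% overlap

def samples_per_frame(sample_rate_hz, frame_ms=FRAME_MS):
    """Number of new input samples consumed per frame."""
    return sample_rate_hz * frame_ms // 1000

def bits_per_frame(bit_rate_bps, frame_ms=FRAME_MS):
    """Bit budget available to encode one frame."""
    return bit_rate_bps * frame_ms // 1000

def algorithmic_delay_ms(frame_ms=FRAME_MS, lookahead_ms=LOOKAHEAD_MS):
    """Frame buffering plus look-ahead."""
    return frame_ms + lookahead_ms

main_mode_samples = samples_per_frame(16_000)  # 320
annex_c_samples = samples_per_frame(32_000)    # 640
annex_c_bits = [bits_per_frame(r) for r in (24_000, 32_000, 48_000)]  # [480, 640, 960]
delay = algorithmic_delay_ms()                 # 40
```

Doubling the sample rate while keeping the 20 ms frame doubles the samples per frame from 320 to 640 but leaves the 40 ms algorithmic delay unchanged.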
Block Diagram of the G.722.1C Encoder
[Figure: encoder block diagram. Blocks: MLT; region power quantization and coding (rms_index); categorization procedure; MLT coefficient quantization and coding (SQVH) for categorizations 0 through 15; categorization selection for rate control (switch); bit stream MUX. Signals: audio input; bit count per frame (480, 640, or 960); region power code bits; MLT code bits with a bit count per categorization; categorization control bits.]
Block Diagram of the G.722.1C Decoder
[Figure: decoder block diagram. Blocks: DEMUX; region power decoding and reconstruction; categorization procedure; identification of categorization selection; for each of Regions 0 through 27, variable bit-length decoding of vector indices, separation of vector indices into scalar indices, and reconstruction and denormalization of MLT coefficients; IMLT. Signals: bit stream; region power code bits; categorization control bits; MLT code bits; bit count; rms_index0 through rms_index27; category_assignment0 through category_assignment27; audio output.]
Encoder of G.722.1 Annex C
Double the MLT transform length from 320 to 640 samples
Double the number of frequency regions from 14 to 28
Double the Huffman coding tables for encoding quantized region power indices
Double the threshold for adjusting the number of available bits from 320 to 640
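
The region layout can be derived from the figures on these slides. The sketch below assumes the 640 MLT coefficients span 0 to 16 kHz uniformly and that the 28 regions split the 14 kHz audio band evenly; the resulting 20 coefficients per region is a derived value, and `region_rms` is a hypothetical helper that shows only the per-region power measurement, not the quantization or Huffman coding.

```python
import math

SAMPLE_RATE_HZ = 32_000
MLT_LENGTH = 640             # coefficients per frame in Annex C
NUM_REGIONS = 28
AUDIO_BANDWIDTH_HZ = 14_000

# Derived layout (assumes uniform coefficient spacing and equal-width regions).
HZ_PER_COEFF = (SAMPLE_RATE_HZ / 2) / MLT_LENGTH          # 25.0 Hz
REGION_WIDTH_HZ = AUDIO_BANDWIDTH_HZ / NUM_REGIONS        # 500.0 Hz
COEFFS_PER_REGION = int(REGION_WIDTH_HZ / HZ_PER_COEFF)   # 20

def region_rms(coeffs):
    """Root-mean-square amplitude of each region; coefficients above 14 kHz are ignored."""
    if len(coeffs) != MLT_LENGTH:
        raise ValueError("expected one frame of MLT coefficients")
    rms = []
    for region in range(NUM_REGIONS):
        start = region * COEFFS_PER_REGION
        band = coeffs[start:start + COEFFS_PER_REGION]
        rms.append(math.sqrt(sum(c * c for c in band) / COEFFS_PER_REGION))
    return rms

# Example: a flat spectrum gives the same RMS in every region.
flat = region_rms([1.0] * MLT_LENGTH)
```

Under these assumptions the 28 regions cover coefficients 0 through 559, and the top 80 coefficients (14 to 16 kHz) fall outside the coded band.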
Decoder of G.722.1 Annex C
Double the number of frequency regions from 14 to 28
Double the threshold for adjusting the number of available bits from 320 to 640
Extend the centroid table for reconstruction of MLT coefficients
Double the IMLT transform length from 320 to 640 samples
Computational Complexity and Memory Requirements of G.722.1C

Computational complexity
Bit rate (kbit/s)   Encoder (WMOPS)   Decoder (WMOPS)   Enc.+Dec. (WMOPS)
24                  4.5               5.3               9.7
32                  4.8               5.5               10.3
48                  5.1               5.9               10.9

Memory requirements
RAM (K bytes)   18
ROM (K bytes)   30
Computational Complexity of G.722.1C versus the 3GPP Audio Codecs
Bit rate (kbit/s)   G.722.1C (WMOPS)   eAAC+ (WMOPS)   AMR-WB+ (WMOPS)
24                  9.7                40.8            80.1
32                  10.3               42.6            86.7
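
To put the table in relative terms, the WMOPS figures can be expressed as multiples of the G.722.1C figure at the same bit rate. A minimal sketch using only the values from the table; the dictionary layout and function name are illustrative assumptions.

```python
# Encoder+decoder complexity in WMOPS, copied from the table above.
WMOPS = {
    24: {"G.722.1C": 9.7, "eAAC+": 40.8, "AMR-WB+": 80.1},
    32: {"G.722.1C": 10.3, "eAAC+": 42.6, "AMR-WB+": 86.7},
}

def complexity_ratios(table, baseline="G.722.1C"):
    """Complexity of each codec relative to the baseline at the same bit rate."""
    return {
        rate: {
            codec: value / row[baseline]
            for codec, value in row.items()
            if codec != baseline
        }
        for rate, row in table.items()
    }

ratios = complexity_ratios(WMOPS)
for rate, row in ratios.items():
    for codec, ratio in row.items():
        print(f"{rate} kbit/s: {codec} needs {ratio:.1f}x the WMOPS of G.722.1C")
```

On these figures eAAC+ is roughly four times and AMR-WB+ roughly eight times as complex as G.722.1C at both bit rates.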
Algorithmic Delay of G.722.1C versus the 3GPP Audio Codecs
G.722.1C (ms)   eAAC+ (ms)   AMR-WB+ (ms)
40.0            129.9 [1]    113.8 [2]
Note 1: Without bit-reservoir (see 3GPP TR 26.936 V6.1.0)
Note 2: ISF = 25.6 kHz (see 3GPP TR 26.936 V6.1.0)
ITU-T Subjective Characterization Tests
Subjective tests performed by France Telecom according to a test plan designed by ITU-T SG12 SQEG
Characterization test Phase 1: Speech
- ACR for clean speech and DCR for noisy speech
Characterization test Phase 2: Music and mixed content
- MUSHRA method
Reference codec: MPEG-4 AAC-LD PCEnc/DecPro
Additional reference codecs: 3GPP eAAC+ and AMR-WB+
Requirements: Not worse than the reference codec for a 99% confidence interval
ITU-T Subjective Test Results (Phase 1)
[Figure: Clean Speech. Bar chart of MOS (scale 0 to 5) for G.722.1C, AAC-LD, AMR-WB+, and eAAC+ at 24, 32, 48, and 64 kbps.]
ITU-T Subjective Test Results (Phase 1)
[Figure: Reverberant Speech with Office Noise. Bar chart of DMOS (scale 0 to 5) for G.722.1C, AAC-LD, AMR-WB+, and eAAC+ at 24, 32, 48, and 64 kbps.]
ITU-T Subjective Test Results (Phase 1)
[Figure: Reverberant Speech with Interfering Talkers. Bar chart of DMOS (scale 0 to 5) for G.722.1C, AAC-LD, AMR-WB+, and eAAC+ at 24, 32, 48, and 64 kbps.]
ITU-T Subjective Test Results (Phase 1)
[Figure: Reverberant Speech with Office Noise and Interfering Talkers. Bar chart of DMOS (scale 0 to 5) for G.722.1C, AAC-LD, AMR-WB+, and eAAC+ at 24, 32, 48, and 64 kbps.]
ITU-T Subjective Test Results (Phase 2)
[Figure: Music and Mixed Content (24 kbps). Bar chart of MUSHRA scores (scale 0 to 100) for G.722.1C, AAC-LD, AMR-WB+, and eAAC+.]
ITU-T Subjective Test Results (Phase 2)
[Figure: Music and Mixed Content (32 kbps). Bar chart of MUSHRA scores (scale 0 to 100) for G.722.1C, AAC-LD, AMR-WB+, and eAAC+.]
ITU-T Subjective Test Results (Phase 2)
[Figure: Music and Mixed Content (48 kbps). Bar chart of MUSHRA scores (scale 0 to 100) for G.722.1C, AAC-LD, and AAC-LD at 64 kbps.]
Conclusion
G.722.1C met all performance requirements
Phase 1 (clean and noisy speech)
- 24 kbit/s: Better than AAC-LD and Not Worse than eAAC+
- 32 kbit/s: Better than AAC-LD, Not Worse than eAAC+, and Not Worse than AMR-WB+ in most tests
- 48 kbit/s: Not Worse than AAC-LD at 48 and 64 kbit/s
Phase 2 (music and mixed content)
- 24 kbit/s: Better than AAC-LD
- 32 kbit/s: Better than AAC-LD
- 48 kbit/s: Better than AAC-LD at 48 and 64 kbit/s
Executables, audio samples, and more information available at: http://www.polycom.com/Siren14
Acknowledgment
The authors would like to acknowledge Claude Lamblin, ITU-T Q.10/SG16 Rapporteur, and Catherine Quinquis, ITU-T Q.7/SG12 Rapporteur, for their great work guiding this project to completion. In addition, the authors would like to thank the speech quality experts and staff who performed the subjective characterization tests at France Telecom.