1 © NOKIA 2005
Speech Coding
Jari Hagqvist et al., Multimedia Technologies Laboratory
Nokia Research Center, Tampere, Finland
2006
Outline
• Introduction to speech coding
• Speech Processing in GSM Mobiles
• GSM Air Interface Overview
• GSM Full Rate
• GSM Half Rate
• GSM Enhanced Full Rate
• Adaptive Multi Rate (AMR)
• AMR Wideband Codec
• AMR-WB+ Audio Codec
What is coding for?
• Coding = signal compression
• Reduces the bit rate of a digital signal for efficient transport or storage
• The (en)coder input is a digital signal (after A/D conversion)
- Speech codecs: 8 kHz sampling for narrowband (200-3400 Hz audio bandwidth, 128 kbit/s with 16-bit PCM); 16 kHz sampling for wideband (50-7000 Hz audio bandwidth, 256 kbit/s with 16-bit PCM)
- Audio codecs: usually 44.1 or 48 kHz sampling (full audio band, 705.6 or 768 kbit/s per channel)
• The (en)coder output is a lower-rate (smaller) digital signal
- Speech codecs: 1 … 16 kbit/s
- Audio codecs: 16 … 256 … 1411 kbit/s
• The decoder reverses the encoding process and provides the original digital signal (or an approximation thereof)
[Figure: Input signal → Encoder → bit-stream → communication channel or storage medium → bit-stream → Decoder → synthesized signal]
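The uncompressed rates above follow from sampling rate × bits per sample × channels, assuming 16-bit linear PCM. A minimal sketch (the function names are illustrative only) that reproduces them, together with the compression ratio of a speech codec:

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> float:
    """Uncompressed linear-PCM bit rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(input_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded bit stream is than its PCM input."""
    return input_kbps / coded_kbps


if __name__ == "__main__":
    print(pcm_bit_rate_kbps(8000))          # narrowband speech: 128.0
    print(pcm_bit_rate_kbps(16000))         # wideband speech: 256.0
    print(pcm_bit_rate_kbps(44100, 16, 2))  # CD-quality stereo: 1411.2
    # A 12.2 kbit/s speech codec versus 128 kbit/s narrowband PCM: roughly 10:1
    print(round(compression_ratio(pcm_bit_rate_kbps(8000), 12.2), 1))
```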
Why speech coding?
• Speech signals contain a lot of redundancy (repetitive waveforms, correlation)
• Speech codecs are used to pack the signal for efficient digital transmission or storage
• Codecs exploit the properties of the speech signal and of human hearing
• Usage areas for speech codecs:
- Telephony (landline PSTN, mobile, VoIP, satellite)
- Streaming
- Video conferencing
- MMS
- Voice recording
- Games, toys
Human Speech Production
[Figure: Anatomy of speech production: lungs, trachea, glottis, larynx tube, pharynx cavity, velum, oral cavity and nasal cavity, with mouth output and nose output.]
Speech signal properties
[Figure: Waveform of the utterance "At twilight of the twelfth day…" (amplitude -1…1 vs. time 0-4 s). Unvoiced segments such as "s…" are noise-like; voiced segments such as "a…" are periodic.]
Source-filter speech production model
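[Figure: An impulse train generator feeding a glottal pulse model G(z), scaled by gain g1, provides the voiced excitation; a random noise generator, scaled by gain g2, provides the unvoiced excitation. Their sum drives the vocal tract model V(z) and the radiation model R(z) to give the speech output S(z).]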
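The source-filter model can be imitated in a few lines: a unit impulse train (voiced) or white Gaussian noise (unvoiced), scaled by a gain standing in for g1 or g2, drives an all-pole filter standing in for V(z). This is a toy sketch: G(z) and R(z) are left out, and the filter coefficients are arbitrary illustrative values, not parameters of any codec.

```python
import random
from typing import List, Sequence


def impulse_train(n: int, period: int) -> List[float]:
    """Voiced excitation: unit impulses every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def white_noise(n: int, seed: int = 0) -> List[float]:
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def all_pole_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """y(n) = x(n) + sum_k a_k y(n-k): a stand-in for the vocal tract V(z)."""
    y: List[float] = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def synthesize(n: int, voiced: bool, gain: float, a: Sequence[float],
               period: int = 80) -> List[float]:
    """Pick the excitation source, scale it (g1 or g2) and filter it."""
    excitation = impulse_train(n, period) if voiced else white_noise(n)
    return all_pole_filter([gain * e for e in excitation], a)


if __name__ == "__main__":
    vocal_tract = [1.3, -0.6]  # illustrative stable 2nd-order resonance
    voiced = synthesize(400, voiced=True, gain=1.0, a=vocal_tract)  # 100 Hz pitch at 8 kHz
    unvoiced = synthesize(400, voiced=False, gain=0.1, a=vocal_tract)
    print(voiced[:3], len(unvoiced))
```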
Generic speech coding approach
1. Predict/model the input signal (spectrum shape) (LPC analysis)
2. Inverse filter (subtract) the result from the input signal
3. Code / further predict the residual r’ (LTP analysis)
4. Inverse filter r’ with the LTP prediction
5. Process / code the residual r’’
6. Quantize the parameters / residual
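[Figure: Input speech → LPC prediction (1), subtracted from the input (2) to give r’ → LTP (pitch) prediction (3), subtracted from r’ (4) to give r’’ → residual coding (5) → quantization (6). The LPC parameters, LTP parameters and residual are sent to the decoder.]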
Linear Prediction (LPC)
• Also called short-term prediction
• Models the shape of the spectrum (formants)
• Predicts the current sample from the previous p samples; the prediction s’(n) is
$$ s'(n) = \sum_{k=1}^{p} a_k\, s(n-k) = a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p) $$
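A minimal sketch of how the coefficients a_k above can be found, assuming the autocorrelation method with a rectangular window and the Levinson-Durbin recursion (real codecs add windowing, lag windowing and fixed-point arithmetic). The sign convention matches the equation: the prediction is s’(n) = Σ a_k s(n-k), and the residual is the input minus the prediction, i.e. steps 1 and 2 of the generic approach.

```python
import random
from typing import List, Sequence, Tuple


def autocorrelation(s: Sequence[float], p: int) -> List[float]:
    """r(k) = sum_n s(n) s(n-k) for k = 0..p (rectangular window)."""
    n = len(s)
    return [sum(s[i] * s[i - k] for i in range(k, n)) for k in range(p + 1)]


def levinson_durbin(r: Sequence[float], p: int) -> Tuple[List[float], float]:
    """Solve for a_1..a_p in s'(n) = sum_k a_k s(n-k).

    Returns the coefficients and the final prediction error energy.
    """
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(s: Sequence[float], a: Sequence[float]) -> List[float]:
    """Inverse filter A(z): e(n) = s(n) - s'(n), with zeros assumed before s(0)."""
    out = []
    for n in range(len(s)):
        pred = sum(ak * s[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
        out.append(s[n] - pred)
    return out


if __name__ == "__main__":
    # Synthetic 2nd-order autoregressive "speech" with known coefficients
    rng = random.Random(1)
    s = [0.0, 0.0]
    for _ in range(5000):
        s.append(1.3 * s[-1] - 0.6 * s[-2] + rng.gauss(0.0, 1.0))
    s = s[2:]
    a, _ = levinson_durbin(autocorrelation(s, 2), 2)
    e = lpc_residual(s, a)
    print(a)  # close to [1.3, -0.6]
    print(sum(x * x for x in e) / sum(x * x for x in s))  # well below 1: prediction gain
```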
LPC spectrum
[Figure: Speech spectrum (amplitude in dB vs. frequency 0-4 kHz) with the 6th, 10th and 14th order LP spectral envelopes overlaid.]
Result of LPC analysis
[Figure: Three panels: signal before LPC filtering (amplitude vs. time, 0-20 ms); residual after LPC filtering (amplitude vs. time, 0-20 ms); spectrum of the LPC residual (amplitude in dB vs. frequency, 0-4 kHz).]
Long Term Prediction (LTP)
• Also called pitch prediction
• Models the spectral fine structure
• The LPC residual is still fairly periodic (spikes)
• The spikes correspond to the pitch frequency of speech (glottal pulse excitation)
• Usually modeled with an FIR filter (delay D and gain g), so that the LTP residual is
$$ r''(n) = r'(n) - g\, r'(n-D) $$
where r’ is the LPC residual
• The best delay is found via correlation analysis
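The correlation analysis for a one-tap predictor can be sketched as follows. For each candidate delay D, the optimal gain is g = c/e and the residual energy drops by c²/e, where c is the correlation between the current block and its D-delayed past and e is the energy of that delayed past. The lag range 40-120 in the usage example is an assumption borrowed from the GSM FR LTP; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ltp_search(x: Sequence[float], start: int, length: int,
               min_lag: int, max_lag: int) -> Tuple[int, float]:
    """Find delay D and gain g for the one-tap predictor g * x(n - D).

    Maximises c**2 / e with c > 0; ties keep the smallest lag.
    """
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        if start - lag < 0:
            break  # not enough history for this or any longer lag
        c = sum(x[n] * x[n - lag] for n in range(start, start + length))
        e = sum(x[n - lag] ** 2 for n in range(start, start + length))
        if e <= 0.0 or c <= 0.0:
            continue
        score = c * c / e
        if score > best_score:
            best_lag, best_gain, best_score = lag, c / e, score
    return best_lag, best_gain


def ltp_residual(x: Sequence[float], start: int, length: int,
                 lag: int, gain: float) -> List[float]:
    """r''(n) = r'(n) - g * r'(n - D) over the current block."""
    return [x[n] - gain * x[n - lag] for n in range(start, start + length)]


if __name__ == "__main__":
    # Idealised LPC residual: one spike every 50 samples (160 Hz pitch at 8 kHz)
    residual = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
    lag, gain = ltp_search(residual, start=240, length=40, min_lag=40, max_lag=120)
    print(lag, gain)
```

Note that lag 100 predicts this idealised signal equally well; real coders bias the search towards lower lags to avoid such pitch multiples.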
Speech coding methods
• Pulse / multi-pulse excitation coders
• Sinusoidal transform coding
• Mixed excitation linear prediction (MELP)
• CELP (Code Excited Linear Prediction)
• ACELP (Algebraic CELP): the dominant speech coding technology
Principle of CELP
• Analysis-by-synthesis coding
• Search for the codebook entry that maximizes the correlation between the synthesized speech and the input speech
• Codebook with Gaussian noise or sparse pulses
[Figure: An excitation codebook entry (selected by the codebook index) is scaled by a gain and passed through LTP synthesis and LPC synthesis; the synthesized speech is correlated with the input speech and the index that maximizes the correlation is kept. LTP and LPC analysis of the input speech provide the LTP and LPC parameters.]
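The analysis-by-synthesis loop can be illustrated with a toy search. Each codevector is passed through the LPC synthesis filter 1/A(z), and the entry maximising (x·y)²/(y·y), the normalised correlation between the target x and the synthesized y, is kept; this is equivalent to minimising the error energy with the optimal gain g = x·y / y·y. The sketch omits the LTP contribution, perceptual weighting and filter-memory handling of a real CELP coder; the random Gaussian codebook and filter coefficients are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def lpc_synthesis(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """1/A(z) with zero initial state: y(n) = u(n) + sum_k a_k y(n-k)."""
    y: List[float] = []
    for n, u in enumerate(excitation):
        y.append(u + sum(ak * y[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0))
    return y


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def search_codebook(target: Sequence[float], codebook: Sequence[Sequence[float]],
                    a: Sequence[float]) -> Tuple[int, float]:
    """Return (best index, optimal gain) by exhaustive analysis-by-synthesis."""
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for index, codevector in enumerate(codebook):
        y = lpc_synthesis(codevector, a)
        corr, energy = dot(target, y), dot(y, y)
        if energy == 0.0:
            continue
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


if __name__ == "__main__":
    rng = random.Random(3)
    codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]
    lpc = [1.3, -0.6]
    # Build a target that codebook entry 5 can reproduce exactly with gain 0.7
    target = [0.7 * v for v in lpc_synthesis(codebook[5], lpc)]
    print(search_codebook(target, codebook, lpc))
```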
Quality evaluation
• Speech codecs distort the signal, so simple objective measures such as SNR cannot be used to assess quality
• Subjective listening tests are practically the only way to evaluate quality
• Untrained listeners are usually used (‘people from the street’)
• MOS and DMOS tests are the most common ones
- MOS: speech samples are played in random order and scored by the listeners on a scale of 1…5
- DMOS: a high-quality reference sample is played first, followed by the test sample; the quality difference is scored on a scale of 1…5
• Tests need to be designed carefully to maximize statistical reliability
• NRC Tampere has an internationally recognized test lab
• New objective methods (e.g. PESQ) should be used with care (they are not necessarily reliable)
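A minimal sketch of how MOS results are typically summarised: the mean score and an approximate 95% confidence interval (normal approximation with z = 1.96; small listener panels would use Student's t instead). The same computation applies to DMOS scores; the function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mos_with_ci(scores: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Mean opinion score and half-width of an approximate 95% confidence interval."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must be on the 1...5 scale")
    m = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(len(scores))  # stdev uses n - 1
    return m, half_width


if __name__ == "__main__":
    m, hw = mos_with_ci([4, 4, 5, 3, 4])
    print(f"MOS {m:.2f} +/- {hw:.2f}")
```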
Speech Processing in GSM Mobiles
[Figure: Uplink: A/D → speech coding with VAD/DTX → channel coding and interleaving → burst building → modulation (GMSK) → RF. Downlink: RF → demodulation → de-interleaving → channel decoding → bad frame handling → speech decoding with comfort noise → D/A. Echo cancellation is also part of the chain.]
GSM Air Interface Overview
• GSM is a digital mobile system standard operating on the 900 and/or 1800 MHz bands in Europe, Asia and Africa. The North American version is a single-band standard operating on 1900 MHz.
• GSM is a combination of FDMA and TDMA:
- FDMA structure: 124 radio carriers with 200 kHz channels, e.g. on 900 MHz: uplink 890-915 MHz (MS -> BTS), downlink 935-960 MHz (BTS -> MS)
- TDMA structure: 8 bursts (users) per radio carrier
-> Efficient channel allocation: 200 kHz / 8 = 25 kHz per user
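The FDMA channel raster can be written down directly. The sketch below assumes the primary GSM 900 band with carrier numbers (ARFCN) 1..124, an uplink frequency of 890 + 0.2·n MHz and a 45 MHz duplex spacing to the downlink, which matches the band edges quoted above:

```python
from typing import Tuple

CARRIER_SPACING_KHZ = 200
TIMESLOTS_PER_CARRIER = 8
DUPLEX_SPACING_MHZ = 45.0


def gsm900_carrier_mhz(arfcn: int) -> Tuple[float, float]:
    """(uplink, downlink) centre frequencies in MHz for primary GSM 900."""
    if not 1 <= arfcn <= 124:
        raise ValueError("primary GSM 900 uses ARFCN 1..124")
    uplink = 890.0 + 0.2 * arfcn
    return uplink, uplink + DUPLEX_SPACING_MHZ


# Bandwidth per user when 8 TDMA bursts share one 200 kHz carrier
effective_khz_per_user = CARRIER_SPACING_KHZ / TIMESLOTS_PER_CARRIER

if __name__ == "__main__":
    print(gsm900_carrier_mhz(1), gsm900_carrier_mhz(124), effective_khz_per_user)
```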
GSM Air Interface Overview (cont.)
• Consists of optimized (traffic) channels for speech and data as well as a number of control channels
• Interleaving is used to randomize the effect of transmission errors: bits are spread over several TDMA frames
• Slow frequency hopping (transmitting consecutive bursts on different frequencies) protects against multipath fading
• Gaussian minimum shift keying (GMSK) is used for modulation onto the carrier
• Discontinuous Transmission (DTX) lowers power consumption and decreases interference
GSM Speech Traffic Channels
• GSM has four (five) standardized speech traffic channels:
- Full Rate (TCH/FS), gross rate 22.8 kbit/s
- Half Rate (TCH/HS), gross rate 11.4 kbit/s
- Enhanced Full Rate (TCH/EFS), gross rate 22.8 kbit/s
- Adaptive Multi Rate (AMR) at full rate and half rate (TCH/AFS, TCH/AHS), gross rate 22.8 / 11.4 kbit/s
- AMR Wideband (AMR-WB), gross rate 22.8 kbit/s
• Each channel has a channel coding (error protection) optimized for the speech codec(s) used in that channel
• Unequal error protection: not all speech bits are protected by channel coding
• Bad frames are compensated by repeating the previous good frame and muting the output if bad frames continue
Speech Codecs in GSM
• GSM Full Rate (FR, standardized in 1987)
- RPE-LTP (Regular Pulse Excitation - Long Term Prediction)
• GSM Half Rate (HR, 1994)
- VSELP (Vector Sum Excited Linear Prediction)
• GSM Enhanced Full Rate (EFR, 1995)
- ACELP (Algebraic Code Excited Linear Prediction)
• GSM Adaptive Multi Rate (AMR-NB, 1999)
- ACELP
• Adaptive Multi Rate - Wideband (AMR-WB, 2001)
- ACELP
GSM Full Rate Codec
• 13.0 kbit/s RPE-LTP (Regular Pulse Excitation - Long Term Prediction)
• Basic methodology: residual excited linear prediction
• Short-term modelling/filtering with 8th order Linear Predictive Coding (LPC). Inverse filtering is employed in the encoder to form the prediction error (residual).
• Long-term modelling/filtering with 1st order Long Term Prediction (LTP)
• The prediction error is modelled by Regular Pulse Excitation (RPE)
• RPE-LTP operates on 20 ms speech frames; for each frame 260 speech parameter bits are produced (13.0 kbit/s)
GSM FR Encoder
[Figure: GSM FR encoder block diagram. Preprocessing: offset compensation and pre-emphasis. LPC analysis: segmentation, autocorrelation (ACF), Schur recursion giving the reflection coefficients r, conversion to Log Area Ratios (LAR) and quantizer/coder (LARc). Short-term analysis filtering: LAR decoder, interpolation and inverse filter A(z), producing the short-term residual d. Long-term prediction: LTP parameter estimation and coding (lag Nc, gain bc) and the LTP parameter decoder, producing the long-term residual e. RPE encoding: weighting filter H(z), RPE grid selection (grid position Mc), APCM quantizer (block maximum xmaxc, pulse amplitudes xmc) and inverse APCM, whose output e' is fed back to the long-term predictor.]
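The first two preprocessing blocks are simple recursive filters. A floating-point sketch follows; the coefficients α = 32735/32768 and β = 28180/32768 (about 0.86) are as we recall them from GSM 06.10 and should be checked against the specification, and the code is not the bit-exact fixed-point reference.

```python
from typing import List, Sequence

ALPHA = 32735 / 32768  # offset compensation coefficient (assumed, GSM 06.10)
BETA = 28180 / 32768   # pre-emphasis coefficient, about 0.86 (assumed, GSM 06.10)


def offset_compensation(s: Sequence[float]) -> List[float]:
    """Remove DC offset: s_of(k) = s(k) - s(k-1) + ALPHA * s_of(k-1)."""
    out: List[float] = []
    prev_in, prev_out = 0.0, 0.0
    for x in s:
        y = x - prev_in + ALPHA * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out


def pre_emphasis(s: Sequence[float]) -> List[float]:
    """Boost high frequencies: s_p(k) = s_of(k) - BETA * s_of(k-1)."""
    out: List[float] = []
    prev = 0.0
    for x in s:
        out.append(x - BETA * prev)
        prev = x
    return out


if __name__ == "__main__":
    dc_input = [1000.0] * 10000  # a pure DC offset decays away
    print(offset_compensation(dc_input)[-1])
    print(pre_emphasis([1.0, 1.0]))
```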
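Bit Allocation of GSM FR
• LPC filter (8 coefficients): 36 bits per 20 ms (1.8 kbit/s)
• LTP filter: lag 7 bits per 5 ms (28 per 20 ms) and gain 2 bits per 5 ms (8 per 20 ms); 1.8 kbit/s in total
• Excitation signal: gain 6 bits per 5 ms (24 per 20 ms), pulse amplitudes 39 bits per 5 ms (156 per 20 ms) and phase 2 bits per 5 ms (8 per 20 ms); 9.4 kbit/s in total
• Total: 260 bits per 20 ms (13.0 kbit/s)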
• The output bits are classified according to their subjective importance into three classes: 1A (50 bits), 1B (132 bits) and class 2 (78 bits)
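The bit allocation can be checked with a few lines of arithmetic. The per-parameter comments (8 log-area ratios, 13 RPE pulses of 3 bits each, phase = RPE grid position) are our reading of the GSM FR structure rather than part of the table above:

```python
FRAME_MS = 20
SUBFRAMES = 4
LPC_BITS = 36  # 8 log-area ratios per frame

PER_SUBFRAME_BITS = {
    "ltp_lag": 7,
    "ltp_gain": 2,
    "excitation_gain": 6,
    "pulse_amplitudes": 39,  # 13 RPE pulses x 3 bits
    "phase": 2,              # RPE grid position
}
CLASS_BITS = {"1A": 50, "1B": 132, "2": 78}


def kbps(bits: int) -> float:
    """Bits per 20 ms frame -> kbit/s (bits per millisecond)."""
    return bits / FRAME_MS


frame_bits = LPC_BITS + SUBFRAMES * sum(PER_SUBFRAME_BITS.values())
ltp_bits = SUBFRAMES * (PER_SUBFRAME_BITS["ltp_lag"] + PER_SUBFRAME_BITS["ltp_gain"])
rpe_bits = frame_bits - LPC_BITS - ltp_bits

if __name__ == "__main__":
    print(frame_bits, kbps(LPC_BITS), kbps(ltp_bits), kbps(rpe_bits), kbps(frame_bits))
```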
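GSM FR Channel Coding
• For error detection, a 3-bit CRC is calculated over the 50 most important bits (Class 1A)
• Error correction coding of 9.8 kbit/s is added using 1/2-rate convolutional coding for the Class 1A and 1B bits (the 182 most sensitive bits)
• Interleaving over 8 TDMA frames
[Figure: The speech encoder produces 260 bits / 20 ms, ordered according to subjective importance into Class 1A (50 bits), Class 1B (132 bits) and Class 2 (78 bits). A 3-bit CRC with generator G(D) = D^3 + D + 1 is computed over Class 1A. Class 1A, the CRC bits, Class 1B and 4 tail bits (0000) are encoded with a 1/2-rate convolutional code (G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4) into 378 bits; together with the 78 unprotected Class 2 bits this gives 456 bits, which are then interleaved.]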
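The protection chain of the figure can be sketched in Python. It follows the polynomials given above but simplifies the real TCH/FS procedure of GSM 05.03: the parity-bit inversion and the reordering of the class 1 bits before encoding are omitted, so the output is not bit-compatible with a real GSM channel coder.

```python
from typing import List, Sequence


def crc3(bits: Sequence[int]) -> List[int]:
    """Remainder of bits(D) * D^3 divided by g(D) = D^3 + D + 1 (first bit = highest power)."""
    poly = [1, 0, 1, 1]  # coefficients of D^3, D^2, D, 1
    reg = list(bits) + [0, 0, 0]
    for i in range(len(bits)):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-3:]


def conv_encode(bits: Sequence[int], tail: int = 4) -> List[int]:
    """Rate-1/2 code with G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4, plus zero tail bits."""
    u = list(bits) + [0] * tail

    def past(k: int, d: int) -> int:
        """u(k - d), with zeros before the start of the block."""
        return u[k - d] if k - d >= 0 else 0

    out: List[int] = []
    for k in range(len(u)):
        out.append(u[k] ^ past(k, 3) ^ past(k, 4))
        out.append(u[k] ^ past(k, 1) ^ past(k, 3) ^ past(k, 4))
    return out


def channel_code_frame(speech_bits: Sequence[int]) -> List[int]:
    """260 importance-ordered speech bits -> 456 coded bits (simplified layout)."""
    if len(speech_bits) != 260:
        raise ValueError("expected 260 speech bits")
    class1a, class1b, class2 = speech_bits[:50], speech_bits[50:182], speech_bits[182:]
    protected = list(class1a) + crc3(class1a) + list(class1b)  # 185 bits + 4 tail bits
    return conv_encode(protected) + list(class2)               # 2 * 189 + 78 = 456


if __name__ == "__main__":
    print(len(channel_code_frame([0, 1] * 130)))
```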
GSM FR Interleaving
[Figure: Each 456-bit channel-coded speech frame is split into 8 blocks of 57 bits that are spread diagonally over 8 consecutive bursts. Consecutive frames (N, N+1, N+2) overlap by four bursts, so each burst carries 57 bits from each of two consecutive frames.]
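The diagonal interleaving can be expressed as an index mapping. The sketch below uses the TCH/FS mapping B = B0 + 4n + (k mod 8), j = 2((49k) mod 57) + ((k mod 8) div 4), as we recall it from GSM 05.03; verify it against the specification before relying on it. It confirms that each frame is spread over 8 bursts and that each fully used burst carries 57 bits from each of two consecutive frames.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def interleave_position(n: int, k: int, b0: int = 0) -> Tuple[int, int]:
    """Map coded bit k (0..455) of speech frame n to (burst B, bit position j in 0..113)."""
    burst = b0 + 4 * n + (k % 8)                 # 8 consecutive bursts per frame
    pos = 2 * ((49 * k) % 57) + ((k % 8) // 4)   # blocks 0-3 even, blocks 4-7 odd positions
    return burst, pos


if __name__ == "__main__":
    filled: Dict[int, Set[int]] = defaultdict(set)
    for frame in range(3):
        for k in range(456):
            burst, pos = interleave_position(frame, k)
            if pos in filled[burst]:
                raise RuntimeError("two bits mapped to the same burst position")
            filled[burst].add(pos)
    # Bursts 4..11 are shared by two frames (114 bits); the edge bursts are half full
    print({b: len(p) for b, p in sorted(filled.items())})
```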
GSM Half Rate Codec
• 5.6 kbit/s VSELP (Vector Sum Excited Linear Prediction)
• Analysis-by-synthesis CELP (Code Excited Linear Prediction) operating on 20 ms speech frames
• Short-term modelling/filtering with 10th order LPC
• Long-term modelling/filtering (1st order) is employed as an adaptive codebook using the analysis-by-synthesis approach with fractional lag values and non-uniform resolution
• The excitation is formed by the analysis-by-synthesis approach from a linear combination of basis vectors (=> Vector Sum Excitation)
• Four modes are used: 1 unvoiced mode and 3 voiced modes
• Adaptive post-enhancement filtering is applied in the decoder
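GSM HR Bit Allocation
Voiced modes 1, 2, 3:
• LPC filter (10 coefficients): 28 bits per 20 ms
• Frame energy: 5 bits per 20 ms
• Soft interpolation bit: 1 bit per 20 ms
• Voicing mode: 2 bits per 20 ms
• LTP filter (adaptive codebook) lag: 8, 4, 4, 4 bits per 5 ms subframe (20 bits per 20 ms)
• Gains (joint quantiser for the adaptive and fixed codebook gains): 5 bits per 5 ms (20 bits per 20 ms)
• Excitation (fixed codebook) index: 9 bits per 5 ms (36 bits per 20 ms)
• Total: 112 bits per 20 ms
Unvoiced mode:
• LPC filter (10 coefficients): 28 bits per 20 ms
• Frame energy: 5 bits per 20 ms
• Soft interpolation bit: 1 bit per 20 ms
• Voicing mode: 2 bits per 20 ms
• Excitation signal, 1st fixed codebook index: 7 bits per 5 ms (28 bits per 20 ms)
• Gains (joint quantiser): 5 bits per 5 ms (20 bits per 20 ms)
• Excitation signal, 2nd fixed codebook index: 7 bits per 5 ms (28 bits per 20 ms)
• Total: 112 bits per 20 ms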
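The HR bit budget, and the split between source and channel coding in the 11.4 kbit/s channel, can be checked the same way (dictionary keys are illustrative names):

```python
FRAME_MS = 20
GROSS_KBPS = 11.4  # TCH/HS gross rate

VOICED = {
    "lpc": 28, "frame_energy": 5, "soft_interpolation": 1, "voicing_mode": 2,
    "ltp_lag": 8 + 4 + 4 + 4,   # 8, 4, 4, 4 bits per subframe
    "gains": 4 * 5,             # joint gain quantiser, 5 bits per subframe
    "codebook_index": 4 * 9,
}
UNVOICED = {
    "lpc": 28, "frame_energy": 5, "soft_interpolation": 1, "voicing_mode": 2,
    "codebook1_index": 4 * 7,
    "gains": 4 * 5,
    "codebook2_index": 4 * 7,
}
CLASS_BITS = {"1A": 22, "1B": 73, "2": 17}  # from the channel coding slide

speech_kbps = sum(VOICED.values()) / FRAME_MS
channel_coding_kbps = round(GROSS_KBPS - speech_kbps, 2)

if __name__ == "__main__":
    print(sum(VOICED.values()), sum(UNVOICED.values()), speech_kbps, channel_coding_kbps)
```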
GSM Half Rate Channel Coding
• 5.8 kbit/s of channel coding using 1/2-rate convolutional coding with a 3-bit CRC (the CRC is protected by a 1/3-rate convolutional code)
• Class 1A: 22 bits (convolutional code + 3-bit CRC); Class 1B: 73 bits (convolutional code); Class II: 17 bits (unprotected)
• Note: more bits are used for error protection than for source coding!
• Polynomials: 1 + D^2 + D^3 + D^5 + D^6, 1 + D + D^4 + D^6, 1 + D + D^2 + D^3 + D^4 + D^6
• CRC: X^3 + X + 1 (same as in full rate)
• Interleaving depth: 4 (block length 57 bits)
GSM Enhanced Full Rate Codec
• First codec with quality comparable to landline telephone quality (better than 32 kbit/s ADPCM)
• 12.2 kbit/s ACELP (Algebraic Code Excited Linear Predictive Coding)
• CELP-type coding with a fixed pulse codebook enabling a fast excitation search procedure
• The same codec was earlier chosen as the US1 codec for the EFR channels in the PCS1900 system in North America
GSM EFR Block Diagram
[Figure: GSM EFR encoder. Pre-processing: high-pass filter on the input speech samples. LP analysis (twice per frame): windowing and autocorrelation R[ ], Levinson-Durbin recursion giving A(z), conversion to LSPs, LSP quantisation (LSP indices) and interpolation of the LSPs for the 4 subframes. Open-loop lag search (twice per frame): compute the weighted speech and find the open-loop lag Tol. Closed-loop lag search per subframe: compute the impulse response h(n) and the target x(n) for the adaptive codebook, find the best lag and gain (lag index) and compute the adaptive codebook contribution. Algebraic codebook: compute the target x2(n) for the algebraic codebook and find the best algebraic excitation (code index). Finally the adaptive and fixed codebook gains are quantised (gain indices), the excitation is computed and the filter memories are updated for the next subframe.]
GSM EFR Details
• Short-term analysis (Linear Prediction)
- 10th order LP analysis twice for each 20 ms frame
- Two 30 ms asymmetric windows (no lookahead)
- Represented as LSPs (Line Spectral Pairs)
- Joint predictive split matrix quantisation with 38 bits / 20 ms frame
• Adaptive codebook (ACB)
- Combined open-loop/closed-loop search
- Open-loop lag search every 10 ms
- Closed-loop lag search every 5 ms (analysis-by-synthesis)
- Fractional lag with 1/6 resolution
- Residual extended virtual lag for high-pitched voices
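The fractional lag resolution fits neatly into fixed-length codes. A short check with exact fractions counts the lag values in the 1/6-resolution table of the 12.2 kbit/s (EFR) mode and in the 1/3-resolution table of the lower AMR modes (ranges as listed in the AMR mode comparison), giving 512 = 2^9 and 256 = 2^8 values, consistent with a 9-bit absolute lag in EFR. The function name is illustrative.

```python
from fractions import Fraction
from typing import List


def lag_values(frac_lo: Fraction, frac_hi: Fraction, step: Fraction,
               int_lo: int, int_hi: int) -> List[Fraction]:
    """Fractional lags from frac_lo to frac_hi in `step` increments, then integer lags."""
    lags: List[Fraction] = []
    v = frac_lo
    while v <= frac_hi:
        lags.append(v)
        v += step
    lags.extend(Fraction(i) for i in range(int_lo, int_hi + 1))
    return lags


# 12.2 kbit/s (GSM EFR): 1/6 resolution in [17 3/6, 94 3/6], integers in [95, 143]
efr = lag_values(Fraction(35, 2), Fraction(189, 2), Fraction(1, 6), 95, 143)
# Lower AMR modes: 1/3 resolution in [19 1/3, 84 2/3], integers in [85, 143]
amr = lag_values(Fraction(58, 3), Fraction(254, 3), Fraction(1, 3), 85, 143)

if __name__ == "__main__":
    print(len(efr), len(amr))  # 512 256
```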
GSM EFR Details (cont.)
• Fixed codebook (FCB) excitation search
- Algebraic codebook with 10 pulses / 5 ms subframe
- Predefined interlaced sets of pulse positions
- Non-exhaustive search with low complexity
- Pitch sharpening to improve the coding of high-pitched voices
• Bad frame handling
- Reliable bad frame detection with 8-bit CRC
- Partial replacement of the parameters of received bad frames
- Muting adjusted dynamically according to channel error conditions
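The interlaced pulse positions can be sketched as follows, assuming the 12.2 kbit/s structure of 40-sample subframes split into 5 tracks of 8 positions with 2 pulses per track. The sign coding (one sign bit per track, shared by its two pulses) is our assumption about how the 35 bits listed in the AMR mode comparison are reached; the exhaustive-search count shows why a non-exhaustive search is needed.

```python
from typing import List

SUBFRAME_LEN = 40  # 5 ms at 8 kHz sampling
NUM_TRACKS = 5
PULSES_PER_TRACK = 2


def track_positions(track: int) -> List[int]:
    """Interlaced pulse positions: track t holds t, t+5, ..., t+35."""
    return list(range(track, SUBFRAME_LEN, NUM_TRACKS))


tracks = [track_positions(t) for t in range(NUM_TRACKS)]
positions_per_track = len(tracks[0])                    # 8
position_bits = (positions_per_track - 1).bit_length()  # 3 bits per pulse position
# Assumed sign coding: one sign bit per track, shared by its two pulses
bits_per_track = PULSES_PER_TRACK * position_bits + 1   # 7
total_bits = NUM_TRACKS * bits_per_track                 # 35

# Size of an exhaustive search over all pulse positions (signs ignored)
exhaustive_combinations = (positions_per_track ** PULSES_PER_TRACK) ** NUM_TRACKS

if __name__ == "__main__":
    print(total_bits, exhaustive_combinations)
```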
GSM EFR Bit Allocation
• LPC filter (two sets of 10 coefficients): 38 bits per 20 ms (1.9 kbit/s)
• Adaptive codebook: lag 30 bits and gain 16 bits per 20 ms; 2.3 kbit/s in total
• Fixed codebook: gain 20 bits and index 140 bits per 20 ms; 8 kbit/s in total
• Total: 244 bits per 20 ms (12.2 kbit/s)
• An additional 0.8 kbit/s is used for internal error protection (8-bit CRC and repetition coding), giving 13 kbit/s
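The EFR frame budget can be checked with simple arithmetic. The per-subframe lag breakdown (9 + 6 + 9 + 6 bits) is our understanding of the EFR structure, consistent with the 1/6-resolution search around the previous lag in subframes 2 and 4; the gain and index sizes per subframe follow the 12.2 kbit/s row of the AMR mode comparison.

```python
FRAME_MS = 20
SUBFRAMES = 4

EFR_BITS = {
    "lsp": 38,                    # two LSP sets per frame, split matrix quantisation
    "acb_lag": 9 + 6 + 9 + 6,     # absolute lag in subframes 1 and 3, relative in 2 and 4
    "acb_gain": SUBFRAMES * 4,    # scalar, 4 bits per subframe
    "fcb_gain": SUBFRAMES * 5,    # scalar, 5 bits per subframe
    "fcb_index": SUBFRAMES * 35,  # 10 pulses in 5 tracks per subframe
}


def kbps(bits: int) -> float:
    """Bits per 20 ms frame -> kbit/s (bits per millisecond)."""
    return bits / FRAME_MS


speech_bits = sum(EFR_BITS.values())          # 244
protection_bits = round(0.8 * FRAME_MS)       # 0.8 kbit/s -> 16 bits per frame
crc_bits = 8
repetition_bits = protection_bits - crc_bits  # the remainder is repetition coding

if __name__ == "__main__":
    print(speech_bits, kbps(speech_bits), kbps(speech_bits + protection_bits))
```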
EFR Channel Coding
• The EFR codec was designed to use the GSM full rate channel coding as is, with an additional 8-bit CRC for improved error detection
-> Only the speech codec needs to be updated when upgrading from full rate to enhanced full rate networks
• This provides a fast time-to-market for the EFR system, with implementation advantages and minimal additional costs
• Some internal error protection (repetition) is added for more robust operation under channel errors
EFR Channel Error Performance
• C/I = 13 dB: 2% channel bit error rate (well inside a cell)
• C/I = 10 dB: 5% channel bit error rate (inside a cell)
• C/I = 7 dB: 8% channel bit error rate (at a cell boundary)
[Figure: Channel error test (ACR, MOS 1-4.5): MOS of GSM EFR and GSM FR under error-free, C/I = 13 dB, C/I = 10 dB and C/I = 7 dB conditions, with error-free ADPCM as a reference.]
Adaptive Multi Rate Codec for GSM and 3G
• ETSI SMG11 and 3GPP TSG-SA WG4 have standardized the Adaptive Multi Rate (AMR) codec for GSM and 3G WCDMA systems
• AMR is a versatile codec "toolbox" operating at several bit rates for robust operation in mobile channels
• AMR has been selected as the mandatory speech codec for the 3G WCDMA system by 3GPP
• AMR is also the mandatory speech codec for 3G-H.324M video telephony
• For more information and specifications see: www.3gpp.org
ftp://ftp.3gpp.org/TSG_SA/WG4_CODEC/
AMR codec standardisation
• Technical work in ETSI SMG11 (GSM Speech and Quality Aspects)
• Feasibility study of the novel AMR codec concept during October 96 - October 97
• Standardisation program launched in October 97
- Codec selection through a competitive selection process
- 11 candidate codecs, two selection phases
- Joint Ericsson/Nokia/Siemens codec chosen in October 98
- Finalisation and characterisation of the standard by early 99
• AMR standard approved in ETSI in February 99 and June 99
• AMR adopted for the 3rd generation WCDMA system by 3GPP in April 99
What is AMR?
• AMR contains a set of fixed-rate speech codec modes, each of which has a different error protection level (amount of channel coding)
• The codec dynamically adapts its error protection level to the channel error and traffic conditions (link adaptation)
» It uses a lower speech coding bit rate and more error protection in bad channel conditions
• This substantially improves the robustness against channel errors (especially in GSM)
• There are also capacity benefits (e.g. by using the GSM half rate channel)
• Due to the fast power control in the 3G WCDMA system, AMR link adaptation is not as useful in WCDMA as in GSM
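AMR Operation
[Figure: Speech quality (poor to good) as a function of channel quality (bad to good) for three codec modes A, B and C; a codec mode change moves the operating point from one curve to another.]
• Link adaptation "switches" to the best of the curves A, B and C for the current channel quality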
AMR Codec Bit Rates
• The AMR codec contains 8 source codecs:
- 12.2 kbit/s (= GSM EFR): FR channel
- 10.2 kbit/s: FR
- 7.95 kbit/s: FR + HR
- 7.40 kbit/s (= US-TDMA IS-641 EFR): FR + HR
- 6.70 kbit/s (= PDC EFR): FR + HR
- 5.90 kbit/s: FR + HR
- 5.15 kbit/s: FR + HR
- 4.75 kbit/s: FR + HR
• 8 codec modes operate in the GSM FR channel and 6 in the HR channel
• All modes are used in 3G WCDMA
Channel Coding
• In GSM each AMR mode has an optimized channel codec for operation in Full Rate (22.8 kbit/s) and/or Half Rate (11.4 kbit/s) channels
• Recursive Systematic Convolutional (RSC) codes and 6-bit CRC used
• Mode bits are protected by a separate block code
• 3G WCDMA is based on generic channels (Layer 1 "toolbox"), hence mode specific channel coding cannot be used for AMR in 3G like in GSM
- Individual rate matching for the modes (by puncturing and repetition)
- Unequal error protection is implemented using different Radio Access Bearers (RABs) with different QoS requirements
Ratio of Speech and Channel Coding (GSM)
[Figure: Stacked bars of speech coding and channel coding bit rates filling the channel gross bit rate, per channel mode (FS = full rate, HS = half rate) and codec mode.]
Full-rate channel (22.8 kbit/s gross), speech coding / channel coding in kbit/s:
• FS 12.2: 12.2 / 10.6
• FS 10.2: 10.2 / 12.6
• FS 7.95: 7.95 / 14.85
• FS 7.40: 7.40 / 15.40
• FS 6.70: 6.70 / 16.10
• FS 5.90: 5.90 / 16.90
• FS 5.15: 5.15 / 17.65
• FS 4.75: 4.75 / 18.05
Half-rate channel (11.4 kbit/s gross), speech coding / channel coding in kbit/s:
• HS 7.95: 7.95 / 3.45
• HS 7.40: 7.40 / 4.00
• HS 6.70: 6.70 / 4.70
• HS 5.90: 5.90 / 5.50
• HS 5.15: 5.15 / 6.25
• HS 4.75: 4.75 / 6.65
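Since speech coding and channel coding always fill the gross rate, the channel coding rate of each mode follows by subtraction. A small sketch (names are illustrative):

```python
from typing import Tuple

GROSS_KBPS = {"FS": 22.8, "HS": 11.4}
MODES_KBPS = {
    "FS": [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75],
    "HS": [7.95, 7.40, 6.70, 5.90, 5.15, 4.75],
}


def split(channel: str, speech_kbps: float) -> Tuple[float, float]:
    """Return (channel coding kbit/s, share of the gross rate carrying speech bits)."""
    gross = GROSS_KBPS[channel]
    channel_coding = round(gross - speech_kbps, 2)
    speech_share = speech_kbps / gross
    return channel_coding, speech_share


if __name__ == "__main__":
    for ch, modes in MODES_KBPS.items():
        for m in modes:
            cc, share = split(ch, m)
            print(f"{ch} {m:5.2f}: channel coding {cc:5.2f} kbit/s, speech share {share:.0%}")
```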
Codec Mode Adaptation (GSM)
• The codec mode adaptation chooses the optimum codec mode as a function of channel quality (or e.g. network load)
• The most robust mode is chosen in poor channel conditions, while the codec mode providing the best clean-channel quality is chosen in good channel conditions
• Codec mode adaptation is based on channel quality measurements (C/I estimates) made in the mobile and in the network
• Based on the measurements, a Codec Mode Command (over the downlink to the MS) or a Codec Mode Request (over the uplink to the network) is sent over the air interface using inband signalling
• In GSM, the inband signalling supports a set of up to 4 codec modes (2 bits) selected at call set-up or handover. In 3G all modes can be used.
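Codec mode adaptation can be sketched as a threshold rule with hysteresis over the active codec set. The active codec set below uses the three modes of the AMR adaptation example (5.90, 7.95 and 12.2 kbit/s), but the C/I thresholds and hysteresis values are hypothetical, chosen only for illustration; this is not the normative GSM procedure.

```python
from typing import List, Sequence

ACTIVE_CODEC_SET = [5.90, 7.95, 12.2]  # kbit/s, ordered from most robust to highest rate
THRESHOLDS_DB = [8.0, 13.0]            # hypothetical: between modes 0/1 and 1/2
HYSTERESIS_DB = [2.0, 2.0]             # hypothetical


def next_mode(current: int, ci_db: float, thresholds: Sequence[float],
              hysteresis: Sequence[float]) -> int:
    """One adaptation step; thresholds[i] separates mode i from mode i + 1."""
    if current < len(thresholds) and ci_db >= thresholds[current] + hysteresis[current]:
        return current + 1  # channel clearly good enough: step up one mode
    if current > 0 and ci_db < thresholds[current - 1]:
        return current - 1  # channel worse: step down to a more robust mode
    return current


def adapt(ci_trace: Sequence[float], start: int, thresholds: Sequence[float],
          hysteresis: Sequence[float]) -> List[int]:
    """Run the rule over a C/I trace and return the selected mode indices."""
    modes, mode = [], start
    for ci in ci_trace:
        mode = next_mode(mode, ci, thresholds, hysteresis)
        modes.append(mode)
    return modes


if __name__ == "__main__":
    trace = [20, 12, 12, 7, 9, 11, 16]  # C/I in dB
    selected = adapt(trace, 2, THRESHOLDS_DB, HYSTERESIS_DB)
    print([ACTIVE_CODEC_SET[i] for i in selected])
```

The hysteresis keeps a C/I value hovering near a threshold (such as 9 dB above) from toggling the mode every frame.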
AMR Adaptation Example
[Figure: C/I (0-25 dB) over 12.5 s together with the selected AMR mode (12.2 kbit/s = GSM EFR, 7.95 kbit/s, 5.90 kbit/s), showing the codec mode following the channel quality.]
AMR System Block Diagram (GSM)
[Figure: AMR system in GSM. The mobile station (MS) and the network side (base transceiver station (BTS) and transcoder (TC), connected over Abis/Ater) each contain multi-rate speech and channel encoders/decoders. Uplink and downlink quality estimators and mode adaptors perform link adaptation separately for the uplink and downlink radio channels, exchanging mode commands, mode requests and mode indicators via in-band signalling.]
Legend: S = speech data; Q = channel quality; in-band signalling: MC = Mode Command, MR = Mode Request, MI = Mode Indicator; u = uplink, d = downlink.
AMR Speech Codec Key Features
• Frame length: 20 ms with four subframes in all modes; 5 ms lookahead (except in the 12.2 kbit/s mode)
• LPC analysis: four different LSF tables: 38 bits (EFR), 27 bits, 26 bits (IS-641) and 23 bits
• Adaptive codebook: GSM EFR and IS-641 based open-loop/closed-loop search in all modes
• Fixed codebook: ACELP codebooks in all modes, using 10, 8, 4, 3 and 2 pulses
• Quantisation: GSM EFR (scalar) or IS-641 (joint VQ) based codebook gain quantisation in all modes
• Post-processing: formant postfilter with tilt compensation and 60 Hz HP filter in all modes (EFR-type post-processing of excitation elements is also included). Anti-sparseness processing for the 7.95 kbit/s and lower modes.
Comparison of AMR modes
Encoder: pre-processing (20 ms), LPC (20 ms), open-loop pitch analysis (10 ms), adaptive codebook (5 ms), fixed codebook (5 ms), codebook gain quantisation (5 ms). Decoder: post-processing (20 ms).
• All modes:
- Pre-processing: HP filter (80 Hz)
- LPC (except 12.2): one asymmetric window, vector quantisation of the LSFs, 5 ms lookahead; LSF bits for 12.2 / 10.2 / 7.95 / 7.4 / 6.7 / 5.9 / 5.15 / 4.75 = 38 / 26 / 27 / 26 / 26 / 26 / 23 / 23
- Open-loop pitch analysis: EFR-type with 3 lag ranges, except in the 10.2 mode, which weights the correlation function in favour of low lag values and values neighbouring previous lags
- Post-processing: short-term postfilter with tilt compensation and 60 Hz HP filter, plus anti-sparseness processing in the lower modes
• 12.2 (GSM EFR): two asymmetric windows, split matrix quantisation (LSFs with 38 bits), no lookahead (5 ms "dummy lookahead"); adaptive codebook resolution 1/6 in [17 3/6, 94 3/6] and 1 in [95, 143], 2nd and 4th subframes always 1/6, searched around the previous lag; 10 pulses searched in 5 tracks, quantised to 35 bits; scalar gain quantisation with 5 bits (fixed codebook) and 4 bits (adaptive codebook)
• 10.2: adaptive codebook resolution 1/3 in [19 1/3, 84 2/3] and 1 in [85, 143], 2nd and 4th subframes always 1/3, searched around the previous lag; 8 pulses searched in 4 tracks, quantised to 31 bits; joint gain VQ with 7 bits
• 7.95: adaptive codebook as in 10.2; 4 pulses searched in 4 tracks, quantised to 17 bits; scalar gain quantisation with 5 and 4 bits
• 7.4 (D-AMPS EFR): adaptive codebook as in 10.2; 4 pulses searched in 4 tracks, quantised to 17 bits; joint gain VQ with 7 bits
• 6.7: adaptive codebook resolution 1/3 in [19 1/3, 84 2/3] and 1 in [85, 143], 2nd and 4th subframes integer or 1/3, searched around the previous lag; 3 pulses searched in 3 tracks, quantised to 14 bits; joint gain VQ with 7 bits
• 5.9: adaptive codebook as in 6.7; 2 pulses searched in 2 tracks, quantised to 11 bits; joint gain VQ with 6 bits
• 5.15 and 4.75: adaptive codebook resolution 1/3 in [19 1/3, 84 2/3] and 1 in [85, 143], 2nd, 3rd and 4th subframes integer or 1/3, searched around the previous lag; 2 pulses searched in 2 tracks, quantised to 9 bits; joint gain VQ with 8 bits over 10 ms
AMR codec complexity: WMOPS
Table I: Theoretical worst-case complexity in wMOPS (calculated from C code implemented with ETSI TCH-HS basic operations)
• 12.2: encoder 13.95, decoder 2.27, total 16.22
• 10.2: encoder 13.56, decoder 2.29, total 15.85
• 7.95: encoder 14.08, decoder 2.49, total 16.58
• 7.4: encoder 12.93, decoder 2.20, total 15.12
• 6.7: encoder 13.93, decoder 2.45, total 16.37
• 5.9: encoder 11.25, decoder 2.53, total 13.77
• 5.15: encoder 9.55, decoder 2.52, total 12.07
• 4.75: encoder 11.53, decoder 2.52, total 14.05
• AMR theoretical worst case (TWC)*: encoder 14.08, decoder 2.53, total 16.61
• For comparison, total source codec TWC**: EFR 15.21, FR 2.95, HR 18.47
*) "Complexity verification report of the AMR codec, Version: 2.0", Source: Alcatel, Philips, ST Microelectronics, Texas Instruments, Tdoc ETSI SMG11 117/99 (SMG11#10, Sophia Antipolis, 3-7 May 1999)
**) AMR permanent document "Complexity and Delay Assessment", v1.2, ETSI SMG11 1998
AMR codec complexity: RAM, ROM
Table II: RAM in 16-bit words (static / dynamic / total)
• AMR source encoder*: 1429 / 3039 / 4468
• AMR source decoder*: 811 / 946 / 1757
• Total AMR source codec*: 2240 / 3039 / 5819
• Total EFR source codec**: total 4711
• Total FR source codec**: total 1201
• Total HR source codec**: total 4636
Table III: ROM (table ROM in 16-bit words; program ROM as source code size in number of operators)
• Total AMR source codec: table ROM 14343, program ROM 4830
• Total EFR source codec**: table ROM 5267
• Total FR source codec**: table ROM 80
• Total HR source codec**: table ROM 7881
*) "Complexity verification report of the AMR codec, Version: 2.0", Source: Alcatel, Philips, ST Microelectronics, Texas Instruments, Tdoc ETSI SMG11 117/99 (SMG11#10, Sophia Antipolis, 3-7 May 1999)
**) AMR permanent document "Complexity and Delay Assessment", v1.2, ETSI SMG11 1998
Performance requirements: clean speech
• GSM FR channel:
- ≥ 13 dB C/I: quality equal to EFR in an error-free channel
- At 4 dB C/I: quality equal to EFR at 10 dB C/I
• GSM HR channel:
- ≥ 16 dB C/I: quality equal to G.728 in an error-free channel
- < 16 dB C/I: quality equal to the GSM FR codec at the same C/I
Requirement per C/I condition (FR channel / HR channel):
• No errors: EFR no errors / G.728 no errors
• 19 dB: EFR no errors / G.728 no errors
• 16 dB: EFR no errors / G.728 no errors
• 13 dB: EFR no errors / FR at 13 dB
• 10 dB: G.728 no errors / FR at 10 dB
• 7 dB: G.728 no errors / FR at 7 dB
• 4 dB: EFR at 10 dB / FR at 4 dB
(Clean speech = no background noise)
Performance requirements: speech in background noise
• GSM FR channel:
- ≥ 13 dB C/I: quality equal to EFR in an error-free channel
- At 4 dB C/I: quality equal to FR at 10 dB C/I
• GSM HR channel:
- ≥ 16 dB C/I: quality equal to G.729 and FR in an error-free channel
- < 16 dB C/I: quality equal to the GSM FR codec at the same C/I
Requirement per C/I condition (FR channel / HR channel):
• No errors: EFR no errors / EFR no errors
• 19 dB: EFR no errors / G.729 and FR no errors
• 16 dB: EFR no errors / G.729 and FR no errors
• 13 dB: EFR no errors / FR at 13 dB
• 10 dB: G.729 and FR no errors / FR at 10 dB
• 7 dB: G.729 and FR no errors / FR at 7 dB
• 4 dB: FR at 10 dB / FR at 4 dB
AMR Performance
Performance in GSM FR channel
• At low error rates (C/I ≥ 13 dB): equivalent to EFR without errors
• At 4 dB C/I:
→ Still equivalent or close to EFR at 10 dB C/I (i.e. about 2 MOS improvement over EFR), and
→ In background noise, equivalent or close to FR at 10 dB C/I
Performance in GSM HR channel
• At low error rates (C/I ≥ 16 dB):
→ Equivalent to G.728 ("wireline"), and
→ In background noise, equivalent to G.729/FR
• At high error rates:
→ Equivalent to FR, and
→ In background noise, FR or somewhat lower
AMR FR for clean speech
[Figure: MOS (1-5) of the requirement, AMR-FR and EFR under no errors and C/I = 16, 13, 10, 7, 4 and 1 dB.]
AMR FR for clean speech: performance curves for each codec mode
[Figure: MOS (1-5) of EFR and of each AMR codec mode (12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15 and 4.75 kbit/s) in the FR channel under no errors and C/I = 16, 13, 10, 7, 4 and 1 dB.]
AMR FR for background noise: speech with 15 dB SNR car noise
[Figure: DMOS (1-5) of the requirement, AMR-FR, EFR, FR and G.729 under no errors and C/I = 16, 13, 10, 7, 4 and 1 dB.]
AMR HR for background noise: speech with 20 dB SNR office noise
[Figure: DMOS (1-5) of the requirement, AMR-HR, EFR, FR and HR under no errors and C/I = 19, 16, 13, 10, 7 and 4 dB.]
AMR HR for clean speech
[Figure: MOS (1-5) of the requirement, AMR-HR, EFR, FR and HR under no errors and C/I = 19, 16, 13, 10, 7 and 4 dB.]
AMR Codec Performance: Dynamic Channel
• Dynamic channel test designed to evaluate AMR performance in a realistic radio environment with codec adaptation turned on
[Figure: Experiment 4a (FR channel): MOS (1-5) of EFR and AMR in dynamic C/I conditions DEC1-DEC5 and DEC1+DTX. Experiment 4b (HR channel): MOS (1-5) of G.729/FR and AMR in the same conditions.]
AMR performance summary
• AMR provides substantial performance improvement over the previous GSM codecs in terms of error robustness and channel capacity.
• In the FR channel:
- AMR provides error-free EFR quality down to 13 dB C/I, and about 2 MOS improvement over EFR at 4 dB C/I
- AMR extends the wireline-quality operating region from about C/I ≥ 10 dB in EFR to about C/I ≥ 4-7 dB
- In typical dynamic error conditions AMR gives up to over 1 MOS improvement in speech quality (with adaptation on)
• In the HR channel:
- AMR provides quality comparable to wireline down to 16 dB C/I, and at least FR quality at higher error rates
AMR conclusions
• The GSM AMR codec standardisation was carried out during 1997-1999 as a competitive selection process involving several phases
• Joint Ericsson/Nokia/Siemens AMR codec selected
• AMR gives substantial improvement over the previous GSM codecs, in particular
- in error robustness in the full-rate channel
- in providing high speech quality in the half-rate channel
• AMR provides high granularity of bit rates between 12.2 and 4.75 kbit/s with seamless switching between modes
• Good performance and high granularity of bit rates make AMR attractive also for systems and applications other than GSM
• AMR has been adopted as the mandatory speech codec for 3rd generation WCDMA system
AMR in WCDMA
• Generic channel coding toolbox
- Flexibility in design
- Unequal or equal error protection
• Fast power control
- No need for fast AMR link adaptation
• Adaptation is used for optimising the cell capacity (when there is too much interference, i.e. too many users, all or some of the users are dropped to lower (= more robust) rates so that new users can be accommodated)
• Adaptation can also extend the cell radius (when users approach the cell limit, they drop to lower rates)
WCDMA generic channel coding toolbox
• Radio network operator has great flexibility to design the error protection scheme and QoS parameters
• Unequal or equal error protection is available (UEP / EEP)
- QoS, speech quality and capacity issues drive the selection between UEP and EEP
- UEP enables lower transmission power (higher capacity) than EEP for the same QoS and speech quality
• Same channel coding algorithm for each AMR mode
- Lower AMR modes provide higher capacity
Fast power control
• Inner power control loop
- Controls the transmission power based on the channel quality
- Objective is to maintain the QoS (FER) parameters
• Outer power control adjusts the AMR mode
- The AMR mode is decreased if the FER target is not achieved and the transmission power is saturating
- Objective is to maintain the target capacity
• Fast AMR link adaptation is not necessarily needed
- Power control minimises the effect of channel fading etc.
- Mode requests use out-of-band signalling
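The two loops can be sketched as simple rules. Using SIR as the channel-quality measure, the 1 dB power step, the 24 dBm power limit and the function names are all hypothetical choices for illustration, not WCDMA parameters:

```python
MAX_POWER_DBM = 24.0  # hypothetical transmitter limit


def inner_loop_step(power_dbm: float, measured_sir_db: float, target_sir_db: float,
                    step_db: float = 1.0, max_power_dbm: float = MAX_POWER_DBM) -> float:
    """Inner loop: raise power when quality is below target, lower it otherwise."""
    power_dbm += step_db if measured_sir_db < target_sir_db else -step_db
    return min(power_dbm, max_power_dbm)


def power_saturated(power_dbm: float, max_power_dbm: float = MAX_POWER_DBM) -> bool:
    return power_dbm >= max_power_dbm


def outer_loop_step(mode_index: int, measured_fer: float, target_fer: float,
                    saturated: bool) -> int:
    """Outer loop: step down to a lower (more robust) AMR mode only when the
    FER target is missed and the transmitter has no power headroom left."""
    if measured_fer > target_fer and saturated and mode_index > 0:
        return mode_index - 1
    return mode_index


if __name__ == "__main__":
    power = inner_loop_step(23.5, measured_sir_db=3.0, target_sir_db=5.0)
    mode = outer_loop_step(3, measured_fer=0.02, target_fer=0.01,
                           saturated=power_saturated(power))
    print(power, mode)
```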
List of 3G AMR Speech Coding Specifications
• TS 26.071 AMR speech codec; General description
• TS 26.073 AMR speech codec; C-source code
• TS 26.074 AMR speech codec; Test sequences
• TS 26.090 AMR speech codec; Transcoding functions
• TS 26.091 AMR speech codec; Error concealment of lost frames
• TS 26.092 AMR speech codec; Comfort noise for AMR speech traffic channels
• TS 26.093 AMR speech codec; Source controlled rate operation
• TS 26.094 AMR speech codec; Voice activity detector for AMR speech traffic channels
• TS 26.101 AMR speech codec; Frame structure
• TS 26.102 AMR speech codec; Interface to Iu and Uu
• TS 26.103 Codec lists
• TS 26.104 AMR speech codec; Floating point C-code
• Available from www.3gpp.org
Why Wideband Speech?
• Superior speech quality compared with current narrowband speech services
-> Exceeds the quality of PSTN phones
• Current mobile systems are based on narrowband speech (100-3600 Hz band)
-> Important high-frequency components are lost (e.g. in 's' sounds)
• Wideband uses the 50-7000 Hz band
-> Improved naturalness, presence and intelligibility
• Especially suitable for applications with high-quality audio parts
Wideband vs. Narrowband
[Figure: Spectrograms (frequency 0-8000 Hz vs. time 0-2.5 s) of the same uncoded speech as wideband (0-8 kHz) and as narrowband (0-4 kHz).]
AMR WB Standardization in 3GPP
• Initiated based on a feasibility study in ETSI SMG11 (2Q-1999)
• Initially 9 candidates; 5 companies qualified for the selection phase: Ericsson, FDNS consortium (FT, DT, Nortel, Siemens), Motorola, Nokia and Texas Instruments
• Nokia WB codec selected as the best codec in 3GPP TSG S4 meeting in October 2000
• The final specifications were approved in March 2001 (Release 4)
• The AMR-WB codec has also been selected as the ITU WB codec G.722.2
-> First common codec between wireless and fixed networks
AMR WB Codec Components
• The selected codec consists of the following parts:
- Speech codec
- VAD (integrated into the speech codec)
- Comfort noise generation (using the same number of bits (35) as the AMR NB comfort noise)
- DTX logic (same as used in the AMR NB)
- Link adaptation (AMR NB methods can be used)
- 7 channel codecs for the GSM FR channel (for speech codec modes below 22.8 kbit/s)
- Example channel codecs for EDGE FR/HR channels (for all 9 speech codec modes)
Nokia AMR WB Speech Codec
• Collaboration with VoiceAge (Univ. of Sherbrooke, Canada)
• ACELP technology very similar to AMR NB and GSM EFR
• Multirate codec: 9+1 modes
- Same coding algorithm in each mode
- Very high code and data ROM reuse between modes (much better than in the AMR NB codec)
• Link adaptation
- Similar functionality to AMR NB in GSM
- The same link adaptation can be applied
• AMR NB DTX handling and DTX frames are used
AMR-WB Technology
• ACELP technology employed in AMR-NB and EFR is utilized
• Multirate codec: 9+1 modes
- Nine modes for speech (23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, 6.6 kbit/s)
- One mode for comfort noise (DTX) (1.75 kbit/s)
• Innovative two-band approach:
- 50-6400 Hz coded with the ACELP algorithm (12.8 kHz sampling rate)
- 6400-7000 Hz reconstructed based on lower band parameters
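The two-band split is tied to the internal sampling rate: 12.8 kHz is 4/5 of the 16 kHz input rate, and its Nyquist frequency is exactly the 6400 Hz upper edge of the ACELP-coded band. The Python sketch below works through that arithmetic, assuming 20 ms frames (not stated on this slide, but consistent with the bits-per-frame table later in this section); the variable names are illustrative.

```python
from fractions import Fraction

INPUT_RATE_HZ = 16000  # wideband A/D and D/A rate
CORE_RATE_HZ = 12800   # internal rate of the ACELP-coded lower band
FRAME_MS = 20          # assumed frame length

# Resampling ratio between the 16 kHz input and the ACELP core.
resample_ratio = Fraction(CORE_RATE_HZ, INPUT_RATE_HZ)

# Samples per frame at each rate.
input_frame_len = INPUT_RATE_HZ * FRAME_MS // 1000
core_frame_len = CORE_RATE_HZ * FRAME_MS // 1000

# The core can only represent frequencies up to half its sampling rate,
# which coincides with the upper edge of the ACELP-coded band.
core_band_edge_hz = CORE_RATE_HZ // 2

# The band above the core edge is not coded directly; the decoder rebuilds it.
high_band_hz = (core_band_edge_hz, 7000)

print(resample_ratio, input_frame_len, core_frame_len, core_band_edge_hz, high_band_hz)
```

Running the ACELP search on 256 rather than 320 samples per frame is one plausible reason this split keeps complexity manageable while still delivering a 7 kHz audio band.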
AMR-WB block diagram
[Figure: AMR-WB block diagram. Speech encoder: input speech -> decimation to 12.8 kHz -> LP analysis -> adaptive codebook search -> algebraic codebook search; the lower band parameters are sent over the transmission channel. Speech decoder: adaptive and algebraic codebook reconstruction -> LP synthesis -> interpolation to 16 kHz. Higher band decoder: 16 kHz random excitation, driven by the lower band parameters and a higher band gain (23.85 mode), high-pass filtered and added to the lower band to give the reconstructed speech.]
Nokia AMR WB Speech Codec
• Low rate modes for the GSM FR channel:
-> 6.60, 8.85, 12.65, 14.25 kbit/s (15.85, 18.25 and 19.85 kbit/s require more than 16 kbit/s submultiplexing on Abis)
- The lowest rates are only used in bad channel conditions
• High bit rate modes for EDGE and WCDMA channels
-> 15.85, 18.25, 19.85, 23.05, 23.85 kbit/s
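The mode-to-channel grouping on this slide can be captured as a simple lookup. In the Python sketch below, the channel labels, the `abis_16k_only` flag and the `allowed_modes` helper are hypothetical. The 14.25 kbit/s ceiling for GSM FR with 16 kbit/s Abis submultiplexing follows the conclusions later in this deck, and the EDGE/WCDMA set follows the grouping shown here.

```python
# AMR-WB speech modes in kbit/s, lowest to highest.
AMR_WB_MODES = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]


def allowed_modes(channel, abis_16k_only=True):
    """Return the AMR-WB modes (kbit/s) grouped with a channel type on this slide."""
    if channel == "GSM_FR":
        # With only 16 kbit/s Abis submultiplexing the top usable mode is 14.25;
        # wider Abis submultiplexing opens the modes up to 19.85.
        limit = 14.25 if abis_16k_only else 19.85
        return [m for m in AMR_WB_MODES if m <= limit]
    if channel in ("EDGE", "WCDMA"):
        # High bit rate modes for EDGE and WCDMA channels.
        return [m for m in AMR_WB_MODES if m >= 15.85]
    raise ValueError(f"unknown channel type: {channel}")


print(allowed_modes("GSM_FR"))
print(allowed_modes("WCDMA"))
```

A link adaptation algorithm would then choose within the returned set, dropping to the lowest rates only in bad channel conditions.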
AMR WB Speech Quality vs. ITU-T G.722 WB
[Figure: speech quality of Nokia AMR WB at each bit rate (23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, 6.6 kbit/s) compared with G.722 at 64, 56 and 48 kbit/s]
Subjective Test Results
[Figure, two MOS charts (scale 1.0-5.0). First: AMR-WB 14.25, 12.65, 8.85 and 6.6 kbit/s in error-free conditions and at C/I 13, 10, 7 and 4 dB, compared with the 3GPP requirements and ITU-T G.722 at 64, 56 and 48 kbit/s. Second: AMR-WB 23.05 kbit/s error-free and at FER 1.0% / RBER 0.1% on uplink and downlink, compared with the requirements and ITU-T G.722 at 64, 56 and 48 kbit/s.]
• All 160 3GPP speech quality requirements were met
• The high granularity of bit rates and mode adaptation enable high quality in erroneous channels
• 3GPP test results (in Japanese) indicate quality equal to G.722 at 64 kbit/s even in erroneous channels
• 3GPP test results (in English) indicate high quality of the lower modes even with channel errors
Applications for Wide Band Speech
• Wide band telephony
- AMR NB quality is equal to PSTN speech quality; AMR WB improves the quality and provides naturalness
• Conferencing (conversational multimedia)
- Quality improvement over the current codecs (G.722 at 48 and 56 kbit/s)
- Bit rate drops to half or less compared to G.722
• Streaming
- Low complexity, low bit rate solution for browsing-type applications
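The conferencing claim that the bit rate drops to half or less can be checked against the G.722 rates named above (48 and 56 kbit/s) and the 64 kbit/s rate used in the quality charts. This Python sketch compares the highest AMR-WB mode with each of them; the names are illustrative.

```python
AMR_WB_MODES = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]  # kbit/s
G722_RATES = [48, 56, 64]  # kbit/s


def rate_ratio(amr_kbps, g722_kbps):
    """Fraction of the G.722 bit rate needed by an AMR-WB mode."""
    return amr_kbps / g722_kbps


# Worst case: the highest AMR-WB mode against the lowest G.722 rate.
worst = max(rate_ratio(max(AMR_WB_MODES), g) for g in G722_RATES)
print(f"highest AMR-WB mode / lowest G.722 rate: {worst:.3f}")
```

Even in this worst case, 23.85 kbit/s against 48 kbit/s, the ratio is just under 0.5, so every AMR-WB mode needs half or less of any of the listed G.722 rates.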
AMR WB applications
[Figure: AMR WB bit rate ranges by application (kbit/s axis), plotted against ITU speech quality levels]
• GSM FS channel: 6.60-14.25 (19.85) kbit/s
• EDGE FR/HR channels: 15.85-23.85 kbit/s
• UTRAN WCDMA channel: 15.85-23.85 kbit/s
• VoIP
Introduction of AMR WB into existing and future systems
• 16 kHz sampling frequency in A/D and D/A
• Acoustic design of handset
• Other modifications are similar to the changes required when adopting a new narrowband codec
• Circuit-switched wideband speech service
- For existing GSM FR, EDGE, WCDMA
- Requires Tandem Free Operation (TFO)
• Dedicated packet switched speech service
• Packet switched conversational multimedia to be standardized
Complexity of Nokia AMR WB Speech Codec
• Algorithmic delay 25 ms (same as AMR NB)
• No common parts with AMR NB

                               AMR WB   AMR NB   WB/NB [%]
WMOPS                            35.4    16.75   ~200
Data RAM                         6.42     5.28   ~120
Data ROM                         9.94    14.57    ~70
Program ROM (ETSI basic ops)     3771     4851    ~80
Complexity of Nokia AMR WB channel Codec (GSM FR only)
                               AMR WB   AMR NB   WB/NB [%]
WMOPS                            3.45     5.20    ~65
Data RAM                         2.88     2.43   ~120
Data ROM                         3.18     4.28    ~75
Program ROM (ETSI basic ops)      579     1366    ~40
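The WB/NB percentages in the two complexity tables are rounded ratios of the listed figures. This Python sketch recomputes them from the values copied out of the tables. The units are not stated on the slides, but they cancel in the ratio.

```python
# (metric, AMR WB, AMR NB) as listed in the complexity tables.
SPEECH_CODEC = [("WMOPS", 35.4, 16.75), ("Data RAM", 6.42, 5.28),
                ("Data ROM", 9.94, 14.57), ("Program ROM", 3771, 4851)]
CHANNEL_CODEC = [("WMOPS", 3.45, 5.20), ("Data RAM", 2.88, 2.43),
                 ("Data ROM", 3.18, 4.28), ("Program ROM", 579, 1366)]


def wb_nb_percent(rows):
    """Return {metric: WB/NB ratio in percent, rounded to an integer}."""
    return {name: round(100 * wb / nb) for name, wb, nb in rows}


print("speech codec: ", wb_nb_percent(SPEECH_CODEC))
print("channel codec:", wb_nb_percent(CHANNEL_CODEC))
```

The exact speech codec ratios (about 211% WMOPS, 122% data RAM, 68% data ROM and 78% program ROM) show that the "~" figures on the slides are coarse roundings. The overall picture is unchanged: the wideband speech codec roughly doubles the computational load, while its ROM footprint is smaller.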
Nokia AMR WB Speech Codec
Bit rates with importance classification:
- Class A: subjectively most important bits
- Bit errors in class A => frame error concealment

Mode   Class A   Class B   Total (bits/frame)   Bit rate (kbit/s)
0         54        78          132                  6.60
1         64       113          177                  8.85
2         72       181          253                 12.65
3         72       213          285                 14.25
4         72       245          317                 15.85
5         72       293          365                 18.25
6         72       325          397                 19.85
7         72       389          461                 23.05
8         72       405          477                 23.85
DTX       35         0           35                   -
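Each bit rate in the table equals the total bits per frame divided by the frame duration. Assuming 20 ms frames (an assumption, but one that reproduces every rate in the table and gives 1.75 kbit/s for the 35-bit DTX frame), the Python sketch below checks this. The `needs_concealment` helper is hypothetical and models only the slide's rule that class A errors trigger frame error concealment. How errors are detected (e.g. by a checksum over the class A bits) is outside this sketch.

```python
FRAME_MS = 20  # assumed frame duration

# mode: (class A bits, class B bits), copied from the table above.
BIT_ALLOCATION = {
    0: (54, 78), 1: (64, 113), 2: (72, 181), 3: (72, 213), 4: (72, 245),
    5: (72, 293), 6: (72, 325), 7: (72, 389), 8: (72, 405),
    "DTX": (35, 0),
}


def bit_rate_kbps(mode):
    """Total bits per frame divided by frame duration (bits per ms == kbit/s)."""
    class_a, class_b = BIT_ALLOCATION[mode]
    return (class_a + class_b) / FRAME_MS


def needs_concealment(class_a_errors):
    """Any error in the class A bits triggers frame error concealment;
    errors confined to class B do not."""
    return class_a_errors > 0


for mode in BIT_ALLOCATION:
    print(mode, f"{bit_rate_kbps(mode):.2f} kbit/s")
```

Since class A stays at 72 bits from mode 2 upwards, the extra bits in the higher modes all go to class B, where an error degrades the frame rather than discarding it.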
Bitrate Allocation Between Speech and Channel Coding
GMSK FS channel, bit rates in kbit/s:

Codec mode   Speech coding   Channel coding
6               19.85            2.95
5               18.25            4.55
4               15.85            6.95
3               14.25            8.55
2               12.65           10.15
1                8.85           13.95
0                6.60           16.20
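Every speech/channel pair in this allocation sums to 22.8 kbit/s, the gross bit rate of the GSM full-rate traffic channel. Whatever the speech codec does not use is therefore available for channel coding. The Python sketch below recomputes the split and the share of the gross rate carrying speech bits; the function names are illustrative.

```python
GROSS_KBPS = 22.8  # gross bit rate of the GSM full-rate (GMSK) traffic channel

# Speech coding rate per AMR-WB codec mode on this channel, in kbit/s.
SPEECH_KBPS = {6: 19.85, 5: 18.25, 4: 15.85, 3: 14.25, 2: 12.65, 1: 8.85, 0: 6.60}


def channel_coding_kbps(mode):
    """Gross rate left over for channel coding after the speech bits."""
    return round(GROSS_KBPS - SPEECH_KBPS[mode], 2)


def effective_code_rate(mode):
    """Share of the gross rate that carries speech bits (lower = more protection)."""
    return SPEECH_KBPS[mode] / GROSS_KBPS


for mode in sorted(SPEECH_KBPS, reverse=True):
    print(mode, SPEECH_KBPS[mode], channel_coding_kbps(mode),
          f"{effective_code_rate(mode):.2f}")
```

The speech share rises from about 0.29 at 6.60 kbit/s to about 0.87 at 19.85 kbit/s. This is why the lowest modes, with the most redundancy, are reserved for bad channel conditions.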
Conclusions
• Nokia’s AMR WB codec standardised by 3GPP and ITU-T
• Bit-exact fixed-point C-code for the AMR WB codec is already available for implementation (still subject to small changes)
• AMR WB codec offers superior voice quality over the existing narrowband services (cellular systems, PSTN)
• High quality wideband speech service can be offered with bit rates ranging from 12.65 up to 23.85 kbit/s
- The voice quality improvement is possible even in the existing GSM FR traffic channel with 16 kbit/s submultiplexing in Abis (12.65, 14.25 kbit/s)
- The best voice quality can be offered in EDGE FR/HR and WCDMA channels (15.85 - 23.85 kbit/s)
For specifications and C-code see: www.3gpp.org
3GPP AMR-WB Specifications
TS 26.171 AMR speech codec, wideband; General description
TS 26.173 ANSI-C code for the Adaptive Multi-Rate - Wideband (AMR-WB) speech codec
TS 26.174 AMR speech codec, wideband; Test sequences
TS 26.190 Mandatory speech codec speech processing functions; AMR wideband speech codec; Transcoding functions
TS 26.191 AMR speech codec, wideband; Error concealment of lost frames
TS 26.192 Mandatory speech codec speech processing functions; AMR wideband speech codec; Comfort noise aspects
TS 26.193 AMR speech codec, wideband; Source controlled rate operation
TS 26.194 Mandatory speech codec speech processing functions; AMR wideband speech codec; Voice Activity Detector (VAD)
TS 26.201 AMR speech codec, wideband; Frame structure
TS 26.202 AMR speech codec, wideband; Interface to Iu and Uu
TS 26.204 ANSI-C code for the floating-point Adaptive Multi-Rate - Wideband (AMR-WB) speech codec