1 © NOKIA 2005 Speech Coding Jari Hagqvist et al. Multimedia Technologies Laboratory Nokia Research Center Tampere, Finland 2006

1 © NOKIA 2005

Speech Coding

Jari Hagqvist et al.
Multimedia Technologies Laboratory
Nokia Research Center
Tampere, Finland

2006

2 © NOKIA 2005

Outline
• Introduction to speech coding
• Speech Processing in GSM Mobiles
• GSM Air Interface Overview
• GSM Full Rate
• GSM Half Rate
• GSM Enhanced Full Rate
• Adaptive Multi Rate (AMR)
• AMR Wideband Codec
• AMR-WB+ Audio Codec

3 © NOKIA 2005

What is coding for?
• Coding = signal compression
• Reduces the bit rate of a digital signal for efficient transport or storage
• The (en)coder input is a digital signal (after A/D conversion)
  - Speech codecs: 8 kHz sampling for narrowband (200-3400 Hz audio bandwidth, 128 kbit/s); 16 kHz sampling for wideband (50-7000 Hz audio bandwidth, 256 kbit/s)
  - Audio codecs: usually 44.1 or 48 kHz sampling (full audio band, 705.6 or 768 kbit/s per channel)
• The (en)coder output is a lower-rate (smaller) digital signal
  - Speech codecs: 1 … 16 kbit/s
  - Audio codecs: 16 … 256 … 1411 kbit/s
• The decoder reverses the encoding process and provides the original digital signal (or an approximation thereof)

[Figure: Input signal → Encoder → bit-stream → communication channel or storage medium → bit-stream → Decoder → synthesized signal]
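The uncompressed rates quoted on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes 16-bit linear PCM samples; the slide does not state the sample width, but 16 bits is the value that yields its figures. The function name `pcm_rate_kbps` is illustrative only.

```python
def pcm_rate_kbps(sample_rate_hz, bits_per_sample=16):
    """Bit rate of uncompressed PCM in kbit/s (assumes 16-bit samples by default)."""
    return sample_rate_hz * bits_per_sample / 1000


if __name__ == "__main__":
    for name, fs in [("narrowband", 8000), ("wideband", 16000),
                     ("audio 44.1 kHz", 44100), ("audio 48 kHz", 48000)]:
        print(f"{name}: {pcm_rate_kbps(fs)} kbit/s")
    # Compression ratio of a 12.2 kbit/s speech codec on narrowband input
    print(round(pcm_rate_kbps(8000) / 12.2, 1))
```

Against the 128 kbit/s narrowband input, a 12.2 kbit/s speech codec therefore compresses by roughly a factor of ten.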

4 © NOKIA 2005

Why speech coding?
• Speech signals contain a lot of redundancy (repetitive waveforms, correlation)
• Speech codecs are used to pack the signal for efficient digital transmission or storage
• Codecs utilize speech signal properties and properties of human hearing
• Usage areas for speech codecs:
  - Telephony (landline PSTN, mobile, VoIP, satellite)
  - Streaming
  - Video conferencing
  - MMS
  - Voice recording
  - Games, toys

5 © NOKIA 2005

Human Speech Production

[Figure: schematic of the human vocal apparatus, labelled: lungs, trachea, larynx tube, glottis, pharynx cavity, velum, oral cavity, nasal cavity, mouth output, nose output]

6 © NOKIA 2005

Speech signal properties

[Figure: waveform of the utterance “At twilight of the twelfth day…”, amplitude (-1 to 1) vs. time (0-4 s), with an unvoiced (noise-like) segment "s" and a voiced (periodic) segment "a" marked]

7 © NOKIA 2005

Source-filter speech production model

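[Figure: source-filter model. An impulse train generator feeds the glottal pulse model G(z) and is scaled by gain g1; a random noise generator is scaled by gain g2. The two excitation branches are combined and passed through the vocal tract model V(z) and the radiation model R(z) to give the speech output S(z).]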

8 © NOKIA 2005

Generic speech coding approach
1. Predict/model the input signal (spectrum shape) (LPC analysis)
2. Inverse filter (subtract) the result from the input signal
3. Code / further predict the residual r' (LTP analysis)
4. Inverse filter from r'
5. Residual r'' processing / coding
6. Quantize parameters / residual

[Figure: input speech → LPC prediction subtracted (steps 1, 2) → r' → LTP (pitch) prediction subtracted (steps 3, 4) → r'' → residual coding (step 5) → quantization (step 6) → residual/parameters to decoder; the LPC parameters and LTP parameters are also passed to quantization]

9 © NOKIA 2005

Linear Prediction (LPC)
• Also called short term prediction
• Models the shape of the spectrum (formants)
• Predict current sample s'(n) from previous p samples:

$$s'(n) = \sum_{k=1}^{p} a_k\, s(n-k) = a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$
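One common way to obtain the predictor coefficients a_1 … a_p is the autocorrelation method with the Levinson-Durbin recursion (the EFR block diagram later in these slides names Levinson-Durbin; GSM FR uses the Schur recursion instead). The following is a minimal floating-point sketch, not a bit-exact codec routine; the function names are illustrative and windowing is omitted.

```python
def autocorr(x, p):
    """Autocorrelation r[0..p] of one frame x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Return ([a_1..a_p], prediction error) for s'(n) = sum_k a_k s(n-k)."""
    a = [0.0] * (p + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err            # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def lpc_residual(x, a):
    """Inverse filtering: e(n) = s(n) - s'(n), with zero history before the frame."""
    out = []
    for n in range(len(x)):
        pred = sum(ak * x[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out
```

For a frame `x`, `levinson_durbin(autocorr(x, 10), 10)` gives a 10th order predictor, and `lpc_residual` produces the prediction error that the later stages encode.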

10 © NOKIA 2005

LPC spectrum

[Figure: speech spectrum with 6th, 10th and 14th order LP spectral envelopes; amplitude (dB) vs. frequency (0-4 kHz)]

11 © NOKIA 2005

Result of LPC analysis

[Figure, three panels: signal before LPC filtering (amplitude vs. time, 0-20 ms); residual after LPC filtering (amplitude vs. time, 0-20 ms); spectrum of LPC residual (amplitude in dB vs. frequency, 0-4 kHz)]

12 © NOKIA 2005

Long Term Prediction (LTP)
• Also called pitch prediction
• Models the spectral fine structure
• The LPC residual is still quite periodic (spikes)
• The spikes correspond to the pitch frequency of speech (glottal pulse excitation)
• Usually modeled with an FIR filter (delay D and gain g)
• Best delay found via correlation analysis

[Figure: the LPC residual, delayed by D and scaled by g, is subtracted from the LPC residual to give the LTP residual]
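The correlation analysis for the delay can be sketched as an exhaustive search over candidate lags. This is an illustrative open-loop search, not the procedure of any particular standard: the lag range 40-120 samples and the 40-sample subframe are assumptions chosen for the example, and the function names are hypothetical.

```python
def best_ltp_lag(residual, start, length=40, min_lag=40, max_lag=120):
    """Return (D, g) maximizing the normalized correlation between the
    subframe residual[start:start+length] and its copy delayed by D."""
    cur = residual[start:start + length]
    best_lag, best_gain, best_score = None, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        if start - lag < 0:
            break                      # not enough history for this lag
        past = residual[start - lag:start - lag + length]
        energy = sum(v * v for v in past)
        if energy == 0.0:
            continue
        corr = sum(c * v for c, v in zip(cur, past))
        score = corr * corr / energy   # error reduction achieved by this lag
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain


def ltp_residual(residual, start, length, lag, gain):
    """e(n) = r(n) - g * r(n - D) over one subframe."""
    return [residual[n] - gain * residual[n - lag]
            for n in range(start, start + length)]
```

With a residual that repeats every 55 samples, the search returns D = 55 and a gain close to 1, and the resulting LTP residual is close to zero.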

13 © NOKIA 2005

Speech coding methods
• Pulse / multi-pulse excitation coders
• Sinusoidal transform coding
• Mixed excitation linear prediction (MELP)
• CELP (Code Excited Linear Prediction)
• ACELP (Algebraic CELP)
  - dominant speech coding technology

14 © NOKIA 2005

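Principle of CELP
• Analysis-by-synthesis coding
• Search the best codebook entry to maximize correlation between synthesized speech and input speech
• Codebook with Gaussian noise or sparse pulses

[Figure: the excitation codebook entry (codebook index) is scaled by a gain and passed through LTP synthesis and LPC synthesis to give synthesized speech; the synthesized speech is correlated with the input speech and the codebook index is chosen to maximize the correlation. LTP analysis and LPC analysis of the input speech supply the LTP and LPC parameters.]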
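The analysis-by-synthesis search can be illustrated with a toy example. The sketch below is heavily simplified and rests on several assumptions: there is no perceptual weighting and no LTP (adaptive codebook) stage, the synthesis filter starts from zero state, and the codebook is searched exhaustively. All names are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole LPC synthesis y(n) = e(n) + sum_k a_k y(n-k), zero initial state."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def search_codebook(target, codebook, a):
    """Return (index, gain) of the entry whose synthesized version best matches
    the target, i.e. maximizes corr^2 / energy (normalized correlation)."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for i, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain


if __name__ == "__main__":
    a = [0.5]                                   # toy 1st order predictor
    codebook = [[1.0 if n == p else 0.0 for n in range(8)] for p in (0, 2, 4, 6)]
    target = [2.0 * v for v in synthesize(codebook[2], a)]
    print(search_codebook(target, codebook, a))
```

Only the winning index and the quantized gain need to be transmitted; the decoder holds the same codebook and repeats the synthesis step.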

15 © NOKIA 2005

Quality evaluation
• Speech codecs distort the signal, so usage of simple objective methods, such as SNR, is not feasible
• Subjective listening testing is practically the only way to evaluate quality
• Untrained listeners usually used (‘people from the street’)
• MOS and DMOS tests are the most common ones
  - MOS: speech samples played in random order and scored by the listeners on a scale of 1…5
  - DMOS: a high quality reference sample played first, followed by the test sample. Quality difference scored on a scale of 1…5
• Tests need to be designed carefully to maximize statistical reliability
• NRC Tampere has an internationally recognized test lab
• New objective methods (e.g. PESQ) should be used with care (not necessarily reliable)

16 © NOKIA 2005

Major speech coding standards below 15 kbit/s

17 © NOKIA 2005

Speech Processing in GSM Mobiles

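[Figure: speech processing chain in a GSM mobile]
UPLINK: A/D → Speech Coding (VAD/DTX) → Channel Coding → Interleaving → Burst Building → Modulation (GMSK) → RF
DOWNLINK: RF → Demodulation → De-interleaving → Channel Decoding → Bad Frame Handling → Speech Decoding (Comfort Noise) → D/A
Echo Cancellation between the downlink and uplink paths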

18 © NOKIA 2005

GSM Air Interface Overview
• GSM is a digital mobile system standard operating on the 900 and/or 1800 MHz bands in Europe, Asia and Africa. The North American version is a single frequency standard operating on 1900 MHz.
• GSM is a combination of FDMA and TDMA:
  - FDMA structure: 124 radio carriers with 200 kHz channels, e.g. on 900 MHz:
      Uplink: 890 - 915 MHz (MS -> BTS)
      Downlink: 935 - 960 MHz (BTS -> MS)
  - TDMA structure: 8 bursts (users) per radio carrier
    -> Efficient channel allocation: 200 kHz / 8 = 25 kHz per user

19 © NOKIA 2005

GSM Air Interface Overview (cont.)
• Consists of optimized (traffic) channels for speech and data as well as a number of control channels
• Interleaving is used to randomise the effect of transmission errors. Bits will be spread over several TDMA frames.
• Slow frequency hopping (transmitting consecutive bursts on different frequencies) to protect against multipath fading
• Gaussian minimum shift keying (GMSK) is used for modulation onto the carrier
• Discontinuous Transmission (DTX) for lower power consumption and decreased interference

20 © NOKIA 2005

GSM Speech Traffic Channels
• GSM has four (five) standardized speech traffic channels:
  - Full Rate (TCH/FS), gross rate 22.8 kbit/s
  - Half Rate (TCH/HS), gross rate 11.4 kbit/s
  - Enhanced Full Rate (TCH/EFS), gross rate 22.8 kbit/s
  - Adaptive Multi Rate (AMR) at full rate and half rate (TCH/AFS, TCH/AHS), gross rate 22.8 / 11.4 kbit/s
  - AMR Wideband (AMR-WB), gross rate 22.8 kbit/s
• Each channel has an optimized channel coding (error protection) for the corresponding speech codec(s) used in that channel
• Unequal error protection: not all the speech bits are protected by channel coding
• Bad frame compensation by repeating and muting the previous good frame

21 © NOKIA 2005

Speech Codecs in GSM
• GSM Full Rate (FR, standardized in 1987)
  - RPE-LTP (Regular Pulse Excitation - Long Term Prediction)
• GSM Half Rate (HR, 1994)
  - VSELP (Vector Sum Excited Linear Prediction)
• GSM Enhanced Full Rate (EFR, 1995)
  - ACELP (Algebraic Code Excited Linear Prediction)
• GSM Adaptive Multi Rate (AMR-NB, 1999)
  - ACELP
• Adaptive Multi Rate – Wideband (AMR-WB, 2001)
  - ACELP

22 © NOKIA 2005

GSM Full-Rate (FR) Codec

23 © NOKIA 2005

GSM Full Rate Codec
• 13.0 kbit/s RPE-LTP (Regular Pulse Excitation - Long Term Prediction)
• Basic methodology: residual excited linear prediction
• Short-term modelling/filtering with 8th order Linear Predictive Coding (LPC). Inverse filtering employed in the encoder to form the prediction error (residual).
• Long-term modelling/filtering with 1st order Long Term Prediction (LTP)
• Modelling of the prediction error by Regular Pulse Excitation (RPE)
• RPE-LTP operates on 20 ms speech frames; for each frame 260 speech parameter bits are produced (13.0 kbit/s)

24 © NOKIA 2005

GSM FR Encoder

[Figure: GSM FR encoder block diagram.
 Preprocessing: offset compensation, pre-emphasis.
 LPC analysis: segmentation, autocorrelation (ACF), Schur recursion (reflection coefficients r), log area ratios (LAR), quantizer/coder (LARc), LAR decoder, interpolation.
 Short term analysis filtering: inverse filter A(z), producing the short-term residual d.
 Long term prediction: LTP parameter estimation (lag N, gain b), quantizer/coder (Nc, bc), LTP parameter decoder; the long-term prediction d'' is subtracted from d to give e.
 RPE encoding: weighting filter H(z), RPE grid selection (Mc), APCM quantizer (xmc, xmaxc), inverse APCM, RPE grid position, giving the reconstructed residual e'.]

25 © NOKIA 2005

Bit Allocation of GSM FR

                                       Bits per 5 ms   Bits per 20 ms   Bit rate (kbit/s)
LPC filter          8 coefficients                     36               1.8
LTP filter          lag                7               28
                    gain               2               8                1.8
Excitation signal   gain               6               24
                    pulse amplitudes   39              156
                    phase              2               8                9.4
Total                                                  260              13.0

• The output bits are classified according to their subjective importance into three classes: 1A (50 bits), 1B (132 bits) and class 2 (78 bits)
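The totals in this table can be checked with a short calculation. The sketch below only re-adds the numbers given on the slide, using four 5 ms subframes per 20 ms frame; the dictionary keys are descriptive labels, not names from the standard.

```python
SUBFRAMES_PER_FRAME = 4      # 20 ms frame / 5 ms subframe
FRAME_MS = 20

lpc_bits_per_frame = 36      # 8 LPC coefficients, sent once per frame
bits_per_subframe = {
    "LTP lag": 7,
    "LTP gain": 2,
    "excitation gain": 6,
    "excitation pulse amplitudes": 39,
    "excitation grid phase": 2,
}

total_bits = lpc_bits_per_frame + SUBFRAMES_PER_FRAME * sum(bits_per_subframe.values())
bit_rate_kbps = total_bits / FRAME_MS        # bits per ms equals kbit/s

class_bits = {"1A": 50, "1B": 132, "2": 78}  # importance classes for channel coding

if __name__ == "__main__":
    print(total_bits, bit_rate_kbps, sum(class_bits.values()))
```

The three importance classes add up to the same 260 bits, which is what the channel coder on the next slide operates on.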

26 © NOKIA 2005

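GSM FR Channel Coding
• For error detection a 3-bit CRC is calculated over the 50 most important bits (Class 1A)
• Error correction coding of 9.8 kbit/s is added using 1/2-rate convolutional coding for Class 1A and 1B bits (the 182 most sensitive bits)
• Interleaving over 8 TDMA frames

[Figure: the speech encoder delivers 260 bits / 20 ms, ordered according to subjective importance: Class 1A (50 bits), Class 1B (132 bits) and Class 2 (78 bits). A 3-bit CRC is computed over Class 1A. Class 1A, the CRC, Class 1B and 4 tail bits (0000) pass through 1/2-rate convolutional encoding, giving 378 bits; with the 78 Class 2 bits this makes 456 bits for interleaving.]

CRC generator: G(D) = D^3 + D + 1
Convolutional code generators: G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4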
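The bit counts on this slide follow from the structure of the encoder. The sketch below implements a plain 3-bit CRC and a rate-1/2 convolutional encoder with the generator polynomials given above. It is a simplified illustration, not a bit-exact GSM implementation: the exact bit ordering inside the protected block and any parity-bit inversion are not reproduced, and the function names are hypothetical.

```python
G0 = (0, 3, 4)       # delays used by G0 = 1 + D^3 + D^4
G1 = (0, 1, 3, 4)    # delays used by G1 = 1 + D + D^3 + D^4


def crc3(bits):
    """Remainder of bits(D) * D^3 divided by G(D) = D^3 + D + 1."""
    reg = list(bits) + [0, 0, 0]
    for i in range(len(bits)):
        if reg[i]:
            reg[i] ^= 1          # D^3 term
            reg[i + 2] ^= 1      # D term
            reg[i + 3] ^= 1      # constant term
    return reg[-3:]


def conv_encode(bits, generators):
    """Feed-forward convolutional encoder; one output bit per generator per input bit."""
    memory = max(max(g) for g in generators)
    state = [0] * memory                     # state[d-1] holds u(k-d)
    out = []
    for u in bits:
        window = [u] + state
        for g in generators:
            out.append(sum(window[d] for d in g) % 2)
        state = window[:memory]
    return out


def protect_frame(class_1a, class_1b, class_2):
    """Assumed ordering: 1A, CRC, 1B, four zero tail bits, then unprotected class 2."""
    protected = class_1a + crc3(class_1a) + class_1b + [0, 0, 0, 0]
    return conv_encode(protected, (G0, G1)) + class_2
```

A 260-bit speech frame thus becomes (50 + 3 + 132 + 4) × 2 + 78 = 456 bits per 20 ms, i.e. the 22.8 kbit/s gross rate of the full rate channel, of which 9.8 kbit/s is error protection.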

27 © NOKIA 2005

GSM FR Interleaving

[Figure: GSM FR interleaving. Each channel coded speech frame (N, N+1, N+2) is divided into blocks of 57 bits, which are spread over the bursts 1-8.]
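The spreading of one 456-bit channel coded frame over eight bursts can be sketched as follows. This is a simplified model under stated assumptions: each frame is split into 8 blocks of 57 bits, the first four blocks go into one half of four consecutive bursts and the last four blocks into the other half of the next four bursts, so every burst carries 57 bits from each of two consecutive frames. The stride-8 assignment of bits to blocks is an illustrative choice, not the exact permutation of the GSM specification.

```python
BLOCK_BITS = 57
BLOCKS_PER_FRAME = 8
FRAME_BITS = BLOCK_BITS * BLOCKS_PER_FRAME   # 456


def split_blocks(frame):
    """Block j takes bits j, j+8, j+16, ... so neighbouring bits land in different bursts."""
    assert len(frame) == FRAME_BITS
    return [frame[j::BLOCKS_PER_FRAME] for j in range(BLOCKS_PER_FRAME)]


def interleave(frames):
    """Return a list of bursts; each burst is [half_a, half_b] (None if not yet filled)."""
    bursts = [[None, None] for _ in range(4 * (len(frames) + 1))]
    for n, frame in enumerate(frames):
        blocks = split_blocks(frame)
        for j in range(4):
            bursts[4 * n + j][0] = blocks[j]          # bursts 4n .. 4n+3
            bursts[4 * n + 4 + j][1] = blocks[4 + j]  # bursts 4n+4 .. 4n+7
    return bursts


def deinterleave(bursts, n):
    """Rebuild frame n from its eight bursts."""
    blocks = [bursts[4 * n + j][0] for j in range(4)]
    blocks += [bursts[4 * n + 4 + j][1] for j in range(4)]
    frame = [None] * FRAME_BITS
    for j, block in enumerate(blocks):
        frame[j::BLOCKS_PER_FRAME] = block
    return frame
```

Losing a single burst then removes at most 57 scattered bits from each of two frames, which the convolutional code has a better chance of correcting than a contiguous run of errors.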

28 © NOKIA 2005

GSM Half-Rate (HR) Codec

29 © NOKIA 2005

GSM Half Rate Codec
• 5.6 kbit/s VSELP (Vector Sum Excited Linear Prediction)
• Analysis-by-synthesis CELP (Code Excited Linear Prediction) operating on 20 ms speech frames
• Short-term modelling/filtering with 10th order LPC
• Long term modelling/filtering (1st order) is employed as an adaptive codebook using the analysis-by-synthesis approach with fractional lag values and non-uniform resolution
• Excitation formed by the analysis-by-synthesis approach from a linear combination of basis vectors (=> Vector Sum Excitation)
• Four modes used: 1 unvoiced mode and 3 voiced modes
• Adaptive post enhancement filtering is applied in the decoder

30 © NOKIA 2005

GSM HR Bit Allocation

VOICED MODES 1, 2, 3                                       Bits per 5 ms   Bits per 20 ms
LPC filter                        10 coefficients                          28
Frame energy                                                               5
Soft interpolation bit                                                     1
Voicing mode                                                               2
LTP filter (adaptive codebook)    lag                      8, 4, 4, 4      20
Gain (joint gain quantiser)       gain                     5               20
Excitation (fixed codebook)       codebook index           9               36
Total                                                                      112

UNVOICED MODE                                              Bits per 5 ms   Bits per 20 ms
LPC filter                        10 coefficients                          28
Frame energy                                                               5
Soft interpolation bit                                                     1
Voicing mode                                                               2
Excitation signal (1st fixed codebook)   codebook index    7               28
Gain (joint gain quantiser)       gain                     5               20
Excitation signal (2nd fixed codebook)   codebook index    7               28
Total                                                                      112

31 © NOKIA 2005

GSM Half Rate Channel Coding
• 5.8 kbit/s using 1/2-rate convolutional coding with 3-bit CRC (CRC is protected by 1/3-rate convolutional code)
• Class 1A: 22 bits (convolutional code + 3 bit CRC)
  Class 1B: 73 bits (convolutional code)
  Class II: 17 bits (unprotected)
• Note: more bits are used for error protection than for source coding!
• Polynomials:
    1 + D^2 + D^3 + D^5 + D^6,
    1 + D + D^4 + D^6,
    1 + D + D^2 + D^3 + D^4 + D^6
  CRC: X^3 + X + 1 (same as in full rate)
• Interleaving depth: 4 (block length 57 bits)

32 © NOKIA 2005

GSM Enhanced Full-Rate (EFR) Codec

33 © NOKIA 2005

GSM Enhanced Full Rate Codec
• First codec with quality comparable to landline phone quality (better than 32 kbit/s ADPCM)
• 12.2 kbit/s ACELP (Algebraic Code Excited Linear Predictive Coding)
• CELP type coding with a fixed pulse codebook enabling a fast excitation search procedure
• The same codec as was earlier chosen as the US1 codec for the EFR channels in the PCS1900 system in North America

34 © NOKIA 2005

GSM EFR Block Diagram

[Figure: GSM EFR encoder block diagram.
 Pre-processing (per frame): high-pass filter on the input speech samples.
 LP analysis (twice per frame): windowing and autocorrelation R[ ]; Levinson-Durbin, R[ ] → A(z); A(z) → LSP; LSP quantisation (LSP indices); interpolation for the 4 subframes; LSP → Â(z).
 Open-loop lag search (twice per frame): compute weighted speech (4 subframes); find open-loop lag Tol.
 Closed-loop lag search (per subframe): compute target for adaptive codebook x(n); compute impulse response h(n); find best lag and gain (lag index); adaptive codebook gain quantisation (gain index).
 Algebraic codebook search (per subframe): compute adaptive codebook contribution; compute target for algebraic codebook x2(n); find best algebraic excitation (code index).
 Filter memory update (per subframe): fixed codebook gain quantisation (fixed codebook gain index); compute excitation; update filter memories for next subframe.]

35 © NOKIA 2005

GSM EFR Details
• Short-term analysis (Linear Prediction)
  - 10th order LP analysis twice for each 20 ms frame
  - Two 30 ms asymmetric windows (no lookahead)
  - Represented in LSPs (Line Spectral Pairs)
  - Joint predictive split matrix quantisation with 38 bits / 20 ms frame
• Adaptive codebook (ACB)
  - Combined open-loop/closed-loop search
      - open-loop lag search every 10 ms
      - closed-loop lag search every 5 ms (analysis-by-synthesis)
  - Fractional lag with 1/6th resolution
  - Residual extended virtual lag for high pitch voices

36 © NOKIA 2005

GSM EFR Details (cont.)
• Fixed codebook (FCB) excitation search
  - Algebraic codebook with 10 pulses / 5 ms subframe
  - Predefined interlaced sets of pulse positions
  - Non-exhaustive search with low complexity
  - Pitch sharpening to improve coding of high pitch voices
• Bad frame handling
  - Reliable bad frame detection with 8-bit CRC
  - Partial replacement for parameters of received bad frames
  - Muting adjusted dynamically according to channel error conditions

37 © NOKIA 2005

GSM EFR Bit Allocation

                                        Bits per 20 ms   Bit rate (kbit/s)
LPC filter           10 coefficients    38               1.9
Adaptive codebook    lag                30
                     gain               16               2.3
Fixed codebook       gain               20
                     index              140              8.0
Total                                   244              12.2

• An additional 0.8 kbit/s is used for internal error protection (8-bit CRC and repetition coding) -> 13 kbit/s

38 © NOKIA 2005

EFR Channel Coding
• The EFR codec was designed to use the GSM full rate channel coding as is, with an additional 8-bit CRC for improved error detection
  -> Only the speech codec needs to be updated in upgrading from full rate to enhanced full rate networks
• This provides a fast time-to-market for the EFR system, with implementation advantages and minimal additional costs
• Some internal error protection (repetition) is added for more robust operation in channel errors

39 © NOKIA 2005

EFR Channel Error Performance

C/I = 13 dB: 2% channel bit error-rate (well inside a cell)
C/I = 10 dB: 5% channel bit error-rate (inside a cell)
C/I = 7 dB: 8% channel bit error-rate (at a cell boundary)

[Figure: Channel Error Test (ACR, MOS). MOS (axis 1-4.5) for GSM EFR and GSM FR at error-free, C/I = 13 dB, C/I = 10 dB and C/I = 7 dB, with ADPCM (error-free) as reference]

40 © NOKIA 2005

Adaptive Multi-Rate (AMR) Speech Codec

41 © NOKIA 2005

Adaptive Multi Rate Codec for GSM and 3G

• ETSI SMG11 and 3GPP TSG-SA WG4 have standardized the Adaptive Multi Rate (AMR) codec for GSM and 3G WCDMA systems

• AMR is a versatile codec "toolbox" operating at several bit rates for robust operation in mobile channels

• AMR has been selected as the mandatory speech codec for the 3G WCDMA system by 3GPP

• AMR is also the mandatory speech codec for 3G-H.324M video telephony

• For more information and specifications see: www.3gpp.org

ftp://ftp.3gpp.org/TSG_SA/WG4_CODEC/

42 © NOKIA 2005

AMR codec standardisation

• Technical work in ETSI SMG11 (GSM Speech and Quality Aspects)
• Feasibility study of the novel AMR codec concept during October 96 - October 97
• Standardisation program launched in October 97
  - Codec selection through competitive selection process
  - 11 candidate codecs, two selection phases
  - Joint Ericsson/Nokia/Siemens codec chosen in October 98
  - Finalisation and characterisation of the standard by early 99
• AMR standard approved in ETSI in February 99 and June 99
• AMR adopted to 3rd generation WCDMA system in April 99 by 3GPP

43 © NOKIA 2005

What is AMR?
• AMR contains a set of fixed-rate speech codec modes, each of which has a different error protection level (amount of channel coding)
• The codec dynamically adapts its error protection level to the channel error and traffic conditions (link adaptation)
  » Uses lower speech coding bit rate and more error protection in bad channel conditions
• This gives substantial improvements to the robustness against channel errors (especially in GSM)
• Also capacity benefits (e.g. using the GSM half rate channel)
• Due to the fast power control in the 3G WCDMA system, the AMR link adaptation is not as useful in WCDMA as in GSM

44 © NOKIA 2005

AMR Operation

[Figure: speech quality (good to poor) vs. channel quality (good to bad) for three codec modes A, B and C, with the codec mode change points marked]

• Link adaptation "switches" to the best of the curves A, B and C

45 © NOKIA 2005

AMR Codec Bit Rates
• The AMR codec contains 8 source codecs:
    12.2 kbit/s (= GSM EFR)               FR channel
    10.2 kbit/s                           FR
    7.95 kbit/s                           FR+HR
    7.40 kbit/s (= US-TDMA IS-641 EFR)    FR+HR
    6.70 kbit/s (= PDC EFR)               FR+HR
    5.90 kbit/s                           FR+HR
    5.15 kbit/s                           FR+HR
    4.75 kbit/s                           FR+HR
• 8 codec modes operate in the GSM FR channel and 6 in the HR channel
• All modes are used in 3G WCDMA

46 © NOKIA 2005

Channel Coding
• In GSM each AMR mode has an optimized channel codec for operation in Full Rate (22.8 kbit/s) and/or Half Rate (11.4 kbit/s) channels
• Recursive Systematic Convolutional (RSC) codes and 6-bit CRC used
• Mode bits are protected by a separate block code
• 3G WCDMA is based on generic channels (Layer 1 "toolbox"), hence mode-specific channel coding cannot be used for AMR in 3G as it is in GSM
  - Individual rate matching for the modes (by puncturing and repetition)
  - Unequal error protection is implemented using different Radio Access Bearers (RABs) with different QoS requirements

47 © NOKIA 2005

Ratio of Speech and Channel Coding (GSM)

[Chart: channel gross bit rate (kbit/s, axis 0-25) split into speech coding and channel coding, per channel mode (FS = full rate, HS = half rate) and codec mode (kbit/s)]

Channel mode / codec mode   Speech coding (kbit/s)   Channel coding (kbit/s)
Full-Rate
  FS 12.2                   12.20                    10.60
  FS 10.2                   10.20                    12.60
  FS 7.95                   7.95                     14.85
  FS 7.40                   7.40                     15.40
  FS 6.70                   6.70                     16.10
  FS 5.90                   5.90                     16.90
  FS 5.15                   5.15                     17.65
  FS 4.75                   4.75                     18.05
Half-Rate
  HS 7.95                   7.95                     3.45
  HS 7.40                   7.40                     4.00
  HS 6.70                   6.70                     4.70
  HS 5.90                   5.90                     5.50
  HS 5.15                   5.15                     6.25
  HS 4.75                   4.75                     6.65
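The channel coding share for each mode is simply what remains of the channel gross rate after the speech bits. The sketch below recomputes it from the gross rates given earlier in the slides (22.8 kbit/s for the full rate channel, 11.4 kbit/s for the half rate channel); the variable and function names are illustrative.

```python
GROSS_RATE_KBPS = {"FS": 22.8, "HS": 11.4}

CODEC_MODES_KBPS = {
    "FS": [12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75],
    "HS": [7.95, 7.4, 6.7, 5.9, 5.15, 4.75],
}


def channel_coding_kbps(channel, codec_mode):
    """Bit rate left for error protection in the given channel."""
    return round(GROSS_RATE_KBPS[channel] - codec_mode, 2)


if __name__ == "__main__":
    for channel, modes in CODEC_MODES_KBPS.items():
        for mode in modes:
            protection = channel_coding_kbps(channel, mode)
            share = 100 * protection / GROSS_RATE_KBPS[channel]
            print(f"{channel} {mode:5.2f}: {protection:5.2f} kbit/s channel coding ({share:.0f}%)")
```

In the full rate channel the 4.75 kbit/s mode spends about 79% of the gross rate on error protection, compared with about 46% for the 12.2 kbit/s mode, which is why the low modes are the robust ones.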

48 © NOKIA 2005

Codec Mode Adaptation (GSM)
• The codec mode adaptation chooses the optimum codec mode as a function of channel quality (or e.g. network load)
• The most robust mode is chosen in poor channel conditions, while the codec mode providing the best clean-channel quality is chosen in good error conditions
• Codec mode adaptation is based on channel quality measurements (C/I estimates) done in the mobile and network
• Based on the measurements a Codec Mode Command (over downlink to MS) or Codec Mode Request (over uplink to network) is sent over the air interface (using inband signaling)
• In GSM, the inband signaling supports a set of up to 4 codec modes (2 bits) selected at call set-up or handover. In 3G all modes can be used.
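A threshold-based selection rule with hysteresis is one simple way to picture the adaptation. The sketch below is purely illustrative: the active codec set, the C/I thresholds and the hysteresis value are hypothetical numbers chosen for the example, not values from the slides or from the GSM specifications, and the class name is invented.

```python
class ModeAdaptor:
    """Pick a codec mode from an active set of up to 4 modes (2-bit index)."""

    def __init__(self, modes_kbps, thresholds_db, hysteresis_db):
        assert 1 <= len(modes_kbps) <= 4            # inband signalling carries 2 bits
        assert len(thresholds_db) == len(modes_kbps) - 1
        self.modes = modes_kbps                     # ascending bit rate
        self.thresholds = thresholds_db             # hypothetical switching points
        self.hysteresis = hysteresis_db
        self.index = 0                              # start in the most robust mode

    def update(self, ci_db):
        """Feed one C/I estimate (dB); return the codec mode to request/command."""
        # Step up only when C/I clears the threshold by the hysteresis margin
        while (self.index < len(self.modes) - 1
               and ci_db > self.thresholds[self.index] + self.hysteresis):
            self.index += 1
        # Step down as soon as C/I falls below the lower threshold
        while self.index > 0 and ci_db < self.thresholds[self.index - 1]:
            self.index -= 1
        return self.modes[self.index]


if __name__ == "__main__":
    adaptor = ModeAdaptor([5.9, 7.95, 12.2], thresholds_db=[7.0, 11.0], hysteresis_db=2.0)
    for ci in [20.0, 12.0, 10.0, 12.0, 3.0]:
        print(ci, "->", adaptor.update(ci))
```

The hysteresis margin keeps the codec from toggling between two modes when the C/I estimate hovers around a switching point.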

49 © NOKIA 2005

AMR Adaptation Example

[Figure: C/I (dB, axis 0-25) and the selected AMR mode (12.2 kbit/s = GSM EFR, 7.95 kbit/s, 5.90 kbit/s) vs. time (0-12.5 s)]

50 © NOKIA 2005

AMR System Block Diagram (GSM)
S - Speech Data
Q - Channel Quality
In-Band Signalling:
  MC - Mode Command
  MR - Mode Request
  MI - Mode Indicator
u = Uplink, d = Downlink

[Figure: AMR system block diagram.
 Mobile System (MS): multi-rate speech encoder (speech in) and multi-rate channel encoder towards the uplink radio channel; multi-rate channel decoder and multi-rate speech decoder (speech out) from the downlink radio channel; downlink quality estimator (Qd) producing the mode request MRd.
 Base Transceiver Station (BTS): multi-rate channel decoder with uplink quality estimator (Qu) and multi-rate channel encoder; link adaptation with the uplink mode adaptor (mode command MCu) and the downlink mode adaptor (mode command MCd).
 TransCoder (TC): multi-rate speech decoder (speech out) and multi-rate speech encoder (speech in), connected to the BTS over Abis/ter.
 Speech data S is carried together with the mode indicators MIu / MId and the in-band mode request and mode command signalling in each link direction.]

51 © NOKIA 2005

AMR Speech Codec Key Features
• Frame length: 20 ms with four subframes in all modes, 5 ms lookahead (except in 12.2 kbit/s mode)
• LPC analysis: Four different LSF tables; 38 bits (EFR), 27 bits, 26 bits (IS-641) and 23 bits
• Adaptive codebook: GSM EFR and IS-641 based open-loop/closed-loop search in all modes
• Fixed codebook: ACELP codebooks in all modes using 10, 8, 4, 3 and 2 pulses
• Quantisation: GSM EFR (scalar) or IS-641 (joint VQ) based codebook gain quantisation in all modes
• Post-processing: Formant postfilter with tilt compensation and 60 Hz HP filter in all modes (also EFR-type post processing of excitation elements included). Anti-sparseness processing for 7.95 kbit/s and lower modes.

52 © NOKIA 2005

Comparison of AMR modes

Table columns: Mode; Encoder: Pre-processing (20 ms), LPC (20 ms), Open-loop pitch analysis (10 ms), Adaptive codebook (5 ms), Fixed codebook (5 ms), Codebook gain quantisation (5 ms); Decoder: Post-processing (20 ms).
Table rows (modes): 12.2 (GSM EFR), 10.2, 7.95, 7.4 (DAMPS EFR), 6.7, 5.9, 5.15, 4.75. Cells are listed below in table order, top to bottom.

Pre-processing:
• HP filter (80 Hz) in all modes

LPC:
• 12.2 (GSM EFR): Two asymmetric windows. Split matrix quantisation (LSFs with 38 bits). No lookahead (5 ms "dummy lookahead").
• One asymmetric window. Vector quantisation (LSFs with 38, 26, 27, 26, 26, 26, 23 and 23 bits). 5 ms lookahead.

Open-loop pitch analysis:
• EFR-type with 3 lag ranges, except in the 10.2 mode, which employs weighting of the correlation function with preference for low lag values and values neighbouring previous lags.

Adaptive codebook:
• 12.2: Resolution 1/6 [17 3/6 - 94 3/6] and 1 [95-143]. 2nd and 4th subframe 1/6 always, searched around previous lag.
• 7.4: Resolution 1/3 [19 1/3 - 84 2/3] and 1 [85-143]. 2nd and 4th subframe 1/3 always, searched around previous lag.
• 5.9: Resolution 1/3 [19 1/3 - 84 2/3] and 1 [85-143]. 2nd and 4th subframe integer or 1/3, searched around previous lag.
• Resolution 1/3 [19 1/3 - 84 2/3] and 1 [85-143]. 2nd, 3rd and 4th subframe integer or 1/3, searched around previous lag.

Fixed codebook:
• 12.2: 10 pulses searched in 5 tracks. Quantised to 35 bits.
• 10.2: 8 pulses searched in 4 tracks. Quantised to 31 bits.
• 7.95: 4 pulses searched in 4 tracks. Quantised to 17 bits.
• 7.4: 4 pulses searched in 4 tracks. Quantised to 17 bits.
• 6.7: 3 pulses searched in 3 tracks. Quantised to 14 bits.
• 5.9: 2 pulses searched in 2 tracks. Quantised to 11 bits.
• 2 pulses searched in 2 tracks. Quantised to 9 bits.

Codebook gain quantisation:
• 12.2: Scalar quantisation with 5 bits (fixed codebook) and 4 bits (adaptive codebook)
• 10.2: Joint VQ with 7 bits
• 7.95: Scalar quantisation with 5 and 4 bits
• 6.7: Joint VQ with 7 bits
• 5.15: Joint VQ with 6 bits
• Joint VQ with 8 bits over 10 ms.

Post-processing:
• Short-term with tilt compensation and 60 Hz HP.
• + anti-sparseness processing

53 © NOKIA 2005

AMR codec complexity: WMOPS

*) "Complexity verification report of the AMR codec, Version: 2.0" Source: Alcatel, Philips, ST Microelectronics, Texas Instruments, Tdoc ETSI SMG11 117/99 (SMG11#10, Sophia Antipolis, 3-7 May 1999)

**) AMR permanent document "Complexity and Delay Assessment", v1.2, ETSI SMG11 1998

Table I: Theoretical Worst Case (TWC) complexity in wMOPS

                         12.2   10.2   7.95   7.4    6.7    5.9    5.15   4.75   TWC* AMR  TWC** EFR  TWC** FR  TWC** HR
AMR source encoder       13.95  13.56  14.08  12.93  13.93  11.25  9.55   11.53  14.08     -          -         -
AMR source decoder       2.27   2.29   2.49   2.20   2.45   2.53   2.52   2.52   2.53      -          -         -
Total AMR source codec   16.22  15.85  16.58  15.12  16.37  13.77  12.07  14.05  16.61     15.21      2.95      18.47

(Calculated from C-code implemented with ETSI TCH-HS basic operations.)

54 © NOKIA 2005

AMR codec complexity: RAM, ROM

*) "Complexity verification report of the AMR codec, Version: 2.0" Source: Alcatel, Philips, ST Microelectronics, Texas Instruments, Tdoc ETSI SMG11 117/99 (SMG11#10, Sophia Antipolis, 3-7 May 1999)

**) AMR permanent document "Complexity and Delay Assessment", v1.2, ETSI SMG11 1998

Table II: RAM (16-bit words)

                              Static   Dynamic   Total
AMR source encoder *          1429     3039      4468
AMR source decoder *          811      946       1757
Total AMR source codec *      2240     3039      5819
Total EFR source codec **     -        -         4711
Total FR source codec **      -        -         1201
Total HR source codec **      -        -         4636

Table III: ROM

                              ROM (tables)      Program ROM (source code size)
                              (16-bit words)    (# of operators)
Total AMR source codec        14343             4830
Total EFR source codec **     5267              -
Total FR source codec **      80                -
Total HR source codec **      7881              -

55 © NOKIA 2005

Performance requirements: clean speech

GSM FR channel
  ≥ 13 dB C/I: quality equal to EFR in error-free channel
  At 4 dB C/I: quality equal to EFR at 10 dB C/I

GSM HR channel
  ≥ 16 dB C/I: quality equal to G.728 in error-free channel
  < 16 dB C/I: quality equal to GSM FR codec

C/I         FR requirement     HR requirement
No Errors   EFR No Errors      G.728 No Errors
19 dB       EFR No Errors      G.728 No Errors
16 dB       EFR No Errors      G.728 No Errors
13 dB       EFR No Errors      FR at 13 dB
10 dB       G.728 No Errors    FR at 10 dB
7 dB        G.728 No Errors    FR at 7 dB
4 dB        EFR at 10 dB       FR at 4 dB

(Clean speech = no background noise)

56 © NOKIA 2005

Performance requirements: speech in background noise

GSM FR channel
  ≥ 13 dB C/I: quality equal to EFR in error-free channel
  At 4 dB C/I: quality equal to FR at 10 dB C/I

GSM HR channel
  ≥ 16 dB C/I: quality equal to G.729 and FR in error-free channel
  < 16 dB C/I: quality equal to GSM FR codec

C/I         FR requirement             HR requirement
No Errors   EFR No Errors              EFR No Errors
19 dB       EFR No Errors              G.729 and FR No Errors
16 dB       EFR No Errors              G.729 and FR No Errors
13 dB       EFR No Errors              FR at 13 dB
10 dB       G.729 and FR No Errors     FR at 10 dB
7 dB        G.729 and FR No Errors     FR at 7 dB
4 dB        FR at 10 dB                FR at 4 dB

57 © NOKIA 2005

AMR Performance

Performance in GSM FR channel
• At low errors (C/I ≥ 13 dB): equivalent to EFR no errors
• At 4 dB C/I:
  → Still equivalent or close to EFR at 10 dB C/I (i.e. about 2 MOS improvement over EFR), and
  → In background noise, equivalent or close to FR at 10 dB C/I

Performance in GSM HR channel
• At low errors (C/I ≥ 16 dB):
  → Equivalent to G.728 (“wireline”), and
  → In background noise, equivalent to G.729/FR
• At high errors:
  → Equivalent to FR, and
  → In background noise, FR or somewhat lower

58 © NOKIA 2005

AMR FR for clean speech

[Figure: MOS (1.0-5.0) for the requirement, AMR-FR and EFR at no errors and C/I = 16, 13, 10, 7, 4 and 1 dB]

59 © NOKIA 2005

AMR FR for clean speech: performance curves for each codec mode

[Figure: MOS (1.0-5.0) for EFR and for each AMR codec mode (12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75) at no errors and C/I = 16, 13, 10, 7, 4 and 1 dB]

60 © NOKIA 2005

AMR FR for background noise: speech with 15 dB SNR car noise

[Figure: DMOS (1.0-5.0) for the requirement, AMR-FR, EFR, FR and G.729 at no errors and C/I = 16, 13, 10, 7, 4 and 1 dB]

61 © NOKIA 2005

AMR HR for background noise: speech with 20 dB SNR office noise

[Figure: DMOS (1.0-5.0) for the requirement, AMR-HR, EFR, FR and HR at no errors and C/I = 19, 16, 13, 10, 7 and 4 dB]

62 © NOKIA 2005

AMR HR for clean speech

[Figure: MOS (1.0-5.0) for the requirement, AMR-HR, EFR, FR and HR at no errors and C/I = 19, 16, 13, 10, 7 and 4 dB]

63 © NOKIA 2005

AMR Codec Performance Dynamic channel

• Dynamic channel test designed to evaluate AMR performance in a realistic radio environment with codec adaptation turned on

[Figure, FR channel: Experiment 4a - Combined Test Results. Performance in Full Rate and Dynamic C/I Conditions. MOS (1.0-5.0) for EFR and AMR under test conditions DEC1, DEC2, DEC3, DEC4, DEC5 and DEC1+DTX]

[Figure, HR channel: Experiment 4b - Combined Test Results. Performance in Half Rate and Dynamic C/I Conditions. MOS (1.0-5.0) for G.729/FR and AMR under test conditions DEC1, DEC2, DEC3, DEC4, DEC5 and DEC1+DTX]

64 © NOKIA 2005

AMR performance summary

• AMR provides substantial performance improvement over the previous GSM codecs in terms of error robustness and channel capacity.
• In the FR channel,
  - AMR provides error-free EFR quality down to 13 dB C/I, and about 2 MOS improvement over EFR at 4 dB C/I
  - AMR extends the wireline quality operating region from about C/I ≥ 10 dB in EFR to about C/I ≥ 4-7 dB
  - In typical dynamic error conditions AMR gives up to over 1 MOS improvement in speech quality (with adaptation on)
• In the HR channel,
  - AMR provides quality comparable to wireline down to 16 dB C/I, and at least FR quality for higher error-rates

66 © NOKIA 2005

AMR conclusions

• The GSM AMR codec standardisation was carried out during 1997-1999 as a competitive selection process involving several phases
• Joint Ericsson/Nokia/Siemens AMR codec selected
• AMR gives substantial improvement over the previous GSM codecs, in particular
  - in error robustness in the full-rate channel
  - in providing high speech quality in the half-rate channel
• AMR provides high granularity of bit-rates between 12.2 and 4.75 kbit/s with seamless switching between modes
• Good performance and high granularity of bit-rates make AMR attractive also for systems and applications other than GSM
• AMR has been adopted as the mandatory speech codec for the 3rd generation WCDMA system

67 © NOKIA 2005

AMR in WCDMA

• Generic channel coding toolbox
  - Flexibility in design
  - Unequal or equal error protection
• Fast power control
  - No need for fast AMR link adaptation
• Adaptation used for optimising the cell capacity
  (When there is too much interference (too many users), drop all or part of the users to lower (= more robust) rates -> new users can be accommodated)
• Adaptation can also extend the cell radius
  (When users approach the cell limit, drop to lower rates)

68 © NOKIA 2005

WCDMA generic channel coding toolbox

• Radio network operator has great flexibility to design the error protection scheme and QoS parameters

• Unequal or equal error protection is available (UEP / EEP)

- QoS, speech quality and capacity issues drive the selection between UEP and EEP

- UEP enables lower transmission power (higher capacity) with the same QoS and speech quality as EEP

• Same channel coding algorithm for each AMR mode- Lower AMR modes provide higher capacity

69 © NOKIA 2005

Fast power control
• Inner power control loop
  - Controls the transmission power based on the channel quality
  - Objective is to maintain the QoS (FER) parameters
• Outer power control adjusts the AMR mode
  - AMR mode is decreased if FER target is not achieved and transmission power is saturating
  - Objective is to maintain the target capacity
• Fast AMR link adaptation is not necessarily needed
  - Power control minimises the effect of channel fading etc.
  - Mode requests using out of band signalling

70 © NOKIA 2005

List of 3G AMR Speech Coding Specifications

TS 26.071 AMR speech Codec; General description
TS 26.073 AMR speech Codec; C-source code
TS 26.074 AMR speech Codec; Test sequences
TS 26.090 AMR speech Codec; Transcoding Functions
TS 26.091 AMR speech Codec; Error concealment of lost frames
TS 26.092 AMR speech Codec; Comfort noise for AMR Speech Traffic Channels
TS 26.093 AMR speech Codec; Source Controlled Rate operation
TS 26.094 AMR Speech Codec; Voice Activity Detector for AMR Speech Traffic Channels
TS 26.101 AMR speech Codec; Frame Structure
TS 26.102 AMR speech Codec; Interface to Iu and Uu
TS 26.103 Codec lists
TS 26.104 AMR speech Codec; Floating point C-Code

• Available from www.3gpp.org

71 © NOKIA 2005

3G/GSM/ITU AMR Wideband Speech Codec

72 © NOKIA 2005

Why Wideband Speech?

• Superior speech quality over current narrowband speech services
-> Exceeds the quality of PSTN phones

• Current mobile systems are based on narrowband speech (100-3600 Hz band)
-> Important high frequency components lost (e.g. in 's'-sounds)

• Wideband uses 50 - 7000 Hz band
-> Improved naturalness, presence and intelligibility

• Especially suitable for applications with high quality audio parts

73 © NOKIA 2005

Wideband vs. Narrowband

[Figure: two spectrograms of the same speech sample, time axis 0 to 2.5, frequency axis 0 to 8000 Hz. Panels: wide band 0 - 8 kHz (no coding) and narrow band 0 - 4 kHz (no coding)]

74 © NOKIA 2005

AMR WB Standardization in 3GPP

• Initiated based on feasibility study in ETSI SMG11 (2Q-1999)

• Initially 9 candidates, 5 companies were qualified into the selection phase: Ericsson, FDNS-consortium (FT, DT, Nortel, Siemens), Motorola, Nokia and Texas Instruments

• Nokia WB codec selected as the best codec in 3GPP TSG S4 meeting in October 2000

• The final specifications were approved in March 2001 (Release 4)

• The AMR-WB codec has also been selected as the ITU WB codec G.722.2
-> First common codec between wireless and fixed networks

75 © NOKIA 2005

AMR WB Codec Components

• The selected codec consists of the following parts:
- Speech codec
- VAD (integrated into the speech codec)
- Comfort noise generation (using the same number of bits (35) as the AMR NB comfort noise)
- DTX logic (same as used in the AMR NB)
- Link adaptation (AMR NB methods can be used)
- 7 channel codecs for GSM FR channel (for speech codec modes below 22.8 kbit/s)
- Example channel codecs for EDGE FR/HR channels (for all 9 speech codec modes)

76 © NOKIA 2005

Nokia AMR WB Speech Codec

• Collaboration with VoiceAge (Univ. of Sherbrooke, Canada)

• ACELP technology very similar to AMR NB and GSM EFR

• Multirate codec: 9+1 modes
- Same coding algorithm in each mode
- Very high code and data ROM reuse between modes (much better than in the AMR NB codec)

• Link Adaptation
- Similar functionality to AMR NB in GSM
- The same link adaptation can be applied

• AMR NB DTX-handling and DTX-frames are used

77 © NOKIA 2005

AMR-WB Technology

• ACELP technology employed in AMR-NB and EFR is utilized

• Multirate codec: 9+1 modes
- Nine modes for speech (23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, 6.6 kbit/s)
- One mode for comfort noise (DTX) (1.75 kbit/s)

• Innovative two-band approach:
- 50-6400 Hz coded with ACELP algorithm (12.8 kHz sampling rate)
- 6400-7000 Hz reconstructed based on lower band parameters
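The numbers behind the two-band split can be checked with a few lines of arithmetic. The sketch below assumes a 16 kHz input (the wideband sampling rate used in this deck) and a 20 ms frame. The frame length is an assumption not stated on this slide, and the variable names are illustrative.

```python
from fractions import Fraction

INPUT_RATE_HZ = 16000   # wideband A/D sampling rate
CORE_RATE_HZ = 12800    # internal rate of the ACELP lower-band coder
FRAME_MS = 20           # assumed frame length

# Decimation ratio from 16 kHz to 12.8 kHz.
decimation_ratio = Fraction(CORE_RATE_HZ, INPUT_RATE_HZ)

# Highest frequency the 12.8 kHz core can represent (Nyquist limit).
core_nyquist_hz = CORE_RATE_HZ // 2

# Samples per frame before and after decimation.
samples_per_frame_in = INPUT_RATE_HZ * FRAME_MS // 1000
samples_per_frame_core = CORE_RATE_HZ * FRAME_MS // 1000

print(decimation_ratio, core_nyquist_hz,
      samples_per_frame_in, samples_per_frame_core)
```

The Nyquist limit of the 12.8 kHz core is 6400 Hz, which is why the 6400-7000 Hz band has to be reconstructed rather than coded by the ACELP core.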

78 © NOKIA 2005

AMR-WB block diagram

[Figure: AMR-WB block diagram. Speech encoder: input speech, decimation to 12.8 kHz, LP analysis, adaptive codebook search, algebraic codebook search; lower band parameters and the higher band gain (23.85 mode) are sent over the transmission channel. Speech decoder: adaptive codebook reconstruction, algebraic codebook reconstruction, LP synthesis, interpolation to 16 kHz, plus a higher band decoder driven by 16 kHz random excitation with high-pass filtering; the two bands are summed to give the reconstructed speech]

79 © NOKIA 2005

Nokia AMR WB Speech Codec

• Low rate modes for GSM FR channel:
-> 6.60, 8.85, 12.65, 14.25 kbit/s (15.85, 18.25, 19.85 kbit/s with more than 16 kbit/s multiplexing in Abis)
- lowest rates are only used in bad channel conditions

• High bit rate modes for EDGE and WCDMA channels
-> 15.85, 18.25, 19.85, 23.05, 23.85 kbit/s

80 © NOKIA 2005

AMR WB Speech Quality vs. ITU G722 WB

[Figure: speech quality of Nokia AMR WB at each bit rate (23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, 6.6 kbit/s) compared with G722 at 64, 56 and 48 kbit/s. X axis: AMR WB bit rates (kbit/s); Y axis: speech quality]

81 © NOKIA 2005

Subjective Test Results

[Figure: two bar charts of subjective test results, MOS axis 1.0 to 5.0, each bar shown against its requirement.
First chart, conditions: error-free, C/I 13 dB, C/I 10 dB, C/I 7 dB, C/I 4 dB, and three further error-free cases. Bars: AMR-WB 14.25, AMR-WB 12.65, AMR-WB 8.85, AMR-WB 6.6, with ITU-T G.722 @ 64, @ 56 and @ 48 as references.
Second chart, conditions: error-free, FER 1.0% / RBER 0.1% uplink, FER 1.0% / RBER 0.1% downlink, and three further error-free cases. Bars: AMR-WB 23.05, with ITU-T G.722 @ 64, @ 56 and @ 48 as references]

• All the 160 3GPP speech quality requirements were met

• The high granularity of bit rates and mode adaptation enable high quality in erroneous channels

• 3GPP test results (in Japanese) indicate quality equal to G.722 @ 64 kbit/s even in erroneous channels

• 3GPP test results (in English) indicate high quality of the lower modes even in the presence of errors

83 © NOKIA 2005

Applications for Wide Band Speech

• Wide band telephony
- AMR NB quality equal to PSTN speech quality
- AMR WB improves the quality and provides naturalness

• Conferencing (conversational multimedia)
- Quality improvement over the current codecs (G.722 at 48 and 56 kbit/s)
- Bit rate drops to half or less compared to G.722

• Streaming
- Low complexity, low bit rate solution for browsing type of applications

84 © NOKIA 2005

AMR WB applications

[Figure: speech quality versus bit rate (kbit/s, axis marks 6.0, 12.0, 16.0, 24.0) for AMR WB applications. GSM FS channel: 6.60 - 14.25 (19.85) kbit/s; EDGE FR/HR channels: 15.85 - 23.85 kbit/s; UTRAN WCDMA channel: 15.85 - 23.85 kbit/s; also marked: ITU, VoIP]

85 © NOKIA 2005

Introduction of AMR WB into existing and future systems

• 16 kHz sampling frequency in A/D and D/A

• Acoustic design of handset

• Other modifications are similar to changes required when adopting a new narrowband codec

• Circuit switched wide band speech service
- For existing GSM FR, EDGE, WCDMA
- Requires Tandem Free Operation (TFO)

• Dedicated packet switched speech service

• Packet switched conversational multimedia to be standardized

86 © NOKIA 2005

Complexity of Nokia AMR WB Speech Codec

• Algorithmic delay 25 ms = AMR NB
• No common parts with AMR NB

                               AMR WB   AMR NB   WB / NB [%]
WMOPS                          35.4     16.75    ~ 200 %
Data RAM                       6.42     5.28     ~ 120 %
Data ROM                       9.94     14.57    ~ 70 %
Program ROM (ETSI basicops)    3771     4851     ~ 80 %
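The WB / NB column is the ratio of the two measured columns, rounded coarsely on the slide. A short sketch recomputes it from the figures above. The dictionary and function names are illustrative only.

```python
# (AMR WB, AMR NB) speech codec figures copied from the table above.
SPEECH_CODEC_COMPLEXITY = {
    "WMOPS": (35.4, 16.75),
    "Data RAM": (6.42, 5.28),
    "Data ROM": (9.94, 14.57),
    "Program ROM (ETSI basicops)": (3771, 4851),
}


def wb_over_nb_percent(wb, nb):
    """Return the wideband figure as a percentage of the narrowband one."""
    return 100.0 * wb / nb


for name, (wb, nb) in SPEECH_CODEC_COMPLEXITY.items():
    print(f"{name}: {wb_over_nb_percent(wb, nb):.0f} %")
```

The exact ratios come out close to, but not identical with, the approximate percentages quoted on the slide.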

87 © NOKIA 2005

Complexity of Nokia AMR WB channel Codec (GSM FR only)

                               AMR WB   AMR NB   WB / NB [%]
WMOPS                          3.45     5.20     ~ 65 %
Data RAM                       2.88     2.43     ~ 120 %
Data ROM                       3.18     4.28     ~ 75 %
Program ROM (ETSI basicops)    579      1366     ~ 40 %

88 © NOKIA 2005

Nokia AMR WB Speech Codec

Bit rates with importance classification
- Class A subjectively most important
- Bit errors in class A => frame error concealment

Mode   Class A   Class B   Total (bits/frame)   Bit rate (kbit/s)
0      54        78        132                  6.60
1      64        113       177                  8.85
2      72        181       253                  12.65
3      72        213       285                  14.25
4      72        245       317                  15.85
5      72        293       365                  18.25
6      72        325       397                  19.85
7      72        389       461                  23.05
8      72        405       477                  23.85
DTX    35        0         35                   -
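The table is internally consistent if each frame covers 20 ms, i.e. 50 frames per second. That frame length is an assumption here because this slide does not state it. A minimal sketch that recomputes the bit rate from the class A and class B bit counts (names are illustrative):

```python
FRAMES_PER_SECOND = 50  # assumes 20 ms speech frames

# mode -> (class A bits, class B bits), copied from the table above.
AMR_WB_FRAME_BITS = {
    0: (54, 78),
    1: (64, 113),
    2: (72, 181),
    3: (72, 213),
    4: (72, 245),
    5: (72, 293),
    6: (72, 325),
    7: (72, 389),
    8: (72, 405),
}


def bit_rate_bps(class_a_bits, class_b_bits):
    """Return the codec bit rate in bit/s for one mode."""
    return (class_a_bits + class_b_bits) * FRAMES_PER_SECOND


for mode, (a, b) in AMR_WB_FRAME_BITS.items():
    print(mode, a + b, bit_rate_bps(a, b) / 1000)
```

Under the same assumption, the 35-bit comfort noise frame corresponds to 1750 bit/s, which matches the 1.75 kbit/s DTX mode quoted earlier in the deck.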

89 © NOKIA 2005

Bitrate Allocation Between Speech and Channel Coding

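[Figure: stacked bar chart for the GMSK FS channel, bit rate (kbit/s, axis 0.0 to 25.0) per codec mode, split into speech coding and channel coding]

Codec mode   Speech coding (kbit/s)   Channel coding (kbit/s)
6            19.85                    2.95
5            18.25                    4.55
4            15.85                    6.95
3            14.25                    8.55
2            12.65                    10.15
1            8.85                     13.95
0            6.60                     16.20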
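In every mode the speech and channel coding rates in this chart add up to 22.8 kbit/s, the GSM FR channel rate mentioned earlier in the deck. The sketch below derives the channel coding share from the speech rate. `Decimal` is used to avoid binary floating-point rounding, and the names are illustrative.

```python
from decimal import Decimal

GSM_FR_GROSS_KBPS = Decimal("22.8")  # gross rate of the GSM full-rate channel

# codec mode -> speech coding rate in kbit/s (modes usable on the FS channel)
SPEECH_RATE_KBPS = {
    6: "19.85", 5: "18.25", 4: "15.85", 3: "14.25",
    2: "12.65", 1: "8.85", 0: "6.60",
}


def channel_coding_kbps(speech_kbps):
    """Return the rate left for channel coding once speech bits are placed."""
    return GSM_FR_GROSS_KBPS - Decimal(speech_kbps)


for mode, rate in SPEECH_RATE_KBPS.items():
    print(mode, rate, channel_coding_kbps(rate))
```

The lower the codec mode, the larger the share of the channel spent on error protection, which is what makes the low modes robust in bad channel conditions.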

90 © NOKIA 2005

Conclusions

• Nokia’s AMR WB codec standardised by 3GPP and ITU-T

• Bit-exact fixed-point C-code for AMR WB codec is already available for implementation (still subject to small changes)

• AMR WB codec offers superior voice quality over the existing narrow band services (cellular systems, PSTN)

• High quality wide band speech service can be offered with bit rates ranging from 12.65 up to 23.85 kbit/s
- The voice quality improvement is possible even in the existing GSM FR traffic channel with 16 kbit/s submultiplexing in Abis (12.65, 14.25 kbit/s)
- The best voice quality can be offered in EDGE FR/HR and WCDMA channels (15.85 - 23.85 kbit/s)

For specifications and C-code see: www.3gpp.org

91 © NOKIA 2005

3GPP AMR-WB Specifications

26.171 AMR speech codec, wideband; General description
26.173 ANSI-C code for the Adaptive Multi-Rate - Wideband (AMR-WB) speech codec
26.174 AMR speech codec, wideband; Test sequences
26.190 Mandatory Speech Codec speech processing functions; AMR Wideband speech codec; Transcoding functions
26.191 AMR speech codec, wideband; Error concealment of lost frames
26.192 Mandatory Speech Codec speech processing functions; AMR Wideband speech codec; Comfort noise aspects
26.193 AMR speech codec, wideband; Source Controlled Rate operation
26.194 Mandatory Speech Codec speech processing functions; AMR Wideband speech codec; Voice Activity Detector (VAD)
26.201 AMR speech codec, wideband; Frame structure
26.202 AMR speech codec, wideband; Interface to Iu and Uu
26.204 ANSI-C code for the floating-point Adaptive Multi-Rate - Wideband (AMR-WB) speech codec

92 © NOKIA 2005

Demo

• AMR and AMR-WB Demo available at http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_17/Docs/S4-010416.zip