
Final DRAFT Key themes: T1,T3,T4


THE ADAPTIVE MULTI-RATE SPEECH CODER - THE NEW FLEXIBLE WORLD-STANDARD FOR VOICE COMPRESSION

Erik Ekudden*, Stefan Bruhn*, and Patrik Sörqvist**
{erik.ekudden, stefan.bruhn, patrik.sorqvist}@ericsson.com

*Ericsson Research, Ericsson Radio Systems AB, SE-164 80 Stockholm, Sweden
**Datacom Networks & IP Services, Ericsson Telecom AB, SE-126 25 Stockholm, Sweden

ABSTRACT

In this paper, we describe the recently standardised Adaptive Multi-Rate (AMR) speech coder and its implementation and performance in cellular systems such as GSM and 3G UMTS/WCDMA, in fixed circuit-switched networks, and in IP-based networks. The coder is a multi-rate ACELP coder with 8 modes operating at bit rates from 12.2 kbit/s down to 4.75 kbit/s. In addition, an efficient, high-quality source controlled rate functionality is specified, which lowers the average source rate by means of voice activity detection. The wide range of bit rates and the high speech quality make the coder suitable not only for cellular applications, where the rate can be controlled based on, e.g., radio channel quality or cell load, but also for fixed network voice trunking and high-quality voice over IP (VoIP) applications.

1. INTRODUCTION

The speech service, and its extensions to real-time multimedia, are important applications for network operators. Thus, there is a continued push for increased quality and capacity as technology advances and new transmission techniques emerge [1].

For the GSM system, ETSI conducted a feasibility study for next generation speech services in 1996. The goal was to provide wireline quality in the halfrate traffic channel and highly error-robust operation in the fullrate traffic channel. The result of the study was an Adaptive Multi-Rate concept, in which the speech coder bit rate is continuously adapted to the radio channel conditions, since no fixed-rate solution would meet all the requirements. After a series of subjective tests, the AMR coder was selected in October 1998. Subsequently, AMR was adopted by 3GPP as the mandatory speech coder for UMTS/IMT-2000.

Traditionally, voice compression in circuit-switched networks has been applied on a per-link basis: when low-bandwidth links are connected, transcoding is applied for each link. These additional transcoding stages degrade quality significantly for low-rate speech coders. In addition to the higher coding distortion, the transmission delay usually increases, which further reduces the overall quality. Moreover, transmission capacity in the network is wasted. With all-digital networks, end-to-end coding from one phone (end-point) to another, i.e. tandem-free operation, is now a realistic goal for increased quality and transmission efficiency.

The outline of the paper is as follows. Section 2 discusses transmission network aspects, and Section 3 the basic building blocks of the AMR speech coder. Section 4 gives example applications, Section 5 provides performance data, and Section 6 presents the conclusions.

2. TRANSMISSION NETWORKS

To fully optimise connections in terms of quality and capacity, the end-to-end transmission should be considered. Fig. 1 gives example connections, including calls from a mobile via IP transport to either the PSTN/ISDN, a LAN PC phone or a second mobile network. It shows that several interconnected networks are likely to be involved even in short-distance (local) calls.

Figure 1. Network scenario involving GWs for voice transcoding

In Fig. 1, the need for media gateways (GWs), with the additional delay, complexity, cost and loss of quality caused by transcoding, would be significantly reduced if the compressed voice were carried as far as possible through the transport networks.

In wireless environments, speech coders must withstand high error rates caused by interference, fading etc., in order to achieve high network capacity while still providing the wireline speech quality that customers expect. As a result, speech coders designed for cellular networks are also well suited for, e.g., fixed networks and IP networks with high jitter and packet loss rates.

With the demand for higher capacity and lower transmission cost, networks are increasingly operated at a point with relatively poor network quality. Only robust speech coders, such as the AMR coder, can provide satisfactory performance under such network conditions.

3. AMR SPEECH CODING

AMR speech coding is specified in GSM 06.90, together with the related voice activity detector (VAD), the source controlled rate (SCR) system, and the error concealment unit (ECU) for lost frames. The AMR coder can operate at 8 different source rates for speech, given in Table 1.

Table 1

Mode       Rate (bits/s)   Remarks
AMR-12.2   12200           GSM EFR (GSM 06.60)
AMR-10.2   10200
AMR-7.95    7950
AMR-7.40    7400           TDMA EFR (IS-641)
AMR-6.70    6700           PDC-EFR (RCR 27H)
AMR-5.90    5900
AMR-5.15    5150
AMR-4.75    4750
AMR-SID      220*

* Approximate average rate during non-speech.

3.1 Speech Encoder and Decoder

The AMR coder is a scalable Multi-Rate Algebraic CELP (MR-ACELP) coder [2], capable of switching seamlessly between any of the 8 bit rates on every frame. The frame size is 20 ms and the lookahead is 5 ms, giving a total algorithmic delay of 25 ms. Interworking with existing high-quality codecs is ensured, since three of the modes are existing state-of-the-art coders: the 12.2 kbit/s mode is equivalent to the GSM Enhanced Fullrate (EFR) coder, the 7.40 kbit/s mode is the EFR coder for the IS-136 system, and the 6.70 kbit/s mode is the EFR coder for the Japanese PDC system. The relatively low complexity of the algorithm, typically less than 15 MIPS on a DSP for encoder and decoder, and less than 10% of the processing capacity of a Pentium PC, allows cost-effective implementations.
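As a quick check of these numbers, the bit budget of each mode follows from multiplying its rate in Table 1 by the 20 ms frame length, and the algorithmic delay is one frame plus the lookahead. The sketch below is illustrative arithmetic only; the dictionary and helper names are hypothetical and not part of GSM 06.90.

```python
# Source rates from Table 1 in bit/s. AMR-SID is left out because its
# 220 bits/s figure is an average over non-speech periods, not a frame rate.
MODE_RATES = {
    "AMR-12.2": 12200,
    "AMR-10.2": 10200,
    "AMR-7.95": 7950,
    "AMR-7.40": 7400,
    "AMR-6.70": 6700,
    "AMR-5.90": 5900,
    "AMR-5.15": 5150,
    "AMR-4.75": 4750,
}

FRAME_MS = 20      # frame size given in Section 3.1
LOOKAHEAD_MS = 5   # encoder lookahead given in Section 3.1


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Speech bits produced per frame at a given source rate (hypothetical helper)."""
    bits, remainder = divmod(rate_bps * frame_ms, 1000)
    if remainder:
        raise ValueError("rate does not yield a whole number of bits per frame")
    return bits


def algorithmic_delay_ms(frame_ms: int = FRAME_MS, lookahead_ms: int = LOOKAHEAD_MS) -> int:
    """Algorithmic delay = one frame of buffering plus the lookahead."""
    return frame_ms + lookahead_ms


for mode, rate in MODE_RATES.items():
    print(f"{mode}: {bits_per_frame(rate)} bits per {FRAME_MS} ms frame")
print(f"Algorithmic delay: {algorithmic_delay_ms()} ms")
```

Running it lists 244 bits per frame for AMR-12.2 down to 95 bits for AMR-4.75, and an algorithmic delay of 25 ms.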

3.2 Error Concealment

The ECU algorithm is integrated in the decoder and uses a state-machine structure with extrapolation and gradually attenuated output when consecutive frames are erased. A feature of the algorithm is its source-signal-dependent actions, which provide enhancements for speech in background noise. The ECU is designed to handle both detected frame losses and undetected bit errors in the least significant bits of the frame. Hence, the coder provides high performance for circuit- and packet-switched connections with frame erasures (FER) and/or bit errors (BER).
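The state-machine idea can be illustrated with a toy model in which a counter of consecutive erased frames selects an attenuation factor applied to the gain of the last good frame. This is a minimal sketch under stated assumptions: the class name, the six-state limit and the factor values are hypothetical, and the actual ECU also extrapolates other parameters and adapts to the source signal, which the sketch omits.

```python
# Hypothetical attenuation factors for 1, 2, ..., 6 consecutive erased frames.
# Illustrative values only; they are not taken from GSM 06.90.
ATTENUATION = (0.98, 0.9, 0.7, 0.5, 0.3, 0.2)


class ToyConcealer:
    """Toy ECU state machine acting on a single gain parameter."""

    def __init__(self) -> None:
        self.state = 0              # consecutive erased frames, capped at len(ATTENUATION)
        self.last_good_gain = 0.0   # gain decoded from the most recent good frame

    def next_gain(self, frame_ok: bool, decoded_gain: float = 0.0) -> float:
        """Return the gain to use for the current frame.

        A good frame resets the state machine and passes its gain through.
        An erased frame advances the state and reuses (extrapolates) the last
        good gain, scaled by a factor that shrinks as the erasure run grows
        and holds at the final factor once the run exceeds six frames.
        """
        if frame_ok:
            self.state = 0
            self.last_good_gain = decoded_gain
            return decoded_gain
        self.state = min(self.state + 1, len(ATTENUATION))
        return self.last_good_gain * ATTENUATION[self.state - 1]


ecu = ToyConcealer()
pattern = [(True, 1.0), (False, 0.0), (False, 0.0), (False, 0.0), (True, 0.8)]
print([round(ecu.next_gain(ok, g), 2) for ok, g in pattern])
```

For a good frame, three erasures and another good frame, this prints [1.0, 0.98, 0.9, 0.7, 0.8]: the output fades during the erasure run and recovers immediately when a good frame arrives.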

3.3 Source Controlled Rate

The coder includes an SCR scheme that lowers the average source rate by detecting speech pauses and encoding non-speech segments at a lower rate. During non-speech, every 8th frame is encoded, giving an average rate of approximately 220 bits/s. For typical conversations, the activity factor ranges from 35% to 80%, with an average somewhat below 50% depending on the situation. The AMR source rate is thus reduced on average to 45%-55% of the maximum rate used, with maintained speech quality. For the lowest mode (4.75 kbit/s), the average rate may therefore be as low as 2.2-2.8 kbit/s. The VAD is designed to provide low activity factors without clipping the speech, while still detecting complex signals such as music-on-hold, to avoid disturbing switching effects [5].
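The averaging above can be reproduced with a simplified model that classifies each 20 ms frame from a VAD decision and sums the transmitted bits. This is a sketch under assumptions: the SID frame size of about 35 bits is chosen only because it matches the ~220 bits/s figure (35 bits every 8 frames of 20 ms), the function names are hypothetical, and the standardised SCR system includes details such as hangover handling that are ignored here.

```python
FRAME_S = 0.020     # 20 ms frames
SID_INTERVAL = 8    # during non-speech, one frame in 8 is encoded (Section 3.3)
SID_BITS = 35       # assumption: ~35 bits per SID frame, consistent with ~220 bits/s


def frame_types(vad_flags):
    """Map per-frame VAD decisions (True = speech) to SPEECH, SID or NO_DATA.

    Simplified: a SID frame is sent at the start of each pause and then every
    8th frame; hangover handling in the real SCR system is ignored.
    """
    types = []
    pause_len = 0
    for active in vad_flags:
        if active:
            pause_len = 0
            types.append("SPEECH")
        else:
            types.append("SID" if pause_len % SID_INTERVAL == 0 else "NO_DATA")
            pause_len += 1
    return types


def average_rate(vad_flags, mode_rate_bps):
    """Average source rate in bit/s over a sequence of VAD decisions."""
    bits_for = {"SPEECH": mode_rate_bps * FRAME_S, "SID": SID_BITS, "NO_DATA": 0}
    total_bits = sum(bits_for[t] for t in frame_types(vad_flags))
    return total_bits / (len(vad_flags) * FRAME_S)


silence = [False] * 800                  # 16 s of non-speech
talk = [True] * 50 + [False] * 50        # 50% activity over 2 s
print(round(average_rate(silence, 4750), 2))   # SID-only rate
print(round(average_rate(talk, 4750), 1))      # lowest mode, 50% activity
```

The sketch gives 218.75 bit/s for pure non-speech, close to the ~220 bits/s in Table 1, and about 2.5 kbit/s for the 4.75 kbit/s mode at 50% activity, inside the 2.2-2.8 kbit/s range quoted above.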

3.4 External Rate Control

The rate can be changed on a frame basis. This feature can be used to control the rate externally, either based on static configuration parameters or dynamically during a call. The rate is controlled via an inband control channel running between the end-points. The inband channel has two functions. In the forward direction, it indicates the mode currently in use, while in the backward direction it signals requests for mode changes in the opposite link direction. Requests can be issued by end-points as well as by network entities.
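The two functions of the inband channel can be sketched as a pair of fields travelling with each frame: a forward mode indication telling the receiver how to decode, and a backward mode request telling the far end which mode to use for its own transmission. The field and class names below are hypothetical, the request is obeyed immediately, and requests from network entities are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InbandInfo:
    """Hypothetical inband fields carried with each speech frame."""
    mode_indication: str   # forward: mode used to encode this frame
    mode_request: str      # backward: mode requested for the opposite direction


class Endpoint:
    """Toy end-point; class and field names are illustrative only."""

    def __init__(self, initial_mode: str) -> None:
        self.tx_mode = initial_mode              # mode this end-point encodes with
        self.requested_peer_mode = initial_mode  # mode it asks the far end to use

    def send(self) -> InbandInfo:
        return InbandInfo(mode_indication=self.tx_mode,
                          mode_request=self.requested_peer_mode)

    def receive(self, info: InbandInfo) -> str:
        """Honour the peer's request and return the mode needed to decode the frame."""
        self.tx_mode = info.mode_request
        return info.mode_indication


a = Endpoint("AMR-12.2")
b = Endpoint("AMR-12.2")
b.requested_peer_mode = "AMR-5.90"            # b asks a to use a more robust mode
decode_with = a.receive(b.send())             # a decodes b's frame with AMR-12.2 ...
print(decode_with, a.send().mode_indication)  # ... and now encodes with AMR-5.90
```

Keeping indication and request in separate fields is what lets each link direction run its own mode independently.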

4. APPLICATIONS

Examples of AMR use in GSM, UMTS/WCDMA and IP networks are given below. Other AMR applications are expected in the future.

4.1 GSM SYSTEMS

The AMR circuit-switched speech service is defined for both halfrate (HR) and fullrate (FR) traffic channels. In the FR channel, a subset of up to 4 of the 8 modes can be used at any time. In HR, a subset of 4 of the 6 lowest modes may be used. During a call, the channel quality varies significantly due to fading, interference variations and path loss variations. These variations cannot be compensated by the slow power control. However, the fast AMR mode adaptation (latency below 150 ms) is able to track many of these changes. Hence, a high source rate is used when the channel is good, and a lower source rate (leaving more bits for channel coding) is used when the channel quality gets worse [3,4], see Fig. 2.

Figure 2. The Multi-Rate trade-off using 3 modes (speech quality versus channel quality).

The configurability of the AMR speech service, in terms of the mode sets used and the adaptation logic, gives network operators full flexibility to tailor the service to the network characteristics or to other operator-specific needs, such as capacity or the highest possible quality.
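Choosing a mode set is the first configuration step. The sketch below checks a candidate set against the FR and HR rules described in Section 4.1. The function name is hypothetical, and applying the same limit of one to four modes to HR is an assumption; the text only states that HR uses a subset of 4 of the 6 lowest modes.

```python
# Modes from Table 1, ordered from highest to lowest rate.
ALL_MODES = ["AMR-12.2", "AMR-10.2", "AMR-7.95", "AMR-7.40",
             "AMR-6.70", "AMR-5.90", "AMR-5.15", "AMR-4.75"]
HR_MODES = ALL_MODES[2:]   # the 6 lowest modes usable on the halfrate channel
MAX_ACTIVE_MODES = 4       # assumption: same upper limit for FR and HR


def is_valid_mode_set(modes: list[str], channel: str) -> bool:
    """Check a candidate mode set against the FR/HR rules described in Section 4.1."""
    if channel == "FR":
        allowed = set(ALL_MODES)
    elif channel == "HR":
        allowed = set(HR_MODES)
    else:
        raise ValueError(f"unknown channel type: {channel!r}")
    unique = set(modes)
    return (len(unique) == len(modes)                   # no duplicate modes
            and 1 <= len(unique) <= MAX_ACTIVE_MODES
            and unique <= allowed)


print(is_valid_mode_set(["AMR-12.2", "AMR-7.40", "AMR-5.90", "AMR-4.75"], "FR"))
print(is_valid_mode_set(["AMR-12.2", "AMR-7.40", "AMR-5.90", "AMR-4.75"], "HR"))
```

The same four-mode set is accepted on FR but rejected on HR, because the 12.2 kbit/s mode is not among the 6 lowest modes.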

4.1.1 Channel Coding and Link Adaptation

Unequally punctured convolutional codes are used, providing optimised error protection for each mode. The inband channel is protected with a block code and has a lower error rate than the speech frames (i.e. lower than the speech FER). The link adaptation is based on link quality estimates made at the receiver, which allow the receiver to request the most suitable mode via the return link [4]. The link quality estimation scheme is left open to the implementer; suitable measures are C/I estimates, channel BER estimates or FER estimates. An example solution using C/I measurements is standardised. The adaptation is based on comparing the measurements to thresholds, and hysteresis is applied to avoid too frequent mode switching.
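Threshold comparison with hysteresis can be sketched as follows: the receiver switches up only when the C/I estimate exceeds the threshold by an extra margin, and switches down as soon as it falls below the threshold. The function name, the threshold and hysteresis values, and the one-step-per-update rule are all hypothetical; this is not the standardised example solution.

```python
def adapt_mode(current: int, ci_db: float,
               thresholds: list[float], hysteresis: list[float]) -> int:
    """Threshold-and-hysteresis mode selection (illustrative sketch).

    current      index into the active mode set, 0 = lowest rate (most robust)
    thresholds   thresholds[i] is the C/I boundary between modes i and i + 1
    hysteresis   hysteresis[i] is the extra margin needed to switch up across thresholds[i]
    The sketch moves at most one mode per update (an assumption).
    """
    if current < len(thresholds) and ci_db > thresholds[current] + hysteresis[current]:
        return current + 1          # channel clearly good enough: higher source rate
    if current > 0 and ci_db < thresholds[current - 1]:
        return current - 1          # channel worse: more robust mode
    return current


# Hypothetical 3-mode set (e.g. 4.75, 6.70 and 12.2 kbit/s) with made-up values in dB.
THRESHOLDS = [8.0, 14.0]
HYSTERESIS = [2.0, 2.0]

mode = 0
trace = []
for ci in [9.0, 10.5, 12.0, 16.5, 15.0, 13.0, 7.0]:
    mode = adapt_mode(mode, ci, THRESHOLDS, HYSTERESIS)
    trace.append(mode)
print(trace)
```

The trace is [0, 1, 1, 2, 2, 1, 0]. At 9 dB the coder stays in the most robust mode even though C/I is above the 8 dB threshold, because the 2 dB hysteresis margin has not been reached; this is the behaviour that prevents rapid toggling around a threshold.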


4.2 UMTS / WCDMA SYSTEMS

The AMR speech coder and its SCR system are identical for UMTS and GSM. The compressed voice stream is thus fully interoperable between the two systems. The differences in the service are that UMTS uses a more generic radio access bearer, and that the inband channel is realised differently. Additional flexibility is provided, since all 8 rates are usable at any time.

4.2.1 Maximum Rate Control

For wideband wireless systems with fast power control, such as WCDMA, the radio channel quality variations are generally significantly smaller than in, e.g., GSM. The rate control therefore serves a different purpose. It is mainly used as a relatively slow maximum rate control function to i) increase system capacity under high network load conditions, e.g. at certain times of day, in a geographical area, or dynamically based on measured network load, and ii) increase quality for mobile stations (MSs) with power limitations at the cell border. In tandem-free connections with GSM, the system still accepts rapid mode changes, initiated by the GSM system, between modes below the maximum allowed rate.
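The interplay between a slow maximum-rate cap and rapid mode changes from a GSM peer can be sketched as a simple clamp: the rate actually used is the highest mode that satisfies both the request and the cap. The function name and its fallback behaviour are hypothetical.

```python
# Source rates from Table 1, lowest first (bit/s).
MODE_RATES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


def apply_max_rate(requested_bps: int, max_allowed_bps: int) -> int:
    """Return the highest mode that satisfies both the request and the cap.

    requested_bps    mode requested by the link (e.g. a GSM peer in tandem-free operation)
    max_allowed_bps  slow cap set by the network, e.g. from cell load or time of day
    Falls back to the lowest mode if the cap is below every mode (an assumption).
    """
    limit = min(requested_bps, max_allowed_bps)
    allowed = [r for r in MODE_RATES_BPS if r <= limit]
    return max(allowed) if allowed else MODE_RATES_BPS[0]


print(apply_max_rate(12200, 7950))   # request above the cap is clipped
print(apply_max_rate(5900, 7950))    # requests below the cap pass through
```

This prints 7950 and then 5900: the cap changes slowly, while requests below it are followed frame by frame.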

4.3 IP-BASED NETWORKS

IP can be used over various networks to gain higher flexibility and save cost, and the concept is applicable to both wired and wireless IP networks. Voice may be transported over RTP/UDP/IP as a single media stream in a dedicated configuration. However, it is common to use existing general multimedia protocols, such as H.323 and SIP. For this purpose, and also to support AMR as part of H.324/H.324M, AMR capability is currently being standardised in the H.245 control protocol.
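When AMR frames are carried over RTP/UDP/IP, per-packet header overhead matters as much as the codec rate. The sketch below estimates the gross IP bit rate of one stream using the minimal IPv4 (20 bytes), UDP (8 bytes) and RTP (12 bytes) headers. It is a back-of-the-envelope model: the function name is hypothetical, each frame is assumed to be padded to whole octets, and any payload-format header is ignored.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # minimal IPv4 + UDP + RTP headers
FRAME_MS = 20


def ip_bitrate_bps(mode_rate_bps: int, frames_per_packet: int = 1) -> float:
    """Gross IP bit rate for one AMR stream over RTP/UDP/IPv4.

    Simplifying assumptions: each speech frame is padded to whole octets,
    no payload-format header is counted, and silence suppression is off.
    """
    frame_bits = mode_rate_bps * FRAME_MS // 1000
    payload_bytes = frames_per_packet * math.ceil(frame_bits / 8)
    packet_bits = 8 * (payload_bytes + IP_UDP_RTP_HEADER_BYTES)
    packet_interval_ms = frames_per_packet * FRAME_MS
    return packet_bits * 1000 / packet_interval_ms


for rate in (12200, 4750):
    for n in (1, 2):
        print(rate, n, ip_bitrate_bps(rate, n))
```

With one frame per packet, the headers alone add 16 kbit/s (40 bytes every 20 ms), so the 12.2 kbit/s mode needs about 28.4 kbit/s at the IP layer and even the 4.75 kbit/s mode about 20.8 kbit/s. Two frames per packet lower these figures to 20.4 and 12.8 kbit/s, at the cost of 20 ms of extra packetisation delay.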

4.3.1 Rate Control

The quality of IP networks varies significantly, and applications should provide reasonable quality under packet loss conditions as well. The robustness of the AMR coder to lost frames makes it well suited for this. Operators can use the multiple rates of AMR to increase capacity in high-traffic networks, or to differentiate quality between user groups (e.g. business and private users), for example by coupling pricing to the user rate. These rate changes are expected to be relatively slow. More rapid rate changes may be beneficial under severe congestion, when the rate can be lowered to reduce the required bandwidth. This would typically be initiated by an end-point that detects high packet loss or increased delay and jitter.
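An end-point reaction to congestion could look like the following hypothetical policy: step down one mode whenever a receiver report shows loss or jitter above a limit, and step back up only after several consecutive good reports, so that reductions are fast and recovery is slow. The class name, limits and report format are assumptions for illustration, not part of any AMR specification.

```python
MODE_RATES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


class CongestionRateController:
    """Hypothetical end-point policy for AMR rate control over IP.

    Steps down one mode whenever a report shows loss or jitter above a limit,
    and steps back up one mode after a run of consecutive good reports.
    All limits below are made-up example values.
    """

    def __init__(self, loss_limit: float = 0.05, jitter_limit_ms: float = 60.0,
                 good_reports_to_step_up: int = 5) -> None:
        self.index = len(MODE_RATES_BPS) - 1   # start at the highest mode
        self.loss_limit = loss_limit
        self.jitter_limit_ms = jitter_limit_ms
        self.good_reports_to_step_up = good_reports_to_step_up
        self.good_reports = 0

    def update(self, loss_fraction: float, jitter_ms: float) -> int:
        """Feed one receiver report; return the source rate to use next (bit/s)."""
        if loss_fraction > self.loss_limit or jitter_ms > self.jitter_limit_ms:
            self.index = max(self.index - 1, 0)    # react quickly to congestion
            self.good_reports = 0
        else:
            self.good_reports += 1
            if (self.good_reports >= self.good_reports_to_step_up
                    and self.index < len(MODE_RATES_BPS) - 1):
                self.index += 1                    # recover slowly
                self.good_reports = 0
        return MODE_RATES_BPS[self.index]


ctl = CongestionRateController()
reports = [(0.10, 20.0), (0.08, 90.0)] + [(0.0, 20.0)] * 5
rates = [ctl.update(loss, jitter) for loss, jitter in reports]
print(rates)
```

Two congested reports take the stream from 12.2 kbit/s down to 7.95 kbit/s, and it returns to 10.2 kbit/s only after five clean reports; the asymmetry keeps the rate from oscillating while the network is recovering.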

5. PERFORMANCE

The performance has been assessed for clean speech, speech with background noise, channel impairments, tandem connections, input level variations etc. The conclusion from the extensive testing is that the higher modes provide quality equivalent to, or exceeding, that of wireline G.726 at 32 kbit/s. Quality degrades gracefully as the rate is lowered; at, e.g., 5.9 kbit/s, the quality is still at the level of G.723.1 operating at 6.3 kbit/s. The lowest modes provide competitive quality, at the level of G.723.1 at 5.3 kbit/s.

Results from MOS tests on GSM FR channels are given in Fig. 3. The increase in C/I tolerance is approximately 6 dB for the lower rates. The quality improvement over the present EFR for dynamically varying channels often exceeds 1 MOS (5-point scale), which is highly significant. In terms of FER, undistorted speech is obtained up to 1% FER, and acceptable quality is obtained for FER up to 10%.

Figure 3. Left: MOS versus C/I, from 19 dB (0.5% channel BER) to 1 dB (30% BER), for three modes (12.2, 6.7 and 4.75 kbit/s). Right: MOS of AMR and EFR for 5 dynamic channel (DEC) conditions, DEC1-DEC5.

6. CONCLUSIONS

The main features of the AMR speech coder have been described. The coder provides a flexible toolbox for operators to design high-quality, high-efficiency voice services for 2G/3G mobile systems as well as for fixed and IP-based networks. The use of AMR in 2G/3G mobile networks and in fixed and IP-based networks, with full interoperability, enhances quality and at the same time significantly reduces transmission costs. The wide range of bit rates spanned by the 8 modes, from 12.2 kbit/s down to an average rate below 2.5 kbit/s for the lowest mode using SCR, together with the robustness to errors, allows operators to optimise both QoS and capacity for mobile, fixed and VoIP applications.

The AMR coder has been extensively characterised in internationally coordinated tests. The quality of the higher modes exceeds that of wireline G.726 ADPCM at 32 kbit/s, with a graceful reduction of quality for the lower modes. The lowest modes provide competitive communication quality.

7. REFERENCES

[1] T.B. Minde, S. Bruhn, E. Ekudden, and H. Hermansson, “Requirements on Speech Coders Imposed by Speech Service Solutions in Cellular Systems”, in Proc. IEEE Workshop on Speech Coding, Pocono, PA, pp. 7-10, 1997.


[2] E. Ekudden, R. Hagen, I. Johansson, and J. Svedberg, “The AMR Speech Coder”, in Proc. IEEE Workshop on Speech Coding, Porvoo, Finland, pp. 117-119, 1999.

[3] O. Corbun, M. Almgren, and K. Svanbro, “Capacity and Speech Quality Aspects Using Adaptive Multi-Rate (AMR)”, in Proc. PIMRC-98, Boston, MA, 1998.

[4] S. Bruhn, P. Blöcher, K. Hellwig, and J. Sjöberg, “Concepts and Solutions for Link Adaptation and Inband Signaling for the GSM AMR Speech Coding Standard”, in Proc. IEEE VTC-99, Houston, TX, pp. 2451-2455, 1999.

[5] A. Vähätalo and I. Johansson, “Voice Activity Detection for GSM Adaptive Multi-Rate”, in Proc. IEEE Workshop on Speech Coding, Porvoo, Finland, pp. 55-57, 1999.